scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	9a9f0d3a0f	main: fix exception handling when initializing data or commitlog dirs Exception handling was broken because after io checker, storage_io_error exception is wrapped around system error exceptions. Also the message when handling exception wasn't precise enough for all cases. For example, lack of permission to write to existing data directory. Fixes #883. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>	2016-11-14 12:34:10 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Takuya ASADA	587d375e19	main: exit with 1 when verify_seastar_io_scheduler() failed Since we are exiting the Scylla process in engine().at_exit() using ::_exit(0), Scylla always exits with 0 even when verify_seastar_io_scheduler() throws an exception. Systemd then wrongly concludes that scylla-server.service shut down successfully, so we need to pass the correct exit code to ::_exit() here. Fixes #1674 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>	2016-10-17 13:57:00 +03:00
Raphael S. Carvalho	76862d0d9c	main: start compaction procedure after commit log is replayed Commit log replay is a synchronous operation in bootstrap, so services will only be started after it's completed. If compaction is started before that, less bandwidth will be available to both and consequently boot will be slowed down. The fix is simply to move compaction, which is an asynchronous operation, to after commitlog replay is over. Fixes #1620. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d2a173a4ee4d474317b970c6b39530e61067fea9.1475527955.git.raphaelsc@scylladb.com>	2016-10-06 18:25:24 +03:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Gleb Natapov	26ae8e8365	implement listen_on_broadcast_address option When using multiple physical network interfaces, set this to true to listen on broadcast_address in addition to the listen_address, allowing nodes to communicate in both interfaces. Ignore this property if the network configuration automatically routes between the public and private networks such as EC2. Message-Id: <20160921094810.GA28654@scylladb.com>	2016-09-26 08:49:54 +03:00
Pekka Enberg	f1d0401ed2	main: Use proper logger for API server messages We have a "startlog" that we can use to print out API server messages. Message-Id: <1474358312-26510-1-git-send-email-penberg@scylladb.com>	2016-09-20 11:09:59 +03:00
Tomasz Grabiec	9476bc5a31	Introduce --abort-on-lsa-bad-alloc command line option Useful for triggering core dump on allocation failure inside LSA, which makes it easier to debug allocation failures. They normally don't cause aborts, just fail the current operation, which makes it hard to figure out what was the cause of allocation failure. Message-Id: <1470233631-18508-1-git-send-email-tgrabiec@scylladb.com>	2016-08-03 17:26:44 +03:00
Amnon Heiman	bb4268a8a5	Add prometheus API This patch adds the prometheus API: it adds the proto library to the compilation, adds an optional configuration parameter to change the prometheus listening port, and starts the prometheus API in main. To disable the prometheus API, set its listening port to 0. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1470228764-19545-2-git-send-email-amnon@scylladb.com>	2016-08-03 15:55:18 +03:00
Duarte Nunes	9ffdf4a5cd	db: Implement size_estimates_recorder This patch implements the size_estimates_recorder, which periodically writes estimations for all the non-system column families in the size_estimates system table. The size_estimates_recorder class corresponds to the one in Cassandra's SizeEstimatesRecorder.java. Estimation is carried out by shard 0. Since we're estimating based on data in shared sstables, having multiple shards doing this would skew the results. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-19 09:44:58 +00:00
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Raphael S. Carvalho	85cb2a6d35	database: trigger compaction on boot At the moment, we only trigger compaction after creating a new sstable as a result of memtable flush, or some other event such as changing compaction strategy of a column family. However, it's important to trigger compaction on boot too. That will happen after loading all column families. Fixes #1404. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>	2016-06-29 13:47:42 +03:00
Avi Kivity	5b81448ed6	main: add scylla --version option Fixes #1384. Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>	2016-06-23 16:24:03 +02:00
Avi Kivity	5af22f6cb1	main: handle exceptions during startup If we don't, std::terminate() causes a core dump, even though an exception is sort-of-expected here and can be handled. Add an exception handler to fix. Fixes #1379. Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>	2016-06-23 09:25:33 +03:00
Nadav Har'El	3372052d48	Rewriting shared sstables only after all shards loaded sstables After commit `faa4581`, each shard only starts splitting its shared sstables after opening all sstables. This was important because compaction needs to be aware of all sstables. However, another bug remained: If one shard finishes loading its sstables and starts the splitting compactions, and in parallel a different shard is still opening sstables - the second shard might find a half-written sstable being written by the first shard, and abort on a malformed sstable. So in this patch we start the shared sstable rewrites - on all shards - only after all shards finished loading their sstables. Doing this is easy, because main.cc already contains a list of sequential steps where each uses invoke_on_all() to make sure the step completes on all shards before continuing to the next step. Fixes #1371 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>	2016-06-20 16:25:24 +03:00
Avi Kivity	85bb5ea064	Merge "Reduce LSA reclaim latency" from Tomasz "Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no difference in throughput in a CQL write micro benchmark in neither of these workloads: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274."	2016-06-16 10:47:57 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Vlad Zolotarov	d3960f0bbb	tracing: rearrange shut down tracing::tracing local instance is dereferenced from a cql_server::connection::process_request(), therefore tracing::tracing service may be stop()ed only after a CQL server service is down. On the other hand it may not be stopped before RPC service is down because a remote side may request a tracing for a specific command too. This patch splits the tracing::tracing stop() into two phases: 1) Flush all pending tracing records and stop the backend. 2) Stop the service. The first phase is called after CQL server is down and before RPC is down. The second phase is called after RPC is down. Fixes #1339 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>	2016-06-14 07:58:04 +03:00
Asias He	e6f63a50e1	main: Delay the messaging_service api registration Since messaging_service is fully initialized in storage_service::init_server which calls messaging_service::start_listen, we need to delay the messaging_service api registration after it.	2016-06-08 11:13:35 +08:00
Vlad Zolotarov	4b43b08ffc	main: start a tracing service Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Pekka Enberg	0255318bf3	Revert "Revert "main: change order between storage service and drain execution during exit"" This reverts commit `b3ed55be1d`. The issue is in the failing dtest, not this commit. Gleb writes: "The bug is in the test, not the patch. Test waits for repair session to end one way or the other when node is killed, but for nodetool to know if repair is completed it needs to poll for it. If node dies before nodetool managed to see repair completion it will stuck forever since jmx is alive, but does not provide answers any more. The patch changes timing, repair is completed much close to exit now, so problem appears, but it may happen even without the patch. The fix is for dtest to kill jmx as part of killing a node operation." Now that Lucas fixed the problem in scylla-ccm, revert the revert.	2016-06-01 08:48:50 +03:00
Pekka Enberg	b3ed55be1d	Revert "main: change order between storage service and drain execution during exit" This reverts commit `0ebd8b18b7`. The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test	2016-05-30 12:48:09 +03:00
Avi Kivity	b50cb3eca8	config: rename compact_on_idle compact_on_idle will lead users to thinking we're talking about sstable compaction, not log-structured-allocator compaction. Rename the variable to reduce the probability of confusion. Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>	2016-05-30 08:39:13 +03:00
Gleb Natapov	0ebd8b18b7	main: change order between storage service and drain execution during exit Even the comment says drain_on_shutdown should be called first, but for that it has to be registered last. Fixes #862 Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>	2016-05-29 11:39:24 +03:00
Piotr Jastrzebski	136b8148d2	Use idle CPU to compact LSA memory Register an idle CPU handler that compacts a single segment every time there's nothing better to execute on CPU. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>	2016-05-26 12:43:53 +03:00
Pekka Enberg	94e7e61cd0	api: Register snitch API earlier Currently, we register snitch API in set_server_gossip_settle() which waits until a node has joined the cluster. This makes 'nodetool status' not properly show the status of a joining node. Fix the issue by registering snitch API earlier. Fixes #1269. Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>	2016-05-20 14:24:14 +03:00
Raphael S. Carvalho	bf18025937	main: stop compaction manager earlier Avi says: "During shutdown, we prevent new compactions, but perhaps too late. Memtables are flushed and these can trigger compaction." To solve that, let's stop compaction manager at a very early step of shutdown. We will still try to stop compaction manager in database::stop() because user may ask for a shutdown before scylla was fully started. It's fine to stop compaction manager twice. Only the first call will actually stop the manager. Fixes #1238. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>	2016-05-06 07:41:29 +03:00
Takuya ASADA	2bfc8e8c12	main: add tcp_syncookies sanity check Check net.ipv4.tcp_syncookies, show error message when it set to 0. Fixes #1118 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>	2016-04-21 14:55:26 +03:00
Avi Kivity	e43dbac836	main: cancel pending atomic deletions on shutdown A shared sstable must be compacted by all shards before it can be deleted. Since we're stopping, that's not going to happen. Cancel those pending deletions to let anyone waiting on them continue.	2016-04-14 17:14:26 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	e750a94300	sanity check Seastar's I/O queue configuration While Seastar in general can accept any parameter for its I/O queues, Scylla in particular shouldn't run with them disabled. Such will be the status when the max-io-requests parameter is not enabled. On top of that, we would like to have enough depth per I/O queue not to allow for shard-local parallelism. Therefore, we will require a minimum per-queue capacity of 4. In machines where the disk iodepth is not enough to allow for 4 concurrent requests per shard, one should reduce the number of I/O queues. For --max-io-requests, we will check the parameter itself. However, the --num-io-queues parameter is not mandatory, and given enough concurrent requests, Seastar's default configuration can very well just be doing the right thing. So for that, we will check the final result of each I/O queue. As it is the case with other checks of the sorts, this can be overridden by the --developer-mode switch. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>	2016-03-25 11:33:57 +03:00
Gleb Natapov	48c83163b9	init: make more initialization threaded Since initialization now runs in a thread storage, messaging and gossiper services initialization code may take advantage of it too. Message-Id: <20160323094732.GF2282@scylladb.com>	2016-03-23 11:53:11 +02:00
Gleb Natapov	ea92064d38	avoid invoke_on_all during developer-mode application if possible Message-Id: <20160315145327.GW6117@scylladb.com>	2016-03-22 10:40:30 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Pekka Enberg	69dacf9063	main: Fix broadcast_address and listen_address validation errors Fix the validation error message to look like this: Scylla version 666.development-20160316.49af399 starting ... WARN 2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used. WARN 2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors. ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument) Exiting on unhandled exception of type 'bad_configuration_error': std::exception Instead of: Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument Fixes #1051. Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>	2016-03-17 14:59:00 +02:00
Pekka Enberg	972fc6e014	main: Defer API server hooks until commitlog replay Defer registering services to the API server until commitlog has been replayed to ensure that nobody is able to trigger sstable operations via 'nodetool' before we are ready for them. Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>	2016-03-17 10:04:35 +02:00
Asias He	d79dbfd4e8	main: Defer initialization of streaming Streaming is used by bootstrap and repair. Streaming uses storage_proxy class to apply the frozen_mutation and db/column_family class to invalidate row cache. Defer the initialization just before repair and bootstrap init. Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>	2016-03-15 11:56:34 +02:00
Pekka Enberg	eb13f65949	main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after we have replayed the commitlog to avoid responding with bogus checksums. Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>	2016-03-15 11:56:18 +02:00
Gleb Natapov	5076f4878b	main: Defer storage proxy RPC verb registration after commitlog replay Message-Id: <20160315071229.GM6117@scylladb.com>	2016-03-15 09:18:12 +02:00
Pekka Enberg	1429213b4c	main: Defer migration manager RPC verb registration after commitlog replay Defer registering migration manager RPC verbs until after commitlog has been replayed so that our own schema is fully loaded before other nodes start querying it or sending schema updates. Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>	2016-03-14 18:03:16 +01:00
Glauber Costa	6c4e31bbdb	main: when scanning SSTables, run shard 0 first Deletion of previous stale, temporary SSTables is done by Shard0. Therefore, let's run Shard0 first. Technically, we could just have all shards agree on the deletion and just delete it later, but that is prone to races. Those races are not supposed to happen during normal operation, but if we have bugs, they can. Scylla's Github Issue #1014 is an example of a situation where that can happen, making existing problems worse. So running a single shard first and making sure that all temporary tables are deleted provides extra protection against such situations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Gleb Natapov	16135c2084	make initialization run in a thread While looking at initialization code I felt like my head is going to explode. Moving initialization into a thread makes things a little bit better. Only lightly tested. Message-Id: <20160310163142.GE28529@scylladb.com>	2016-03-10 17:42:05 +01:00
Gleb Natapov	176aa25d35	fix developer-mode parameter application on SMP I am almost sure we want to apply it once on each shard, and not multiple times on a single shard. Message-Id: <20160310155804.GB28529@scylladb.com>	2016-03-10 17:17:48 +01:00
Pekka Enberg	5dd1fda6cf	main: Initialize system keyspace earlier We start services like gossiper before system keyspace is initialized which means we can start writing too early. Shuffle code so that system keyspace is initialized earlier. Refs #1014 Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>	2016-03-10 10:39:27 +01:00
Avi Kivity	a1ff21f6ea	main: sanity check cpu support We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail out if it does not. We need to check early, because the compiler may use newer instructions in the generated code; the earlier we check, the lower the probability we hit an undefined opcode exception. Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>	2016-02-29 11:41:54 +02:00
Takuya ASADA	0f87922aa6	main: notify service start completion earlier, to reduce systemd unit startup time Fixes #910 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>	2016-02-23 14:33:16 +02:00
Nadav Har'El	7dc843fc1c	repair: stop ongoing repairs during shutdown When shutting down a node gracefully, this patch asks all ongoing repairs started on this node to stop as soon as possible (without completing their work), and then waits for these repairs to finish (with failure, usually, because they didn't complete). We need to do this, because if the repair loop continues to run while we start destructing the various services it relies on, it can crash (as reported in #699, although the specific crash reported there no longer occurs after some changes in the streaming code). Additionally, it is important to stop the ongoing repair, and not wait for it to complete its normal operation, because that can take a very long time, and shutdown is supposed to not take more than a few seconds. Fixes #699. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>	2016-02-14 16:52:41 +02:00
Gleb Natapov	2ae1ae2d18	Cleanup messaging_service.hh includes a bit. Forward declare some classes instead. Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>	2016-02-04 13:22:24 +02:00

186 Commits