scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Takuya ASADA	2bfc8e8c12	main: add tcp_syncookies sanity check Check net.ipv4.tcp_syncookies, show error message when it set to 0. Fixes #1118 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>	2016-04-21 14:55:26 +03:00
Avi Kivity	e43dbac836	main: cancel pending atomic deletions on shutdown A shared sstable must be compacted by all shards before it can be deleted. Since we're stopping, that's not going to happen. Cancel those pending deletions to let anyone waiting on them continue.	2016-04-14 17:14:26 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	e750a94300	sanity check Seastar's I/O queue configuration While Seastar in general can accept any parameter for its I/O queues, Scylla in particular shouldn't run with them disabled. Such will be the status when the max-io-requests parameter is not enabled. On top of that, we would like to have enough depth per I/O queue not to allow for shard-local parallelism. Therefore, we will require a minimum per-queue capacity of 4. In machines where the disk iodepth is not enough to allow for 4 concurrent requests per shard, one should reduce the number of I/O queues. For --max-io-requests, we will check the parameter itself. However, the --num-io-queues parameter is not mandatory, and given enough concurrent requests, Seastar's default configuration can very well just be doing the right thing. So for that, we will check the final result of each I/O queue. As it is the case with other checks of the sorts, this can be overridden by the --developer-mode switch. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>	2016-03-25 11:33:57 +03:00
Gleb Natapov	48c83163b9	init: make more initialization threaded Since initialization now runs in a thread storage, messaging and gossiper services initialization code may take advantage of it too. Message-Id: <20160323094732.GF2282@scylladb.com>	2016-03-23 11:53:11 +02:00
Gleb Natapov	ea92064d38	avoid invoke_on_all during developer-mode application if possible Message-Id: <20160315145327.GW6117@scylladb.com>	2016-03-22 10:40:30 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla the only solution is to shutdown the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Pekka Enberg	69dacf9063	main: Fix broadcast_address and listen_address validation errors Fix the validation error message to look like this: Scylla version 666.development-20160316.49af399 starting ... WARN 2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used. WARN 2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors. ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument) Exiting on unhandled exception of type 'bad_configuration_error': std::exception Instead of: Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument Fixes #1051. Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>	2016-03-17 14:59:00 +02:00
Pekka Enberg	972fc6e014	main: Defer API server hooks until commitlog replay Defer registering services to the API server until commitlog has been replayed to ensure that nobody is able to trigger sstable operations via 'nodetool' before we are ready for them. Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>	2016-03-17 10:04:35 +02:00
Asias He	d79dbfd4e8	main: Defer initialization of streaming Streaming is used by bootstrap and repair. Streaming uses storage_proxy class to apply the frozen_mutation and db/column_family class to invalidate row cache. Defer the initialization just before repair and bootstrap init. Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>	2016-03-15 11:56:34 +02:00
Pekka Enberg	eb13f65949	main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after we have replayed the commitlog to avoid responding with bogus checksums. Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>	2016-03-15 11:56:18 +02:00
Gleb Natapov	5076f4878b	main: Defer storage proxy RPC verb registration after commitlog replay Message-Id: <20160315071229.GM6117@scylladb.com>	2016-03-15 09:18:12 +02:00
Pekka Enberg	1429213b4c	main: Defer migration manager RPC verb registration after commitlog replay Defer registering migration manager RPC verbs after commitlog has been replayed so that our own schema is fully loaded before other nodes start querying it or sending schema updates. Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>	2016-03-14 18:03:16 +01:00
Glauber Costa	6c4e31bbdb	main: when scanning SSTables, run shard 0 first Deletion of previous stale, temporary SSTables is done by Shard0. Therefore, let's run Shard0 first. Technically, we could just have all shards agree on the deletion and just delete it later, but that is prone to races. Those races are not supposed to happen during normal operation, but if we have bugs, they can. Scylla's Github Issue #1014 is an example of a situation where that can happen, making existing problems worse. So running a single shard first and making sure that all temporary tables are deleted provides extra protection against such situations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Gleb Natapov	16135c2084	make initialization run in a thread While looking at initialization code I felt like my head is going to explode. Moving initialization into a thread makes things a little bit better. Only lightly tested. Message-Id: <20160310163142.GE28529@scylladb.com>	2016-03-10 17:42:05 +01:00
Gleb Natapov	176aa25d35	fix developer-mode parameter application on SMP I am almost sure we want to apply it once on each shard, and not multiple times on a single shard. Message-Id: <20160310155804.GB28529@scylladb.com>	2016-03-10 17:17:48 +01:00
Pekka Enberg	5dd1fda6cf	main: Initialize system keyspace earlier We start services like gossiper before system keyspace is initialized which means we can start writing too early. Shuffle code so that system keyspace is initialized earlier. Refs #1014 Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>	2016-03-10 10:39:27 +01:00
Avi Kivity	a1ff21f6ea	main: sanity check cpu support We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail out if it does not. We need to check early, because the compiler may use newer instructions in the generated code; the earlier we check, the lower the probability we hit an undefined opcode exception. Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>	2016-02-29 11:41:54 +02:00
Takuya ASADA	0f87922aa6	main: notify service start completion earlier, to reduce systemd unit startup time Fixes #910 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>	2016-02-23 14:33:16 +02:00
Nadav Har'El	7dc843fc1c	repair: stop ongoing repairs during shutdown When shutting down a node gracefully, this patch asks all ongoing repairs started on this node to stop as soon as possible (without completing their work), and then waits for these repairs to finish (with failure, usually, because they didn't complete). We need to do this, because if the repair loop continues to run while we start destructing the various services it relies on, it can crash (as reported in #699, although the specific crash reported there no longer occurs after some changes in the streaming code). Additionally, it is important to stop the ongoing repair, and not wait for it to complete its normal operation, because that can take a very long time, and shutdown is supposed to not take more than a few seconds. Fixes #699. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>	2016-02-14 16:52:41 +02:00
Gleb Natapov	2ae1ae2d18	Cleanup messaging_service.hh includes a bit. Forward declare some classes instead. Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>	2016-02-04 13:22:24 +02:00
Tomasz Grabiec	355874281a	sstables: Do not register exit hooks from static initializer Fixes #868. Registering exit hooks while reactor is already iterating over exit hooks is not allowed and currently leads to undefined behavior observed in #868. While we should make the failure more user friendly, registering exit hooks concurrently with shutdown will not be allowed. We don't expect exit hooks to be registered after exit starts because this would violate the guarantee which says that exit hooks are executed in reverse order of registration. Starting exit sequence in the middle of initialization sequence would result in use after free errors. Btw, I'm not sure if currently there's anything which prevents this. To solve this problem, move the exit hook to initialization sequence. In case of tests, the cleanup has to be called explicitly.	2016-02-03 17:35:50 +01:00
Takuya ASADA	4162fb158c	main: raise SIGSTOP only when scylla become ready supervisor_notify() calls periodically, to log message on systemd. So raise(SIGSTOP) will called multiple times, upstart doesn't expected that. We need to call it just one time. Fixes #846 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2016-01-27 23:30:26 +09:00
Takuya ASADA	b4accd8904	main: autodetect systemd/upstart We can autodetect systemd/upstart by environment variables, don't need program argument. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2016-01-27 23:29:32 +09:00
Asias He	b2f2c1c28c	storage_service: Add drain on shutdown logic We register engine().at_exit() callbacks when we initialize the services. We do not really call the callbacks at the moment due to #293. It is pretty hard to see the whole picture in which order the services are shutdown. Instead of for each services to register a at_exit() callbacks, I proposal to have a single at_exit() callback which do the shutdown for all the services. In cassandra, the shutdown work is done in storage_service::drain_on_shutdown callbacks. In this patch, the drain_on_shutdown is executed during shutdown. As a result, the proper gossip shutdown is executed and fixes #790. With this patch, when Ctrl-C on a node, it looks like: INFO [shard 0] storage_service - Drain on shutdown: starts INFO [shard 0] gossip - Announcing shutdown INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal INFO [shard 0] storage_service - Drain on shutdown: stop_gossiping done INFO [shard 0] storage_service - CQL server stopped INFO [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done INFO [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done INFO [shard 0] storage_service - Drain on shutdown: flush column_families done INFO [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO [shard 0] storage_service - Drain on shutdown: done	2016-01-27 11:45:52 +08:00
Amnon Heiman	b1845cddec	Breaking the API initialization into stages The API needs to be available at an early stage of the initialization, on the other hand not all the specific APIs are available at that time. This patch breaks the API initialization into stages, in each stage additional commands will be available. While doing that, the API header file was split into api_init.hh, which is relevant to main, and api.hh, which holds the different API helper functions. Fixes #754 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>	2016-01-26 17:41:31 +02:00
Avi Kivity	71eb79aedd	main: exit with code 0 on shutdown To avoid confusing systemd. Fixes #823. Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>	2016-01-26 16:26:53 +02:00
Takuya ASADA	b92a075a34	main: support supervisor_notify() on Ubuntu Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>	2016-01-24 12:10:41 +02:00
Pekka Enberg	733584c44d	main: Start the API service as the last step This reverts commit `f0d68e4` ("main: start the http server in the first step"). The service layer is not ready to serve clients before it's fully up and running which causes early startup crashes everywhere. Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>	2016-01-14 12:55:50 +02:00
Avi Kivity	39f81b95d6	main: make --developer-mode relax dma requirements With Docker we might be running on a filesystem that does not support DMA (aufs; or tmpfs on boot2docker), so let --developer-mode allow running on those file systems. Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>	2016-01-12 13:34:46 +02:00
Avi Kivity	3d5f6de683	main: notify systemd of startup progress Send current startup stage via sd_notify STATUS variable; let it know that startup is complete via READY=1. Fixes #760.	2016-01-12 11:58:24 +02:00
Avi Kivity	3377739fa3	main: wait for API http server to start Wait for the future returned by the http server start process to resolve, so we know it is started. If it doesn't, we'll hit the or_terminate() further down the line and exit with an error code. Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>	2016-01-07 16:44:07 +02:00
Asias He	933614bdf9	main: Change API server starting message It comes from the Seastar HTTP server and is inaccurate. Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>	2016-01-07 15:53:28 +02:00
Asias He	8c909122a6	gossip: Add wait_for_gossip_to_settle Implement the wait for gossip to settle logic in the bootup process. CASSANDRA-4288 Fixes: bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test 1) start node2 2) wait for cql connection with node2 is ready 3) stop node2 4) delete data and commitlog directory for node2 5) start node2 In step 5, sometimes I saw in shadow round of node2, it gets node2's status as BOOT from other nodes in the cluster instead of NORMAL. The problem is we do not wait for gossip to settle before we start cql server, as a result, when we stop node2 in step 3), other nodes in the cluster have not got node2's status update to NORMAL.	2016-01-07 10:09:25 +02:00
Nadav Har'El	f5b2135a80	repair: repair_checksum_range message This patch adds a new type of message, "REPAIR_CHECKSUM_RANGE" to scylla's "messaging_service" RPC mechanism, for the use of repair: With this message the repair's master host tells a slave host to calculate the checksum of a column-family's partitions in a given token range, and return that checksum. The implementation of this message uses the checksum_range() function defined in the previous patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-01-05 15:38:40 +02:00
Avi Kivity	2ba4910385	main: verify that the NOFILE rlimit is sufficient Require 10k files, recommend 200k. Allow bypassing via --developer-mode. Fixes #692.	2015-12-30 11:02:08 +02:00
Avi Kivity	c26689f325	init: bail out if running not on an XFS filesystem Allow an override via '--developer-mode true', and use it in the docker setup, since that cannot be expected to use XFS. Fixes #658.	2015-12-30 10:56:21 +02:00
Amnon Heiman	f0d68e4161	main: start the http server in the first step This change sets the http server to start as the first step in the boot order. It is helpful if some other step takes a long time or gets stuck. Fixes #725 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-29 14:20:57 +02:00
Pekka Enberg	ca1f9f1c9a	main: Fix implicitly disabled client encryption options The start_native_transport() function in storage_service expects the 'enabled' option to be defined. If the option is not defined, it means that encryption is implicitly disabled. Fixes #718.	2015-12-28 16:24:49 +02:00
Calle Wilund	fae3bb7a24	storage_service: Set up CQL server as SSL if specified * Massage user options in main * Use them in storage_service, and if needed, load certificates etc and pass to transport/cql server. Conflicts: service/storage_service.cc	2015-12-28 10:13:48 +00:00
Calle Wilund	70f293d82e	main/init: Use server_encryption_options * Reads server_encryption_options * Interpret the above, and load and initialize credentials and use with messaging service init if required	2015-12-28 10:10:35 +00:00
Glauber Costa	e299127e81	main: check if options file can be read. If we can't open the file, we will fail with a mysterious error. It is a customary scenario, though, since people who are unaware or have just forgotten about seastar's restriction of direct io access may put those files in tmpfs and other mount points. We have a direct_io check that is designed exactly for this purpose, so as to give the user a better error message. This patch makes use of it. Fixes #644 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2015-12-27 12:20:40 +02:00
Avi Kivity	167addbfe1	main: remove issue #417 (poll mode) warning Fixed.	2015-12-09 19:00:32 +02:00
Asias He	2022117234	failure_detector: Enable phi_convict_threshold option Adjusts the sensitivity of the failure detector on an exponential scale. Use as: $ scylla --phi-convict-threshold 9 Default to 8.	2015-11-30 11:09:36 +02:00
Asias He	7ddf8963f5	config: Enable broadcast_rpc_address option With this patch, start two nodes node 1: scylla --rpc-address 127.0.0.1 --broadcast-rpc-address 127.0.0.11 node 2: scylla --rpc-address 127.0.0.2 --broadcast-rpc-address 127.0.0.12 On node 1: cqlsh> SELECT rpc_address from system.peers; rpc_address ------------- 127.0.0.12 which means client should use this address to connect node 2 for cql and thrift protocol.	2015-11-24 10:07:31 +08:00
Asias He	2c8867c348	config: Enable storage_port option	2015-10-29 08:58:41 +08:00
Asias He	8218ab7922	storage_service: Implement start_native_transport and start_rpc_server They are used for APIs. Share the code in main.cc as well.	2015-10-27 21:48:37 +08:00
Pekka Enberg	a772938e73	transport/server: Round-robin CQL request load balancing Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-27 13:24:58 +02:00
Vlad Zolotarov	5613979a85	utils::fb_utilities: add the ability to set a broadcast address Add utils::fb_utilities::set_broadcast_address(). Set it to either broadcast_address or listen_address configuration value if appropriate values are set. If none of the two values above are set - abort the application. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Simplify the utils::fb_utilities::get_broadcast() logic.	2015-10-26 14:10:39 +02:00


158 Commits