scylladb

Author	SHA1	Message	Date
Vlad Zolotarov	2d8fcde695	init: add a proper message when there is a bad 'seeds' configuration Fixes #2193 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1490912678-32004-1-git-send-email-vladz@scylladb.com>	2017-04-02 10:41:52 +03:00
Tomasz Grabiec	388315c1ff	sstables: Expose index metrics	2017-03-28 18:10:39 +02:00
Amnon Heiman	7b04841dda	main: Name the http servers In main there are two http servers that start, the API and prometheus. This patch names them accordingly so their metrics will have more meaning. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489055282-10887-1-git-send-email-amnon@scylladb.com>	2017-03-09 12:30:49 +02:00
Amnon Heiman	4e8d73098f	main: Prometheus should start as early as possible There is no need to wait when starting the prometheus server, as it is up to each of the modules to register its metrics when it is ready. This is especially important when debugging boot issues. This patch moves the prometheus initialization to be done at an early stage of the boot sequence. Fixes #2144 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489041986-28974-1-git-send-email-amnon@scylladb.com>	2017-03-09 11:26:51 +02:00
Vlad Zolotarov	f2e4629254	main.cc: expose scylla version as a gauge metrics Add a new metric that exposes the current ScyllaDB version as a gauge metrics. The version is exposed as a label with the "version" key. Fixes #1979 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1487083703-27929-1-git-send-email-vladz@scylladb.com>	2017-02-16 16:57:55 +02:00
Calle Wilund	c4c4eb06c4	main.cc: remove scylla dns dependency Use seastar facilities instead.	2017-02-06 11:36:57 +00:00
Calle Wilund	feffc2bbe1	main/init: Lookup inet addresses from config by dns lookup I.e. allow symbolic names in addition to ip addresses.	2017-02-06 09:45:37 +00:00
Calle Wilund	ff8f82f21c	scylla tls: Add option support for client auth and tls opts Refs #1813 (fixes scylla part) Added require_client_auth and priority_string options to server_encryption_options/client_encryption_options and process them. Allows TLS method/algo specification. Also enabled enforcing known cert authentication for both node-to-node and client communication.	2017-02-06 09:45:09 +00:00
Avi Kivity	0591303b72	Merge "avoid excessive memory usage during resharding" from Raphael "Intended to reduce memory usage when resharding by sharing sstable components among shards. File descriptors are also shared from now on, meaning that a much smaller number of file descriptors will be used during resharding. Fixes #1951." branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla * 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla: db: avoid excessive memory usage during resharding checked_file_impl: add support to dup sstables: group sstable components that can be shared among shards sstables: rename sstable member	2017-01-09 20:43:50 +02:00
Raphael S. Carvalho	68dfcf5256	db: avoid excessive memory usage during resharding After resharding, sstables may be owned by all shards, which means that file descriptors and memory usage for metadata will increase by a factor equal to the number of shards. That can easily lead to OOM. SSTable components are immutable, so they can be stored in one shard and shared with others that need them. We use the following formula to decide which shard will open the sstable and share it with the others: (generation % smp::count), which is the inverse of how we calculate generation for new sstables. So if no resharding is performed, everything is shard-local. With this approach, resource usage due to loaded sstables will be evenly distributed among shards. For this approach to work, we now only populate keyspaces from shard 0. It is now solely responsible for iterating through column family dirs. In addition, most of the population functions are now free functions and take the distributed database object as a parameter. Fixes #1951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 15:24:36 -02:00
Avi Kivity	85f4e16336	main: fix incorrect low memory warning A spurious division by smp::count warns that memory is low even when plenty is available. Fix by removing the division. Fix #2002. Message-Id: <20170108122216.27233-1-avi@scylladb.com> Tested-by: Benoît Canet <benoit@scylladb.com>	2017-01-08 15:14:36 +02:00
Gleb Natapov	9ed3346f98	main: fix error reporting about low memory Message-Id: <20170108112144.GT1829@scylladb.com>	2017-01-08 13:46:48 +02:00
Vlad Zolotarov	492295eb7f	init: move supervisor_notify() out of main.cc Transform the supervisor_notify() and related functions into the "supervisor" class and place this class implementation in a separate .cc file. This is going to fix the compilation breakage of tests introduced by a commit `8014adc2a1` init: serialize the creation of system_traces KS objects Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1483663955-20096-1-git-send-email-vladz@scylladb.com>	2017-01-06 10:10:55 +00:00
Vlad Zolotarov	8014adc2a1	init: serialize the creation of system_traces KS objects Serialize the creation of a system_traces KS objects when they do not exist - the initial cluster boot. Avoid creating them in parallel by different cluster Nodes in order to avoid issue #420. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1483552503-12873-3-git-send-email-vladz@scylladb.com>	2017-01-05 12:41:38 +01:00
Nadav Har'El	45f19f2633	main: better error message on failing to start Prometheus Previously, if the Prometheus port (by default, 0.0.0.0:9180) could not be opened, the following message appeared in the log about 10 seconds into the run, and Scylla crashed. ERROR 2017-01-01 19:31:04,066 [shard 0] seastar - Exiting on unhandled exception: std::system_error (error system:98, Address already in use) The puzzled user would have no idea which address was already in use, why, or why Scylla stopped. In this patch, before the above message we get the much more informative message: ERROR 2017-01-01 19:58:19,080 [shard 0] init - Could not start Prometheus API server on 0.0.0.0:9180: std::system_error (error system:98, Address already in use) We continue to print the original message - and exit - in this case, under the assumption that it's better not to run the database while improperly configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170102121304.2060-1-nyh@scylladb.com>	2017-01-04 14:58:26 +02:00
Avi Kivity	339cc0c2fa	main: verify sufficient memory per shard Refuse to boot if we don't have at least 1 GiB per shard, unless in developer mode. The primary violator here is docker, but since it starts in developer mode, it won't get fixed. We need some extra logic for this case. Message-Id: <20161221090222.28677-1-avi@scylladb.com>	2016-12-27 12:05:52 +02:00
Amnon Heiman	70b2a1bfd4	Set the prometheus prefix to scylla This patch makes the prometheus prefix configurable and sets the default value to scylla. Fixes #1964 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1482671970-21487-1-git-send-email-amnon@scylladb.com>	2016-12-25 15:21:53 +02:00
Raphael S. Carvalho	27fb8ec512	db: avoid excessive disk usage during sstable resharding Shared sstables will now be resharded in the same order to guarantee that all shards owning a sstable will agree on its deletion nearly the same time, therefore, reducing disk space requirement. That's done by picking which column family to reshard in UUID order, and each individual column family will reshard its shared sstables in generation order. Fixes #1952. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <87ff649ed24590c55c00cbb32bffd8fa2743e36e.1482342754.git.raphaelsc@scylladb.com>	2016-12-21 23:18:06 +02:00
Vlad Zolotarov	62cad0f5f5	tracing: don't start tracing until a Tracing service is fully initialized RPC messaging service is initialized before the Tracing service, so we should prevent creation of tracing spans before the service is fully initialized. We will use an already existing "_down" state and extend it in a way that !_down equals "started", where "started" is TRUE when the local service is fully initialized. We will also split the Tracing service initialization into two parts: 1) Initialize the sharded object. 2) Start the tracing service: - Create the I/O backend service. - Enable tracing. Fixes issue #1939 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1481836429-28478-1-git-send-email-vladz@scylladb.com>	2016-12-21 12:40:14 +02:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Tomasz Grabiec	f7197dabf8	commitlog: Fix replay to not delete dirty segments The problem is that replay will unlink any segments which were on disk at the time the replay starts. However, some of those segments may have been created by the current node since the boot. If a segment is part of the reserve, for example, it will be unlinked by replay, but we will still use that segment to log mutations. Those mutations will not be visible to replay after a crash though. The fix is to record preexisting segments before any new segments have a chance to be created and use that as the replay list. Introduced in `abe7358767`. dtest failure: commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>	2016-12-07 15:54:47 +02:00
Takuya ASADA	2976799ef2	main: fix startup failing on Ubuntu 15.10/16.04 Since Ubuntu 15.10/16.04 still uses Upstart to manage the GUI session (not as init), when we directly launch Scylla in Ubuntu's GUI Terminal (not using systemctl or initctl), raise(SIGSTOP) is mistakenly called (because the GUI session has the "UPSTART_JOB" environment variable; this won't happen when running Scylla as a systemd service). To avoid this, we need to verify UPSTART_JOB == "scylla-server". If it's part of a GUI session, UPSTART_JOB has to be "unity7", and we need to avoid raise(SIGSTOP) in that case. Fixes #1199 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1480620421-28967-1-git-send-email-syuu@scylladb.com>	2016-12-05 16:28:25 +02:00
Avi Kivity	28857e42e7	Merge " Virtualize size_estimates system table" from Duarte "We currently write the size_estimates system table for every schema on a periodic basis, currently set to 5 minutes, which can interfere with an ongoing workload. This patchset virtualizes it such that queries are intercepted and we calculate the results on the fly, only for the ranges the caller is interested in. Fixes #1616" * 'virtual-estimates/v4' of github.com:duarten/scylla: size_estimates_virtual_reader: Add unit test db: Delete size_estimates_recorder size_estimates: Add virtual reader column_family: Add support for virtual readers storage_service: get_local_tokens() returns a future nonwrapping_range: Add slice() function range: Find a sequence's lower and upper bounds system_keyspace: Build mutations for size estimates size_estimates: Store the token range as bytes range_estimates: Add schema murmur3_partitioner: Convert maximum_token to sstring	2016-11-28 10:12:59 +02:00
Avi Kivity	07d5a20bae	Wire up sharding ignore msb parameter to configuration We might have used a fancy map<sstring, any> to pass the parameters, but that's overkill for now.	2016-11-22 22:40:47 +02:00
Duarte Nunes	6a37d87c76	db: Delete size_estimates_recorder Now that access to the size_estimates system is virtualized, we no longer need the recorder. Fixes #1616 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:05 +00:00
Raphael S. Carvalho	9a9f0d3a0f	main: fix exception handling when initializing data or commitlog dirs Exception handling was broken because after io checker, storage_io_error exception is wrapped around system error exceptions. Also the message when handling exception wasn't precise enough for all cases. For example, lack of permission to write to existing data directory. Fixes #883. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>	2016-11-14 12:34:10 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Takuya ASADA	587d375e19	main: exit with 1 when verify_seastar_io_scheduler() failed Since we are exiting the Scylla process in engine().at_exit() using ::_exit(0), even when verify_seastar_io_scheduler() throws an exception, scylla always exits with 0. Systemd misunderstands this as scylla-server.service having shut down successfully, so we need to pass the correct exit code to ::_exit() here. Fixes #1674 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>	2016-10-17 13:57:00 +03:00
Raphael S. Carvalho	76862d0d9c	main: start compaction procedure after commit log is replayed Commit log replay is a synchronous operation in bootstrap, so services will only be started after it's completed. By starting compaction before, less bandwidth will be available to both and consequently boot will be slowed down. Fix is simply about moving compaction, which is an asynchronous operation after commitlog replay is over. Fixes #1620. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d2a173a4ee4d474317b970c6b39530e61067fea9.1475527955.git.raphaelsc@scylladb.com>	2016-10-06 18:25:24 +03:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Gleb Natapov	26ae8e8365	implement listen_on_broadcast_address option When using multiple physical network interfaces, set this to true to listen on broadcast_address in addition to the listen_address, allowing nodes to communicate in both interfaces. Ignore this property if the network configuration automatically routes between the public and private networks such as EC2. Message-Id: <20160921094810.GA28654@scylladb.com>	2016-09-26 08:49:54 +03:00
Pekka Enberg	f1d0401ed2	main: Use proper logger for API server messages We have a "startlog" that we can use to print out API server messages. Message-Id: <1474358312-26510-1-git-send-email-penberg@scylladb.com>	2016-09-20 11:09:59 +03:00
Tomasz Grabiec	9476bc5a31	Introduce --abort-on-lsa-bad-alloc command line option Useful for triggering core dump on allocation failure inside LSA, which makes it easier to debug allocation failures. They normally don't cause aborts, just fail the current operation, which makes it hard to figure out what was the cause of allocation failure. Message-Id: <1470233631-18508-1-git-send-email-tgrabiec@scylladb.com>	2016-08-03 17:26:44 +03:00
Amnon Heiman	bb4268a8a5	Add prometheus API This patch adds the prometheus API: it adds the proto library to the compilation, adds an optional configuration parameter to change the prometheus listening port, and starts the prometheus API in main. To disable the prometheus API, set its listening port to 0. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1470228764-19545-2-git-send-email-amnon@scylladb.com>	2016-08-03 15:55:18 +03:00
Duarte Nunes	9ffdf4a5cd	db: Implement size_estimates_recorder This patch implements the size_estimates_recorder, which periodically writes estimations for all the non-system column families in the size_estimates system table. The size_estimates_recorder class corresponds to the one in Cassandra's SizeEstimatesRecorder.java. Estimation is carried out by shard 0. Since we're estimating based on data in shared sstables, having multiple shards doing this would skew the results. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-19 09:44:58 +00:00
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Raphael S. Carvalho	85cb2a6d35	database: trigger compaction on boot At the moment, we only trigger compaction after creating a new sstable as a result of memtable flush, or some other event such as changing compaction strategy of a column family. However, it's important to trigger compaction on boot too. That will happen after loading all column families. Fixes #1404. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>	2016-06-29 13:47:42 +03:00
Avi Kivity	5b81448ed6	main: add scylla --version option Fixes #1384. Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>	2016-06-23 16:24:03 +02:00
Avi Kivity	5af22f6cb1	main: handle exceptions during startup If we don't, std::terminate() causes a core dump, even though an exception is sort-of-expected here and can be handled. Add an exception handler to fix. Fixes #1379. Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>	2016-06-23 09:25:33 +03:00
Nadav Har'El	3372052d48	Rewriting shared sstables only after all shards loaded sstables After commit `faa4581`, each shard only starts splitting its shared sstables after opening all sstables. This was important because compaction needs to be aware of all sstables. However, another bug remained: If one shard finishes loading its sstables and starts the splitting compactions, and in parallel a different shard is still opening sstables - the second shard might find a half-written sstable being written by the first shard, and abort on a malformed sstable. So in this patch we start the shared sstable rewrites - on all shards - only after all shards finished loading their sstables. Doing this is easy, because main.cc already contains a list of sequential steps where each uses invoke_on_all() to make sure the step completes on all shards before continuing to the next step. Fixes #1371 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>	2016-06-20 16:25:24 +03:00
Avi Kivity	85bb5ea064	Merge "Reduce LSA reclaim latency" from Tomasz "Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no difference in throughput in a CQL write micro benchmark in neither of these workloads: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274."	2016-06-16 10:47:57 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Vlad Zolotarov	d3960f0bbb	tracing: rearrange shut down tracing::tracing local instance is dereferenced from a cql_server::connection::process_request(), therefore tracing::tracing service may be stop()ed only after a CQL server service is down. On the other hand it may not be stopped before RPC service is down because a remote side may request a tracing for a specific command too. This patch splits the tracing::tracing stop() into two phases: 1) Flush all pending tracing records and stop the backend. 2) Stop the service. The first phase is called after CQL server is down and before RPC is down. The second phase is called after RPC is down. Fixes #1339 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>	2016-06-14 07:58:04 +03:00
Asias He	e6f63a50e1	main: Delay the messaging_service api registration Since messaging_service is fully initialized in storage_service::init_server which calls messaging_service::start_listen, we need to delay the messaging_service api registration after it.	2016-06-08 11:13:35 +08:00
Vlad Zolotarov	4b43b08ffc	main: start a tracing service Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Pekka Enberg	0255318bf3	Revert "Revert "main: change order between storage service and drain execution during exit"" This reverts commit `b3ed55be1d`. The issue is in the failing dtest, not this commit. Gleb writes: "The bug is in the test, not the patch. Test waits for repair session to end one way or the other when node is killed, but for nodetool to know if repair is completed it needs to poll for it. If node dies before nodetool managed to see repair completion it will stuck forever since jmx is alive, but does not provide answers any more. The patch changes timing, repair is completed much close to exit now, so problem appears, but it may happen even without the patch. The fix is for dtest to kill jmx as part of killing a node operation." Now that Lucas fixed the problem in scylla-ccm, revert the revert.	2016-06-01 08:48:50 +03:00
Pekka Enberg	b3ed55be1d	Revert "main: change order between storage service and drain execution during exit" This reverts commit `0ebd8b18b7`. The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test	2016-05-30 12:48:09 +03:00
Avi Kivity	b50cb3eca8	config: rename compact_on_idle compact_on_idle will lead users to thinking we're talking about sstable compaction, not log-structured-allocator compaction. Rename the variable to reduce the probability of confusion. Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>	2016-05-30 08:39:13 +03:00
Gleb Natapov	0ebd8b18b7	main: change order between storage service and drain execution during exit Even the comment says drain_on_shutdown should be called first, but for that it has to be registered last. Fixes #862 Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>	2016-05-29 11:39:24 +03:00


211 Commits