There are various call-sites that explicitly check for EEXIST and
ENOENT:
$ git grep "std::error_code(E"
database.cc: if (e.code() != std::error_code(EEXIST, std::system_category())) {
database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) {
database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) {
database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) {
sstables/sstables.cc: if (e.code() == std::error_code(ENOENT, std::system_category())) {
sstables/sstables.cc: if (e.code() == std::error_code(ENOENT, std::system_category())) {
Commit 961e80a ("Be more conservative when deciding when to shut down
due to disk errors") turned these errors into a storage_io_exception
that is not expected by the callers, which causes 'nodetool snapshot'
functionality to break, for example.
Whitelist the two error codes to revert to the old behavior of
io_check().
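A rough sketch of the whitelisting idea (checked_io is a hypothetical stand-in for io_check, not the actual implementation):

```cpp
#include <cerrno>
#include <initializer_list>
#include <stdexcept>
#include <system_error>

// Run an I/O operation and translate unexpected system errors into a
// fatal condition, while letting whitelisted errno values (EEXIST,
// ENOENT) propagate unchanged so callers that expect them keep working.
template <typename Func>
auto checked_io(Func&& func) {
    try {
        return func();
    } catch (const std::system_error& e) {
        for (int ok : {EEXIST, ENOENT}) {
            if (e.code() == std::error_code(ok, std::system_category())) {
                throw; // expected by callers; rethrow as-is
            }
        }
        // Anything else is treated as a fatal storage I/O error here.
        throw std::runtime_error("Storage I/O error");
    }
}
```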
Message-Id: <1465454446-17954-1-git-send-email-penberg@scylladb.com>
Make the storage_io_exception error message less cryptic by
actually including the human-readable error message from
std::system_error...
Before:
nodetool: Scylla API server HTTP POST to URL '/storage_service/snapshots' failed: Storage io error errno: 2
After:
nodetool: Scylla API server HTTP POST to URL '/storage_service/snapshots' failed: Storage I/O error: 2: No such file or directory
We can improve this further by including the name of the file that the I/O
error happened on.
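A minimal sketch of how such a message could be composed (storage_io_error_message is a hypothetical helper, not the actual Scylla code):

```cpp
#include <string>
#include <system_error>

// Compose a readable storage I/O error message: include both the
// numeric errno value and the human-readable description supplied by
// the system error category.
std::string storage_io_error_message(int err) {
    std::error_code ec(err, std::system_category());
    return "Storage I/O error: " + std::to_string(ec.value()) + ": " + ec.message();
}
```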
Message-Id: <1465452061-15474-1-git-send-email-penberg@scylladb.com>
Several shards may share the same sstable - e.g., when restarting scylla
with a different number of shards, or when importing sstables from an
external source. Sharing an sstable is fine, but it can result in excessive
disk space use because the shared sstable cannot be deleted until all
the shards using it have finished compacting it. Normally, we have no idea
when the shards will decide to compact these sstables - e.g., with
size-tiered compaction, a large sstable will take a long time until we
decide to compact it. So what this patch does is to initiate compaction of
the shared sstables - on each shard using them - so that as soon as
possible after the restart, the original sstable is split into separate
sstables per shard, and the original sstable can be deleted. If several
sstables are shared, we serialize this compaction process so that each
shard only rewrites one sstable at a time. Regular compactions may happen
in parallel, but they will not be able to choose any of the shared
sstables because those are already marked as being compacted.
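The serialization described above can be sketched roughly as follows; all names here (shared_sstable_rewriter, marked_compacting, etc.) are hypothetical illustrations, not Scylla's actual API:

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Shared sstables are queued and rewritten one at a time per shard.
// Marking each one "being compacted" up front keeps regular compactions
// from picking it while it waits in the queue.
struct shared_sstable_rewriter {
    std::queue<std::string> pending;            // shared sstables awaiting rewrite
    std::vector<std::string> marked_compacting; // hidden from regular compaction

    void enqueue(const std::string& name) {
        marked_compacting.push_back(name);      // mark immediately on discovery
        pending.push(name);
    }

    // Rewrite queued sstables serially: only one at a time on this shard.
    void run(const std::function<void(const std::string&)>& rewrite_one) {
        while (!pending.empty()) {
            rewrite_one(pending.front());       // split into per-shard sstables
            pending.pop();
        }
    }
};
```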
Commit 3f2286d0 increased the need for this patch, because since that
commit, if we don't delete the shared sstable, we also cannot delete
additional sstables which the different shards compacted with it. For one
scylla user, this resulted in so much excessive disk space use, that it
literally filled the whole disk.
After this patch, commit 3f2286d0 (and the discussion in issue #1318 on how
to improve it) is no longer necessary, because we will never compact a shared
sstable together with any other sstable - as explained above, the shared
sstables are marked as "being compacted", so regular compactions will
avoid them.
Fixes #1314.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Previously, we were using a stat to decide if compaction should be
retried, but that's not efficient. The information is also lost
after the node is restarted.
After these changes, compaction will be retried until the strategy is
satisfied, i.e. there is nothing left to compact.
We will now be doing the following in a loop:
1) Get a compaction job from the compaction strategy.
2) If it cannot run, finish the loop.
3) Otherwise, compact this column family.
4) Go back to the start of the loop.
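The loop above can be sketched as follows; the function and type names here are hypothetical, chosen only to mirror the description:

```cpp
#include <functional>
#include <optional>
#include <vector>

using compaction_job = std::vector<int>; // stand-in for a set of sstables

// Keep asking the compaction strategy for a job and compacting until the
// strategy reports there is nothing left to do.
int compact_until_satisfied(
        const std::function<std::optional<compaction_job>()>& get_job,
        const std::function<void(const compaction_job&)>& compact) {
    int rounds = 0;
    for (;;) {
        auto job = get_job();     // 1) ask the strategy for a job
        if (!job) {
            break;                // 2) nothing to compact: strategy satisfied
        }
        compact(*job);            // 3) compact this column family
        ++rounds;                 // 4) and loop again
    }
    return rounds;
}
```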
By the way, the pending_compactions stat will be deprecated after this
commit. Previously, it was increased to indicate the need for
compaction and decreased when compaction finished. Now, we can
compact more than we asked for, so it could be decreased below zero.
Also, it's now the strategy that signals the need for compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <899df0d8d807f6b5d9bb8600d7c63b4e260cc282.1465398243.git.raphaelsc@scylladb.com>
Sometimes a metric previously reported from collectd is no longer
available. Previously, this caused scyllatop to log an exception to the
user - which in effect destroys the user experience and inhibits
monitoring other metrics. This patch makes scyllatop handle this
problem. It will display such metrics as 'not available', and exclude
them from sum and average computations.
Closes issue #1287.
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1465301178-27544-1-git-send-email-yoav@scylladb.com>
From Asias:
In f27e5d2a6 (messaging_service: Delay listening ms during boot up),
messaging_service startup is split into two stages. Adjust the api
registration code and fix up the messaging_service stop code.
This patch makes a few minor improvements in the parser:
- merge first and rest into 2-argument form of Word to define
identifier – should give some performance boost, simpler code
- replace Literal(keyword_string) with Keyword(keyword_string)
throughout - stricter parsing, avoids misinterpreting identifiers
as keywords
- replace expr.setResultsName("name") with expr("name") throughout –
this is a style change (no actual change in underlying parser
behavior), but I find this form easier to follow
- add calls to setName to make exceptions more readable
Message-Id: <005901d1bbd2$711f7bb0$535e7310$@austin.rr.com>
There are two problems:
1. _server_tls is not stopped
2. _server and _server_tls might not be created if
messaging_service::start_listen has not been called yet.
Since messaging_service is fully initialized in
storage_service::init_server, which calls
messaging_service::start_listen, we need to delay
the messaging_service api registration until after it.
The rate_moving_average is used by timed_rate_moving_average to return
its internal values.
If there are no timed events, the mean_rate is not properly initialized.
To solve this, the mean_rate is now initialized to 0 in the structure
definition.
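The fix amounts to a default member initializer; a minimal sketch (the field set shown is illustrative, not the exact structure from the source):

```cpp
// Give mean_rate a default member initializer so it is well-defined even
// when no timed events have been recorded yet.
struct rate_moving_average {
    double mean_rate = 0; // previously uninitialized when no timed events occurred
    unsigned long count = 0;
};
```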
Refs #1306
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1465231006-7081-1-git-send-email-amnon@scylladb.com>
This variable, if set to true, will activate
developer mode. It will be set by using the
-e option of docker run.
The xfs bind mount behavior and the cpuset behavior
will be set by using the relevant docker command
lines options and documented in the scylla/docker
howto.
Fixes: #1267
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1465213713-2537-1-git-send-email-benoit@scylladb.com>
Add support for defining a probability (a value in the [0,1] range)
for tracing the next CQL request.
Traces for requests that are chosen to be traced due to this feature
are not going to be flushed immediately.
Use the std::subtract_with_carry_engine random number engine (which
implements the "lagged Fibonacci" algorithm) for fast generation of
random integer values.
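For illustration, a sampler along these lines could look like the following sketch; trace_sampler and its members are hypothetical names, but std::ranlux24_base really is a std::subtract_with_carry_engine instantiation:

```cpp
#include <cstdint>
#include <random>

// Probability-based trace selection: a request is traced when a uniform
// draw in [0, 1) falls below the configured probability.
class trace_sampler {
    std::ranlux24_base _gen; // a std::subtract_with_carry_engine typedef
    double _probability;     // value in [0, 1]
public:
    explicit trace_sampler(double probability, std::uint32_t seed = 0)
        : _gen(seed), _probability(probability) {}

    bool should_trace() {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        return dist(_gen) < _probability;
    }
};
```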
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
A tracing session life cycle includes 3 stages:
1) Active: when new trace records are being added to this session.
2) Pending flush: when the session is over but not yet
flushed to the storage ("backend").
3) Flushing: when session's records are being flushed to the storage
and this process is not yet completed.
Sessions may accumulate in each of the stages above, and we should limit
the maximum number of sessions accumulated in each of them in order to
avoid an OOM situation.
Current in-tree implementation only limits the number of tracing sessions
accumulated in the first ("Active") stage.
Since currently every closing session is immediately flushed (as long
as "settraceprobability" is not implemented), the second stage never
accumulates tracing sessions.
The third stage is currently not controlled at all, and if, for instance,
we manage to push enough tracing sessions towards a slow storage backend,
they may accumulate there, consuming an uncontrolled amount of memory, and
may eventually consume all of it.
This patch fixes this unpleasant situation by applying the following strategy:
- Limit the total number of tracing sessions accumulated in all the stages
above, together, to a static value: 2 times the "flush threshold". The
factor of 2 is needed to allow new tracing sessions to accumulate in
stage 2 while sessions in stage 3 are still being processed.
- Forcefully flush sessions in stage 2 to the storage when their count
reaches the "flush threshold".
This ensures that there will be no more than (2 * "flush threshold")
sessions in total (in any stage) on each shard.
An advantage of this strategy is its simplicity - we only need a single
threshold to control all stages. If we find that we need finer-grained
control over each stage, we may add separate limits for each of them in
the future.
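A minimal sketch of this accounting, with hypothetical names (session_limits, may_create_session, should_force_flush) not taken from the source:

```cpp
#include <cstddef>

// A single flush threshold controls all three stages: new sessions are
// rejected when the total across stages reaches 2 * flush_threshold, and
// pending sessions are force-flushed once they alone reach flush_threshold.
struct session_limits {
    std::size_t flush_threshold;
    std::size_t active = 0, pending = 0, flushing = 0;

    std::size_t total() const { return active + pending + flushing; }

    bool may_create_session() const {
        return total() < 2 * flush_threshold;
    }

    bool should_force_flush() const {
        return pending >= flush_threshold;
    }
};
```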
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
* dist/ami/files/scylla-ami 72ae258...863cc45 (3):
> Move --cpuset/--smp parameter settings from scylla_sysconfig_setup to scylla_ami_setup
> convert scylla_install_ami to bash script
> 'sh -x -e' is not valid since all scripts converted to bash script, so remove them
A call to tracing::tracing::create_session() doesn't promise that a session
is created. Check that the session was actually created before trying to
use it.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Currently we only shut down on EIO. Expand this to shut down on any
system_error.
This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM). We may whitelist certain errors in the future
to improve the behavior.
Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>
It was discussed that leveled strategy may not benefit from the parallel
compaction feature because almost all compaction jobs will have similar
sizes. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstables (targeting the same level)
could be created in parallel by two ongoing compactions.
Fixes #1293.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
From Duarte:
This patchset adds the range_tombstone_list data structure,
used to hold a set of disjoint range tombstones, and changes
the internal representation of row tombstones to use that
data structure.
Fixes #1155
[tgrabiec: Added compound_wrapper::make_empty(const schema&) overload
to fix compilation failure in tracing code]
This patch enables the RANGE_TOMBSTONES supported feature, meaning
that the node is capable of accepting row entry tombstones as range
tombstones.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch uses the composite_marker to add inclusiveness information
to the prefixes of a range tombstone.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Since Scylla now supports proper range tombstones, the code for
reading ranges from sstables and converting them to overlapping
tombstones is no longer necessary, and is, in fact, wasteful as
the internal representation converts overlapping tombstones back to
ranges.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch moves the difference between two mutation_partition's
row_tombstones inside the range_tombstone_list.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the type of the mutation partition's row_tombstones
to be a range_tombstone_list, so that they are now represented as a
set of disjoint ranges. All of its usages are updated accordingly.
Fixes #1155
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the range tombstones feature, which is not enabled
yet, to the storage_service, so that consumers can query for it.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the gms::feature destructor so it
checks whether the gossiper has been stopped before trying
to unregister the feature.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch extracts the code from sstables/partition.cc which is used
to transform a set of range tombstones into a set of overlapping
scylladb tombstones.
The range_tombstone_merger will be used to send mutations to nodes not
yet updated to support the internal range tombstone representation.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This class is responsible for representing a set of range tombstones
as non-overlapping disjoint sets of range tombstones.
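The invariant can be illustrated with a toy version using integer bounds instead of clustering prefixes; the merge logic below is a simplification for illustration, not the actual range_tombstone_list code:

```cpp
#include <algorithm>
#include <iterator>
#include <map>

// Keep a set of closed ranges disjoint: any insertion that overlaps (or
// touches) existing ranges is merged with them.
struct disjoint_ranges {
    std::map<int, int> ranges; // start -> end, kept disjoint

    void insert(int start, int end) {
        auto it = ranges.lower_bound(start);
        // A range starting before `start` may still overlap it.
        if (it != ranges.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) {
                it = prev;
            }
        }
        // Absorb every range that overlaps [start, end].
        while (it != ranges.end() && it->first <= end) {
            start = std::min(start, it->first);
            end = std::max(end, it->second);
            it = ranges.erase(it);
        }
        ranges.emplace(start, end);
    }
};
```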
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the range_tombstone class, composed of
a [start, end] pair of clustering_key_prefixes, the type
of inclusiveness of each bound, and a tombstone.
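The shape described above could be sketched like this; member names and types are placeholders (real bounds are clustering_key_prefixes, not strings):

```cpp
#include <string>

enum class bound_kind { inclusive, exclusive };

// Placeholder for the deletion marker carried by the range.
struct tombstone_info {
    long timestamp;
    long deletion_time;
};

// A range tombstone pairs a [start, end] clustering range, the
// inclusiveness of each bound, and the tombstone itself.
struct range_tombstone {
    std::string start;     // stand-in for a clustering_key_prefix
    bound_kind start_kind;
    std::string end;
    bound_kind end_kind;
    tombstone_info tomb;
};
```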
Signed-off-by: Duarte Nunes <duarte@scylladb.com>