scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-09 01:11:57 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	d98f013b07	tests: Extract simple_schema	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	01374c41f2	sstables: Move workaround for out-of-order range tombstones to mp_row_consumer This is a preliminary step before adding support for fast-forwarding to mp_row_consumer, so that range handling can be solely in mp_row_consumer rather than split between it and sstable_streamed_mutation. This also alleviates #2080 by reading all tombstones only up to the first row, after that range tombstones are treated like other fragments.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d41a7c5eb4	sstables: Drop default mp_row_consumer constructor	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	56f1ad7841	sstables: Swap order of values in "proceed" so that "no" is assigned 0	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	58c29be45c	util/optimized_optional: Make printable	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	a32cf6c4cc	position_in_partition: Add is_static_row() in the view	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	e4db643730	range_tombstone_stream: Add reset()	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	48ad2e2d64	range_tombstone_stream: Add get_next(position_in_partition_view)	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	084747b1ee	sstables: streamed_mutation: Stop reading when end of slice reached As part of this change, skip detection is refactored. This simplifies reasoning about mp_row_consumer's state a bit because now is_mutation() is not reset externally and only depends on the current position of the reader. It will prove useful when we extend mutation reader to decide if it should skip to the next partition up front before calling _context.read(), so that we can for instance skip using index instead. Fixes #2088.	2017-03-10 14:42:19 +01:00
Tomasz Grabiec	55358cacc5	sstables: Switch is_in_range() to position_in_partition Makes it immune to #1446 and is a prerequisite for implementing forwarding in mp_row_consumer.	2017-03-09 21:15:11 +01:00
Paweł Dziepak	aaae8db033	loggers should not have external linkage Message-Id: <20170309111034.20929-1-pdziepak@scylladb.com>	2017-03-09 12:27:20 +01:00
Gleb Natapov	d34f3a0440	batchlog: introduce batch_size_fail_threshold_in_kb option Add batch_size_fail_threshold_in_kb to prevent a huge batch from being applied and causing trouble. Also do not warn or fail if only one partition is affected. Fixes: #2128 Message-Id: <20170309111247.GE8197@scylladb.com>	2017-03-09 12:20:17 +01:00
Amnon Heiman	7b04841dda	main: Name the http servers In main there are two http servers that start, the API and prometheus. This patch names them accordingly so their metrics will have more meaning. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489055282-10887-1-git-send-email-amnon@scylladb.com>	2017-03-09 12:30:49 +02:00
Glauber Costa	a7b0a899a3	dist: don't execute dpdk scripts if not in dpdk mode The scripts are not liking very much being executed inside docker. Since we don't really need those variables set outside DPDK scenarios, just don't set them. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1488823691-9014-1-git-send-email-glauber@scylladb.com>	2017-03-09 11:40:08 +02:00
Avi Kivity	efd96a448c	Merge "Add execution stages" from Paweł "These patches introduce execution stages to Scylla in order to improve icache friendliness. The places where stages are added are not chosen very carefully; they are simply introduced between different subsystems: cql, storage proxy and database. This already results in a rather significant improvement and can be tuned later if necessary.
Performance results: perf_simple_query -c4 --duration 60 (medians)
        before     after      diff
write   83017.75   242876.04  +192.6%
read    61709.16   168258.26  +172.7%
The real-life improvements aren't as good because it is much harder to collect a sufficiently high number of operations in a batch."
Additional benchmarking from Paweł: "I did some tests on my local setup.
* Latency at light loads
Scylla running on 16 logical CPUs (8 cores) with 64 GB of RAM. cassandra-stress -rate threads=32
write latency
         master  seda
median   1.2     0.6
95th     1.6     0.8
99th     1.7     0.9
99.9th   2.5     1.3
max      26.4    24.2
Flags '--poll-mode' and '--defragment-memory-on-idle false' didn't improve the situation for master. See also attached graphs write_99.svg and write_999.svg.
read latency
         master  seda
median   0.8     0.6
95th     1.0     0.9
99th     1.1     1.0
99.9th   1.4     1.2
max      18.5    18.0
See also attached graphs read_99.svg and read_999.svg.
* Server 100% loaded, dataset fitting in memory (throughput)
Scylla running on 2 cores with 64 GB of RAM. 4x scylla-bench with the uniform workload (concurrency of each s-b: 512 for writes, 256 for reads). There were no cache misses during reads.
         master    seda       diff
writes   107722.4  168482.26  +56.4%
reads    51049.48  76158.19   +49.2%
* Server 100% loaded, writes being flushed and compacted (throughput)
Scylla running on 2 cores with 4 GB of RAM. 4x scylla-bench with the uniform workload, concurrency 256 each.
         master    seda       diff
writes   79575.77  114206.11  +43.5%
See attached graph: writes_with_flushes_and_compaction.png (first run: master, second: seda)."
* tag 'pdziepak/scylla-execution-stages/v1-rebased' of github.com:cloudius-systems/seastar-dev: transport: make process_request_one() an execution stage mutation_query: add an execution stage db: make database::query() an execution stage db: make apply an execution stage storage_proxy: make mutate() an execution stage cql3: make batch statement an execution stage cql3: make modification statement an execution stage cql3: make select statement an execution stage mutation_reader: make mutation_source nothrow movable	2017-03-09 11:29:43 +02:00
Paweł Dziepak	74f35864ef	transport: make process_request_one() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	a78501c206	mutation_query: add an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	b5f0e590be	db: make database::query() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	38c1501f4d	db: make apply an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	cfde2ad5b4	storage_proxy: make mutate() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	827357cb08	cql3: make batch statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	dce785089a	cql3: make modification statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	d005b20071	cql3: make select statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	12135dbe21	mutation_reader: make mutation_source nothrow movable	2017-03-09 09:27:43 +00:00
Amnon Heiman	4e8d73098f	main: Prometheus should start as early as possible There is no need to wait when starting the prometheus server, as it is up to each of the modules to register its metrics when it is ready. This is especially important when debugging boot issues. This patch moves the prometheus initialization to an early stage of the boot sequence. Fixes #2144 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489041986-28974-1-git-send-email-amnon@scylladb.com>	2017-03-09 11:26:51 +02:00
Asias He	39d2e59e7e	repair: Fix midpoint is not contained in the split range assertion in split_and_add We have: auto halves = range.split(midpoint, dht::token_comparator()); We saw a case where midpoint == range.start. As a result, range.split will assert because range.start is marked non-inclusive, so the midpoint doesn't appear to be contain()ed in the range - hence the assertion failure. Fixes #2148 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Asias He <asias@scylladb.com> Message-Id: <93af2697637c28fbca261ddfb8375a790824df65.1489023933.git.asias@scylladb.com>	2017-03-09 09:09:17 +01:00
Avi Kivity	b8e4113dba	Merge seastar upstream * seastar 5861f99...84a0b70 (13): > build: don't error out on [[deprecated]] APIs > Merge "Introduce execution stages" from Paweł > Remove unused include statement > http: catch and count errors in read and respond > Merge "Adding metrics configuration" from Amnon > future: add concepts for map_reduce(), when_all_succeed() > doxygen: exclude c-ares directory > scripts/posix_net_conf.sh: add --use-cpu-mask option > file: take flush into account when calculating size for truncate in optimize_queue() > Fixing the prometheus cleanup patch > Merge "posix_net_conf.sh: better distribute ingress processing" from Vlad > prometheus: code clean up > future: relax finally() constraints even more	2017-03-08 20:02:05 +02:00
Tomasz Grabiec	abf8e83c8d	gdb: Cast gdb.Values to int Fails with newer GDB with: TypeError: %x format: an integer is required, not gdb.Value Message-Id: <1488981412-22279-1-git-send-email-tgrabiec@scylladb.com>	2017-03-08 19:43:48 +02:00
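The TypeError quoted in the commit above can be reproduced without GDB. A minimal sketch, assuming a hypothetical stand-in class `FakeValue` that, like the `gdb.Value` in the error message, can be converted with `int()` but is not itself an integer; the function names are illustrative and not taken from the gdb script:

```python
class FakeValue:
    """Hypothetical stand-in for gdb.Value: convertible via int(), but not an int."""

    def __init__(self, raw):
        self._raw = raw

    def __int__(self):
        return self._raw


def format_address_unfixed(value):
    # Passing the wrapper straight to %x raises TypeError on Python 3.
    return "0x%x" % value


def format_address(value):
    # Casting to int first, as the commit title describes, avoids the error.
    return "0x%x" % int(value)
```

The explicit `int()` cast is harmless when the value is already an integer, so the same code should keep working with older interpreters.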
Paweł Dziepak	6db6d25f66	Merge "Avoid losing changes to keyspace parameters of system_auth and tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have a higher timestamp. To prevent that, let's use the minimum timestamp so that the default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129." * tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev: db: Create default auth and tracing keyspaces using lowest timestamp migration_manager: Append actual keyspace mutations with schema notifications	2017-03-08 10:59:47 +00:00
Nadav Har'El	506e074ba4	sstable decompression: fix skip() to end of file The skip() implementation for the compressed file input stream incorrectly handled the case of skipping to the end of file: In that case we just need to update the file pointer, but not skip anywhere in the compressed disk file; In particular, we must NOT call locate() to find the relevant on-disk compressed chunk, because there is none - locate() can only be called on actual positions of bytes, not on the one-past-end-of-file position. Fixes #2143 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170308100057.23316-1-nyh@scylladb.com>	2017-03-08 12:35:05 +02:00
Tomasz Grabiec	d6425e7646	db: Create default auth and tracing keyspaces using lowest timestamp If the node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have a higher timestamp. To prevent that, let's use the minimum timestamp so that the default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129.	2017-03-07 19:19:15 +01:00
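The reasoning in the commit above relies on last-write-wins reconciliation. A minimal sketch, assuming schema values are reconciled as `(timestamp, value)` pairs and that the lowest timestamp is the smallest signed 64-bit integer; `MIN_TIMESTAMP`, `reconcile` and the replication strings are illustrative names and values, not taken from the Scylla source:

```python
# Assumed lowest possible timestamp; the real constant is not shown in the log.
MIN_TIMESTAMP = -(2 ** 63)


def reconcile(a, b):
    """Last-write-wins: return the (timestamp, value) pair with the higher timestamp."""
    return a if a[0] >= b[0] else b


# A user altered the keyspace at time 1000, before the new node started.
user_change = (1000, "replication_factor=3")

# Old behaviour: the node creates the default keyspace with the current time.
default_now = (2000, "replication_factor=1")

# New behaviour: the default is written with the lowest timestamp.
default_min = (MIN_TIMESTAMP, "replication_factor=1")

overwritten = reconcile(user_change, default_now)  # default wins, user change lost
preserved = reconcile(user_change, default_min)    # user change survives
```

With the minimum timestamp, any explicit modification wins regardless of when the defaults were written.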
Tomasz Grabiec	06d4ad1bdd	migration_manager: Append actual keyspace mutations with schema notifications There is a workaround for a notification race, which attaches keyspace mutations to other schema changes in case the target node missed the keyspace creation. Currently it generates keyspace mutations on the spot instead of using the ones stored in schema tables. Those mutations have the current timestamp, as if the keyspace had just been modified. This is problematic because it may generate an overwrite of keyspace parameters with a newer timestamp but with stale values, if the node is not up to date with keyspace metadata. That's especially the case when booting up a node without enabling auto_bootstrap. In such a case the node will not wait for schema sync before creating auth tables. Such table creation will attach potentially out-of-date mutations for keyspace metadata, which may overwrite changes made to keyspace parameters earlier in the cluster. Refs #2129.	2017-03-07 19:19:15 +01:00
Avi Kivity	1b5ba63676	sstable: fix unhandled exception in atomic_deletion_manager::delete_atomically() The current code is asymmetric: the first N-1 shards to delete a set receive a synthetic future to wait on, while the last deletion receives the result of the delete operation (which also broadcasts completion to the first N-1 operations). This results, in case of an error, in the Nth future being reported as an unhandled error. Fix by making everything symmetric: all N callers receive a synthetic future. Nobody waits for the deletion operation (which still broadcasts its completion to all waiters, so errors are not lost). Message-Id: <20170305151607.14264-1-avi@scylladb.com>	2017-03-07 12:41:12 +02:00
Avi Kivity	439b38f5ab	Merge "Improvements to counter implementation" from Paweł "This series adds various optimisations to counter implementation (nothing extreme, mostly just avoiding unnecessary operations) as well as some missing features such as tracing and dropping timed out queries. Performance was tested using: perf-simple-query -c4 --counters --duration 60 The following results are medians. before after diff write 18640.41 33156.81 +77.9% read 58002.32 62733.93 +8.2%" * tag 'pdziepak/optimise-counters/v3' of github.com:cloudius-systems/seastar-dev: (30 commits) cell_locker: add metrics for lock acquisition storage_proxy: count counter updates for which the node was a leader storage_proxy: use counter-specific timeout for writes storage_proxy: transform counter timeouts to mutation_write_timeout_exception db: avoid allocations in do_apply_counter_update() tests/counters: add test for apply reversability counters: attempt to apply in place atomic_cell: add COUNTER_IN_PLACE_REVERT flag counters: add equality operators counters: implement decrement operators for shard_iterator counters: allow using both views and mutable_views atomic_cell: introduce atomic_cell_mutable_view managed_bytes: add cast to mutable_view bytes: add bytes_mutable_view utils: introduce mutable_view db: add more tracing events for counter writes db: propagate tracing state for counter writes tests/cell_locker: add test for timing out lock acquisition counter_cell_locker: allow setting timeouts db: propagate timeout for counter writes ...	2017-03-07 11:48:13 +02:00
Tomasz Grabiec	ecfa9e40de	Merge 'duarte/lsa/hist-cleanup/v2' from github.com:duarten/scylla histogram cleanups from Duarte.	2017-03-07 10:33:50 +01:00
Gleb Natapov	5c4158daac	memtable: do not yield while holding reclaim_lock Holding reclaim_lock while yielding may cause memory allocations to fail. Fixes #2139 Message-Id: <20170306153151.GA5902@scylladb.com>	2017-03-06 17:24:22 +01:00
Gleb Natapov	d7bdf16a16	memtable: do not open code logalloc::reclaim_lock use logalloc::reclaim_lock prevents reclaim from running, which may cause a regular allocation to fail although there is enough free memory. To solve that there is an allocation_section, which acquires reclaim_lock and, if the allocation fails, runs the reclaimer outside of the lock and retries the allocation. The patch makes use of allocation_section instead of direct use of reclaim_lock in memtable code. Fixes #2138. Message-Id: <20170306160050.GC5902@scylladb.com>	2017-03-06 17:24:22 +01:00
Avi Kivity	1af9e3a5cb	Merge "database: fix the 'nodetool clearsnapshot'" from Vlad "Work on this series started with fixing the 'nodetool clearsnapshot'. The current master code ignores the snapshots in deleted keyspaces (issue #2045). I noticed that in many places where our code has to build the path to some directory/file, it simply uses sstring(<path1>) + "/" + sstring(<path2>) constructs, which may cause us issues if somebody decides to compile/run scylla on a non-Unix-based OS, like Microsoft Windows. I understand that this is a long shot, but if we can get it right now, why not. The answer is the boost::filesystem::path class - its synchronous parts, of course. I decided to take the initiative and fix the issues above and then use the fixed code for fixing issue #2045: - Fix some minor issues in the existing code. - Extend the lister class and move it into separate files outside database.cc. On the way I've found an issue in the existing code (issue #2071). This series fixes this one too (PATCH2)."	2017-03-06 16:45:31 +02:00
Glauber Costa	2d620a25fb	raid script: improve test for mounted filesystem The current test for whether or not the filesystem is mounted is weak and will fail if multiple pieces of the hierarchy are mounted. util-linux ships with a mountpoint command that does exactly that, so we'll use that instead. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1488742801-4907-1-git-send-email-glauber@scylladb.com>	2017-03-06 15:59:29 +02:00
Gleb Natapov	7f5923f510	storage_service: handle empty token list correctly boost::split() returns one empty string if called on an empty input. Trying to cast an empty string to a token value results in a bad_lexical_cast exception. Fix it by handling an empty token list explicitly. Message-Id: <20170302125405.GU11471@scylladb.com>	2017-03-06 15:31:33 +02:00
Takuya ASADA	6602221442	dist/redhat: enables discard on CentOS/RHEL RAID0 Since the CentOS/RHEL raid module disables discard by default, we need to enable it again in order to use it. Fixes #2033 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1488407037-4795-1-git-send-email-syuu@scylladb.com>	2017-03-06 12:21:42 +02:00
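The pitfall described in the commit above has a close analogue in Python, where `"".split(",")` also yields a single empty string and `int("")` raises, much like the bad_lexical_cast. A minimal sketch of handling the empty case explicitly, assuming a comma-separated list of integer tokens; `parse_tokens` and the separator are illustrative assumptions, not the actual storage_service code:

```python
def parse_tokens(raw):
    """Parse a comma-separated token string into integers (hypothetical helper)."""
    # "".split(",") returns [""] rather than [], and int("") raises ValueError,
    # so an empty input is handled before splitting.
    if not raw:
        return []
    return [int(piece) for piece in raw.split(",")]
```

Checking for the empty input up front keeps the conversion loop free of special cases.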
Duarte Nunes	ca4f5cabd4	lsa: Extract log_histogram class Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-04 14:47:19 +01:00
Avi Kivity	24d6560fbc	Update scylla-ami submodule * dist/ami/files/scylla-ami d5a4397...eedd12f (3): > Rewrite disk discovery to handle EBS and NVMEs. > add --developer-mode option > trivial cleanup: replace tab in indent	2017-03-04 13:29:32 +02:00
Duarte Nunes	5c73978b68	thrift/handler: Enable Aggregator concept with GCC6_CONCEPT Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170303172509.16844-1-duarte@scylladb.com>	2017-03-04 13:27:16 +02:00
Duarte Nunes	2b6abd5a91	lsa: Make log_histogram more generic Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-03 17:59:17 +01:00
Duarte Nunes	3819e6d55f	lsa: log_histogram cleanups Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-03 17:09:07 +01:00
Tomasz Grabiec	22199abf50	gc_clock: Remove orphaned comment Message-Id: <1488381379-8618-1-git-send-email-tgrabiec@scylladb.com>	2017-03-02 12:56:09 +02:00
Tomasz Grabiec	6a83fe5534	Merge 'pdziepak/optimise-commitlog-entry-writer/v1' from seastar-dev.git From Paweł: These patches optimise commitlog_entry_writer so that it avoids copying column mapping, which is a particularly expensive operation. perf_simple_query -c4 --write --duration 60 (medians) before after diff write 79434.35 89247.54 +12.3% Tested with: commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup commitlog_test.py:TestCommitLog.test_commitlog_replay_with_alter_table commitlog_test.py:TestCommitLog.test_commitlog_replay_with_counters	2017-03-02 11:37:42 +01:00
Paweł Dziepak	04b80272f2	cell_locker: add metrics for lock acquisition	2017-03-02 09:05:12 +00:00
Paweł Dziepak	00b42c477f	storage_proxy: count counter updates for which the node was a leader	2017-03-02 09:05:12 +00:00


11508 Commits