scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 12:17:02 +00:00

Author	SHA1	Message	Date
Takuya ASADA	d016dd4b74	dist: schedule daily fstrim for data directory and commitlog directory Schedule daily fstrim for data directory and commitlog directory, which is recommended by Scylla doc: http://www.scylladb.com/doc/admin/#schedule-fstrim Fixes #1347 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1489447472-2981-1-git-send-email-syuu@scylladb.com>	2017-03-14 11:51:53 +02:00
Amnon Heiman	295a981c61	storage_proxy: metrics should have unique name Metrics should have their unique name. This patch changes throttled_writes of the queue length to current_throttled_writes. Without it, metrics will be reported twice under the same name, which may cause errors in the prometheus server. This could be related to scylladb/seastar#250 Fixes #2163. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170314081456.6392-1-amnon@scylladb.com>	2017-03-14 11:19:39 +02:00
Tomasz Grabiec	ed530dfb3a	tests: sstables: Add test for skipping within a compressed stream Refs #2143.	2017-03-13 13:08:24 +01:00
Tomasz Grabiec	1e0af2efc3	Update seastar submodule * seastar 84a0b70...fd29fd0 (4): > Fix smp::submit_to() with function reference > execution_stage: add concept restraint for operator() > core/temporary_buffer: Add operator==() > map_reduce: allow reducer to take accumulated value by rref	2017-03-13 10:13:03 +01:00
Paweł Dziepak	60c6b9a240	Merge "Implement sstable_streamed_mutation::fast_forward_to()" from Tomasz "This replaces use of a generic forwarding wrapper in sstable reader with specialized implementation. Forwarding doesn't yet utilize indexes in this series, only integrates it with mp_row_consumer, which is a prerequisite. It's still an optimization, since mp_row_consumer will not try to consume past the range as it used to. Sending early for easier consumption." * tag 'tgrabiec/forwarding-of-mp-row-consumer-v2' of github.com:scylladb/seastar-dev: sstables: Remove use of forwarding wrapper sstables: Implement sstable_streamed_mutation::fast_forward_to() sstables: Extract and use clustering_ranges_walker tests: sstables: Add test for handling of repeated tombstones sstables: Extract writer parameters into config objects tests: Move as_mutation_source() helper to header tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions streamed_mutation: Add streamed_mutation_returning() helper tests: mutation_source_test: Add test case for forwarding to a full range tests: simple_schema: Add fragment factories tests: Extract simple_schema sstables: Move workaround for out-of-order range tombstones to mp_row_consumer sstables: Drop default mp_row_consumer constructor sstables: Swap order of values in "proceed" so that "no" is assigned 0 util/optimized_optional: Make printable position_in_partition: Add is_static_row() in the view range_tombstone_stream: Add reset() range_tombstone_stream: Add get_next(position_in_partition_view) sstables: streamed_mutation: Stop reading when end of slice reached sstables: Switch is_in_range() to position_in_partition	2017-03-10 13:55:46 +00:00
Tomasz Grabiec	1f1b516b31	sstables: Remove use of forwarding wrapper	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d7afab21e7	sstables: Implement sstable_streamed_mutation::fast_forward_to() Handling of forwarding is done inside mp_row_consumer, because it allows us to filter out irrelevant data sooner and thus more efficiently. Because the static row can now be skipped as well, _skip_clustering_row was renamed to the more generic _skip_in_progress.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	4750216387	sstables: Extract and use clustering_ranges_walker Extracted from mp_row_consumer.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	88ccc99017	tests: sstables: Add test for handling of repeated tombstones	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	124dde30db	sstables: Extract writer parameters into config objects Also enables users to change the default promoted index block size.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	ad1e69c4c5	tests: Move as_mutation_source() helper to header	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	6f409d367b	tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	dc7b93a326	streamed_mutation: Add streamed_mutation_returning() helper	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	06a964b3a0	tests: mutation_source_test: Add test case for forwarding to a full range	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	929842ad3f	tests: simple_schema: Add fragment factories	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d98f013b07	tests: Extract simple_schema	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	01374c41f2	sstables: Move workaround for out-of-order range tombstones to mp_row_consumer This is a preliminary step before adding support for fast-forwarding to mp_row_consumer, so that range handling can be solely in mp_row_consumer rather than split between it and sstable_streamed_mutation. This also alleviates #2080 by reading all tombstones only up to the first row, after that range tombstones are treated like other fragments.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d41a7c5eb4	sstables: Drop default mp_row_consumer constructor	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	56f1ad7841	sstables: Swap order of values in "proceed" so that "no" is assigned 0	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	58c29be45c	util/optimized_optional: Make printable	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	a32cf6c4cc	position_in_partition: Add is_static_row() in the view	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	e4db643730	range_tombstone_stream: Add reset()	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	48ad2e2d64	range_tombstone_stream: Add get_next(position_in_partition_view)	2017-03-10 14:42:21 +01:00
Tomasz Grabiec	084747b1ee	sstables: streamed_mutation: Stop reading when end of slice reached As part of this change, skip detection is refactored. This simplifies reasoning about mp_row_consumer's state a bit because now is_mutation() is not reset externally and only depends on current position of the reader. It will prove useful when we extend mutation reader to decide if it should skip to the next partition up front before calling _context.read(), so that we can for instance skip using index instead. Fixes #2088.	2017-03-10 14:42:19 +01:00
Duarte Nunes	16bcf8d085	db/schema_tables: Avoid copying keyspace name This patch changes a lambda argument type so the keyspace name is passed by reference instead of copying it, in read_schema_for_keyspaces(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170309213134.10331-1-duarte@scylladb.com>	2017-03-10 11:03:56 +02:00
Duarte Nunes	d32c848d73	utils/logalloc: Change linkage of hist_options to external Change linkage of segment_descriptor_hist_options to external to keep good old GCC5 happy, despite C++11 allowing static linkage of non-type template arguments. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170309213206.10383-1-duarte@scylladb.com>	2017-03-10 11:02:51 +02:00
Tomasz Grabiec	55358cacc5	sstables: Switch is_in_range() to position_in_partition Makes it immune to #1446 and is a prerequisite for implementing forwarding in mp_row_consumer.	2017-03-09 21:15:11 +01:00
Paweł Dziepak	aaae8db033	loggers should not have external linkage Message-Id: <20170309111034.20929-1-pdziepak@scylladb.com>	2017-03-09 12:27:20 +01:00
Gleb Natapov	d34f3a0440	batchlog: introduce batch_size_fail_threshold_in_kb option Add batch_size_fail_threshold_in_kb to prevent a huge batch from being applied and causing trouble. Also do not warn or fail if only one partition is affected. Fixes: #2128 Message-Id: <20170309111247.GE8197@scylladb.com>	2017-03-09 12:20:17 +01:00
Amnon Heiman	7b04841dda	main: Name the http servers In main there are two http servers that start, the API and prometheus. This patch names them accordingly so their metrics will have more meaning. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489055282-10887-1-git-send-email-amnon@scylladb.com>	2017-03-09 12:30:49 +02:00
Glauber Costa	a7b0a899a3	dist: don't execute dpdk scripts if not in dpdk mode The scripts are not liking very much being executed inside docker. Since we don't really need those variables set outside DPDK scenarios, just don't set them. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1488823691-9014-1-git-send-email-glauber@scylladb.com>	2017-03-09 11:40:08 +02:00
Avi Kivity	efd96a448c	Merge "Add execution stages" from Paweł "These patches introduce execution stages to Scylla in order to improve icache friendliness. The places where stages are added are not chosen very carefully and are rather introduced between different subsystems: cql, storage proxy and database. This already results in a rather significant improvement and can be tuned later if necessary. Performance results: perf_simple_query -c4 --duration 60 (medians) before after diff write 83017.75 242876.04 +192.6% read 61709.16 168258.26 +172.7% The real life improvements aren't as good because it is much harder to collect a sufficiently high number of operations in a batch." Additional benchmarking from Paweł: "I did some tests on my local setup. * Latency at light loads Scylla running on 16 logical CPUs (8 cores) with 64 GB of RAM. cassandra-stress -rate threads=32 write latency master seda median 1.2 0.6 95th 1.6 0.8 99th 1.7 0.9 99.9th 2.5 1.3 max 26.4 24.2 Flags '--poll-mode' and '--defragment-memory-on-idle false' didn't improve situation for master. See also attached graph write_99.svg and write_999.svg. read latency master seda median 0.8 0.6 95th 1.0 0.9 99th 1.1 1.0 99.9th 1.4 1.2 max 18.5 18.0 See also attached graph read_99.svg and read_999.svg. * Server 100% loaded, dataset fitting in memory (throughput) Scylla running on 2 cores with 64 GB of RAM. 4x scylla-bench with the uniform workload (concurrency of each s-b: 512 for writes, 256 for reads). There were no cache misses during reads. master seda diff writes 107722.4 168482.26 +56.4% reads 51049.48 76158.19 +49.2% * Server 100% loaded, writes being flushed and compacted (throughput) Scylla running on 2 cores with 4 GB of RAM. 4x scylla-bench with the uniform workload, concurrency 256 each. master seda diff writes 79575.77 114206.11 +43.5% See attached graph: writes_with_flushes_and_compaction.png (first run: master, second: seda)." 
* tag 'pdziepak/scylla-execution-stages/v1-rebased' of github.com:cloudius-systems/seastar-dev: transport: make process_request_one() an execution stage mutation_query: add an execution stage db: make database::query() an execution stage db: make apply an execution stage storage_proxy: make mutate() an execution stage cql3: make batch statement an execution stage cql3: make modification statement an execution stage cql3: make select statement an execution stage mutation_reader: make mutation_source nothrow movable	2017-03-09 11:29:43 +02:00
Paweł Dziepak	74f35864ef	transport: make process_request_one() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	a78501c206	mutation_query: add an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	b5f0e590be	db: make database::query() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	38c1501f4d	db: make apply an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	cfde2ad5b4	storage_proxy: make mutate() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	827357cb08	cql3: make batch statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	dce785089a	cql3: make modification statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	d005b20071	cql3: make select statement an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	12135dbe21	mutation_reader: make mutation_source nothrow movable	2017-03-09 09:27:43 +00:00
Amnon Heiman	4e8d73098f	main: Prometheus should start as early as possible There is no need to wait when starting the prometheus server, as it is up to each of the modules to register its metrics when it is ready. This is especially important when debugging boot issues. This patch moves the prometheus initialization to be done at an early stage of the boot sequence. Fixes #2144 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1489041986-28974-1-git-send-email-amnon@scylladb.com>	2017-03-09 11:26:51 +02:00
Asias He	39d2e59e7e	repair: Fix midpoint is not contained in the split range assertion in split_and_add We have: auto halves = range.split(midpoint, dht::token_comparator()); We saw a case where midpoint == range.start, as a result, range.split will assert because the range.start is marked non-inclusive, so the midpoint doesn't appear to be contain()ed in the range - hence the assertion failure. Fixes #2148 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Asias He <asias@scylladb.com> Message-Id: <93af2697637c28fbca261ddfb8375a790824df65.1489023933.git.asias@scylladb.com>	2017-03-09 09:09:17 +01:00
Avi Kivity	b8e4113dba	Merge seastar upstream * seastar 5861f99...84a0b70 (13): > build: don't error out on [[deprecated]] APIs > Merge "Introduce execution stages" from Paweł > Remove unused include statement > http: catch and count errors in read and respond > Merge "Adding metrics configuration" from Amnon > future: add concepts for map_reduce(), when_all_succeed() > doxygen: exclude c-ares directory > scripts/posix_net_conf.sh: add --use-cpu-mask option > file: take flush into account when calculating size for truncate in optimize_queue() > Fixing the prometheus cleanup patch > Merge "posix_net_conf.sh: better distribute ingress processing" from Vlad > prometheus: code clean up > future: relax finally() constraints even more	2017-03-08 20:02:05 +02:00
Tomasz Grabiec	abf8e83c8d	gdb: Cast gdb.Values to int Fails with newer GDB with: TypeError: %x format: an integer is required, not gdb.Value Message-Id: <1488981412-22279-1-git-send-email-tgrabiec@scylladb.com>	2017-03-08 19:43:48 +02:00
Paweł Dziepak	6db6d25f66	Merge "Avoid losing changes to keyspace parameters of system_auth and tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have higher timestamp. To prevent that, let's use minimum timestamp so that default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129." * tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev: db: Create default auth and tracing keyspaces using lowest timestamp migration_manager: Append actual keyspace mutations with schema notifications	2017-03-08 10:59:47 +00:00
Nadav Har'El	506e074ba4	sstable decompression: fix skip() to end of file The skip() implementation for the compressed file input stream incorrectly handled the case of skipping to the end of file: In that case we just need to update the file pointer, but not skip anywhere in the compressed disk file; In particular, we must NOT call locate() to find the relevant on-disk compressed chunk, because there is none - locate() can only be called on actual positions of bytes, not on the one-past-end-of-file position. Fixes #2143 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170308100057.23316-1-nyh@scylladb.com>	2017-03-08 12:35:05 +02:00
Tomasz Grabiec	d6425e7646	db: Create default auth and tracing keyspaces using lowest timestamp If the node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have higher timestamp. To prevent that, let's use minimum timestamp so that default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129.	2017-03-07 19:19:15 +01:00
Tomasz Grabiec	06d4ad1bdd	migration_manager: Append actual keyspace mutations with schema notifications There is a workaround for notification race, which attaches keyspace mutations to other schema changes in case the target node missed the keyspace creation. Currently that generated keyspace mutations on the spot instead of using the ones stored in schema tables. Those mutations would have current timestamp, as if the keyspace has been just modified. This is problematic because this may generate an overwrite of keyspace parameters with newer timestamp but with stale values, if the node is not up to date with keyspace metadata. That's especially the case when booting up a node without enabling auto_bootstrap. In such case the node will not wait for schema sync before creating auth tables. Such table creation will attach potentially out of date mutations for keyspace metadata, which may overwrite changes to keyspace parameters made earlier in the cluster. Refs #2129.	2017-03-07 19:19:15 +01:00
Avi Kivity	1b5ba63676	sstable: fix unhandled exception in atomic_deletion_manager::delete_atomically() The current code is asymmetric: the first N-1 shards to delete a set receive a synthetic future to wait on, while the last deletion receives the result of the delete operation (which also broadcasts completion to the first N-1 operations). This results, in case of an error, in the Nth future being reported as an unhandled error. Fix by making everything symmetric: all N callers receive a synthetic future. Nobody waits for the deletion operation (which still broadcasts its completion to all waiters, so errors are not lost). Message-Id: <20170305151607.14264-1-avi@scylladb.com>	2017-03-07 12:41:12 +02:00

11525 Commits