scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Avi Kivity	cb2c5016ea	Merge seastar upstream * seastar 7a49ae5...edb73ab (11): > scripts: perftune.py: change the network module mode auto selection heuristic > net/tls: explicitly ignore ready future during shutdown > Use python2 explicitly as an interpreter for Python v2 scripts > peering_sharded_service: prevent over-run the container > Add link to documentation to the README.md > Add guidelines for contributing to Seastar > sharded: fix move constructor for peering_sharded_service services > Provide a convenient way to lazy-convert to string the values of pointers > tutorial: overhaul semaphores section > simple-stream: Make fragmented::write_substream return simple if possible > simple-stream: Make simple/fragmented memory output stream top level	2017-08-14 10:29:27 +03:00
Raphael S. Carvalho	050a7019b8	sstables/index_reader: fix index reader for summary entry spanning lots of keys quantity prevents index_reader from reading all index entries of a summary entry that span more than min_index_interval entries. That can happen after introduction of size-based sampling, and consequently, sstable will not be able to return a key which logical position in summary entry is beyond min_index_interval. It's ok to not use quantity because index_reader will read all indexes until either next summary entry or end of file is reached. Fixes test_sstable_conforms_to_mutation_source Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>	2017-08-12 09:44:16 +03:00
Duarte Nunes	08e284a07e	combined_mutation_reader: Don't drop mutation readers This patch fixes a regression introduced in `a6b9186ca`. We should keep the readers around in case a subsequent call to fast_forward() will require them. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811160444.12795-1-duarte@scylladb.com>	2017-08-11 19:17:29 +03:00
Duarte Nunes	44b6da2e90	test.py: Add combined_mutation_reader_test Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811155017.9899-1-duarte@scylladb.com>	2017-08-11 18:54:11 +03:00
Avi Kivity	dbf8625ac9	Merge "size-based sampling for sstable summary" from Raphael "Fixes #1842." * 'size_based_sampling_v3' of github.com:raphaelsc/scylla: tests: test summary entry spanning more keys than min interval db/config: introduce sstable_summary_ratio option sstables: introduce size-based sampling for sstable summary sstables: make components_writer::offset const qualified and uint64_t sstables: make writer::offset const qualified and uint64_t	2017-08-11 18:41:45 +03:00
Duarte Nunes	e7d56884c0	list_reader_selector: Prevent infinite loop In case the readers are empty. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811153142.8926-1-duarte@scylladb.com>	2017-08-11 18:34:55 +03:00
Vladimir Krivopalov	003e8cf250	Use python2 explicitly as an interpreter for Python v2 scripts Signed-off-by: Vladimir Krivopalov <vladimir.krivopalov@gmail.com> Message-Id: <20170811032712.4362-1-vladimir.krivopalov@gmail.com>	2017-08-11 18:08:11 +03:00
Duarte Nunes	20337053ad	Don't use literal lambdas These are only available in C++17. Fixes the build after `b5460c2`. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-11 13:08:42 +02:00
Duarte Nunes	b5460c2990	Merge "Support `duration` type" from Jesse "This patch series adds support for the `duration` type in CQL, which was added to Cassandra in 3.10. As part of this work, it was necessary also to add support for the `vint` and `unsigned vint` types to the native protocol implementation, which are part of v5 of the specification. To test interactively, it is necessary to use cqlsh distributed with Cassandra, as the version we distribute does not yet support the duration type." * 'jhk/duration_protocol/v5' of https://github.com/hakuch/scylla: Support `duration` CQL native type CQL native protocol: Add support for `vint` serialization duration_test.cc: Add test for printing zero duration duration.cc: Remove nop `const` qualifier on return type Change `const` qualifier declaration order for `duration` duration.cc: Simplify range checking Rename `duration` to `cql_duration`	2017-08-11 10:56:55 +01:00
Duarte Nunes	bcf21aacc2	storage_proxy: Directly call query_nonsingular_mutations_locally Instead of duplicating the branch. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811001559.25788-1-duarte@scylladb.com>	2017-08-11 09:06:01 +03:00
Duarte Nunes	a3ee99554b	service/storage_proxy: Remove out of date comment Now that we don't go directly to reconciliation for range queries, the result isn't required to have the row and partition counts calculated (we no longer transform a reconciled_result to a query::result). Furthermore, this line was causing a lot of dtests to fail on account of them not expecting an error line in the logs. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170810225351.12610-1-duarte@scylladb.com>	2017-08-11 09:04:23 +03:00
Raphael S. Carvalho	5124f94358	tests: test summary entry spanning more keys than min interval Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 01:37:06 -03:00
Raphael S. Carvalho	872412d31a	db/config: introduce sstable_summary_ratio option Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 01:36:21 -03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we want to do is also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satisfy the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Raphael S. Carvalho	da7489720b	sstables: make components_writer::offset const qualified and uint64_t Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-10 21:48:11 -03:00
Raphael S. Carvalho	881c479be8	sstables: make writer::offset const qualified and uint64_t Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-10 21:46:39 -03:00
Jesse Haber-Kucharsky	509626fe08	Support `duration` CQL native type `duration` is a new native type that was introduced in Cassandra 3.10 [1]. Support for parsing and the internal representation of the type was added in `8fa47b74e8`. Important note: The version of cqlsh distributed with Scylla does not have support for durations included (it was added to Cassandra in [2]). To test this change, you can use cqlsh distributed with Cassandra. Duration types are useful when working with time-series tables, because they can be used to manipulate date-time values in relative terms. Two interesting applications are: - Aggregation by time intervals [3]: `SELECT * FROM my_table GROUP BY floor(time, 3h)` - Querying on changes in date-times: `SELECT ... WHERE last_heartbeat_time < now() - 3h` (Note: neither of these is currently supported, though columns with duration values are.) Internally, durations are represented as three signed counters: one for months, for days, and for nanoseconds. Each of these counters is serialized using a variable-length encoding which is described in version 5 of the CQL native protocol specification. The representation of a duration as three counters means that a semantic ordering on durations doesn't exist: Is `1mo` greater than `1mo1d`? We cannot know, because some months have more days than others. Durations can only have a concrete absolute value when they are "attached" to absolute date-time references. For example, `2015-04-31 at 12:00:00 + 1mo`. That duration values are not comparable presents some difficulties for the implementation, because most CQL types are. Like in Cassandra's implementation [2], I adopted a similar strategy to the way restrictions on the `counter` type are checked. A type "references" a duration if it is either a duration or it contains a duration (like a `tuple<..., duration, ...>`, or a UDT with a duration member). The following restrictions apply on durations. 
Note that some of these contexts are either experimental features (materialized views), or not currently supported at run-time (though support exists in the parser and code, so it is prudent to add the restrictions now): - Durations cannot appear in any part of a primary key, either for tables or materialized views. - Durations cannot be directly used as the element type of a `set`, nor can they be used as the key type of a `map`. Because internal ordering on durations is based on a byte-level comparison, this property of Cassandra was intended to help avoid user confusion around ordering of collection elements. - Secondary indexes on durations are not supported. - "Slice" relations (<=, <, >=, >) are not supported on durations with `WHERE` restrictions (like `SELECT ... WHERE span <= 3d`). Multi-column restrictions only work with clustering columns, which cannot be `duration` due to the first rule. - "Slice" relations are not supported on durations with query conditions (like `UPDATE my_table ... IF span > 5us`). Backwards incompatibility note: As described in the documentation [4], duration literals take one of two forms: either ISO 8601 formats (there are three), or a "standard" format. The ISO 8601 formats start with "P" (like "P5W"). Therefore, identifiers that have this form are no longer supported. Fixes #2240. [1] https://issues.apache.org/jira/browse/CASSANDRA-11873 [2] `bfd57d13b7` [3] https://issues.apache.org/jira/browse/CASSANDRA-11871 [4] http://cassandra.apache.org/doc/latest/cql/types.html#working-with-durations	2017-08-10 15:01:10 -04:00
Jesse Haber-Kucharsky	91dab1d998	CQL native protocol: Add support for `vint` serialization Version 5 of the native protocol for CQL [1] adds the `vint` and `unsigned vint` types. An unsigned integer encoded as a `vint` has a variable size based on the magnitude of the value. The first byte indicates the total number of bytes. For signed integers, a "zig-zag" encoding scheme ensures that small negative values are encoded as short-length `vint`s (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, etc). [1] https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec	2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky	77489f843f	duration_test.cc: Add test for printing zero duration It's somewhat counter-intuitive, but Cassandra also formats zero-valued duration values as an empty string.	2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky	d9c027c2dd	duration.cc: Remove nop `const` qualifier on return type These have no effect according to the Clang static analyzer.	2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky	54c3cf0201	Change `const` qualifier declaration order for `duration` The vast majority of the code-base is written in left-`const` style, and consistency is important.	2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky	1889b036b1	duration.cc: Simplify range checking	2017-08-10 14:11:23 -04:00
Avi Kivity	301358e440	Merge "Optimize combined_mutation_reader for disjoint sstable ranges" from Botond "sstables will sometimes have narrow/disjoint ranges (e.g. LCS L1+). This can be exploited when reading from a range of sstables by opening sstables on-demand thus saving memory, processing and potentially I/O. To achieve this combined_mutation_reader is refactored such that the reader selection logic is moved-out into a reader_selector class. combined_mutation_reader now takes a reader_selector instance in its constructor and asks it for new readers for the current ring position on every call to operator()(). At the moment two specializations of reader_selector are provided: * list_reader_selector which implements the current logic, that is using a provided mutation_reader list, and * incremental_reader_selector which implements the on-demand opening logic discussed above. Fixes #1935" * 'bdenes/optimize_combined_reader-v6' of https://github.com/denesb/scylla: Add combined_mutation_reader_test unit test Remove range_sstable_reader Add incremental_reader_selector Add reader_selector to combined_mutation_reader sstable_set::incremental_selector: select() now returns a selection	2017-08-10 15:16:30 +03:00
Botond Dénes	9ee9988097	Add combined_mutation_reader_test unit test	2017-08-10 12:38:10 +03:00
Botond Dénes	3e97a5cd6b	Remove range_sstable_reader range_sstable_reader is replaced with combined_mutation_reader, using the incremental_reader_selector.	2017-08-10 12:38:10 +03:00
Botond Dénes	bfc74f1312	Add incremental_reader_selector incremental_reader_selector is a specialization of reader_selector for the case when sstables have narrow and/or disjoint token ranges. To exploit this it creates new readers on-demand when their sstable's token range intersects with the current ring position.	2017-08-10 12:38:02 +03:00
Botond Dénes	a6b9186cab	Add reader_selector to combined_mutation_reader combined_mutation_reader now accepts as a constructor argument a reader_selector instance whose task is to create new readers on each call to operator()() if needed and possible. This way it is possible to control how readers are created through different specializations of reader_selector. The previous logic is refactored into list_reader_selector which is using a pre-provided mutation_reader list and forwards all of them to combined_mutation_reader at once.	2017-08-10 12:37:40 +03:00
Takuya ASADA	1cb0fff146	dist/common/scripts/scylla_raid_setup: handle '--disks' parameter correctly when the disk list ends with ',' We should handle parameters correctly even if they are malformed. Fixes #2402 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1499266239-27551-1-git-send-email-syuu@scylladb.com>	2017-08-10 11:42:33 +03:00
Takuya ASADA	8e115d69a9	dist/debian: append postfix '~DISTRIBUTION' to scylla package version We are moving to aptly to release .deb packages, that requires debian repository structure changes. After the change, we will share 'pool' directory between distributions. However, our .deb package name on specific release is exactly same between distributions, so we have file name confliction. To avoid the problem, we need to append distribution name on package version. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1502312935-22348-1-git-send-email-syuu@scylladb.com>	2017-08-10 10:53:56 +03:00
Vlad Zolotarov	1b4594b03a	transport::server::process_prepare() don't ignore errors on other shards If storing of the statement fails on any shard we should fail the whole PREPARE request. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1502325392-31169-13-git-send-email-vladz@scylladb.com>	2017-08-10 10:32:37 +03:00
Jesse Haber-Kucharsky	352e9f60ba	Rename `duration` to `cql_duration` `std::chrono::duration` is a prolific enough name that it's best to disambiguate.	2017-08-09 15:15:20 -04:00
Botond Dénes	94fc550e68	sstable_set::incremental_selector: select() now returns a selection A selection contains - in addition to the list of sstables - a next_token which is a hint as to what is the next best token to call select() with. This should be the smallest token such that at the next call to select() the least number of new sstables will be returned, without skipping any.	2017-08-09 16:27:33 +03:00
Takuya ASADA	3077416ecc	dist/debian: Backport scalability fix of _Unwind_Find_FDE to our gcc for Debian 8 Since we provide a custom-built gcc only for Debian 8, the fix does not apply to Ubuntu/Debian 9. Fixes #2646 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1502239191-12649-1-git-send-email-syuu@scylladb.com>	2017-08-09 12:19:52 +03:00
Avi Kivity	7217b7ab36	Merge "Use range_streamer everywhere" from Asias "With this series, all the following cluster operations: - bootstrap - rebuild - decommission - removenode will use the same code to do the streaming. The range_streamer is now extended to support both fetch from and push to peer node. Another big change is now the range_streamer will stream less ranges at a time, so less data, per stream_plan and range_streamer will remember which ranges are failed to stream and can retry later. The retry policy is very simple at the moment it retries at most 5 times and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes .... Later, we can introduce api for user to decide when to stop retrying and the retry interval. The benefits: - All the cluster operation shares the same code to stream - We can know the operation progress, e.g., we can know total number of ranges need to be streamed and number of ranges finished in bootstrap, decommission and etc. - All the cluster operation can survive peer node down during the operation which usually takes long time to complete, e.g., when adding a new node, currently if any of the existing node which streams data to the new node had issue sending data to the new node, the whole bootstrap process will fail. After this patch, we can fix the problematic node and restart it, the joining node will retry streaming from the node again. - We can fail streaming early and timeout early and retry less because all the operations use stream can survive failure of a single stream_plan. It is not that important for now to have to make a single stream_plan successful. Note, another user of streaming, repair, is now using small stream_plan as well and can rerun the repair for the failed ranges too. This is one step closer to supporting the resumable add/remove node operations."
* tag 'asias/use_range_streamer_everywhere_v4' of github.com:cloudius-systems/seastar-dev: storage_service: Use the new range_streamer interface for removenode storage_service: Use the new range_streamer interface for decommission storage_service: Use the new range_streamer interface for rebuild storage_service: Use the new range_streamer interface for bootstrap dht: Extend range_streamer interface	2017-08-09 10:00:25 +03:00
Takuya ASADA	98fc7b376d	dist/redhat: install mdadm/xfsprogs on package install time We experienced 'Constructing RAID volume...' takes too much time on some AMIs, this is because setup script stuck at 'yum -y install mdadm xfsprogs'. We don't have to install these packages on AMI startup time, we should preinstall them on AMI creating time. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1502192796-21040-1-git-send-email-syuu@scylladb.com>	2017-08-09 09:10:34 +03:00
Piotr Jastrzebski	4137517cdc	Check arguments of table_helper::setup_keyspace to make sure all table helpers passed as arguments are for the right keyspace. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <10edacd509880bb18180f13e8c28593d068c5c7b.1501688729.git.piotr@scylladb.com>	2017-08-08 15:55:06 +03:00
Piotr Jastrzebski	2d8a80f211	Make table_helper constructor safer by taking keyspace name by value and storing it inside the object. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <a5dab41647348ae311e023fe5592aec650c6e32a.1501688729.git.piotr@scylladb.com>	2017-08-08 15:55:06 +03:00
Daniel Fiala	06089474c9	Print warning if user uses default cluster_name * Configuration for cluster_name is commented-out in config file. * Default value set to empty string and if not rewritten by user then warning is printed and value is reset to "ScyllaDB Cluster". Fixes #2648. Message-Id: <20170808113322.9313-1-daniel@scylladb.com>	2017-08-08 14:47:17 +03:00
Avi Kivity	a71138fc84	config: mark column_index_size_in_kb as Used Fixes #2681 Message-Id: <20170808100415.16296-1-avi@scylladb.com>	2017-08-08 11:08:00 +01:00
Ultrabug	2022da2405	Add overall python code QA and guidelines with flake8 ScyllaDB loves python & python loves ScyllaDB. It would benefit the project to start enforcing some code guidelines and basic QA with a linter along a PEP8 respect thanks to flake8. This patch adds a tox config to at least start with an assessment of the work to be done on all .py files in the code base. To reduce its noise, tests on long lines (> 80char) are ignored for now. Signed-off-by: Ultrabug <ultrabug@gentoo.org> Message-Id: <20170726134242.8927-1-ultrabug@gentoo.org>	2017-08-08 11:15:45 +03:00
Raphael S. Carvalho	dddbd34b52	sstables: close index file when sstable writer fails index's file output stream uses write behind but it's not closed when sstable write fails and that may lead to crash. It happened before for data file (which is obviously easier to reproduce for it) and was fixed by `0977f4fdf8`. Fixes #2673. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170807171146.10243-1-raphaelsc@scylladb.com>	2017-08-08 09:53:14 +03:00
Asias He	49360992d9	storage_service: Use the new range_streamer interface for removenode So that removenode operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:48 +08:00
Asias He	6b8dc85f12	storage_service: Use the new range_streamer interface for decommission So that decommission operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:48 +08:00
Asias He	24584b8509	storage_service: Use the new range_streamer interface for rebuild So that rebuild operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:47 +08:00
Asias He	f239b11a84	storage_service: Use the new range_streamer interface for bootstrap So that bootstrap operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:47 +08:00
Asias He	6810031ba7	dht: Extend range_streamer interface After this patch and the following patches to use the new range_streamer interface, all the following cluster operations: - bootstrap - rebuild - decommission - removenode will use the same code to do the streaming. The range_streamer is now extended to support both fetch from and push to peer node. Another big change is now the range_streamer will stream less ranges at a time, so less data, per stream_plan and range_streamer will remember which ranges are failed to stream and can retry later. The retry policy is very simple at the moment it retries at most 5 times and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes .... Later, we can introduce api for user to decide when to stop retrying and the retry interval. The benefits: - All the cluster operation shares the same code to stream - We can know the operation progress, e.g., we can know total number of ranges need to be streamed and number of ranges finished in bootstrap, decommission and etc. - All the cluster operation can survive peer node down during the operation which usually takes long time to complete, e.g., when adding a new node, currently if any of the existing node which streams data to the new node had issue sending data to the new node, the whole bootstrap process will fail. After this patch, we can fix the problematic node and restart it, the joining node will retry streaming from the node again. - We can fail streaming early and timeout early and retry less because all the operations use stream can survive failure of a single stream_plan. It is not that important for now to have to make a single stream_plan successful. Note, another user of streaming, repair, is now using small stream_plan as well and can rerun the repair for the failed ranges too. This is one step closer to supporting the resumable add/remove node operations.	2017-08-07 16:31:47 +08:00
Avi Kivity	86de6cc7fb	Merge seastar upstream * seastar f14d2a3...7a49ae5 (8): > sharded: improve support for cooperating sharded<> services > sharded: support for peer services > semaphore: add a version of with_semaphore that takes a duration timeout > scripts: perftune.py: fix the CPU mask generation for more than 64 CPUs > Revert "future-utils: make when_all() (vector variant) exception safe" > Revert "future-utils: fix gross compilation errors in when_all()" > future-utils: fix gross compilation errors in when_all() > future-utils: make when_all() (vector variant) exception safe Includes change to batchlog_manager constructor to adapt it to seastar::sharded::start() change.	2017-08-06 17:47:47 +03:00
Avi Kivity	3edec66903	Revert "repair: Make send_repair_checksum_range timeout" This reverts commit `98757069a5`. We have the failure detector which will detect an unresponsive node and fail the RPC. Adding a timeout can just introduce false positives.	2017-08-06 13:09:36 +03:00
Avi Kivity	621926d914	dist: debian: escape "$" character for make	2017-08-05 16:51:03 +03:00
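
Commit `91dab1d998` above describes the zig-zag mapping applied to signed values before they are written as a `vint`. The following is a minimal C++ sketch of that mapping only, assuming 64-bit values. The names `zigzag_encode`, `zigzag_decode` and `zigzag_self_check` are hypothetical and are not taken from the Scylla source, and the variable-length byte layout of the `vint` itself is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interleave negative and non-negative values so that
// small magnitudes of either sign map to small unsigned values.
inline uint64_t zigzag_encode(int64_t n) {
    // All ones when n is negative, all zeros otherwise.
    const uint64_t sign_mask = n < 0 ? ~uint64_t{0} : uint64_t{0};
    // Shift in the unsigned domain to avoid signed overflow.
    return (static_cast<uint64_t>(n) << 1) ^ sign_mask;
}

// Hypothetical helper: inverse of zigzag_encode.
inline int64_t zigzag_decode(uint64_t z) {
    // The lowest bit carries the sign; expand it into a full-width mask.
    const uint64_t sign_mask = uint64_t{0} - (z & 1);
    return static_cast<int64_t>((z >> 1) ^ sign_mask);
}

// Checks the small values listed in the commit message.
inline void zigzag_self_check() {
    assert(zigzag_encode(0) == 0);
    assert(zigzag_encode(-1) == 1);
    assert(zigzag_encode(1) == 2);
    assert(zigzag_encode(-2) == 3);
    assert(zigzag_encode(2) == 4);
}
```

Because the sign ends up in the lowest bit, a value such as -1 needs no more bytes than 1 does once the result is written with a magnitude-dependent length.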

12879 Commits
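
Commit `509626fe08` above states that a duration is kept as three signed counters (months, days, nanoseconds) and that no semantic ordering exists because months differ in length. The sketch below illustrates that point under stated assumptions: the struct `duration_counters`, its field widths, and the helper `whole_days_from` are hypothetical and do not reproduce Scylla's `cql_duration`. It handles only non-negative month counts and ignores the nanosecond counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: three independent signed counters.
// The field widths are an assumption made for this sketch.
struct duration_counters {
    int32_t months;
    int32_t days;
    int64_t nanoseconds;
};

inline bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// month is 1..12
inline int days_in_month(int year, int month) {
    static const int lengths[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && is_leap_year(year)) ? 29 : lengths[month - 1];
}

// Whole days covered by d when it is attached to the first day of
// (year, month). Only non-negative d.months is handled here.
inline int64_t whole_days_from(const duration_counters& d, int year, int month) {
    int64_t total = d.days;
    for (int32_t i = 0; i < d.months; ++i) {
        total += days_in_month(year, month);
        if (++month > 12) {
            month = 1;
            ++year;
        }
    }
    return total;
}

// The same two durations compare differently depending on the reference date.
inline void duration_ordering_demo() {
    const duration_counters one_month{1, 0, 0};
    const duration_counters thirty_days{0, 30, 0};
    // Starting 2015-02-01, one month spans 28 days: shorter than 30 days.
    assert(whole_days_from(one_month, 2015, 2) < whole_days_from(thirty_days, 2015, 2));
    // Starting 2015-01-01, one month spans 31 days: longer than 30 days.
    assert(whole_days_from(one_month, 2015, 1) > whole_days_from(thirty_days, 2015, 1));
}
```

Since the comparison flips with the reference date, a duration only gains an absolute length once it is attached to a concrete date-time, which matches the restrictions the commit places on slice relations and primary keys.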