scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Vlad Zolotarov	6267bb63f4	tracing::tracing: move collectd metrics registration to metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Avi Kivity	1ff0eef0a8	intrusive_set_external_comparator: avoid using boost::intrusive::value_traits_pointers boost::intrusive::value_traits_pointers was introduced in boost 1.56, while we also support boost 1.55. Replace with an equivalent expression. (with additions by Asias) Message-Id: <20170110084700.19994-1-avi@scylladb.com>	2017-01-10 18:16:56 +02:00
Pekka Enberg	3d0217ec43	db/schema_tables: Fix system keyspace table list Commit `f0c28e1` ("db/schema_tables: Add schema_functions and schema_aggregates tables") forgot to add the newly added tables to the db::schema_tables::ALL list, which is used for authorization checks, for example. Fixes the following auth_test.py dtest failures: ('Unable to connect to any servers', {'127.0.0.1': Unauthorized('Error from server: code=2100 [Unauthorized] message="User cathy has no SELECT permission on <table system.schema_functions> or any of its parents"',)}) Message-Id: <1484045277-4997-1-git-send-email-penberg@scylladb.com>	2017-01-10 13:55:04 +01:00
Avi Kivity	0591303b72	Merge "avoid excessive memory usage during resharding" from Raphael "Intended to reduce memory usage when resharding by sharing sstable components among shards. File descriptors are also shared from now on, meaning that a much smaller number of file descriptors will be used during resharding. Fixes #1951." branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla * 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla: db: avoid excessive memory usage during resharding checked_file_impl: add support to dup sstables: group sstable components that can be shared among shards sstables: rename sstable member	2017-01-09 20:43:50 +02:00
Raphael S. Carvalho	68dfcf5256	db: avoid excessive memory usage during resharding After resharding, sstables may be owned by all shards, which means that file descriptors and memory usage for metadata will increase by a factor equal to the number of shards. That can easily lead to OOM. SSTable components are immutable, so they can be stored in one shard and shared with others that need them. We use the following formula to decide which shard will open the sstable and share it with the others: (generation % smp::count), which is the inverse of how we calculate generation for new sstables. So if no resharding is performed, everything is shard-local. With this approach, resource usage due to loaded sstables will be evenly distributed among shards. For this approach to work, we now only populate keyspaces from shard 0. Shard 0 is now solely responsible for iterating through column family dirs. In addition, most of the population functions are now free functions and take the distributed database object as a parameter. Fixes #1951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 15:24:36 -02:00
Raphael S. Carvalho	9200e389c2	checked_file_impl: add support to dup That's needed for sstable fd sharing to work. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 13:33:30 -02:00
Avi Kivity	77cb2b452f	Merge "CQL 3.3.1 support" from Pekka "This patch series adds support for CQL 3.3.1. The changes to CQL are listed here: https://github.com/apache/cassandra/blob/cassandra-2.2/doc/cql3/CQL.textile#changes The following CQL features are already supported by Scylla: - TRUNCATE TABLE alias - Double-dollar string literals - Aggregate functions: MIN, MAX, SUM, and AVG This series adds the following CQL features: - New data types: tinyint, smallint, date, and time - CQL binary protocol v4 (required by the new data types) - Advertise Cassandra 2.2.8 version from Scylla so that drivers correctly detect the presence of CQL 3.3.1 The following CQL features are not supported by Scylla: - Role-based access control (issue #1941) - JSON data type - User-defined functions (UDFs) - User-defined aggregates (UDAs) The following CQL binary protocol v4 changes are not implemented by this series: - Read_failure and Write_failure error codes are not implemented. These error codes are not used by the smart drivers, but as they are propagated to application code, we eventually need to wire them up to our storage proxy implementation. - Function_failure error code is only used by user-defined functions and the fromJson function, which are not implemented by Scylla. Fixes #1284." 
* 'penberg/cql-3.3.1/v5' of github.com:cloudius-systems/seastar-dev: version: Bump Cassandra version to 2.2.8 db/schema_tables: Add schema_functions and schema_aggregates tables tests/type_tests: TIME type test cases tests/cql_query_test: TIME type test cases cql3: TIME data type support tests/type_tests: DATE type test cases tests/cql_query_test: DATE type test cases cql3: DATE type support date.h: 64-bit year and days representation licenses: Add utils/date.h license utils/date.h: Import date and time library sources tests/type_tests: TINYINT and SMALLINT type test cases tests/cql_query_test: TINYINT and SMALLINT type test cases cql3: TINYINT and SMALLINT data type support types: Fix integer_type_impl::parse_int() for bytes	2017-01-09 11:54:45 +02:00
Avi Kivity	8f36dca6f1	storage_proxy: prevent short read due to buffer size limit from being swallowed during range scan mutation_result_merger::get() assumes that the merged result may be a short read if at least one of the partial results is a short read (in other words, if none of the partial results is a short read, then the merged result is also not a short read). However this is not true; because we update the memory accounter incrementally, we may stop scanning early. All the partial results are full; but we did not scan the entire range. Fix by changing the short_read variable initialization from `no` (which assumes we'll encounter a short read indication when processing one of the batches) to `this->short_read()`, which also takes into account the memory accounter. Fixes #2001. Message-Id: <20170108111315.17877-1-avi@scylladb.com>	2017-01-09 09:21:43 +00:00
Pekka Enberg	856d0e40fb	version: Bump Cassandra version to 2.2.8 Advertise Cassandra 2.2.8 version to the drivers: CQL 3.3.1 language version and CQL binary protocol version 4 support.	2017-01-09 10:42:21 +02:00
Pekka Enberg	f0c28e1b2d	db/schema_tables: Add schema_functions and schema_aggregates tables The 3.0.3 Java driver, for example, searches for the tables and fails when we advertise Cassandra 2.2 version from Scylla.	2017-01-09 10:42:21 +02:00
Pekka Enberg	10facd7db8	tests/type_tests: TIME type test cases	2017-01-09 10:42:21 +02:00
Pekka Enberg	a49ee9387e	tests/cql_query_test: TIME type test cases	2017-01-09 10:42:20 +02:00
Pekka Enberg	93e6592296	cql3: TIME data type support This adds support for the TIME data type introduced in CQL 3.3.1. Refs #1284	2017-01-09 10:42:20 +02:00
Pekka Enberg	9ceea7bbc4	tests/type_tests: DATE type test cases	2017-01-09 10:42:20 +02:00
Pekka Enberg	f0cbfb9e4f	tests/cql_query_test: DATE type test cases	2017-01-09 10:42:20 +02:00
Pekka Enberg	9def7db381	cql3: DATE type support This adds support for the DATE type introduced in CQL 3.3.1. Refs #1284	2017-01-09 10:42:20 +02:00
Pekka Enberg	f83503c09e	date.h: 64-bit year and days representation We need 64-bit year and days representation to support the boundary values of the CQL data type, which is implemented using Joda Time library's DateTime type.	2017-01-09 10:42:20 +02:00
Pekka Enberg	41df14f62d	licenses: Add utils/date.h license	2017-01-09 10:42:20 +02:00
Pekka Enberg	7f2fc6470c	utils/date.h: Import date and time library sources This patch imports the "date.h" date and time library based on the C++11 <chrono> header, which is proposed for standardization: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0355r1.html We need it to implement support for the CQL date type. Import repository https://github.com/HowardHinnant/date Import commit: commit 2935f80109b8cfc15eb1243afe35f7ec3530f971 Author: Howard Hinnant <howard.hinnant@gmail.com> Date: Sun Jan 1 15:02:08 2017 -0500 Have get_version check for the file named version first	2017-01-09 10:39:54 +02:00
Takuya ASADA	42c1e1e0e8	dist/common/systemd: run node-exporter.service as scylla user For security reasons, we should run node-exporter.service as the scylla user instead of root. Fixes #1968 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1483543419-16541-1-git-send-email-syuu@scylladb.com>	2017-01-09 09:51:47 +02:00
Paweł Dziepak	3339cced05	sstables: file_writer: make write() non-virtual No one overrides file_writer::write() so there is no reason to inhibit optimisations and cause the compiler to emit indirect calls. Message-Id: <20170104163618.26251-1-pdziepak@scylladb.com>	2017-01-09 09:47:37 +02:00
Takuya ASADA	5422a8e046	dist/ubuntu: generate Ubuntu/Debian revision correctly The Ubuntu Packaging Guide says that if there is no upstream package (meaning it is not ported from Debian), the revision should be "0ubuntu1", not "ubuntu1", which is what we currently use. On Debian, the Debian Policy Manual says it is conventional to restart the revision from 1 when the upstream version increases, so we should set it to "1". To do this in a single script, we generate the revision at build time. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1483498658-27491-1-git-send-email-syuu@scylladb.com>	2017-01-09 09:45:46 +02:00
Takuya ASADA	920683a882	dist/common/scripts: add scylla_cpuscaling_setup To setup cpu scaling governor to 'performance', add new script to do it on scylla_setup. Fixes #1895 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1483542216-12195-1-git-send-email-syuu@scylladb.com>	2017-01-09 09:44:41 +02:00
Avi Kivity	97ab0d9feb	build: track system header changes too Changes to boost headers should trigger a rebuild if they change.	2017-01-08 20:49:19 +02:00
Avi Kivity	85f4e16336	main: fix incorrect low memory warning A spurious division by smp::count warns that memory is low even when plenty is available. Fix by removing the division. Fix #2002. Message-Id: <20170108122216.27233-1-avi@scylladb.com> Tested-by: Benoît Canet <benoit@scylladb.com>	2017-01-08 15:14:36 +02:00
Amnon Heiman	8cd3d7445c	scylla_setup: remove the uuid file creation Scylla housekeeping can create a uuid file if it is missing. There is no longer a need to create one for it. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1483866553-13855-3-git-send-email-amnon@scylladb.com>	2017-01-08 14:11:04 +02:00
Amnon Heiman	32888fc0aa	scylla-housekeeping: Create a uuid file if one is missing This patch gets housekeeping to create a uuid file if a path to a uuid file is supplied but the file is missing. Because it imports the uuid lib, uuid parameters were renamed. Fixes #1987 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1483866553-13855-2-git-send-email-amnon@scylladb.com>	2017-01-08 14:11:03 +02:00
Gleb Natapov	9ed3346f98	main: fix error reporting about low memory Message-Id: <20170108112144.GT1829@scylladb.com>	2017-01-08 13:46:48 +02:00
Raphael S. Carvalho	eed2a7d065	sstables: group sstable components that can be shared among shards We intend to share immutable sstable components among shards to reduce excessive memory usage when resharding shared sstables. This change is about grouping those components into a structure, and using foreign ptr to make sure that the structure will be deleted by whichever shard created it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:19 -02:00
Raphael S. Carvalho	a492f8dfaf	sstables: rename sstable member Rename _components to _recognized_components because _components will be used to name a field with shareable components. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:17 -02:00
Avi Kivity	38b2fa27ad	Merge seastar upstream * seastar 1c8e389...240b0bf (15): > file/dup: don't decrease refcnt twice when file is explicitly closed > reactor: Add missing CentOS 7.2 dependency systemtap-sdt-devel > reactor: Cleaning the smp queue metrics when shuting down > metrics: metrics keep the value map while unregistering > change the reactor load metrics to utilization > Merge "ASan fiber switches" from Paweł > tls: Add missing credentials_builder::set_client_auth method > collectd: create metrics with the right format > io_queue: remove owner number from metric name > reactor: change the load metric name to load > Merge "reactor: stop using signals for task_quota timer" > metrics: Allow initializing the metric_group in its constructor > Update DPDK to 16.11 > Revert "rpc: Avoid using zero-copy interface of output_stream" > core::metrics_groups: add a clear() method	2017-01-06 16:34:51 +02:00
Vlad Zolotarov	492295eb7f	init: move supervisor_notify() out of main.cc Transform the supervisor_notify() and related functions into the "supervisor" class and place this class implementation in a separate .cc file. This is going to fix the compilation breakage of tests introduced by a commit `8014adc2a1` init: serialize the creation of system_traces KS objects Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1483663955-20096-1-git-send-email-vladz@scylladb.com>	2017-01-06 10:10:55 +00:00
Avi Kivity	be11b054e1	Merge "Reduce the size of mutation_partition" from Piotr "Reduce the size of mutation_partition by implementing intrusive set using bi::rbtree_algorithms directly and using tree nodes optimized for size. This will reduce the size of mutation_partition by: 24 bytes + <number of cql rows> * 8 bytes This should have a positive impact on performance because mutation_partitions are stored both in memtable and cache. Fixes #742." * 'haaawk/742' of github.com:cloudius-systems/seastar-dev: intrusive_set: rename size() to calculate_size() Make intrusive_set_external_comparator::_value_traits static Implement intrusive set using rbtree_algorithms mutation_partition: make apply_reversibly_intrusive_set nongeneric mutation_partition: take schema in find_row and clustered_row mutation_partition: Extract intrusive set logic to a class. mutation_partition: Replace value_comp with key_comp calls	2017-01-05 17:34:10 +02:00
Tomasz Grabiec	cd630fece6	db: Make system tables use the commitlog Before this patch system table writes were not writing to commit log because database::add_column_family() disables writes to commit log for the table which is added if _commitlog is not set at that time. Fix by initializing commit log before system tables are created. Fixes #1986. Fixes recent regression in batch_test.py:TestBatch.replay_after_schema_change_test after scylla-jmx was updated to not flush system tables on nodetool flush. Could cause system keyspace writes to be delayed for more than before under heavy write workload. Refs #1926. Message-Id: <1483618117-4535-1-git-send-email-tgrabiec@scylladb.com>	2017-01-05 14:53:51 +02:00
Avi Kivity	eb520e7352	storage_proxy: fix result ordering for parallel partition range scans During a range scan, we try to avoid sorting according to partition range when we can do so. This is when we scan fewer than smp::count shards -- each shard's range is strictly ordered with respect to the others. However, we use the wrong key for the sort -- we use the shard number. But if we started at shard s > 0 and wrapped around to shard 0, then shard 0's range will be after the range belonging to shard s, but will sort before it. Fix by storing the iteration order as the sort key. We use that when we know that shards do not overlap (shards < smp::count) and the index within the source partition range vector when they do. Fixes #1998. Message-Id: <20170105114253.17492-1-avi@scylladb.com>	2017-01-05 12:51:37 +01:00
Vlad Zolotarov	8014adc2a1	init: serialize the creation of system_traces KS objects Serialize the creation of the system_traces KS objects when they do not exist, that is, on the initial cluster boot. Avoid creating them in parallel on different cluster nodes in order to avoid issue #420. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1483552503-12873-3-git-send-email-vladz@scylladb.com>	2017-01-05 12:41:38 +01:00
Vlad Zolotarov	d3b8b67e66	service::storage_service: serialize the system_auth KS initialization Move the system_auth KS initialization to be before Node moves to the NORMAL state. This way we will serialize this code running on different Nodes and avoid hitting issue #420. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1483552503-12873-2-git-send-email-vladz@scylladb.com>	2017-01-05 12:36:06 +01:00
Piotr Jastrzebski	b159e08764	intrusive_set: rename size() to calculate_size() This hopefully will make it more apparent that the time complexity of this method is O(N) not O(1). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 12:21:43 +01:00
Piotr Jastrzebski	b47a296053	Make intrusive_set_external_comparator::_value_traits static _value_traits can be shared among all instances and there's no need to store it in every single one. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 12:21:10 +01:00
Avi Kivity	4667641f5f	result_memory_tracker: fix too-short short reads 1.6 truncates paged queries early to avoid overrunning server memory with too-large query results, but in the case of partition range queries, this terminates too early due to an uninitialized variable holding the maximum result size. This results in slow performance due to additional round trips. Fix by initializing the maximum result size from the result_memory_tracker running on the coordinating shard. Fixes #1995. Message-Id: <20170105103915.10633-1-avi@scylladb.com>	2017-01-05 10:51:55 +00:00
Piotr Jastrzebski	041b0a65ac	Implement intrusive set using rbtree_algorithms This new implementation takes less memory because it does not store comparator. It also uses tree nodes optimized for size. This means that instead of storing an enum field \|color\| they embed this information inside pointer to parent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:46:58 +01:00
Piotr Jastrzebski	a0c20f5c49	mutation_partition: make apply_reversibly_intrusive_set nongeneric apply_reversibly_intrusive_set is used only in one place and always with rows_type. There's no need for it to be generic. This will allow changing intrusive set implementation. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Piotr Jastrzebski	fe3c91db90	mutation_partition: Extract intrusive set logic to a class. It will make it easier to change the implementation of the intrusive set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Piotr Jastrzebski	da67ac7ae4	mutation_partition: Replace value_comp with key_comp calls This will reduce the size of bi::set API being used. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Pekka Enberg	0ea5652354	tests/type_tests: TINYINT and SMALLINT type test cases	2017-01-05 10:57:35 +02:00
Pekka Enberg	41e3327ebc	tests/cql_query_test: TINYINT and SMALLINT type test cases	2017-01-05 10:57:35 +02:00
Pekka Enberg	fcaa743e3d	cql3: TINYINT and SMALLINT data type support This adds support for the TINYINT and SMALLINT data types introduced in CQL 3.3.1. Refs #1284	2017-01-05 10:57:35 +02:00
Pekka Enberg	257fa541f1	types: Fix integer_type_impl::parse_int() for bytes The integer_type_impl::parse_int() function uses boost::lexical_cast() under the hood, which parses 8-bit numbers as characters. Fix the function to lexical cast to 64-bit integer and convert the result to integer_type_impl template type.	2017-01-05 10:57:35 +02:00
Nadav Har'El	45f19f2633	main: better error message on failing to start Prometheus Previously, if the Prometheus port (by default, 0.0.0.0:9180) could not be opened, the following message appeared in the log about 10 seconds into the run, and Scylla crashed. ERROR 2017-01-01 19:31:04,066 [shard 0] seastar - Exiting on unhandled exception: std::system_error (error system:98, Address already in use) The puzzled user would have no idea which address was already in use, why, or why Scylla stopped. In this patch, before the above message we get the much more informative message: ERROR 2017-01-01 19:58:19,080 [shard 0] init - Could not start Prometheus API server on 0.0.0.0:9180: std::system_error (error system:98, Address already in use) We continue to print the original message - and exit - in this case, under the assumption that it's better not to run the database while improperly configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170102121304.2060-1-nyh@scylladb.com>	2017-01-04 14:58:26 +02:00


11125 Commits