scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Piotr Jastrzebski	6528f3a963	Make sure mutation_reader for sstables can be fast-forwarded Fixes #2145. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> [tgrabiec: Extracted from a series, fixed title] Message-Id: <1495639745-19387-1-git-send-email-tgrabiec@scylladb.com>	2017-05-24 16:36:24 +01:00
Tomasz Grabiec	6cf2841654	mvcc: Extract partition_snapshot_reader to separate header Right now the whole world includes it transitively, which results in painful recompiles when the code changes. Relax dependencies. Message-Id: <1495620201-8046-1-git-send-email-tgrabiec@scylladb.com>	2017-05-24 12:13:15 +01:00
Asias He	f792c78c96	streaming: Do not abort session too early in idle detection Streaming usually takes a long time to complete. Aborting it on a false positive idle detection can be very wasteful. Increase the abort timeout from 10 minutes to a very large timeout, 300 minutes. A truly idle session will still be aborted eventually if other mechanisms (e.g., the streaming manager's gossip callbacks for the on_remove and on_restart events) do not abort it first. Fixes #2197 Message-Id: <57f81bfebfdc6f42164de5a84733097c001b394e.1494552921.git.asias@scylladb.com>	2017-05-24 12:29:50 +03:00
Paweł Dziepak	3b9c0a6ae2	Merge "loading_cache: fix the known complexity issue in the shrink() method" from Vlad Use the boost::intrusive containers in order to achieve a O(1) complexity for both "LRU list" update and to minimize the memory overhead in the hash table item to "LRU list" item connection: - Make the timestamped_val be both a bi::list and a bi::unordered_set item. - Make a bi::unordered_set be a cache backend instead of the std::unordered_map. As a result dropping k LRU items becomes an O(k) operation instead of O(N log N), where N is a total number of all cached items: - Every time a value is read - move it to the front of the "LRU list" (O(1)). - When we need to remove k LRU items: - Repeat k times: - Take an element from the back of the "LRU list". (O(1)). - Remove it from the bi::unordered_set and dispose. (O(1)). We use an auto-unlink configuration for bi::list, therefore disposing an item is going to auto unlink it from the list. * 'permissions_cache_move_to_intrusive-v1' of github.com:scylladb/seastar-dev: utils::loading_cache: cleanup utils/loading_cache.hh: use intrusive list to store the lru entry utils::loading_cache: implement automatic rehashing utils::loading_cache: make the underlying map to be an intrusive unordered_set	2017-05-23 16:18:16 +01:00
Tomasz Grabiec	cec8d7f38c	gdb: Fix error about gdb.Value not being convertible to int by %x format Message-Id: <1495538843-27777-1-git-send-email-tgrabiec@scylladb.com>	2017-05-23 15:38:58 +03:00
Avi Kivity	fd0e1eb1e2	Merge "Fixes for mutation algebra" from Tomasz "Enforces commutativity of addition: m1 + m2 == m2 + m1 and consistency of difference and addition with equality: m1 + (m2 - m1) == m1 + m2" * tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev: mutation: Make compare_*_for_merge() consistent with equals() tests: mutation: Improve assertion failure message tests: Use default equality in test_mutation_diff_with_random_generator mutation: Make counter cell difference consistent with apply tests: range_tombstone_list_test: Improve error message tests: range_tombstone_list: Check adjacent range merging range_tombstone_list: Merge adjacent range tombstones in apply() tests: mutation: Check commutativity of mutation addition range_tombstone_list: Avoid violating set invariant range_tombstone_list: Make tombstone merging commutative range_tombstone_list: Add erase() operation to the reverter range_tombstone_list: Make all undo operations ordered relative to each other utils: Extract to_boost_visitor() to a separate header allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-23 15:20:38 +03:00
Tomasz Grabiec	804f46f684	mutation: Make compare_*_for_merge() consistent with equals() equals() considers expiring cells to be different from non-expiring cells, but compare_row_marker_for_merge() considers them equal. Fix the latter to pick expiring cells. The choice was arbitrary.	2017-05-23 13:35:03 +02:00
Tomasz Grabiec	c1475a8eb2	tests: mutation: Improve assertion failure message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	d15880b3b7	tests: Use default equality in test_mutation_diff_with_random_generator	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	9dbae279ad	mutation: Make counter cell difference consistent with apply The case when both cells are dead was not handled properly, the diff was always empty, whereas the cell with higher timestamp should win. Caused test_mutation_diff_with_random_generator to fail.	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	951da421db	tests: range_tombstone_list_test: Improve error message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	bee40b4628	tests: range_tombstone_list: Check adjacent range merging	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	3c509308ab	range_tombstone_list: Merge adjacent range tombstones in apply() Needed for equivalence to work correctly with difference and addition: m1 + (m2 - m1) = m1 + m2 Fixes #2158.	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	ef4c7c458c	tests: mutation: Check commutativity of mutation addition	2017-05-23 12:11:12 +02:00
Tomasz Grabiec	1dea251ca2	range_tombstone_list: Avoid violating set invariant The code was inserting an entry with the same key as its successor, and only later adjusting the key of the old entry. This is violating set's invariant of unique keys, and insertion may cause rebalancing. I don't know if this violation actually causes problems currently, but it's safer not to. Fix by first updating the existing entry and then inserting the new one.	2017-05-23 12:11:12 +02:00
Tomasz Grabiec	a2a22e5f00	range_tombstone_list: Make tombstone merging commutative Example of non-commutative case: a = [1, 5]@t2 b = {[2, 3]@t1, [4, 5]@t1} a + b = [1, 5]@t2 b + a = [1, 4)@t2, [4, 5]@t2 After this patch, both will yield [1, 5]@t2. The patch also changes the logic so that overlaps of tombstones with equal timestamps are handled symmetrically. They are now merged instead of being split on either of the boundaries. Refs #2158.	2017-05-23 12:11:12 +02:00
Tomasz Grabiec	c4dac7c80f	range_tombstone_list: Add erase() operation to the reverter	2017-05-23 12:11:12 +02:00
Tomasz Grabiec	935709cddc	range_tombstone_list: Make all undo operations ordered relative to each other Later operation may depend on the result of previous operation. Same dependency is present when reverting the operations. Fixes assertion failure in update reverter.	2017-05-23 12:11:12 +02:00
Vlad Zolotarov	2d4d198fb9	utils::loading_cache: cleanup - Remove "_" at the beginning of the type names. - s/Pred/EqualPred/ Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:02:18 -04:00
Vlad Zolotarov	fd59a548c0	utils/loading_cache.hh: use intrusive list to store the lru entry Fix the shrink() O(n log n) complexity issue by constantly pushing the corresponding intrusive list entry to the head of the list every time the values are read. This will keep the list ordered by the last read time from the most recently read to the least recently read entry. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:00:18 -04:00
Vlad Zolotarov	0c4e9efce7	utils::loading_cache: implement automatic rehashing - Start the cache with 256 buckets - the minimum number of buckets. - Limit the maximal number of buckets by 1M buckets. - Keep the load factor between 0.25 and 1.0 as long as the number of buckets is between the minimum and the maximum values mentioned above. - Grow and shrink the hash every "refresh" period if needed. - Enable bi::power_2_buckets and bi::compare_hash bi::unordered_set options. - Enable bi::unordered_set_base_hook's bi::store_hash option. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 22:57:44 -04:00
Vlad Zolotarov	2be3596a4f	utils::loading_cache: make the underlying map to be an intrusive unordered_set Make the underlying map a boost::intrusive::unordered_set<timestamped_val> instead of std::unordered_map<Key, timestamped_val>. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 18:45:13 -04:00
Tomasz Grabiec	5aeb9eb70c	utils: Extract to_boost_visitor() to a separate header	2017-05-22 19:30:02 +02:00
Tomasz Grabiec	69e2eccf68	allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-22 19:30:02 +02:00
Raphael S. Carvalho	4b4a1883aa	refresh: do not use default priority for loading new sstables Metadata is read using default priority class, which can significantly slow down the process under high load. Compaction class can be used, and if it turns out to be a problem, we can switch to a special class for it. Fixes #1859. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170517184546.17497-1-raphaelsc@scylladb.com>	2017-05-22 19:03:17 +03:00
Avi Kivity	ef428d008c	Merge "reduce memory requirement for loading sstables" from Raphael "fixes a problem in which memory requirement for loading in-memory components of sstables is very high due to unlimited parallelism." * 'mem_requirement_sstable_load_v2_2' of github.com:raphaelsc/scylla: database: fix indentation of distributed_loader::open_sstable database: reduce memory requirement to load sstables sstables: loads components for a sstable in parallel sstables: enable read ahead for read of in-memory components sstables: make random_access_reader work with read ahead	2017-05-22 18:23:03 +03:00
Raphael S. Carvalho	28206993a4	database: fix indentation of distributed_loader::open_sstable Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-05-22 11:52:52 -03:00
Raphael S. Carvalho	a4e414cb3b	database: reduce memory requirement to load sstables SSTable load temporarily uses more space than needed to store metadata, due to: 1) All components are read using read_simple(), which uses a 128k buffer. file::dma_read_bulk() will allocate 128k, and may potentially allocate another big buffer (128k - read) for file::read_maybe_eof(). 2) read_filter() may use double the space it needs to. Because sstable loading parallelism is unlimited, Scylla may require much more memory to load all sstables, and that may lead to OOM. The higher the number of sstables, the higher the memory overhead. To confirm this problem, I wrote a test[1] which loads 30k sstables in parallel and reports the memory usage peak at the end. When loading 30k sstables, each with ~300KB of metadata, the memory usage peak was ~18GB. When loading completed, only ~9GB were needed to store all the metadata. [1]: https://gist.github.com/raphaelsc/2db37b4fb34301833ab9eeed3b1a524d To fix this problem, we need to set a limit on load parallelism (let's start with a small number like 3 and adjust later if needed) and rely on readahead so that the requirement drops considerably without increasing boot time. Actually, boot time is improved by it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2017-05-22 11:52:51 -03:00
Raphael S. Carvalho	043fae2ef5	sstables: loads components for a sstable in parallel Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2017-05-22 11:52:49 -03:00
Raphael S. Carvalho	0ac729fd57	sstables: enable read ahead for read of in-memory components Read ahead 4 is used. Let's adjust it later if needed. File size is used to prevent file_input_stream from issuing useless reads beyond file size with read ahead enabled. We can switch to variant without length once file_input_stream handles it properly. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-05-22 11:52:37 -03:00
Raphael S. Carvalho	77b8870cf3	sstables: make random_access_reader work with read ahead Scylla crashes if read ahead is enabled by file_random_access_reader because a call to seek() destroys the existing input stream without closing it first. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-05-22 11:52:33 -03:00
Duarte Nunes	6ac73b57fb	cql3/statements/select_statement: Remove dead code Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170522100230.17393-1-duarte@scylladb.com>	2017-05-22 14:32:12 +03:00
Avi Kivity	5828ddcca4	Merge seastar upstream * seastar 4af898c...68dbf60 (4): > dpdk: follow namespace changes to fix compile error > perftune.py: fix regression introduced in df5f74ac > doc: typo in README.md > posix_net: load-balance connections	2017-05-22 12:39:48 +03:00
Asias He	b56ba02335	gossip: Make bootstrap more robust The bootstrapping node will be a gossip-only member until the streaming finishes and the node reaches the NORMAL state. If during this time the bootstrapping node is overwhelmed with streaming, it is possible that the node will delay updating the gossip heartbeat. Be forgiving to the bootstrapping node and do not remove it from gossip too fast. Otherwise, streaming rpc verbs will not be resent because the node is not in gossip membership anymore. Fixes #2150 Message-Id: <286d7035d854f2a48abf4e1e2e3bfcb8b22b9ca2.1494553580.git.asias@scylladb.com>	2017-05-21 19:25:40 +03:00
Takuya ASADA	7777b558c4	dist/redhat: Use mock for CentOS/RHEL rpms Enable mock for CentOS/RHEL, also support cross building by mock. Fixes #630 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20170513171200.14926-1-syuu@scylladb.com>	2017-05-21 19:22:54 +03:00
Avi Kivity	2f23648b9e	Revert "dist: add conflict with Cassandra" This reverts commit `da55aecca3`. Instead of an install-time conflict, we'll add a run-time conflict.	2017-05-21 18:37:59 +03:00
Alexys Jacob	c8116b4252	scylla_raid_setup: fix typo on print_usage Simple typo fix on the usage message output, the script name was not correct. Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20170519145851.6205-1-ultrabug@gentoo.org>	2017-05-21 18:01:28 +03:00
Avi Kivity	5b182537db	Merge seastar upstream * seastar 8aef5f5...4af898c (4): > memory: fix debug build > tests: fix slab_test build > xen: fix fallouts from seastar namespace change > build: make swagger generated files depend on the code generator	2017-05-21 13:48:24 +03:00
Alexys Jacob	8dbad4f34a	scylla_sysconfig_setup: fix typo on print_usage Simple typo fix on the usage message output, the script name was not correct. Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20170519143227.2741-1-ultrabug@gentoo.org>	2017-05-21 13:41:43 +03:00
Alexys Jacob	c0756d97b8	scylla_setup: fix typos on cpu scaling messages This fixes typos on CPU scaling related messages. Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20170519143703.3574-1-ultrabug@gentoo.org>	2017-05-21 13:41:42 +03:00
Glauber Costa	5f99158889	api: return correct values for bloom filter statistics We are currently suspecting that the bloom filter false positive ratio is not being respected. While trying to debug that, I found out that we have a more basic problem: The numbers are all meaningless, because the stats are wrong. We are accumulating by summing the ratios together. It's easy to see how this doesn't work, if we look at an example where the ratio for some CFs is zero: SST1: false = 1, total = 2. ratio = 0.5 SST2: false = 0, total = 98 . ratio = 0. The real ratio in this example is 1 / (98 + 2) = 1 %, but the displayed ratio will be 0.5 + 0 = 0.5. This patch will map reduce all the sstables together keeping both numerator and denominator, yielding the right value at the end. To do that, we'll reuse the existing ratio_holder class, which already does exactly what we want. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20170518222333.16307-1-glauber@scylladb.com>	2017-05-21 13:11:22 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Avi Kivity	dab2783b58	Merge seastar upstream * seastar 45b718b...f726938 (2): > memory: add --mbind option to supress warning message when running Seastar apps on container > Add support for Gentoo Linux irqbalance configuration detection.	2017-05-20 21:15:46 +03:00
Avi Kivity	c8cb3d6ff5	Merge "Materialized views: bug fixes and unit tests" from Duarte "This series fixes bugs related to materialized views, most pertaining to column filtering in the where clause." * 'materialized-views/bug-fixes/v1' of https://github.com/duarten/scylla: tests/view_schema_test: Add more test cases tests/cql_assertions: Add assertion for row set equality single_column_relation: Correctly print IN relation statement_restrictions: Allow filtering regular columns for views statement_restrictions: Relax clustering restrictions for views statement_restrictions: Relax partition restrictions for views cql3/statements: Prevent setting default ttl on view cql3/restrictions: Complete implementation of is_satisfied_by() db/view: Re-implement clustering_prefix_matches() db/view: Re-implement partition_key_matches() db/view: Generate regular tombstone for base deletions db/view: Consider cell liveness when generating updates db/view: Don't generate view updates for static rows	2017-05-20 13:52:56 +03:00
Tomasz Grabiec	cd4d15672b	utils: estimated_histogram: Fix clear() It was a no-op. It doesn't seem currently used, but I will have a use for it soon. Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>	2017-05-19 14:34:34 +01:00
Paweł Dziepak	c560cf9d9d	Merge "fixes and improvements in the permissions cache implementation" from Vlad "There are numerous issues in the current implementation of permissions cache starting from the logical errors and bugs and ending with the suboptimal implementation described in the issue #2262." * 'permissions_cache_fixes-v4' of github.com:scylladb/seastar-dev: utils::loading_cache: avoid the reads storm when the key is not in the cache utils::loading_cache: cleanup utils::loading_cache: align the constrains in the constructor with the parameters description utils::loading_cache: refresh in the background auth::auth: add operator<<() for a permission_cache key auth::auth::permissions_cache: use the values from the configuration - don't try to be smart db::config: define a saner default value for permissions_validity_in_ms	2017-05-18 13:33:05 +01:00
Vlad Zolotarov	6a63c87a9f	utils::loading_cache: avoid the reads storm when the key is not in the cache Use a mutex to serialize producers when the key is not present in the cache. Fixes #2262 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-18 07:55:48 -04:00
Tomasz Grabiec	3fc1703ccf	range: Fix SFINAE rule for picking the best do_lower_bound()/do_upper_bound() overload mutation_partition has a slicing constructor which is supposed to copy only the rows from the query range. The rows are located using nonwrapping_range::lower_bound() and nonwrapping_range::upper_bound(). Each of those has two different implementations chosen with SFINAE. One uses std::lower_bound(), and the other uses the container's built-in lower_bound(), should it exist. We're using an intrusive tree in mutation_partition, so the container's lower_bound() is preferred. It's O(log N) whereas std::lower_bound() is O(N), because the tree's iterator is not random access. However, the current rule for picking the container's lower_bound() never triggers, because lower_bound() has two overloads in the container: ./range.hh:618:14: error: decltype cannot resolve address of overloaded function typename = decltype(&std::remove_reference<Range>::type::upper_bound)> ^~~~~~~~ As a result, the overload which uses std::lower_bound() is used. Spotted when running perf_fast_forward with the wide partition limit in cache lifted. It was so slow that I gave up waiting for the result (> 16 min). Fixes #2395. Message-Id: <1495048614-9913-1-git-send-email-tgrabiec@scylladb.com>	2017-05-18 13:28:10 +03:00
Avi Kivity	ba31619594	tests: fix partitioner_test for g++ 5 It can't make the leap from dht::ring_position to stdx::optional<range_bound<dht::ring_position>> for some reason.	2017-05-18 13:09:41 +03:00
Pekka Enberg	30b5933db2	Merge "Add Gentoo Linux support to utility and setup scripts" from Alexys "These patches add support to set up and operate ScyllaDB on Gentoo Linux. * scylla_setup and related scripts * node_health_check I have kept them as simple as possible and tested them by successfully setting up and operating a three-node cluster running on Gentoo Linux." * 'gentoo_linux_support' of github.com:ultrabug/scylla: scylla_setup: add gentoo linux installation detection prometheus node_exporter install: add support for gentoo linux raid setup: add support for gentoo linux ntp setup: add support for gentoo linux kernel check: add support for gentoo linux cpuscaling setup: add support for gentoo linux coredump setup: add support for gentoo linux detect gentoo linux on selinux setup add gentoo_variant detection and SYSCONFIG setup	2017-05-18 09:41:13 +03:00
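
The arithmetic in commit 5f99158889 (bloom filter statistics) can be checked with a small sketch. This is Python rather than the project's C++, and the `RatioHolder` class below is a hypothetical stand-in written for illustration; it is not the `ratio_holder` class from the Scylla tree. It only shows why summing per-sstable ratios gives 0.5 while carrying numerator and denominator through the reduction gives 1%.

```python
from dataclasses import dataclass


@dataclass
class RatioHolder:
    """Hypothetical accumulator that keeps numerator and denominator apart."""
    num: int = 0
    den: int = 0

    def add(self, other: "RatioHolder") -> "RatioHolder":
        # Combine two partial results without dividing yet.
        return RatioHolder(self.num + other.num, self.den + other.den)

    def ratio(self) -> float:
        return self.num / self.den if self.den else 0.0


# The two sstables from the commit message: (false positives, total lookups).
sstables = [RatioHolder(1, 2), RatioHolder(0, 98)]

# Old behaviour: add the per-sstable ratios together.
summed_ratios = sum(s.ratio() for s in sstables)

# Fixed behaviour: reduce numerators and denominators, divide once at the end.
total = RatioHolder()
for s in sstables:
    total = total.add(s)
aggregated = total.ratio()

print(summed_ratios, aggregated)
```

With the numbers from the message, the first value is 0.5 and the second is 1 / 100.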
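
The LRU scheme described in merge 3b9c0a6ae2 (move an entry to the front on every read, evict k entries from the back) can be sketched as follows. This is an illustration only: it swaps in Python's `collections.OrderedDict` for the `boost::intrusive` list and unordered_set used by the patches, and the `LruCache` class and its method names are assumptions made for the example.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache: front = most recently read, back = least recently read."""

    def __init__(self):
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key, last=False)  # O(1) move to the front

    def get(self, key):
        value = self._items[key]
        self._items.move_to_end(key, last=False)  # a read refreshes the entry
        return value

    def shrink(self, k):
        """Drop up to k least recently read entries, O(1) per entry."""
        evicted = []
        for _ in range(min(k, len(self._items))):
            key, _value = self._items.popitem(last=True)  # take from the back
            evicted.append(key)
        return evicted

    def keys(self):
        return list(self._items)


cache = LruCache()
for name in ("a", "b", "c"):
    cache.put(name, name.upper())
cache.get("a")              # "a" becomes the most recently read entry
evicted = cache.shrink(2)   # evicts the two least recently read entries
print(evicted, cache.keys())
```

Because the order is maintained on every read, eviction never needs to sort the entries, which is the point of the O(k) claim in the merge message.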
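
The bucket-count policy listed in commit 0c4e9efce7 (256 buckets minimum, 1M maximum, load factor kept between 0.25 and 1.0, power-of-two sizes) might look roughly like the sketch below. Several details are assumptions not stated in the commit message: that "1M" means 2**20, that the table doubles or halves by one step per refresh period, and the function name `next_bucket_count`.

```python
MIN_BUCKETS = 256
MAX_BUCKETS = 1 << 20  # assumption: "1M buckets" read as 2**20


def next_bucket_count(buckets: int, items: int) -> int:
    """Return the bucket count to use after one refresh period.

    Assumed policy: double when the load factor exceeds 1.0, halve when it
    drops below 0.25, and never leave the [MIN_BUCKETS, MAX_BUCKETS] range.
    Doubling and halving keep the count a power of two.
    """
    load = items / buckets
    if load > 1.0 and buckets < MAX_BUCKETS:
        return buckets * 2
    if load < 0.25 and buckets > MIN_BUCKETS:
        return buckets // 2
    return buckets


for buckets, items in [(256, 300), (256, 10), (1024, 100), (512, 512)]:
    print(buckets, items, "->", next_bucket_count(buckets, items))
```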
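
Commit a4e414cb3b caps sstable load parallelism at a small number (3 to start with) so that temporary buffers for only a few sstables are alive at once. A minimal sketch of that idea, using Python's `asyncio.Semaphore` instead of Seastar primitives; `load_all`, `FakeLoader` and the sstable names are hypothetical and exist only for this example.

```python
import asyncio

LOAD_PARALLELISM = 3  # the starting value suggested in the commit message


async def load_all(names, load_one, limit=LOAD_PARALLELISM):
    """Run load_one(name) for every name, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(name):
        async with sem:  # wait here while `limit` loads are already running
            return await load_one(name)

    return await asyncio.gather(*(guarded(n) for n in names))


class FakeLoader:
    """Hypothetical loader that records how many loads overlap."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, name):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0)  # yield, as real I/O would
        self.in_flight -= 1
        return name.upper()


loader = FakeLoader()
names = [f"sstable-{i}" for i in range(10)]
results = asyncio.run(load_all(names, loader))
print(loader.peak, len(results))
```

The recorded peak never exceeds the limit, while all ten loads still complete and their results come back in input order.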
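
The two properties enforced by merge fd0e1eb1e2, m1 + m2 == m2 + m1 and m1 + (m2 - m1) == m1 + m2, can be illustrated on a deliberately tiny model. This is not Scylla's mutation type: here a "mutation" is just a dict from a key to a `(timestamp, value)` pair, the larger pair wins on merge, and `add` and `diff` are hypothetical helpers.

```python
def add(m1, m2):
    """Merge two toy mutations; for each key the larger (timestamp, value) wins."""
    out = dict(m1)
    for key, cell in m2.items():
        if key not in out or cell > out[key]:
            out[key] = cell
    return out


def diff(m2, m1):
    """The part of m2 that would change m1 if added to it (m2 - m1)."""
    return {k: c for k, c in m2.items() if k not in m1 or c > m1[k]}


m1 = {"a": (1, "x"), "b": (5, "y")}
m2 = {"a": (2, "z"), "b": (3, "w"), "c": (1, "v")}

commutative = add(m1, m2) == add(m2, m1)
consistent = add(m1, diff(m2, m1)) == add(m1, m2)
print(commutative, consistent, diff(m2, m1))
```

Comparing whole `(timestamp, value)` pairs gives a total order, so ties on timestamp are broken the same way regardless of operand order. The range tombstone and row marker fixes in the series address the analogous requirement for the real data structures.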


12103 Commits