scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	730e48bf60	configure.py: Always add a rule for building gen_crc_combine_table Fixes a build failure when only the scylla binary was selected for building like this: ./configure.py --with scylla In this case the rule for gen_crc_combine_table was missing, but it is needed to build crc_combine_table.o Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `edbef7400b`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	af6d4f40e1	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `9a4c00beb7`)	2018-12-08 13:42:43 +02:00
Avi Kivity	9d8507de09	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh (cherry picked from commit `b098b5b987`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	07c980845d	utils/crc: Add clmul_u32() implementation Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	c52b8239d0	configure.py: Compile against Westmere on x86 Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	5a07a4fac8	configure.py: Use armv8-a+crc+crypto ISA on aarch64 Needed for backporting dependent changes. Extracted from: commit `1c48e3fbec` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Mon Oct 29 02:58:19 2018 +0000 utils/crc: leverage arm64 crc extension	2018-12-08 13:42:43 +02:00
Avi Kivity	b9c046b17b	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). 
SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include (cherry picked from commit `5e759b0c07`)	2018-12-08 13:42:43 +02:00
Avi Kivity	979cb636b8	Update seastar submodule * seastar e64281d...1651a2a (1): > tests: perf: Make do_not_optimize() take the argument by const&	2018-12-08 13:42:43 +02:00
Botond Dénes	59cf9d9070	querier: fix evict_one() and evict_all_for_table() Both of these have the same problem. They remove the to-be-evicted entries from `_entries` but they don't unregister the `entry` from the `reader_concurrency_semaphore`. This results in the `reader_concurrency_semaphore` being left with a dangling pointer to the entry, which will trigger a segfault when it tries to evict the associated inactive read. Also add a unit test for `evict_all_for_table()` to check that it works properly (`evict_one()` is only used in tests, so no dedicated test for it). Fixes: #3962 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com> (cherry picked from commit `77dbc7d09a`)	2018-12-06 11:38:44 +02:00
Duarte Nunes	c9ec9d4087	Merge seastar upstream * seastar 880826e...e64281d (2): > core/semaphore: Change the access of semaphore_units main ctor > Merge "Add semaphore_units<>::split() function" from Duarte Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-05 20:25:17 +00:00
Gleb Natapov	2e8fefbc5a	storage_proxy: store hint for CL=ANY if all nodes replied with failure Current code assumes that request failed if all replicas replied with failure, but this is not true for CL=ANY requests. Take it into account. Fixed: #3565 (cherry picked from commit `17197fb005`)	2018-12-05 20:14:58 +00:00
Gleb Natapov	6be0635029	storage_proxy: complete write request early if all replicas replied with success or failure Currently, if a write request reaches CL and all replicas replied, but some replied with failures, the request will wait for the timeout to be retired. Detect this case and retire the request immediately instead. Fixes #3566 (cherry picked from commit `d1d04eae3c`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	04a544c0a2	storage_proxy: check that write failure response comes from recognized replica Before accounting failure response we need to make sure it comes from a replica that participates in the request. (cherry picked from commit `76ab3d716b`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	028f9b95d1	storage_proxy: move code executed on write timeout into separate function Currently the callback is in lambda, but we will want to call the code not only during timer expiration. (cherry picked from commit `7bc68aa0eb`)	2018-12-05 20:14:57 +00:00
Avi Kivity	54258ca8eb	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare() (cherry picked from commit `1891779e64`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	c9a030f1f0	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. Non zero value means that writes were lost after CL is reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com> (cherry picked from commit `207b57a892`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	1c7daef554	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com> (cherry picked from commit `319ece8180`)	2018-12-05 20:14:57 +00:00
Duarte Nunes	f8195a77b0	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com> (cherry picked from commit `6fbf792777`)	2018-12-05 19:20:36 +00:00
Duarte Nunes	5b724c80ab	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com> (cherry picked from commit `f3a5ec0fd9`)	2018-12-05 19:19:26 +00:00
Nadav Har'El	4a7ae81b3f	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view update to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic which counts the number of ongoing writes. In the case that the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `1d5f8d0015`)	2018-12-05 19:19:26 +00:00
Piotr Sarna	3cf26a60a2	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com> (cherry picked from commit `7b0a3fbf8a`)	2018-12-04 14:33:05 +00:00
Tomasz Grabiec	2103d0d52b	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `aa19f98d18`)	2018-12-04 14:30:19 +02:00
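The ordering change described in the commit above can be illustrated with a minimal sketch. It is not the code from the patch: `sorted_offset_entries` and the plain integer keys are hypothetical stand-ins for the Statistics.db offset map and its `metadata_type` keys. The idea is only that entries taken from a `std::unordered_map` are copied out and sorted by key before being written.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the Scylla source): returns the map's
// (metadata type, offset) entries sorted by metadata type, so that the
// serialized order no longer depends on unordered_map iteration order.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
sorted_offset_entries(const std::unordered_map<std::uint32_t, std::uint32_t>& offsets) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> entries(offsets.begin(), offsets.end());
    // Sort by the key only; keys are unique, so the result is deterministic.
    std::sort(entries.begin(), entries.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return entries;
}
```

A writer would then iterate over the returned vector instead of over the map itself.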
Avi Kivity	16ee3b3ebe	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them are running concurrently (thrashing). * May deadlock if enough of them are running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will now not cling onto their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrency_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, although considering that this series will have to be backported to 3.0, I think this present form is better. 
Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ... (cherry picked from commit `414b14a6bd`)	2018-12-04 12:13:13 +02:00
Duarte Nunes	b0a9c40ab1	service/storage_proxy: Consider target liveness in sent_to_endpoint() So we don't attempt to send mutations to unreachable endpoints and instead store a hint for them, we now check the endpoint status and populate dead_endpoints accordingly in storage_proxy::send_to_endpoint(). Fixes #3820 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181007100640.2182-1-duarte@scylladb.com> (cherry picked from commit `30d6ed8f92`)	2018-12-03 18:38:05 +00:00
Duarte Nunes	53924e5c7f	service/storage_proxy: Fix formatting of send_to_endpoint() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181006204756.32232-1-duarte@scylladb.com> (cherry picked from commit `a69d468101`)	2018-12-03 18:37:59 +00:00
Avi Kivity	befe0012f5	Merge "Fix multiple summary regeneration bugs." from Vladimir " This patchset addresses two recently discovered bugs both triggered by summary regeneration: Tests: unit {release} + Validated with debug build of Scylla (ASAN) that no use-after-free occurs when re-generating Summary.db. " * 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla: tests: Add test reading SSTables in 'mc' format with missing summary. sstables: When loading, read statistics before summary. database: Capture io_priority_class by reference to avoid dangling ref. (cherry picked from commit `009cbd3dcb`)	2018-12-02 13:32:09 +02:00
Duarte Nunes	1953c5fa61	Merge 'Fix filtering with LIMIT' from Piotr " This series adds proper handling of filtering queries with LIMIT. Previously the limit was erroneously applied before filtering, which leads to truncated results. To avoid that, paged filtering queries now use an enhanced pager, which remembers how many rows dropped and uses that information to fetch for more pages if the limit is not yet reached. For unpaged filtering queries, paging is done internally as in case of aggregations to avoid returning keeping huge results in memory. Also, previously, all limited queries used the page size counted from max(page size, limit). It's not good for filtering, because with LIMIT 1 we would then query for rows one-by-one. To avoid that, filtered queries ask for the whole page and the results are truncated if need be afterwards. Tests: unit (release) " * 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla: tests: add filtering with LIMIT test tests: split filtering tests from cql_query_test cql3: add proper handling of filtering with LIMIT service/pager: use dropped_rows to adjust how many rows to read service/pager: virtualize max_rows_to_fetch function cql3: add counting dropped rows in filtering pager (cherry picked from commit `1afda28cf3`)	2018-12-02 12:07:46 +02:00
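The difference between applying LIMIT before filtering and after it, as described in the merge message above, can be sketched as follows. This is a simplified illustration over an in-memory vector, not the pager from the series: `limit_then_filter`, `filtered_limit`, the `int` rows and the predicate are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Behaviour the message calls erroneous: the limit truncates the input
// first, so rows dropped by the filter shrink the result below the limit.
std::vector<int> limit_then_filter(const std::vector<int>& rows, std::size_t limit,
                                   const std::function<bool(int)>& keep) {
    std::vector<int> out;
    const std::size_t n = std::min(limit, rows.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (keep(rows[i])) {
            out.push_back(rows[i]);
        }
    }
    return out;
}

// Sketch of the corrected idea: read whole pages, filter each page, and
// keep fetching further pages until the limit is met or the data runs out.
std::vector<int> filtered_limit(const std::vector<int>& rows, std::size_t page_size,
                                std::size_t limit, const std::function<bool(int)>& keep) {
    std::vector<int> out;
    if (page_size == 0) {
        return out;  // nothing can be fetched with an empty page
    }
    std::size_t pos = 0;
    while (out.size() < limit && pos < rows.size()) {
        const std::size_t end = std::min(rows.size(), pos + page_size);
        for (; pos < end && out.size() < limit; ++pos) {
            if (keep(rows[pos])) {
                out.push_back(rows[pos]);  // rows failing the filter are simply dropped
            }
        }
    }
    return out;
}
```

With rows 1 to 10, a filter keeping even values and a limit of 3, the first function yields only one row while the second yields three.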
Duarte Nunes	b72a94b53e	Merge 'Fix checking if system tables need view updates' from Piotr " This miniseries ensures that system tables are not checked for having view updates, because they never do. What's more, distributed system table is used in the process, so it's unsafe to query the table while streaming it. Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test) " * 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla: streaming: don't check view building of system tables database: add is_internal_keyspace streaming: remove unused sstable_is_staging bool class (cherry picked from commit `d09d4bbd91`)	2018-11-28 15:39:34 +00:00
Piotr Sarna	3f82b697f2	main: fix deinitialization order for view update generator View update generator should be stopped only after drain_on_shutdown() is performed on storage service. Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com> (cherry picked from commit `6ab8235369`)	2018-11-27 12:34:50 +00:00
Takuya ASADA	ee1ef853e5	dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants We do specify the correct repo for both Red Hat/Debian variants on -daily, but mistakenly don't for -restart, so do the same on -restart. Fixes #3906 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181109224509.27380-1-syuu@scylladb.com> (cherry picked from commit `7740cd2142`)	2018-11-27 09:59:05 +02:00
Raphael S. Carvalho	6e7e7f3822	sstables: deprecate sstable metadata's ancestors The reason for that is that it's not available in sstable format mc, so we can no longer rely on it in common code for the currently supported formats. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com> (cherry picked from commit `d29482dce8`)	2018-11-24 12:36:40 +02:00
Paweł Dziepak	82a36edc9d	Merge "Optimize sstable writing of the MC format" from Tomasz " Tested with perf_fast_forward from: github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1 Using the following command line: build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \ --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \ --datasets small-part The average reported flush throughput was (stdev for the avergages is around 4k): - for mc before the series: 367848 frag/s - for lc before the series: 463458 frag/s (= mc.before +25%) - for mc after the series: 429276 frag/s (= mc.before +16%) - for lc after the series: 466495 frag/s (= mc.before +26%) Refs #3874. " * tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla: sstables: mc: Avoid serialization of promoted index when empty sstables: mc: Avoid double serialization of rows tests: sstable 3.x: Do not compare Statistics component utils: Introduce memory_data_sink schema: Optimize column count getters sstables: checksummed_file_data_sink_impl: Bypass output_stream (cherry picked from commit `4aa5d83590`)	2018-11-24 12:36:40 +02:00
Avi Kivity	d4efa3c9b2	Update seastar submodule * seastar d6647df...880826e (1): > fstream: Introduce make_file_data_sink()	2018-11-24 12:36:40 +02:00
Avi Kivity	324dae3e12	Merge "compress: Restore lz4 as default compressor" from Duarte " Enables sstable compression with LZ4 by default, which was the long-time behavior until a regression turned off compression by default. Fixes #3926 " * 'restore-default-compression/v2' of https://github.com/duarten/scylla: tests/cql_query_test: Assert default compression options compress: Restore lz4 as default compressor tests: Be explicit about absence of compression (cherry picked from commit `bb85a21a8f`)	2018-11-21 16:45:22 +02:00
Tomasz Grabiec	c0ffc9a2b7	utils: phased_barrier: Make advance_and_await() have strong exception guarantees Currently, when advance_and_await() fails to allocate the new gate object, it will throw bad_alloc and leave the phased_barrier object in an invalid state. Calling advance_and_await() again on it will result in undefined behavior (typically SIGSEGV) because _gate will be disengaged. One place affected by this is table::seal_active_memtable(), which calls _flush_barrier.advance_and_await(). If this throws, subsequent flush attempts will SIGSEGV. This patch rearranges the code so that advance_and_await() has strong exception guarantees. Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com> Fixes #3931. (cherry picked from commit `57e25fa0f8`)	2018-11-21 12:17:27 +02:00
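The strong exception guarantee mentioned in the commit above can be sketched in isolation. The classes below are hypothetical and much simpler than the real `phased_barrier` (no futures, no waiting); they only show the reordering: allocate the replacement object first, and touch the member only once nothing can throw any more. The `fail_next_alloc` flag is a test hook invented for this sketch to simulate `bad_alloc`.

```cpp
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for the gate object owned by the barrier.
struct gate_sketch {
    int phase;
    explicit gate_sketch(int p) : phase(p) {}
};

class barrier_sketch {
    std::unique_ptr<gate_sketch> _gate = std::make_unique<gate_sketch>(0);

    // Allocation helper that can be told to fail once (test hook only).
    std::unique_ptr<gate_sketch> make_gate(int phase) {
        if (fail_next_alloc) {
            fail_next_alloc = false;
            throw std::bad_alloc();
        }
        return std::make_unique<gate_sketch>(phase);
    }

public:
    bool fail_next_alloc = false;

    // Weak variant: the old gate is moved out before the allocation, so a
    // throwing allocation leaves _gate disengaged.
    void advance_weak() {
        auto old = std::move(_gate);
        _gate = make_gate(old->phase + 1);
    }

    // Strong variant: allocate first; if that throws, _gate is untouched.
    void advance_strong() {
        auto next = make_gate(_gate->phase + 1);
        _gate = std::move(next);  // unique_ptr move assignment does not throw
    }

    bool engaged() const { return static_cast<bool>(_gate); }
    int phase() const { return _gate->phase; }
};
```

After a failed `advance_strong()` the object can be used again as if the call had never happened, which is the property the commit message asks for.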
Glauber Costa	f81fa5f75c	remove monitor if sstable write failed In (almost) all SSTable write paths, we need to inform the monitor that the write has failed as well. The monitor will remove the SSTable from controller's tracking at that point. Except there is one place where we are not doing that: streaming of big mutations. Streaming of big mutations is an interesting use case, in which it is done in 2 parts: if the writing of the SSTable fails right away, then we do the correct thing. But the SSTables are not commited at that point and the monitors are still kept around with the SSTables until a later time, when they are finally committed. Between those two points in time, it is possible that the streaming code will detect a failure and manually call fail_streaming_mutations(), which marks the SSTable for deletions. At that point we should propagate that information to the monitor as well, but we don't. Fixes #3732 (hopefully) Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181114213618.16789-1-glauber@scylladb.com> (cherry picked from commit `9f403334c8`)	2018-11-20 19:27:54 +02:00
Glauber Costa	6fd1cfcfce	sstables: correctly parse estimated histograms In commit `a33f0d6`, we changed the way we handle arrays during the write and parse code to avoid reactor stalls. Some potentially big loops were transformed into futurized loops, and also some calls to vector resizes were replaced by a reserve + push_back idiom. The latter broke parsing of the estimated histogram. The reason being that the vectors that are used here are already initialized internally by the estimated_histogram object. Therefore, when we push_back, we don't fill the array all the way from index 0, but end up with a zeroed beginning and only push back some of the elements we need. We could revert this array to a resize() call. After all, the reason we are using reserve + push_back is to avoid calling the constructor member for each element, but we don't really expect the integer specialization to do any of that. However, to avoid confusion with future developers that may feel tempted to convert this as well for the sake of consistency, it is safer to just make sure these arrays are zeroed. Fixes #3918 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181116130853.10473-1-glauber@scylladb.com> (cherry picked from commit `c6811bd877`)	2018-11-17 17:20:00 +02:00
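The reserve + push_back pitfall described in the commit above can be reproduced with a small sketch. `histogram_sketch` is a hypothetical stand-in for an object whose constructor already sizes its bucket vector; emptying the vector before appending is one possible remedy, assumed here for illustration, and the actual patch may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: the constructor pre-sizes the buckets with zeros.
struct histogram_sketch {
    std::vector<std::int64_t> buckets;
    explicit histogram_sketch(std::size_t n) : buckets(n, 0) {}
};

// Problematic pattern: push_back on an already-sized vector appends after
// the zeroed prefix instead of filling the vector from index 0.
std::vector<std::int64_t> parse_buggy(const std::vector<std::int64_t>& on_disk) {
    histogram_sketch h(on_disk.size());
    h.buckets.reserve(on_disk.size());
    for (auto v : on_disk) {
        h.buckets.push_back(v);
    }
    return h.buckets;
}

// One way to keep the reserve + push_back idiom (an assumption of this
// sketch): empty the pre-sized vector first.
std::vector<std::int64_t> parse_fixed(const std::vector<std::int64_t>& on_disk) {
    histogram_sketch h(on_disk.size());
    h.buckets.clear();
    h.buckets.reserve(on_disk.size());
    for (auto v : on_disk) {
        h.buckets.push_back(v);
    }
    return h.buckets;
}
```

For three stored values the first function returns six elements, three zeros followed by the data, while the second returns just the three values.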
Nadav Har'El	9d458ffea9	Materialized Views and Secondary Index: no longer experimental After this patch, the Materialized Views and Secondary Index features are considered generally-available and no longer require passing an explicit "--experimental=on" flag to Scylla. The "--experimental=on" flag and the db::config::check_experimental() function remain unused, as we graduated the only two features which used this flag. However, we leave the support for experimental features in the code, to make it easier to add new experimental features in the future. Another reason to leave the command-line parameter behind is so existing scripts that still use it will not break. Fixes #3917 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115144456.25518-1-nyh@scylladb.com> (cherry picked from commit `78ed7d6d0c`)	2018-11-15 19:50:30 +02:00
Duarte Nunes	9776a048e7	Merge 'Generating view updates during streaming' from Piotr During streaming, there are cases when we should invoke the view write path. In particular, if we're streaming because of repair or if a view has not yet finished building and we're bootstrapping a new node. The design constraints are: 1) The streamed writes should be visible to new writes, but the sstable should not participate in compaction, or we would lose the ability to exclude the streamed writes on a restart; 2) The streamed writes must not be considered when generating view updates for them; 3) Resilient to node restarts; 4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges. We achieve this by writing the streamed writes to an sstable in a different folder, call it "staging". We achieve 1) by publishing the sstable to the column family sstable set, but excluding it from compactions. We do these steps upon boot, by looking at the staging directory, thus achieving 3). 
Fixes #3275 * 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits) tests: add materialized views test tests: add view update generator to cql test env main: add registering staging sstables read from disk database: add a check if loaded sstable is already staging database: add get_staging_sstable method streaming: stream tables with views through staging sstables streaming: add system distributed keyspace ref to streaming streaming: add view update generator reference to streaming main: add generating missed mv updates from staging sstables storage_service: move initializing sys_dist_ks before bootstrap db/view: add view_update_from_staging_generator service db/view: add view updating consumer table: add stream_view_replica_updates table: split push_view_replica_updates table: add as_mutation_source_excluding table: move push_view_replica_updates to table.cc database: add populating tables with staging sstables database: add creating /staging directory for sstables database: add sstable-excluding reader table: add move_sstable_from_staging_in_thread function ... (cherry picked from commit `a38f6078fb`)	2018-11-15 17:46:20 +02:00
Asias He	10cf97375e	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com> (cherry picked from commit `7f826d3343`)	2018-11-15 17:45:31 +02:00
Paweł Dziepak	e6355a9a01	Merge "Write static rows for all partitions if there are static columns" from Vladimir " It appears that when there are any static columns in the serialization header, Cassandra writes a (possibly empty) static row to every partition in the SSTables file. This patchset aligns Scylla's logic with that of Cassandra. Note that Cassandra optimizes the case when no partition contains a static row because it keeps track of updated columns, which Scylla currently does not do - see #3901 for details. Fixes #3900. " * 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla: tests: Test writing empty static rows for partitions in tables with static columns. sstables: Ignore empty static rows on reading. sstables: Write empty static rows when there are static columns in the table. (cherry picked from commit `6469a1b451`)	2018-11-12 15:59:35 -08:00
Raphael S. Carvalho	e57907a1d5	sstables: fix procedure to get fully expired sstables with MC format MC format lacks ancestors metadata, so we need to workaround it by using ancestors in metadata collector, which is only available for a sstable written during this instance. It works fine here because we only want to know if a sstable recently compacted has an ancestor which wasn't yet deleted. Fixes #3852. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com> (cherry picked from commit `1c5934c934`)	2018-11-06 16:03:18 +02:00
Pekka Enberg	f94b46e7e0	docker: Switch to 3.0 RPM repository	2018-11-01 19:40:10 +02:00
Avi Kivity	6847c12668	Merge "dist: use perftune.py for disks tuning" from Vlad " Use perftune.py for tuning disks: - Distribute/pin disks' IRQs: - For NVMe drives: evenly among all present CPUs. - For non-NVMe drives: according to chosen tuning mode. - For all disks used by scylla: - Tune nomerges - Tune I/O scheduler. It's important to tune NIC and disks together in order to keep IRQ pinning in the same mode. Disk are detected and tuned based on the current content of /etc/scylla/scylla.yaml configuration file. " Fixes #3831. * 'use_perftune_for_disks-v3' of https://github.com/vladzcloudius/scylla: dist: change the sysconfig parameter name to reflect the new semantics scylla_util.py::sysconfig_parser: introduce has_option() dist: scylla_setup and scylla_sysconfig_setup: change paremeters names to reflect new semantics dist: don't distribute posix_net_conf.sh any more dist: use perftune.py to tune disks and NIC (cherry picked from commit `f170e3e589`)	2018-11-01 19:19:04 +02:00
Avi Kivity	80b86def1f	Update seastar submodule * seastar 0c8a2c8...d6647df (3): > scripts: perftune.py: properly merge parameters from the command line and the configuration file > scripts: perftune.py: prioritize I/O schedulers > Merge "scripts: perftune.py: support different I/O schedulers" from Vlad Ref #3831.	2018-11-01 19:18:07 +02:00
Vlad Zolotarov	c6de9ea39b	config: enable hinted handoff by default Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181019180401.12400-1-vladz@scylladb.com> (cherry picked from commit `4d1bb719a4`)	2018-11-01 10:41:44 +02:00
Avi Kivity	94bed81c1d	Update seastar submodule * seastar 39b89de...0c8a2c8 (1): > prometheus: Allow preemption between each metric See scylladb/seastar#469.	2018-10-31 19:21:21 +02:00
Hagit Segev	0f3a21f0bb	release: prepare for 3.0-rc1 scylla-3.0.rc1	2018-10-31 12:08:43 +02:00
Tomasz Grabiec	976db7e9e0	Merge "Proper support for static rows in SSTables 3.x" from Vladimir This patchset addresses two issues with static rows support in SSTables 3.x. ('mc' format): 1. Since collections are allowed in static rows, we need to check for complex deletion, set corresponding flag and write tombstones, if any. 2. Column indices need to be partitioned for static columns the same way they are partitioned for regular ones. * github.com/argenet/scylla.git projects/sstables-30/columns-proper-order-followup/v1: sstables: Partition static columns by atomicity when reading/writing SSTables 3.x. sstables: Use std::reference_wrapper<> instead of a helper structure. sstables: Check for complex deletion when writing static rows. tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns. tests: Add test covering inverleaved atomic and collection cells in static row. (cherry picked from commit `62c7685b0d`)	2018-10-30 14:51:21 +01:00
Nadav Har'El	996b86b804	Materialized views: fix race condition in resharding while view building When a node reshards (i.e., restarts with a different number of CPUs), and is in the middle of building a view for a pre-existing table, the view building needs to find the right token from which to start building on all shards. We ran the same code on all shards, hoping they would all make the same decision on which token to continue. But in some cases, one shard might make the decision, start building, and make progress - all before a second shard goes to make the decision, which will now be different. This resulted, in some rare cases, in the new materialized view missing a few rows when the build was interrupted with a resharding. The fix is to add the missing synchronization: All shards should make the same decision on whether and how to reshard - and only then should start building the view. Fixes #3890 Fixes #3452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181028140549.21200-1-nyh@scylladb.com> (cherry picked from commit `b8337f8c9d`)	2018-10-29 09:52:25 +00:00

16723 Commits