scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 20:57:00 +00:00

Author	SHA1	Message	Date
Nadav Har'El	e91c741ef5	secondary indexes: fail attempts to create a CUSTOM INDEX Cassandra supports a "CREATE CUSTOM INDEX" statement to create a secondary index with a custom implementation. The only custom implementation that Cassandra supports is SASI. But Scylla doesn't support this, or any other custom index implementation. If a CREATE CUSTOM INDEX statement is used, we shouldn't silently ignore the "CUSTOM" tag; we should generate an error. This patch also includes a regression test that "CREATE CUSTOM INDEX" statements with valid syntax fail (before this patch, they succeeded). Fixes #3977 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-2-nyh@scylladb.com> (cherry picked from commit `a0379209e6`)	2018-12-12 00:32:35 +00:00
Nadav Har'El	b18e9e115d	Fix typo in error message Interestingly, this typo was copied from the original Cassandra source code :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-1-nyh@scylladb.com> (cherry picked from commit `36db4fba23`)	2018-12-12 00:32:35 +00:00
Avi Kivity	0b86ab0d2a	build: build libdeflate with user selected C compiler If the user specified a C compiler, use it to build libdeflate. Fixes #3978. Message-Id: <20181211145604.14847-1-avi@scylladb.com> (cherry picked from commit `34a31a807d`)	2018-12-11 19:24:24 +02:00
Duarte Nunes	97cd9108d6	db/system_distributed_keyspace: Create the schema with min_timestamp Different nodes can concurrently create the distributed system keyspace on boot, before the "if not exists" clause can take effect. However, the resulting schema mutations will be different since different nodes use different timestamps. This patch forces the timestamps to be the same across all nodes, so we avoid some schema mismatches. This fixes a bug exposed by `ca5dfdf`, whereby the initialization of the distributed system keyspace is done before waiting for schema agreement. While waiting for schema agreement in storage_service::join_token_ring(), the node still hasn't joined the ring and schemas can't be pulled from it, so nodes can deadlock. A similar situation can happen between a seed node and a non-seed node, where the seed node progresses to a different "wait for schema agreement" barrier, but still can't make progress because it can't pull the schema from the non-seed node still trying to join the ring. Finally, it is assumed that changes to the schema of the current distributed system keyspace tables will be protected by a cluster feature and a subsequent schema synchronization, such that all nodes will be at a point where schemas can be transferred around. Fixes #3976 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181211113407.20075-1-duarte@scylladb.com> (cherry picked from commit `89ae3fbf11`)	2018-12-11 14:53:30 +00:00
Hagit Segev	f81fe96b0b	release: prepare for 3.0-rc2	2018-12-11 12:32:34 +02:00
Avi Kivity	91ce3a7957	sstables: fix overflow in clustering key blocks header bit access _ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too. Otherwise, a shift in the range 32-63 will produce wrong results. Fix by using a 64-bit mask. Found by Fedora 29's ubsan. Fixes #3973. Message-Id: <20181209120549.21371-1-avi@scylladb.com> (cherry picked from commit `7c7da0b462`)	2018-12-10 14:10:27 +02:00
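The mask-width bug fixed above can be illustrated with a short sketch. The helper `header_bit_set()` is hypothetical, and `header` stands in for a 64-bit bitmap such as `_ck_blocks_header`. The only point is that the mask must be 64 bits wide before the shift.

```cpp
#include <cassert>
#include <cstdint>

// Buggy form (shown only as a comment): the literal 1 is an int (32 bits on
// the usual platforms), so shifting it by 32..63 is undefined behaviour and
// the bit test gives wrong results. ubsan flags exactly this.
//   bool bad = header & (1 << i);

// Fixed form: build the mask as a 64-bit value before shifting.
bool header_bit_set(std::uint64_t header, unsigned i) {
    assert(i < 64);  // shifting a 64-bit value by 64 or more is also undefined
    return (header & (std::uint64_t(1) << i)) != 0;
}
```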
Takuya ASADA	af7e58f4c5	dist/offline_installer/redhat: fix missing dependencies The offline installer for Scylla 3.0 causes a dependency error on CentOS; add the missing packages. Fixes #3969 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181207020711.23055-1-syuu@scylladb.com> (cherry picked from commit `a2d0ebf4d9`)	2018-12-10 14:10:15 +02:00
Amos Kong	bd3373b511	scylla_setup: only ask for nic in interactive mode Currently scylla_setup still asks for the nic even if the nic is already specified on the command line. Fixes #3908 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com> (cherry picked from commit `09a3b11c2f`)	2018-12-09 19:26:34 +02:00
Gleb Natapov	4820130abe	storage_proxy: fix crash during write timeout callback invocation rh_entry address is captured inside timeout's callback lambda, so the structure should not be moved after it is created. Change the code to create rh_entry in-place instead of moving it into the map. Fixes #3972. Message-Id: <20181206164043.GN25283@scylladb.com> (cherry picked from commit `9fb79bf379`)	2018-12-09 15:25:52 +02:00
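The in-place construction fix above can be sketched as follows. This sketch assumes a node-based container such as `std::unordered_map`; the commit only says "the map". The `entry` type and `add_entry()` helper are hypothetical stand-ins for `rh_entry` and its insertion code. The object's callback captures `this`, so the object must never be moved after construction.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for rh_entry: the timeout callback captures `this`.
struct entry {
    int id;
    std::function<int()> on_timeout;
    explicit entry(int id_) : id(id_) {
        on_timeout = [this] { return id; };  // captures this object's address
    }
    // Forbid copies (and therefore implicit moves): a moved-from copy would
    // carry a lambda still pointing at the old address.
    entry(const entry&) = delete;
    entry& operator=(const entry&) = delete;
};

// Construct the entry directly inside the map node instead of building it
// elsewhere and moving it in. Node-based maps never relocate elements on
// rehash, so the captured address stays valid until the element is erased.
entry& add_entry(std::unordered_map<int, entry>& m, int key) {
    auto result = m.try_emplace(key, key);
    return result.first->second;
}
```

Deleting the copy constructor makes any accidental move a compile error rather than a runtime crash.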
Tomasz Grabiec	9b299241e5	Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir This patchset fixes several remaining issues found during thorough testing of SSTables 3.x statistics and enriches ~30 unit tests with statistics validation against Cassandra-generated golden copies. * https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1: sstables: Enforce estimated_partitions in generate_summary() to be always positive. sstables: Don't enforce default max_local_deletion_time value for 'mc' files. sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. sstables: Collect statistics when writing RT markers to SSTables 3.x. tests: Return sstable_assertions from validate_read() helper. tests: Introduce helper for validating stats metadata in SSTables 3.x tests. tests: Add stats metadata validation to test_write_static_row. tests: Add stats metadata validation to test_write_composite_partition_key. tests: Add stats metadata validation to test_write_composite_clustering_key. tests: Add stats metadata validation to test_write_wide_partitions. 
tests: Add stats metadata validation to write_ttled_row tests: Add stats metadata validation to write_ttled_column tests: Add stats metadata validation to write_deleted_column tests: Add stats metadata validation to write_deleted_row tests: Add stats metadata validation to write_collection_wide_update tests: Add stats metadata validation to write_collection_incremental_update tests: Add stats metadata validation to write_multiple_partitions tests: Add stats metadata validation to write_multiple_rows tests: Add stats metadata validation to write_missing_columns_large_set tests: Add stats metadata validation to write_different_types tests: Add stats metadata validation to write_empty_clustering_values tests: Add stats metadata validation to write_large_clustering_key tests: Add stats metadata validation to write_compact_table tests: Add stats metadata validation to write_user_defined_type_table tests: Add stats metadata validation to write_simple_range_tombstone tests: Add stats metadata validation to write_adjacent_range_tombstones tests: Add stats metadata validation to write_non_adjacent_range_tombstones tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows tests: Add stats metadata validation to write_range_tombstone_same_start_with_row tests: Add stats metadata validation to write_range_tombstone_same_end_with_row tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests. (cherry picked from commit `bb24d378b2`)	2018-12-08 14:08:46 +02:00
Avi Kivity	745a98e151	Merge "Fix deadlocking multishard readers" from Botond " Multishard combining readers, running concurrently, with limited concurrency and no timeout may deadlock, due to inactive shard readers sitting on permits. To avoid this we have to make sure that all shard readers belonging to a multishard combining reader that are not currently active can be evicted to free up their permits, ensuring that all readers can make progress. Making inactive shard readers evictable is the solution for this problem; however, the original series introducing this solution (`414b14a6bd`) did not go all the way and left some loose ends. These loose ends are tied up by this mini-series. Namely, two issues remained: * The last reader to reach EOS was not paused (made evictable). * Readers created/resumed as part of a read-ahead were not paused immediately after finishing the read-ahead. This series fixes both of these. Fixes: #3865 Tests: unit(release, debug) " * 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla: multishard_combining_reader: pause readers after reading ahead multishard_combining_reader: pause all EOS'd readers (cherry picked from commit `21b4b2b9a1`)	2018-12-08 14:08:46 +02:00
Avi Kivity	b9c99af18b	Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir " This patchset extends a number of existing tests to check SSTables statistics for 'mc' format and fixes an issue discovered with the help of one of the tests. Tests: unit {release} " * 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla: tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. tests: Run sstable_tombstone_histogram_test for all SSTables versions. tests: Run min_max_clustering_key_test on all SSTables versions. tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. tests: Run test_sstable_max_local_deletion_time on all SSTables versions. tests: Extend test checking tombstones histogram to cover all SSTables versions. sstables: Properly track row-level tombstones when writing SSTables 3.x. tests: Run min_max_clustering_key_test_2 for all SSTables versions. tests: Make reusable_sst() helper accept SSTables version parameter. (cherry picked from commit `f073ea5f87`)	2018-12-08 14:08:44 +02:00
Asias He	cded9c7ac7	gossip: Fix race in real_mark_alive and shutdown msg In dtest, we have self.check_rows_on_node(node1, 2000) self.check_rows_on_node(node2, 2000) which introduce the following cluster operations: 1) Initially: - node1 up - node2 up 2) self.check_rows_on_node(node1, 2000) - node2 down - node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots up to mark node1 up) 3) self.check_rows_on_node(node2, 2000) - node1 down (B: node1 will send shutdown gossip message to node2, node2 will mark node1 down) - node1 up (C: when node1 is up, node2 will call gossiper::real_mark_alive) Since there is no guarantee of the order of Operation A and Operation B, it is possible that node2 will mark node1 as status=shutdown and also mark node1 UP. In Operation C, node2 will call gossiper::real_mark_alive to mark node1 up, but since node2 might think node1 is already up, node2 will exit early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is now UP, status={}". As a result, dtest does not see node2 report that node1 is up when it boots node1, and fails the test. TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP'] In the log we can see node1 marked as DOWN and UP almost at the same time on node2: INFO 2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown INFO 2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown Fixes #3940 Tests: dtest with 20 consecutive successful runs Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com> (cherry picked from commit `eeeb2da7bb`)	2018-12-08 13:42:43 +02:00
Gleb Natapov	4acfc5ed8f	hints: make hints manager more resilient to unexpected directory content Currently if hints directory contains unexpected directories Scylla fails to start with unhandled std::invalid_argument exception. Make the manager ignore malformed files instead and try to proceed anyway. Message-Id: <20181121134618.29936-2-gleb@scylladb.com> (cherry picked from commit `b4a8802edc`)	2018-12-08 13:42:43 +02:00
Gleb Natapov	cb9199bc7f	hints: add auxiliary function for scanning high level hints directory We scan the hints directory in two places: to search for files to replay and to search for directories to remove after resharding. The code that translates a directory name to a shard is duplicated. It is simple now, so not a big issue, but in case it grows it is better to have it in one place. Message-Id: <20181121134618.29936-1-gleb@scylladb.com> (cherry picked from commit `9433d02624`)	2018-12-08 13:42:43 +02:00
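The two hints commits above work together: one moves name-to-shard translation into a single helper, and the other skips malformed entries instead of failing startup. A minimal sketch of that combination follows. The function names are hypothetical, and the listing is modeled as a plain vector of names. `std::from_chars` reports errors through a code, whereas a parser like `std::stoi` throws `std::invalid_argument`.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: translate a hints sub-directory name into a shard id.
// Returns nullopt for anything that is not a plain decimal number.
std::optional<unsigned> shard_from_dir_name(std::string_view name) {
    unsigned shard = 0;
    const char* end = name.data() + name.size();
    auto [ptr, ec] = std::from_chars(name.data(), end, shard);
    if (ec != std::errc() || ptr != end) {
        return std::nullopt;  // malformed entry: ignore it rather than throw
    }
    return shard;
}

// Single place that scans the top-level listing; both the replay path and
// the post-resharding cleanup path could share it.
std::vector<unsigned> scan_shard_dirs(const std::vector<std::string>& entries) {
    std::vector<unsigned> shards;
    for (const auto& name : entries) {
        if (auto shard = shard_from_dir_name(name)) {
            shards.push_back(*shard);
        }
    }
    return shards;
}
```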
Tomasz Grabiec	695ff5383f	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back (cherry picked from commit `b8c405c019`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	730e48bf60	configure.py: Always add a rule for building gen_crc_combine_table Fixes a build failure when only the scylla binary was selected for building like this: ./configure.py --with scylla In this case the rule for gen_crc_combine_table was missing, but it is needed to build crc_combine_table.o Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `edbef7400b`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	af6d4f40e1	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `9a4c00beb7`)	2018-12-08 13:42:43 +02:00
Avi Kivity	9d8507de09	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh (cherry picked from commit `b098b5b987`)	2018-12-08 13:42:43 +02:00
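The combine identity behind this merge can be checked with a deliberately naive sketch. It relies only on the CRC-32 register update being linear over GF(2). Under that property, crc32(A || B) equals crc32(A) advanced through len(B) zero bytes, XORed with crc32(B). The function names below are hypothetical, and the loop costs O(len(B)). The fast_crc32_combine() described above gets its speed from pre-computed polynomials, carry-less multiplication and Barrett reduction, none of which this sketch attempts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise reflected CRC-32 (polynomial 0xEDB88320), the zlib-compatible
// variant. Passing a previous result as `crc` continues the checksum.
std::uint32_t crc32_update(std::uint32_t crc, const unsigned char* p, std::size_t n) {
    crc = ~crc;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// crc32(A || B) == shift(crc32(A), len2) ^ crc32(B), where `shift` feeds
// len2 zero bytes through the raw register. The initial and final inversions
// cancel out because the register update is linear.
std::uint32_t crc32_combine_naive(std::uint32_t crc1, std::uint32_t crc2, std::size_t len2) {
    for (std::size_t i = 0; i < len2 * 8; ++i) {
        crc1 = (crc1 >> 1) ^ (0xEDB88320u & (0u - (crc1 & 1u)));
    }
    return crc1 ^ crc2;
}
```

For the standard check string "123456789", combining the CRCs of "1234" and "56789" with len2 = 5 yields 0xCBF43926, the same as checksumming the whole string. Continuing crc32_update() over the second buffer gives the same value without any combine step.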
Tomasz Grabiec	07c980845d	utils/crc: Add clmul_u32() implementation Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	c52b8239d0	configure.py: Compile against Westmere on x86 Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	5a07a4fac8	configure.py: Use armv8-a+crc+crypto ISA on aarch64 Needed for backporting dependent changes. Extracted from: commit `1c48e3fbec` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Mon Oct 29 02:58:19 2018 +0000 utils/crc: leverage arm64 crc extension	2018-12-08 13:42:43 +02:00
Avi Kivity	b9c046b17b	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). 
SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include (cherry picked from commit `5e759b0c07`)	2018-12-08 13:42:43 +02:00
Avi Kivity	979cb636b8	Update seastar submodule * seastar e64281d...1651a2a (1): > tests: perf: Make do_not_optimize() take the argument by const&	2018-12-08 13:42:43 +02:00
Botond Dénes	59cf9d9070	querier: fix evict_one() and evict_all_for_table() Both of these have the same problem. They remove the to-be-evicted entries from `_entries` but they don't unregister the `entry` from the `reader_concurrency_semaphore`. This results in the `reader_concurrency_semaphore` being left with a dangling pointer to the entries, which will trigger a segfault when it tries to evict the associated inactive reads. Also add a unit test for `evict_all_for_table()` to check that it works properly (`evict_one()` is only used in tests, so no dedicated test for it). Fixes: #3962 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com> (cherry picked from commit `77dbc7d09a`)	2018-12-06 11:38:44 +02:00
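The eviction bug above follows a general rule. If another component holds raw pointers to your entries, eviction must unregister each entry before destroying it. The sketch below is heavily simplified, and all its names are hypothetical. `registry` stands in for the concurrency semaphore's list of inactive reads, and `cache` stands in for the querier cache.

```cpp
#include <cassert>
#include <list>
#include <unordered_set>

// Hypothetical registry that remembers raw pointers so it can evict later.
struct registry {
    std::unordered_set<const void*> inactive;
    void register_inactive(const void* p) { inactive.insert(p); }
    void unregister_inactive(const void* p) { inactive.erase(p); }
};

struct cache {
    registry& reg;
    std::list<int> entries;  // stand-in for cache entries

    explicit cache(registry& r) : reg(r) {}

    void insert(int v) {
        entries.push_back(v);
        reg.register_inactive(&entries.back());
    }

    // Unregister first, then destroy. Clearing `entries` alone would leave
    // `reg` holding dangling pointers.
    void evict_all() {
        for (auto& e : entries) {
            reg.unregister_inactive(&e);
        }
        entries.clear();
    }
};
```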
Duarte Nunes	c9ec9d4087	Merge seastar upstream * seastar 880826e...e64281d (2): > core/semaphore: Change the access of semaphore_units main ctor > Merge "Add semaphore_units<>::split() function" from Duarte Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-05 20:25:17 +00:00
Gleb Natapov	2e8fefbc5a	storage_proxy: store hint for CL=ANY if all nodes replied with failure Current code assumes that request failed if all replicas replied with failure, but this is not true for CL=ANY requests. Take it into account. Fixed: #3565 (cherry picked from commit `17197fb005`)	2018-12-05 20:14:58 +00:00
Gleb Natapov	6be0635029	storage_proxy: complete write request early if all replicas replied with success or failure Currently if a write request reaches CL and all replicas replied, but some replied with failures, the request will wait for the timeout to be retired. Detect this case and retire the request immediately instead. Fixes #3566 (cherry picked from commit `d1d04eae3c`)	2018-12-05 20:14:57 +00:00
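The three storage_proxy changes above can be modeled together in one small state machine:
- ignore responses from unrecognized replicas;
- retire as soon as every replica has answered;
- do not report failure for CL=ANY.

The class `write_handler` and its members are hypothetical. The sketch abstracts away hint storage and timers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

class write_handler {
    std::set<std::string> _pending;  // replicas that have not answered yet
    std::size_t _block_for;          // acks needed to satisfy the CL
    bool _cl_any;
    std::size_t _acks = 0;
public:
    write_handler(std::set<std::string> replicas, std::size_t block_for, bool cl_any)
        : _pending(std::move(replicas)), _block_for(block_for), _cl_any(cl_any) {}

    // Responses from endpoints that are not (or no longer) awaited are ignored.
    bool on_response(const std::string& from, bool ok) {
        if (_pending.erase(from) == 0) return false;
        if (ok) ++_acks;
        return true;
    }
    bool cl_reached() const { return _acks >= _block_for; }
    // Once every replica has answered, with success or failure, nothing is
    // left to wait for, so there is no reason to wait for the timeout.
    bool can_retire() const { return _pending.empty(); }
    // With CL=ANY a stored hint stands in for the failed replicas, so the
    // write is not reported as failed even if every replica returned an error.
    bool failed() const { return can_retire() && !cl_reached() && !_cl_any; }
};
```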
Gleb Natapov	04a544c0a2	storage_proxy: check that write failure response comes from recognized replica Before accounting failure response we need to make sure it comes from a replica that participates in the request. (cherry picked from commit `76ab3d716b`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	028f9b95d1	storage_proxy: move code executed on write timeout into separate function Currently the callback is in lambda, but we will want to call the code not only during timer expiration. (cherry picked from commit `7bc68aa0eb`)	2018-12-05 20:14:57 +00:00
Avi Kivity	54258ca8eb	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare() (cherry picked from commit `1891779e64`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	c9a030f1f0	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. A non-zero value means that writes were lost after CL was reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com> (cherry picked from commit `207b57a892`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	1c7daef554	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com> (cherry picked from commit `319ece8180`)	2018-12-05 20:14:57 +00:00
Duarte Nunes	f8195a77b0	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com> (cherry picked from commit `6fbf792777`)	2018-12-05 19:20:36 +00:00
Duarte Nunes	5b724c80ab	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com> (cherry picked from commit `f3a5ec0fd9`)	2018-12-05 19:19:26 +00:00
Nadav Har'El	4a7ae81b3f	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view update to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic which counts the number of ongoing writes. In the case that the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `1d5f8d0015`)	2018-12-05 19:19:26 +00:00
Piotr Sarna	3cf26a60a2	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com> (cherry picked from commit `7b0a3fbf8a`)	2018-12-04 14:33:05 +00:00
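The loop change above can be sketched in a few lines. Here `std::atomic<bool>` stands in for seastar's abort_source, and the function and enum names are hypothetical. A real implementation would normally pause between checks. The point is that the wait loop now has a second exit.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

enum class wait_result { agreed, aborted };

// Poll for agreement, but give up as soon as an abort has been requested so
// that the service can stop cleanly during bootstrap.
wait_result wait_for_schema_agreement(const std::function<bool()>& has_agreement,
                                      const std::atomic<bool>& abort_requested) {
    while (!has_agreement()) {
        if (abort_requested.load()) {
            return wait_result::aborted;
        }
        // a real implementation would sleep or yield here between checks
    }
    return wait_result::agreed;
}
```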
Tomasz Grabiec	2103d0d52b	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in an unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `aa19f98d18`)	2018-12-04 14:30:19 +02:00
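The ordering fix above amounts to sorting before writing. In this sketch the enumerator names and numeric values of `metadata_type` are illustrative assumptions, not Scylla's actual definitions. The offsets are made-up values used only to check the order. `std::unordered_map` iteration order is unspecified, so the entries are copied out and sorted by key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative enum; the real type and its values may differ.
enum class metadata_type : std::uint32_t { validation = 0, compaction = 1, stats = 2, serialization = 3 };

using offset_entry = std::pair<metadata_type, std::uint32_t>;

// Copy the entries out of the unordered map and sort them by metadata type,
// giving a deterministic order for the Statistics.db offset map.
std::vector<offset_entry>
ordered_offsets(const std::unordered_map<metadata_type, std::uint32_t>& offsets) {
    std::vector<offset_entry> out(offsets.begin(), offsets.end());
    std::sort(out.begin(), out.end(),
              [](const offset_entry& a, const offset_entry& b) { return a.first < b.first; });
    return out;
}
```

Keeping the entries in a `std::map` from the start would give the same sorted order without the extra copy. The sketch keeps `std::unordered_map` only because the commit message names it.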
Avi Kivity	16ee3b3ebe	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in the performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them are running concurrently (thrashing). * May deadlock if enough of them are running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will now not cling onto their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrency_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, although considering that this series will have to be backported to 3.0, I think this present form is better. 
Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ... (cherry picked from commit `414b14a6bd`)	2018-12-04 12:13:13 +02:00
Duarte Nunes	b0a9c40ab1	service/storage_proxy: Consider target liveness in sent_to_endpoint() So we don't attempt to send mutations to unreachable endpoints and instead store a hint for them, we now check the endpoint status and populate dead_endpoints accordingly in storage_proxy::send_to_endpoint(). Fixes #3820 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181007100640.2182-1-duarte@scylladb.com> (cherry picked from commit `30d6ed8f92`)	2018-12-03 18:38:05 +00:00
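The liveness check above comes down to splitting targets before sending. Live endpoints get the mutation, and dead ones are collected so a hint can be stored instead. In this sketch `split_by_liveness()` and the `is_alive` predicate are hypothetical. The predicate stands in for whatever failure-detector query the real code uses.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct split_targets {
    std::vector<std::string> live;  // send the mutation to these
    std::vector<std::string> dead;  // store hints for these instead
};

split_targets split_by_liveness(const std::vector<std::string>& targets,
                                const std::function<bool(const std::string&)>& is_alive) {
    split_targets out;
    for (const auto& ep : targets) {
        (is_alive(ep) ? out.live : out.dead).push_back(ep);
    }
    return out;
}
```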
Duarte Nunes	53924e5c7f	service/storage_proxy: Fix formatting of send_to_endpoint() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181006204756.32232-1-duarte@scylladb.com> (cherry picked from commit `a69d468101`)	2018-12-03 18:37:59 +00:00
Avi Kivity	befe0012f5	Merge "Fix multiple summary regeneration bugs." from Vladimir " This patchset addresses two recently discovered bugs both triggered by summary regeneration: Tests: unit {release} + Validated with debug build of Scylla (ASAN) that no use-after-free occurs when re-generating Summary.db. " * 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla: tests: Add test reading SSTables in 'mc' format with missing summary. sstables: When loading, read statistics before summary. database: Capture io_priority_class by reference to avoid dangling ref. (cherry picked from commit `009cbd3dcb`)	2018-12-02 13:32:09 +02:00
Duarte Nunes	1953c5fa61	Merge 'Fix filtering with LIMIT' from Piotr " This series adds proper handling of filtering queries with LIMIT. Previously the limit was erroneously applied before filtering, which led to truncated results. To avoid that, paged filtering queries now use an enhanced pager, which remembers how many rows were dropped and uses that information to fetch more pages if the limit is not yet reached. For unpaged filtering queries, paging is done internally, as in the case of aggregations, to avoid keeping huge results in memory. Also, previously, all limited queries used the page size counted from max(page size, limit). It's not good for filtering, because with LIMIT 1 we would then query for rows one-by-one. To avoid that, filtered queries ask for the whole page and the results are truncated if need be afterwards. Tests: unit (release) " * 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla: tests: add filtering with LIMIT test tests: split filtering tests from cql_query_test cql3: add proper handling of filtering with LIMIT service/pager: use dropped_rows to adjust how many rows to read service/pager: virtualize max_rows_to_fetch function cql3: add counting dropped rows in filtering pager (cherry picked from commit `1afda28cf3`)	2018-12-02 12:07:46 +02:00
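The filtering-with-LIMIT behaviour described above can be modeled with integer rows. Here `fetch_page(start, n)` returns up to n rows from storage, and `keep` is the filter. All names are hypothetical, and the real pager tracks state across client pages rather than looping in one call. The limit applies to rows that survive filtering. Whole pages are requested regardless of the remaining limit, so LIMIT 1 does not turn into one-row fetches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

std::vector<int> filtered_query(
        const std::function<std::vector<int>(std::size_t, std::size_t)>& fetch_page,
        const std::function<bool(int)>& keep,
        std::size_t limit, std::size_t page_size) {
    std::vector<int> result;
    std::size_t offset = 0;
    while (result.size() < limit) {
        // Always ask for a full page; rows failing `keep` are "dropped rows",
        // which is why more pages may be needed to reach the limit.
        std::vector<int> page = fetch_page(offset, page_size);
        offset += page.size();
        for (int row : page) {
            if (keep(row) && result.size() < limit) {
                result.push_back(row);  // truncate to the limit afterwards
            }
        }
        if (page.size() < page_size) break;  // short page: no more data
    }
    return result;
}
```

Applying the limit before `keep` would reproduce the original bug. A LIMIT 3 query would then return at most the matching rows among the first three stored rows.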
Duarte Nunes	b72a94b53e	Merge 'Fix checking if system tables need view updates' from Piotr " This miniseries ensures that system tables are not checked for having view updates, because they never do. What's more, distributed system table is used in the process, so it's unsafe to query the table while streaming it. Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test) " * 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla: streaming: don't check view building of system tables database: add is_internal_keyspace streaming: remove unused sstable_is_staging bool class (cherry picked from commit `d09d4bbd91`)	2018-11-28 15:39:34 +00:00
Piotr Sarna	3f82b697f2	main: fix deinitialization order for view update generator View update generator should be stopped only after drain_on_shutdown() is performed on storage service. Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com> (cherry picked from commit `6ab8235369`)	2018-11-27 12:34:50 +00:00
Takuya ASADA	ee1ef853e5	dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants We already specify the correct repo for both Red Hat and Debian variants in -daily, but mistakenly don't for -restart, so do the same for -restart. Fixes #3906 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181109224509.27380-1-syuu@scylladb.com> (cherry picked from commit `7740cd2142`)	2018-11-27 09:59:05 +02:00
Raphael S. Carvalho	6e7e7f3822	sstables: deprecate sstable metadata's ancestors The reason for that is that it's not available in sstable format mc, so we can no longer rely on it in common code for the currently supported formats. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com> (cherry picked from commit `d29482dce8`)	2018-11-24 12:36:40 +02:00
Paweł Dziepak	82a36edc9d	Merge "Optimize sstable writing of the MC format" from Tomasz " Tested with perf_fast_forward from: github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1 Using the following command line: build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \ --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \ --datasets small-part The average reported flush throughput was (stdev for the averages is around 4k): - for mc before the series: 367848 frag/s - for lc before the series: 463458 frag/s (= mc.before +25%) - for mc after the series: 429276 frag/s (= mc.before +16%) - for lc after the series: 466495 frag/s (= mc.before +26%) Refs #3874. " * tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla: sstables: mc: Avoid serialization of promoted index when empty sstables: mc: Avoid double serialization of rows tests: sstable 3.x: Do not compare Statistics component utils: Introduce memory_data_sink schema: Optimize column count getters sstables: checksummed_file_data_sink_impl: Bypass output_stream (cherry picked from commit `4aa5d83590`)	2018-11-24 12:36:40 +02:00
Avi Kivity	d4efa3c9b2	Update seastar submodule * seastar d6647df...880826e (1): > fstream: Introduce make_file_data_sink()	2018-11-24 12:36:40 +02:00
Avi Kivity	324dae3e12	Merge "compress: Restore lz4 as default compressor" from Duarte " Enables sstable compression with LZ4 by default, which was the long-time behavior until a regression turned off compression by default. Fixes #3926 " * 'restore-default-compression/v2' of https://github.com/duarten/scylla: tests/cql_query_test: Assert default compression options compress: Restore lz4 as default compressor tests: Be explicit about absence of compression (cherry picked from commit `bb85a21a8f`)	2018-11-21 16:45:22 +02:00


16739 Commits