scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 14:33:08 +00:00

Author	SHA1	Message	Date
Botond Dénes	c899191ad5	reader_concurrency_semaphore: use the correct types in the constructor Previously there was a type mismatch for `count` and `memory`, between the actual type used to store them in the class (signed) and the type of the parameters in the constructor (unsigned). Although negative numbers are completely valid for these members, initializing them to negative numbers doesn't make sense, which is why the constructor used unsigned types. This restriction can backfire, however, when someone intends to give these parameters the maximum possible value, which, when interpreted as a signed value, will be `-1`. What's worse, the caller might not even be aware of this unsigned->signed conversion and be very surprised when they find out. So to prevent surprises, expose the real type of these members, trusting the clients to know what they are doing. Also add a `no_limits` constructor, so clients don't have to make sure they don't overflow internal types. (cherry picked from commit `e1d8237e6b`)	2018-12-18 14:34:33 +02:00
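The unsigned-to-signed surprise described in the commit above can be sketched in isolation. This is a minimal illustration, not the actual `reader_concurrency_semaphore` code: `resources_sketch`, `make_from_unsigned()` and `make_no_limits()` are hypothetical names, and the member types (`int`, `std::ptrdiff_t`) are assumptions chosen only to show the conversion.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in for a class that stores its limits as signed values.
struct resources_sketch {
    int count;
    std::ptrdiff_t memory;
};

// Old-style construction: unsigned parameters are converted to the signed
// members. Passing the maximum unsigned value wraps to -1 on common
// two's-complement platforms (guaranteed since C++20).
inline resources_sketch make_from_unsigned(unsigned count, std::size_t memory) {
    return resources_sketch{static_cast<int>(count),
                            static_cast<std::ptrdiff_t>(memory)};
}

// "No limits" construction: use the maximum of the real (signed) member
// types, so the caller never has to reason about overflow.
inline resources_sketch make_no_limits() {
    return resources_sketch{std::numeric_limits<int>::max(),
                            std::numeric_limits<std::ptrdiff_t>::max()};
}
```

A caller who writes `make_from_unsigned(std::numeric_limits<unsigned>::max(), ...)` expecting "unlimited" ends up with a negative count, which is the trap a dedicated no-limits factory avoids.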
Botond Dénes	a3563e5f7d	reader_concurrency_semaphore: add consume_resources() (cherry picked from commit `dfd649a6b4`)	2018-12-18 14:34:33 +02:00
Botond Dénes	78c5b09694	reader_concurrency_semaphore::inactive_read_handle: add operator bool() (cherry picked from commit `21b44adbfe`)	2018-12-18 14:34:33 +02:00
Avi Kivity	46efc08882	Update seastar submodule * seastar 6700dc3...08f1258 (1): > reactor: disable nowait aio due to a kernel bug Fixes #3996.	2018-12-17 17:00:14 +02:00
Avi Kivity	c95433c967	Update seastar submodule * seastar 1651a2a...6700dc3 (3): > build: link against libatomic > core/semaphore: Allow combining semaphore_units() > core/shared_ptr: Allow releasing a lw_shared_ptr to a non-const object Fixes #3996.	2018-12-17 15:53:01 +02:00
Piotr Sarna	df3b6fb4a8	cql3: refuse to create index on COMPACT STORAGE with ck To follow C* compatibility, creating an index on COMPACT STORAGE table should be disallowed not only on base primary keys, but also when the base table contains clustering keys. Message-Id: <ab40c39730aff2e164d11ee5159ff62b8ec9e8e8.1544698186.git.sarna@scylladb.com> (cherry picked from commit `6743af5dbd`)	2018-12-17 09:45:43 +02:00
Piotr Sarna	44ee43bb17	cql3: add refusing to create an index on static column Secondary indexes on static columns are not yet supported, so creating such index should return an appropriate error. Fixes #3993 Message-Id: <700b0a71e80da52d2d5250edacc12626b55681fa.1544785127.git.sarna@scylladb.com> (cherry picked from commit `63bd43e57e`)	2018-12-17 09:44:52 +02:00
Asias He	aac363ca86	storage_service: Notify NEW_NODE only when a node is new node This is a backport of CASSANDRA-11038. Before this, a restarted node will be reported as new node with NEW_NODE cql notification. To fix, only send NEW_NODE notification when the node was not part of the cluster Fixes: #3979 Tests: pushed_notifications_test.py:TestPushedNotifications.restart_node_test Message-Id: <453d750b98b5af510c4637db25b629f07dd90140.1544583244.git.asias@scylladb.com> (cherry picked from commit `71c1681f6c`)	2018-12-16 13:59:19 +02:00
Duarte Nunes	9e6cc5b024	service/storage_proxy: Embed the expire timer in the response handler Embedding the expire timer for a write response in the abstract_write_response_handler simplifies the code as it allows removing the rh_entry type. It will also make the timeout easily accessible inside the handler, for future patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181213111818.39983-1-duarte@scylladb.com> (cherry picked from commit `f8878238ed`)	2018-12-13 13:24:09 +00:00
Duarte Nunes	13b72c7b92	Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias " This is a backport of CASSANDRA-8236. Before this patch, scylla sends the node UP event to the cql client when it sees a new node join the cluster, i.e., when a new node's status becomes NORMAL. The problem is that, at this time, the cql server might not be ready yet. Once the client receives the UP event, it tries to connect to the new node's cql port and fails. To fix, a new application_state::RPC_READY is introduced: a new node sets RPC_READY to false when it starts gossip in the very beginning and sets RPC_READY to true when the cql server is ready. RPC_READY is a bad name, but I think it is better to follow Cassandra. Nodes with or without this patch are supposed to work together with no problem. Refs #3843 " * 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev: storage_service: Use cql_ready facility storage_service: Handle application_state::RPC_READY storage_service: Add notify_cql_change storage_service: Add debug log in notify_joined storage_service: Add extra check in notify_joined storage_service: Add notify_joined storage_service: Add debug log in notify_up storage_service: Add extra check in notify_up storage_service: Add notify_up storage_service: Make notify_left log debug level storage_service: Introduce notify_left storage_service: Add debug log in notify_down storage_service: Introduce notify_down storage_service: Add set_cql_ready gossip: Add gossiper::is_cql_ready gms: Add endpoint_state::is_cql_ready gms: Add application_state::RPC_READY gms: Introduce cql_ready in versioned_value (cherry picked from commit `a42b2895c2`)	2018-12-13 12:06:59 +00:00
Avi Kivity	6b011fbe0a	build: pass C compiler configuration in dist package build Just like we allow customizing the C++ compiler, we should allow customizing the C compiler. Ref #3978 Message-Id: <20181211172821.30830-1-avi@scylladb.com> (cherry picked from commit `fa96e07e6b`) scylla-3.0.rc2	2018-12-12 14:41:38 +02:00
Tomasz Grabiec	9dd4e1b01f	sstables: index_reader: Avoid schema copy in advance_to() Introduced in `7e15e43`. Exposed by perf_fast_forward: running: large-partition-skips on dataset large-part-ds1 Testing scanning large partition with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s (...) 1 0 5.268780 8000000 1518378 1 1 31.695985 4000000 126199 Message-Id: <1544614272-21970-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `0a853b8866`)	2018-12-12 14:38:49 +02:00
Nadav Har'El	e91c741ef5	secondary indexes: fail attempts to create a CUSTOM INDEX Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index with a custom implementation. The only custom implementation that Cassandra supports is SASI. But Scylla doesn't support this, or any other custom index implementation. If a CREATE CUSTOM INDEX statement is used, we shouldn't silently ignore the "CUSTOM" tag, we should generate an error. This patch also includes a regression test that "CREATE CUSTOM INDEX" statements with valid syntax fail (before this patch, they succeeded). Fixes #3977 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-2-nyh@scylladb.com> (cherry picked from commit `a0379209e6`)	2018-12-12 00:32:35 +00:00
Nadav Har'El	b18e9e115d	Fix typo in error message Interestingly, this typo was copied from the original Cassandra source code :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-1-nyh@scylladb.com> (cherry picked from commit `36db4fba23`)	2018-12-12 00:32:35 +00:00
Avi Kivity	0b86ab0d2a	build: build libdeflate with user selected C compiler If the user specified a C compiler, use it to build libdeflate. Fixes #3978. Message-Id: <20181211145604.14847-1-avi@scylladb.com> (cherry picked from commit `34a31a807d`)	2018-12-11 19:24:24 +02:00
Duarte Nunes	97cd9108d6	db/system_distributed_keyspace: Create the schema with min_timestamp Different nodes can concurrently create the distributed system keyspace on boot, before the "if not exists" clause can take effect. However, the resulting schema mutations will be different since different nodes use different timestamps. This patch forces the timestamps to be the same across all nodes, so we save some schema mismatches. This fixes a bug exposed by `ca5dfdf`, whereby the initialization of the distributed system keyspace is done before waiting for schema agreement. While waiting for schema agreement in storage_service::join_token_ring(), the node still hasn't joined the ring and schemas can't be pulled from it, so nodes can deadlock. A similar situation can happen between a seed node and a non-seed node, where the seed node progresses to a different "wait for schema agreement" barrier, but still can't make progress because it can't pull the schema from the non-seed node still trying to join the ring. Finally, it is assumed that changes to the schema of the current distributed system keyspace tables will be protected by a cluster feature and a subsequent schema synchronization, such that all nodes will be at a point where schemas can be transferred around. Fixes #3976 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181211113407.20075-1-duarte@scylladb.com> (cherry picked from commit `89ae3fbf11`)	2018-12-11 14:53:30 +00:00
Hagit Segev	f81fe96b0b	release: prepare for 3.0-rc2	2018-12-11 12:32:34 +02:00
Avi Kivity	91ce3a7957	sstables: fix overflow in clustering key blocks header bit access _ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too. Otherwise, a shift in the range 32-63 will produce wrong results. Fix by using a 64-bit mask. Found by Fedora 29's ubsan. Fixes #3973. Message-Id: <20181209120549.21371-1-avi@scylladb.com> (cherry picked from commit `7c7da0b462`)	2018-12-10 14:10:27 +02:00
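The mask-width issue fixed in the commit above can be illustrated with a small sketch. The helper name `header_bit_set()` is hypothetical and the one-bit-per-position layout is an assumption made for brevity; the point is only that the literal being shifted must be 64 bits wide when the variable is.

```cpp
#include <cstdint>

// Hypothetical helper: test one bit of a 64-bit header word.
//
// The buggy form would be `header & (1 << shift)`: the literal `1` is an
// `int`, so a shift in the range 32-63 is undefined behaviour and produces
// wrong results in practice. Widening the literal first makes every shift
// in 0-63 well defined.
inline bool header_bit_set(std::uint64_t header, unsigned shift) {
    return (header & (std::uint64_t(1) << shift)) != 0;
}
```

Sanitizers such as UBSan report the narrow-literal variant at run time, which matches how the commit says the bug was found.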
Takuya ASADA	af7e58f4c5	dist/offline_installer/redhat: fix missing dependencies Offline installer with Scylla 3.0 causes dependency error on CentOS, added missing packages. Fixes #3969 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181207020711.23055-1-syuu@scylladb.com> (cherry picked from commit `a2d0ebf4d9`)	2018-12-10 14:10:15 +02:00
Amos Kong	bd3373b511	scylla_setup: only ask for nic in interactive mode Current scylla_setup still asks for nic even nic is already assigned in cmdline. Fixes #3908 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com> (cherry picked from commit `09a3b11c2f`)	2018-12-09 19:26:34 +02:00
Gleb Natapov	4820130abe	storage_proxy: fix crash during write timeout callback invocation rh_entry address is captured inside timeout's callback lambda, so the structure should not be moved after it is created. Change the code to create rh_entry in-place instead of moving it into the map. Fixes #3972. Message-Id: <20181206164043.GN25283@scylladb.com> (cherry picked from commit `9fb79bf379`)	2018-12-09 15:25:52 +02:00
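Why a structure whose address is captured by its own callback must not be moved can be shown with a small sketch. `entry_sketch` is a hypothetical stand-in, not the real `rh_entry`; the example keeps the moved-from object alive so that the misdirected callback is observable without undefined behaviour.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical entry whose callback captures the entry's own address.
struct entry_sketch {
    int fired = 0;
    std::function<void()> on_timeout;
    entry_sketch() : on_timeout([this] { ++fired; }) {}
};

// Moving the entry copies the callback, which still points at the original
// object: the moved-to entry never sees its own timeout.
inline int fired_after_move() {
    entry_sketch original;
    entry_sketch moved = std::move(original);
    moved.on_timeout();  // increments original.fired, not moved.fired
    return moved.fired;
}

// Constructing the entry in place inside the map keeps the captured address
// valid for as long as the map node exists.
inline int fired_in_place() {
    std::map<int, entry_sketch> entries;
    auto [it, inserted] = entries.try_emplace(1);
    it->second.on_timeout();
    return it->second.fired;
}
```

In real code the original object would usually be destroyed after the move, turning the stale capture into a dangling pointer and a crash rather than a silently missed update.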
Tomasz Grabiec	9b299241e5	Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir This patchset fixes several remaining issues found during thorough testing of SSTables 3.x statistics and enriches ~30 unit tests with statistics validation against Cassandra-generated golden copies. * https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1: sstables: Enforce estimated_partitions in generate_summary() to be always positive. sstables: Don't enforce default max_local_deletion_time value for 'mc' files. sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. sstables: Collect statistics when writing RT markers to SSTables 3.x. tests: Return sstable_assertions from validate_read() helper. tests: Introduce helper for validating stats metadata in SSTables 3.x tests. tests: Add stats metadata validation to test_write_static_row. tests: Add stats metadata validation to test_write_composite_partition_key. tests: Add stats metadata validation to test_write_composite_clustering_key. tests: Add stats metadata validation to test_write_wide_partitions. 
tests: Add stats metadata validation to write_ttled_row tests: Add stats metadata validation to write_ttled_column tests: Add stats metadata validation to write_deleted_column tests: Add stats metadata validation to write_deleted_row tests: Add stats metadata validation to write_collection_wide_update tests: Add stats metadata validation to write_collection_incremental_update tests: Add stats metadata validation to write_multiple_partitions tests: Add stats metadata validation to write_multiple_rows tests: Add stats metadata validation to write_missing_columns_large_set tests: Add stats metadata validation to write_different_types tests: Add stats metadata validation to write_empty_clustering_values tests: Add stats metadata validation to write_large_clustering_key tests: Add stats metadata validation to write_compact_table tests: Add stats metadata validation to write_user_defined_type_table tests: Add stats metadata validation to write_simple_range_tombstone tests: Add stats metadata validation to write_adjacent_range_tombstones tests: Add stats metadata validation to write_non_adjacent_range_tombstones tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows tests: Add stats metadata validation to write_range_tombstone_same_start_with_row tests: Add stats metadata validation to write_range_tombstone_same_end_with_row tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests. (cherry picked from commit `bb24d378b2`)	2018-12-08 14:08:46 +02:00
Avi Kivity	745a98e151	Merge "Fix deadlocking multishard readers" from Botond " Multishard combining readers, running concurrently, with limited concurrency and no timeout may deadlock, due to inactive shard readers sitting on permits. To avoid this we have to make sure that all shard readers belonging to a multishard combining reader, that are not currently active, can be evicted to free up their permits, ensuring that all readers can make progress. Making inactive shard readers evictable is the solution for this problem, however the original series introducing this solution (`414b14a6bd`) did not go all the way and left some loose ends. These loose ends are tied up by this mini-series. Namely, two issues remained: * The last reader to reach EOS was not paused (made evictable). * Readers created/resumed as part of a read-ahead were not paused immediately after finishing the read-ahead. This series fixes both of these. Fixes: #3865 Tests: unit(release, debug) " * 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla: multishard_combining_reader: pause readers after reading ahead multishard_combining_reader: pause all EOS'd readers (cherry picked from commit `21b4b2b9a1`)	2018-12-08 14:08:46 +02:00
Avi Kivity	b9c99af18b	Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir " This patchset extends a number of existing tests to check SSTables statistics for 'mc' format and fixes an issue discovered with the help of one of the tests. Tests: unit {release} " * 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla: tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. tests: Run sstable_tombstone_histogram_test for all SSTables versions. tests: Run min_max_clustering_key_test on all SSTables versions. tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. tests: Run test_sstable_max_local_deletion_time on all SSTables versions. tests: Extend test checking tombstones histogram to cover all SSTables versions. sstables: Properly track row-level tombstones when writing SSTables 3.x. tests: Run min_max_clustering_key_test_2 for all SSTables versions. tests: Make reusable_sst() helper accept SSTables version parameter. (cherry picked from commit `f073ea5f87`)	2018-12-08 14:08:44 +02:00
Asias He	cded9c7ac7	gossip: Fix race in real_mark_alive and shutdown msg In dtest, we have self.check_rows_on_node(node1, 2000) self.check_rows_on_node(node2, 2000) which introduce the following cluster operations: 1) Initially: - node1 up - node2 up 2) self.check_rows_on_node(node1, 2000) - node2 down - node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots up to mark node1 up) 3) self.check_rows_on_node(node2, 2000) - node1 down (B: node1 will send shutdown gossip message to node2, node2 will mark node1 down) - node1 up (C: when node1 is up, node2 will call gossiper::real_mark_alive) Since the order of Operation A and Operation B is not guaranteed, it is possible that node2 will mark node1 as status=shutdown and also mark node1 as UP. In Operation C, node2 will call gossiper::real_mark_alive to mark node1 up, but since node2 might think node1 is already up, node2 will exit early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is now UP, status={}" As a result, dtest fails to see node2 report that node1 is up when it boots node1, and the test fails. TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP'] In the log we can see node1 marked as DOWN and UP almost at the same time on node2: INFO 2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown INFO 2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown Fixes #3940 Tests: dtest with 20 consecutive successful runs Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com> (cherry picked from commit `eeeb2da7bb`)	2018-12-08 13:42:43 +02:00
Gleb Natapov	4acfc5ed8f	hints: make hints manager more resilient to unexpected directory content Currently if hints directory contains unexpected directories Scylla fails to start with unhandled std::invalid_argument exception. Make the manager ignore malformed files instead and try to proceed anyway. Message-Id: <20181121134618.29936-2-gleb@scylladb.com> (cherry picked from commit `b4a8802edc`)	2018-12-08 13:42:43 +02:00
Gleb Natapov	cb9199bc7f	hints: add auxiliary function for scanning high level hints directory We scan hints directory in two places: to search for files to replay and to search for directories to remove after resharding. The code that translates directory name to a shard is duplicated. It is simple now, so not a big issue, but in case it grows it is better to have it in one place. Message-Id: <20181121134618.29936-1-gleb@scylladb.com> (cherry picked from commit `9433d02624`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	695ff5383f	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back (cherry picked from commit `b8c405c019`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	730e48bf60	configure.py: Always add a rule for building gen_crc_combine_table Fixes a build failure when only the scylla binary was selected for building like this: ./configure.py --with scylla In this case the rule for gen_crc_combine_table was missing, but it is needed to build crc_combine_table.o Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `edbef7400b`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	af6d4f40e1	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `9a4c00beb7`)	2018-12-08 13:42:43 +02:00
Avi Kivity	9d8507de09	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh (cherry picked from commit `b098b5b987`)	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	07c980845d	utils/crc: Add clmul_u32() implementation Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	c52b8239d0	configure.py: Compile against Westmere on x86 Needed for backporting dependent changes. Extracted from: commit `79136e895f` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Thu Nov 1 03:26:16 2018 +0000 utils/crc: calculate crc in parallel	2018-12-08 13:42:43 +02:00
Tomasz Grabiec	5a07a4fac8	configure.py: Use armv8-a+crc+crypto ISA on aarch64 Needed for backporting dependent changes. Extracted from: commit `1c48e3fbec` Author: Yibo Cai (Arm Technology China) <Yibo.Cai@arm.com> Date: Mon Oct 29 02:58:19 2018 +0000 utils/crc: leverage arm64 crc extension	2018-12-08 13:42:43 +02:00
Avi Kivity	b9c046b17b	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). 
SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include (cherry picked from commit `5e759b0c07`)	2018-12-08 13:42:43 +02:00
Avi Kivity	979cb636b8	Update seastar submodule * seastar e64281d...1651a2a (1): > tests: perf: Make do_not_optimize() take the argument by const&	2018-12-08 13:42:43 +02:00
Botond Dénes	59cf9d9070	querier: fix evict_one() and evict_all_for_table() Both of these have the same problem. They remove the to-be-evicted entries from `_entries` but they don't unregister the `entry` from the `reader_concurrency_semaphore`. This results in the `reader_concurrency_semaphore` being left with a dangling pointer to the entry, which will trigger a segfault when it tries to evict the associated inactive reads. Also add a unit test for `evict_all_for_table()` to check that it works properly (`evict_one()` is only used in tests, so no dedicated test for it). Fixes: #3962 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com> (cherry picked from commit `77dbc7d09a`)	2018-12-06 11:38:44 +02:00
Duarte Nunes	c9ec9d4087	Merge seastar upstream * seastar 880826e...e64281d (2): > core/semaphore: Change the access of semaphore_units main ctor > Merge "Add semaphore_units<>::split() function" from Duarte Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-05 20:25:17 +00:00
Gleb Natapov	2e8fefbc5a	storage_proxy: store hint for CL=ANY if all nodes replied with failure Current code assumes that request failed if all replicas replied with failure, but this is not true for CL=ANY requests. Take it into account. Fixed: #3565 (cherry picked from commit `17197fb005`)	2018-12-05 20:14:58 +00:00
Gleb Natapov	6be0635029	storage_proxy: complete write request early if all replicas replied with success or failure Currently if a write request reaches CL and all replicas replied, but some replied with failures, the request will wait for the timeout to be retired. Detect this case and retire the request immediately instead. Fixes #3566 (cherry picked from commit `d1d04eae3c`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	04a544c0a2	storage_proxy: check that write failure response comes from recognized replica Before accounting failure response we need to make sure it comes from a replica that participates in the request. (cherry picked from commit `76ab3d716b`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	028f9b95d1	storage_proxy: move code executed on write timeout into separate function Currently the callback is in lambda, but we will want to call the code not only during timer expiration. (cherry picked from commit `7bc68aa0eb`)	2018-12-05 20:14:57 +00:00
Avi Kivity	54258ca8eb	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare() (cherry picked from commit `1891779e64`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	c9a030f1f0	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. Non zero value means that writes were lost after CL is reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com> (cherry picked from commit `207b57a892`)	2018-12-05 20:14:57 +00:00
Gleb Natapov	1c7daef554	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com> (cherry picked from commit `319ece8180`)	2018-12-05 20:14:57 +00:00
Duarte Nunes	f8195a77b0	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com> (cherry picked from commit `6fbf792777`)	2018-12-05 19:20:36 +00:00
Duarte Nunes	5b724c80ab	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com> (cherry picked from commit `f3a5ec0fd9`)	2018-12-05 19:19:26 +00:00
Nadav Har'El	4a7ae81b3f	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view update to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic which counts the number of ongoing writes. In the case that the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `1d5f8d0015`)	2018-12-05 19:19:26 +00:00
Piotr Sarna	3cf26a60a2	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com> (cherry picked from commit `7b0a3fbf8a`)	2018-12-04 14:33:05 +00:00
Tomasz Grabiec	2103d0d52b	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `aa19f98d18`)	2018-12-04 14:30:19 +02:00
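The ordering change in the commit above amounts to emitting map entries sorted by key instead of in hash-table iteration order. A minimal sketch, assuming the offset map is keyed by a numeric metadata type; `sorted_offset_entries()` and the plain `uint32_t` key and value types are hypothetical, not the actual sstable writer code.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: copy the entries of an unordered offset map into a
// vector and sort them by key (the metadata type), so the on-disk order is
// deterministic rather than whatever the hash table happens to yield.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>>
sorted_offset_entries(const std::unordered_map<std::uint32_t, std::uint32_t>& offsets) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> out(offsets.begin(), offsets.end());
    std::sort(out.begin(), out.end());  // pairs compare by key first
    return out;
}
```

Storing the entries in a `std::map` from the start would give the same order; sorting at write time is shown here only because it leaves the in-memory container unchanged.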


16751 Commits