scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Duarte Nunes	d757c87107	cql3/query_processor: Remove prepared statements upon dropping a view Fixes #3198 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180209143652.31852-1-duarte@scylladb.com>	2018-02-09 16:30:28 +00:00
Avi Kivity	432268f582	Merge "branch 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla" from Raphael "The motivation is that it's no longer needed after the new resharding algorithm, which is solely responsible for working with shared sstables, and regular compaction will not work with those! So resharding will schedule deletion of shared sstables once it's certain that shards that own them have the new unshared sstables. The manager was needed for orchestrating deletion of shared sstables across shards. It brings extra complexity that's no longer needed, and it was also overloading shard 0, but the latter could have been fixed. Tests: - unit: release mode - dtest: resharding_test.py" * 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla: Remove SSTable's atomic deletion manager Stop using SSTable's atomic deletion manager database: split column_family::rebuild_sstable_list	2018-02-08 19:10:16 +02:00
Duarte Nunes	456b678e0b	database.hh: Fix data query stage argument type Fixes a merge gone wrong. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180208163338.25238-1-duarte@scylladb.com>	2018-02-08 16:35:10 +00:00
Avi Kivity	404172652e	Merge "Use xxHash for digest instead of MD5" from Duarte "This series changes digest calculation to use a faster algorithm (xxHash) and to also cache calculated cell hashes that can be kept in memory to speed up subsequent digest requests. The MD5 hash function has proved to be slow for large cell values: size = 256; elapsed = 4us size = 512; elapsed = 8us size = 1024; elapsed = 14us size = 2048; elapsed = 21us size = 4096; elapsed = 33us size = 8192; elapsed = 51us size = 16384; elapsed = 86us size = 32768; elapsed = 150us size = 65536; elapsed = 278us size = 131072; elapsed = 531us size = 262144; elapsed = 1032us size = 524288; elapsed = 2026us size = 1048576; elapsed = 4004us size = 2097152; elapsed = 7943us size = 4194304; elapsed = 15800us size = 8388608; elapsed = 31731us size = 16777216; elapsed = 64681us size = 33554432; elapsed = 130752us size = 67108864; elapsed = 263154us The xxHash is a non-cryptographic, 64bit (there's work in progress on the 128 version) hash that can be used to replace MD5. It performs much better: size = 256; elapsed = 2us size = 512; elapsed = 1us size = 1024; elapsed = 1us size = 2048; elapsed = 2us size = 4096; elapsed = 2us size = 8192; elapsed = 3us size = 16384; elapsed = 5us size = 32768; elapsed = 8us size = 65536; elapsed = 14us size = 131072; elapsed = 28us size = 262144; elapsed = 59us size = 524288; elapsed = 116us size = 1048576; elapsed = 226us size = 2097152; elapsed = 456us size = 4194304; elapsed = 935us size = 8388608; elapsed = 1848us size = 16777216; elapsed = 4723us size = 33554432; elapsed = 10507us size = 67108864; elapsed = 21622us Performance was tested using a 3 node cluster with 1 cpu and 8GB, and with the following cassandra-stress loaders. Measurements are for the read workload. 
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 32699 [READ:32699] partition rate : 32699 [READ:32699] row rate : 32699 [READ:32699] latency mean : 3.0 [READ:3.0] latency median : 3.0 [READ:3.0] latency 95th percentile : 3.9 [READ:3.9] latency 99th percentile : 4.5 [READ:4.5] latency 99.9th percentile : 6.6 [READ:6.6] latency max : 24.0 [READ:24.0] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:05 END md5: Results: op rate : 25241 [READ:25241] partition rate : 25241 [READ:25241] row rate : 25241 [READ:25241] latency mean : 3.9 [READ:3.9] latency median : 3.9 [READ:3.9] latency 95th percentile : 5.1 [READ:5.1] latency 99th percentile : 5.8 [READ:5.8] latency 99.9th percentile : 8.0 [READ:8.0] latency max : 24.8 [READ:24.8] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:06:36 END This translates into a 21% improvement for this workload. 
Bigger cell values were also tested: sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 19964 [READ:19964] partition rate : 19964 [READ:19964] row rate : 19964 [READ:19964] latency mean : 4.9 [READ:4.9] latency median : 4.6 [READ:4.6] latency 95th percentile : 7.2 [READ:7.2] latency 99th percentile : 11.5 [READ:11.5] latency 99.9th percentile : 13.6 [READ:13.6] latency max : 29.2 [READ:29.2] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:08:20 END md5: Results: op rate : 12773 [READ:12773] partition rate : 12773 [READ:12773] row rate : 12773 [READ:12773] latency mean : 7.7 [READ:7.7] latency median : 7.3 [READ:7.3] latency 95th percentile : 10.2 [READ:10.2] latency 99th percentile : 16.8 [READ:16.8] latency 99.9th percentile : 19.2 [READ:19.2] latency max : 71.5 [READ:71.5] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:13:02 END This translates into a 37% improvement for this workload. Fixes #2884 Tests: unit-tests (release), dtests (smp=2) Note: dtests are kinda broken in master (> 30 failures), so take the tests tag with a grain of himalayan salt." 
* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits) tests/row_cache_test: Test hash caching tests/memtable_test: Test hash caching tests/mutation_test: Use xxHash instead of MD5 for some tests tests/mutation_test: Test xx_hasher alongside md5_hasher schema: Remove unneeded include service/storage_proxy: Enable hash caching service/storage_service: Add and use xxhash feature message/messaging_service: Specify algorithm when requesting digest storage_proxy: Extract decision about digest algorithm to use cache_flat_mutation_reader: Pre-calculate cell hash partition_snapshot_reader: Pre-calculate cell hash query::partition_slice: Add option to specify when digest is requested row: Use cached hash for hash calculation mutation_partition: Replace hash_row_slice with appending_hash mutation_partition: Allow caching cell hashes mutation_partition: Force vector_storage internal storage size test.py: Increase memory for row_cache_stress_test atomic_cell_hash: Add specialization for atomic_cell_or_collection query-result: Use digester instead of md5_hasher range_tombstone: Replace feed_hash() member function with appending_hash ...	2018-02-08 18:24:58 +02:00
Avi Kivity	6298655178	Merge "Inline and optimise more aggressively" from Paweł "We have noticed in the past that the compiler is too conservative when it comes to deciding which functions to inline. Since inlining functions enables further optimisations such as const folding, in some cases the difference in performance was significant enough to force us to add [[gnu::always_inline]] attribute in numerous places. However, this is neither a practical nor an elegant solution. A better way to deal with the problem is to adjust the compiler tunables that control the heuristics used for making inlining decisions. In particular, inline-unit-growth seems to affect the performance of the emitted code most. Apart from making the compiler more eager to inline functions, bumping the optimisation level to -O3 also seems to have a positive impact on the performance. Fixes #1644. Tests: unit-test (release) Performance tested with gcc 7.3. Macrobenchmark perf_simple_query Flags: -c4 --duration 60 All results are medians. 
./before ./after diff read 338662.12 405377.80 19.7% write 387378.89 466744.15 20.5% Microbenchmarks single run duration: 1.000s number of runs: 5 BEFORE test iterations median mad min max combined.one_row 858933 536.389ns 0.819ns 534.823ns 537.208ns combined.single_active 8469 77.131us 11.000ns 77.118us 77.145us combined.many_overlapping 1199 664.105us 160.807ns 663.818us 668.527us combined.disjoint_interleaved 8100 75.522us 22.254ns 75.500us 75.732us combined.disjoint_ranges 8288 72.580us 10.571ns 72.568us 72.599us memtable.one_partition_one_row 1216233 825.581ns 0.446ns 821.450ns 826.027ns memtable.one_partition_many_rows 127336 7.855us 2.153ns 7.853us 7.898us memtable.many_partitions_one_row 57919 17.356us 6.028ns 17.259us 17.362us memtable.many_partitions_many_rows 4751 210.496us 102.339ns 210.393us 211.188us AFTER test iterations median mad min max combined.one_row 1002321 450.292ns 0.313ns 447.202ns 450.605ns combined.single_active 9605 67.086us 8.620ns 67.073us 67.115us combined.many_overlapping 1476 519.554us 5.334ns 519.549us 519.953us combined.disjoint_interleaved 9280 64.363us 5.328ns 64.335us 64.369us combined.disjoint_ranges 9481 61.893us 3.620ns 61.885us 61.903us memtable.one_partition_one_row 1432668 699.775ns 0.106ns 696.023ns 699.918ns memtable.one_partition_many_rows 153692 6.536us 6.885ns 6.501us 6.543us memtable.many_partitions_one_row 63319 15.879us 5.080ns 15.793us 15.884us memtable.many_partitions_many_rows 5659 176.717us 66.770ns 176.650us 177.778us" * tag 'optimise-and-inline/v2' of https://github.com/pdziepak/scylla: configure.py: set optimisation level to -O3 configure.py: set inline-unit-growth to 300 configure.py: flag_supported: support flags with spaces configure.py: rename warning_supported to flag_supported configure.py: pass optimisation flags to seastar/configure.py cql3/select_statement: do not capture stack variables by reference	2018-02-08 17:45:41 +02:00
Tomasz Grabiec	cce1a2bce8	Merge "Use the CPU scheduler" from Glauber & Avi In this patchset I am resubmitting Avi's enablement of the CPU scheduler on his behalf. I've done a ton of testing in the series and there are some improvements / changes that I had previously sent as a separate series. What you see here is the result of merging that work. After this patchset is applied, workloads are smoother and we are able to uphold the pre-defined shares among the various actors. We also finally have everything we need to merge the CPU and I/O controllers. After that is done the code is now much simpler. But also, as a bonus, controllers that were previously available for I/O only (compactions) are enabled for CPU as well. * git@github.com:glommer/scylla.git cpusched-v7: Avi Kivity (4): database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler memtable, database: make memtable::clear_gently() inherit scheduling_group config: mark background_writer_scheduling_quota as Unused database: place data_query execution stage into scheduling_group Glauber Costa (9): database, main: set up scheduling_groups for our main tasks row_cache: actually use the scheduling group for update_cache allow update_cache and clear_gently to use the entire task quota. database: remove cpu_flush_quota metric controllers: retire auto_adjust_flush_quota controllers: allow memtable I/O controller to have shares statically set controllers: update control points for memtable I/O controller controllers: allow a static priority to override the controller output controllers: unify the I/O and CPU controllers	2018-02-08 15:58:40 +01:00
Paweł Dziepak	eb5b76ea50	configure.py: set optimisation level to -O3	2018-02-08 14:46:11 +00:00
Paweł Dziepak	bc65659a46	configure.py: set inline-unit-growth to 300 It has been discovered that the compiler is too conservative when deciding which functions to inline. In particular, the limiting tunable turned out to be inline-unit-growth which limits inlining in large translation units.	2018-02-08 14:46:11 +00:00
Paweł Dziepak	89063a9cc0	configure.py: flag_supported: support flags with spaces	2018-02-08 14:46:11 +00:00
Paweł Dziepak	8f4b30b572	configure.py: rename warning_supported to flag_supported warning_supported() can be used to detect support of any compiler flag, not just warnings.	2018-02-08 14:46:11 +00:00
Paweł Dziepak	a8372b87eb	configure.py: pass optimisation flags to seastar/configure.py	2018-02-08 14:46:11 +00:00
Paweł Dziepak	b635fec9bf	cql3/select_statement: do not capture stack variables by reference Default capture by reference considered harmful in async code.	2018-02-08 14:46:10 +00:00
Avi Kivity	ee763d889a	Merge seastar upstream * seastar 6d02263...2b0a81d (7): > configure.py: add -Wno-stringop-overflow > configure.py: add --optflags for specifying optimisation flags > build: add protobuf-compiler to docker dev image > build: update docker builder to newer Fedora > json_element: stream_object to get its parameter by value > json_element: stream range object > build: add yaml-cpp-devel installation to Dockerfile	2018-02-08 16:45:01 +02:00
Raphael S. Carvalho	312bd9ce25	Remove SSTable's atomic deletion manager Not used anymore, can be deleted. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:38:45 -02:00
Raphael S. Carvalho	1472cfcc19	Stop using SSTable's atomic deletion manager The motivation is that it's no longer needed after the new resharding algorithm, which is solely responsible for working with shared sstables, and regular compaction will not work with those! So resharding will schedule deletion of shared sstables once it's certain that shards that own them have the new unshared sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:27:17 -02:00
Raphael S. Carvalho	b78881c0e9	database: split column_family::rebuild_sstable_list The motivation is that resharding will not want the code that is specific to regular compaction after atomic deletion is removed. Resharding will eventually only need to replace old tables with new ones, and it will be in charge of deletion of old tables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:18:18 -02:00
Glauber Costa	4272279bbb	controllers: unify the I/O and CPU controllers We have had so far an I/O controller, for compactions and memtables, and a CPU controller, for memtables only -- since the scheduling was still quota-based. Now that the CPU scheduler is fully functional, it is time to do away with the differences and integrate them both into one. We now have a memtable controller and a compaction controller, and they control both CPU and I/O. In the future, we may want to control processes that don't do one of them, like cache updates. If that ever happens, we'll try to make controlling one of them optional. But for now, since the I/O and CPU controllers for our main two processes would look exactly the same we should integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:30 -05:00
Glauber Costa	7b6f188e27	controllers: allow a static priority to override the controller output We have merged the I/O controller without this, but we want to integrate the CPU and I/O controllers into one. Currently, the quota can be statically set for the CPU controller. For now, until we gain more experience with it we should allow a static value to override the controller's output as well. That is particularly important since we don't yet control some strategies like LCS and the time-based ones. Users in the field may be using one of those strategies with a static value for background quota. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	6f295a2a8a	controllers: update control points for memtable I/O controller Right now CPU and I/O controllers have slightly different control points for no good reason. Let's use the CPU controller ones as the standard, as we have been using it in the field for longer and trust it more. The end goal is to fully integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	b895d495cc	controllers: allow memtable I/O controller to have shares statically set This is so it looks more like the CPU controller. The end goal is to integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	c099c98676	controllers: retire auto_adjust_flush_quota It no longer makes sense now that we have the full scheduler + controllers. In its place, we will provide an option to statically set the controller's shares as a safeguard against us getting this wrong. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	2c1d5cf966	database: remove cpu_flush_quota metric We can now grab that from the CPU scheduler, that exports both runtime and shares. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	c4974392b7	allow update_cache and clear_gently to use the entire task quota. We have had a quota of partitions to process in clear_gently / update_cache, so that we don't overwork. However, with those things now being in their own task group there is no harm in allowing it to run until we reach a natural preemption point. While we are at it, clear_gently did not check for need_preempt() before, so this patch fixes it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	a3a4d0a17a	row_cache: actually use the scheduling group for update_cache We have moved clear_gently from using a seastar::thread's scheduling_group to using the CPU scheduler's. However, update_cache was forgotten. This patch fixes that and gets rid of the old group just in case. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	ce94e6deb7	database: place data_query execution stage into scheduling_group Because execution stages defer and batch processing of the function they run, they escape their fiber's context and therefore the scheduling group. Fix (for data_query) by initializing the execution_stage with the query scheduling_group. To do that we have to move the execution stage into the database object, so it has access to the scheduling group during initialization.	2018-02-07 17:19:29 -05:00
Avi Kivity	2ee163d32b	config: mark background_writer_scheduling_quota as Unused Since the background writer flush quota config is no longer used, mark it Unused.	2018-02-07 17:19:29 -05:00
Avi Kivity	ac525c9124	memtable, database: make memtable::clear_gently() inherit scheduling_group Instead of using a private thread_scheduling_group, make clear_gently use its caller's scheduling_group to control resource usage.	2018-02-07 17:19:29 -05:00
Glauber Costa	956af9f099	database, main: set up scheduling_groups for our main tasks Set up scheduling groups for streaming, compaction, memtable flush, query, and commitlog. The background writer scheduling group is retired; it is split into the memtable flush and compaction groups. Comments from Glauber: This patch is based on a patch from Avi with the same subject, but the differences are significant enough that I reset authorship. In particular: 1) A bug/regression is fixed with the boundary calculations for the memtable controller sampling function. 2) A leftover is removed, where after flushing a memtable we would go back to the main group before going to the cache group again 3) As per Tomek's suggestion, now the submission of compactions themselves is run in the compaction scheduling group. Having that working is what changes this patch the most: we now store the scheduling group in the compaction manager and let the compaction manager itself enforce the scheduling group. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initialization defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Glauber Costa	98549775fa	sstable_tests: make sure min_threshold is set explicitly The SSTable tests are a bit fragile now because they rely on min_threshold having a particular value. That is the default value, but if I change that default - which I am planning to do - the test breaks. Right now the test is not broken, but if we are planning on relying on a property having a particular value in tests, we should explicitly set it. So I am proactively changing min_threshold in the tests to have the value of 4 explicitly, so we can change that in the future without breaking anything. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180207155513.12498-1-glauber@scylladb.com>	2018-02-07 18:45:52 +01:00
Tomasz Grabiec	d398aa913e	cache: Fix calculation of active_reads() Message-Id: <1518023341-27855-1-git-send-email-tgrabiec@scylladb.com>	2018-02-07 17:20:00 +00:00
Takuya ASADA	2c2173917c	dist/common/scripts/scylla_raid_setup: skip blkdiscard when disk is not supported TRIM Since we unconditionally run blkdiscard on disks, we may get an ioctl error message on disks that do not support TRIM. This can be ignored, but it's bad UX, so let's skip running blkdiscard when TRIM is not supported on the disk. Fixes #2774 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1517992904-13838-1-git-send-email-syuu@scylladb.com>	2018-02-07 13:30:05 +02:00
Paweł Dziepak	6ccd317c38	Merge "Do not evict from memtable snapshots" from Tomasz "When moving whole partition entries from memtable to cache, we move snapshots as well. It is incorrect to evict from such snapshots though, because associated readers would miss data. Solution is to record evictability of partition version references (snapshots) and avoid eviction from non-evictable snapshots. Could affect scanning reads, if the reader uses partition entry from memtable, and the partition is too large to fit in reader's buffer, and that entry gets moved to cache (was absent in cache), and then gets evicted (memory pressure). The reader will not see the remainder of that entry. Found during code review. Introduced in `ca8e3c4`, so affects 2.1+ Fixes #3186. Tests: unit (release)" * 'tgrabiec/do-not-evict-memtable-snapshots' of github.com:tgrabiec/scylla: tests: mvcc: Add test for eviction with non-evictable snapshots mutation_partition: Define + operator on tombstones tests: mvcc: Check that partition is fully discontinuous after eviction tests: row_cache: Add test for memtable readers surviving flush and eviction memtable: Make printable mvcc: Take partition_entry by const ref in operator<<() mvcc: Do not evict from non-evictable snapshots mvcc: Drop unnecessary assignment to partition_snapshot::_version tests: Use partition_entry::make_evictable() where appropriate mvcc: Encapsulate construction of evictable entries	2018-02-06 14:46:24 +00:00
Tomasz Grabiec	3c51cc79d5	tests: mvcc: Add test for eviction with non-evictable snapshots	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	d37131d320	mutation_partition: Define + operator on tombstones	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	ec5fe5b207	tests: mvcc: Check that partition is fully discontinuous after eviction evict() should remove everything, including range tombstones, so whole clustering range should be marked as discontinuous.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	c1b82e60e3	tests: row_cache: Add test for memtable readers surviving flush and eviction Reproduces https://github.com/scylladb/scylla/issues/3186	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	d85d651e0f	memtable: Make printable Useful when debugging test failures.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	06b7b54c3d	mvcc: Take partition_entry by const ref in operator<<() Some users will only have const&.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	50f5bee12e	mvcc: Do not evict from non-evictable snapshots When moving whole partition entries from memtable to cache, we move snapshots as well. It is incorrect to evict from such snapshots though, because associated readers would miss data. Solution is to record evictability of partition version references (snapshots) and avoid eviction from non-evictable snapshots. Could affect scanning reads, if the reader uses partition entry from memtable, and the partition is too large to fit in reader's buffer, and that entry gets moved to cache (was absent in cache), and then gets evicted (memory pressure). The reader will not see the remainder of that entry. Introduced in `ca8e3c4`, so affects 2.1+ Fixes #3186.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	c391bff1d2	mvcc: Drop unnecessary assignment to partition_snapshot::_version merge_partition_versions() is responsible for merging versions unpinned by the current snapshot. If that fails, we don't need to set _version back since versions must be still referenced by someone else, this snapshot is not a unique owner. This change makes it easier to add tracking of evictability.	2018-02-06 14:24:18 +01:00
Tomasz Grabiec	439cbada2c	tests: Use partition_entry::make_evictable() where appropriate	2018-02-06 14:24:18 +01:00
Raphael S. Carvalho	09f4ee808f	sstables/compress: Fix race condition in segmented offset reading of shared sstable Race condition was introduced by commit `028c7a0888`, which introduces chunk offset compression, because a reading state is kept in the compress structure which is supposed to be immutable and can be shared among shards owning the same sstable. So it may happen that shard A updates state while shard B relies on information previously set which leads to incorrect decompression, which in turn leads to read misbehaving. We could serialize access to at() which would only lead to contention issues for shared sstables, but that can be avoided by moving state out of compress structure which is expected to be immutable after sstable is loaded and fed to shards that own it. Sequential accessor (wraps state and reference to segmented_offset) is added to prevent at() and push_back() interfaces from being polluted. Tests: release mode. Fixes #3148. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180205192432.23405-1-raphaelsc@scylladb.com>	2018-02-06 12:10:10 +02:00
Tomasz Grabiec	d899ae0f02	mvcc: Encapsulate construction of evictable entries Internal invariants of MVCC are better preserved by partition_entry methods, so move construction of partition entries out of cache_entry constructors.	2018-02-05 17:54:03 +01:00
Avi Kivity	a94564a637	Merge seastar upstream * seastar 21badbd...6d02263 (4): > build: detect name of ninja executable > queue: pop_eventually/push_eventually should throw when called after abort > build: compile libfmt out-of-line > core/gate: Ensure with_gate leaves gate on exception	2018-02-05 14:42:07 +02:00
Tomasz Grabiec	d21fbc26c7	tests: range_tombstone_list: Do not depend on argument evaluation order next_pos() calls could be reordered resulting in invalid tombstones being generated. Message-Id: <1517833688-20022-1-git-send-email-tgrabiec@scylladb.com>	2018-02-05 12:31:37 +00:00
Tomasz Grabiec	d2baa49313	tests: Do not produce invalid range tombstones Upper bound should not be smaller than lower bound. Found by asserting on valid bounds. Message-Id: <1517833602-19732-1-git-send-email-tgrabiec@scylladb.com>	2018-02-05 12:29:03 +00:00
Takuya ASADA	6d134c0c2b	dist/redhat: block installing Scylla on older kernel We use the AmbientCapabilities directive in the systemd unit, but it does not work on older kernels and causes the following error: "systemd[5370]: Failed at step CAPABILITIES spawning /usr/bin/scylla: Invalid argument" It only works on kernel-3.10.0-514 == CentOS7.3 or later, so block installing the rpm to prevent the error. Fixes #3176 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1517822764-2684-1-git-send-email-syuu@scylladb.com>	2018-02-05 12:57:17 +02:00
Duarte Nunes	46099e4f58	tests/role_manager_test: Stop role_manager Not stopping them may cause the tests to fail due to an asynchronous process being scheduled and accessing freed data. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180202221640.28609-1-duarte@scylladb.com>	2018-02-05 09:39:59 +00:00
Avi Kivity	6919c7434e	Merge seastar upstream * seastar 19efbd9...21badbd (4): > reactor: change adjustment method for tasks becoming active > Merge 'Update ARM port' from Avi > http: Do not wait for close connection on stop if listen did not completed > core/future-util: Don't allow rvalues in do_for_each()	2018-02-04 14:28:28 +02:00
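
The diff column of the perf_simple_query results quoted in commit 6298655178 can be recomputed from the before and after medians. The following is a minimal Python sketch for checking those figures, not code from the repository; the helper name percent_gain is hypothetical, and only the four medians are taken from the commit message.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Medians copied from the perf_simple_query table in commit 6298655178
# (flags: -c4 --duration 60). Keys are the workload names used there.
results = {
    "read": (338662.12, 405377.80),
    "write": (387378.89, 466744.15),
}

for name, (before, after) in results.items():
    # Prints "read: 19.7%" and "write: 20.5%", matching the diff column.
    print(f"{name}: {percent_gain(before, after):.1f}%")
```

The same helper can be applied to the cassandra-stress op rates in commit 404172652e, but the result depends on which metric is taken as the baseline (op rate, mean latency, or total operation time), so the percentages stated there are best read as approximate.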

14517 Commits