scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Asias He	da80f27f44	migration_manager: Fix nullptr dereference in maybe_schedule_schema_pull Commit `976324bbb8` changed to use get_application_state_ptr to get a pointer of the application_state. It may return nullptr that is dereferenced unconditionally. In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw: 4 nodes in the tests n1, n2, n3, n4 are started n1 is stopped n1 is changed to use different shard config n1 is restarted ( 2019-01-27 04:56:00,377 ) The backtrace happened on n2 right after n1 restarts: 0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled 1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled 2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled 3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed) 4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status = 5 Segmentation fault on shard 0. 6 Backtrace: 7 0x00000000041c0782 8 0x00000000040d9a8c 9 0x00000000040d9d35 10 0x00000000040d9d83 11 /lib64/libpthread.so.0+0x00000000000121af 12 0x0000000001a8ac0e 13 0x00000000040ba39e 14 0x00000000040ba561 15 0x000000000418c247 16 0x0000000004265437 17 0x000000000054766e 18 /lib64/libc.so.6+0x0000000000020f29 19 0x00000000005b17d9 We do not know when this backtrace happened, but according to the logs from n3 and n4: INFO 2019-01-27 04:56:22,154 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL INFO 2019-01-27 04:56:21,594 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL We can be sure the backtrace on n2 happened before 04:56:21 - 19 seconds (the delay before gossip notices a peer is down), so the abort time is around 04:56:0X. 
The migration_manager::maybe_schedule_schema_pull that triggers the backtrace must be scheduled before n1 is restarted, because it dereferences the application_state pointer after it sleeps 60 seconds, so the time maybe_schedule_schema_pull is called is around 04:55:0X which is before n1 is restarted. So my theory is: migration_manager::maybe_schedule_schema_pull is scheduled, at this time n1 has SCHEMA application_state, when n1 restarts, n2 gets new application state from n1 which does not have SCHEMA yet, when migration_manager::maybe_schedule_schema_pull wakes up from the 60-second sleep, n1 has non-empty endpoint_state but empty application_state for SCHEMA. We dereference the nullptr application_state and abort. Fixes: #4148 Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test Message-Id: <9ef33277483ae193a49c5f441486ee6e045d766b.1548896554.git.asias@scylladb.com> (cherry picked from commit `28d6d117d2`)	2019-02-01 13:00:38 +02:00
Jenkins	5174b1cd13	release: prepare for 3.0.2 by slivne scylla-3.0.2	2019-01-30 16:14:34 +02:00
Nadav Har'El	9ba608cae4	cql3: really ensure retrieval of columns for filtering Commit `fd422c954e` aimed to fix issue #3803. In that issue, if a query SELECTed only certain columns but did filtering (ALLOW FILTERING) over other unselected columns, the filtering didn't work. The fix involved adding the columns being filtered to the set of columns we read from disk, so they can be filtered. But that commit included an optimization: If you have clustering keys c1 and c2, and the query asks for a specific partition key and c1 < 3 and c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already done as a slice (a contiguous read from disk). The committed code erroneously concluded that both c1 and c2 don't need to be filtered, which was wrong (c2 does need to be read and filtered). In this patch, we fix this optimization. Previously, we used the "prefix length", which in the above example was 2 (both c1 and c2 were filtered) but we need a new and more elaborate function, num_prefix_columns_that_need_not_be_filtered(), to determine we can only skip filtering of 1 (c1) and cannot skip the second. Fixes #4121. This patch also adds a unit test to confirm this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190123131212.6269-1-nyh@scylladb.com> (cherry picked from commit `76f1fcc346`)	2019-01-23 21:11:05 +02:00
Avi Kivity	f7c5cbc645	build: fix libdeflate object file corruption during parallel build libdeflate's build places some object files in the source directory, which is shared between the debug and release build. If the same object file (for the two modes) is written concurrently, or if one more reads it while the other writes it, it will be corrupted. Fix by not building the executables at all. They aren't needed, and we already placed the libraries' objects in the build directory (which is unshared). We only need the libraries anyway. Fixes #4130. Branches: master, branch-3.0 Message-Id: <20190123145435.19049-1-avi@scylladb.com> (cherry picked from commit `c83ae62aed`)	2019-01-23 21:11:05 +02:00
Duarte Nunes	cf4b4d4878	Merge 'hinted handoff: cache cf mappings' from Vlad " Cache cf mappings when breaking in the middle of a segment sending so that the sender has them the next time it wants to send this segment for where it left off before. Also add the "discard" metric so that we can track hints that are being discarded in the send flow. " Fixes #4122 * 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: cache column family mappings for segments that were not sent out in full hinted handoff: add a "discarded" metric (cherry picked from commit `88c7c1e851`)	2019-01-23 17:14:29 +02:00
Asias He	45bb1ba1b7	streaming: Futurize estimate_partitions The loop can take a long time if the number of sstables and/or ranges are large. To fix, futurize the loop. Fixes: #4005 Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com> (cherry picked from commit `bcba6b4f4d`)	2019-01-22 18:20:21 +02:00
Botond Dénes	28294ed42e	auth/service: unregister migration listener on stop() Otherwise any event that triggers notification to this listener would trigger a heap-use-after-free. Refs: #4107 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b6bbd609371a2312aed7571b05119d59c7d103d7.1548067626.git.bdenes@scylladb.com> (cherry picked from commit `f229dff210`)	2019-01-22 17:54:36 +02:00
Jenkins	3c4f8cf6ed	release: prepare for 3.0.1 by hagitsegev scylla-3.0.1	2019-01-20 12:42:00 +02:00
Botond Dénes	7b94264ae5	multishard_mutation_query(): use correct reader concurrency semaphore The multishard mutation query used the semaphore obtained from `database::user_read_concurrency_sem()` to pause-resume shard readers. This presented a problem when `multishard_mutation_query()` was reading from system tables. In this case the readers themselves would obtain their permits from the system read concurrency semaphore. Since the pausing of shard readers used the user read semaphore, pausing failed to fulfill its objective of alleviating pressure on the semaphore the reads obtained their permits from. In some cases this led to a deadlock during system reads. To ensure the correct semaphore is used for pausing-resuming readers, obtain the semaphore from the `table` object. To avoid looking up the table on every pause or resume call, cache the semaphores when readers are created. Fixes: #4096 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com> (cherry picked from commit `4537ec7426`)	2019-01-17 18:08:01 +02:00
Duarte Nunes	22a085fbd3	Merge 'Fix filtering with LIMIT and paging' from Piotr " Before this series the limit was applied per page instead of globally, which might have resulted in returning too many rows. To fix that: 1. restrictions filter now has a 'remaining' parameter in order to stop accepting rows after enough of them have already been accepted 2. pager passes its row limit to restrictions filter, so no more rows than necessary will be served to the client 3. results no longer need to be trimmed on select_statement level Tests: unit (release) " Fixes #4100 * 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla: tests: add filtering+limit+paging test case tests: allow null paging state in filtering tests cql3: fix filtering with LIMIT with regard to paging (cherry picked from commit `7505815013`)	2019-01-17 18:07:41 +02:00
Tomasz Grabiec	2d181da656	row_cache: Fix crash on memtable flush with LCS Presence checker is constructed and destroyed in the standard allocator context, but the presence check was invoked in the LSA context. If the presence checker allocates and caches some managed objects, there will be alloc-dealloc mismatch. That is the case with LeveledCompactionStrategy, which uses incremental_selector. Fix by invoking the presence check in the standard allocator context. Fixes #4063. Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `32f711ce56`) scylla-3.0.0	2019-01-15 21:16:13 +02:00
Nadav Har'El	d427a23d42	scylla_util.py: make view_hints_directory setting optional It is optional to set "view_hints_directory", so we shouldn't insist that it is defined in scylla.yaml on upgrade. Fixes #4091. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190114125225.10794-1-nyh@scylladb.com> (cherry picked from commit `9062750089`)	2019-01-14 16:59:40 +02:00
Shlomi Livne	37ab553f02	release: prepare for 3.0.0 Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2019-01-12 22:24:08 +02:00
Raphael S. Carvalho	6a3f4fb3f9	database: Fix race condition in sstable snapshot Race condition takes place when one of the sstables selected by snapshot is deleted by compaction. Snapshot fails because it tries to link a sstable that was previously unlinked by compaction's sstable deletion. Refs #4051. (master commit `1b7cad3531`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>	2019-01-11 13:48:12 +02:00
Avi Kivity	8168d13887	Merge "Fix UDTs representation in serialization header" from Piotr " Tests: unit(release) " Fixes #4073. * commit 'FETCH_HEAD~1': Add test for serialization header with UDT Fix UDT names in serialization header (cherry picked from commit `4a6aeced59`)	2019-01-11 07:48:23 +02:00
Benny Halevy	13bdec6eb4	sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base Refs #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com> (cherry picked from commit `2dc3776407`)	2019-01-11 07:47:45 +02:00
Benny Halevy	57e7081d86	sstables: mc: sign-extend delta local_deletion_time and delta ttl Follow Cassandra's encoding so that values that are less than the baseline encoding_stats will wrap-around in 64-bits rather than 32. Fixes #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com> (cherry picked from commit `60323b79d1`)	2019-01-09 23:16:00 +02:00
Avi Kivity	2fcae36d96	tests: mutation_source_test: generate valid utf-8 data test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized string to populate an sstable, but this can be invalid utf-8 so that sstable cannot be sstabledumped. Make it valid by using make_random_string(). Fixes #4040. Message-Id: <20190107193240.14409-1-avi@scylladb.com> (cherry picked from commit `d8adbeda11`)	2019-01-08 14:53:55 +02:00
Avi Kivity	ba62dcd5c7	Update seastar submodule * seastar 618bc23...5226277 (1): > iotune: Initialize io_rates member variables Fixes #4064.	2019-01-08 11:39:50 +02:00
Nadav Har'El	515399ce17	materialized views: move hints to top-level directory While we keep ordinary hints in a directory parallel to the data directory, we decided to keep the materialized view hints in a subdirectory of the data directory, named "view_pending_updates". But during boot, we expect all subdirectories of data/ to be keyspace names, and when we notice this one, we print a warning: WARN: database - Skipping undefined keyspace: view_pending_updates This spurious warning annoyed users. But moreover, we could have bigger problems if the user actually tries to create a keyspace with that name. So in this patch, we move the view hints to a separate top-level directory, which defaults to /var/lib/scylla/view_hints, but as usual can be configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190107142257.16342-1-nyh@scylladb.com> (cherry picked from commit `da090a5458`)	2019-01-07 22:01:56 +02:00
Benny Halevy	772c4b5fdc	sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE; Fixes #4060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com> (cherry picked from commit `40410465d7`)	2019-01-07 21:59:59 +02:00
Avi Kivity	874d88c98d	Update seastar submodule * seastar 08f1258...618bc23 (1): > perftune.py: tune only active NVMe HW queues on i3 AWS instances Ref #3831.	2019-01-06 12:59:22 +02:00
Avi Kivity	5a178ff635	compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads The workload in #3844 has these characteristics: - very small data set size (a few gigabytes per shard) - large working set size (all the data, enough for high cache miss rate) - high overwrite rate (so a compaction results in 12X data reduction) As a result, the compaction backlog controller assigns very few shares to compaction (low data set size -> low backlog), so compaction proceeds very slowly. Meanwhile, we have tons of cache misses, and each cache miss needs to read from a large number of sstables (since compaction isn't progressing). The end result is a high read amplification, and in this test, timeouts. While we could declare that the scenario is very artificial, there are other real-world scenarios that could trigger it. Consider a 100% write load (population phase) followed by 100% read. Towards the end of the last compaction, the backlog will drop more and more until compaction slows to a crawl, and until it completes, all the data (for that compaction) will have to be read from its input sstables, resulting in read amplification. We should probably have read amplification affect the backlog, but for now the simpler solution is to increase the minimum shares to 50 so that compaction always makes forward progress. This will result in higher-than-needed compaction bandwidth in some low write rate scenarios so we will see fluctuations in request rate (what the controller was designed to avoid), but these fluctuations will be limited to 5%. Since the base class backlog_controller has a fixed (0, 0) point, remove it and add it to derived classes (setting it to (0, 50) for compaction). Fixes #3844 (or at least improves it). Message-Id: <20181231162710.29410-1-avi@scylladb.com> (cherry picked from commit `b0980ba7c6`) scylla-3.0.rc4	2019-01-04 13:28:43 +02:00
Avi Kivity	d67439b910	Revert "release: prepare for 3.0-rc4" This reverts commit `21a5a4c76a`. we were already at rc4, and the commit only changes the syntax (from the incorrect one to the correct one).	2019-01-04 12:34:38 +02:00
Hagit Segev	21a5a4c76a	release: prepare for 3.0-rc4	2019-01-03 23:53:47 +02:00
Tomasz Grabiec	f818d6ee3f	tests: cql_test_env: Start the compaction manager Broken in `fee4d2e` Not doing this results in compaction requests being ignored. One effect of this is that perf_fast_forward produces many sstables instead of one. Refs #3984 Refs #3983 Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `245a0d953a`)	2019-01-03 14:56:42 +01:00
Tomasz Grabiec	20c2745592	Merge "Improve times to start / stop the nodes" from Glauber If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares in memtable flushes decreases as memory used decreases and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot, flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are at this point running alone we improve the user experience by making sure the startup is as fast as it can be. There is a similar problem at the drain level, which is also fixed in this series. Fixes #3958 * git@github.com:glommer/scylla.git faster-restart compaction_manager: delay initialization of the compaction manager. drain: stop compactions early (cherry picked from commit `3e70ae1d06`)	2019-01-03 14:56:16 +01:00
Avi Kivity	cf5c72561c	release: prepare for scylla-3.0-rc4	2019-01-03 13:15:58 +02:00
Botond Dénes	53b85e5d32	querier_cache: unregister queriers evicted due to expired TTL Currently queriers evicted due to their TTL expiring are not unregistered from the `reader_concurrency_semaphore`. This can cause a use-after-free when the semaphore tries to evict the same querier at some later point in time, as the querier entry it has a pointer to is now invalid. Fix by unregistering the querier from the semaphore before destroying the entry. Refs: #4018 Refs: #4031 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com> (cherry picked from commit `e5a0ea390a`)	2019-01-03 13:14:02 +02:00
Avi Kivity	2456cf63f2	querier_cache: unregister querier from reader_concurrency_semaphore during eviction In insert_querier(), we may evict older queriers to make room for the new one. However, we forgot to unregister the evicted queriers from reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore eventually wanted to evict something, it saw an inactive_read_handle that was not connected to a querier_cache::entry, and crashed on use-after-free. Fix by evicting through the inactive_read_handle associated with the querier to be evicted. This removes traces of the querier from both reader_concurrency_semaphore and querier_cache. We also have to massage the statistics since querier_inactive_read::evict() updates different counters. Fixes #4018. Tests: unit(release) Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190102175023.26093-1-avi@scylladb.com> (cherry picked from commit `918d255168`)	2019-01-03 13:14:00 +02:00
Pekka Enberg	c1f6ce4251	Merge 'Fixes for the view_update_from_staging_generator' from Duarte "This series contains a couple of fixes to the view_update_from_staging_generator, the object responsible for generating view updates from sstables written through streaming. Fixes #4021" * 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla: db/view/view_update_from_staging_generator: Break semaphore on stop() db/view/view_update_from_staging_generator: Restore formatting db/view/view_update_from_staging_generator: Avoid creating more than one fiber (cherry picked from commit `96172b7bca`)	2018-12-29 20:22:54 +02:00
Duarte Nunes	fc82eb5586	streaming/stream_session: Only stage sstables for tables with views When streaming, sstables for which we need to generate view updates are placed in a special staging directory. However, we only need to do this for tables that actually have views. Refs #4021 Message-Id: <20181227215412.5632-1-duarte@scylladb.com> (cherry picked from commit `bab7e6877b`)	2018-12-28 20:52:15 +02:00
Avi Kivity	f58e592345	Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz " partition_snapshots created in the memtable will keep a reference to the memtable (as region) and to memtable::_cleaner. As long as the reader is alive, the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update snapshots's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043` (in >= 3.0-rc1) Fixes #4030. Tests: - mvcc_test (debug) " tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla: tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed tests: mvcc: Introduce mvcc_container::migrate() tests: mvcc: Make mvcc_partition move-constructible tests: mvcc: Introduce mvcc_container::make_not_evictable() tests: mvcc: Allow constructing mvcc_container without a cache_tracker mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup mvcc: partition_snapshot: Introduce migrate() mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner (cherry picked from commit `8e2f6d0513`)	2018-12-28 13:37:29 +02:00
Duarte Nunes	6375b1e5b7	streaming/stream_session: Don't use table reference across defer points When creating a sstable from which to generate view updates, we held on to a table reference across defer points. In case there's a concurrent schema drop, the table object might be destroyed and we will incur a use-after-free. Solve this by holding on to a shared pointer and pinning the table object. Refs #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227105921.3601-1-duarte@scylladb.com> (cherry picked from commit `66e45469b2`)	2018-12-28 10:58:26 +02:00
Gleb Natapov	7ca24efb39	streaming: always read from rpc::source until end-of-stream during mutation sending rpc::source cannot be abandoned until EOS is reached, but the current code does not obey this: if an error code is received, it throws an exception that aborts the reading loop. Fix it by moving the exception throwing out of the loop. Fixes: #4025 Message-Id: <20181227135051.GC29458@scylladb.com> (cherry picked from commit `37b4043677`)	2018-12-27 18:59:59 +02:00
Avi Kivity	32ebaaa585	Update libdeflate submodule * libdeflate 17ec6c9...e7e54ea (1): > build: improve out-of-tree build with multiple output trees (cherry picked from commit `d6a22c50cb`)	2018-12-25 14:41:24 +02:00
Nadav Har'El	a88c722a4c	build_ami.sh: need to check out the right branch of scylla-jmx This patch is for branch 3.0's build_ami.sh. It checks out the latest master branch of scylla-jmx, which not only sounds wrong, it also doesn't work: the latest master of scylla-jmx can only build a "relocatable package" but branch 3.0 doesn't work with those. This patch needs to be applied only in branch 3.0. It should probably be made more general, though... build_ami.sh should have been able to figure out what is the current branch, and if it is branch-3.0 or next-3.0, check out branch-3.0 of the other repositories. But I'm not sure how to do this correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181217214610.4498-1-nyh@scylladb.com>	2018-12-25 12:37:11 +02:00
Tomasz Grabiec	07582d6c10	sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no data is not moved-from if _trust_pi == trust_promoted_index::no, which triggers the assert on data.empty(). We should make it empty unconditionally. Message-Id: <1545408731-14333-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `419c771791`)	2018-12-24 11:45:14 +02:00
Tomasz Grabiec	18c89edbf7	sstables: mc: reader: Use enum class instead of variant variant is an overkill here. Message-Id: <1545409014-16289-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `07d153c769`)	2018-12-24 11:45:09 +02:00
Duarte Nunes	5558fa8c44	service/storage_proxy: Protect against empty mutation when storing hint mutation_holder::get_mutation_for() can return nullptr's, so protect against those when storing a hint. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-2-duarte@scylladb.com> (cherry picked from commit `e6a8883228`) scylla-3.0.rc3	2018-12-23 12:27:27 +02:00
Duarte Nunes	f678eb52cd	service/storage_proxy: Protect against empty mutation in mutation_holder The per_destination_mutation holder can contain empty mutations, so make sure release_mutation() skips over those. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-1-duarte@scylladb.com> (cherry picked from commit `6c4a34f378`)	2018-12-23 12:27:25 +02:00
Tomasz Grabiec	dfb23f4b38	sstables: mc: index_reader: Handle CK_SIZE split across buffers properly We incorrectly fell through to the next state instead of returning to read more data. This can manifest in a number of ways, such as an abort or an incorrect read. Introduced in `917528c` Fixes #4011. Message-Id: <1545402032-4114-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `d2f96a60f6`)	2018-12-21 20:40:35 +02:00
Tomasz Grabiec	502ddf158a	sstables: mc: reader: Avoid unnecessary index reads on fast forwarding When the next pending fragments are after the start of the new range, we know there is no need to skip. Caught by perf_fast_forward --datasets large-part-ds3 \ --run-tests=large-partition-slicing Refs #3984 Message-Id: <1545308006-16389-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `7afe2bad51`)	2018-12-21 20:40:35 +02:00
Paweł Dziepak	0ccb0a127a	Merge "Optimize slicing sstable readers" from Tomasz " Contains several improvements for fast-forwarding and slicing readers. Mainly for the MC format, but not only: - Exiting the parser early when going out of the fast-forwarding window [MC-format-only] - Avoiding reading of the head of the partition when slicing - Avoiding parsing rows which are going to be skipped [MC-format-only] " * 'sstable-mc-optimize-slicing-reads' of github.com:tgrabiec/scylla: sstables: mc: reader: Skip ignored rows before parsing them sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows sstables: mc: parser: Allow the consumer to skip the whole row sstables: continuous_data_consumer: Introduce skip() sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state() sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row sstables: reader: Do not read the head of the partition when index can be used sstables: mc: mutation_fragment_filter: Check the fast-forward window first sstables: mc: writer: Avoid calling unsigned_vint::serialized_size() (cherry picked from commit `e6d26a528f`)	2018-12-21 20:40:35 +02:00
Avi Kivity	b94997be0d	Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz " The motivation is to keep code related to each format separate, to make it easier to comprehend and reduce incremental compilation times. Also reduces dependency on sstable writer code by removing writer bits from sstables.hh. The ka/la format writers are still left in sstables.cc, they could be also extracted. " * 'extract-sstable-writer-code' of github.com:tgrabiec/scylla: sstables: Make variadic write() not picked on substitution error sstables: Extract MC format writer to mc/writer.cc sstables: Extract maybe_add_summary_entry() out of components_writer sstables: Publish functions used by writers in writer.hh sstables: Move common write functions to writer.hh sstables: Extract sstable_writer_impl to a header sstables: Do not include writer.hh from sstables.hh sstables: mc: Extract bound_kind_m related stuff into mc/types.hh sstables: types: Extract sstable_enabled_features::all() sstables: Move components_writer to .cc tests: sstable_datafile_test: Avoid dependency on components_writer (cherry picked from commit `b023e8b45d`)	2018-12-21 20:40:35 +02:00
Benny Halevy	d3a5b10cb8	sstables_stats: writer_impl: move common members to base class To be used by sstable_writer for stats collection. Note that this patch is factored out so it can be verified with no other change in functionality. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `6853c1677d`)	2018-12-21 20:40:35 +02:00
Benny Halevy	48f3f899ac	sstable: make write_crc, write_digest, and new_sstable_component_file private methods Prepare for per-sstable sub directory. Also, these functions get most of their parameters from the sst at hand so they might as well be first class members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ad5f1e4fbb`)	2018-12-21 20:40:35 +02:00
Paweł Dziepak	c4f745276c	Merge "Optimize sstable writing of large partitions" from Tomasz " This series contains several optimizations of the MC format sstable writer, mainly: - Avoiding output_stream when serializing into memory (e.g. a row) - Faster serialization of primitive types when serializing into memory I measured the improvement in throughput (frag/s) using perf_fast_forward for datasets with a single large partition with many small rows: - 10% for a row with a single cell of 8 bytes - 10% for a row with a single cell of 100 bytes - 9% for a row with a single cell of 1000 bytes - 13% for a row with 6 cells of 100 bytes " * tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla: bytes_ostream: Optimize writing of fixed-size types sstables: mc: Write temporary data to bytes_ostream rather than file_writer sstables: mc: Avoid double-serialization of a range tombstone marker sstables: file_writer: Generalize bytes& writer to accept bytes_view sstables: Templetize write() functions on the writer sstables: Turn m_format_write_helpers.cc into an impl header sstables: De-futurize file_writer bytes_ostream: Implement clear() bytes_ostream: Make initial chunk size configurable (cherry picked from commit `e3f53542c9`)	2018-12-21 20:40:35 +02:00
Hagit Segev	392c7dee3c	release: prepare for 3.0-rc3	2018-12-21 20:19:50 +02:00
Gleb Natapov	04e982f909	streaming: hold to sink while close() is running and call close on error as well Currently if something throws while streaming in mutation sending loop sink is not closed. Also when close() is running the code does not hold onto sink object. close() is async, so sink should be kept alive until it completes. The patch uses do_with() to hold onto sink while close is running and run close() on error path too. Fixes #4004. Message-Id: <20181220155931.GL3075@scylladb.com> (cherry picked from commit `393269d34b`)	2018-12-20 19:50:47 +02:00
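
Commit `da80f27f44` in the log above describes a crash caused by dereferencing a pointer-returning lookup without checking for nullptr. The following is a minimal sketch of the guarded pattern, not the actual Scylla code: `app_state_map`, `get_state_ptr` and `schema_version_if_known` are hypothetical names, and a plain `std::map` stands in for the gossiper's endpoint state.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the application states gossiped by one endpoint.
using app_state_map = std::map<std::string, std::string>;

// Returns a pointer to the stored value, or nullptr when the key is absent.
// This mirrors the shape of a pointer-returning lookup; it is an assumption,
// not the real get_application_state_ptr signature.
inline const std::string* get_state_ptr(const app_state_map& states,
                                        const std::string& key) {
    auto it = states.find(key);
    return it == states.end() ? nullptr : &it->second;
}

// Checks the pointer before dereferencing it. A freshly restarted peer may
// have an endpoint state that does not contain a SCHEMA entry yet.
inline std::optional<std::string> schema_version_if_known(const app_state_map& states) {
    const std::string* value = get_state_ptr(states, "SCHEMA");
    if (!value) {
        return std::nullopt;  // nothing to compare against yet, skip the pull
    }
    return *value;
}
```

A caller that wakes up after a delay would treat an empty result as "no schema known yet" and return early instead of dereferencing.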

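Commit `9ba608cae4` above explains that with clustering keys c1 and c2 and the restrictions c1 < 3 and c2 > 3, only c1 is covered by the contiguous slice, so c2 still has to be filtered. The sketch below illustrates one way to count such prefix columns under a simplifying assumption: a run of equality restrictions followed by at most one range restriction forms the slice. The names `restriction_kind` and `prefix_columns_not_needing_filter` are hypothetical, and this is not the body of `num_prefix_columns_that_need_not_be_filtered()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of the restriction on each clustering column,
// listed in clustering-key order.
enum class restriction_kind { none, equality, slice };

// Counts the leading clustering columns that a contiguous read already
// satisfies: every leading equality, plus at most one range restriction.
// Columns after that point must still be filtered row by row.
inline std::size_t prefix_columns_not_needing_filter(
        const std::vector<restriction_kind>& clustering) {
    std::size_t count = 0;
    for (restriction_kind kind : clustering) {
        if (kind == restriction_kind::equality) {
            ++count;      // equality keeps the prefix contiguous
            continue;
        }
        if (kind == restriction_kind::slice) {
            ++count;      // the first range is still served by the slice
        }
        break;            // anything after a range or a gap needs filtering
    }
    return count;
}
```

For the example in the commit message, `{slice, slice}` gives 1, so c2 is kept in the set of columns to read and filter.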

16819 Commits
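
Merge `22a085fbd3` above fixes LIMIT being applied per page by giving the restrictions filter a 'remaining' parameter that is shared across pages. The sketch below shows the idea with a hypothetical `limited_filter` class and an arbitrary predicate (keep even values); the class name, the predicate and `filter_page` are assumptions made for illustration, not Scylla's interfaces.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical filter that carries the number of rows still allowed by LIMIT.
// Because the counter lives in the filter, it survives from page to page.
class limited_filter {
    std::size_t _remaining;
public:
    explicit limited_filter(std::size_t limit) : _remaining(limit) {}

    // Accepts a row only if it passes the predicate and the limit is not
    // exhausted. The predicate (even values) is a placeholder.
    bool accept(int row) {
        if (_remaining == 0 || row % 2 != 0) {
            return false;
        }
        --_remaining;
        return true;
    }

    std::size_t remaining() const { return _remaining; }
};

// Applies the shared filter to one page of rows.
inline std::vector<int> filter_page(limited_filter& filter,
                                    const std::vector<int>& page) {
    std::vector<int> out;
    for (int row : page) {
        if (filter.accept(row)) {
            out.push_back(row);
        }
    }
    return out;
}
```

With a limit of 3, a first page that yields two matching rows leaves room for only one more on the next page, so no trimming is needed afterwards.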