scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Asias He	fe263e5436	Revert "Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE"" This reverts commit `1f29a698d5`.	2016-03-24 08:43:17 +08:00
Asias He	a6dd6e6d55	Revert "Revert "streaming: Simplify session completion logic"" This reverts commit `354fca9d56`.	2016-03-24 07:48:27 +08:00
Gleb Natapov	0afd1c6f0a	config: enable truncate_request_timeout_in_ms option Option truncate_request_timeout_in_ms is used by truncate. Mark it as used. Message-Id: <20160323162649.GH2282@scylladb.com>	2016-03-23 18:50:24 +02:00
Yoav Kleinberger	91269d0c15	tools/scyllatop: add sums to aggregate view the aggregate view now supports both sums and means. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1328af8efb113a786d7402b0704220108bfb28db.1458749600.git.yoav@scylladb.com>	2016-03-23 18:49:57 +02:00
Shlomi Livne	6a18634f9f	scylla_io_setup import scylla-server env args scylla_io_setup requires the scylla-server env to be set up to run correctly. Previously scylla_io_setup was encapsulated in scylla-io.service, which ensured this. Extracting CPUSET,SMP from SCYLLA_ARGS as CPUSET is needed for invoking io_tune Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <d49af9cb54ae327c38e451ff76fe0322e64a5f00.1458747527.git.shlomi@scylladb.com>	2016-03-23 17:54:06 +02:00
Pekka Enberg	8bf3d4f550	Merge "Make sure repairs do not cripple incoming load" from Glauber "This series makes sure that the influence of repairs on the ongoing loads is limited. This patch does not fix the situation completely, but it will be the best we can do for 1.0. Here's a brief explanation about some potentially contentious points, and future work: 1) With the old parallelism semaphore in tree, we could never really drop parallelism below 256, since even with (local) parallelism = 1, we would still have 256 vnodes. So while the number 100 is totally empirical, we know for a fact that around 200-something, we start having real trouble. (total) parallelism = 100 is enough to allow us to survive a load as much as 3 times heavier than the load described in Issue944. So while it is empirical, at least it is based on something 2) I totally support changing the checksumming algorithm. However, I would rather focus my efforts on testing this to exhaustion than doing this at the moment. But if anybody wants to do it, I think it is a great thing to have before 1.0. Especially because we'll probably need a new verb for that, so we would be better off having it from the start 3) This problem was made harder due to the fact that there are really three conditions that can affect the ongoing load. Only one of them needs to trigger for us to see degradation, so fixing them individually will usually buy us nothing. Those are: a) The disk bandwidth. Since the mutations are all together in the same memtable/commitlog as normal memtables, we cannot differentiate between them from the I/O Scheduler perspective. This is not an issue of course if the incoming mutations are not enough for us to saturate the disk, but especially given the highly parallel nature of repair, we usually will. If the commitlog queue starts getting too big, for instance, new requests will start being put to wait. 
The effect of this part of the series is to completely shift the high waiting times from those classes to the streaming ones (unfortunately compaction is still affected, but that's fine IMHO). With the new streaming classes, the waiting time of memtable / commitlog requests is still kept in the microseconds range. The streaming classes, on the other hand, will be in the hundreds of milliseconds range, or even seconds. b) The memory consumption: since the whole problem that leads to a) is the fact that due to high disk activity some requests will have to wait, we will end up with a lot of streaming memtables not yet flushed. Because of that, we will start throttling new incoming CQL requests and all the isolation efforts are rendered useless. Once again, due to the highly parallel nature of repair, this turned out to be a very easy condition to trigger. The solution proposed here is to limit a maximum amount of dirty memory for the repair job (in here, 25 %). This way, we can endure even slightly heavier loads without sweating too much. c) The task scheduler: repair generates a ton of requests for range checksums, and we actually want to keep it that way - so that the ranges checksummed are small enough so we don't have to resend a lot of mutations for no reason. However, if we pile up thousands of continuations in the task scheduler, seastar has absolutely no mechanism (right now) to prioritize between different kinds of requests. That means that the continuations that are supposed to be handling user requests will simply not run for a long time. Even if the Seastar load is less than 100 % that is still a problem, since that is just adding hundreds of milliseconds worth of latencies to any request processing. Fixes #944 and fixes #1033."	2016-03-23 16:07:06 +02:00
Yoav Kleinberger	d2cfb86dc8	tools/scyllatop: defend against unexpected strings from collectd Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <cd7ecf6b3b82bd2027179cbec4e689a946469e9a.1458740337.git.yoav@scylladb.com>	2016-03-23 16:05:59 +02:00
Asias He	c2eff7e824	streaming: Complete receive task after the flush A STREAM_MUTATION_DONE message will signal the receiver that the sender has completed the sending of stream mutations. When the receiver finds it has zero tasks to send and zero tasks to receive, it will finish the stream_session, and in turn finish the stream_plan if all the stream_sessions are finished. We should call receive_task_completed only after the flush finishes so that when stream_plan is finished all the data is on disk. Fixes repair_disjoint_data_test issue with Glauber's "[PATCH v4 0/9] Make sure repairs do not cripple incoming load" series ====================================================================== FAIL: repair_disjoint_data_test (repair_additional_test.RepairAdditionalTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "scylla-dtest/repair_additional_test.py", line 102, in repair_disjoint_data_test self.check_rows_on_node(node1, 3000) File "scylla-dtest/repair_additional_test.py", line 33, in check_rows_on_node self.assertEqual(len(result), rows, len(result)) AssertionError: 2461	2016-03-23 09:40:49 -04:00
Glauber Costa	f49e965d78	repair: rework repair code so we can limit parallelism The repair code as it is right now is a bit convoluted: it resorts to detached continuations + do_for_each when calling sync_ranges, and deals with the problem of excessive parallelism by employing a semaphore inside that range. Still, even by doing that, we still generate a great number of checksum requests because the ranges themselves are processed in parallel. It would be better to have a single-semaphore to limit the overall parallelism for all requests. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:40:49 -04:00
Glauber Costa	34a9fc106f	database: keep streaming memtables in their own region group Theoretically, because we can have a lot of pending streaming memtables, we can have the database start throttling and incoming connections slowing down during streaming. Turns out this is actually a very easy condition to trigger. That is basically because the other side of the wire in this case is quite efficient in sending us work. This situation is alleviated a bit by reducing parallelism, but not only does it not go away completely, once we have the tools to start increasing parallelism again it will become commonplace. The solution for this is to limit the streaming memtables to a fraction of the total allowed dirty memory. Using the nesting capability built into the LSA regions, we will make the streaming region group a child of the main region group. With that, we can throttle streaming requests separately, while at the same time being able to control the total amount of dirty memory as well. Because of the property, it can still be the case that incoming requests will throttle earlier due to streaming - unless we allow for more dirty memory to be used during repairs - but at least that effect will be limited. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:40:47 -04:00
Glauber Costa	455d5a57d2	streaming memtables: coalesce incoming writes The repair process will potentially send ranges containing few mutations, definitely not enough to fill a memtable. It wants to know whether or not each of those ranges individually succeeded or failed, so we need a future for each. Small memtables being flushed are bad, and we would like to write bigger memtables so we can better utilize our disks. One way to fix that is to change the repair itself to send more mutations in a single batch. But relying on that is a bad idea for two reasons: First, the goals of the SSTable writer and the repair sender are at odds. The SSTable writer wants to write as few SSTables as possible, while the repair sender wants to break down the range in pieces as small as it can and checksum them individually, so it doesn't have to send a lot of mutations for no reason. Second, even if the repair process wants to process larger ranges at once, some ranges themselves may be small. So while most ranges would be large, we would still have potentially some fairly small SSTables lying around. The best course of action in this case is to coalesce the incoming streams write-side. repair can now choose whatever strategy - small or big ranges - it wants, rest assured that the incoming memtables will be coalesced together. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:38:22 -04:00
Glauber Costa	5fa866223d	streaming: add incoming streaming mutations to a different sstable Keeping the mutations coming from the streaming process as mutations like any other has a number of advantages - and that's why we do it. However, this makes it impossible for Seastar's I/O scheduler to differentiate between incoming requests from clients, and those that are arriving from peers in the streaming process. As a result, if the streaming mutations consume a significant fraction of the total mutations, and we happen to be using the disk at its limits, we are in no position to provide any guarantees - defeating the whole purpose of the scheduler. To implement that, we'll keep a separate set of memtables that will contain only streaming mutations. We don't have to do it this way, but doing so makes life a lot easier. In particular, to write an SSTable, our API requires (because the filter requires) that a good estimate of the number of partitions is provided in advance. The partitions also need to be sorted. We could write mutations directly to disk, but the above conditions couldn't be met without significant effort. In particular, because mutations can be arriving from multiple peer nodes, we can't really sort them without keeping a staging area anyway. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:13:00 -04:00
Glauber Costa	10c8ca6ace	priority manager: separate streaming reads from writes Streaming currently has one class, that can be used to contain the read operations being generated by the streaming process. Those reads come from two places: - checksums (if doing repair) - reading mutations to be sent over the wire. Depending on the amount of data we're dealing with, that can generate a significant chunk of data, with seconds worth of backlog, and if we need to have the incoming writes intertwined with those reads, those can take a long time. Even if one node is only acting as a receiver, it may still read a lot for the checksums - if we're talking about repairs, those are coming from the checksums. However, in more complicated failure scenarios, it is not hard to imagine a node that will be both sending and receiving a lot of data. The best way to guarantee progress on both fronts is to put both kinds of operations into different classes. This patch introduces a new write class, and renames the old read class so it can have a more meaningful name. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Glauber Costa	78189de57f	database: make seal_on_overflow a method of the memtable_list Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Glauber Costa	635bb942b2	database: move add_memtable as a method of the memtable_list The column family still has to teach the memtable list how to allocate a new memtable, since it uses CF parameters to do so. After that, the memtable_list's constructor takes a seal and a create function and is complete. The copy constructor can now go, since there are no users left. The behavior of keeping a reference to the underlying memtables can also go, since we can now guarantee that nobody is keeping references to it (it is not even a shared pointer anymore). Individual memtables are, and users may be keeping references to them individually. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Glauber Costa	6ba95d450f	database: move active_memtable to memtable_list Each list can have a different active memtable. The column family method keeps existing, since the two separate sets of memtable are just an implementation detail to deal with the problem of streaming QoS: the active memtable keeps being the one from the main list. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Glauber Costa	af6c7a5192	database: create a class for memtable_list memtable_list is currently just an alias for a vector of memtables. Let's move them to a class on its own, exporting the relevant methods to keep user code unchanged as much as possible. This will help us keeping separate lists of memtables. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Avi Kivity	8ed95754c0	Merge seastar upstream * seastar 9f2b868...aa281bd (7): > shared_promise: Add move assignment operator > lowres_clock: Fix stretched time > scripts: Delete tap with ip instead of tunctl > vla: Actually be exception-safe > vla: Ensure memory is freed if ctor throws > vla: Ensure memory is correctly freed > net: Improve error message when parsing invalid ipv4 address	2016-03-23 14:39:31 +02:00
Takuya ASADA	50db64de33	dist: drop -j2 option on .spec, make build_rpm.sh able to specify -j option Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1458678665-30273-1-git-send-email-syuu@scylladb.com>	2016-03-23 13:32:14 +02:00
Gleb Natapov	48c83163b9	init: make more initialization threaded Since initialization now runs in a thread storage, messaging and gossiper services initialization code may take advantage of it too. Message-Id: <20160323094732.GF2282@scylladb.com>	2016-03-23 11:53:11 +02:00
Shlomi Livne	4ecc37111f	dist/ami: Use the actual number of disks instead of AWS meta service We have seen in some cases that when using the boto api to start instances the aws metadata service http://169.254.169.254/latest/meta-data/block-device-mapping/ returns an incorrect number of disks - work around that by checking the actual number of disks using lsblk Adding a validation at the end verifying that after all computations the NR_IO_QUEUES will not be greater than the number of shards (we had an issue with i2.8x) Fixes: #1062 Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <54c51cd94dd30577a3fe23aef3ce916c01e05504.1458721659.git.shlomi@scylladb.com>	2016-03-23 10:47:08 +02:00
Raphael Carvalho	370b1336fe	service: fix refresh Vlad and I were working on finding the root of the problems with refresh. We found that refresh was deleting existing sstable files because of a bug in a function that was supposed to return the maximum generation of a column family. The intention of this function is to get the generation from the last element of column_family::_sstables, which is of type std::map. However, we were incorrectly using std::map::end() to get the last element, so garbage was being read instead of the maximum generation. If the garbage value is lower than the minimum generation of a column family, then reshuffle_sstables() would set the generation of all existing sstables to a lower value. That would confuse our mechanism used to delete sstables because sstables loaded at boot stage were touched. The solution to this problem is to use rbegin() instead of end() to get the last element from column_family::_sstables. The other problem is that refresh will only load generations that are larger than or equal to X, so new sstables with lower generation will not be loaded. The solution is to create a set with the generations of live SSTables from all shards, and to use this set to determine whether a generation is new or not. The last change was about providing an unused generation to the reshuffle procedure by adding one to the maximum generation. That's important to prevent reshuffle from touching an existing SSTable. Tested 'refresh' under the following scenarios: 1) Existing generations: 1, 2, 3, 4. New ones: 5, 6. 2) Existing generations: 3, 4, 5, 6. New ones: 1, 2. 3) Existing generations: 1, 2, 3, 4. New ones: 7, 8. 4) No existing generation. No new generation. 5) No existing generation. New ones: 1, 2. I also had to adapt the existing test case for the reshuffle procedure. Fixes #1073. Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com> Message-Id: <1c7b8b7f94163d5cd00d90247598dd7d26442e70.1458694985.git.raphaelsc@scylladb.com>	2016-03-23 10:21:58 +02:00
Benoît Canet	1594bdd5bb	dist/ubuntu: Fix the init script variable sourcing The variable sourcing was crashing the init script on ubuntu. Fix it with the suggestion from Avi. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458685099-1160-1-git-send-email-benoit@scylladb.com>	2016-03-23 09:03:17 +02:00
Tomasz Grabiec	5f44afa311	cql3: batch_statement: Execute statements sequentially Currently we execute all statements in parallel, but some statements depend on order, in particular list append/prepend. Fix by executing sequentially. Fixes cql_additional_tests.py:TestCQL.batch_and_list_test dtest. Fixes #1075. Message-Id: <1458672874-4749-1-git-send-email-tgrabiec@scylladb.com>	2016-03-22 20:59:40 +02:00
Pekka Enberg	354fca9d56	Revert "streaming: Simplify session completion logic" This reverts commit `208b7fa7ba`. It breaks Glauber's upcoming repair series.	2016-03-22 20:37:50 +02:00
Pekka Enberg	1f29a698d5	Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE" This reverts commit `4c06221766`. It breaks Glauber's upcoming repair series.	2016-03-22 20:37:22 +02:00
Avi Kivity	7df21768d6	Merge "Fix row_cache_alloc_stress test" from Tomasz "The test predates LSA zones and was not anticipating that LSA would take much more free memory from the system than it needs in its assertions. Fix by accounting for the fact properly."	2016-03-22 18:46:31 +02:00
Avi Kivity	b8f80bb2be	Update scylla-ami submodule * dist/ami/files/scylla-ami 56f1ab7...89e7436 (1): > Merge "iotune packaging fix for scylla-ami" from Takuya	2016-03-22 17:55:00 +02:00
Takuya ASADA	dac2bc3055	dist: on scylla_io_setup, SMP and CPUSET should be empty when the parameter not present Fixes #1060 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1458659928-2050-1-git-send-email-syuu@scylladb.com>	2016-03-22 17:49:06 +02:00
Avi Kivity	8cf785e53a	Merge "Merge "iotune packaging fix" from Takuya "This implements #1065 - iotune will NOT be a part of scylla service - remove the scylla.io.service - User will have to run it manually - using a script called scylla_io_tune_setup (that will do the exact same thing the service does today) - if they won't, and do not use --developer-mode, scylla init will fail with a proper error - scylla will not start (in the same manner it does not start if you run scylla on non XFS FS) - For c3,m3,i2 we will use the evaluation formula we have (that takes the number of disks, cores etc.) - For other instances we will set --developer-mode. if the user logs into the instance - he will get a developer-mode warning - No iotune on AWS" Fixes #1065.	2016-03-22 17:46:32 +02:00
Takuya ASADA	9889712d43	dist: remove scylla-io-setup.service and make it standalone script	2016-03-22 17:45:58 +02:00
Takuya ASADA	2cedab07f2	dist: on scylla_io_setup print out message both for stdout and syslog	2016-03-22 17:45:58 +02:00
Takuya ASADA	83112551bb	dist: introduce dev-mode.conf and scylla_dev_mode_setup	2016-03-22 17:45:58 +02:00
Tomasz Grabiec	a4e3adfbec	Fix assertion in row_cache_alloc_stress Fixes the following assertion failure: row_cache_alloc_stress: tests/row_cache_alloc_stress.cc:120: main(int, char**)::<lambda()>::<lambda()>: Assertion `mt->occupancy().used_space() < memory::stats().free_memory()' failed. memory::stats()::free_memory() may be much lower than the actual amount of reclaimable memory in the system since LSA zones will try to keep a lot of free segments to themselves. Fix by using actual amount of reclaimable memory in the check.	2016-03-22 16:31:04 +01:00
Tomasz Grabiec	a0cba3c86f	logalloc: Introduce tracker::occupancy() Returns occupancy information for all memory allocated by LSA, including segment pools / zones.	2016-03-22 16:28:10 +01:00
Yoav Kleinberger	97bb7a35d9	tools/scyllatop: some sensible default metrics Previously if the user did not specify any metrics, scyllatop used whatever it could find. Now we have some preset defaults which are probably more interesting. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1458658804-377-1-git-send-email-yoav@scylladb.com>	2016-03-22 17:04:13 +02:00
Tomasz Grabiec	529c8b8858	logalloc: Rename tracker::occupancy() to region_occupancy()	2016-03-22 14:56:44 +01:00
Pekka Enberg	5019b709ba	service/migration_manager: Simplify verb unregistration You can safely unregister verbs even if they're not registered yet. Simplify code in migration manager by dropping the redundant checks. Message-Id: <1458027669-6517-1-git-send-email-penberg@scylladb.com>	2016-03-22 15:24:55 +02:00
Pekka Enberg	3e1a660839	Merge seastar upstream * seastar c193821...9f2b868 (4): > memory: set free memory to non-zero value in debug mode > Merge "Increase IOTune's robustness by including a timeout" from Glauber > shared_future: add companion class, shared_promise > rpc: fix client connection stopping	2016-03-22 15:16:21 +02:00
Asias He	4c06221766	streaming: Start to send mutations after PREPARE_DONE_MESSAGE Below are 3 possible cases in a stream session, after commit `208b7fa7ba` (streaming: Simplify session completion logic) We might close the session before the exchange of the PREPARE_DONE_MESSAGE message in case 1). To fix, we defer the sending of mutations after PREPARE_DONE_MESSAGE is sent at the initiator node. 1) Initiator Follower tx rx tx rx 1 0 0 1 send prepare send back prepare recv prepare send mutations (close the session before prepare_done msg is sent) recv mutations (close session before prepare_done msg is received) send prepare_done recv prepare_done and send no mutations 2) Initiator Follower tx rx tx rx 0 1 1 0 send prepare send back prepare recv prepare nothing to send send prepare_done recv prepare_done and send mutations (close session) recv mutations (close session) 3) Initiator Follower tx rx tx rx 1 1 1 1 send prepare send back prepare recv prepare send mutations recv mutations, cannot close session since we have mutations to send send prepare_done recv prepare_done and send mutations (close session) recv mutations (close session) Message-Id: <d6510b558565db23202164fa491b883ef3796e58.1458634037.git.asias@scylladb.com>	2016-03-22 15:05:57 +02:00
Takuya ASADA	6b2a8a2f70	dist: enable collectd on scylla_setup by default, to make scyllatop usable Fixes #1037 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1458324769-9152-1-git-send-email-syuu@scylladb.com>	2016-03-22 15:02:18 +02:00
Tomasz Grabiec	ca08db504b	managed_bytes: Make operator[] work for large blobs as well Fixes assertion in mutation_test: mutation_test: ./utils/managed_bytes.hh:349: blob_storage::char_type* managed_bytes::data(): Assertion `!_u.ptr->next' Introduced in `ea7c2dd085` Message-Id: <1458648786-9127-1-git-send-email-tgrabiec@scylladb.com>	2016-03-22 14:43:52 +02:00
Gleb Natapov	1e6352e398	messaging: do not admit new requests during messaging service shutdown. Sending a message may open new client connection which will never be closed in case messaging service is shutting down already. Fixes #1059 Message-Id: <1458639452-29388-3-git-send-email-gleb@scylladb.com>	2016-03-22 13:00:18 +02:00
Gleb Natapov	357c91a076	messaging: do not delete client during messaging service shutdown Messaging service stop() method calls stop() on all clients. If remove_rpc_client_one() is called while those stops are running, client::stop() will be called twice, which is not supposed to happen. Fix it by ignoring client remove requests during messaging service shutdown. Fixes #1059 Message-Id: <1458639452-29388-2-git-send-email-gleb@scylladb.com>	2016-03-22 13:00:18 +02:00
Asias He	b8abd88841	messaging_service: Take reference of ms in send_message_timeout_and_retry Take a reference of messaging_service object inside send_message_timeout_and_retry to make sure it is not freed during the lifetime of the send_message_timeout_and_retry operation.	2016-03-22 12:32:19 +02:00
Pekka Enberg	ae33e9fe76	dist/ubuntu: Use tilde for release candidate builds The version number ordering rules are different for rpm and deb. Use tilde ('~') for the latter to ensure a release candidate is ordered _before_ a final version. Message-Id: <1458627524-23030-1-git-send-email-penberg@scylladb.com>	2016-03-22 11:52:05 +02:00
Avi Kivity	5a20a70728	Merge "CQL syntax extension to handle sstable loader lists" from Calle "Adds an extension function SCYLLA_TIMEUUID_LIST_INDEX to CQL syntax for collection element indexing, which, if the target is a list, will attempt to directly index the list (which is really a map) by the ordering time uuid (as index parameter)."	2016-03-22 11:42:47 +02:00
Duarte Nunes	36571a2018	init: Trim spaces in seeds list This patch ensures we are resilient against spaces before or after IP addresses in the seeds list. Fixes #958 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1458637617-5761-1-git-send-email-duarte@scylladb.com>	2016-03-22 11:10:29 +02:00
Avi Kivity	1798889e85	Merge "Make apply() exception-safe" from Tomasz "We cannot leave partially applied mutation behind when the write fails. It may fail if memory allocation fails in the middle of apply(). This, for example, would violate write atomicity: readers should either see the whole write or none at all. This fix makes apply() revert partially applied data upon failure, by the means of ReversiblyMergeable concept. In a nutshell the idea is to store old state in the source mutation as we apply it and swap back in case of exception. At cell level this swapping is inexpensive, just rewiring pointers. For this to work, the source mutation needs to be brought into mutable form, so frozen mutations need to be unfrozen. In practice this doesn't increase amount of cell allocations in the memtable apply path because incoming data will usually be newer and we will have to copy it into LSA anyway. There are extra allocations though for the data structures which hold cells. I didn't see significant change in performance of: build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13 The score fluctuates around ~77k ops/s. The change was tested with a unit test (patch to mutation_test) which generates random mutations and injects allocation failures at every possible allocation site in the apply path. This also uncovered other preexisting bugs."	2016-03-22 10:43:41 +02:00
Gleb Natapov	ea92064d38	avoid invoke_on_all during developer-mode application if possible Message-Id: <20160315145327.GW6117@scylladb.com>	2016-03-22 10:40:30 +02:00
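
Note on `370b1336fe` (service: fix refresh): the bug described there comes from reading through `std::map::end()`, which is one past the last element. The sketch below illustrates the `rbegin()` fix and the "maximum plus one" unused generation. It is not the actual Scylla code: `sstable_map`, `max_generation`, `next_unused_generation`, and the choice of 0 for an empty map are hypothetical names and conventions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical stand-in for column_family::_sstables: generation -> sstable name.
using sstable_map = std::map<int64_t, std::string>;

// Returns the largest generation, or 0 for an empty map (an assumed convention).
int64_t max_generation(const sstable_map& sstables) {
    if (sstables.empty()) {
        return 0;
    }
    // rbegin() refers to the last element. end() is one past it and
    // must never be dereferenced, which is what produced the garbage value.
    return sstables.rbegin()->first;
}

// An unused generation for a reshuffle step: one past the current maximum.
int64_t next_unused_generation(const sstable_map& sstables) {
    const int64_t max = max_generation(sstables);
    assert(max < std::numeric_limits<int64_t>::max()); // guard against overflow
    return max + 1;
}
```

Because `std::map` keeps its keys sorted, the last element always carries the largest generation, so no scan is needed.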
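
Note on `36571a2018` (init: Trim spaces in seeds list): a minimal sketch of tolerating spaces around addresses, assuming the seeds value is a comma-separated string. `trim` and `parse_seeds` are hypothetical helpers written for this note, not functions from the Scylla tree, and skipping empty entries is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Strip leading and trailing spaces and tabs from one entry.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split a comma-separated seeds string and trim each address.
// Entries that are empty after trimming are skipped (an assumption).
std::vector<std::string> parse_seeds(const std::string& seeds) {
    std::vector<std::string> result;
    std::string::size_type start = 0;
    while (start <= seeds.size()) {
        auto comma = seeds.find(',', start);
        if (comma == std::string::npos) {
            comma = seeds.size();
        }
        std::string entry = trim(seeds.substr(start, comma - start));
        if (!entry.empty()) {
            // Sanity check: a trimmed entry never starts or ends with a space.
            assert(entry.front() != ' ' && entry.back() != ' ');
            result.push_back(std::move(entry));
        }
        start = comma + 1;
    }
    return result;
}
```

With this sketch, `" 10.0.0.1 ,10.0.0.2 "` and `"10.0.0.1,10.0.0.2"` yield the same two addresses.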
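
Note on `1798889e85` (Make apply() exception-safe): the message describes storing the old state in the source mutation while applying, and swapping back on failure. The sketch below shows that idea on a toy model only. The `cell` struct, the "higher timestamp wins" merge rule, the fixed-size row, and the `fail_at` failure injection are all assumptions made for illustration and do not reflect the real ReversiblyMergeable implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Toy cell: the assumed merge rule is "higher timestamp wins".
struct cell {
    long timestamp;
    std::string value;
};

// Merge `source` into `target` cell by cell. Winning cells are swapped in,
// so the overwritten state is parked in `source`. If a failure occurs
// (injected when the loop reaches index `fail_at`), every swap made so far
// is undone and the exception is rethrown, leaving both rows as they were.
void apply_reversibly(std::vector<cell>& target, std::vector<cell>& source,
                      std::size_t fail_at) {
    assert(target.size() == source.size());
    // Allocated before any mutation so the rollback path cannot itself fail.
    std::vector<bool> swapped(source.size(), false);
    std::size_t i = 0;
    try {
        for (; i < source.size(); ++i) {
            if (i == fail_at) {
                throw std::bad_alloc(); // stand-in for a real allocation failure
            }
            if (source[i].timestamp > target[i].timestamp) {
                std::swap(target[i], source[i]); // old state now lives in source
                swapped[i] = true;
            }
        }
    } catch (...) {
        // Walk back over the cells already processed and restore them.
        while (i-- > 0) {
            if (swapped[i]) {
                std::swap(target[i], source[i]);
            }
        }
        throw;
    }
}
```

Passing a `fail_at` equal to or larger than the row size disables the injected failure, so the same function covers both the success and the rollback path.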


8987 Commits