scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 12:06:44 +00:00

Author	SHA1	Message	Date
Avi Kivity	8ce9697401	dhcp: wrap initializers with braces to prevent ambiguity	2014-11-26 14:59:49 +02:00
Avi Kivity	58487b55d4	smp: massage init captures to satisfy clang	2014-11-26 14:59:03 +02:00
Avi Kivity	44c3e9fc04	collectd: wrap initializers with braces Helps prevent ambiguity with constructors that accept multiple parameters.	2014-11-26 14:57:58 +02:00
Avi Kivity	c30b3e93c2	reactor: massage collectd registrations to satisfy clang Warns of an unused variable, even though the destructor has side effects.	2014-11-26 14:57:02 +02:00
Avi Kivity	9ab5dce5c4	memory: fix throw specifiers on sized delete Noticed by clang.	2014-11-26 14:56:40 +02:00
Avi Kivity	9c7fc9d5d1	memcache: massage init capture to satisfy clang	2014-11-26 14:56:18 +02:00
Avi Kivity	05e8ee5e0c	memcache: remove unneeded use of variable length array Noticed by clang.	2014-11-26 14:55:30 +02:00
Avi Kivity	239f4a3bf5	memcache: remove unused subdevice::_length Noticed by clang.	2014-11-26 14:55:01 +02:00
Asias He	1a1ff2a22a	tcp: Fix get_isn It should be microseconds instead of milliseconds. Signed-off-by: Asias He <asias@cloudius-systems.com>	2014-11-26 13:26:54 +02:00
Asias He	fecf47b50a	tcp: Defending against sequence number attacks This patch implements initial sequence number generation algorithm per RFC6528.	2014-11-26 12:34:16 +02:00
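The two tcp commits above concern initial sequence number generation. As a rough illustration of the RFC 6528 formula, ISN = M + F(localip, localport, remoteip, remoteport, secretkey), where M is a timer that ticks every 4 microseconds, here is a minimal sketch. The names `four_tuple`, `toy_keyed_hash` and `make_isn` are hypothetical and are not taken from the repository. The hash is a non-cryptographic stand-in, whereas the RFC calls for a keyed cryptographic hash.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical connection identifier; field names are illustrative only.
struct four_tuple {
    uint32_t local_ip;
    uint16_t local_port;
    uint32_t remote_ip;
    uint16_t remote_port;
};

// Stand-in for F(). RFC 6528 asks for a cryptographic hash keyed with a
// secret; this FNV-1a style mixer is NOT secure and only keeps the sketch
// self-contained.
inline uint32_t toy_keyed_hash(const four_tuple& t, uint64_t secret) {
    uint64_t h = 14695981039346656037ull ^ secret;
    auto mix = [&h](uint64_t v) {
        for (int i = 0; i < 8; ++i) {
            h ^= (v >> (8 * i)) & 0xff;
            h *= 1099511628211ull;
        }
    };
    mix(t.local_ip);
    mix(t.local_port);
    mix(t.remote_ip);
    mix(t.remote_port);
    return static_cast<uint32_t>(h ^ (h >> 32));
}

// ISN = M + F(...). M advances by one every 4 microseconds, so the clock
// passed in must be in microseconds, not milliseconds.
inline uint32_t make_isn(const four_tuple& t, uint64_t secret,
                         std::chrono::microseconds now) {
    uint32_t m = static_cast<uint32_t>(now.count() / 4);
    return m + toy_keyed_hash(t, secret);  // wraps modulo 2^32
}
```

For a fixed connection and secret, two calls 4 microseconds apart yield sequence numbers that differ by exactly one. Feeding the function a millisecond count would make M advance a thousand times too slowly.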
Gleb Natapov	01e9410adc	smp: move thread creation sync point after start_all_queues() Configure all smp queues before calling engine.configure() so that engine.configure() may use submit_to() api. Note that messages will still be processed only after engine.run() is executed.	2014-11-26 12:20:04 +02:00
Gleb Natapov	cee8eb3121	net: remove unused function from net/native-stack.hh	2014-11-26 12:19:47 +02:00
Avi Kivity	33ed01d354	Merge branch 'flashcache' of github.com:cloudius-systems/seastar-dev
From Raphael:
"Flashcache is basically an extension of memcache where a flash device is used to achieve a considerably higher cache hit ratio (~130x better).

Flashcache major additions:
-----
* Flashcache device length is divided by the number of CPUs, where each portion is then assigned to a per-cpu cache.
* Let me readily mention that items aren't stored on disk, but instead data from items. Keys always remain stored in memory.
* Each item has now a state field that describes its status.
* Each item can be in any of the following states:
  - MEM (Item is stored only in memory)
  - TO_MEM_DISK (Transition from MEM to MEM_DISK state)
  - MEM_DISK (Item is stored both in memory and on disk)
  - DISK (Item is stored only on disk)
  - ERASED (Item was invalidated)
* Algorithm added to balance items between MEM and MEM_DISK state.
* Three LRU lists were added to keep track of MEM, MEM_DISK and DISK items.
* When item is ERASED, it shouldn't be in any of the lists above.
* When the working set fits memory, items should only be stored in MEM and MEM_DISK lists.
* Upon a SET request, the ratio of MEM and MEM_DISK (MEM_DISK / (MEM + MEM_DISK)) is taken into account to decide whether or not a LRU item should be moved to MEM_DISK state (consists of scheduling a LRU item to be stored on disk, where its data field remains intact).
* Before an item is scheduled to be moved to MEM_DISK state, it's set to the transition state called TO_MEM_DISK. Why? It's basically to handle client requests on transitioning items. Example: For get requests, let's only provide the data given that the data remains intact.
* Upon memory pressure, a specialized reclaiming function is called to do the following: get a LRU item from MEM_DISK list that has no readers (i.e. refcount is zero); remove it from MEM_DISK list, erase the data; set its state to DISK; The steps above are executed repeatedly until the request amount of memory reclaimed is satisfied.
* Upon a GET request on a DISK item, a per-item semaphore is used to guarantee that the first request will proceed with the loading of the data from the flash device, while the others wait for the process to complete.
* ERASED state is used to inform flashcache that an item was invalidated and thus shouldn't be moved to any list. E.G. invalidation request could happen while the data from an item is being loaded from disk.

Result:
-----
Performance is worse (unfortunate but also expected because of time waiting for items to be loaded) but hit ratio is considerably better as also expected.
I'm thinking of adding a new state for items called LOADED that, when the data from the item is loaded from disk, mark the item as LOADED; insert it into MEM list; and schedule an item from MEM list to be moved to MEM_DISK list. That may bring a good performance benefit, no data to back up my claim though. By the time being, an item loaded is directly moved to MEM_DISK list (as its data is already stored on disk), where it then could be quickly evicted upon a memory pressure.

$ sudo ./memcached --stats --device /dev/sdb --mem 600M (POSIX stack)

* MEMCACHE - TCP:
$ memaslap -T 4 -s 127.0.0.1 -t 60s -c 256
servers : 127.0.0.1
threads count: 4
concurrency: 256
run time: 60s
windows size: 10k
set proportion: set_prop=0.10
get proportion: get_prop=0.90
cmd_get: 6310281
cmd_set: 701266
get_misses: 1783262
written_bytes: 1216572735
read_bytes: 5039263122
object_bytes: 762977408
Run time: 60.0s Ops: 7011547 TPS: 116837 Net_rate: 99.4M/s

* FLASHCACHE - TCP:
$ memaslap -T 4 -s 127.0.0.1 -t 60s -c 256
servers : 127.0.0.1
threads count: 4
concurrency: 256
run time: 60s
windows size: 10k
set proportion: set_prop=0.10
get proportion: get_prop=0.90
cmd_get: 3067576
cmd_set: 340959
get_misses: 13576
written_bytes: 591452430
read_bytes: 3392472330
object_bytes: 370963392
Run time: 60.0s Ops: 3408535 TPS: 56804 Net_rate: 63.3M/s"	2014-11-25 13:42:14 +02:00
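The item states and the memory-pressure reclaim step described in the flashcache merge message can be sketched as follows. This is a simplified model, not the code from the merged branch: `item`, `reclaim` and the list layout are hypothetical names. Placing the least recently used item at the front of the list and moving reclaimed items onto a DISK list are assumptions drawn from the mention of three LRU lists.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// The five states named in the merge message.
enum class item_state { MEM, TO_MEM_DISK, MEM_DISK, DISK, ERASED };

struct item {
    std::string key;        // keys always stay in memory
    std::string data;       // in-memory copy of the value; empty once DISK-only
    item_state state = item_state::MEM;
    unsigned refcount = 0;  // readers currently using `data`
};

// Walk the MEM_DISK list from its least recently used end (assumed to be the
// front), skipping items that still have readers. For each victim: drop the
// in-memory data, mark it DISK, and move it to the DISK list (assumed).
// Stops once at least `wanted` bytes were reclaimed or the list is exhausted.
inline std::size_t reclaim(std::list<item*>& mem_disk_lru,
                           std::list<item*>& disk_lru,
                           std::size_t wanted) {
    std::size_t reclaimed = 0;
    auto it = mem_disk_lru.begin();
    while (it != mem_disk_lru.end() && reclaimed < wanted) {
        item* victim = *it;
        if (victim->refcount != 0) {
            ++it;  // has readers, leave it in memory
            continue;
        }
        reclaimed += victim->data.size();
        victim->data.clear();
        victim->data.shrink_to_fit();
        victim->state = item_state::DISK;
        disk_lru.push_back(victim);
        it = mem_disk_lru.erase(it);
    }
    return reclaimed;
}
```

Only MEM_DISK items are candidates because their data already exists on the flash device, so dropping the in-memory copy loses nothing.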
Raphael S. Carvalho	35f37a4235	memcache: generate flashcache flashcached.cc and memcached.cc files were created to generate flashcached and memcached respectively through a template parameter.	2014-11-25 09:10:33 -02:00
Raphael S. Carvalho	300b310a27	memcache: move ./memcached.cc to ./memcache.cc Actual purpose is explained by the subsequent commit.	2014-11-25 09:10:33 -02:00
Raphael S. Carvalho	087038bd47	memcache: flashcache integration flashcached isn't generated by the build process yet, please check subsequent commits.	2014-11-25 09:05:13 -02:00
Avi Kivity	8b632ca2fb	Revert bogus allocator/deleter commits Reverts: `e0df395124` "Add make_free_deleter" `cfd8a1f997` "core: special-case deleter for raw memory"" `5eaecc8805` "Use default allocator" Introduced accidentally.	2014-11-25 12:06:13 +02:00
Avi Kivity	9eea1752b0	Merge branch 'asias/tcp' of github.com:cloudius-systems/seastar-dev TCP improvements from Asias.	2014-11-25 11:58:47 +02:00
Asias He	bd0849f40b	tcp: Send ACK immediately when segment fills a gap arrives See RFC5681: 3.2. Fast Retransmit/Fast Recovery for more details. """ In addition, a TCP receiver SHOULD send an immediate ACK when the incoming segment fills in all or part of a gap in the sequence space. """	2014-11-25 16:31:42 +08:00
Asias He	e1f4499b28	tcp: Send ACK immediately when out of order segment arrives See RFC5681: 3.2. Fast Retransmit/Fast Recovery for more details.	2014-11-25 15:59:48 +08:00
Avi Kivity	49c17db25e	Merge branch 'glommer/xen' of github.com:cloudius-systems/seastar-dev From Glauber: "Before those patches, Xen was not surviving a full round of wrk. Now it survives a 20min one. That doesn't mean it is devoid of bugs: I am still seeing some warnings being generated, so there is definitely more work to do. But at least it doesn't crash and is stable. Performance wise, Xen+OSv fares at 32k req/sec in my laptop, where lwan does 45k"	2014-11-25 09:55:12 +02:00
Gleb Natapov	8a754386c2	net: remove unused variable in native_network_stack	2014-11-25 09:54:44 +02:00
Asias He	e14674ff3c	tcp: Improve merge_out_of_order In case of seg_beg > _rcv.need, we can stop looking since seg_beg can grow only.	2014-11-25 15:44:59 +08:00
Asias He	2a3ce92b19	tcp: Reduce maximum delayed timer The maximum delayed ack timer allowed by RFC1122 is 500ms, most implementations use 200ms by default, including Windows and Linux.	2014-11-25 15:39:33 +08:00
Glauber Costa	dd8c5a3521	xen: fix index calculation The xen protocol works by filling positions in a circular ring. The indexes become free to be used again when they are processed by the other side. There is a problem, however: those indexes must be sequential, because all the two sides share is a produced / consumed index. But there are situations in which we call get_index(), which produces an index X, but the .then() clause schedules some other caller of send() to run in our place. That one, in turn, can call get_index(), then create a packet with index X + 1 that will be put in the ring before the packet with index X. If the other end processes this packet very fast, it will respond saying "I have processed packets up to X + 1". We will act on it by marking X as processed as well, since it comes before X + 1, and when X is really processed, chaos will ensue. The solution for that is to just have the semaphore count how many spaces we have in the ring. Once we guarantee that the current caller has space, we then compute get_index() inside the .then() clause. This works well because the indexes are all sequential anyway. For the same reason, we are actually able to remove the queue, and resort to a simple counter. Once we know there is room, we just get the next index, whatever it may be. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 17:01:14 +01:00
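The ordering fix described in the commit above can be modelled without any asynchronous framework. The sketch below is a synchronous toy, not the Xen front-end code: `tx_ring_slots`, `with_slot` and `release` are hypothetical names, and a plain queue of callbacks stands in for the semaphore and `.then()` continuation. The point it illustrates is that an index is taken only at the moment a slot is granted, so indexes are handed out in the same order in which packets enter the ring.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

class tx_ring_slots {
    unsigned _free;      // plays the role of the semaphore: free ring positions
    unsigned _next = 0;  // simple counter that replaces the queue of free indexes
    std::deque<std::function<void(unsigned)>> _waiters;

    void grant(std::function<void(unsigned)>& fn) {
        --_free;
        fn(_next++);  // index is computed only now, after space is guaranteed
    }
public:
    explicit tx_ring_slots(unsigned size) : _free(size) {}

    // Run fn(index) as soon as a ring slot is available; otherwise queue it.
    // Wrapping the index to the ring size is left out of this sketch.
    void with_slot(std::function<void(unsigned)> fn) {
        if (_free > 0 && _waiters.empty()) {
            grant(fn);
        } else {
            _waiters.push_back(std::move(fn));
        }
    }

    // The other side reported that it consumed n entries.
    void release(unsigned n) {
        _free += n;
        while (_free > 0 && !_waiters.empty()) {
            auto fn = std::move(_waiters.front());
            _waiters.pop_front();
            grant(fn);
        }
    }
};
```

Had the index been taken before waiting for space, a later sender could overtake an earlier one and place index X + 1 in the ring ahead of X, which is the failure the commit message describes.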
Glauber Costa	3f67c12925	xen: make idx method static It does not depend on any instance member. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:44:05 +01:00
Glauber Costa	3c195d25e6	xen: useful assert We can't reach this place with a negative ref id, so let's assert to make sure we're fine. Helps catch some bugs. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:42:55 +01:00
Glauber Costa	fa252087c4	xen: use the right index The index in the ring and the packet id tend to be the same, but they don't have to be. There are some situations where the backend and the frontend get out of sync with this, and this is totally valid. One example is when the backend skb already has enough room to hold all of the data being transmitted (netback.c, line 1611 @3.16). The netback will respond immediately, even though there are other pending packets that are not yet fully processed. The ring index, then, must come from the rsp value, not from the req/rsp id. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:38:38 +01:00
Asias He	e0df395124	Add make_free_deleter	2014-11-24 18:16:25 +08:00
Asias He	cfd8a1f997	Revert "core: special-case deleter for raw memory" This reverts commit `f75d1822cc`.	2014-11-24 18:16:25 +08:00
Asias He	5eaecc8805	Use default allocator	2014-11-24 18:16:25 +08:00
Asias He	35186f659a	tcp: Fix transmission When a bulk of data is passed from the user application, the TCP layer calls output only once to send data. This slows TX a lot, because output sends at most MSS bytes of data while we might have far more than MSS to send. We send again only after the remote side acks the data we just sent. This slowness can be seen easily with tso turned off. To fix, we should send as much as we are allowed to. This patch boosts TX bandwidth from 0.N MiB/Sec to hundreds of MiB/Sec. Before: [asias@hjpc pingpong]$ go run client-txtx.go Server: 192.168.66.123:10000 Connections: 1 Bytes Received(MiB): 10 Total Time(Secs): 76.217338072 Bandwidth(MiB/Sec): 0.13120374252054473 After: [asias@hjpc pingpong]$ go run client-txtx.go Server: 192.168.66.123:10000 Connections: 1 Bytes Received(MiB): 100 Total Time(Secs): 0.5105951040000001 Bandwidth(MiB/Sec): 195.84989988466475	2014-11-24 11:54:33 +02:00
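The "send as much as we are allowed to" idea from the commit above amounts to looping over the unsent data instead of emitting a single segment per call. A minimal sketch, assuming a hypothetical helper `segments_to_send` and treating "allowed" as a single byte budget `window` (in a real stack this would be derived from the peer's receive window and the congestion window):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes of the segments one output pass would emit.
// unsent: bytes queued by the application and not yet transmitted
// mss:    maximum segment size
// window: bytes the sender is currently allowed to put on the wire (assumed)
inline std::vector<std::size_t> segments_to_send(std::size_t unsent,
                                                 std::size_t mss,
                                                 std::size_t window) {
    std::vector<std::size_t> segments;
    if (mss == 0) {
        return segments;  // nothing sensible to send without a valid MSS
    }
    std::size_t allowed = std::min(unsent, window);
    // Keep emitting until the budget is used up, rather than stopping after
    // the first MSS-sized segment and waiting for an ACK.
    while (allowed > 0) {
        std::size_t len = std::min(allowed, mss);
        segments.push_back(len);
        allowed -= len;
    }
    return segments;
}
```

With 4000 unsent bytes, an MSS of 1460 and a large window, this yields three segments in one pass instead of one segment per round trip.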
Asias He	18565277f3	tcp: Add congestion control support This patch adds congestion control to our TCP according to RFC5681. These four algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery, are added. Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>	2014-11-24 11:54:19 +02:00
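The four algorithms named in the commit above come down to a few window updates in RFC 5681. The sketch below follows the RFC's formulas rather than the code in this commit. `cc_state` and the function names are hypothetical, and retransmission timeouts and window inflation on further duplicate ACKs are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct cc_state {
    uint32_t smss;      // sender maximum segment size, in bytes
    uint32_t cwnd;      // congestion window, in bytes
    uint32_t ssthresh;  // slow start threshold, in bytes
};

// An ACK arrived that acknowledges `acked_bytes` of new data.
inline void on_new_ack(cc_state& s, uint32_t acked_bytes) {
    if (s.cwnd < s.ssthresh) {
        // Slow start: cwnd += min(N, SMSS)
        s.cwnd += std::min(acked_bytes, s.smss);
    } else {
        // Congestion avoidance: cwnd += SMSS * SMSS / cwnd, at least 1 byte
        s.cwnd += std::max<uint32_t>(1, s.smss * s.smss / s.cwnd);
    }
}

// Third duplicate ACK: fast retransmit, then enter fast recovery.
inline void on_third_dup_ack(cc_state& s, uint32_t flight_size) {
    s.ssthresh = std::max<uint32_t>(flight_size / 2, 2 * s.smss);
    s.cwnd = s.ssthresh + 3 * s.smss;
}

// First ACK of new data during fast recovery: deflate the window.
inline void on_recovery_exit(cc_state& s) {
    s.cwnd = s.ssthresh;
}
```

In slow start the window grows by up to one segment per ACK, which roughly doubles it every round trip. Past `ssthresh` the growth drops to about one segment per round trip.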
Avi Kivity	347b135b78	memory: rename statistics member functions to be more readable Requested by Gleb.	2014-11-24 11:53:13 +02:00
Avi Kivity	f32a8be723	memory: make statistics thread local Noticed by Gleb.	2014-11-24 11:50:07 +02:00
Avi Kivity	3b69d76f06	Merge branch 'stats' Add statistics for memory allocations and httpd.	2014-11-24 10:00:34 +02:00
Avi Kivity	88b38bfbdf	Revert "virtio: Lazy interrupts" This reverts commit 817023f91741e43731823e72d60800016cbf2633; causes hangs and throughput problems.	2014-11-24 09:28:41 +02:00
Vlad Zolotarov	1238807d98	net: implement a few proper constructors for ethernet_address Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-23 23:26:54 +02:00
Vlad Zolotarov	80396b5153	packet: Use const_cast The original cast gave an error with -Werror=cast-qual Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-23 23:25:38 +02:00
Avi Kivity	0903e36179	httpd: export statistics via collectd	2014-11-23 19:28:54 +02:00
Avi Kivity	a7f14fa13e	core: export memory statistics via collectd	2014-11-23 19:28:26 +02:00
Avi Kivity	5cd200831b	memory: add allocation statistics collection	2014-11-23 19:28:07 +02:00
Glauber Costa	5f82f7296f	xen: debug functions This is being very helpful for my local debugging. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-20 16:57:16 +01:00
Glauber Costa	bc617f8340	make xen work again. After the latest reactor rework from Nadav, it is no longer allowed to use eventfds in the reactor for OSv. Change the code to use the reactor notifier instead. We could just use that instead of semaphores altogether. But because the semaphore is per listener, we need a translation anyway. So let's keep this one doing the interrupt processing, and the semaphores doing the rest of the work. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com> Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-11-20 14:59:25 +02:00
Glauber Costa	af01032450	make xen work again. After the latest reactor rework from Nadav, it is no longer allowed to use eventfds in the reactor for OSv. Change the code to use the reactor notifier instead. We could just use that instead of semaphores altogether. But because the semaphore is per listener, we need a translation anyway. So let's keep this one doing the interrupt processing, and the semaphores doing the rest of the work. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-20 13:41:20 +01:00
Tomasz Grabiec	e85371112b	memcached: optimize handle_get() We don't really need to copy keys as the parser is not reused until we're done. Also, in case of a single key we don't use map_reduce() which saves us one allocation (reducer).	2014-11-20 13:25:07 +02:00
Tomasz Grabiec	7737871ea0	memcached: move data rather than copy it	2014-11-20 13:25:06 +02:00
Avi Kivity	37bf4898e3	allocator_test: limit runtime to 5 seconds	2014-11-20 13:04:03 +02:00
Avi Kivity	cabdf1dcbc	allocator_test: reindent	2014-11-20 12:37:18 +02:00
Avi Kivity	3d0724c4c1	Merge branch 'xen'	2014-11-20 12:33:45 +02:00


53948 Commits