There's a proper column_family in database.hh now. Remove a stub that
was introduced during the initial conversion.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
From Avi:
Rather rough sharding of the database.
DecoratedKey/Token were de-abstracted (they're concrete types now). This
means that some of their type information can be held in the partitioner,
so anyone playing with tokens needs access to the partitioner.
Sharding is simplistic, using the first byte of the hash as a key to
select the shard (with a modulo operation).
Add database::shard_of() to compute the shard hosting the partition
(with a simplistic algorithm, but perhaps not too bad).
Convert non-metadata invoke_on_all() and local calls on the database
to use shard_of().
We don't follow origin precisely in normalizing the token (converting a
zero to something else). We probably should, to allow direct import of
a database.
Rather than converting to unsigned longs for the fractional computations,
do them in bytes. The overhead of allocating longs will be larger than
the computation, given that tokens are usually short (8 bytes), and
our bytes type stores them inline.
Origin uses abstract types for Token, for two reasons:
1. To create a distinction between tokens for keys and tokens
that represent the end of the range
2. To use different implementations for tokens belonging to different
partitioners.
Using abstract types carries a penalty of indirection, more complex
memory management, and performance. We can eliminate it by using
a concrete type, and defer any differences in the implementation
to the partitioner. End-of-range token representation is folded into
the token class.
This initial attempt at sharding broadcasts all writes while directing
reads to the local shards.
Further refinements will keep schema updates broadcast, but will unicast
row mutations to their shard (and conversely convert row reads from reading
the local shard to unicast as well).
Of particular interest is the change to the thrift handler, where a
sequential application of mutations is converted to parallel application,
in order to hide SMP latency and improve batching.
s/database/distributed<database>/ everywhere.
Use simple distribution rules: writes are broadcast, reads are local.
This causes tremendous data duplication, but will change soon.
A CPU may automatically prefetch the next cache line, so if statistics
collected on different CPUs reside on adjacent cache lines, a CPU may
erroneously prefetch a cache line that is guaranteed to be accessed by
another CPU. Fix this by putting a cache line of padding between the two
structures.
If data buffer decoupling from the rte_mbuf is not available (because
hugetlbfs is not available), copy the newly received data into a memory
buffer we allocate and build the "packet" object from this buffer. This
lets us return the rte_mbuf immediately, which solves the same issue that
"decoupling" solves when hugetlbfs is available.
The implementation is simplistic (no preallocation, packet data cache alignment, etc.).
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Allocate the data buffers instead of using the default inline rte_mbuf
layout.
- Implement an rx_gc() and add an _rx_gc_poller to call it: we refill the
rx mbufs when there are at least 64 free buffers.
This threshold was chosen as a sane enough number.
- Introduce the mbuf_data_size == 4K. Allocate 4K buffers for a detached flow.
We are still going to allocate 2K data buffers for an inline case since 4K
buffers would require 2 pages per mbuf due to "mbuf_overhead".
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
std::vector promises contiguous storage while std::deque does not.
In addition, std::vector's semantics yield simpler code than deque's.
Therefore std::vector should deliver better performance for the stack
semantics we need here.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Take into account the alignment, header, and trailer that the mempool adds
to the elements.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
With replication, we want the contents of the mutation to be available
to multiple replicas.
(In this context, we will replicate the mutation to all shards in the same
node, as a temporary step in sharding a node; but the issue also occurs
when replicating to other nodes).
Futures hold either a value or an exception; thrift uses two separate
function objects to signal completion, one for success, the other for
an exception.
Add a helper to pass the result of a future to either of these.
When using print() to debug on smp, it is very annoying to get interleaved
output.
Fix by wrapping stdout with a fake stream that has a line buffer for each
thread.
Our file_stream interface supports seek, but when we try to seek to
arbitrary locations that are smaller than an aio boundary (for instance,
f->seek(4)), we end up unable to perform the read.
We need to guarantee the reads are aligned, and will then present to the caller
the buffer properly offset.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Current variants of distributed<T>::invoke_on() require a member function to
invoke, which may be tedious to implement for some cases. Add a variant
that supports invoking a functor, accepting the local instance by reference.
Some of the core functions accept functions returning either an immediate
type, or a future, and return a future in either case (e.g. smp::submit_to()).
To make it easier to metaprogram with these functions, provide a utility
that computes the return type, futurize<T>:
futurize_t<bar> => future<bar>
futurize_t<void> => future<>
futurize_t<future<bar>> => future<bar>
- tcp.hh: Properly calculate the pseudo-header in the TSO case: it should be
calculated as if ip_len is zero.
- Enable TSO in the DPDK network backend.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Fix obvious bottlenecks in mutations, from Tomasz:
"These changes improve throughput in perf_mutation test on my laptop 20 times,
from ~120K to 2.4M tps."
deserialize_value() is slow because it involves multiple allocations
and copies. Internal operations such as compare() or hash() don't need
such heavy transformations; now that those functions work on
bytes_view we can iterate over component values in place.