Commit Graph

1067 Commits

Author SHA1 Message Date
Avi Kivity
56684936e6 Merge branch 'thrift' into db
This patchset adds a simple metadata and data store, and wires the
system_add_keyspace RPC to set it up.

With this, cassandra-stress is able to initialize the database (but not
store anything in it).
2014-12-24 09:41:50 +02:00
Avi Kivity
a7360a3ce1 thrift: implement set_keyspace RPC 2014-12-24 09:40:52 +02:00
Avi Kivity
6c6f5c2099 cassandra.thrift: add copyright and change notices
The Apache license requires us to make a note on any changed file.
2014-12-23 19:51:56 +02:00
Avi Kivity
e16345a8d1 licenses: add Apache License 2.0 for code copied from upstream Cassandra 2014-12-23 19:51:18 +02:00
Avi Kivity
72ab87f41f thrift: support system_add_keyspace 2014-12-23 18:41:29 +02:00
Avi Kivity
4a3f3847e8 thrift: support set_cql_version 2014-12-23 18:41:29 +02:00
Avi Kivity
a49fd99327 thrift: create a database and pass it to the server
Not sharded yet.
2014-12-23 18:41:29 +02:00
Avi Kivity
641c859903 db: add in-memory database
Simplistic database using std::map<> to hold rows, and boost::any to
hold values.

Supports:
  - multiple key spaces
  - multiple column families
  - a few data types

Does not support:
  - container data types
  - secondary indexes
  - composites
  - validators
2014-12-23 18:41:29 +02:00
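The nesting that commit 641c859903 describes (key spaces containing column families containing rows of loosely typed values) might be sketched roughly as follows. This is a minimal illustration, not the code from the commit: every type and function name here is hypothetical, and std::any (C++17) is swapped in for boost::any so that the sketch needs only the standard library.

```cpp
#include <any>
#include <map>
#include <string>
#include <utility>

// Hypothetical names throughout; std::any stands in for boost::any.
using row = std::map<std::string, std::any>;             // column name -> value
using column_family = std::map<std::string, row>;        // row key -> row
using keyspace = std::map<std::string, column_family>;   // column family name -> column family

struct database {
    std::map<std::string, keyspace> keyspaces;           // keyspace name -> keyspace

    // Store one cell, creating the keyspace, column family, and row on demand.
    void put(const std::string& ks, const std::string& cf,
             const std::string& key, const std::string& column, std::any value) {
        keyspaces[ks][cf][key][column] = std::move(value);
    }

    // Return a pointer to the stored cell, or nullptr if any level is missing.
    const std::any* get(const std::string& ks, const std::string& cf,
                        const std::string& key, const std::string& column) const {
        auto k = keyspaces.find(ks);
        if (k == keyspaces.end()) return nullptr;
        auto c = k->second.find(cf);
        if (c == k->second.end()) return nullptr;
        auto r = c->second.find(key);
        if (r == c->second.end()) return nullptr;
        auto v = r->second.find(column);
        return v == r->second.end() ? nullptr : &v->second;
    }
};
```

A caller recovers the concrete type with std::any_cast, which is also where the lack of validators shows: nothing stops two rows from storing different types under the same column name.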
Avi Kivity
55b983159b Merge branch 'master' of github.com:cloudius-systems/seastar into db 2014-12-23 18:37:34 +02:00
Avi Kivity
133d39131c net: fix const correctness for byte-order functions
The byte-order functions were changed not to do in-place conversions,
but they still accept non-const inputs, although they do not modify them.
This can make them harder to use in some cases.

Fix by marking the inputs const.
2014-12-23 17:48:16 +02:00
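The benefit of the const change in commit 133d39131c can be illustrated with a small sketch. The struct and function names below are hypothetical and are not the functions touched by the commit; the point is only that a conversion which returns a new value, and takes its input by const reference, can be called on const objects and temporaries.

```cpp
#include <cstdint>

// Hypothetical header fragment holding two 16-bit fields.
struct port_pair {
    std::uint16_t src;
    std::uint16_t dst;
};

// Unconditional 16-bit byte swap (a real ntoh/hton would be a no-op on
// big-endian hosts; this sketch ignores host endianness).
inline std::uint16_t byteswap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Not an in-place conversion: the input is const and a converted copy is returned.
inline port_pair swap_ports(const port_pair& p) {
    return port_pair{byteswap16(p.src), byteswap16(p.dst)};
}
```

Had swap_ports() taken a non-const reference, neither a const port_pair nor a temporary such as port_pair{1, 2} could be passed to it, even though the function never modifies its argument.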
Gleb Natapov
510171d083 net: add function to map packet's rss hash to a cpu
Provide a function that maps a packet's rss hash to the cpu that should
handle it. This function is needed to find an appropriate src port for an
outgoing tcp/udp connection. Also use this function to forward a
de-fragmented ip packet, avoiding one extra hop.
2014-12-23 17:36:40 +02:00
Vlad Zolotarov
db50b480a3 dpdk: check_port_link_status(): Cosmetic fix of a printout.
Add a space after "Checking link status" to prevent it from
merging with "done" if the link is up immediately.
For instance, this is the case for a VF of a PF with an
already established link (e.g. on AWS).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:55:50 +02:00
Vlad Zolotarov
1a6474d6cc dpdk: added the asserts to check the assumptions regarding CSUM features
We assume that the Rx IPv4, TCP and UDP checksum offload features are either
all supported or all unsupported. The same applies to the Tx UDP and TCP
checksum offloads.

Add an assert that checks this assumption.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:55:44 +02:00
Vlad Zolotarov
38781639ef dpdk: Use all available parser options for RSS.
Don't limit ourselves to just IPV4, TCP and UDP even if it's all we currently
care about.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:55:38 +02:00
Vlad Zolotarov
02dd7a3e24 packet: Change the type of offload_info.vlan_tci to std::experimental::optional
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:51:05 +02:00
Vlad Zolotarov
c9e0e7aff8 dpdk: Set RSS mode: enable RSS if seastar is configured with more than 1 CPU.
Even if a port has a single queue we still want the RSS feature to be
available in order to make the HW calculate the RSS hash for us.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:50:28 +02:00
Vlad Zolotarov
15e432715a dpdk: Use DPDK provided default configurations for Rx and Tx queues parameters.
DPDK 1.8 provides per-device default Tx and Rx queue configurations in the output
of rte_eth_dev_info_get(). Use them instead of the ixgbe-tuned hardcoded values.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-23 16:48:31 +02:00
Vlad Zolotarov
51bb90a397 dpdk: Don't print the MAC address from the hw_address() method.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:37:18 +02:00
Vlad Zolotarov
2b4f9f69f8 dpdk: Make the port initialization stages more pronounced
- Rename: init_port() -> init_port_start().
- Add a function init_port_fini() containing the code that was originally
  inline in init_local_queue().
- Move the link state check to init_port_fini() since the link state should
  be checked after the port has been started.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:37:13 +02:00
Vlad Zolotarov
59403f0774 dpdk: First version that supports both 1.7.x and 1.8.x (current git master) DPDK versions.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:37:05 +02:00
Vlad Zolotarov
11c54bd1d5 dpdk: Change the default behavior when --dpdk-pmd is set
1) Make the --dpdk-pmd parameter a flag instead of a (key, value) pair.
2) Fall back to the default hugetlbfs DPDK settings when --hugepages is not
   given and --dpdk-pmd is set.

This will allow a more friendly user experience in general and when one doesn't
want to provide a --hugepages parameter in particular.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:36:58 +02:00
Vlad Zolotarov
ddf239a943 dpdk: Move the scattered DPDK EAL initialization into the dpdk::eal.
- Move the smp::dpdk_eal_init() code into dpdk::eal::init(), where it belongs.
- Remove the unused "opts" parameter of the dpdk::dpdk_device constructor; all of
  its uses have been moved to dpdk::eal::init().
- Cleanup in reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK, since we pass the
  -DHAVE_DPDK option to the compiler.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:36:49 +02:00
Vlad Zolotarov
7ec062e222 dpdk: Move dpdk_eal class into a separate file
- Make its methods static.
- Rename dpdk::dpdk_eal -> dpdk::eal

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:36:42 +02:00
Avi Kivity
094329a95e Move apps/seastar/* to top level 2014-12-22 16:15:37 +02:00
Avi Kivity
5e9c815ee5 Merge branch 'thrift' into db
This patchset adds support for an asynchronous thrift service serving the
Cassandra interface.

The implementation uses the "continuation object style" thrift code generation
option, which can be readily adapted to the promise/future interface.

With this, one can connect with a client and see this:

$ ./cassandra-stress  insert -node localhost
Exception in thread "main" java.lang.RuntimeException: org.apache.thrift.TApplicationException: sorry, not implemented
	at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:139)
	at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:109)
	at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesThrift(SettingsSchema.java:112)
	at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:60)
	at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:200)
	at org.apache.cassandra.stress.StressAction.run(StressAction.java:57)
	at org.apache.cassandra.stress.Stress.main(Stress.java:109)
Caused by: org.apache.thrift.TApplicationException: sorry, not implemented
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_set_cql_version(Cassandra.java:1896)
	at org.apache.cassandra.thrift.Cassandra$Client.set_cql_version(Cassandra.java:1883)
	at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:128)
	... 6 more
2014-12-22 15:02:42 +02:00
Avi Kivity
619da6da1d thrift: wire up thrift connection handler and thrift server
With this, a Cassandra client can connect (only to receive an
unimplemented exception immediately).
2014-12-22 13:38:42 +02:00
Avi Kivity
47a4084b81 thrift: basic server
As thrift does not support pipelining, the server is very simple.  It
implements the thrift framed transport, where each message is preceded
by a four-byte message size header.
2014-12-22 10:33:45 +02:00
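The framing described in commit 47a4084b81 can be sketched as below. This is an illustrative sketch rather than the server code from the commit: the function names are hypothetical, and the header is assumed to be a big-endian (network byte order) 32-bit length, as Thrift's framed transport uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Prefix a payload with its size as a four-byte big-endian integer.
std::string frame_message(const std::string& payload) {
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xff));
    out.push_back(static_cast<char>((n >> 16) & 0xff));
    out.push_back(static_cast<char>((n >> 8) & 0xff));
    out.push_back(static_cast<char>(n & 0xff));
    out += payload;
    return out;
}

// Extract one complete message from the front of buf, consuming it.
// Returns std::nullopt (and leaves buf untouched) when more bytes are needed.
std::optional<std::string> try_read_frame(std::string& buf) {
    if (buf.size() < 4) return std::nullopt;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<unsigned char>(buf[i]);
    }
    if (buf.size() - 4 < n) return std::nullopt;
    std::string msg = buf.substr(4, n);
    buf.erase(0, 4 + static_cast<std::size_t>(n));
    return msg;
}
```

Because there is no pipelining, a server loop built on this only has to read one frame, process it, and write one framed reply before looking at the input again.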
Avi Kivity
fc3188913f thrift: throw exceptions in unimplemented services
Where possible, throw an exception instead of returning an uninitialized
value.

Where not possible (if the method does not throw), return a "dummy" string.
2014-12-22 10:33:45 +02:00
Avi Kivity
52c8866db1 thrift: add skeleton handler
Generated by thrift and massaged to compile.
2014-12-22 10:33:45 +02:00
Avi Kivity
9fc0082229 build: generate, build, and link Cassandra thrift support 2014-12-21 17:32:53 +02:00
Avi Kivity
e17b67ea93 build: thrift support
Support adding a thrift file as a source.  Since thrift generates multiple
output files, whose names cannot be trivially derived from the source file
name, we have to specify it as an object containing the source file name
and any additional information needed to derive the generated file names
(in this case, the generated thrift services).
2014-12-21 17:30:50 +02:00
Avi Kivity
183a40e757 thrift: fix name conflict with VERSION
The generic thrift headers bring in a #define (yuch) named VERSION, while the
Cassandra interface also defines a symbol with the same name.

Rename the symbol to avoid a compile conflict.
2014-12-21 17:29:02 +02:00
Avi Kivity
648ea3b469 thrift: add cassandra.thrift
Taken from cassandra.git bf599fb5b062cbcc652da78b7d699e7a01b949ad.
2014-12-21 15:51:53 +02:00
Gleb Natapov
b958a44304 smp: create seastar threads using DPDK when compiled with DPDK support
DPDK initialization creates its own threads and assumes that the application
uses them; otherwise things do not work correctly (rte_lcore_id()
returns an incorrect value, for instance). This patch uses DPDK threads to
run the seastar main loop, making DPDK APIs work as expected.
2014-12-18 14:43:37 +02:00
Avi Kivity
10079c758a Merge branch 'poller-race'
Fix corruption when a function called from a poller context (usually
an smp request) tries to install a poller itself.
2014-12-18 10:57:26 +02:00
Avi Kivity
c09694d76c reactor: fix corruption in _pollers
register_poller() (and unregister_poller()) adjusts _pollers, but it may be
called while iterating it, and since std::vector<> mutations invalidate
iterators, corruption occurs.

Fix by deferring manipulation of _pollers into a task, which is executed at
a time where _pollers is not touched.
2014-12-17 16:56:20 +02:00
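The fix in commit c09694d76c (queue the change as a task instead of mutating the vector mid-iteration) might look roughly like this in miniature. The class below is a hypothetical stand-in, not Seastar's reactor: pollers are plain integer ids and tasks are std::function objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature reactor; names are illustrative only.
class mini_reactor {
    std::vector<int> _pollers;                    // iterated on every loop turn
    std::vector<std::function<void()>> _tasks;    // deferred work
public:
    // Safe to call from inside poll_once(): the change is only queued.
    void register_poller(int id) {
        _tasks.push_back([this, id] { _pollers.push_back(id); });
    }
    void unregister_poller(int id) {
        _tasks.push_back([this, id] {
            _pollers.erase(std::remove(_pollers.begin(), _pollers.end(), id),
                           _pollers.end());
        });
    }
    // Iterates _pollers; callbacks never touch the vector directly.
    void poll_once(const std::function<void(int)>& on_poll) {
        for (int id : _pollers) on_poll(id);
    }
    // Runs queued tasks at a point where _pollers is not being iterated.
    void run_tasks() {
        auto tasks = std::move(_tasks);
        _tasks.clear();
        for (auto& t : tasks) t();
    }
    std::size_t poller_count() const { return _pollers.size(); }
};
```

Moving the queue into a local before running it means a task that queues further work appends to a fresh _tasks rather than to the vector being walked.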
Avi Kivity
481080feb8 reactor: refactor reactor::poller
Currently, reactor::_pollers holds reactor::poller pointers; since these
are movable types, it's hard to maintain _pollers, as the pointers can keep
changing.

Refactor poller so that _pollers points at an internal type, which does not
move when a reactor::poller moves.  This requires getting rid of
std::function, since it lacks a comparison operator.
2014-12-17 16:20:52 +02:00
Tomasz Grabiec
16f4b7dec7 tests: make test_response_spanning_many_datagrams immune to value order 2014-12-16 16:30:50 +02:00
Avi Kivity
a0daeae865 sstring: optimize release()
By switching to malloc/free, we can use make_free_deleter(), which
does not require extra memory.
2014-12-16 14:55:02 +02:00
Avi Kivity
ebf89ac560 virtio: use make_object_deleter 2014-12-16 14:55:02 +02:00
Avi Kivity
692ee47456 deleter: introduce make_object_deleter
When we have an object acting as resource guard for memory, we can convert
it into a deleter using

  make_deleter([obj = std::move(obj)] {})

introduce a simpler interface

  make_object_deleter(std::move(obj))

for doing the same thing.
2014-12-16 14:55:02 +02:00
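The equivalence that commit 692ee47456 relies on can be shown with a toy model. The deleter class below is a hypothetical stand-in built on std::shared_ptr<void>, not Seastar's deleter; only the names make_deleter and make_object_deleter come from the commit message.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in: a type-erased handle that releases whatever it
// holds when the handle is destroyed.
class deleter {
    std::shared_ptr<void> _held;
public:
    deleter() = default;
    explicit deleter(std::shared_ptr<void> held) : _held(std::move(held)) {}
};

// Holds a callable and invokes it on destruction; anything the callable
// captured is destroyed right after.
template <typename Func>
struct func_holder {
    Func func;
    explicit func_holder(Func f) : func(std::move(f)) {}
    ~func_holder() { func(); }
};

template <typename Func>
deleter make_deleter(Func func) {
    return deleter(std::make_shared<func_holder<Func>>(std::move(func)));
}

// Simpler interface: keep obj alive for as long as the deleter lives.
template <typename T>
deleter make_object_deleter(T obj) {
    return deleter(std::make_shared<T>(std::move(obj)));
}
```

In this model, make_deleter([obj = std::move(obj)] {}) and make_object_deleter(std::move(obj)) both destroy obj when the deleter goes away; the second form just skips the empty lambda.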
Avi Kivity
3e4c53300d Merge branch 'mq' of ssh://github.com/cloudius-systems/seastar-dev
Multiqueue support for #cpu != #q, from Gleb.
2014-12-16 11:11:22 +02:00
Gleb Natapov
c8189157ed net: use RSS hash key calculated by HW if available
Some (all?) RSS-capable HW provides us with the hash that was used to
select the rx queue the packet was delivered to. If such a hash is available,
it is better to use it to forward the packet instead of calculating the hash
ourselves and suffering cache misses.
2014-12-16 10:53:41 +02:00
Gleb Natapov
d796487976 net: use our RSS key instead of letting DPDK select one 2014-12-16 10:53:41 +02:00
Gleb Natapov
d8ddaeb104 net: forward reassembled ip packet to correct queue
To figure out which cpu should handle a reassembled TCP packet, the RSS
redirection table has to be consulted.
2014-12-16 10:53:41 +02:00
Gleb Natapov
64adef7def net: copy RSS redirection table from a device
We will need it in a later patch.
2014-12-16 10:53:41 +02:00
Gleb Natapov
fbef83beb0 net: support for num of cpus > num of queues
This patch introduces logic to divide cpus between the available hw queue
pairs. Each cpu with a hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
2014-12-16 10:53:41 +02:00
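One way to picture the division described in commit fbef83beb0 is the sketch below. The round-robin assignment and the modulo selection are assumptions made for illustration; the commit message does not say how the sets are built, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumption: cpus 0..num_queues-1 each own a hardware queue pair.
// Every cpu is placed, round-robin, into the set served by one queue owner.
std::vector<std::vector<unsigned>> build_groups(unsigned num_cpus, unsigned num_queues) {
    if (num_queues == 0 || num_cpus < num_queues) {
        throw std::invalid_argument("need at least one cpu per queue");
    }
    std::vector<std::vector<unsigned>> groups(num_queues);
    for (unsigned cpu = 0; cpu < num_cpus; ++cpu) {
        groups[cpu % num_queues].push_back(cpu);
    }
    return groups;
}

// Pick the cpu that should process a packet received on `queue`,
// spreading packets over that queue's set by RSS hash.
unsigned pick_cpu(const std::vector<std::vector<unsigned>>& groups,
                  unsigned queue, std::uint32_t hash) {
    const auto& g = groups.at(queue);
    return g[hash % g.size()];
}
```

With 8 cpus and 3 queues this yields the sets {0, 3, 6}, {1, 4, 7} and {2, 5}; like the patch, the sketch ignores topology.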
Gleb Natapov
7ac3ba901c net: rework packet forwarding logic
Instead of having forward() decide the packet destination, make it collect
the input for the RSS hash function depending on the packet type. After the
data is collected, use the toeplitz hash function to calculate the packet's
destination.
2014-12-16 10:53:41 +02:00
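For reference, the Toeplitz hash that commit 7ac3ba901c relies on can be sketched as follows. This is a straightforward bit-by-bit version written for clarity, not the implementation from the repository; the function name and the vector-based signature are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toeplitz hash as used for RSS: for every set bit of the input (most
// significant bit first), XOR in the 32-bit window of the key that starts
// at that bit position. The key must be at least data.size() + 4 bytes long.
std::uint32_t toeplitz_hash(const std::vector<std::uint8_t>& key,
                            const std::vector<std::uint8_t>& data) {
    if (key.size() < data.size() + 4) {
        throw std::invalid_argument("key too short for input");
    }
    std::uint32_t hash = 0;
    // window holds the 32 key bits starting at the current input bit position
    std::uint32_t window = (std::uint32_t(key[0]) << 24) | (std::uint32_t(key[1]) << 16) |
                           (std::uint32_t(key[2]) << 8) | std::uint32_t(key[3]);
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (int b = 7; b >= 0; --b) {
            if (data[i] & (1u << b)) {
                hash ^= window;
            }
            // slide the window one bit to the right along the key
            window <<= 1;
            if (key[i + 4] & (1u << b)) {
                window |= 1u;
            }
        }
    }
    return hash;
}
```

The input would be the fields collected by forward() for the packet type (for example addresses and ports, concatenated in network byte order). The hash is linear over XOR, so the hash of a XOR b equals the hash of a XOR the hash of b, which makes a handy sanity check.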
Gleb Natapov
dd2f73401f net: add toeplitz hash function for rss 2014-12-16 10:53:41 +02:00
Gleb Natapov
bd9b0b8962 net: remove broadcast logic from forwarding path
No longer used.
2014-12-15 17:38:20 +02:00