When configure.py called inside Makefile, dpdk_cflags() mistakenly extract "Entering directory" message. This patch prevent the problem.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
This gives about 30% increase in tps in:
build/release/tests/perf/perf_simple_query -c1 --query-single-key
This patch switches query result format from a structured one to a
serialized one. The problems with structured format are:
- high level of indirection (vector of vectors of vectors of blobs), which
is not CPU cache friendly
- high allocation rate due to fine-grained object structure
On replica side, the query results are probably going to be serialized
in the transport layer anyway, so this change only subtracts
work. There is no processing of the query results on replica other
than concatenation in case of range queries. If query results are
collected in serialized form from different cores, we can concatenate
them without copying by simply appending the fragments into the
packet. This optimization is not implemented yet.
On coordinator side, the query results would have to be parsed from
the transport layer buffers anyway, so this also doesn't add work, but
again saves allocations and copying. The CQL server doesn't need
complex data structures to process the results, it just goes over it
linearly consuming it. This patch provides views, iterators and
visitors for consuming query results in serialized form. Currently the
iterators assume that the buffer is contiguous but we could easily
relax this in future so that we can avoid linearization of data
received from seastar sockets.
The coordinator side could be optimized even further for CQL queries
which do not need processing (eg. select * from cf where ...) we
could make the replica send the query results in the format which is
expected by the CQL binary protocol client. So in the typical case the
coordinator would just pass the data using zero-copy to the client,
prepending a header.
We do need structure for prefetched rows (needed by list
manipulations), and this change adds query result post-processing
which converts serialized query result into a structured one, tailored
particularly for prefetched rows needs.
This change also introduces partition_slice options. In some queries
(maybe even in typical ones), we don't need to send partition or
clustering keys back to the client, because they are already specified
in the query request, and not queried for. The query results hold now
keys as optional elements. Also, meta-data like cell timestamp and
ttl is now also optional. It is only needed if the query has
writetime() or ttl() functions in it, which it typically won't have.
Creates a temp makefile that includes the mk/rte.vars.mk from DPDK
SDK and prints MACHINE_CFLAGS. The value of MACHINE_CFLAGS is then added
to args.user_cflags.
Effectively the MACHINE_CFLAGS is a bunch of -DRTE_XXX macros and -march=native.
Without these flags the compilation fails inside rte_memcpy.h in the upstream DPDK
tree.
This patch reverts the 04e0c5293b patch.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
This adds the API to urchin, all API related files will be placed under
the api directory.
The API server uses a context object to access global parameters,
typically the different servers themselves.
After this patch the api-doc will be available, though empty under:
http://localhost:10000/api-doc
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The previous implementation could read either one sstable row or several,
but only when all the data was read in advance into a contiguous memory
buffer.
This patch changes the row read implementation into a state machine,
which can work on either a pre-read buffer, or data streamed via the
input_stream::consume() function:
The sstable::data_consume_rows_at_once() method reads the given byte range
into memory and then processes it, while the sstable::data_consume_rows()
method reads the data piecementally, not trying to fit all of it into
memory. The first function is (or will be...) optimized for reading one
row, and the second function for iterating over all rows - although both
can be used to read any number of rows.
The state-machine implementation is unfortunately a bit ugly (and much
longer than the code it replaces), and could probably be improved in the
future. But the focus was parsing performance: when we use large buffers
(the default is 8192 bytes), most of the time we don't need to read
byte-by-byte, and efficiently read entire integers at once, or even larger
chunks. For strings (like column names and values), we even avoid copying
them if they don't cross a buffer boundary.
To test the rare boundary-crossing case despite having a small sstable,
the code includes in "#if 0" a hack to split one buffer into many tiny
buffers (1 byte, or any other number) and process them one by one.
The tests still pass with this hack turned on.
This implementation of sstable reading also adds a feature not present
in the previous version: reading range tombstones. An sstable with an
INSERT of a collection always has a range tombstone (to delete all old
items from the collection), so we need this feature to read collections.
A test for this is included in this patch.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Register the handler of gossip verbs using ms::register_handler.
Implement send_gossip using ms::send_message. The handlers are only
placeholders for now. Will implement them later.
A periodic timer (1 seconds) is added to send gossip message
periodically.
This makes it possible to call get_local_failure_detector() in gossiper.
Otherwise there is a cyclic dependency between gms/gossiper.hh and
gms/failure_detector.hh
So that it's compiled only once and not for every test. This change
reduced recompilation time for a dummy test module from 40 seconds to
4 on my laptop.
This change also makes it now possible to pass seastar framework command
line arguments to every test using this framework.
It is built on top of seastar rpc infrastructure. I've sorted out all
the message VERBs which Origin use. All of them can be implemented using
this messaging_service.
Each Verb contains a handler. There are two types of handlers, one
will return a message back to sender, the other will not. The former
can be registered using ms.register_handler(), the latter can be
registered using ms.register_handler_oneway().
Usage example:
To use messaging_service to send a message. All you need is:
messaging_service& ms = get_local_messaging_service();
1) To register a message hander:
ms.register_handler(messaging_verb::ECHO, [] (int x, long y) {
print("Server got echo msg = (%d, %ld) \n", x, y);
std::tuple<int, long> ret(x*x, y*y);
return make_ready_future<decltype(ret)>(std::move(ret));
});
ms.register_handler_oneway(messaging_verb::GOSSIP_SHUTDOWN, [] (empty_msg msg) {
print("Server got shutdown msg = %s\n", msg);
return messaging_service::no_wait();
});
2) To send a message:
using RetMsg = std::tuple<int, long>;
return ms.send_message<RetMsg>(messaging_verb::ECHO, id, msg1, msg2).then([] (RetMsg msg) {
print("Client sent echo got reply = (%d , %ld)\n", std::get<0>(msg), std::get<1>(msg));
return sleep(100ms).then([]{
return make_ready_future<>();
});
});
return ms.send_message_oneway<void>(messaging_verb::GOSSIP_SHUTDOWN, std::move(id), std::move(msg)).then([] () {
print("Client sent gossip_shutdown got reply = void\n");
return make_ready_future<>();
});
Tests:
send to cpu 0
$ ./message --server 127.0.0.1 --cpuid 0 --smp 2
send to cpu 1
$ ./message --server 127.0.0.1 --cpuid 1 --smp 2
To support HEAD version of DPDK, we need -mavx/-mavx2 on Seastar CFLAGS.
But we cannot enable it until Host CPU has the feature, so we need to check it.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
To support HEAD version of DPDK, we need -mavx/-mavx2 on Seastar CFLAGS.
But we cannot enable it until Host CPU has the feature, so we need to check it.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
OSv does not able to load separated library version of DPDK, so it's better to use combined library (libintel_dpdk.so).
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
This is a migration of the api_docs from OSv. It replaces the static
api-doc.json with a dynamic generated reply, this allows to register API
in run time.
The api_registry_builder is a helper class that holds the file and api
path, simplifying registring both the api_doc handler and registering
additional API.
To use the api_doc, first generate a api_registry_builder.
The registry supply two functions, one for registring the api_doc
handler and one for registering an API.
Both function are passed as an argument for the set_routes method of the
http_server_control object.
To find the handler, the get_exact_match in the routes object was needed
to become public.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Now that http is a library, different internal apps would need to use it
and include its dependency list. To simplify that, the http general
dependencies where moved to a variable.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch converts (for very small value of 'converts') some
replication related classes. Only static topology is supported (it is
created in keyspace::create_replication_strategy()). During mutation
no replication is done, since messaging service is not ready yet,
only endpoints are calculated.
For serializing to commit log, and potentially internal wire messaging.
Note: intentionally incompatible with stock C wire/serial format.
Note: intentionally separate from the CQL-centric serialization
for a few reasons.
1.) Need "bulk serializers" for internal objects (mutation etc)
which might not fit well into the "types.hh" serializer schemes.
2.) No need for polymorphism/virtual type parameters since we know
exactly what we serialize and to where.
This demonstrate how to use a swagger definition file to create an API.
The swagger file demo.json define one api called hello_world, it has
both query and path parameters and return an object as a result.
The handler implementation simply places the given parameters in the
return object.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>