Commit Graph

2859 Commits

Author SHA1 Message Date
Asias He
8ea0d94ece message: Listen on a single port 2015-04-20 16:11:01 +08:00
Asias He
b38dae4a2b gossip: Dump failure detector info 2015-04-20 15:49:27 +08:00
Asias He
7e0a0c381f gossip: Remove debug print message 2015-04-20 15:49:27 +08:00
Avi Kivity
8702fb1d13 Merge branch 'gleb/snitch' of github.com:cloudius-systems/seastar-dev into db
Some Snitch and StorageService related conversions, from Gleb.
2015-04-20 09:44:38 +03:00
Gleb Natapov
d75ccb0047 the very beginning of the StorageService conversion 2015-04-20 09:18:23 +03:00
Gleb Natapov
d13422773d copy StorageService.java over 2015-04-20 09:18:23 +03:00
Gleb Natapov
c39af6dda0 gossip: store regular pointers to subscribers instead of shared ones
Some subscribers are allocated statically, so it is churn to make
shared pointers from them. And since registered subscribers have to be
unregistered before being destroyed anyway, there is no lifetime issue here
that requires the use of a smart pointer.
2015-04-20 09:18:23 +03:00
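A minimal sketch of the ownership rule described in the commit above. The names `subscriber`, `registry`, `counting_subscriber` and `global_subscriber` are hypothetical, not the actual gossiper interface. The registry stores non-owning raw pointers, so a statically allocated subscriber can register without being wrapped in a shared pointer; the caller is responsible for unregistering it before it is destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subscriber interface (illustration only).
struct subscriber {
    virtual ~subscriber() = default;
    virtual void on_change(const std::string& endpoint) = 0;
};

// Holds non-owning pointers. Lifetime is the caller's responsibility:
// every subscriber must be unregistered before it is destroyed.
class registry {
    std::vector<subscriber*> _subscribers;
public:
    void register_subscriber(subscriber* s) { _subscribers.push_back(s); }
    void unregister_subscriber(subscriber* s) {
        _subscribers.erase(std::remove(_subscribers.begin(), _subscribers.end(), s),
                           _subscribers.end());
    }
    void notify(const std::string& endpoint) const {
        for (subscriber* s : _subscribers) {
            s->on_change(endpoint);
        }
    }
    std::size_t size() const { return _subscribers.size(); }
};

// A subscriber with static storage duration: no shared_ptr is needed to register it.
struct counting_subscriber : subscriber {
    int calls = 0;
    void on_change(const std::string&) override { ++calls; }
};

static counting_subscriber global_subscriber;
```

Usage: `r.register_subscriber(&global_subscriber)` before notifications and `r.unregister_subscriber(&global_subscriber)` before shutdown; nothing is reference-counted.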
Gleb Natapov
02fb270fbe token operator<< 2015-04-19 10:15:14 +03:00
Raphael S. Carvalho
fdf50ef643 sstables: add initial support for compression
Starting with LZ4, the default compressor.
Stub functions were added for the other compression algorithms; they should
eventually be replaced with actual implementations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-19 10:07:29 +03:00
Tomasz Grabiec
8d6b93d787 query: Document intention behind query results format 2015-04-19 10:07:02 +03:00
Avi Kivity
d691e4d1ff Merge branch 'tgrabiec/cleanups' of github.com:cloudius-systems/seastar-dev into db
Remove obsolete files and fix code which used them, from Tomasz.
2015-04-19 09:53:49 +03:00
Tomasz Grabiec
9309a2ee6f Remove obsolete files 2015-04-17 15:08:06 +02:00
Tomasz Grabiec
979b671adf cql3: Fix abstract_marker::raw::prepare()
It was using the wrong version of the "collection_type(_impl)?" class.
2015-04-17 15:08:06 +02:00
Tomasz Grabiec
744d75e7f8 db: Move max_ttl from db/expiring_cell.hh to gc_clock.hh 2015-04-17 15:08:06 +02:00
Tomasz Grabiec
d87fbe9eb8 cql3: Fix references to obsolete collection types
The code was using the wrong version of list_type_impl and
collection_type_impl.
2015-04-17 15:08:06 +02:00
Avi Kivity
1aaa8c2f13 tests: fix cql_query_test tuple_test using a data_type on multiple cores
Pointed out by Tomek.
2015-04-16 18:39:43 +03:00
Avi Kivity
2c8b3a8e22 Merge branch 'tgrabiec/tuple_type_iterator' of github.com:cloudius-systems/urchin into db
Iterators for composites, from Tomasz.
2015-04-16 17:52:38 +03:00
Avi Kivity
edaf43f36a Merge branch 'asias/gossip_v1' of github.com:cloudius-systems/seastar-dev into db
Gossip now actually talks among nodes, from Asias.
2015-04-16 17:00:24 +03:00
Tomasz Grabiec
bacede04b2 types: Expose component iterators in tuple_wrapper
This automatically exposes them in partition_key and clustering_key too.

The iterators return bytes_view to components.

For example:

  schema s;
  partition_key key;

  for (bytes_view component : boost::make_iterator_range(key.begin(s), key.end(s))) {
     // ...
  }
2015-04-16 14:04:04 +02:00
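A self-contained sketch of how such a component iterator can walk a serialized composite without copying. It assumes a hypothetical encoding (a 16-bit big-endian length followed by the component bytes) and uses `std::string_view` in place of `bytes_view`; `component_iterator` and `components` are illustrative names, not the tuple_wrapper API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical layout: [u16 big-endian length][bytes] repeated.
class component_iterator {
    std::string_view _rest;     // unconsumed bytes, starting at the current component
    std::string_view _current;  // view of the current component (no copy)
    void parse() {
        if (_rest.empty()) { _current = {}; return; }
        assert(_rest.size() >= 2);
        std::size_t len = (std::size_t(uint8_t(_rest[0])) << 8) | uint8_t(_rest[1]);
        assert(_rest.size() >= 2 + len);
        _current = _rest.substr(2, len);
    }
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string_view;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string_view*;
    using reference = const std::string_view&;

    explicit component_iterator(std::string_view buf) : _rest(buf) { parse(); }
    reference operator*() const { return _current; }
    component_iterator& operator++() {
        _rest.remove_prefix(2 + _current.size());  // skip length prefix and payload
        parse();
        return *this;
    }
    // Position is defined by the remaining bytes, so empty components are handled.
    bool operator==(const component_iterator& o) const {
        return _rest.data() == o._rest.data() && _rest.size() == o._rest.size();
    }
    bool operator!=(const component_iterator& o) const { return !(*this == o); }
};

// Range adaptor so the buffer can be used in a range-for loop.
struct components {
    std::string_view buf;
    component_iterator begin() const { return component_iterator(buf); }
    component_iterator end() const { return component_iterator(buf.substr(buf.size())); }
};
```

Each dereference yields a view into the original buffer, which mirrors the commit's point that the iterators return `bytes_view` to components.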
Tomasz Grabiec
5ef11d113a types: Improve code readability 2015-04-16 14:04:04 +02:00
Tomasz Grabiec
4c418ddef8 types: Use enum rather than bool in tuple_type template parameter
The 'bool' type doesn't hold any meaning on its own, which makes the
template instantiation sites not very readable:

  tuple_type<true>

To improve that, we can introduce an enum class which is meaningful in
every context:

  tuple_type<allow_prefixes::yes>
2015-04-16 14:57:21 +03:00
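A reduced sketch of the enum-as-template-parameter pattern. The `allow_prefixes` name comes from the commit; the body of `tuple_type`, the `is_valid_component_count` method and the two aliases are hypothetical simplifications for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// enum class makes each instantiation site self-describing.
enum class allow_prefixes { no, yes };

// Hypothetical simplification: only the prefix check is modeled.
template <allow_prefixes AllowPrefixes>
struct tuple_type {
    std::vector<std::string> component_types;  // hypothetical: component type names

    // A value with fewer components than the type is a prefix; it is only
    // valid when the type is instantiated with allow_prefixes::yes.
    bool is_valid_component_count(std::size_t n) const {
        if constexpr (AllowPrefixes == allow_prefixes::yes) {
            return n <= component_types.size();
        } else {
            return n == component_types.size();
        }
    }
};

// Compare with tuple_type<true> / tuple_type<false>.
using clustering_prefix_type = tuple_type<allow_prefixes::yes>;
using partition_key_type = tuple_type<allow_prefixes::no>;
```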
Asias He
6a2eed05fd tests: Gossip around node load info
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.1
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.2
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.3

After a few seconds, all three nodes will know each other's load info
via gossip.

----------- endpoint_state_map dump beg -----------
ep=127.0.0.1, eps=EndpointState: HeartBeatState = generation = 1, version = 0, AppStateMap =  { 1 : Value(0.5,1) }
ep=127.0.0.2, eps=EndpointState: HeartBeatState = generation = 1, version = 0, AppStateMap =  { 1 : Value(0.5,1) }
ep=127.0.0.3, eps=EndpointState: HeartBeatState = generation = 1, version = 0, AppStateMap =  { 1 : Value(0.5,1) }
----------- endpoint_state_map dump end -----------
2015-04-16 17:44:20 +08:00
Asias He
02f8c9d965 gossip: Add dump_endpoint_state_map for debug 2015-04-16 17:44:20 +08:00
Asias He
4abee75c04 gossip: Drop fail guard in mark_alive and apply_state_locally 2015-04-16 17:44:20 +08:00
Asias He
4cffb5513d gossip: Drop unnecessary FIXME 2015-04-16 17:44:20 +08:00
Asias He
7f98644742 gossip: Fix send_gossip
Insert only when local_ep_state_ptr is engaged, not otherwise.
2015-04-16 17:08:19 +08:00
Asias He
eeafdf5815 gossip: Make gms::versioned_value::load static
We are supposed to call it without an instance.
We will convert other similar functions in follow-up patches.
2015-04-16 17:03:46 +08:00
Asias He
d661827045 gossip: Fix get_broadcast_address
It defaults to listen_address.
2015-04-16 17:01:52 +08:00
Asias He
adff3b9c79 gossip: Drop redundant print in heart_beat_state 2015-04-16 16:59:53 +08:00
Tomasz Grabiec
3bb23c5aff types: Convert tuple_type to use uint16_t for length component
Origin uses CompositeType to serialize composite keys, and that
type uses a 16-bit integer to encode the length. If it's enough for
Origin, it's enough for us.
2015-04-16 11:49:26 +03:00
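A sketch of writing components with a 16-bit length prefix, assuming a hypothetical big-endian layout (Origin's CompositeType additionally writes an end-of-component byte after each value; that is omitted here). The practical consequence of the narrower length field is that a component longer than 65535 bytes cannot be encoded, so the sketch rejects it. `serialize_components` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: [u16 big-endian length][bytes] for each component.
std::string serialize_components(const std::vector<std::string>& components) {
    std::string out;
    for (const auto& c : components) {
        if (c.size() > std::numeric_limits<uint16_t>::max()) {
            throw std::length_error("component too long for a 16-bit length");
        }
        auto len = static_cast<uint16_t>(c.size());
        out.push_back(static_cast<char>(len >> 8));    // high byte first
        out.push_back(static_cast<char>(len & 0xff));  // then low byte
        out += c;
    }
    return out;
}
```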
Tomasz Grabiec
426cb73983 tests: Fix misspelled path to bytes_ostream_test in test.py 2015-04-16 09:58:51 +02:00
Avi Kivity
a087a0a4dd Merge branch 'asias/gossip_v1' of github.com:cloudius-systems/seastar-dev into db
Allow gossip to bind to a specific address, from Asias.
2015-04-16 10:16:22 +03:00
Asias He
65be3ab711 tests: Allow gossip to listen on a specific IP address 2015-04-16 14:58:52 +08:00
Asias He
7fd4c0a402 message: Allow listening on a specific IP address 2015-04-16 14:58:52 +08:00
Asias He
b18a7bf6c8 gossip: Fix namespace of get_next_version 2015-04-16 14:58:52 +08:00
Asias He
0a1fffa443 gossip: Use get_cluster_name and get_partitioner_name helper 2015-04-16 14:58:52 +08:00
Asias He
e60f1db994 gossip: Drop two hacks previously used for testing 2015-04-16 14:58:52 +08:00
Asias He
8705077517 gossip: Pass reference in send_all and request_all 2015-04-16 14:58:52 +08:00
Asias He
0ffa75f3b7 gossip: Allow specifying seed node IP addresses 2015-04-16 14:58:52 +08:00
Avi Kivity
2f079e2810 Add missing maps.cc 2015-04-16 08:17:14 +03:00
Gleb Natapov
1dbad40513 gossip: add missing namespace
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
2015-04-15 22:37:30 +03:00
Gleb Natapov
d0a1e35a86 class factory
In Java it is possible to create an object knowing only its class name at
runtime. Replication strategies are created this way (I presume the class
name comes from configuration somehow), so when I translated the code to
urchin I wrote a replication_strategy_registry class to map a class name to
a factory function. Now I see that this is used in other places too (the
snitch class is created the same way), so instead of repeating
the same code for each class hierarchy that origin creates by name,
this patch introduces infrastructure to do that easily.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
2015-04-15 22:37:28 +03:00
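A minimal sketch of such a name-to-factory registry, generic over the base class so that each hierarchy (replication strategies, snitches) can reuse it. `class_registry`, `register_class`, `create`, `replication_strategy` and `simple_strategy` here are hypothetical names; the infrastructure introduced by this patch may differ, for example by passing constructor arguments to the factories.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Maps a class name (as it might appear in configuration) to a factory
// producing objects derived from BaseType.
template <typename BaseType>
class class_registry {
    using factory = std::function<std::unique_ptr<BaseType>()>;
    // One map per BaseType, created on first use.
    static std::unordered_map<std::string, factory>& registry() {
        static std::unordered_map<std::string, factory> map;
        return map;
    }
public:
    template <typename Derived>
    static void register_class(const std::string& name) {
        registry()[name] = []() -> std::unique_ptr<BaseType> {
            return std::make_unique<Derived>();
        };
    }
    static std::unique_ptr<BaseType> create(const std::string& name) {
        auto it = registry().find(name);
        if (it == registry().end()) {
            throw std::invalid_argument("unknown class: " + name);
        }
        return it->second();
    }
};

// Illustrative hierarchy only.
struct replication_strategy {
    virtual ~replication_strategy() = default;
    virtual std::string name() const = 0;
};
struct simple_strategy : replication_strategy {
    std::string name() const override { return "SimpleStrategy"; }
};
```

Because the map lives inside a class template, every base type gets its own independent registry without any per-hierarchy boilerplate.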
Avi Kivity
3d134a797e Merge branch 'tgrabiec/query-format-v2' of github.com:cloudius-systems/seastar-dev into db 2015-04-15 21:59:34 +03:00
Avi Kivity
e50be70de2 cql3: move cql3_type's raw classes to .cc 2015-04-15 21:52:04 +03:00
Tomasz Grabiec
ee906471ab cql3: Move method implementations to .cc 2015-04-15 20:44:59 +02:00
Tomasz Grabiec
00f99cefd4 db: split query.hh to reduce header dependencies 2015-04-15 20:44:59 +02:00
Tomasz Grabiec
878a740b9d db: Write query results in serialized form
This gives about a 30% increase in tps in:

  build/release/tests/perf/perf_simple_query -c1 --query-single-key

This patch switches query result format from a structured one to a
serialized one. The problems with structured format are:

  - high level of indirection (vector of vectors of vectors of blobs), which
    is not CPU cache friendly

  - high allocation rate due to fine-grained object structure

On the replica side, the query results are probably going to be serialized
in the transport layer anyway, so this change only removes
work. There is no processing of the query results on the replica other
than concatenation in the case of range queries. If query results are
collected in serialized form from different cores, we can concatenate
them without copying by simply appending the fragments into the
packet. This optimization is not implemented yet.

On the coordinator side, the query results would have to be parsed from
the transport layer buffers anyway, so this also doesn't add work, but
again saves allocations and copying. The CQL server doesn't need
complex data structures to process the results; it just consumes them
linearly. This patch provides views, iterators and
visitors for consuming query results in serialized form. Currently the
iterators assume that the buffer is contiguous, but we could easily
relax this in the future so that we can avoid linearization of data
received from seastar sockets.

The coordinator side could be optimized even further for CQL queries
which do not need processing (e.g. select * from cf where ...): we
could make the replica send the query results in the format
expected by the CQL binary protocol client. In the typical case the
coordinator would then just pass the data to the client using zero-copy,
prepending a header.

We do need structure for prefetched rows (needed by list
manipulations), so this change adds query result post-processing
which converts a serialized query result into a structured one, tailored
specifically to the needs of prefetched rows.

This change also introduces partition_slice options. In some queries
(maybe even in typical ones), we don't need to send partition or
clustering keys back to the client, because they are already specified
in the query request and not queried for. The query results now hold
keys as optional elements. Metadata like cell timestamp and
ttl is also optional now. It is only needed if the query has
writetime() or ttl() functions in it, which it typically won't.
2015-04-15 20:44:50 +02:00
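A toy sketch of the idea behind this change: rows are appended to one flat buffer instead of nested vectors, keys are written only when a slice option requests them, and the consumer walks the buffer once with a visitor. The layout (32-bit little-endian lengths and cell counts), the `slice_option` flag and the `result_writer`/`consume_rows` names are all hypothetical; this is not the actual query result format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical slice option: the partition key is serialized only if requested.
enum slice_option : uint32_t {
    send_partition_key = 1u << 0,
};

namespace detail {
inline void write_u32(std::string& out, uint32_t n) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((n >> (8 * i)) & 0xff));  // little-endian
    }
}
inline uint32_t read_u32(std::string_view& in) {
    uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        n |= uint32_t(static_cast<uint8_t>(in[i])) << (8 * i);
    }
    in.remove_prefix(4);
    return n;
}
inline std::string_view read_field(std::string_view& in) {
    uint32_t n = read_u32(in);
    std::string_view v = in.substr(0, n);  // a view into the buffer, no copy
    in.remove_prefix(n);
    return v;
}
}  // namespace detail

// Row layout: [partition key, if requested][cell count][length-prefixed cells...]
class result_writer {
    std::string _buf;
    uint32_t _options;
    void write_field(std::string_view v) {
        detail::write_u32(_buf, static_cast<uint32_t>(v.size()));
        _buf.append(v.data(), v.size());
    }
public:
    explicit result_writer(uint32_t options) : _options(options) {}
    void add_row(std::string_view pk, const std::vector<std::string_view>& cells) {
        if (_options & send_partition_key) {
            write_field(pk);
        }
        detail::write_u32(_buf, static_cast<uint32_t>(cells.size()));
        for (std::string_view c : cells) {
            write_field(c);
        }
    }
    std::string finish() && { return std::move(_buf); }
};

// Single linear pass; must be called with the same options the writer used.
template <typename Visitor>
void consume_rows(std::string_view in, uint32_t options, Visitor&& visit_cell) {
    while (!in.empty()) {
        std::string_view pk;
        if (options & send_partition_key) {
            pk = detail::read_field(in);
        }
        uint32_t cells = detail::read_u32(in);
        for (uint32_t i = 0; i < cells; ++i) {
            visit_cell(pk, detail::read_field(in));
        }
    }
}
```

Concatenating the buffers produced by two writers with the same options yields a buffer that `consume_rows` still reads correctly, which is the property that makes cross-core concatenation cheap.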
Tomasz Grabiec
0f99570555 Introduce bytes_ostream 2015-04-15 20:33:49 +02:00
Tomasz Grabiec
d287fd4c39 utils: Extend data_input() with more methods 2015-04-15 20:33:49 +02:00
Tomasz Grabiec
a22a9bd3fb cql3: Cache contains_static_columns() 2015-04-15 20:33:49 +02:00