480 Commits

Author SHA1 Message Date
Avi Kivity
0b01b74444 build: disable seastar Xen support
Not needed, and conflicts with dpdk.
2015-08-18 12:31:26 +03:00
Avi Kivity
e9a46215ef build: change project name
The configure script originated from seastar, need a name change.
2015-08-18 12:29:05 +03:00
Avi Kivity
932ddc328c logalloc: optimize current_allocation_strategy()
This heavily used function shows up in many places in the profile (as part
of other functions), so it's worth optimizing by eliminating the special
case for the standard allocator.  Use a statically allocated object instead.

(a non-thread-local object is fine since it has no data members).
2015-08-17 16:51:10 +03:00
Avi Kivity
95847f86c3 Merge "locator: introduce i_endpoint_snitch::reset_snitch()" from Vlad
"This series introduces the i_endpoint_snitch::reset_snitch() static method
that allows replacing the current (global) snitch instance with a new one.
This is done in a (per-shard) atomic way, transparent to anyone holding a
reference to snitch_ptr.

This series starts with some cleanups, adds the above method and the unit test
that verifies its functionality."
2015-08-12 19:29:08 +03:00
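A minimal sketch of the idea described in the series above, written in Python rather than the project's C++. Callers keep a stable handle (standing in for snitch_ptr) while reset_snitch() swaps the instance behind it, and a failed reset leaves the old instance in place. Every name except reset_snitch is hypothetical, and the per-shard aspect is not modelled.

```python
class SnitchHandle:
    """Stable handle that callers hold; stands in for snitch_ptr (hypothetical)."""

    def __init__(self, impl):
        self._impl = impl

    def get_datacenter(self, endpoint):
        # Always delegate to whichever implementation is currently installed.
        return self._impl.get_datacenter(endpoint)


class SimpleSnitch:
    """Toy snitch that reports one fixed datacenter (hypothetical)."""

    def __init__(self, datacenter):
        self.datacenter = datacenter
        self.started = False

    def start(self):
        if not self.datacenter:
            raise ValueError("snitch needs a datacenter name")
        self.started = True

    def get_datacenter(self, endpoint):
        return self.datacenter


def reset_snitch(handle, new_impl):
    """Start new_impl first, then swap it in; on failure the old one stays."""
    new_impl.start()         # may raise; the handle is untouched in that case
    handle._impl = new_impl  # single assignment: holders see old or new, never a mix
```

Code that stored the handle before the reset keeps working afterwards without being told about the swap, which is the property the unit test in the series is described as checking.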
Avi Kivity
517ceed515 Merge "sstable index write benchmark"
"I am currently looking at the performance of our index_read, since it was
pinpointed in the past as the source of problems.

While the read side is the one that is mostly interesting, I would like to test
both - besides anything else, it is easier to test reads after writes so we
don't have to create synthetic data with outside tools.

This patch introduces the write side benchmark (read side will hopefully come
tomorrow).  While the write side is, as mentioned, not the most interesting
part, I did see something standing out in the flamegraph that allowed me to
optimize one particular function, yielding an 8.6% improvement."
2015-08-12 18:33:11 +03:00
Glauber Costa
4ddef06ba6 perf tests: test sstables index reads and writes
This is a test that allows us to measure the performance of our sstable index
reads and writes (currently only writes implemented). A lot of potentially
common code is put into a header, which will make writing new tests easier if
needed.

We don't want to take shortcuts for this, so all reading and writing is done
through public sstable interfaces.

For writing, there is no way to write the index without writing the datafile.
But because we are only writing the primary key, the datafile will not contain
anything else. This is the closest we can get to an index testing with the
public interfaces.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:18:37 -05:00
Vlad Zolotarov
806cc8c09a locator: snitch_reset_test
Checks that both successful and unsuccessful calls to reset_snitch()
behave as expected.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-08-12 16:44:47 +03:00
Pekka Enberg
a3c194b050 transport/server: Move event_notifier class to separate file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-12 09:59:35 +03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
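The dispatch rule described in the commit above (N requests, M fibers, one shared job queue, signal the fiber with the fewest pending jobs) can be sketched as a small synchronous model. This is an illustration in Python, not the project's C++ code: the class and method names are hypothetical, and real fibers are replaced by explicit run_one() calls.

```python
from collections import deque


class CompactionManager:
    """Toy model: N column families submit requests, M workers serve them."""

    def __init__(self, num_workers):
        self.jobs = deque()               # job queue shared by all workers
        self.pending = [0] * num_workers  # per-worker count of signalled, unserved jobs

    def submit(self, column_family):
        """Queue a request and signal the worker with the fewest pending jobs."""
        self.jobs.append(column_family)
        worker = min(range(len(self.pending)), key=self.pending.__getitem__)
        self.pending[worker] += 1
        return worker

    def run_one(self, worker):
        """Let one signalled worker take the next job from the shared queue."""
        if self.pending[worker] == 0 or not self.jobs:
            return None
        self.pending[worker] -= 1
        return self.jobs.popleft()
```

With two workers, five submissions alternate between them, and whichever worker runs first takes the oldest queued request, because the queue is shared rather than per worker.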
Avi Kivity
d6351ecca7 utils: add crc32 class
C++ interface to the crc32 x86 instruction.
2015-08-09 00:05:33 +03:00
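The x86 crc32 instruction (part of SSE4.2) accumulates a CRC using the CRC-32C (Castagnoli) polynomial. As a reference point for what such a class computes, here is a bit-by-bit software model in Python. The interface (process/get) is hypothetical, and the initial value and final inversion follow the usual CRC-32C convention, which the commit message does not spell out.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


class Crc32c:
    """Software model of a CRC-32C accumulator (hypothetical interface)."""

    def __init__(self):
        self._crc = 0xFFFFFFFF  # conventional initial value (assumption)

    def process(self, data: bytes):
        """Fold more bytes into the running checksum; returns self for chaining."""
        crc = self._crc
        for byte in data:
            crc ^= byte
            for _ in range(8):
                # Shift out one bit; fold in the polynomial if that bit was set.
                crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
        self._crc = crc
        return self

    def get(self) -> int:
        """Return the checksum with the conventional final inversion (assumption)."""
        return self._crc ^ 0xFFFFFFFF
```

The standard check input b"123456789" gives 0xE3069283 under this convention, and feeding the same bytes in several process() calls gives the same result as one call.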
Avi Kivity
70618762c3 build: require at least a Nehalem-class cpu
We want to use the crc32 instruction, which was made available
on Nehalem, so let's require it.  It's old enough to be present
everywhere.
2015-08-08 23:28:32 +03:00
Tomasz Grabiec
038183eabd Merge branch 'penberg/event-cleanups/v1' from seastar-dev.git
Cleanups of CQL events code from Pekka.
2015-08-07 12:18:26 +03:00
Pekka Enberg
33ce99b732 transport/event: Move implementation to source file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-07 09:28:46 +03:00
Tomasz Grabiec
9a1ee1b96a api: Introduce RESTful API for LSA
To force compaction, invoke:

  $ curl -X POST http://localhost:10000/lsa/compact
2015-08-06 16:50:15 +02:00
Tomasz Grabiec
658c21a060 tests: Add LSA tests
2015-08-06 14:05:16 +02:00
Tomasz Grabiec
5a9e296803 utils: lsa: Introduce log-structured allocator
2015-08-06 14:05:15 +02:00
Tomasz Grabiec
e7e79af435 tests: Add allocation_strategy_test
2015-08-06 12:52:43 +02:00
Avi Kivity
522f23b830 Merge "Schema table cleanups" from Pekka
"Clean up the schema table code. Be explicit that we don't support
Cassandra 3.0 and eliminate some dead code."
2015-08-05 15:09:59 +03:00
Avi Kivity
c720cddc5c tests: mv tests/urchin/* -> tests/
Now that seastar is in a separate repository, we can use the tests/
directory.
2015-08-05 14:16:52 +03:00
Pekka Enberg
99a80050e3 db: Rename legacy_schema_tables to schema_tables
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:56:47 +03:00
Nadav Har'El
34b1cc42cd Initial repair support
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:

curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"

I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think it can be committed,
but more work is needed to complete it:

 1. Repair options are not yet supported (range repair, sequential/parallel
    repair, choice of hosts, datacenters and column families, etc.).

 2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
    alternative optimization) and partial data exchange haven't been
    implemented yet.

 3. Full repair for nodes with multiple separate ranges is not yet
    implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
    so each vnode's range has a different host as a replica, so we need
    to exchange each key range separately with a different remote host.

 4. Our repair operation returns a numeric operation id (like Origin),
    but we don't yet provide any means to use this id to check on ongoing
    repairs like Origin allows.

 5. Error handling, logging, etc., needs to be improved.

 6. SMP nodes (with multiple shards) should work correctly (thanks to
    Asias's latest patch for SMP mutation streaming) but haven't been
    tested.

 7. Incremental repair is not supported (see
    http://www.datastax.com/dev/blog/more-efficient-repairs)

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 13:26:36 +03:00
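The curl invocation in the commit above can be reproduced from a script. The sketch below builds the same request with Python's standard library. The URL, port, and headers come from the commit message. The function names are hypothetical, and parsing the response body as a JSON number is an assumption based on the statement that the operation returns a numeric id.

```python
import json
import urllib.parse
import urllib.request


def build_repair_request(keyspace, host="127.0.0.1", port=10000):
    """Build (but do not send) the repair request shown in the commit message."""
    path = "/storage_service/repair_async/" + urllib.parse.quote(keyspace, safe="")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(
        f"http://{host}:{port}{path}", headers=headers, method="GET"
    )


def start_repair(keyspace, host="127.0.0.1", port=10000, timeout=10):
    """Send the request to a running node and return the decoded response."""
    request = build_repair_request(keyspace, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the body is the numeric operation id encoded as JSON.
        return json.loads(response.read())
```

Calling start_repair("try1") against a running node corresponds to the example above. build_repair_request can be inspected without any server.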
Nadav Har'El
1d4c1eda51 ninja: add "clean" target
This patch adds a "ninja clean", better than the current "ninja -t clean".

Ninja's "ninja -t clean" is a nice trick, designed to save the Makefile writer
the tedious chore of listing the targets to remove, by automatically gathering
this list. But our build system, following OSv's one, actually uses a much
cooler (and better) trick: All build files are generated in a single
subdirectory, "build/", and cleaning the build products is as simple as
"rm -rf build".

So this patch adds a target, "ninja clean", which does exactly this (rm -rf
build). "ninja clean" is not only easier to type than "ninja -t clean", it
also has one important benefit: When the ninja rules change, "ninja -t clean"
doesn't remember to delete now-defunct targets, and they stay behind. On my
build machine, "ninja -t clean" left behind almost a gigabyte of old crap.
Moreover, when the ninja file changes drastically (as it changed a few days
ago), not cleaning up everything can even cause new builds to break - e.g.,
when something was previously a file and now needs to be a directory.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-04 16:00:09 +03:00
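The difference between the two cleaning approaches in the commit above can be shown with a small filesystem model. This is a Python illustration only: it does not invoke ninja, and the file names are made up. Cleaning by the list of currently known targets leaves behind the output of a rule that no longer exists, while removing the whole build directory does not.

```python
import shutil
import tempfile
from pathlib import Path


def clean_known_targets(build_dir: Path, targets):
    """Model of per-target cleaning: only files the current rules list are removed."""
    for name in targets:
        (build_dir / name).unlink(missing_ok=True)


def clean_whole_directory(build_dir: Path):
    """Model of the new target: the equivalent of `rm -rf build`."""
    shutil.rmtree(build_dir, ignore_errors=True)


def demo():
    root = Path(tempfile.mkdtemp())
    build = root / "build"
    build.mkdir()
    # "old_test" stands for the output of a rule that has since been deleted.
    for name in ("urchin", "old_test"):
        (build / name).write_text("binary")
    clean_known_targets(build, ["urchin"])  # current rules only list "urchin"
    leftovers = sorted(p.name for p in build.iterdir())
    clean_whole_directory(build)
    gone = not build.exists()
    shutil.rmtree(root, ignore_errors=True)  # remove the temporary sandbox
    return leftovers, gone
```

In this model demo() reports the stale file as the only leftover of the per-target clean, and reports the build directory as gone after the whole-directory clean.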
Avi Kivity
eaca3f7cd1 build: forward compiler selection to seastar configuration
2015-08-03 22:37:32 +03:00
Tomasz Grabiec
b88fc51e2a tests: Introduce test for storage_proxy::make_local_reader()
2015-08-03 15:21:40 +02:00
Asias He
6398bb4bdc gossip: Move code from gms/endpoint_state.hh to source file
2015-07-31 10:43:40 +08:00
Asias He
a95213e81e gossip: Kill gms/gms.cc
All headers of gms/* are included. No need to include them all in gms.cc now.
2015-07-31 10:43:40 +08:00
Asias He
e074b1b7f8 gossip: Move operator<< of gossip_digest_ack2 to gossip_digest_ack2.cc
2015-07-31 10:43:39 +08:00
Asias He
ca5eea7fad gossip: Move operator<< of gossip_digest_ack to gossip_digest_ack.cc
2015-07-31 10:43:39 +08:00
Asias He
76efae87b5 gossip: Move operator<< of gossip_digest_syn to gossip_digest_syn.cc
2015-07-31 10:43:39 +08:00
Asias He
a2b54fc757 main: Introduce init.cc to cleanup service startup code
This patch introduces the init.cc file, which hosts all the initialization
code. The benefits are: 1) we can share initialization code with test
code; 2) all the service startup dependency / ordering code is in one
single place instead of everywhere.
2015-07-28 18:20:45 +08:00
Nadav Har'El
d074dfe322 build: no need any more to try both ninja and ninja-build
We have the "ninja" variable. Use it.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-28 12:30:49 +03:00
Pekka Enberg
7fc1311d4a db/consistency_level: Move implementation to .cc file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 10:06:18 +03:00
Amnon Heiman
4908222d6a Adding utils.json Swagger definition file
The utils file will hold general modules, that need to be used by
multiple modules.

As a start, it holds the histogram definition.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-26 10:58:45 +03:00
Tomasz Grabiec
45b4471a0e tests: Introduce test for query::partition_range
2015-07-24 16:08:41 +02:00
Pekka Enberg
78840a690f cql3: Move ut_name implementation to .cc file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-24 10:21:11 +02:00
Pekka Enberg
1e5fad25d7 cql3: Move attributes implementation to .cc file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-24 10:21:11 +02:00
Avi Kivity
8870bf1bf8 Merge "Handling of non-full partition range queries" from Tomasz
2015-07-22 15:18:02 +03:00
Avi Kivity
aa4dae29f3 build: unconditionally rebuild libseastar.a
Instead of trying to second-guess the seastar build system, always rebuild
libseastar.a.  Specify restat = 1 so that binaries are only relinked if
something truly changed.
2015-07-22 15:04:32 +03:00
Tomasz Grabiec
440962dbbf tests: Run mutation source tests on memtable
2015-07-22 13:14:33 +02:00
Tomasz Grabiec
0f3588708e tests: Extract range query tests from sstable_mutation_test into mutation_source_test
The idea is to reuse the same testing code on any mutation_source, for
example on memtable.

The range query test cases are now part of a generic mutation_source
test suite.
2015-07-22 13:14:33 +02:00
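The approach in the commit above, one test suite that runs against any mutation_source, can be sketched as a suite parameterised by a factory. The Python below is an illustration only: the range(start, end) contract and both source classes are hypothetical stand-ins, not the project's interfaces.

```python
import bisect


def run_mutation_source_tests(make_source):
    """Run the same range-query checks against any source built by make_source.

    make_source(rows) must return an object whose range(start, end) yields
    the keys in [start, end) in sorted order (hypothetical contract).
    """
    source = make_source({"a": 1, "c": 3, "b": 2, "d": 4})
    assert list(source.range("a", "c")) == ["a", "b"]
    assert list(source.range("b", "z")) == ["b", "c", "d"]
    assert list(source.range("x", "z")) == []


class DictSource:
    """Stand-in for a memtable: keeps rows in a dict and sorts on read."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def range(self, start, end):
        return (key for key in sorted(self.rows) if start <= key < end)


class SortedListSource:
    """Stand-in for an sstable: keeps keys in a pre-sorted list."""

    def __init__(self, rows):
        self.keys = sorted(rows)

    def range(self, start, end):
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:hi]
```

Adding a new kind of source then costs one call, for example run_mutation_source_tests(SortedListSource), instead of a copy of the test cases.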
Pekka Enberg
e361f2a436 utils/runtime: Add uptime helpers
The functionality is similar to RuntimeMBean.getUptime() that's needed
in schema pulling logic.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 13:02:43 +03:00
Pekka Enberg
c6dc61eab4 service: Convert MigrationTask to C++
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Avi Kivity
6d2278f345 build: make libseastar.a depend on seastar configuration
Should cause a rebuild if seastar configuration changes, for example due
to --enable-dpdk.
2015-07-21 19:22:48 +03:00
Avi Kivity
cbc44d19cb build: fix build with dpdk
Rip out the seastar-related dpdk configuration, replace with simply
forwarding dpdk configuration to seastar.
2015-07-21 17:19:01 +03:00
Avi Kivity
c1e25e40f0 Merge "Streaming updates" from Asias
"With this series:

1) We can verify that data from Node A to Node B are streamed correctly.

2) Session completion is handled now.

Node A:
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] Session with 127.0.0.2 is complete
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] All sessions completed

Node B:
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] Session with 127.0.0.1 is complete
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] All sessions completed"
2015-07-21 11:29:26 +03:00
Avi Kivity
4e662a4512 build: figure out ninja executable more cleanly
2015-07-21 11:26:26 +03:00
Asias He
b53b8dfec8 streaming: Kill stream_reader
It is used by IncomingFileMessage's deserialize to read from the network and
write into an sstable. In urchin, we use the messaging service, and an
incoming mutation is handled within the STREAM_MUTATION handler. No need for
stream_reader.
2015-07-21 16:12:54 +08:00
Asias He
8acf335f15 streaming: Kill stream_writer
In Origin, it is used by OutgoingFileMessage's serialize function to
write a given section of the SSTable to the network. In urchin, we send
mutations directly in stream_transfer_task::start(). We can kill
stream_writer now.
2015-07-21 16:12:54 +08:00
Nadav Har'El
2b337e33b4 build: strip all tests
This patch saves almost 20 GB (!!) of disk space in Urchin's build
directory, as well as a lot of memory during the link phase of the
build (which can be noticeable on low-memory machines, where it leads to
slow swapping).

Because of C++'s extremely lengthy mangled names, and extremely numerous
functions, the debugging information generated for Seastar code is absurdly
large, and added to every single executable we generate. This is most
noticeable in tests - we currently have over 30 tests (with hopefully many
more to come), each compiled into a separate executable with its own copy of
all this debug information. Many of these executables are half a gigabyte,
each!

So this patch creates all test executables - whether debug or release mode -
stripped. When a user encounters a failing test he wants debug information
for (for gdb or the sanitizer), he can trivially relink it unstripped,
with a command like:

    ninja build/release/tests/urchin/sstable_test_g

note the added "_g". This links the already existing object files (which
still have their debug information), which takes just a fraction of a second.

On my machine, this patch reduces the Urchin built tests from about
27 GB to 8.1 GB. The build/release/tests directory drops from 10 GB to
just 0.6 GB! The build/debug/tests directory is still huge (7.5 GB),
although still smaller than what it was (17 GB). This remaining hugeness
is not because of the debug information, but because of the undefined-
behavior sanitizer (-fsanitize=undefined), which unfortunately adds a
huge data segment to each executable and I still don't know how to improve
on that. Nevertheless, it's still a significant reduction in space and will
be even more important as we write more tests.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-21 08:43:19 +03:00
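The sizes quoted in the commit above can be cross-checked with a few lines of arithmetic. The per-directory figures add up to the quoted totals, and the difference gives the "almost 20 GB" saving. The numbers are copied from the commit message and the helper name is hypothetical.

```python
# Sizes in GB, copied from the commit message.
BEFORE = {"release": 10.0, "debug": 17.0}
AFTER = {"release": 0.6, "debug": 7.5}


def summarize(before, after):
    """Return totals, the overall saving, and the per-directory reduction in percent."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    reduction = {
        name: round(100 * (before[name] - after[name]) / before[name], 1)
        for name in before
    }
    return {
        "total_before": round(total_before, 1),
        "total_after": round(total_after, 1),
        "saved": round(total_before - total_after, 1),
        "reduction_percent": reduction,
    }
```

summarize(BEFORE, AFTER) reproduces the 27 GB and 8.1 GB totals, and shows that the release tests shrink far more than the debug tests, in line with the sanitizer data segment described above.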
Avi Kivity
5339783d56 build: handle case where ninja command is named 'ninja-build'
2015-07-19 21:41:42 +03:00