This heavily used function shows up in many places in the profile (as part
of other functions), so it's worth optimizing by eliminating the special
case for the standard allocator. Use a statically allocated object instead.
(a non-thread-local object is fine since it has no data members).
"This series introduces the i_endpoint_snitch::reset_snitch() static method
that allows to replace the current (global) snitch instance with the new one.
This is done in an (per-shard) atomic way transparent so anyone holding a reference
to snitch_ptr.
This series starts with some cleanups, adds the above method and the unit test
that verifies its functionality."
"I am currently looking at the performance of our index_read, since it was in
the past pinpointed at the source of problems.
While the read side is the one that is mostly interesting, I would like to test
both - besides anything else, it is easier to test reads after writes so we
don't have to create synthetic data with outside tools.
This patch introduces the write side benchmark (read side will hopefully come
tomorrow). While the write side is, as mentioned, not the most interesting
part, I did see some standing from the flamegraph that allowed me to optimize
one particular function, yielding a 8.6 % improvement."
This is a test that allow us to query the performance of our sstable index
reads and writes (currently only writes implemented). A lot of potentially
common code is put into a header, which will make writing new tests easier if
needed.
We don't want to take shortcuts for this, so all reading and writing is done
through public sstable interfaces.
For writing, there is no way to write the index without writing the datafile.
But because we are only writing the primary key, the datafile will not contain
anything else. This is the closest we can get to an index testing with the
public interfaces.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.
To solve that problem, a per-database compaction manager is introduced here.
Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.
A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think can be committed,
but will need more work to be completed
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
so each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error hangling, logging, etc., needs to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch adds a "ninja clean", better than the current "ninja -t clean".
Ninja's "ninja -t clean" is a nice trick, designed to save the Makefile writer
the tedious chore of listing the targets to remove, by automatically gathering
this list. But our build system, following OSv's one, actually uses a much
cooler (and better) trick: All build files are generated in a single
subdirectory, "build/", and cleaning the build products is as simple as
"rm -rf build".
So this patch adds a target, "ninja clean", which does exactly this (rm -rf
build). "ninja clean" is not only easier to type than "ninja -t clean", it
also has one important benefit: When the ninja rules change, "ninja -t clean"
doesn't remember to delete now-defunct targets, and they stay behind. On my
build machine, "ninja -t clean" left behind almost a gigabyte of old crap.
Moreover, when the ninja file changes drastically (as it changed a few days
ago), not cleaning up everything can even cause new builds to break - e.g.,
when something was previously a file and now needs to be a directory.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch introduce init.cc file which hosts all the initialization
code. The benefits are 1) we can share initialization code with tests
code. 2) all the service startup dependency / order code is in one
single place instead of everywhere.
The utils file will hold general modules, that need to be used by
multiple modules.
As a start, it holds the histogram definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Instead of trying to second-guess the seastar build system, always rebuild
libseastar.a. Specify restat = 1 so that binaries are only relinked if
something truly changed.
The idea is to reuse the same testing code on any mutation_source, for
example on memtable.
The range query test cases are now part of a generic mutation_source
test suite.
The functionality is similar to RuntimeMBean.getUptime() that's needed
in schema pulling logic.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
"With this series:
1) We can verify that data from Node A to Node B are streamed correctly.
2) Session completion are handled now.
Node A:
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] Session with 127.0.0.2 is complete
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] All sessions completed
Node B:
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] Session with 127.0.0.1 is complete
[Stream #08a2d480-2f7b-11e5-ae28-000000000000] All sessions completed"
It is used by IncomingFileMessage's deserialize to read from network and
write int sstable. In urchin, we use messaging service, incoming
mutation is handler within the STREAM_MUTATION handler. No need for
stream_reader.
In Origin, it is used by OutgoingFileMessage's serialize function to
write given section of the SSTable to network. In urchin, we send
mutaion directly in stream_transfer_task::start(). We can kill
stream_write now.
This patch saves almost 20 GB (!!) of disk space in Urchin's build
directory, as well as a lot of memory during the link phase of the
build (which can be noticable on low-memory machines which leads to
slow swapping).
Because of C++'s extremely lengthy mangled names, and extremely numerous
functions, the debugging information generated for Seastar code is absurdly
large, and added to every single executable we generate. This is most
noticable in tests - we currently have over 30 tests (with hopefully much
more to come), each compiled into a separate excutable with its own copy of
all this debug information. Many of these executables are half a gigabyte,
each!
So this patch creates all test executables - whether debug or release mode -
stripped. When a user encounters a failing test he wants debug information
for (for gdb or the sanitizer), he can trivially relink it unstripped,
with a command like:
ninja build/release/tests/urchin/sstable_test_g
note the added "_g". This links the already existing object files (which
still have their debug information, which takes just a fraction of a second.
On my machine, this patch reduces the Urchin built tests from about
27 GB to 8.1 GB. The build/release/tests directory drops from 10 GB to
just 0.6 GB! The build/debug/tests directory is still huge (7.5 GB),
although still smaller than what it was (17 GB). This remaining hugeness
is not because of the debug information, but because of the undefined-
behavior sanitizer (-fsanitize=undefined), which unfortunately adds a
huge data segment to each executable and I still don't know how to improve
on that. Nevertheless, it's still a significant reduction in space and will
be even more important as we write more tests.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>