Commit Graph

6012 Commits

Author SHA1 Message Date
Avi Kivity
16d6daba64 mutation_reader: optimize make_reader_returning()
make_reader_returning() is used by the single-key query path, and is slowed
down by needlessly allocating a vector, which is initialized by copying
the mutation (as initializer_list<> can not be moved from).

Fix by giving it its own implementation instead of relying on
make_reader_returning_many().
2015-08-27 11:52:22 +02:00
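The copy that 16d6daba64 describes comes from a general C++ rule: the elements of a std::initializer_list are const, so a container built from one has to copy them. The following is a minimal sketch of that effect, not the Scylla code; `tracked`, `collect_many`, `single_holder` and `hold_one` are hypothetical names, with `tracked` standing in for a mutation.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mutation: counts how often it is copied.
struct tracked {
    static int copies;
    tracked() = default;
    tracked(const tracked&) { ++copies; }
    tracked(tracked&&) noexcept {}
};
int tracked::copies = 0;

// Generic path: the list elements are const, so the vector must copy them.
inline std::vector<tracked> collect_many(std::initializer_list<tracked> items) {
    return std::vector<tracked>(items);
}

// Dedicated single-element path: no vector, the value is moved into place.
struct single_holder {
    tracked value;
};

inline single_holder hold_one(tracked t) {
    return single_holder{std::move(t)};
}

// Exercises both paths and checks how many copies each one made.
inline void demo_initializer_list_copy() {
    tracked::copies = 0;
    tracked a;
    auto many = collect_many({std::move(a)});
    assert(many.size() == 1);
    assert(tracked::copies == 1);  // copied out of the initializer_list

    tracked::copies = 0;
    tracked b;
    auto one = hold_one(std::move(b));
    (void)one;
    assert(tracked::copies == 0);  // moved all the way through
}
```

Calling `demo_initializer_list_copy()` shows one copy on the list-based path and none on the dedicated one, which is the saving the commit message refers to.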
Avi Kivity
5f62f7a288 Revert "Merge "Commit log replay" from Calle"
Due to test breakage.

This reverts commit 43a4491043, reversing
changes made to 5dcf1ab71a.
2015-08-27 12:39:08 +03:00
Avi Kivity
28bb65525b Merge "API updates for storage_service" from Asias
"Added APIs like get_release_version, get_operation_mode, is_gossip_running and
so on."
2015-08-27 11:40:48 +03:00
Avi Kivity
0fff367230 Merge "test for compaction metadata's ancestors" from Raphael 2015-08-27 11:07:53 +03:00
Avi Kivity
4e3c9c5493 Merge "compaction manager fixes" from Raphael 2015-08-27 11:05:26 +03:00
Asias He
0fcfc017c8 storage_service: Add debug info for loaded tokens and host ids 2015-08-27 11:01:09 +03:00
Asias He
ad4008d50e gms: Fix release_version
With this patch, the release_version column in system.peers is now correct.
2015-08-27 11:01:08 +03:00
Asias He
80c996a315 db/system_keyspace: Fix get_local_host_id
Before:
host_id in system.local is empty

After:
host_id in system.local is inserted correctly

This fixes a nasty problem where we always get a new host_id when
booting up a node with data.
2015-08-27 11:01:07 +03:00
Avi Kivity
43a4491043 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" insofar as shard matching is
  per the _previous_ run's shards, not the current ones.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspace/cf:s to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at kill -9, but at least it verifies that replay
took place, and mutations were applied.
(Note that origin also lacks validity testing)"
2015-08-27 10:53:36 +03:00
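The segment ID scheme mentioned in 43a4491043, "max(previous ID, wall clock time in ms) + shard info", could look roughly like the sketch below. This is an illustration under assumptions, not the actual commit log code: the 52-bit split, the `+ 1` that keeps IDs unique when the clock does not advance, and the names `segment_id_source`, `shard_of` and `base_of` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed layout: low 52 bits carry the time-based ID, the bits above carry the shard.
constexpr unsigned shard_shift = 52;
constexpr std::uint64_t base_mask = (std::uint64_t(1) << shard_shift) - 1;

class segment_id_source {
public:
    explicit segment_id_source(unsigned shard) : _shard(shard) {}

    // Returns a new ID that never goes backwards, even if the clock does.
    std::uint64_t next(std::uint64_t wall_clock_ms) {
        _base = std::max(_base + 1, wall_clock_ms);
        assert(_base <= base_mask);
        return (std::uint64_t(_shard) << shard_shift) | _base;
    }

private:
    unsigned _shard;
    std::uint64_t _base = 0;
};

// Recover the two parts of an ID produced by segment_id_source.
inline unsigned shard_of(std::uint64_t id) {
    return static_cast<unsigned>(id >> shard_shift);
}

inline std::uint64_t base_of(std::uint64_t id) {
    return id & base_mask;
}
```

Keeping the shard in the ID is what would let a replayer match a leftover segment to the shard that wrote it in the previous run.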
Avi Kivity
5dcf1ab71a Merge seastar upstream
* seastar d1fa2d7...10e09b0 (1):
  > future: balance constructors and destructors in future_state
2015-08-26 23:50:07 +03:00
Avi Kivity
bee26cec12 Merge seastar upstream
* seastar 23f4fae...d1fa2d7 (4):
  > memory: provide some statistic for total memory in debug mode
  > core/thread: Introduce yield()
  > future-util: Move later() into future-util.hh
  > tests: make alloc_test work with many --memory sizes
2015-08-26 21:05:51 +03:00
Avi Kivity
093bf13f4e Merge "SSTables: write performance" from Glauber
"This patchset fixes some small but pertinent issues with perf_sstable_index,
and then goes on to make the sstable buffer size configurable by tests.

With that in place, we can easily run a benchmark across multiple values to
figure out how the system behaves under multiple conditions.

The performance is improved by as much as 5% (smp == 8), although it does not
always do so: smp == 12 yields the same for both buf sizes, but has a
significantly larger error.

Available at:
    git@github.com:glommer/urchin.git   sstable-write-perf

Benchmark:
==========

run() {
    cpus=$1
    partitions=$2
    for i in 8 16 32 64 128 256; do
        printf "%3dk: " $i;
        ./build/release/tests/perf/perf_sstable_index --smp $cpus --iterations 30 --partitions $partitions --buffer_size $i;
    done
    echo -e "\n"
}

for cpus in 1 2 4 6 8 12; do
    echo "SMP $cpus, 500000 partitions"
    run $cpus 500000
done

SMP 1, 500000 partitions
  8k: 325037.27 +- 335.47 partitions / sec (30 runs, 1 concurrent ops)
 16k: 374839.11 +- 400.24 partitions / sec (30 runs, 1 concurrent ops)
 32k: 405940.98 +- 467.75 partitions / sec (30 runs, 1 concurrent ops)
 64k: 428007.61 +- 522.59 partitions / sec (30 runs, 1 concurrent ops)
128k: 436788.19 +- 539.66 partitions / sec (30 runs, 1 concurrent ops)
256k: 442052.82 +- 656.14 partitions / sec (30 runs, 1 concurrent ops)

SMP 2, 500000 partitions
  8k: 569126.58 +- 641.92 partitions / sec (30 runs, 1 concurrent ops)
 16k: 646362.97 +- 987.67 partitions / sec (30 runs, 1 concurrent ops)
 32k: 718711.55 +- 1170.77 partitions / sec (30 runs, 1 concurrent ops)
 64k: 747707.73 +- 1464.18 partitions / sec (30 runs, 1 concurrent ops)
128k: 774929.13 +- 1540.03 partitions / sec (30 runs, 1 concurrent ops)
256k: 786300.94 +- 1552.16 partitions / sec (30 runs, 1 concurrent ops)

SMP 4, 500000 partitions
  8k: 974641.46 +- 1919.13 partitions / sec (30 runs, 1 concurrent ops)
 16k: 1086237.69 +- 2626.61 partitions / sec (30 runs, 1 concurrent ops)
 32k: 1173998.02 +- 4412.38 partitions / sec (30 runs, 1 concurrent ops)
 64k: 1254343.28 +- 5193.97 partitions / sec (30 runs, 1 concurrent ops)
128k: 1272103.63 +- 6710.27 partitions / sec (30 runs, 1 concurrent ops)
256k: 1277801.52 +- 5529.45 partitions / sec (30 runs, 1 concurrent ops)

SMP 6, 500000 partitions
  8k: 1131322.35 +- 3122.81 partitions / sec (30 runs, 1 concurrent ops)
 16k: 1284103.88 +- 4804.79 partitions / sec (30 runs, 1 concurrent ops)
 32k: 1324921.30 +- 5489.99 partitions / sec (30 runs, 1 concurrent ops)
 64k: 1401296.23 +- 5461.20 partitions / sec (30 runs, 1 concurrent ops)
128k: 1459283.89 +- 6674.87 partitions / sec (30 runs, 1 concurrent ops)
256k: 1449591.69 +- 6105.13 partitions / sec (30 runs, 1 concurrent ops)

SMP 8, 500000 partitions
  8k: 1168346.90 +- 3466.96 partitions / sec (30 runs, 1 concurrent ops)
 16k: 1288961.45 +- 3594.02 partitions / sec (30 runs, 1 concurrent ops)
 32k: 1362826.10 +- 5666.47 partitions / sec (30 runs, 1 concurrent ops)
 64k: 1412672.45 +- 4961.73 partitions / sec (30 runs, 1 concurrent ops)
128k: 1489967.90 +- 9373.07 partitions / sec (30 runs, 1 concurrent ops)
256k: 1449589.43 +- 9772.39 partitions / sec (30 runs, 1 concurrent ops)

SMP 12, 500000 partitions
  8k: 1183805.45 +- 3225.67 partitions / sec (30 runs, 1 concurrent ops)
 16k: 1295929.96 +- 9187.47 partitions / sec (30 runs, 1 concurrent ops)
 32k: 1373621.47 +- 10332.86 partitions / sec (30 runs, 1 concurrent ops)
 64k: 1436798.27 +- 12399.21 partitions / sec (30 runs, 1 concurrent ops)
128k: 1438624.45 +- 8824.31 partitions / sec (30 runs, 1 concurrent ops)
256k: 1482219.58 +- 15888.66 partitions / sec (30 runs, 1 concurrent ops)"
2015-08-26 20:52:27 +03:00
Tzach Livyatan
26bdcef325 README: Add recursive to git submodule update command
Recursive update is required for nested submodules.

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
2015-08-26 13:33:40 +03:00
Avi Kivity
9a0d8adc04 Merge "Implement system table accessors for peers table" from Glauber
Fixes #177.
2015-08-26 13:32:22 +03:00
Takuya ASADA
c341579568 dist: add make on build dependency
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-08-26 12:47:36 +03:00
Glauber Costa
391eea564e system_tables: implement load_host_id
A simple translation from the original code.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 19:16:30 -05:00
Glauber Costa
0fd2861293 system_tables: implement load_tokens
A simple translation from the original code

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 19:16:30 -05:00
Glauber Costa
cd5d93fe3a sstables: change default buffer size to 128k
Example run of perf_sstable_index:

 64k: 1401296.23 +- 5461.20 partitions / sec (30 runs, 1 concurrent ops)
128k: 1459283.89 +- 6674.87 partitions / sec (30 runs, 1 concurrent ops)

This is 4% higher, with a 0.45% error.

For larger buffers, like 256k, this doesn't yield a consistent gain, sometimes
yielding a loss.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:50 -05:00
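The two percentages quoted in cd5d93fe3a can be re-derived from the figures in the message. A small sketch of that arithmetic follows; the helper names `percent_gain` and `percent_error` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `after` over `before`, in percent.
inline double percent_gain(double before, double after) {
    return (after - before) / before * 100.0;
}

// Reported error relative to the mean, in percent.
inline double percent_error(double mean, double error) {
    return error / mean * 100.0;
}

// Checks the figures from the commit message: 64k vs 128k throughput.
inline void demo_buffer_size_gain() {
    double gain = percent_gain(1401296.23, 1459283.89);
    double err = percent_error(1459283.89, 6674.87);
    assert(std::fabs(gain - 4.14) < 0.01);  // roughly 4% higher
    assert(std::fabs(err - 0.46) < 0.01);   // roughly 0.45% error
}
```

Since the gain is about nine times the relative error, the difference between 64k and 128k is well outside the measurement noise for this run.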
Glauber Costa
873cf17cf4 sstable tests: allow for the creation of sstables of non-default buffer size.
This can now be used in the sstable_index_write performance test.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:50 -05:00
Glauber Costa
a5e173ec98 sstable: move buffer size inside the sstable object.
We'll pay the price of having this now as a variable instead of a constexpr,
but this pales in comparison with the rest of the operation.

By paying this cost, we gain the ability of actually specifying it during test
runs, making it easy to automate scripts that will measure the performance over
various buffer sizes.

I am also providing a new constructor that allows for the setting of the buffer
size.  The said constructor will be private, meaning that only the test class
will be able to use it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:50 -05:00
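The pattern described in a5e173ec98, a per-object buffer size plus a private constructor that only the test class can reach, can be sketched with a friend declaration. The class names and the 128k default below are assumptions for illustration (the default is borrowed from cd5d93fe3a); this is not the real sstable class.

```cpp
#include <cassert>
#include <cstddef>

class sstable_sketch {
public:
    sstable_sketch() = default;

    std::size_t buffer_size() const { return _buffer_size; }

private:
    // Private: production code cannot pick a non-default buffer size.
    explicit sstable_sketch(std::size_t buffer_size) : _buffer_size(buffer_size) {}

    static constexpr std::size_t default_buffer_size = 128 * 1024;

    // A plain member rather than a constexpr, so it can vary per object.
    std::size_t _buffer_size = default_buffer_size;

    friend class sstable_test_access;
};

// Hypothetical test helper: the only code allowed to use the private constructor.
class sstable_test_access {
public:
    static sstable_sketch make_with_buffer_size(std::size_t buffer_size) {
        return sstable_sketch(buffer_size);
    }
};

inline void demo_private_buffer_size() {
    assert(sstable_sketch().buffer_size() == 128 * 1024);
    auto small = sstable_test_access::make_with_buffer_size(8 * 1024);
    assert(small.buffer_size() == 8 * 1024);
}
```

A benchmark can then loop over buffer sizes through the test helper while regular callers keep getting the default.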
Glauber Costa
f4d8310d88 perf_sstable_index: calculate time spent before the map reduce operation.
Not doing that will include the smp communication costs in the total cost of
the operation. This is not very significant when comparing one run against
another when the results clearly differ, but the proposed way yields error
figures that are much lower. So results are generally better.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:49 -05:00
Glauber Costa
19d25130af perf_sstable_index: make parallelism an explicit option
As we have discussed recently, the sstable writer can't even handle intra-core
parallelism - it has only one writer thread per core, and for reads, it affects
the final throughput a lot.

We don't want to get rid of it, because in real scenarios intra-core
parallelism will be there, especially for reads. So let's make it a tunable so we
can easily test its effect on the final result.

The iterations are now all sequential, and we will run x parallel invocations
in each of them.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:49 -05:00
Asias He
65af005ad0 storage_service: Kill get_saved_caches_location
It is implemented in api/storage_service.cc ss::get_saved_caches_location.
2015-08-26 06:51:48 +08:00
Asias He
dd8fb73370 storage_service: Kill getCommitLogLocation
It is implemented in api/storage_service.cc ss::get_commitlog.
2015-08-26 06:51:47 +08:00
Asias He
192d54d163 storage_service: Kill get_all_data_file_locations
It is implemented in api/storage_service.cc.
2015-08-26 06:51:47 +08:00
Asias He
d218e56a3f storage_service: Kill get_current_generation_number
It is implemented in api/storage_service.cc.
2015-08-26 06:51:47 +08:00
Asias He
7f741f90eb api/storage_service: Add join_ring
$ curl -X POST --header "Content-Type: application/json" \
  --header "Accept: application/json" \
  "http://127.0.0.1:10000/storage_service/join_ring"
2015-08-26 06:51:47 +08:00
Asias He
9528f27201 api/storage_service: Add is_joined
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/join_ring"

true
2015-08-26 06:51:47 +08:00
Asias He
a145787afc api/storage_service: Add stop_gossiping
$ curl -X DELETE --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"
2015-08-26 06:51:47 +08:00
Asias He
5d758a89d7 storage_service: Implement stop_gossiping 2015-08-26 06:51:47 +08:00
Asias He
67768c5e1b api/storage_service: Add start_gossiping
$ curl -X POST --header "Content-Type: application/json" \
  --header "Accept: application/json" \
  "http://127.0.0.1:10000/storage_service/gossiping"

btw, the description looks incorrect:
   POST /storage_service/gossiping
   allows a user to recover a forcibly 'killed' node
2015-08-26 06:51:47 +08:00
Asias He
87735b1069 storage_service: Implement start_gossiping 2015-08-26 06:51:47 +08:00
Asias He
c475f992a9 storage_service: Add get_generation_number helper 2015-08-26 06:51:47 +08:00
Asias He
6b4f27dc84 api/storage_service: Add is_gossip_running
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"

true
2015-08-26 06:51:47 +08:00
Asias He
36d86865a8 storage_service: Implement is_gossip_running 2015-08-26 06:51:47 +08:00
Asias He
5d5016f8d1 api/storage_service: Add is_starting
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/is_starting"

false
2015-08-26 06:51:47 +08:00
Asias He
5dbbdd81e5 storage_service: Implement is_starting 2015-08-26 06:51:47 +08:00
Asias He
28a3eef9e3 api/storage_service: Add get_operation_mode
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/operation_mode"

"NORMAL"
2015-08-26 06:51:47 +08:00
Asias He
e13956354e storage_service: Implement get_operation_mode 2015-08-26 06:51:47 +08:00
Asias He
cafdb99d23 api/storage_service: Add get_schema_version
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/schema_version"

"59adb24e-f3cd-3e02-97f0-5b395827453f"
2015-08-26 06:51:47 +08:00
Asias He
33db0995b9 api/storage_service: Add get_release_version
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/release_version"

"2.1.8"
2015-08-26 06:51:47 +08:00
Asias He
620a5ae5b6 storage_service: Implement get_schema_version 2015-08-26 06:51:47 +08:00
Asias He
9172578f78 storage_service: Implement get_release_version 2015-08-26 06:51:47 +08:00
Avi Kivity
e6965c520d Merge "Adding the ownership support to storage_service" from Amnon
"This series adds the missing code from origin to support this functionality.
While doing so, some methods were changed to be const where that was more
appropriate, and a few const versions of methods were added where both
variations were required."
2015-08-25 20:13:33 +03:00
Avi Kivity
6e3cf26c73 Merge "improve mutation related errors logging" from Vlad
Fixes #30.
2015-08-25 19:44:25 +03:00
Amnon Heiman
c92bd9b121 API: Adding the ownership implementation to storage_service
This adds the ownership method implementation to the storage_service
API. After the patch the following URLs will be supported:

GET /storage_service/ownership/{keyspace}
GET /storage_service/ownership/

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:14 +03:00
Amnon Heiman
2c5716dac3 API: storage_service Add the swagger definition for ownership
This adds the API for get_effective_ownership and
get_ownership in storage_service.

It is based on the StorageServiceMBean definition.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
Amnon Heiman
fd20f167e7 storage_service: get_ranges_for_endpoint, get_ownership and
effective_ownership

This patch adds the implementation for get_ranges_for_endpoint,
get_ownership and effective_ownership based on origin implementation.

The methods are used by the API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
Amnon Heiman
b5ceef451e keyspace: Add the get_non_system_keyspaces and expose the replication
This patch adds get_non_system_keyspaces, which is found in origin, and
exposes the replication strategy through the get_replication_strategy
method.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
Amnon Heiman
6d875f1ec1 token_metadata: Add const when applicable
This patch adds a const version for get_datacenter_endpoints and
get_topology.

It modifies the token iterator to use a const version of token_metadata
and makes first_token, first_token_index, tokens_end and ring_range
const methods.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
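The const changes in 6d875f1ec1 follow a common C++ pattern: offer a const overload next to the mutable accessor, and mark lookups const so they work through a const reference. The sketch below shows the pattern on a toy ring; `ring_sketch` and `lookup` are hypothetical, and the wrap-around behaviour of `first_token` is an assumption made for the example rather than a statement about token_metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

class ring_sketch {
public:
    explicit ring_sketch(std::vector<long> tokens) : _tokens(std::move(tokens)) {
        std::sort(_tokens.begin(), _tokens.end());
    }

    // Mutable and const overloads of the same accessor.
    std::vector<long>& tokens() { return _tokens; }
    const std::vector<long>& tokens() const { return _tokens; }

    // const method: first token >= start, wrapping to the smallest token.
    long first_token(long start) const {
        assert(!_tokens.empty());
        auto it = std::lower_bound(_tokens.begin(), _tokens.end(), start);
        return it == _tokens.end() ? _tokens.front() : *it;
    }

private:
    std::vector<long> _tokens;
};

// A reader with const-only access, like an iterator over a const ring.
inline long lookup(const ring_sketch& ring, long start) {
    return ring.first_token(start);
}

inline void demo_const_ring() {
    std::vector<long> tokens = {30, 10, 20};
    const ring_sketch ring(tokens);
    assert(ring.tokens().size() == 3);  // picks the const overload
    assert(lookup(ring, 15) == 20);
    assert(lookup(ring, 31) == 10);     // wraps around
}
```

Without the const qualifiers, `lookup` would not compile, which is why callers that only hold a const reference need the const versions.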