make_reader_returning() is used by the single-key query path, and is slowed
down by needlessly allocating a vector, which is initialized by copying
the mutation (as initializer_list<> cannot be moved from).
Fix by giving it its own implementation instead of relying on
make_reader_returning_many().
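A minimal sketch of why the vector-based single-key path costs a copy. The types `mutation`, `many_reader` and `single_reader`, and the function `make_reader_returning_via_many`, are hypothetical stand-ins, not the real Scylla classes. Elements of a `std::initializer_list` are `const`, so building a vector from `{std::move(m)}` still copies the mutation into the heap-allocated vector. A dedicated single-element reader can move it all the way in.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mutation; it counts copies so the cost is visible.
struct mutation {
    static inline int copies = 0;
    int key;
    explicit mutation(int k) : key(k) {}
    mutation(const mutation& o) : key(o.key) { ++copies; }
    mutation(mutation&&) noexcept = default;
};

// Generic reader over any number of mutations, buffered in a vector.
class many_reader {
    std::vector<mutation> _ms;
    std::size_t _pos = 0;
public:
    explicit many_reader(std::vector<mutation> ms) : _ms(std::move(ms)) {}
    std::optional<mutation> next() {
        if (_pos == _ms.size()) {
            return std::nullopt;
        }
        return std::move(_ms[_pos++]);
    }
};

// Old single-key path (hypothetical): the braced list creates an
// initializer_list whose elements are const, so the vector constructor must
// copy the mutation, on top of allocating the vector itself.
many_reader make_reader_returning_via_many(mutation m) {
    std::vector<mutation> v{std::move(m)};
    return many_reader(std::move(v));
}

// Dedicated single-key path: holds at most one mutation, moved in, no vector.
class single_reader {
    std::optional<mutation> _m;
public:
    explicit single_reader(mutation m) : _m(std::move(m)) {}
    std::optional<mutation> next() {
        std::optional<mutation> m = std::move(_m);
        _m.reset();  // a second call returns nothing
        return m;
    }
};

single_reader make_reader_returning(mutation m) {
    return single_reader(std::move(m));
}
```

With these stand-ins, `make_reader_returning_via_many(mutation(1))` performs exactly one copy, while `make_reader_returning(mutation(2))` performs none.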
Before:
host_id in system.local is empty
After:
host_id in system.local is inserted correctly
This fixes a nasty problem where we always got a new host_id when
booting up a node with data.
"Initial implementation/transposition of commit log replay.
* Changes replay position to be shard aware
* Commit log segment IDs now follow basically the same scheme as origin:
max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they and the existing
sstables are inspected for high water marks, and the segments are then replayed
from those marks to restore mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
per the _previous_ run's shards, not the current ones.
Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
so watermark IDs coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
as in origin. Partly because I am lazy, but also partly because our serial
format differs, and we currently have no tools to do anything useful with it.
* No replay filtering (Origin allows a system property to designate a filter
file, detailing which keyspaces/CFs to replay), partly because we have no
system properties.
There is no unit test for the commit log replayer (yet), because I could not
really come up with a good one given the existing test infrastructure (it is
tricky to kill things at just the "right" moment).
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at the kill -9, but it at least verified that
replay took place and that mutations were applied.
(Note that origin also lacks validity testing)"
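The shard-aware replay positions and high-water-mark filtering described in the replay notes can be sketched as follows. Everything here is a hypothetical illustration, not the actual Scylla code: the bit layout (shard in the top 12 bits of the segment ID), and the names `next_segment_id`, `high_water_marks` and `needs_replay`. The sketch also assumes `previous + 1` rather than `previous` inside the `max()`, so that two segments created in the same millisecond still get distinct IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical ID layout: shard number in the top 12 bits, base ID below.
constexpr unsigned shard_bits = 12;
constexpr unsigned base_bits = 64 - shard_bits;
constexpr std::uint64_t base_mask = (std::uint64_t(1) << base_bits) - 1;

// New segment ID: max(previous base ID + 1, wall clock in ms), tagged with
// the shard. The "+ 1" is an assumption that keeps IDs strictly increasing.
std::uint64_t next_segment_id(std::uint64_t previous_base, std::uint64_t now_ms, unsigned shard) {
    std::uint64_t base = std::max(previous_base + 1, now_ms) & base_mask;
    return (std::uint64_t(shard) << base_bits) | base;
}

unsigned shard_of(std::uint64_t segment_id) {
    return unsigned(segment_id >> base_bits);
}

// Shard-aware replay position: segment ID (shard in high bits) + offset.
struct replay_position {
    std::uint64_t id;
    std::uint32_t pos;
    bool operator<(const replay_position& o) const {
        return id != o.id ? id < o.id : pos < o.pos;
    }
};

// High water mark per shard of the *previous* run: the highest position
// that any sstable reports as already flushed.
std::map<unsigned, replay_position>
high_water_marks(const std::vector<replay_position>& sstable_positions) {
    std::map<unsigned, replay_position> marks;
    for (const auto& rp : sstable_positions) {
        auto [it, inserted] = marks.emplace(shard_of(rp.id), rp);
        if (!inserted && it->second < rp) {
            it->second = rp;
        }
    }
    return marks;
}

// A logged entry is replayed only if it lies past its shard's mark;
// shards with no flushed data are replayed in full.
bool needs_replay(const replay_position& rp,
                  const std::map<unsigned, replay_position>& marks) {
    auto it = marks.find(shard_of(rp.id));
    return it == marks.end() || it->second < rp;
}
```

Keying the marks by the shard encoded in the old segment IDs mirrors the note that shard matching follows the previous run's shards rather than the current CPU count.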
* seastar 23f4fae...d1fa2d7 (4):
> memory: provide some statistic for total memory in debug mode
> core/thread: Introduce yield()
> future-util: Move later() into future-util.hh
> tests: make alloc_test work with many --memory sizes
Example run of perf_sstable_index:
64k: 1401296.23 +- 5461.20 partitions / sec (30 runs, 1 concurrent ops)
128k: 1459283.89 +- 6674.87 partitions / sec (30 runs, 1 concurrent ops)
This is about 4 % higher, with a 0.45 % error.
For larger buffers, like 256k, there is no consistent gain; some runs even
show a loss.
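Reading each ± term as an absolute error on the mean (an assumption; the run output does not define it), the quoted figures can be checked directly from the two runs. The 0.45 % figure corresponds to the 128k run's relative error:

```latex
\text{gain} = \frac{1459283.89 - 1401296.23}{1401296.23}
            = \frac{57987.66}{1401296.23} \approx 0.0414 \;(4.14\%)

\text{rel.\ error}_{64k}  = \frac{5461.20}{1401296.23} \approx 0.00390 \;(0.390\%), \qquad
\text{rel.\ error}_{128k} = \frac{6674.87}{1459283.89} \approx 0.00457 \;(0.457\%)
```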
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We'll pay the price of now having the buffer size as a variable instead of a
constexpr, but that cost pales in comparison with the rest of the operation.
By paying it, we gain the ability to actually specify it during test runs,
making it easy to automate scripts that measure performance over various
buffer sizes.
I am also providing a new constructor that allows setting the buffer size.
This constructor is private, so only the test class will be able to use it.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Not doing that would include the smp communication costs in the total cost of
the operation. That is not very significant when comparing two runs whose
results clearly differ, but the proposed way yields much lower error figures,
so the results are generally more precise.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
As we have discussed recently, the sstable writer can't even handle intra-core
parallelism: it has only one writer thread per core. For reads, however,
intra-core parallelism affects the final throughput a lot.
We don't want to get rid of it, because in real scenarios intra-core
parallelism will be there, especially for reads. So let's make it a tunable so
we can easily test its effect on the final result.
The iterations are now all sequential, and we will run x parallel invocations
in each of them.
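The loop structure described above (sequential iterations, each running a fixed number of parallel invocations) can be sketched as follows. `std::async` is used here as a standalone stand-in for seastar futures, and `run_rounds` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Runs `iterations` rounds one after another. Each round launches
// `concurrency` invocations of `op` in parallel and waits for all of them
// before the next round starts. Returns the wall-clock time of every round.
template <typename Op>
std::vector<std::chrono::duration<double>>
run_rounds(std::size_t iterations, std::size_t concurrency, Op op) {
    std::vector<std::chrono::duration<double>> results;
    results.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::future<void>> inflight;
        inflight.reserve(concurrency);
        for (std::size_t c = 0; c < concurrency; ++c) {
            inflight.push_back(std::async(std::launch::async, op));
        }
        for (auto& f : inflight) {
            f.get();  // waits, and rethrows if op threw
        }
        results.push_back(std::chrono::steady_clock::now() - start);
    }
    return results;
}
```

Setting `concurrency` to 1 reproduces a purely sequential benchmark. Larger values expose how throughput changes with intra-core parallelism while keeping the per-round timings comparable.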
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
$ curl -X POST --header "Content-Type: application/json" \
    --header "Accept: application/json" \
    "http://127.0.0.1:10000/storage_service/gossiping"
btw, the description looks incorrect:
POST /storage_service/gossiping
allows a user to recover a forcibly 'killed' node
"This series adds the missing code from origin to support this functionality.
While doing so, some methods were changed to be const where that was more
appropriate, and a few const versions of methods were added where both
variations were required."
This adds the ownership method implementation to the storage_service
API. After the patch, the following URLs will be supported:
GET /storage_service/ownership/{keyspace}
GET /storage_service/ownership/
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the API for get_effective_ownership and
get_ownership in storage_service.
It is based on the StorageServiceMBean definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
effective_ownership
This patch adds the implementation for get_ranges_for_endpoint,
get_ownership and effective_ownership, based on the origin implementation.
The methods are used by the API.
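A simplified sketch of how get_ownership can be computed, modeled on our understanding of origin's approach: each token owns the arc from the preceding token up to itself (wrapping around the ring), and per-endpoint ownership is the sum over that endpoint's tokens. The small `[0, ring_size)` ring and the names `describe_ownership` and `get_ownership` are illustrative assumptions. Real partitioners use a far larger token range, and effective ownership, which also factors in the replication strategy, is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Simplified token ring: tokens live in [0, ring_size).
using token = std::uint64_t;

// Fraction of the ring owned by each token: the arc (previous token, token],
// wrapping around, so the fractions sum to 1.
std::map<token, double> describe_ownership(const std::map<token, std::string>& ring,
                                           std::uint64_t ring_size) {
    std::map<token, double> owned;
    if (ring.empty()) {
        return owned;
    }
    if (ring.size() == 1) {
        owned[ring.begin()->first] = 1.0;  // a single token owns everything
        return owned;
    }
    token prev = std::prev(ring.end())->first;  // the last token wraps to the first
    for (const auto& entry : ring) {
        token t = entry.first;
        std::uint64_t width = (t + ring_size - prev) % ring_size;
        owned[t] = double(width) / double(ring_size);
        prev = t;
    }
    return owned;
}

// Per-endpoint ownership: sum the fractions of every token an endpoint holds.
std::map<std::string, double> get_ownership(const std::map<token, std::string>& ring,
                                            std::uint64_t ring_size) {
    std::map<std::string, double> per_endpoint;
    for (const auto& [t, fraction] : describe_ownership(ring, ring_size)) {
        per_endpoint[ring.at(t)] += fraction;
    }
    return per_endpoint;
}
```

For example, with `ring_size` 100 and tokens `{10: A, 40: B, 70: A}`, token 10 owns the wrapped arc from 70 (40 % of the ring), so A owns 70 % and B owns 30 %.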
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch adds get_non_system_keyspaces, as found in origin, and
exposes the replication strategy via the get_replication_strategy
method.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch adds const versions of get_datacenter_endpoints and
get_topology.
It modifies the token iterator to use a const version of token_metadata,
and makes first_token, first_token_index, tokens_end and ring_range
const methods.
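The const/non-const accessor pairs described above follow a standard C++ pattern. Below is a minimal sketch with hypothetical stand-in types. The `topology` contents, the token type and the `count_datacenters` helper are illustrative, not the real token_metadata API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the token_metadata/topology relationship.
struct topology {
    std::vector<std::string> datacenters;
};

class token_metadata {
    topology _topology;
    std::vector<long> _sorted_tokens{-100, 0, 100};
public:
    // Non-const overload: owners of a mutable token_metadata may modify it.
    topology& get_topology() { return _topology; }
    // Const overload: code holding a const token_metadata& (for example a
    // read-only token iterator) can still inspect the topology.
    const topology& get_topology() const { return _topology; }

    // Methods that never modify state are marked const.
    long first_token() const { return _sorted_tokens.front(); }
};

// Compiles only because the accessors used here are const-qualified.
std::size_t count_datacenters(const token_metadata& tm) {
    return tm.get_topology().datacenters.size();
}
```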
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>