From Vlad:
"Currently the database always creates a SimpleSnitch and ignores the corresponding parameter
provided by the user. This series fixes this situation:
- Changes the snitch creation interface to comply with the Java-like interface that
is already used in the topology_strategy class family.
- Fixes all the places where a SimpleSnitch was created ignoring the user configuration."
Moved setting of configuration variables after the configuration file
has been read.
Updated the seed-parsing code to comply with the configuration file format
- seeds: <ip1>,<ip2>,<ip3>
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
Origin's --seed-provider-parameters format is seeds=<ip1>,<ip2>,<ip3>. To
align with the yaml configuration file format and command line options, a
different separator must be used instead of ","; switched to using ";".
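
A minimal sketch of parsing the new format, assuming the argument looks like
seeds=<ip1>;<ip2>;<ip3>. The helper name parse_seed_provider_parameters() is
hypothetical and is not taken from the patch:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): splits "seeds=<ip1>;<ip2>;<ip3>"
// into the individual addresses, using ';' as the separator.
std::vector<std::string> parse_seed_provider_parameters(const std::string& arg) {
    const std::string prefix = "seeds=";
    if (arg.compare(0, prefix.size(), prefix) != 0) {
        throw std::invalid_argument("expected seeds=<ip1>;<ip2>;...");
    }
    std::vector<std::string> seeds;
    std::istringstream in(arg.substr(prefix.size()));
    std::string ip;
    while (std::getline(in, ip, ';')) {
        if (!ip.empty()) {  // tolerate a trailing or doubled separator
            seeds.push_back(ip);
        }
    }
    return seeds;
}
```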
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
Commit 0993142d8 changed buffer size of output streams to a better
number, but this change was lost when translating the sstable write
code to use the thread facility.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
partitions_ranges will be split up per destination, so provide it
separately from read_command to avoid copying the latter for each
destination.
From Pekka:
"This series fixes up the schema management code to store keyspace strategy
options to the database. The map is stored as JSON just like in Origin."
We don't use this module and its compilation is broken in DPDK 2.0.0
against Linux kernels 4.0.x.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Fix keyspace strategy options to preserve key-value ordering by
switching to std::map. We need this to be able to store the map in
database as JSON because unordered maps can cause the schema merging
code to attempt a keyspace update, which we don't support, even though
the values did not change.
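
A small sketch of why the ordered container matters: with std::map the
serialized text depends only on the contents, not on insertion order. The
to_json() helper is hypothetical and assumes keys and values need no JSON
escaping:

```cpp
#include <map>
#include <string>

// Hypothetical serializer (not from the patch). std::map iterates in key
// order, so equal option maps always produce byte-identical JSON.
std::string to_json(const std::map<std::string, std::string>& options) {
    std::string out = "{";
    bool first = true;
    for (const auto& kv : options) {
        if (!first) {
            out += ",";
        }
        out += "\"" + kv.first + "\":\"" + kv.second + "\"";
        first = false;
    }
    out += "}";
    return out;
}
```

With an unordered map, two maps holding the same options may iterate in
different orders, so a textual comparison of the stored JSON could report a
change where there is none.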
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Currently, we flush out memtables very aggressively, which results in
lots of small sstable writes. The proper fix here is to do accounting on
the memtable size, but until that happens, bump up the threshold to
another magic number which gives better batching:
$ ./build/release/seastar --smp 1 --data-file-directories data --commitlog-directory commitlog/
$ tools/bin/cassandra-stress write -mode cql3 native prepared -rate threads=32
Before:
Results:
op rate : 37280
partition rate : 37280
row rate : 37280
latency mean : 0.8
latency median : 0.6
latency 95th percentile : 1.1
latency 99th percentile : 7.6
latency 99.9th percentile : 11.9
latency max : 50.5
Total operation time : 00:00:30
END
After:
Results:
op rate : 46721
partition rate : 46721
row rate : 46721
latency mean : 0.7
latency median : 0.5
latency 95th percentile : 0.9
latency 99th percentile : 1.3
latency 99.9th percentile : 5.8
latency max : 96.3
Total operation time : 00:00:39
END
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This adds the API implementation to the column family API.
After this patch the following API will be supported:
/column_family/name
/column_family
/column_family/name/keyspace
This adds the column family swagger definition file. The API is
equivalent to the ColumnFamilyStoreMBean definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The previous patch added an assert that does not hold when a test runs
without an attached commit log, yet still generates enough mutations to cause
a memtable flush.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Commit log guarantees that once an RP is assigned to a data frame/caller, it
will not block before returning the result via future. However, this is not
enough, since we could
a.) Have blocked earlier, in which case the return value processing will be
async anyway
b.) Even if no blocking takes place, future chaining mechanism could decide
it has to reorder execution.
Assuming, though, that the case where this happens is rare, and that cases where it
actually affects the rule of replay position ordering are even rarer, we can
guard against it by simply keeping track of the highest RP _discarded_ (sent
to sstable flush), and if we attempt to apply a mutation with a lower RP,
simply re-do the operation (i.e. write the same entry to the commit log again).
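
A rough sketch of the guard, under two loudly stated assumptions: a replay
position is modelled as a plain integer, and the check fires when a mutation's
RP is lower than the highest discarded RP. The types commit_log_stub and
table_stub are hypothetical stand-ins, not the real classes:

```cpp
#include <algorithm>
#include <cstdint>

// Simplification: the real replay position is a structured type.
using replay_position = std::uint64_t;

// Hypothetical stand-in for the commit log: hands out increasing positions.
struct commit_log_stub {
    replay_position next = 1;
    replay_position add_entry() { return next++; }
};

// Hypothetical stand-in for the table that owns the memtables.
struct table_stub {
    replay_position highest_discarded = 0;
    unsigned retries = 0;

    // Called when memtables up to rp are sent to sstable flush.
    void discard_up_to(replay_position rp) {
        highest_discarded = std::max(highest_discarded, rp);
    }

    // Applies a mutation logged at rp. If rp was reordered behind a flush,
    // the entry is written to the commit log again to obtain a fresh RP.
    replay_position apply(commit_log_stub& log, replay_position rp) {
        while (rp < highest_discarded) {
            rp = log.add_entry();
            ++retries;
        }
        return rp;  // the position the mutation is finally applied at
    }
};
```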
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
From Vlad:
"The series includes the first production snitch implementation:
gossiping_property_file_snitch.
There are also a few fixes/improvements in different parts of the project
that were discovered on the way."
Reads the configuration from cassandra-rackdc.properties.
This file may include the following fields:
- dc: Local Data Center name
- rack: Local Rack name
- prefer_local: A boolean value that defines whether the cluster should prefer
the local address - relevant for AWS cloud.
The class schedules a timer that re-reads the property file and informs the
Gossiper if there are changes in the local configuration.
Differences from the Origin C* implementation:
- No support for a legacy property_file_snitch.
- Class supports overriding the property file name in a constructor.
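
A minimal sketch of reading the three fields, assuming the file uses
key=value lines with '#' comments. The names rackdc_properties and
parse_rackdc_properties() are hypothetical, and the sketch uses only the
standard library rather than the project's file and logging facilities:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical holder for the fields listed above.
struct rackdc_properties {
    std::string dc;
    std::string rack;
    bool prefer_local = false;
};

// Strips leading and trailing whitespace.
static std::string trim_copy(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Parses key=value lines; blank lines and '#' comments are skipped,
// unknown keys are ignored (an assumption of this sketch).
rackdc_properties parse_rackdc_properties(std::istream& in) {
    rackdc_properties props;
    std::string line;
    while (std::getline(in, line)) {
        line = trim_copy(line);
        if (line.empty() || line[0] == '#') {
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) {
            throw std::runtime_error("bad property line: " + line);
        }
        const std::string key = trim_copy(line.substr(0, eq));
        const std::string value = trim_copy(line.substr(eq + 1));
        if (key == "dc") {
            props.dc = value;
        } else if (key == "rack") {
            props.rack = value;
        } else if (key == "prefer_local") {
            props.prefer_local = (value == "true");
        }
    }
    return props;
}
```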
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v4:
- Fix a debug compilation: define reload_property_file_period() to be a constexpr
method instead of a member.
- Don't stop() the snitch when snitch_is_ready is set to an exceptional state.
New in v2:
- Adjust to new file interface.
- Futurize reload_property_file().
- Use trim() and split() from boost::algorithm.
- Read optimization and logging:
   - Re-read the file only if it was changed since the last read.
   - Use logging facilities from log.hh.
- Cleanups:
   - Introduce bad_property_file_error exception.
   - Remove unnecessary check after dma_read_exactly() call.
   - Styling.
   - Copyright.
- Move most of the functions implementation into the .cc file.
- Added stop() method.
- Implements the non-trivial versions of get_rack() and get_datacenter().
Performs a lookup in the following order:
1) Searches in gossiper::endpoint_state_map.
2) Searches in the SystemTable.
3) If not found in any of the above, returns a default value.
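
The lookup order above can be sketched as follows. This assumes both sources
can be modelled as simple maps from endpoint to rack name; rack_lookup and the
"UNKNOWN_RACK" default are hypothetical placeholders, not the real types or
values:

```cpp
#include <map>
#include <string>

using endpoint = std::string;  // simplification of the real address type

// Hypothetical model of the two sources consulted by get_rack().
struct rack_lookup {
    std::map<endpoint, std::string> gossip_state;  // stands in for gossiper::endpoint_state_map
    std::map<endpoint, std::string> system_table;  // stands in for the system table
    std::string default_rack = "UNKNOWN_RACK";     // placeholder default

    std::string get_rack(const endpoint& ep) const {
        // 1) Live gossip state takes precedence.
        const auto g = gossip_state.find(ep);
        if (g != gossip_state.end()) {
            return g->second;
        }
        // 2) Fall back to what was persisted in the system table.
        const auto s = system_table.find(ep);
        if (s != system_table.end()) {
            return s->second;
        }
        // 3) Nothing known about this endpoint.
        return default_rack;
    }
};
```

get_datacenter() would follow the same three steps with its own default.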
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Introduce db::system_keyspace::endpoint_dc_rack.
- Kill trim() and split().
- Added missing copyright and license statements.
- _my_rack and _my_dc are not optional anymore.
- Added a promise that has to be set when the snitch is stopped.
- Forbid explicit snitch creation with constructor.
- Allow the creation of snitches only with locator::make_snitch() template
function.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v4:
- Make sure the snitch is stopped before it's destroyed when _snitch_is_ready
is returned in an exceptional state.
New in v2:
- Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch>
- abstract_replication_strategy::create_replication_strategy(): explicitly
specify (template) types of create_object() parameters.
- Re-arrange the loop in merge_keyspaces() so that lambdas that depend on
"this" complete before there is a chance that "this" gets destroyed.
- create_keyspace(): Don't add a new keyspace if a keyspace with this name
already exists.
- i_endpoint_snitch: added a stop() virtual method
   - Added a stop() pure virtual method.
   - Added an enum class snitch_state and a _state member initialized to snitch_state::initializing,
     added an assert() in a destructor requiring _state to become snitch_state::stopped,
     which should be set when stop() is complete.
- rack_inferring_snitch: added a stop() method.
- simple_snitch: added a stop() method.
- Added stop() methods to abstract_replication_strategy and keyspace.
- Updated database::stop() to wait for all keyspaces in _keyspaces to stop.
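
A simplified sketch of the stop-before-destroy contract described for
i_endpoint_snitch. The class names snitch_sketch and simple_snitch_sketch are
hypothetical, and stop() is synchronous here, whereas the real method is
presumably asynchronous:

```cpp
#include <cassert>

// Only the two states named in the changelog are modelled.
enum class snitch_state { initializing, stopped };

// Hypothetical base class mirroring the described contract.
class snitch_sketch {
public:
    // Destroying a snitch that was never stopped is a programming error.
    virtual ~snitch_sketch() { assert(_state == snitch_state::stopped); }

    // Implementations must set _state to stopped once shutdown is complete.
    virtual void stop() = 0;

    snitch_state state() const { return _state; }

protected:
    snitch_state _state = snitch_state::initializing;
};

// Hypothetical trivial implementation with nothing to tear down.
class simple_snitch_sketch : public snitch_sketch {
public:
    void stop() override { _state = snitch_state::stopped; }
};
```

The same pattern explains why keyspace and abstract_replication_strategy gain
stop() methods: whoever owns a snitch must be able to stop it before the
destructor runs.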