Commit Graph

99 Commits

Author SHA1 Message Date
Avi Kivity
9f9f435e9a Merge "Adding snitch_name and update_snitch" from Amnon
"This adds the get_snitch_name and update_snitch functionality to the API. After
this series it would be possible to return the snitch name and to update the
snitch."
2015-08-16 19:34:41 +03:00
Avi Kivity
7a14bcd66e Merge "API: add get estimated row size histogram to column family" from Amnon
"This series cleans up the streaming_histogram and the estimated histogram that
were imported from Origin, and then uses them to expose the estimated min and
max row sizes in the API."
2015-08-16 17:31:23 +03:00
Avi Kivity
eb09eddee5 Merge "Adding sampled histogram" from Amnon
"Histograms are used to collect latency information. In Origin, many of the
operations are timed, which is a potential performance issue. This series adds
an option to sample the operations, where a small fraction will be timed and
the rest will only be counted.

This will give an estimate for the statistics, while keeping an accurate
count of the total events and having negligible performance impact.

The first users of the modified histogram are the column families, for their
read and write operations."

Conflicts:
	database.hh
2015-08-16 17:15:24 +03:00
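The sampling idea described in eb09eddee5 can be sketched roughly as follows. This is a minimal illustration, not the code from the series: the class name, its methods and the fixed "one in N" sampling rule are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch: every operation is counted, but only one in
// `sample_every` operations is timed. Names are illustrative only.
class sampled_latency_counter {
    std::uint64_t _count = 0;            // exact number of operations
    std::uint64_t _timed = 0;            // operations that were actually timed
    std::chrono::nanoseconds _total{0};  // sum of the sampled latencies
    std::uint64_t _sample_every;
public:
    explicit sampled_latency_counter(std::uint64_t sample_every)
        : _sample_every(sample_every) {
        assert(sample_every > 0);
    }

    // True when the operation about to start should be timed.
    bool should_sample() const { return _count % _sample_every == 0; }

    // Record an operation that was only counted.
    void mark() { ++_count; }

    // Record an operation that was both counted and timed.
    void mark(std::chrono::nanoseconds latency) {
        ++_count;
        ++_timed;
        _total += latency;
    }

    std::uint64_t count() const { return _count; }
    std::uint64_t timed() const { return _timed; }

    // Mean latency, estimated from the sampled operations only.
    std::chrono::nanoseconds estimated_mean() const {
        if (_timed == 0) {
            return std::chrono::nanoseconds{0};
        }
        return _total / static_cast<std::int64_t>(_timed);
    }
};
```

The total count stays exact because both mark() overloads increment it; only the latency figures are estimates.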
Nadav Har'El
5a02eeaba9 v2: repair: track ongoing repairs
[in v2: 1. Fixed a few small bugs.
        2. Added rudimentary support for parallel/sequential repair.
        3. Verified that the code works correctly with Asias's fix to streaming]

This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).

As before one starts a repair with the REST api:

   curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"

where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting with 0. This patch adds support for a similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which enquires about the status of the
repair with this id. For example,

    curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"

gets the current status of this repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or an HTTP 400 "unknown repair id ..." in case an
invalid id is passed (not the id of any real repair that was previously
started).

This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But having
both code paths will also be useful for implementing the "parallel" vs
"sequential" repair options of Cassandra.

Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix for this bug,
and will hopefully publish it soon, but it is unrelated to the repair code
so I think this patch can independently be committed.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-16 14:23:02 +03:00
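The status tracking described in 5a02eeaba9 could look roughly like the sketch below. It only illustrates the behaviour stated in the message (ids are small integers starting at 0, a status is RUNNING, SUCCESSFUL or FAILED, and an unknown id is an error). The class and method names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class repair_status { RUNNING, SUCCESSFUL, FAILED };

// Hypothetical tracker: repair ids are handed out sequentially from 0.
class repair_tracker {
    std::vector<repair_status> _status;
public:
    // Register a newly started repair and return its id.
    int start() {
        _status.push_back(repair_status::RUNNING);
        return static_cast<int>(_status.size()) - 1;
    }

    // Record the outcome of a repair that was previously started.
    void done(int id, bool ok) {
        assert(id >= 0 && static_cast<std::size_t>(id) < _status.size());
        _status[static_cast<std::size_t>(id)] =
            ok ? repair_status::SUCCESSFUL : repair_status::FAILED;
    }

    // An empty result stands for the HTTP 400 "unknown repair id" case.
    std::optional<repair_status> get(int id) const {
        if (id < 0 || static_cast<std::size_t>(id) >= _status.size()) {
            return std::nullopt;
        }
        return _status[static_cast<std::size_t>(id)];
    }
};
```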
Amnon Heiman
eee3094197 API: Add the get_snitch command
This adds the get_snitch_name command.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-16 12:15:38 +03:00
Amnon Heiman
524e0a00df API: Adding the update snitch API
The update snitch API resets the snitch with a new class.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-16 12:15:14 +03:00
Amnon Heiman
773106b90e API: add get estimated row size histogram to column family
This adds the API implementation for the row size histogram.

It adds a map_cf method that performs a map operation over all column
families on the different shards.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
Amnon Heiman
0ca7189664 API: Adding the estimated_histogram to the utils definition file
This adds the estimated_histogram to the utils definition file.

The estimated_histogram holds a list of buckets and a list of bucket
offsets.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
Amnon Heiman
ae34ba32fa API: Adding min row and max row support to column_family
This adds the implementation for min and max row size in column family.

It uses the column family map reduce helper function with the additional
function to get the min and max row size.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
Amnon Heiman
ba5b1db618 API: Add a wrapper function for min and max
These helper functions wrap the std::min and std::max templates for int64_t,
which makes it easier to pass them as values when needed.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
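A short sketch of why such wrappers help (ba5b1db618): std::min and std::max are overloaded function templates, so their bare names cannot be passed where a single callable value is expected. The names min_int64, max_int64 and reduce_sizes below are assumptions for illustration and may not match the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Non-template wrappers: each has exactly one type, so its name can be
// passed as a plain function pointer.
inline std::int64_t min_int64(std::int64_t a, std::int64_t b) {
    return std::min(a, b);
}

inline std::int64_t max_int64(std::int64_t a, std::int64_t b) {
    return std::max(a, b);
}

// Hypothetical reducer that receives the combining function as a value.
// The first element seeds the reduction, so the input must not be empty.
inline std::int64_t reduce_sizes(const std::vector<std::int64_t>& sizes,
                                 std::int64_t (*combine)(std::int64_t, std::int64_t)) {
    assert(!sizes.empty());
    return std::accumulate(sizes.begin() + 1, sizes.end(), sizes.front(), combine);
}
```

With these, a call such as reduce_sizes(row_sizes, min_int64) compiles, whereas passing std::min directly would need an explicit cast to select an overload.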
Amnon Heiman
dab068dde9 API: modify column family API to use the histogram
With the change in column_family stats, the API needs to get the counter
from the read and write histogram.

It also adds the implementation for the read and write latency histogram.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
17ebebf268 API: When combining histogram, return zeroed histogram on empty
This change makes sure that when there are no results (i.e. all the
histograms that are summed are empty) the returned result will be a zeroed
histogram and not an empty object.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
3ef36681cc API: Adding read, write latency histogram to column_family
This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:

get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Raphael S. Carvalho
1e335006e7 api: add missing stats to column family api
addresses issue #84

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 12:31:38 +03:00
Tomasz Grabiec
e3592a4a04 api: lsa: Invoke compaction on all shards
2015-08-07 22:05:53 +02:00
Tomasz Grabiec
6ae0747fe5 lsa: Use size_t for sizes
2015-08-06 18:40:06 +02:00
Tomasz Grabiec
5d7500d648 api: lsa: Make logger static
2015-08-06 18:40:06 +02:00
Tomasz Grabiec
9a1ee1b96a api: Introduce RESTful API for LSA
To force compaction, invoke:

  $ curl -X POST http://localhost:10000/lsa/compact
2015-08-06 16:50:15 +02:00
Tomasz Grabiec
1046ee6e80 memtable: Remove all_partitions()
Preferred way to access the memtable is via reader.
2015-08-06 14:05:16 +02:00
Nadav Har'El
34b1cc42cd Initial repair support
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:

curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"

I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think it can be committed,
but more work is needed to complete it:

 1. Repair options are not yet supported (range repair, sequential/parallel
    repair, choice of hosts, datacenters and column families, etc.).

 2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
    alternative optimization) and partial data exchange haven't been
    implemented yet.

 3. Full repair for nodes with multiple separate ranges is not yet
    implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
    so each vnode's range has a different host as a replica, so we need
    to exchange each key range separately with a different remote host.

 4. Our repair operation returns a numeric operation id (like Origin),
    but we don't yet provide any means to use this id to check on ongoing
    repairs like Origin allows.

 5. Error handling, logging, etc., need to be improved.

 6. SMP nodes (with multiple shards) should work correctly (thanks to
    Asias's latest patch for SMP mutation streaming) but haven't been
    tested.

 7. Incremental repair is not supported (see
    http://www.datastax.com/dev/blog/more-efficient-repairs)

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 13:26:36 +03:00
Amnon Heiman
cea73277ca API: Add read, write, and flush statistic to column_family
This adds the API implementation for the read and write counters, the number
of pending flushes and the memtable switch count.

The implementation uses a helper function to perform map and map_reduce
on column_family.

The get_uuid helper method now supports both colon notations (i.e.
either as a ":" or as %3A)

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-03 11:36:40 +03:00
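The two colon notations mentioned in cea73277ca can be handled along these lines. This is only a sketch: the message does not show get_uuid itself, so the function name split_cf_name, the assumed "keyspace:column_family" name format and the example names are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: split a "<keyspace>:<column family>" name where the
// separator may arrive literally (":") or percent-encoded ("%3A").
inline std::pair<std::string, std::string> split_cf_name(const std::string& name) {
    std::size_t pos = name.find("%3A");
    std::size_t sep_len = 3;
    if (pos == std::string::npos) {
        pos = name.find(':');
        sep_len = 1;
    }
    if (pos == std::string::npos) {
        throw std::invalid_argument("expected <keyspace>:<column family>, got: " + name);
    }
    return {name.substr(0, pos), name.substr(pos + sep_len)};
}

// Usage: both spellings resolve to the same pair of names.
inline void split_cf_name_example() {
    assert(split_cf_name("try1:users") == split_cf_name("try1%3Ausers"));
}
```

The sketch only recognises the upper-case form %3A, as written in the commit message.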
Amnon Heiman
8356b493a3 API: Adding read and write counters to column_family definition
This adds the read and write counters to the column_family swagger
definitions.

It adds the following commands:
get_read
get_all_read
get_write
get_all_write

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-03 11:36:33 +03:00
Amnon Heiman
01aacbeacc API: Adding the histogram implementation to storage_proxy
This adds the implementation to the histogram for the storage proxy.
After this patch the following URLs will be available:
/storage_proxy/metrics/read/latency/histogram
/storage_proxy/metrics/range/latency/histogram
/storage_proxy/metrics/write/latency/histogram

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-26 11:03:31 +03:00
Amnon Heiman
429f7d2b20 API: Adding the histogram stats definition to storage_proxy
This adds the read, write and range histograms to the storage_proxy
It adds the following commands:
get_read_metrics_latency_histogram
get_range_metrics_latency_histogram
get_write_metrics_latency_histogram

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-26 11:03:30 +03:00
Amnon Heiman
130d8a7cc6 API: generalize the sum helper functions and add histogram support
This patch generalizes the sum helper function to accept any field as
long as it supports the + operator and can be parsed as json.

It adds a sum function for histograms, which works by adding the totals,
adding the sums, setting the min and max, setting the average and variance,
and combining the samples.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-26 11:03:24 +03:00
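The histogram summing described in 130d8a7cc6, together with the "zeroed histogram on empty" behaviour from 17ebebf268, might be sketched as below. The struct layout and the function name are assumptions. The message does not say how the variance is combined, so the standard pooled formula for two groups (population variance) is used here as a stand-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical histogram summary; field names are illustrative only.
struct histogram {
    std::uint64_t count = 0;
    double sum = 0;
    double min = 0;
    double max = 0;
    double mean = 0;
    double variance = 0;
    std::vector<double> sample;
};

// Combine per-shard histograms. With no input, or only empty inputs, the
// result is a zeroed histogram rather than a missing object.
inline histogram combine(const std::vector<histogram>& parts) {
    histogram res;
    for (const histogram& h : parts) {
        if (h.count == 0) {
            continue;  // an empty histogram contributes nothing
        }
        if (res.count == 0) {
            res.min = h.min;
            res.max = h.max;
        } else {
            res.min = std::min(res.min, h.min);
            res.max = std::max(res.max, h.max);
        }
        const double na = static_cast<double>(res.count);
        const double nb = static_cast<double>(h.count);
        const double n = na + nb;
        const double delta = h.mean - res.mean;
        // Assumed pooling rule: sum of squared deviations of both groups
        // plus a correction term for the distance between their means.
        const double m2 = res.variance * na + h.variance * nb
                        + delta * delta * na * nb / n;
        res.mean = (res.mean * na + h.mean * nb) / n;
        res.variance = m2 / n;
        res.count += h.count;
        res.sum += h.sum;
        res.sample.insert(res.sample.end(), h.sample.begin(), h.sample.end());
    }
    assert(res.count > 0 || res.sample.empty());
    return res;
}
```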
Amnon Heiman
4908222d6a Adding utils.json Swagger definition file
The utils file will hold general modules, that need to be used by
multiple modules.

As a start, it holds the histogram definition.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-26 10:58:45 +03:00
Pekka Enberg
dcbbafd41c api: Switch to "#pragma once" as include guard
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 18:03:42 +03:00
Avi Kivity
c74e36c30e Merge branch 'master' of github.com:cloudius-systems/urchin into db
Conflicts:
	message/messaging_service.cc
	message/messaging_service.hh
2015-07-16 12:51:19 +03:00
Avi Kivity
1d4805236b messaging_service: don't include config.hh in .hh
config.hh changes rapidly, so don't force lots of recompiles by including it.

Need to place seed_provider_type in namespace scope, so we can forward
declare it for that.
2015-07-16 12:26:02 +03:00
Asias He
244b9289c6 api/messaging_service: Use get_stats
Hide rpc::protocol<serializer, messaging_verb>::client from it.
2015-07-16 17:23:26 +08:00
Glauber Costa
04c0fbcb8c remove calls to seal_active_memtable
It should not be called directly: external callers should be calling flush()
instead.

To be sure it doesn't happen again, make seal_active_memtable private.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-15 10:24:20 -04:00
Glauber Costa
9c464aff9b database: clean up various APIs
In much of our column_families APIs, we need to pass a pointer to the database.
The only reason we do that, is so we can properly handle the commit log entries
after we seal the current memtables into sstables.

Now that we store a pointer to the commit log in the CF itself at the time it
is created, we no longer have to do it. As a result, the APIs are a lot
cleaner, with no gratuitous parameters.

My motivation for this was the flush method, but as a result, apply() also gets
cleaner.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-15 10:24:20 -04:00
Amnon Heiman
83a64d75b5 Cleaning the gossiper API
This changes the void methods in the gossiper API to return json_void

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:20:28 +03:00
Amnon Heiman
b6fa2187af Cleanup the cache_service API
This changes the void methods in the cache service API to return json_void

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:20:28 +03:00
Amnon Heiman
2b0393525f Cleaning the hinted_handoff API
This changes the void methods in the hinted handoff API to return json_void

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:20:27 +03:00
Amnon Heiman
161c37d607 Cleaning the storage_proxy API
This changes the void methods in the storage proxy API to return json_void

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:20:27 +03:00
Amnon Heiman
14aafc83b6 API: Adding the get_host_id implementation to the API
This adds the implementation for the get host id API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:19:16 +03:00
Amnon Heiman
40d0d58a50 Cleaning the storage_service API
This changes the return type of void API to json_void, for a cleaner
API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-15 14:18:37 +03:00
Amnon Heiman
06df13b091 API: Adding an implementation to storage_proxy counters
This adds an implementation to the storage_proxy counters. The
implementation uses the stats object inside the storage_proxy.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-12 22:59:43 +03:00
Amnon Heiman
dfc7121fd8 API: Add a helper function to sum stat values of a distributed object
A common scenario in the API is to get a single value from a distributed
object that has a get_stats method.

The helper function takes the object and a function that returns a
single value from the stats object, and performs the map_reduce.

It would return a future that can be used as a return value from the
API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-12 22:59:37 +03:00
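The helper described in dfc7121fd8 follows a map-then-reduce shape, sketched below. This is a synchronous, single-threaded stand-in under stated assumptions: a std::vector plays the role of the distributed object, the struct and function names are hypothetical, and the real helper is described as returning a future, which this sketch omits.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical per-shard statistics.
struct stats {
    std::uint64_t read_timeouts = 0;
    std::uint64_t write_timeouts = 0;
};

// Stand-in for one shard-local instance of a distributed service.
struct proxy_shard {
    stats _stats;
    const stats& get_stats() const { return _stats; }
};

// Map each shard's stats through `field`, then reduce the results with +.
// Any value type works as long as it is default-constructible and has +.
template <typename Service, typename F>
auto sum_stats(const std::vector<Service>& shards, F field) {
    using value_type =
        std::decay_t<decltype(field(std::declval<const Service&>().get_stats()))>;
    value_type total{};
    for (const Service& s : shards) {
        total = total + field(s.get_stats());
    }
    return total;
}

// Usage: sum one counter over three shards.
inline void sum_stats_example() {
    std::vector<proxy_shard> shards(3);
    shards[0]._stats.read_timeouts = 2;
    shards[1]._stats.read_timeouts = 5;
    auto total = sum_stats(shards, [](const stats& s) { return s.read_timeouts; });
    assert(total == 7);
}
```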
Amnon Heiman
d0ce45efbb API: Add a reference to storage_proxy into API context
The API needs to call the storage_proxy, for that a reference to the
distributed storage_proxy is added to the context and is set in main.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-12 22:59:37 +03:00
Amnon Heiman
3dd694b8c6 API: Adding a stub implementation to the storage_proxy
This adds a stub implementation to the metrics of the storage proxy.
After this patch the following URLs will be available:
/storage_service/metrics/cas_write/contention
/storage_service/metrics/cas_write/condition_not_met
/storage_service/metrics/cas_read/unfinished_commit
/storage_service/metrics/cas_read/contention
/storage_service/metrics/cas_read/condition_not_met
/storage_service/metrics/read/timeouts
/storage_service/metrics/read/unavailables
/storage_service/metrics/range/timeouts
/storage_service/metrics/range/unavailables
/storage_service/metrics/write/timeouts
/storage_service/metrics/write/unavailables

The implementation returns 0 for all queries.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-12 14:07:28 +03:00
Amnon Heiman
c82b89a8b0 Adding the metrics definition to the storage_proxy
This adds the storage definition to the storage proxy swagger definition
file.
It adds the definitions for the following commands:
get_cas_write_metrics_unfinished_commit
get_cas_write_metrics_contention
get_cas_write_metrics_condition_not_met
get_cas_read_metrics_unfinished_commit
get_cas_read_metrics_contention
get_cas_read_metrics_condition_not_met
get_read_metrics_timeouts
get_read_metrics_unavailables
get_range_metrics_timeouts
get_range_metrics_unavailables
get_write_metrics_timeouts
get_write_metrics_unavailables

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-12 14:05:05 +03:00
Amnon Heiman
fd7e0e512a Adding the commit log metric stub implementation
This adds a stub implementation to the commit log metrics.
The calls return the correct value type with a stub value.

After this patch the following URLs will be available:
/commitlog/metrics/completed_tasks
/commitlog/metrics/pending_tasks
/commitlog/metrics/total_commit_log_size

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-06 12:38:22 +03:00
Amnon Heiman
240c7b0572 API: Adding the commit log metrics definitions
This adds the commit log swagger definition to the commit log
definition file.

The API is based on the CommitLogMetrics.
The following commands were added:
get_completed_tasks
get_pending_tasks
get_total_commit_log_size

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-06 12:37:45 +03:00
Avi Kivity
3aebcfe6b7 Merge "Adding the column family metrics API" from Amnon
"The column family metrics are a set of data related to the column family.

This series adds an API based on the ColumnFamilyMetrics mbean.
It has a stub implementation, just so the JMX proxy would get a response."
2015-07-05 17:36:58 +03:00
Avi Kivity
8fa053c7c6 Merge "Adding the hinted_handoff API" from Amnon
"This series adds the hinted handoff and hinted handoff metrics API with a stub
implementation.  The API definition was based on the HintedHandOffMetricsMBean
and the HintedHandoffMetrics."

Conflicts:
	api/api.cc
	configure.py
2015-07-05 17:33:25 +03:00
Avi Kivity
b8f8f66e81 Merge "Adding the endpoint_snitch_api" from Amnon
"This series adds the endpoint snitch api. It is based on the
EndpointSnitchInfoMBean definition."

Conflicts:
	api/api.cc
	configure.py
2015-07-05 17:31:23 +03:00
Avi Kivity
dedb9e8434 Merge "Adding the cache service metrics API" from Amnon
"This series adds the cache service metrics API. It is based on the CacheMetrics
definitions.

There are statistics per key, row and counter that will be exposed in the
API.  This series contains a stub implementation that returns the correct types
but with a stub value."
2015-07-05 16:45:43 +03:00
Amnon Heiman
3b4ce5a219 API: Adding a stub implementation for hinted_handoff metrics
This adds a stub implementation for the hinted handoff metrics.
The stubbed methods return the correct type, but with a stub value.
After this patch the following paths will be available:
/hinted_handoff/metrics/create_hint/{addr}
/hinted_handoff/metrics/not_stored_hints/{addr}

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-07-05 16:42:54 +03:00