There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed at a deferring point, we need to look it up
again.
If we use lw_shared_ptr, however, we can at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of looking it up again.
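A minimal sketch of the idea, using std::shared_ptr as a stand-in for Seastar's lw_shared_ptr. The column_family struct, its `removed` flag, and both functions are hypothetical. Holding our own reference keeps the object alive across the deferring point; a caller that cares whether the table was dropped still has to check.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a column family; `removed` models the table
// having been dropped while we were deferred.
struct column_family {
    std::string name;
    bool removed = false;
};

// Simulates a deferring point during which the table is dropped and the
// registry releases its reference.
void deferring_point(std::shared_ptr<column_family>& registry_ref) {
    registry_ref->removed = true;
    registry_ref.reset();  // the registry no longer owns the object
}

// Returns the table name if the table is still registered, or an empty
// string if it was dropped. Either way `cf` never dangles.
std::string use_across_deferral(std::shared_ptr<column_family>& registry_ref) {
    std::shared_ptr<column_family> cf = registry_ref;  // take our own reference
    deferring_point(registry_ref);
    // `cf` keeps the object alive; callers needing "still registered"
    // semantics must check explicitly.
    return cf->removed ? std::string() : cf->name;
}
```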
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
This adds the GET and POST API for slow query logging.
The GET returns an object with the enable, ttl and threshold settings, and
the POST lets you configure each of them.
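A sketch of the GET/POST semantics described above, assuming each POST parameter is optional and only the supplied ones change. The struct, function names, and field types are hypothetical; the real endpoint paths and units are not given here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the slow query logging settings.
struct slow_query_settings {
    bool enable = false;
    int64_t ttl = 0;        // units are an assumption of this sketch
    int64_t threshold = 0;  // units are an assumption of this sketch
};

// GET: return all current settings as one object.
slow_query_settings get_slow_query_info(const slow_query_settings& s) {
    return s;
}

// POST: each parameter is optional; only the ones supplied are changed.
void set_slow_query(slow_query_settings& s,
                    std::optional<bool> enable,
                    std::optional<int64_t> ttl,
                    std::optional<int64_t> threshold) {
    if (enable) { s.enable = *enable; }
    if (ttl) { s.ttl = *ttl; }
    if (threshold) { s.threshold = *threshold; }
}
```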
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Nothing fundamentally ties the estimated histogram to
sstables. This patch gets rid of the few incidental ties. They are:
- the namespace name, which is now moved to utils. Users inside sstables/
now need to add a namespace prefix, while the ones outside have to change
it to the right one
- sstables::merge, which has a very non-descriptive name to begin with, is
changed to a more descriptive name that can live inside utils/
- the disk_types.hh include has to be removed - but it had no reason to be
here in the first place.
What remains is to actually move the file outside sstables/. That is done in a
separate step for clarity.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have APIs for getting pending compaction tasks both in the column
family and in the compaction manager. The column family one already
returns pending tasks properly.
The compaction manager's one is used by 'nodetool compactionstats', and
was returning a value that didn't reflect pending compactions.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a20b88938ad39e95f98bfd7f93e4d1666d1c6f95.1471641211.git.raphaelsc@scylladb.com>
get_sstables_including_compacted_undeleted() may return temporary shared
ptr which will be destroyed before the loop if not stored locally.
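A minimal illustration of this bug class, with a hypothetical get_sstables() that returns a std::shared_ptr by value. In `for (x : *get_sstables())` the temporary pointer dies at the end of the range expression (before C++23), leaving the loop iterating over a freed container; storing the pointer in a local keeps it alive.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical accessor returning the set through a shared pointer by value.
std::shared_ptr<std::set<int>> get_sstables() {
    return std::make_shared<std::set<int>>(std::set<int>{1, 2, 3});
}

int sum_generations() {
    // Buggy form: for (int g : *get_sstables()) { ... }
    // The temporary shared_ptr is the only owner and is destroyed at the
    // end of the range expression, so the loop may read freed memory.
    //
    // Fixed form: keep the shared pointer alive for the whole loop.
    auto sstables = get_sstables();
    int sum = 0;
    for (int g : *sstables) {
        sum += g;
    }
    return sum;
}
```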
Fixes #1514
Message-Id: <20160728100504.GD2502@scylladb.com>
In a leveled column family, there can be many thousands of sstables, since
each sstable is limited to a relatively small size (160M by default).
With the current approach of reading from all sstables in parallel, cpu
quickly becomes a bottleneck as we need to check the bloom filter for each
of these sstables.
This patch addresses the problem by introducing a
compaction-strategy-specific data structure for holding sstables. This
data structure has a method to obtain the sstables used for a read.
For leveled compaction strategy, this data structure is an interval map,
which can be efficiently used to select the right sstables.
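The patch uses an interval map for leveled compaction. The sketch below swaps in a simpler structure that shows why selection becomes cheap: since sstables within a level >= 1 do not overlap, one sorted std::map per level, keyed by first token, finds the single candidate with one lookup. Level 0 may overlap and is scanned. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sstable descriptor: the [first, last] token range it covers.
struct sstable_desc {
    int id;
    int64_t first;
    int64_t last;
};

class leveled_sstable_index {
    std::vector<sstable_desc> _level0;                     // may overlap
    std::vector<std::map<int64_t, sstable_desc>> _levels;  // non-overlapping
public:
    void add(int level, const sstable_desc& s) {
        if (level == 0) {
            _level0.push_back(s);
            return;
        }
        if (_levels.size() < static_cast<std::size_t>(level)) {
            _levels.resize(level);
        }
        _levels[level - 1].emplace(s.first, s);
    }

    // Ids of the sstables that may contain `token`.
    std::vector<int> select(int64_t token) const {
        std::vector<int> out;
        for (const auto& s : _level0) {
            if (s.first <= token && token <= s.last) {
                out.push_back(s.id);
            }
        }
        for (const auto& level : _levels) {
            auto it = level.upper_bound(token);  // first sstable starting after token
            if (it == level.begin()) {
                continue;
            }
            --it;  // last sstable starting at or before token
            if (token <= it->second.last) {
                out.push_back(it->second.id);
            }
        }
        return out;
    }
};
```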
This adds a definition for the scylla release version. The API already
returns the compatibility version (i.e. the compatible origin version).
This definition returns the scylla version; a call to the API should
return the same result as running scylla --version.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set. The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
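A sketch of the one piece of code that gets more complex, assuming the new set holds shared pointers to sstables (the sstable struct here is hypothetical). An ordered map keyed by generation yields the highest generation as its last key; with a set, we scan for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_set>

// Hypothetical minimal sstable carrying only its generation number.
struct sstable {
    int64_t generation;
};

using sstable_list = std::unordered_set<std::shared_ptr<sstable>>;

// Linear scan for the highest generation; 0 for an empty list.
int64_t highest_generation(const sstable_list& ssts) {
    int64_t max_gen = 0;
    for (const auto& sst : ssts) {
        max_gen = std::max(max_gen, sst->generation);
    }
    return max_gen;
}
```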
This adds to the definition of the collectd API the ability to turn
specific collectd metrics on and off.
For the GET endpoint, a POST option was added that allows enabling or
disabling a metric.
The general GET endpoint now returns an enable flag that indicates whether
the metric is enabled.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-2-git-send-email-amnon@scylladb.com>
The space calculation counters in column family had two problems:
1. The total bytes is an ever-growing counter, which is meaningless for
the API.
2. Simply summing the size on all shards ignores the fact that the
same sstable file can be referenced by multiple shards; this is
especially noticeable during migration.
To solve this, the implementation was modified so that instead of
collecting the sizes, the API collects a map of file name to size
and then does the summing.
This removes the duplicates and fixes the total bytes calculation.
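A sketch of the deduplicating sum, assuming each shard reports a map from sstable file name to size in bytes (the type alias and function name are hypothetical). Merging by file name before summing counts a file shared by several shards only once.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Per-shard view: sstable file name -> size in bytes.
using file_sizes = std::unordered_map<std::string, uint64_t>;

uint64_t total_space_used(const std::vector<file_sizes>& per_shard) {
    file_sizes merged;
    for (const auto& shard : per_shard) {
        for (const auto& [name, size] : shard) {
            merged.emplace(name, size);  // no-op if the file is already present
        }
    }
    uint64_t total = 0;
    for (const auto& entry : merged) {
        total += entry.second;
    }
    return total;
}
```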
Running cfstats under load after a compaction happened, before the change:
$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1068253.0 76435
Read Count: 75915
Read Latency: 0.5953986037015082 ms.
Write Count: 76435
Write Latency: 0.013975966507490025 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 5
Space used (live): 44261215
Space used (total): 219724478
After the fix:
$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1863206.0 124219
Read Count: 125401
Read Latency: 0.9381053978835895 ms.
Write Count: 124219
Write Latency: 0.01499936402643718 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 6
Space used (live): 50402904
Space used (total): 50402904
Space used by snapshots (total): 0
Fixes: #1042
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1464518757-14666-2-git-send-email-amnon@scylladb.com>
Currently, we register the snitch API in set_server_gossip_settle(), which
waits until a node has joined the cluster. As a result, 'nodetool status'
does not properly show the status of a joining node. Fix the issue by
registering the snitch API earlier.
Fixes #1269.
Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>
The API now exposes rate_moving_average and
rate_moving_average_and_histogram.
The old endpoints remain for the transition period, but are marked as
deprecated.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch replaces the latency histogram with
rate_moving_average_and_histogram and the counters with
rate_moving_average.
The old endpoints were left unchanged but marked as deprecated where
needed.
This patch replaces the helper function for column family with two
functions: one that collects the relevant column family from all shards
and another that translates it to a JSON object.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds the helper functions that are used to sum
rate_moving_average and rate_moving_average_and_histogram.
The current sum functionality for histograms was modified to accept
rate-and-histogram values while still returning a histogram. This way
current endpoints continue to behave the same.
It also cleans up the histogram-related methods by using the histogram's
plus operator.
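A sketch of summing per-shard rates with a plus operator. The field layout (count, 1/5/15-minute rates, mean rate) is an assumption modeled on the Yammer meter, not a quote of Scylla's type. Because rates are per second, per-shard rates add up to the node-wide rate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape: total count, 1/5/15-minute rates and mean rate.
struct rate_moving_average {
    uint64_t count = 0;
    double rates[3] = {0, 0, 0};
    double mean_rate = 0;

    rate_moving_average& operator+=(const rate_moving_average& o) {
        count += o.count;
        for (int i = 0; i < 3; ++i) {
            rates[i] += o.rates[i];
        }
        mean_rate += o.mean_rate;
        return *this;
    }
};

inline rate_moving_average operator+(rate_moving_average a,
                                     const rate_moving_average& b) {
    a += b;
    return a;
}

// Fold the per-shard values into one node-wide value.
rate_moving_average sum_shards(const std::vector<rate_moving_average>& shards) {
    rate_moving_average total;
    for (const auto& r : shards) {
        total = total + r;
    }
    return total;
}
```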
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
rate_moving_average and rate_moving_average_and_histogram are types
used by the JMX. They are based on the yammer meter and timer and
are used to collect derivative information.
Specifically: rate_moving_average calculates rates and
rate_moving_average_and_histogram collects both rates and a
histogram.
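The moving rates of a Yammer meter are exponentially weighted moving averages. Below is a sketch of one such average following the Yammer/Dropwizard EWMA formula (5-second tick, alpha = 1 - exp(-tick / window)). It illustrates that technique and is not claimed to be Scylla's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// One exponentially weighted moving average over a `minutes` window.
class ewma {
    double _alpha;
    double _interval_s;
    double _rate = 0;  // events per second
    bool _initialized = false;
    uint64_t _uncounted = 0;
public:
    explicit ewma(double minutes, double interval_s = 5.0)
        : _alpha(1 - std::exp(-interval_s / (60.0 * minutes)))
        , _interval_s(interval_s) {}

    // Record n events since the last tick.
    void mark(uint64_t n = 1) { _uncounted += n; }

    // Called every interval: fold the instantaneous rate into the average.
    void tick() {
        double instant = _uncounted / _interval_s;
        _uncounted = 0;
        if (_initialized) {
            _rate += _alpha * (instant - _rate);
        } else {
            _rate = instant;  // first tick seeds the average
            _initialized = true;
        }
    }

    double rate() const { return _rate; }
};
```

A 1-, 5- and 15-minute meter is simply three of these fed by the same mark() calls.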
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
They are used by nodetool removenode:
$ nodetool removenode force
$ nodetool removenode status
For example:
$ nodetool removenode status
RemovalStatus: Removing token (-8969872965815280276). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].
$ nodetool removenode force
RemovalStatus: No token removals in process.
Tested with:
1)
- start 3 nodes
- inject data with
cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
2)
- start 3 nodes
- inject data with
cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
- kill -9 node3
- nodetool removenode will wait forever since node3 is gone; node3
will never send the replication confirmation to node1
- run nodetool removenode force on node1
nodetool removenode completes with the following error:
$ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e
nodetool: Scylla API server HTTP POST to URL
'/storage_service/remove_node' failed: nodetool removenode force is called by user
nodetool removenode force completes successfully
$ nodetool removenode force
RemovalStatus: Removing token (-9171569494049085776). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].
Fixes #1135.
After this change, user can query compression ratio on a per column
family basis with 'nodetool cfstats'.
Example 'nodetool cfstats' output:
./bin/nodetool cfstats ks.test5
Keyspace: ks
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: test5
SSTable count: 1
Space used (live): 4774
Space used (total): 4774
Space used by snapshots (total): 0
Off heap memory used (total): 131384
SSTable Compression Ratio: 0.833333
...
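A sketch of one way to compute a per-table ratio: average the per-sstable ratios (compressed size / uncompressed size) across all shards, skipping uncompressed sstables. This mirrors how Cassandra averages per-sstable ratios; that Scylla's API aggregates exactly this way is an assumption, and the struct and function are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-sstable figures, in bytes.
struct sstable_sizes {
    double compressed;
    double uncompressed;
    bool is_compressed;
};

// Average of per-sstable ratios over all shards; 0 if nothing to average.
double compression_ratio(const std::vector<std::vector<sstable_sizes>>& per_shard) {
    double sum = 0;
    int count = 0;
    for (const auto& shard : per_shard) {
        for (const auto& s : shard) {
            if (!s.is_compressed || s.uncompressed == 0) {
                continue;
            }
            sum += s.compressed / s.uncompressed;
            ++count;
        }
    }
    return count ? sum / count : 0;
}
```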
Fixes #636.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a1bee5a23fe63787df3e387a88f2d216ba4a4134.1459802771.git.raphaelsc@scylladb.com>
This is a leftover from the reordering of the API init. The api_doc
should be set first, so that later API registrations will enable their
relevant swagger documentation.
Currently, the swagger documentation of the system API is not available.
Fixes #1160
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1459750490-15996-1-git-send-email-amnon@scylladb.com>
The 'nodetool enable/disablebackup' callback modified only the
configurations of existing keyspaces and column families.
However, new keyspaces/column families used
the original 'incremental_backups' configuration value, which could
differ from the value set by the 'nodetool enable/disablebackup'
user command.
This patch updates the database::_enable_incremental_backups per-shard
value in addition to updating the existing keyspaces and column families
configurations.
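A sketch of the fix's shape, with hypothetical types: the per-shard database keeps the current value, and tables created later inherit it instead of the original configuration value. Keyspaces are omitted for brevity.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-table configuration.
struct column_family_config {
    bool enable_incremental_backups;
};

// Hypothetical per-shard database object.
struct database {
    bool _enable_incremental_backups;  // default for tables created later
    std::vector<std::unique_ptr<column_family_config>> _tables;

    void set_enable_incremental_backups(bool value) {
        _enable_incremental_backups = value;  // update the per-shard default too
        for (auto& t : _tables) {
            t->enable_incremental_backups = value;
        }
    }

    // New tables inherit the current per-shard value.
    column_family_config& add_table() {
        _tables.push_back(std::make_unique<column_family_config>(
            column_family_config{_enable_incremental_backups}));
        return *_tables.back();
    }
};
```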
Fixes #845
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".
stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. Updating the progress
info in the stream_session object whenever we receive a streaming
mutation would add overhead. Instead, we update the progress info in the
stream_session object only when it is actually needed.
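A sketch of the lazy aggregation, with hypothetical types: each shard bumps only its own counter on the hot path, and the session's progress is refreshed by summing the counters only when progress is requested.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-shard counter for one session, updated locally on
// every received streaming mutation.
struct shard_counters {
    uint64_t bytes_received = 0;
};

// Progress stored in the session object; refreshed only on demand.
struct progress_info {
    uint64_t current_bytes = 0;
};

// Analogue of update_all_progress_info(): sum the per-shard counters.
void update_progress(progress_info& p, const std::vector<shard_counters>& shards) {
    uint64_t total = 0;
    for (const auto& s : shards) {
        total += s.bytes_received;
    }
    p.current_bytes = total;
}
```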
Querying http://127.0.0.$i:10000/stream_manager/ while decommissioning
node 3 in a 3-node cluster returns the following:
=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 2
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
To implement nodetool's "--start-token"/"--end-token" feature, we need
to be able to repair only *part* of the ranges held by this node.
Our REST API already had a "ranges" option where the tool can list the
specific ranges to repair, but using this interface in the JMX
implementation is inconvenient, because it requires the *Java* code
to be able to intersect the given start/end token range with the actual
ranges held by the repaired node.
A more reasonable approach, which this patch uses, is to add new
"startToken"/"endToken" options to the repair's REST API. What these
options do is to find the node's token ranges as usual, and only
then *intersect* them with the user-specified token range. The JMX
implementation becomes much simpler (in a separate patch for scylla-jmx)
and the real work is done in the C++ code, where it belongs, not in
Java code.
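A sketch of the intersection step, assuming half-open (start, end] token ranges that do not wrap around the ring. Real ranges can wrap, which this sketch ignores; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Token range (start, end], assumed non-wrapping (start < end).
struct token_range {
    int64_t start;  // exclusive
    int64_t end;    // inclusive
};

std::optional<token_range> intersect(const token_range& a, const token_range& b) {
    int64_t start = std::max(a.start, b.start);
    int64_t end = std::min(a.end, b.end);
    if (start >= end) {
        return std::nullopt;  // empty intersection
    }
    return token_range{start, end};
}

// Keep only the parts of the node's ranges inside (startToken, endToken].
std::vector<token_range> ranges_to_repair(const std::vector<token_range>& owned,
                                          const token_range& requested) {
    std::vector<token_range> out;
    for (const auto& r : owned) {
        if (auto i = intersect(r, requested)) {
            out.push_back(*i);
        }
    }
    return out;
}
```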
With the additional scylla-jmx patch to use the new REST API options
provided here, this fixes #917.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455807739-25581-1-git-send-email-nyh@scylladb.com>
'nodetool cleanup' must wait for the cleanup to terminate; however,
cleanup is handled asynchronously. To solve that, a mechanism is
added here to wait for termination of a cleanup. This mechanism
uses a promise to notify the waiter of cleanup completion.
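A sketch of the waiting mechanism, using std::promise/std::future as a stand-in for Seastar's promise/future. The cleanup_manager class and its methods are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

class cleanup_manager {
    std::vector<std::promise<void>> _waiters;
public:
    // Called by the API handler: the returned future becomes ready when
    // the asynchronously running cleanup terminates.
    std::future<void> wait_for_cleanup() {
        _waiters.emplace_back();
        return _waiters.back().get_future();
    }

    // Called by the compaction code when cleanup terminates.
    void on_cleanup_done() {
        for (auto& p : _waiters) {
            p.set_value();
        }
        _waiters.clear();
    }
};
```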
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <6dc0a39170f3f51487fb8858eb443573548d8bce.1455655016.git.raphaelsc@scylladb.com>
Currently, only the shard on which the stream_plan is created sends
streaming mutations. To utilize all the available cores, we can make each
shard send the mutations it is responsible for. On the receiver side,
we do not forward the mutations to the shard where the stream_session was
created, so that we avoid unnecessary forwarding.
Note: the downside is that it is now harder:
1) to track the number of bytes sent and received
2) to update the keep-alive timer upon receipt of a STREAM_MUTATION
To address this, we now store the sent/received bytes info on all shards.
When the keep-alive timer expires, we check whether any progress has been made.
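A sketch of the keep-alive check, with hypothetical types: on timer expiry, the byte counters of all shards are summed and compared with the total seen at the previous expiry. Only a session with no change is considered stalled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-shard byte counters for one stream session.
struct shard_stream_bytes {
    uint64_t sent = 0;
    uint64_t received = 0;
};

class keep_alive_checker {
    uint64_t _last_total = 0;
public:
    // Returns true if any shard moved bytes since the previous call.
    bool made_progress(const std::vector<shard_stream_bytes>& shards) {
        uint64_t total = 0;
        for (const auto& s : shards) {
            total += s.sent + s.received;
        }
        bool progressed = total != _last_total;
        _last_total = total;
        return progressed;
    }
};
```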
Hopefully, this patch will make streaming much faster and, in turn,
make repair, decommission and adding a node faster.
Refs: https://github.com/scylladb/scylla/issues/849
Tested with decommission/repair dtest.
Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>