10392 Commits

Author SHA1 Message Date
Amnon Heiman
4e0dcb59e7 scylla.spec: package the node_exporter scripts
This patch adds the node_exporter related files to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-22 09:43:53 +03:00
Amnon Heiman
3d242fdb4d Add a link to node_exporter_install
This adds a link to node_exporter_install in sbin, so it will be
available in the path.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-22 09:43:53 +03:00
Amnon Heiman
9d3edd3a28 service file for node_exporter with systemd
This patch adds a service file for OSes that support systemd.

When started, it would run an already installed node_exporter or fail.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-22 09:43:53 +03:00
Amnon Heiman
801b2c4914 An installation script for node_exporter
node_exporter is a utility that exports node information via the Prometheus
API. It takes care of host related metrics such as CPU and memory.

The install script downloads the node_exporter binaries and creates a link
in /usr/bin.

On OSes that support systemd, it enables and starts the installed
service file. On others (Ubuntu), it creates a conf file and starts it.

The installation should be done using sudo.

After a successful installation, the node_exporter would run as a
service.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-22 09:33:03 +03:00
Pekka Enberg
f1d0401ed2 main: Use proper logger for API server messages
We have a "startlog" that we can use to print out API server messages.

Message-Id: <1474358312-26510-1-git-send-email-penberg@scylladb.com>
2016-09-20 11:09:59 +03:00
Pekka Enberg
38b137713f transport/server: Fix CQL v1 prepared statement execution
The EXECUTE message encoding is different between CQL binary protocol
versions v1 and v2 (and later). Fix process_execute() to deserialize the
message as per the CQL binary protocol v1 specification:

    Executes a prepared query. The body of the message must be:
      <id><n><value_1>....<value_n><consistency>
    where:
      - <id> is the prepared query ID. It's the [short bytes] returned as a
        response to a PREPARE message.
      - <n> is a [short] indicating the number of following values.
      - <value_1>...<value_n> are the [bytes] to use for bound variables in the
        prepared query.
      - <consistency> is the [consistency] level for the operation.

Fixes #1676

Message-Id: <1474287392-16792-1-git-send-email-penberg@scylladb.com>
2016-09-19 15:26:30 +03:00
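
The v1 EXECUTE body layout quoted in 38b137713f can be sketched as a small decoder. This is a minimal illustration, not Scylla's process_execute(): the function name parse_execute_v1 is hypothetical, and the wire encodings are assumptions taken from the CQL binary protocol notation (big-endian; [short] is an unsigned 16-bit integer, [short bytes] is a [short] length plus data, [bytes] is a signed 32-bit length plus data with a negative length meaning null, and [consistency] is a [short] in v1).

```python
import struct


def parse_execute_v1(body: bytes):
    """Decode <id><n><value_1>...<value_n><consistency> from an EXECUTE body."""
    pos = 0

    # <id>: [short bytes] = 16-bit length followed by that many bytes.
    (id_len,) = struct.unpack_from(">H", body, pos)
    pos += 2
    query_id = body[pos:pos + id_len]
    pos += id_len

    # <n>: [short] giving the number of bound values that follow.
    (n,) = struct.unpack_from(">H", body, pos)
    pos += 2

    values = []
    for _ in range(n):
        # Each value is [bytes]: signed 32-bit length, negative means null.
        (length,) = struct.unpack_from(">i", body, pos)
        pos += 4
        if length < 0:
            values.append(None)
        else:
            values.append(body[pos:pos + length])
            pos += length

    # <consistency>: assumed to be a [short] in protocol v1.
    (consistency,) = struct.unpack_from(">H", body, pos)
    return query_id, values, consistency
```

The point of the sketch is the field order: in v1 the bound values come directly after the prepared query ID and the consistency level comes last, which is the layout the fix deserializes.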
Raphael S. Carvalho
0eaa0f46c9 sstables: store first and last decorated keys in sstable object
Leveled strategy heavily uses the first and last decorated keys of an
sstable to get overlapping sstables in a given level. By storing
first and last decorated keys in the sstable object, it's expected
that performance of leveled strategy (not compaction) will be
improved.
We will set first and last keys in the sstable when either loading
or sealing it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
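
The overlap lookup that 0eaa0f46c9 aims to speed up can be sketched as below. This is only an illustration under stated assumptions: the SSTable class, the overlaps and overlapping helpers, and the modelling of a decorated key as a (token, key) tuple are all hypothetical and are not Scylla's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in: first/last decorated keys are stored on the
    # object once (at load or seal time) instead of being recomputed.
    name: str
    first: tuple  # (token, partition key)
    last: tuple   # (token, partition key)


def overlaps(a: SSTable, b: SSTable) -> bool:
    # Two closed key ranges overlap if each one starts before the other ends.
    return a.first <= b.last and b.first <= a.last


def overlapping(candidate: SSTable, level: list) -> list:
    # Names of the sstables in a level whose key range overlaps the candidate.
    return [s.name for s in level if overlaps(candidate, s)]
```

With the keys already stored on each object, the check is a pair of comparisons per sstable.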
Raphael S. Carvalho
dffb41f9d8 sstables: remove schema parameter from some sstable methods
schema can now be found in the sstable object itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Tomasz Grabiec
2282599394 tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
2016-09-16 11:07:14 +01:00
Tomasz Grabiec
804fe50b7f types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>
2016-09-16 11:06:55 +01:00
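
The bug described in 804fe50b7f can be reproduced in a few lines. This is a sketch, not the Scylla code: compare_bytes here is a hypothetical stand-in that compares plain byte strings, and less_buggy / less_fixed are illustrative names.

```python
def compare_bytes(a: bytes, b: bytes) -> int:
    """Three-way ("trichotomic") comparator returning -1, 0 or 1."""
    return (a > b) - (a < b)


def less_buggy(a: bytes, b: bytes) -> bool:
    c1 = compare_bytes(a, b)
    # Bug: any non-zero result converts to True, so "greater" reads as "less".
    return bool(c1)


def less_fixed(a: bytes, b: bytes) -> bool:
    # Only a negative three-way result means a orders before b.
    return compare_bytes(a, b) < 0
```

With the buggy version both less(a, b) and less(b, a) are true for any two distinct values, so it is not a valid strict ordering.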
Duarte Nunes
bc3cbb7009 thrift: Correctly detect clustering range wrap around
This patch uses the clustering bounds comparator to correctly detect
wrap around of a clustering range in the thrift handler.

Refs #1446

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473938611-8590-1-git-send-email-duarte@scylladb.com>
2016-09-15 14:31:16 +01:00
Shlomi Livne
acb83073e2 ami: Fix instructions how to run scylla_io_setup on non ephemeral instances
On instances other than i2/m3/c3 we provide instructions to run
scylla_io_setup. Running scylla_io_setup requires access to
/var/lib/scylla to create a temporary file. To gain access to that
directory the user should run 'sudo scylla_io_setup'.

refs: #1645

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
2016-09-15 13:40:53 +03:00
Avi Kivity
d2b5f3ff44 Merge seastar upstream
* seastar e534401...40a68fa (1):
  > rpc: fix dangling reference in read_rcv_buf
2016-09-15 12:20:49 +03:00
Gleb Natapov
2e8b255741 Merge seastar upstream
* seastar 0303e0c...e534401 (6):
  > Merge "enable rpc to work on non contiguous memory for receive" from Gleb
  > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py
  > fix tcp stuck when output_stream write more than 212992 bytes once.
  > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > native_network_stack: Fix use after free/missing wait in dhcp

Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.
2016-09-15 12:12:16 +03:00
Tomasz Grabiec
ed312c2b1a Merge remote-tracking branch 'duarte/comparator/v1'
From Duarte:

This patchset reuses the bound_view::comparator in range_tombstone to
correctly detect wrap around of a clustering range. This fixes a
manifestation of #1446 that results in wrong query results.

Introduced by b1f9688432

Fixes #1669
Refs #1446
2016-09-14 18:21:05 +02:00
Paweł Dziepak
bc2ff41003 cql3: fix units in large batch warning
When displaying a warning about batch being too large C* reports batch
size and limit in bytes while S* uses kB.

This patch switches Scylla to use bytes.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1473867171-18932-1-git-send-email-pdziepak@scylladb.com>
2016-09-14 18:38:46 +03:00
Takuya ASADA
647673195c dist/redhat/build_rpm.sh: add dependency for rpmbuild
Install rpmbuild when it's not installed yet.
Fixes #1651

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1473193430-14792-1-git-send-email-syuu@scylladb.com>
2016-09-14 14:57:55 +03:00
Calle Wilund
f126cf769a column_family: Ensure flush() waits for all previous flushes + self
Fixes #1577
Message-Id: <1472569952-4066-1-git-send-email-calle@scylladb.com>
2016-09-14 11:00:41 +01:00
Duarte Nunes
f864bca773 row_cache: Deal with side-effects in allocating_section
In row_cache::make_reader, we update statistics inside an
allocating_section, which retries the supplied function until it can
satisfy all allocations by way of reserving LSA memory up front. Since
those updates are interleaved with allocations, retries can lead to
miscounts.

This patch fixes this by updating statistics after all allocations.

Fixes #1659

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>
2016-09-14 10:46:25 +01:00
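
The miscount described in f864bca773 can be illustrated with a toy retry loop. Everything here is hypothetical (AllocFailed, run_with_retries, the reserve counter and the stats dictionary); it only mimics the idea of a section that re-runs a function until its allocations succeed and is not the LSA implementation.

```python
class AllocFailed(Exception):
    """Hypothetical signal that the reserved memory was insufficient."""


def run_with_retries(fn, reserve=0):
    # Re-run fn with a larger reserve until every allocation succeeds.
    while True:
        try:
            return fn(reserve)
        except AllocFailed:
            reserve += 1


stats = {"buggy": 0, "fixed": 0}


def make_reader_buggy(reserve):
    stats["buggy"] += 1      # side effect happens before the allocation
    if reserve < 2:          # pretend the allocation needs reserve >= 2
        raise AllocFailed()
    return "reader"


def make_reader_fixed(reserve):
    if reserve < 2:          # allocate first; may fail and be retried
        raise AllocFailed()
    stats["fixed"] += 1      # statistics updated only after all allocations
    return "reader"


run_with_retries(make_reader_buggy)
run_with_retries(make_reader_fixed)
```

In this toy run the function is attempted three times, so the early update counts three reads while the late update counts one.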
Tomasz Grabiec
a498da1987 database: Ignore spaces in initial_token list
Currently we get a boost::lexical_cast error on startup if initial_token has a
list which contains spaces after commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>
2016-09-14 11:58:13 +03:00
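
The behaviour a498da1987 asks for, tolerating spaces after commas in initial_token, can be sketched as follows. parse_initial_tokens is a hypothetical helper, not the Scylla function. Python's int() already tolerates surrounding whitespace, so the explicit strip() is there only to mirror the trimming that a stricter parser needs.

```python
def parse_initial_tokens(value: str) -> list:
    """Split a comma-separated token list, ignoring spaces around each token."""
    return [int(token.strip()) for token in value.split(",")]
```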
Paweł Dziepak
c220c676c8 types: honour end of sstring_view
There are several places in types.cc where we assume that sstring_view
range is null terminated. That may not be true and we should always use
either begin()/end() or data()/size() pairs.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-09-07 14:30:56 -07:00
Paweł Dziepak
6373289532 Merge "Adding slow query API" from Amnon
"This series adds an API for the slow query recording.

After this series it will be possible to get and set the slow query
recording parameters."
2016-09-07 11:06:09 -07:00
Pekka Enberg
1095705a6b Update scylla-ami submodule
* dist/ami/files/scylla-ami 14c1666...e1e3919 (1):
  > scylla_ami_setup: remove scylla_cpuset_setup
2016-09-07 21:04:03 +03:00
Avi Kivity
7ac729b4d5 Merge "Optimize reads for clustered data" from Raphael
"This will be very important for read performance of time series use case,
where timestamp is usually stored as a clustering key, and the user asks
for specific data using a clustering range filter. Example:
    CREATE TABLE temperature (
        weatherstation_id text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY (weatherstation_id,event_time)
    );
    ...
    SELECT * FROM temperature
        WHERE weatherstation_id='1234ABCD'
        AND event_time > '2013-04-03 07:01:00'
        AND event_time < '2013-04-03 07:04:00';

This is based on: https://issues.apache.org/jira/browse/CASSANDRA-5514

To check correctness, I wrote a dtest that runs scylla with row cache disabled,
creates several sstables with non overlapping clustering key ranges, queries
data using several clustering range filters, and checks that the database
returns the expected results.

Tested performance with a tool I wrote myself [1] and performance is indeed
improved by this patchset. This tool works as follows:
Scylla is started with row cache disabled. That's wanted here because we're
measuring a specific code path that only gets executed if row cache misses the data
we asked for. Then the Scylla node is populated with N sstables ('nodetool flush'
is used to ensure it), where each will have M clustering keys, totaling N*M
clustering keys. Finally, we will start asking for data using a clustering
range filter. The tool measures throughput and min/max/avg latency.

[1]: https://gist.github.com/raphaelsc/4c415f592aaed14a18be31279d225972

The results follow:

BEFORE
-----
('Clustering keys / second: ', 747.9672111659951)
('Max latency (ms): ', 33)
('Min latency (ms): ', 12)
('Avg latency (ms): ', 13.0)
The operation took 13.3695700169 seconds

AFTER
-----
('Clustering keys / second: ', 3159.115303945648)
('Max latency (ms): ', 22)
('Min latency (ms): ', 2)
('Avg latency (ms): ', 3.0)
The operation took 3.16544318199 seconds

NOTE: Throughput and average latency are improved by a factor of ~4.
-----"
2016-09-04 15:06:32 +03:00
Amnon Heiman
11c687dd93 API: Add slow query logging implementation
This adds the implementation for the slow query logging API.

After this patch the following will be available:

curl -X GET  "http://localhost:10000/storage_service/slow_query"
curl -X POST "http://localhost:10000/storage_service/slow_query?enable=true&ttl=10&threshold=6000"

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-03 01:15:22 +03:00
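
The two curl calls in 11c687dd93 can also be built programmatically. This sketch only constructs the requests and sends nothing. The helper names are hypothetical, and the URL and the enable, ttl and threshold parameters are taken from the commit message as-is, with no assumption about their units.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:10000/storage_service/slow_query"


def slow_query_get() -> Request:
    # Read the current slow query logging settings.
    return Request(BASE, method="GET")


def slow_query_post(enable: bool, ttl: int, threshold: int) -> Request:
    # Parameters are passed in the query string, as in the curl example.
    query = urlencode({
        "enable": str(enable).lower(),
        "ttl": ttl,
        "threshold": threshold,
    })
    return Request(f"{BASE}?{query}", method="POST")
```

Sending a request would be done with urllib.request.urlopen(req) against a running node.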
Amnon Heiman
ed1d02b1a3 API: Add slow query API definition
This adds the GET and POST api for slow query logging.

The GET returns an object with the enable, ttl and threshold and the POST
lets you configure each of them.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-03 01:15:15 +03:00
Raphael S. Carvalho
b9f67351da db: expose clustering filter info via collectd
That's needed to observe behavior of clustering filter, and to
check if it's worthwhile for a specific workload.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 11:32:23 -03:00
Raphael S. Carvalho
a2dc88889d db: enable clustering optimization only on dtcs
Leveled strategy will not benefit from this optimization because
there are only a few sstables that will contain a given partition
key, which means that a clustering key that belongs to a specific
partition key can only be in a few sstables as well.

Date tiered strategy is the one that will actually benefit the
most from this optimization. Size tiered may benefit from it too
if clustering key isn't overwritten, but it will not use the
clustering optimization.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 11:31:07 -03:00
Raphael S. Carvalho
8d03ccd604 sstables: optimize reads with clustering filter
If user specifies a clustering filter, it's possible to filter out
sstable based on its metadata that tracks min/max clustering value.

For example, if sstable stores clustering key from 'a' through 'c',
it's possible to filter out that sstable if user asks for data
with clustering key greater than 'c'.

That's done by comparing each component separately because
clustering key may be composite. Further information can be found
here: https://issues.apache.org/jira/browse/CASSANDRA-5514

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:51:50 -03:00
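
The filtering idea in 8d03ccd604 can be sketched for a single clustering component. This is a simplified illustration with hypothetical names (may_contain, the sstables dictionary). The commit compares each component of a composite key separately, which is not modelled here, and the query bounds are assumed to be exclusive, as in the 'greater than' example.

```python
def may_contain(sst_min, sst_max, lower=None, upper=None) -> bool:
    """Can an sstable covering [sst_min, sst_max] hold keys in (lower, upper)?

    A bound of None means the query is unbounded on that side.
    """
    if lower is not None and sst_max <= lower:
        return False  # every stored key is at or below the exclusive lower bound
    if upper is not None and sst_min >= upper:
        return False  # every stored key is at or above the exclusive upper bound
    return True


# Hypothetical min/max clustering metadata per sstable.
sstables = {"sst1": ("a", "c"), "sst2": ("b", "f"), "sst3": ("d", "z")}

# Query: clustering key greater than 'c'.
candidates = [name for name, (lo, hi) in sstables.items()
              if may_contain(lo, hi, lower="c")]
```

Here sst1, which stores keys 'a' through 'c', is ruled out from metadata alone, matching the example in the commit message.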
Raphael S. Carvalho
768aced741 partition_slice: introduce key-independent function to get ranges
That will be important for sstable code that will rule out a sstable
if it doesn't cover a given clustering key range.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:50:56 -03:00
Raphael S. Carvalho
dce61ddb02 types: introduce abstract_type::as_tri_comparator()
That's akin to abstract_type::as_less_comparator's nature.
So we don't have to repeat something like the following everywhere:
    auto cmp = [&type] (const bytes_view& b1, const bytes_view& b2) {
        return type->compare(b1, b2);
    };

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:50:53 -03:00
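
The convenience that dce61ddb02 describes, obtaining a ready-made three-way comparator from the type instead of writing the same lambda everywhere, can be sketched like this. AbstractType below is a hypothetical Python stand-in that compares plain byte strings; it is not Scylla's abstract_type.

```python
from functools import cmp_to_key


class AbstractType:
    def compare(self, b1: bytes, b2: bytes) -> int:
        # Three-way comparison: negative, zero or positive.
        return (b1 > b2) - (b1 < b2)

    def as_tri_comparator(self):
        # Callable returning the three-way result, bound to this type.
        return lambda b1, b2: self.compare(b1, b2)

    def as_less_comparator(self):
        # Callable returning True only when b1 orders before b2.
        return lambda b1, b2: self.compare(b1, b2) < 0


bytes_type = AbstractType()
ordered = sorted([b"c", b"a", b"b"],
                 key=cmp_to_key(bytes_type.as_tri_comparator()))
```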
Raphael S. Carvalho
004617839d database: check bloom filter of all sstables earlier
All sstables will now have their bloom filter checked in a single pass
before the reader iterates through all candidates. It's possible that
we will need to futurize the procedure if it holds cpu for too
long. This change is also a step towards the optimization that
will rule out sstables based on clustering filter.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:50:08 -03:00
Raphael S. Carvalho
2a426ab248 tests: add test to check tombstone metadata
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:35 -03:00
Raphael S. Carvalho
94c8ef39c3 sstables: store components ranges in sstable object
Store range for each clustering component in sstable itself to
optimize sstable filtering based on clustering key.
If schema defines no clustering key, this new field will be
empty. Each range stores min and max value of that specific
component. With this information, it's possible to know if a
sstable possibly stores a given clustering component.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:32 -03:00
Raphael S. Carvalho
026853fabb tests: add test to check composite validity
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:30 -03:00
Raphael S. Carvalho
0a5af61176 sstables: introduce function to validate min max clustering values
Scylla was generating a sstable with incorrect min max clustering
values. This information is used to filter out a sstable when user
asks for a range of clustering rows. So it's important to detect
wrong metadata and make sure that it will not be used.
The validation is fast and will only happen when loading a sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:28 -03:00
Raphael S. Carvalho
1f31223f32 sstables: store schema in sstable object
That will be needed for optimization that will store decorated keys
in the sstable object, and also for a subsequent work that will
detect wrong metadata (min/max column names) by looking at columns
in the schema. As schema is stored in sstable, there's no longer
a need to store ks and cf names in it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:17 -03:00
Avi Kivity
7a140a306e Revert "sstables: optimize selection of sstables for leveled strategy"
This reverts commit c75b07fc34f0e7267a8e49276b96bbd4686cb78d; does not
deduplicate the sstable list.
2016-09-01 18:34:08 +03:00
Raphael S. Carvalho
c75b07fc34 sstables: optimize selection of sstables for leveled strategy
It's possible to copy sstables directly into vector, and that will
improve performance. My benchmark tool [1] shows that the new version
reduces running time of *copy procedure* by a factor of two after
1024^2 calls.
Switching to back_inserter improves throughput even further.
[1]: gist.github.com/raphaelsc/a4b27290f362cdecdef399770dda759c

Refs #1632.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <7153514a9b5f5eb24dff518ee9fa3680e0881dae.1472741401.git.raphaelsc@scylladb.com>
2016-09-01 18:08:53 +03:00
Glauber Costa
dc5d8e33af Revert "row_cache: update sstable histograms on cache hits"
This reverts commit 1726b1d0cc.

Reverting this patch turns our SSTable access counter into a miss counter only.
The estimated histogram always starts its first bucket at 1, so by marking cache
accesses we will be wrongly feeding "1" into the buckets.

Notice that this is not yet ideal: nodetool is supposed to show a histogram of
all reads, and by doing this we are changing its meaning slightly. Workloads
that serve mostly from cache will be distorted towards their misses.

The real solution is to use a different histogram, but we will need to enforce
a newer version of nodetool for that: the current issue is that nodetool expects
an EstimatedHistogram in a specific format in the other side.

Conflicts:
	row_cache.hh

Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-09-01 18:07:31 +03:00
Avi Kivity
e33671c285 Merge "tracing: Trace read sstables" from Duarte
"This patchset traces sstables we read from. To do that, we
need to flow the trace_state_ptr to the mutation_readers."
2016-09-01 13:24:16 +03:00
Duarte Nunes
ba374da043 database: Trace sstable accesses
This patch traces when we read from an sstable, be it a key range or a
single one.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Duarte Nunes
f4cf2f2aef tracing: Make trace_state_ptr argument required
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Duarte Nunes
46b86ff801 storage_proxy: Pass along trace_state for queries
This patch changes the storage_proxy so it passes along a
trace_state_ptr to the layers below, when querying locally or
receiving a remote query request.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Duarte Nunes
030db65c62 database: Accept a trace_state_ptr
This patch changes the database and column_family types so a
trace_state_ptr can be passed in when querying. This enables tracing
of the inner components.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:28 +02:00
Duarte Nunes
9269256246 row_cache: Accept a trace_state_ptr
This patch changes the row_cache so it accepts a trace_state_ptr,
which it is responsible for passing to the underlying mutation_reader
if needed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:00:55 +02:00
Duarte Nunes
5fd66f00c2 mutation_reader: Accept trace_state_ptr
This patch changes the mutation_reader so it optionally accepts a
trace_state_ptr. This will allow us to trace, for example, which
sstables are accessed during a request.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:00:31 +02:00
Avi Kivity
cc127295e9 Merge "Fill in information for sstables per read histogram" from Glauber
"Nodetool cfhistograms is supposed to tell us how many SSTables were touched per
read. Currently, we are a bit in the dark as we don't export that information.

This patch exports that, so that we can start using it."
2016-09-01 12:54:24 +03:00
Glauber Costa
1726b1d0cc row_cache: update sstable histograms on cache hits
If we have a cache hit, we still need to update our sstable histogram - noting
that we have touched 0 SSTables.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:14:22 -04:00
Glauber Costa
ce24fd05fe database: keep statistics on SSTables touched per read
That is done for single partition queries only - mimicking what
Cassandra does on that matter.

For this to be correct, we also need to update this histogram on cache
hits - in which case we update the read as having touched 0 SSTables. That
will be done on a separate patch.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:14:21 -04:00