In row_cache::make_reader, we update statistics inside an
allocating_section, which retries the supplied function until it can
satisfy all allocations by way of reserving LSA memory up front. Since
those updates are interleaved with allocations, retries can lead to
miscounts.
This patch fixes this by updating statistics after all allocations.
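The sketch below shows why counting inside a retried section miscounts. It uses a hypothetical `fake_allocating_section` that simulates failed allocations by throwing `std::bad_alloc` on its first attempts and then retries. It is not the LSA API, only a model of the retry behaviour described above.

```cpp
#include <cassert>
#include <new>

// Hypothetical model of an allocating section: runs fn and retries it when an
// allocation inside fails. Failures are simulated for the first fail_first_n attempts.
struct fake_allocating_section {
    int fail_first_n = 0;

    template <typename Func>
    auto operator()(Func&& fn) {
        for (int attempt = 0;; ++attempt) {
            try {
                return fn(attempt < fail_first_n);
            } catch (const std::bad_alloc&) {
                // A real section would reserve more memory here before retrying.
            }
        }
    }
};

struct cache_stats {
    int hits = 0;
};

// Buggy shape: the counter is bumped inside the retried function,
// so every retry counts the same lookup again.
int lookup_counting_inside(fake_allocating_section& section, cache_stats& stats) {
    return section([&] (bool simulate_failure) {
        ++stats.hits;
        if (simulate_failure) {
            throw std::bad_alloc();
        }
        return 42; // stands in for the allocated reader
    });
}

// Fixed shape: only allocations happen inside the section;
// statistics are updated once, after it has succeeded.
int lookup_counting_after(fake_allocating_section& section, cache_stats& stats) {
    int result = section([&] (bool simulate_failure) {
        if (simulate_failure) {
            throw std::bad_alloc();
        }
        return 42;
    });
    ++stats.hits;
    return result;
}
```

With two simulated failures, the first version records three hits for a single lookup. The second records one.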
Fixes #1659
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>
Currently we get a boost::bad_lexical_cast exception on startup if initial_token
has a list which contains spaces after commas, e.g.:
initial_token: -1100081313741479381, -1104041856484663086, ...
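For integers, boost::lexical_cast rejects input with surrounding whitespace. The fix is therefore to trim each element after splitting on commas. Below is a minimal sketch of that parsing step. `split_and_trim` and `parse_tokens` are hypothetical helpers, not Scylla functions, and the sketch uses `std::stoll` instead of lexical_cast for the conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated initial_token value and trim
// blanks around each element, so "1, 2, 3" parses like "1,2,3".
std::vector<std::string> split_and_trim(const std::string& value) {
    std::vector<std::string> result;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        auto begin = item.find_first_not_of(" \t");
        if (begin == std::string::npos) {
            continue; // skip empty elements
        }
        auto end = item.find_last_not_of(" \t");
        result.push_back(item.substr(begin, end - begin + 1));
    }
    return result;
}

// Convert the trimmed elements; std::stoll throws std::invalid_argument on garbage.
std::vector<int64_t> parse_tokens(const std::string& value) {
    std::vector<int64_t> tokens;
    for (const auto& s : split_and_trim(value)) {
        tokens.push_back(std::stoll(s));
    }
    return tokens;
}
```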
Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>
There are several places in types.cc where we assume that the sstring_view
range is null-terminated. That may not be true, so we should always use
either begin()/end() or data()/size() pairs.
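The sketch below illustrates the bug class with `std::string_view` standing in for sstring_view. Neither type guarantees a '\0' at data()[size()]. The example is only well defined because the view points into a `std::string`, whose buffer happens to be null-terminated. For an arbitrary buffer, the first function would be undefined behaviour.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Wrong: strlen() ignores the view's bounds and keeps reading until it finds
// a '\0' somewhere in the underlying buffer.
std::size_t length_assuming_nul(std::string_view v) {
    return std::strlen(v.data());
}

// Right: the data()/size() pair bounds the range exactly.
std::string copy_view(std::string_view v) {
    return std::string(v.data(), v.size());
}
```

With a view of the first three bytes of "abcdef", the first function reports 6 while the second correctly copies "abc".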
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
"This will be very important for read performance of time series use case,
where timestamp is usually stored as a clustering key, and the user asks
for specific data using a clustering range filter. Example:
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
...
SELECT * FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
This is based on: https://issues.apache.org/jira/browse/CASSANDRA-5514
To check correctness, I wrote a dtest that runs scylla with row cache disabled,
creates several sstables with non-overlapping clustering key ranges, queries
data using several clustering range filters, and checks that the database
returns the expected results.
Tested performance with a tool I wrote myself [1], and performance is indeed
improved by this patchset. This tool works as follows:
Scylla is started with row cache disabled. That's wanted here because we're
measuring code that only gets executed if the row cache misses the data
we asked for. Then the Scylla node is populated with N sstables ('nodetool flush'
is used to ensure this), each containing M clustering keys, for a total of N*M
clustering keys. Finally, we start asking for data using a clustering
range filter. The tool measures throughput and min/max/avg latency.
[1]: https://gist.github.com/raphaelsc/4c415f592aaed14a18be31279d225972
The results follow:
BEFORE
-----
('Clustering keys / second: ', 747.9672111659951)
('Max latency (ms): ', 33)
('Min latency (ms): ', 12)
('Avg latency (ms): ', 13.0)
The operation took 13.3695700169 seconds
AFTER
-----
('Clustering keys / second: ', 3159.115303945648)
('Max latency (ms): ', 22)
('Min latency (ms): ', 2)
('Avg latency (ms): ', 3.0)
The operation took 3.16544318199 seconds
NOTE: Throughput and average latency are improved by a factor of ~4
(747.97 -> 3159.12 clustering keys/s, i.e. ~4.2x; avg latency 13 ms -> 3 ms, i.e. ~4.3x).
-----"
This adds the GET and POST API for slow query logging.
The GET returns an object with the enable, ttl and threshold fields, and the POST
lets you configure each of them.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
That's needed to observe the behavior of the clustering filter, and to
check whether it's worthwhile for a specific workload.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Leveled strategy will not benefit from this optimization because
only a few sstables will contain a given partition key, which means
that a clustering key belonging to a specific partition key can
only be in a few sstables as well.
Date tiered strategy is the one that will actually benefit the
most from this optimization. Size tiered may benefit from it too
if clustering keys aren't overwritten, but it will not use the
clustering optimization.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If the user specifies a clustering filter, it's possible to filter out
an sstable based on its metadata, which tracks min/max clustering values.
For example, if an sstable stores clustering keys from 'a' through 'c',
it's possible to filter out that sstable if the user asks for data
with a clustering key greater than 'c'.
That's done by comparing each component separately because the
clustering key may be composite. Further information can be found
here: https://issues.apache.org/jira/browse/CASSANDRA-5514
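Below is a minimal sketch of the per-component check. It assumes the query can be expressed as an optional lower/upper bound on each component. All types are hypothetical, and strings compared lexicographically stand in for typed components. The test is conservative: the sstable is ruled out only if some component's query interval is disjoint from that component's [min, max].

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct component_range {          // hypothetical: what sstable metadata records
    std::string min;
    std::string max;
};

struct bound {
    std::string value;
    bool inclusive;
};

struct component_restriction {    // hypothetical: what the query asks of one component
    std::optional<bound> lower;
    std::optional<bound> upper;
};

// Returns false only when some component provably cannot match.
bool may_contain(const std::vector<component_range>& sstable_ranges,
                 const std::vector<component_restriction>& query) {
    for (std::size_t i = 0; i < query.size() && i < sstable_ranges.size(); ++i) {
        const auto& r = sstable_ranges[i];
        const auto& q = query[i];
        // The sstable's max must reach the query's lower bound...
        if (q.lower && (q.lower->inclusive ? r.max < q.lower->value
                                           : r.max <= q.lower->value)) {
            return false;
        }
        // ...and its min must not exceed the query's upper bound.
        if (q.upper && (q.upper->inclusive ? r.min > q.upper->value
                                           : r.min >= q.upper->value)) {
            return false;
        }
    }
    return true; // no component proved disjoint: the sstable must be read
}
```

For an sstable spanning 'a' through 'c', a query for keys greater than 'c' rules it out. A query for keys greater than or equal to 'c' does not.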
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That will be important for sstable code that will rule out an sstable
if it doesn't cover a given clustering key range.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is analogous to abstract_type::as_less_comparator.
So we don't have to repeat something like the following everywhere:
auto cmp = [&type] (const bytes_view& b1, const bytes_view& b2) {
return type->compare(b1, b2); }
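Below is a minimal sketch of such a helper, built against a mocked `abstract_type`. `as_tri_comparator`, `utf8_like_type` and the aliases are hypothetical stand-ins, not the real Scylla declarations. One design choice differs from the hand-written lambda above: the helper captures the type pointer by value, so the comparator cannot outlive the type it refers to.

```cpp
#include <cassert>
#include <memory>
#include <string_view>
#include <utility>

using bytes_view = std::string_view; // stand-in for Scylla's bytes_view

// Mock of abstract_type: only the three-way compare() is modelled.
struct abstract_type {
    virtual ~abstract_type() = default;
    virtual int compare(bytes_view a, bytes_view b) const = 0;
};

struct utf8_like_type : abstract_type {
    int compare(bytes_view a, bytes_view b) const override {
        return a.compare(b); // <0, 0 or >0
    }
};

using data_type = std::shared_ptr<const abstract_type>;

// Hypothetical helper: wraps type->compare() in a callable once, instead of
// repeating the lambda at every call site.
inline auto as_tri_comparator(data_type type) {
    return [type = std::move(type)] (bytes_view b1, bytes_view b2) {
        return type->compare(b1, b2);
    };
}
```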
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
All sstables will now have their bloom filters checked in a single pass
before the reader iterates through all candidates. It's possible that
we will need to futurize the procedure if it holds the CPU for too
long. This change is also a step towards the optimization that
will rule out sstables based on clustering filter.
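The single filtering pass can be sketched as a `std::copy_if` over all sstables. This happens before any reader is built. In this sketch a `std::set` plays the role of the bloom filter, so it has no false positives (a real filter may have some but never has false negatives). `sstable` and `filter_by_key` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical sstable stand-in; the set models the bloom filter.
struct sstable {
    std::set<std::string> keys;
    bool filter_has_key(const std::string& key) const { return keys.count(key) > 0; }
};

using sstable_ptr = std::shared_ptr<sstable>;

// One pass: keep only sstables whose filter says they may contain `key`.
std::vector<sstable_ptr> filter_by_key(const std::vector<sstable_ptr>& all,
                                       const std::string& key) {
    std::vector<sstable_ptr> candidates;
    std::copy_if(all.begin(), all.end(), std::back_inserter(candidates),
                 [&key] (const sstable_ptr& sst) { return sst->filter_has_key(key); });
    return candidates;
}
```

Because the candidate list exists up front, a later step (such as the clustering filter) can prune it further before reading.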
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Store the range of each clustering component in the sstable itself to
optimize sstable filtering based on the clustering key.
If the schema defines no clustering key, this new field will be
empty. Each range stores the min and max values of that specific
component. With this information, it's possible to know whether an
sstable may store a given clustering component.
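One way to build this metadata while writing is to widen a per-component [min, max] for every clustering key seen. Below is a minimal sketch with a hypothetical collector class. Strings compared lexicographically stand in for typed components. If the schema has no clustering key, no component is ever seen and the ranges stay empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct component_range {
    std::string min;
    std::string max;
};

// Hypothetical collector: tracks min/max independently for each component.
class clustering_metadata_collector {
    std::vector<component_range> _ranges;
public:
    void update(const std::vector<std::string>& clustering_key) {
        for (std::size_t i = 0; i < clustering_key.size(); ++i) {
            const auto& value = clustering_key[i];
            if (i == _ranges.size()) {
                _ranges.push_back({value, value}); // first value seen for this component
            } else {
                if (value < _ranges[i].min) { _ranges[i].min = value; }
                if (_ranges[i].max < value) { _ranges[i].max = value; }
            }
        }
    }
    const std::vector<component_range>& ranges() const { return _ranges; }
};
```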
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Scylla was generating sstables with incorrect min/max clustering
values. This information is used to filter out an sstable when the user
asks for a range of clustering rows, so it's important to detect
wrong metadata and make sure that it will not be used.
The validation is fast and will only happen when loading an sstable.
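The commit does not spell out which checks the validation performs. Below is a plausible minimal sketch, assuming two sanity checks. First, the metadata must not have more components than the schema's clustering key. Second, each component's min must not exceed its max. The function name is hypothetical. On failure, the caller would ignore the metadata and skip clustering-based filtering for that sstable, rather than risk dropping data.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct component_range {
    std::string min;
    std::string max;
};

// Hypothetical load-time check; cheap because it only touches the metadata.
bool clustering_metadata_is_valid(const std::vector<component_range>& ranges,
                                  std::size_t clustering_key_size) {
    if (ranges.size() > clustering_key_size) {
        return false; // more components than the schema defines
    }
    for (const auto& r : ranges) {
        if (r.max < r.min) {
            return false; // inverted range: metadata cannot be trusted
        }
    }
    return true;
}
```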
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That will be needed for an optimization that will store decorated keys
in the sstable object, and also for subsequent work that will
detect wrong metadata (min/max column names) by looking at the columns
in the schema. As the schema is stored in the sstable, there's no longer
a need to store the ks and cf names in it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It's possible to copy sstables directly into a vector, and that will
improve performance. My benchmark tool [1] shows that the new version
reduces the running time of the *copy procedure* by a factor of two over
1024^2 calls.
Switching to back_inserter improves throughput even further.
[1]: gist.github.com/raphaelsc/a4b27290f362cdecdef399770dda759c
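Below is a minimal sketch of copying a container of sstables straight into a vector. `reserve()` sizes the buffer once, and `std::back_inserter` appends without further reallocation. The `std::set` source and the `copy_sstables` name are assumptions for illustration. The commit does not show the original container or code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <set>
#include <vector>

struct sstable { int generation; };
using sstable_ptr = std::shared_ptr<sstable>;

// Hypothetical helper: one allocation for the result, then a straight copy.
template <typename Container>
std::vector<sstable_ptr> copy_sstables(const Container& all) {
    std::vector<sstable_ptr> result;
    result.reserve(all.size());
    std::copy(all.begin(), all.end(), std::back_inserter(result));
    return result;
}
```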
Refs #1632.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <7153514a9b5f5eb24dff518ee9fa3680e0881dae.1472741401.git.raphaelsc@scylladb.com>
This reverts commit 1726b1d0cc.
Reverting this patch turns our SSTable access counter into a miss counter only.
The estimated histogram always starts its first bucket at 1, so by marking cache
accesses we will be wrongly feeding "1" into the buckets.
Notice that this is not yet ideal: nodetool is supposed to show a histogram of
all reads, and by doing this we are changing its meaning slightly. Workloads
that serve mostly from cache will be distorted towards their misses.
The real solution is to use a different histogram, but we will need to enforce
a newer version of nodetool for that: the current issue is that nodetool expects
an EstimatedHistogram in a specific format on the other side.
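Below is a simplified, hypothetical model of why recording cache hits is misleading. Bucket offsets start at 1, and a value goes into the first bucket whose offset is >= the value. So a 0 ("no sstables touched") lands in the same bucket as a 1 and is reported as "1". The offsets here are illustrative, not nodetool's real bucket layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical estimated-histogram model: counts[i] holds values v with
// offsets[i-1] < v <= offsets[i]; the last slot counts overflow.
struct simple_estimated_histogram {
    std::vector<int64_t> offsets{1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<int64_t> counts = std::vector<int64_t>(offsets.size() + 1, 0);

    void add(int64_t value) {
        // First offset >= value; there is no bucket for 0, so 0 maps to offset 1.
        auto it = std::lower_bound(offsets.begin(), offsets.end(), value);
        counts[it - offsets.begin()]++;
    }
};
```

Recording one cache hit (0) and one single-sstable read (1) yields two entries in the "1" bucket, which is the distortion the revert avoids.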
Conflicts:
row_cache.hh
Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.
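The pattern across this series can be sketched as follows. Query-path functions take a `trace_state_ptr` as a mandatory argument and forward it down. An internal helper with no trace context keeps a defaulted null argument. All names below are hypothetical stand-ins; the real `trace_state_ptr` is a Scylla tracing type not modelled here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a null trace_state_ptr means "not tracing".
struct trace_state { std::vector<std::string> events; };
using trace_state_ptr = std::shared_ptr<trace_state>;

void trace(const trace_state_ptr& ts, std::string msg) {
    if (ts) { ts->events.push_back(std::move(msg)); }
}

// Lower layer: mandatory argument, so callers must decide what to pass.
int read_sstables(int key, trace_state_ptr ts) {
    trace(ts, "reading sstables for key " + std::to_string(key));
    return key * 2;
}

// Query path: records its own event, then forwards the same trace state down.
int query(int key, trace_state_ptr ts) {
    trace(ts, "query start");
    return read_sstables(key, std::move(ts));
}

// Internal caller without a trace context: keeps the default for convenience.
int internal_query(int key, trace_state_ptr ts = nullptr) {
    return query(key, std::move(ts));
}
```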
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the storage_proxy so it passes along a
trace_state_ptr to the layers below, when querying locally or
receiving a remote query request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the database and column_family types so a
trace_state_ptr can be passed in when querying. This enables tracing
of the inner components.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the row_cache so it accepts a trace_state_ptr,
which it is responsible for passing to the underlying mutation_reader
if needed.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the mutation_reader so it optionally accepts a
trace_state_ptr. This will allow us to trace, for example, which
sstables are accessed during a request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"Nodetool cfhistograms is supposed to tell us how many SSTables were touched per
read. Currently, we are a bit in the dark as we don't export that information.
This patch exports that, so that we can start using it."
If we have a cache hit, we still need to update our sstable histogram, noting
that we have touched 0 SSTables.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
That is done for single partition queries only - mimicking what
Cassandra does on that matter.
For this to be correct, we also need to update this histogram on cache
hits, in which case we record the read as having touched 0 SSTables. That
will be done in a separate patch.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The make_reader method is currently a const method, but we would like to start
keeping hit statistics from it.
Instead of relaxing the const condition too much, we can just mark the _stats
field as mutable, indicating that make_reader will not be able to change
anything in the CF, except for keeping statistics.
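Below is a minimal sketch of the `mutable` approach on a hypothetical `column_family` stand-in. `make_reader()` stays const, so it cannot modify the family's data, yet it can still bump the statistics member because that member is declared `mutable`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: only the const/mutable relationship is modelled.
class column_family {
    struct stats {
        uint64_t reads = 0;
    };
    int _data = 0;         // logical state: read-only inside const methods
    mutable stats _stats;  // bookkeeping only: writable inside const methods
public:
    int make_reader() const {
        ++_stats.reads;    // allowed because _stats is mutable
        // ++_data;        // would not compile: *this is const here
        return _data;
    }
    uint64_t reads() const { return _stats.reads; }
};
```

The compiler still guards every other field, which is the advantage over dropping `const` from `make_reader()` entirely.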
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There is nothing really that fundamentally ties the estimated histogram to
sstables. This patch gets rid of the few incidental ties. They are:
- the namespace name, which is now moved to utils. Users inside sstables/
now need to add a namespace prefix, while the ones outside have to change
it to the right one
- sstables::merge, which has a very non-descriptive name to begin with, is
changed to a more descriptive name that can live inside utils/
- the disk_types.hh include has to be removed, but it had no reason to be
there in the first place.
What remains is to actually move the file outside sstables/. That is done in a separate
step for clarity.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Sometimes the user would like to dump all the metrics into a file or
pipe them to another program, as requested in issue #1506.
This patch makes scyllatop check whether stdout is connected to a TTY,
and if not, it does not fire up the fancy urwid UI but instead just
writes all its collected metrics to stdout.
Optionally, the user can tell the program to quit after a specific
number of iterations via the -n or --iterations flag.
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1471777516-9903-1-git-send-email-yoav@scylladb.com>
"clustering_key_filtering_context is no longer needed.
partition_slice can be used instead so this series removes
clustering_key_filtering_context and passes partition_slice down where
it's needed. Then a static get_ranges method is used to obtain
clustering key ranges for a given partition.
Fixes #1614."
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This fixes the problem of multiple concurrent get_ranges calls.
Previously each call invalidated the result of the previous
call. Now they don't step on each other's toes.
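Below is a minimal sketch contrasting the two shapes. A static `get_ranges` returns a fresh vector per call, so concurrent callers never share a result. A shared-buffer version returns a reference that the next call overwrites. The types and the per-partition override layout are hypothetical simplifications of partition_slice.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using clustering_range = std::pair<int, int>; // hypothetical [start, end]

struct partition_slice {
    std::vector<clustering_range> default_ranges;
    // Hypothetical per-partition overrides: partition key -> ranges.
    std::vector<std::pair<std::string, std::vector<clustering_range>>> specific_ranges;

    // Every call returns its own vector; no call can invalidate another's result.
    static std::vector<clustering_range> get_ranges(const partition_slice& slice,
                                                    const std::string& partition_key) {
        for (const auto& entry : slice.specific_ranges) {
            if (entry.first == partition_key) {
                return entry.second;
            }
        }
        return slice.default_ranges;
    }
};

// Problematic shape, for contrast: one shared buffer returned by reference,
// so a later call overwrites what an earlier caller may still be reading.
struct shared_buffer_filter {
    std::vector<clustering_range> buffer;
    const std::vector<clustering_range>& get_ranges(const partition_slice& slice,
                                                    const std::string& partition_key) {
        buffer = partition_slice::get_ranges(slice, partition_key);
        return buffer;
    }
};
```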
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
In posix_net_conf.sh's single queue NIC mode (i.e. RPS enabled mode), we exclude cpu0 and its sibling from the CPUs used for network stack processing, and assign the NIC IRQ to cpu0.
Since the network stack never runs on cpu0 and its sibling, we need to exclude these CPUs from Scylla too to get better performance.
To do this, we need to get the RPS CPU mask from posix_net_conf.sh and pass it to scylla_cpuset_setup to construct /etc/scylla.d/cpuset.conf when scylla_setup is executed.
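Below is a minimal sketch of turning a hex CPU mask into a list of CPU ids. This is the step needed to go from the RPS mask to a cpuset. It assumes the Linux cpumask text format (hex digits, optionally grouped by commas every 32 bits). The function name is hypothetical, and the exact format written to cpuset.conf is not covered here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: "fe" -> {1,...,7}, i.e. every CPU except cpu0.
std::vector<unsigned> cpus_from_hex_mask(const std::string& mask) {
    std::vector<unsigned> cpus;
    unsigned bit = 0;
    // Walk from the least significant (rightmost) hex digit upwards.
    for (auto it = mask.rbegin(); it != mask.rend(); ++it) {
        char c = *it;
        if (c == ',') {
            continue; // 32-bit group separator in cpumask text
        }
        int nibble;
        if (c >= '0' && c <= '9') { nibble = c - '0'; }
        else if (c >= 'a' && c <= 'f') { nibble = c - 'a' + 10; }
        else if (c >= 'A' && c <= 'F') { nibble = c - 'A' + 10; }
        else { throw std::invalid_argument("bad hex digit in cpu mask"); }
        for (int b = 0; b < 4; ++b) {
            if (nibble & (1 << b)) {
                cpus.push_back(bit + b);
            }
        }
        bit += 4;
    }
    return cpus;
}
```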
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-2-git-send-email-syuu@scylladb.com>
Right now scylla_prepare passes the -mq option to posix_net_conf.sh when the number of RX queues is > 1, but posix_net_conf.sh itself sets the NIC mode to sq when queues < ncpus / 2.
So the two scripts use different logic; moreover, posix_net_conf.sh no longer needs -sq/-mq to be specified, since it autodetects the queue mode.
So we need to drop the detection logic from scylla_prepare and let posix_net_conf.sh detect it.
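The mismatch is easiest to see by writing both rules side by side. The rules below are transcribed from the description above, not from the scripts themselves. With 4 RX queues on 16 CPUs, scylla_prepare would request mq, while posix_net_conf.sh's own rule picks sq.

```cpp
#include <cassert>

enum class nic_mode { sq, mq };

// scylla_prepare's old rule as described: mq whenever there is more than one RX queue.
nic_mode scylla_prepare_rule(unsigned rx_queues) {
    return rx_queues > 1 ? nic_mode::mq : nic_mode::sq;
}

// posix_net_conf.sh's rule as described: sq when queues < ncpus / 2.
nic_mode posix_net_conf_rule(unsigned rx_queues, unsigned ncpus) {
    return rx_queues < ncpus / 2 ? nic_mode::sq : nic_mode::mq;
}
```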
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-1-git-send-email-syuu@scylladb.com>
The CQL type IDs are specified as hex in the CQL binary protocol
specification. Define CQL type IDs in the code explicitly to make
reviewing the code and adding new types easier.
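Below is a sketch of what explicit definitions can look like. The IDs are written in hex so they match the type list of the CQL binary protocol specification one-to-one. The enum and enumerator names are hypothetical. IDs 0x0011 to 0x0014 (date, time, smallint, tinyint) were introduced in protocol v4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; values are the [option] type IDs from the protocol spec.
enum class cql_type_id : uint16_t {
    custom    = 0x0000,
    ascii     = 0x0001,
    bigint    = 0x0002,
    blob      = 0x0003,
    boolean   = 0x0004,
    counter   = 0x0005,
    decimal   = 0x0006,
    double_   = 0x0007,
    float_    = 0x0008,
    int_      = 0x0009,
    timestamp = 0x000B,
    uuid      = 0x000C,
    varchar   = 0x000D,
    varint    = 0x000E,
    timeuuid  = 0x000F,
    inet      = 0x0010,
    date      = 0x0011,
    time      = 0x0012,
    smallint  = 0x0013,
    tinyint   = 0x0014,
    list      = 0x0020,
    map       = 0x0021,
    set       = 0x0022,
    udt       = 0x0030,
    tuple     = 0x0031,
};
```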
Message-Id: <1472537971-26053-1-git-send-email-penberg@scylladb.com>
Alexandr Porunov reports that Scylla fails to start up after reboot as follows:
Aug 25 19:44:51 scylla1 scylla[637]: Exiting on unhandled exception of type 'std::system_error': Error system:99 (Cannot assign requested address)
The problem is that because there's no dependency on the network service,
Scylla simply attempts to start up too soon in the boot sequence and
fails.
Fixes #1618.
Message-Id: <1472212447-21445-1-git-send-email-penberg@scylladb.com>