Commit Graph

331 Commits

Author SHA1 Message Date
Paweł Dziepak
1293073019 managed_bytes: add cast to mutable_view 2017-03-02 09:05:11 +00:00
Paweł Dziepak
0ed2352ade utils: introduce mutable_view
std::basic_string_view does not allow modifying the underlying buffer.
This patch introduces a mutable_view which permits that.
2017-03-02 09:05:10 +00:00
Duarte Nunes
11b5076b3c lsa: Use log histogram for closed segments
This patch replaces the current heap with a logarithmic histogram
to hold the closed segment descriptors.

This histogram stores elements in different buckets according to
their size. Values are mapped to a sequence of power-of-two ranges
that are split in N sub-buckets. Values less than a minimum value
are placed in bucket 0, whereas values bigger than a maximum value
are not admitted.

There is some loss of precision as segments are now not totally
ordered, and precision decreases the more sparse a segment is. This
allows to reduce the cost of the computations needed when freeing
from a closed segment.

Performance results for perf_simple_query -c4 --duration 60
           before       after       diff
read     43954.27    45246.10      +2.9%
write    48911.54    52807.76      +7.9%

Fixes #1442

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170227235328.27937-1-duarte@scylladb.com>
2017-02-28 18:40:38 +02:00
Calle Wilund
0d87f3dd7d utils::UUID: operator< should behave as comparison of hex strings/bytes
I.e. need to be unsigned comparison.
Message-Id: <1487683665-23426-1-git-send-email-calle@scylladb.com>
2017-02-22 09:19:22 +00:00
Tomasz Grabiec
c70ebc7ca5 lsa: Make reclaim_timer enclose segment_pool::reclaim_segments()
LSA timing did not include segment migration. It does after this
change.
Message-Id: <1486657046-9378-1-git-send-email-tgrabiec@scylladb.com>
2017-02-09 17:07:59 +00:00
Amnon Heiman
1e3cfe7396 estimated_histogram: returns a metrics histogram
The metrics histogram is a struct that describe a histogram.
This patch adds a getter method that lets the estimated_histogram return
a metrics::histogram, this will allow to register it as a histogram
metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2017-02-06 17:34:43 +02:00
Avi Kivity
7a00dd6985 Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The amount of scheduled
tasks is at most 1, there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when contains sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the amount of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()
2017-02-02 17:49:31 +02:00
Tomasz Grabiec
e40fb438f5 lsa: Avoid avalanche releasing of requests
Before, the logic for releasing writes blocked on dirty worked like
this:

  1) When region group size changes and it is not under pressure and
     there are some requests blocked, then schedule request releasing
     task

  2) request releasing task, if no pressure, runs one request and if
     there are still blocked requests, schedules next request
     releasing task

If requests don't change the size of the region group, then either
some request executes or there is a request releasing task
scheduled. The amount of scheduled tasks is at most 1, there is a
single thread of excution.

However, if requests themselves would change the size of the group,
then each such change would schedule yet another request releasing
thread, growing the task queue size by one.

The group size can also change when memory is reclaimed from the
groups (e.g.  when contains sparse segments). Compaction may start
many request releasing threads due to group size updates.

Such behavior is detrimental for performance and stability if there
are a lot of blocked requests. This can happen on 1.5 even with modest
concurrency becuase timed out requests stay in the queue. This is less
likely on 1.6 where they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in
the system. When the amount of scheduled tasks reaches 1000, polling
stops and server becomes unresponsive until all of the released
requests are done, which is either when they start to block on dirty
memory again or run out of blocked requests. It may take a while to
reach pressure condition after memtable flush if it brings virtual
dirty much below the threshold, which is currently the case for
workloads with overwrites producing sparse regions.

Refs #2021.

Fix by ensuring there is at most one request releasing thread at a
time. There will be one releasing fiber per region group which is
woken up when pressure is lifted. It executes blocked requests until
pressure occurs.

The logic for notification across hierachy was replaced by calling
region_group::notify_relief() from region_group::update() on the
broadest relieved group.
2017-02-01 17:41:55 +01:00
Tomasz Grabiec
d55baa0cd1 lsa: Move definitions to .cc 2017-02-01 17:41:55 +01:00
Tomasz Grabiec
8f8b111b33 lsa: Simplify hard pressure notification management
The hard pressure was only signalled on region group when
run_when_memory_available() was called after the pressure condition
was met.

So the following loop is always an infinite loop rather than stopping
when engouh is allocated to cause pressure:

   while (!gr.under_pressure()) {
       region.allocate(...);
   }

It's cleaner if pressure notification works not only if
run_when_memory_available() is used but whenever conditino changes,
like we do for the soft pressure.

There is comment in run_when_memory_available() which gives reasons
why notifications are called from there, but I think those reasons no
longer hold:

 - we already notify on soft pressure conditions from update(), and if
   that is safe, notifying about hard pressure should also be safe. I
   checked and it looks safe to me.

 - avoiding notification in the rare case when we stopped writing
   right after crossing the threshold doesn't seem benefitial. It's
   unlikely in the first place, and one could argue it's better to
   actually flush now so that when writes resume they will not block.
2017-02-01 17:41:55 +01:00
Tomasz Grabiec
9aa1be5d08 lsa: Do not start or stop reclaiming on hard pressure
We already call these when crossing the soft threshold. We shouldn't
stop reclaiming when hard pressure is gone because soft pressure may
still be present. Calling start_reclaiming() on hard pressure is
unnecessary because soft pressure also starts it, and when there is
hard pressure there is also soft pressure.
2017-02-01 17:40:15 +01:00
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accomodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

As part of a preperation of the change in the metrics layer, this change
the way the collectd api uses the metrics value to use the getters
instead of calling the member directly.

This will be important when the internal implementation will changed
from union to variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Tomasz Grabiec
ed9ff19467 lsa: Document and annotate reclaimer notification callbacks
They are called from region_group::update(), so must be alloc-free and
noexcept.
2017-01-30 19:18:07 +01:00
Tomasz Grabiec
2c7902fb2b Revert "lsa: Reduce reclamation latency"
This reverts commit d61002cc33.

Introduced a regression in row_cache_alloc_stress.

The problem is that reclaim_from_evictable() evicts way too much after
the refactor due to the stop condition not taking into account how
much data was evicted so far and only looking at occupancy of the
minimal segment. This may lead to eviction of the whole region.
2017-01-26 10:43:18 +01:00
Tomasz Grabiec
d61002cc33 lsa: Reduce reclamation latency
Currently eviction is performed until occupancy of the whole region
drops below the 85% threshold. This may take a while if region had
high occupancy and is large. We could improve the situation by only
evicting until occupancy of the sparsest segment drops below the
threshold, as is done by this change.

I tested this using a c-s read workload in which the condition
triggers in the cache region, with 1G per shard:

 lsa-timing - Reclamation cycle took 12.934 us.
 lsa-timing - Reclamation cycle took 47.771 us.
 lsa-timing - Reclamation cycle took 125.946 us.
 lsa-timing - Reclamation cycle took 144356 us.
 lsa-timing - Reclamation cycle took 655.765 us.
 lsa-timing - Reclamation cycle took 693.418 us.
 lsa-timing - Reclamation cycle took 509.869 us.
 lsa-timing - Reclamation cycle took 1139.15 us.

The 144ms pause is when large eviction is necessary.

The change improves worst case latency. Reclamation time statistics
over 30 second period after cache fills up, in microseconds:

Before:

  avg = 1524.283148
  stdev = 11021.021118
  min = 12.934000
  max = 144356.000000
  sum = 257603.852000
  samples = 169

After:

  avg = 1317.362414
  stdev = 1913.542802
  min = 263.935000
  max = 19244.600000
  sum = 175209.201000
  samples = 133

Refs #1634.

Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>
2017-01-19 17:35:36 +02:00
Amnon Heiman
e19fa02a17 remove scollectd from headers
As the metrics migration progressed, some include to scollectd.hh left
behind.

Because of the nature of the scollecd implementation those include
brings alot of code with them to the header files and eventually to many
source file.

This patch remove those include and add a missing include to
storage_proxy.cc.

The reason the compiler didn't complain is an indication to the
problematic nature of those include in the first place.

Before this patch, change in metrics.hh would cause 169 files to
compile, after this change 17.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1484667536-2185-1-git-send-email-amnon@scylladb.com>
2017-01-17 17:39:47 +02:00
Tomasz Grabiec
ddfee57c97 Replace iostream include with iosfwd in headers
Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>
2017-01-17 14:52:44 +02:00
Vlad Zolotarov
022bca16bf utils::logalloc: move collectd counters registration to metrics registration layer
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-01-10 16:24:55 -05:00
Pekka Enberg
f83503c09e date.h: 64-bit year and days representation
We need 64-bit year and days representation to support the boundary
values of the CQL data type, which is implemented using Joda Time
library's DateTime type.
2017-01-09 10:42:20 +02:00
Pekka Enberg
7f2fc6470c utils/date.h: Import date and time library sources
This patch imports the "date.h" date and time library based on the C++11
<chrono> header, which is proposed for standadization:

  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0355r1.html

We need it to implement support for the CQL date type.

Import repository

  https://github.com/HowardHinnant/date

Import commit:

  commit 2935f80109b8cfc15eb1243afe35f7ec3530f971
  Author: Howard Hinnant <howard.hinnant@gmail.com>
  Date:   Sun Jan 1 15:02:08 2017 -0500

      Have get_version check for the file named version first
2017-01-09 10:39:54 +02:00
Tomasz Grabiec
ab5c77fcf1 bloom_filter: Allow checking presence using pre-hashed key
Will allow us to calculate the hash once and use it on many filters
instead of calculating the hash for each filter separately.

Another change made is to avoid precomputing all indexes during filter
operations, and have for_each_index() template instead which invokes a
functor.
2016-12-19 14:20:58 +01:00
Asias He
00d7a35949 utils: Put crc32 under utils namespace
It conflicts with crc in zlib
Message-Id: <1480918984-4117-2-git-send-email-asias@scylladb.com>
2016-12-05 11:48:29 +02:00
Tomasz Grabiec
e14caaef60 utils/logalloc: Add ability to timeout run_when_memory_available() task 2016-11-29 16:40:58 +01:00
Tomasz Grabiec
61d81617e1 utils/flush_queue: Add ability to wait with a timeout 2016-11-29 16:40:58 +01:00
Avi Kivity
176fca5775 logalloc: use correct header for unique_ptr
<bits/unique_ptr.hh> is a libstdc++ internal header.  USe <memory> instead.
2016-11-27 23:08:04 +02:00
Paweł Dziepak
ef57b9a26f rename memory_usage() to external_memory_usage() where applicable
Renaming the function to external_memory_usage() makes it clear that
sizeof(T) is not included, something that was a source of confusion in
the past.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-11-18 11:25:36 +00:00
Glauber Costa
f86c9e36f4 logalloc: allow region group reclaimer to specify a soft limit
The region_group_reclaimer will let us know every time we are over the
limit we have specified for memory usage.

However, For some applications, we would be interested in knowing about
memory build up earlier, so we can start doing something about it before
we reach that condition.

This patch introduce soft limit notifications for the
region_group_reclaimer. After this patch is applied, start_reclaim() is
called earlier, and stop_reclaim() later, after the soft condition is
abated.

There are methods that allow one to easily test if the pressure
condition is a soft limit condition or a hard, threshold condition and
act accordingly. Whether to act on both conditions or just one of them
is up to the application.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-16 21:20:23 -05:00
Paweł Dziepak
b8d737ff0a tests/row_cache_test: verify that eviction follows lru
Refs #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479231555-28191-1-git-send-email-pdziepak@scylladb.com>
2016-11-15 18:57:54 +01:00
Glauber Costa
93386bcec7 histograms: do not use latency_in_nano
Now that the histogram has its own unit expressed in its template
parameter, there is no reason to convert it to nano just so we may need
to convert it back if the histogram needs another unit.

This patch will keep everything as a duration until last moment, and
then we'll convert when needed.

This was suggested by Amnon.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <218efa83e1c4ddc6806c51913d4e5f82dc6d231e.1479139020.git.glauber@scylladb.com>
2016-11-14 18:01:43 +02:00
Glauber Costa
608d825790 histogram: fix reporting units
We are tracking latencies in microseconds, but almost everywhere else
they are reported in microseconds. Instead of just converting, this
patch tries to be a bit more future proof and embed the unit into the
type - and we then default to microseconds.

I have verified that the JMX measures now report sane values for both
the storage proxy and the column family. nodetool cfhistograms still
works fine. That one is reported in nanoseconds, but through the
estimated_histogram, not ihistogram.

Fixes #1836

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-11 11:36:56 -05:00
Glauber Costa
1342d044eb moving averages: change metrics calculation
We have recently fixed a bug due to which the constructor parameters for
moving average were inverted, leading to the numbers being just plain
wrong. However, the calculation of alpha was already inverted, meaning
it was right by accident and now that's wrong.

With the wrong alpha, the values we see are still correct, but they move
very quickly. The intention of this code is obviously to smooth things
out.

This was found out by Nadav. I have tested and confirmed that the smoothing
factor now works as expected.

Fixes  #1837

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-10 22:33:34 -05:00
Amnon Heiman
a977ea85e1 histogram: moving_average and total rate should be calculate in seconds
The moving average and the total average should be calculated in seconds
and not nanoseconds.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-11-10 22:32:53 -05:00
Glauber Costa
d3f11fbabf histogram: moving averages: fix inverted parameters
moving_averages constructor is defined like this:

    moving_average(latency_counter::duration interval, latency_counter::duration tick_interval)

But when it is time to initialize them, we do this:

	... {tick_interval(), std::chrono::minutes(1)} ...

As it can be seen, the interval and tick interval are inverted. This
leads to the metrics being assigned bogus values.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com>
2016-11-10 11:28:51 -08:00
Tomasz Grabiec
6548132423 lsa: Make logalloc::tracker::full_compaction() compact all reclaimable regions
is_compactible() will pass on very small regions. full_compaction() is
only used in tests to force objects to be moved due to compaction, so
we want all reclaimable regions to be compacted.
2016-10-18 11:16:08 +02:00
Paweł Dziepak
d08cffd3c7 lsa: avoid exceptions during segment_zone creation
LSA tries to allocate zones as large as possible (while still leaving
enough free space for the standard allocator). It uses the amount of
free memory in order to guess how much it can get, but that obviously
doesn't account for fragmentation and the allocation attempt may fail.

This patch changes the LSA code so that it doesn't throw in case zone
couldn't be created but just returns a null pointer which should be
more performant if the LSA memory cannot grow any more.

Fixes #1394.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1476435031-5601-1-git-send-email-pdziepak@scylladb.com>
2016-10-14 11:08:24 +02:00
Tomasz Grabiec
e617bcd8a7 logalloc: disable abort on allocation failure in places in which it is benign
Some places start big expecting allocation failure, then reduce the
requested size. Let's not abort in such cases.

Message-Id: <1476295120-32047-1-git-send-email-tgrabiec@scylladb.com>
2016-10-13 10:53:32 +03:00
Tomasz Grabiec
4357d0a6d9 db: Add counter for writes blocked on dirty memory
There is already queue_length-requests_blocked_memory, but it's a
gauge so does not reflect what happened between the sampling points.

total_operations-requests_blocked_memory will allow to see if there
were any (and how many) requests which were blocked by dirty memory.

Message-Id: <1476098616-12682-1-git-send-email-tgrabiec@scylladb.com>
2016-10-10 14:25:22 +03:00
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Avi Kivity
f8118d9fc2 Merge "Virtual dirty memory management" from Glauber
"Description:
============

Scylla currently suffers from a brick wall behavior of the request throttler.
Requests pile up until we reach the dirty memory limit, at which point we stop
serving them until we have freed enough memory to allow for more requests.

The problem is that freeing dirty memory means writing an SSTable to completion.
That can take a long time, even if we are blessed with great disks. Those long
waiting times can and will translate into timeouts. That is bad behavior.

What this patch does is introduce one form of virtual dirty memory accounting.
Instead of allowing 100 % of the dirty memory to be filled up until we stop
accepting requests, we will do that when we reach 50 % of memory. However,
instead of releasing requests only when an SSTable is fully written, we start
releasing them when some memory was written.

The practical effect of that, is that once we reach 50 % occupancy in our dirty
memory region, we will bring the system from CPU speed to disk speed, and will
start accepting requests only at the rate we are able to write memory back.

Results
=======

With this patchset running a load big enough to easily saturate the disk,
(commitlog disabled to highlight the effects of the memtable writer), I am able
to run scylla for many minutes, with timeouts occurring only when I run out of
disk space, whereas without this patch a swarm of timeouts would start merely 2
seconds after the load started - and would never get stable.

In V2, I have sent a set of graphs illustrating the performance of this solution.
This version does not have any significant differences in that front.

For details, please refer to
https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ

Accuracy of the accounting:
---------------------------
It is important for us to be as accurate as possible when accounting freed
memory, since every byte we mark as freed may allow one or more requests to be
executed.  I have measured the accuracy of this approach (ignoring padding,
object size for the mutation fragments) to be 99.83 % of used memory in the
test workload I have ran (large, 65k mutations). Memtables under this circumnstance
tend to have a very high occupancy ratio because throttle breeds idle, and idle
breeds compact-on-idle.

Known Issues:
-------------

A lot of time can be elapsed between destroying the flush_reader and actually
releasing memory. The release of memory only happens when the SSTable is fully
sealed, and we have to flush the files, as well as finish writing all SSTable
components at this point. This happened in practice with a buggy kernel that
would result in flushes taking a long time.

After that is fixed, this is just a theoretical problem and in practice it
shouldn't matter given the time we expect those operations to take."

* 'virtual-dirty-v6' of github.com:glommer/scylla:
  database: allow virtual dirty memory management
  streamed_mutation: make _buffer private
  add accounting of memory read to partition_snapshot_reader
  move partition_snapshot_reader code to header file
  LSA: allow a group to query its own region group
  memtables: split scanning reader in two
  sstables: use special reader for writing a memtable
  LSA: export information about object memory footprint
  LSA: export information about size of the throttle queue
  database: export virtual dirty bytes region group
2016-10-04 20:57:52 +03:00
Glauber Costa
86aa0b830d LSA: allow a group to query its own region group
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-04 10:39:10 -04:00
Glauber Costa
28e3f2f6ee LSA: export information about object memory footprint
We allocate objects of a certain size, but we use a bit more memory to hold
them.  To get a clerer picture about how much memory will an object cost us, we
need help from the allocator. This patch exports an interface that allow users
to query into a specific allocator to get that information.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-04 10:39:10 -04:00
Gleb Natapov
32989d1e66 Merge seastar upstream
* seastar 2b55789...5b7252d (3):
  > Merge "rpc: serialize large messages into fragmented memory" from Gleb
  > Merge "Print backtrace on SIGSEGV and SIGABRT" from Tomasz
  > test_runner: avoid nested optionals

Includes patch from Gleb to adapt to seastar changes.
2016-09-28 17:34:16 +03:00
Glauber Costa
f5fd6bd714 LSA: export information about size of the throttle queue
Also add information about for how long has the oldest been sitting in the
queue. This is part of the backpressure work to allow us to throttle incoming
requests if we won't have memory to process them. Shortages can happen in all
sorts of places, and it is useful when designing and testing the solutions to
know where they are, and how bad they are.

This counter is named for consistency after similar counters from transport/.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-09-27 12:09:08 -04:00
Nadav Har'El
fe1ba753ce Avoid semaphore's default initial value
The fact that Seastar's semaphore has a default initializer of 1 if not
explicitly initialized is confusing and unexpected and recently lead to
two bugs. So ScyllaDB should not rely on this default behavior, and specify
the initial value of each semaphore explicitly.

In several cases in the ScyllaDB code, the explict initialization was
missing, and this patch adds it. In one case (rate_limiter) I even think
the default of 1 was a bit strange, and 0 makes more sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1474530745-23951-1-git-send-email-nyh@scylladb.com>
2016-09-24 19:25:02 +03:00
Tomasz Grabiec
b0b28696b5 scylla-gdb: Add 'scylla lsa-segment' command
Allows one to examine contents of LSA segment.

Example:

  (gdb) scylla lsa-segment 0x601000480000
  0x601000480e70: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000480f10: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000480fb0: free size=192
  0x60100048107e: free size=42
  0x6010004814e0: free size=192
  0x6010004815ae: free size=40
  0x6010004815e8: free size=192
  0x6010004816b8: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000481758: free size=192
  ...
2016-09-20 16:53:21 +02:00
Gleb Natapov
2e8b255741 Merge seastar upstream
* seastar 0303e0c...e534401 (6):
  > Merge "enable rpc to work on non contiguous memory for receive" from Gleb
  > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py
  > fix tcp stuck when output_stream write more than 212992 bytes once.
  > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > native_network_stack: Fix use after free/missing wait in dhcp

Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.
2016-09-15 12:12:16 +03:00
Glauber Costa
4310635bae move estimated histogram to utils
Nothing sstable-specific in it, really.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:23 -04:00
Paweł Dziepak
e76203c927 utils: add input_stream
input_stream performs a type erasure on seastar::simple_input_stream and
fragmented_input_stream. The main goal is to keep the overhead for the
cases when simple_input_stream is used minimum.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-22 09:31:33 +01:00
Paweł Dziepak
29827a9726 utils: add fragmented_input_stream
fragmented_input_stream is an input stream usable by IDL-generated
deserializers which can read from fragmented buffers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-22 09:31:33 +01:00
Amnon Heiman
4c14b2a527 histogram: Add an estimated sum method
The histogram implementation uses sampling to estimate the mean and sum.
This patch adds a method that returns an estimated sum based on the mean
and the total number of events measured.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467547341-30438-2-git-send-email-amnon@scylladb.com>
2016-08-16 11:06:50 +03:00