Commit Graph

431 Commits

Author SHA1 Message Date
Asias He
cd2105b8bd database: make_streaming_reader for ranges
Allow to make a streaming reader with a vector of ranges in addition to
a single range. This will be used soon in following streaming patch.

We can make the reader more efficient later.
2016-12-12 09:04:21 +08:00
Raphael S. Carvalho
fcfc84e836 compaction: reduce bloom filter overhead with incremental selector
The procedure to calculate max purgeable timestamp is optimized
by only visiting sstables that overlap with key being currently
compacted. That's done using incremental sstable selector.

Function to calculate maximum purgeable timestamp is made 10 times
faster when compacting sstables overlap with 10% of all sstables.

Fixes #1322.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-12-09 16:17:17 -02:00
Tomasz Grabiec
1b5f338c17 memtable: Track flushed memory in memtable object 2016-12-05 12:59:09 +01:00
Tomasz Grabiec
c3768fe4de memtable: Pass dirty_memory_manager& to memtable constructor
The implementation assumes that memtable's region group is owned by
dirty_memory_manager, and tries to obtain a reference to it like this:

  boost::intrusive::get_parent_from_member(_region.group(), &dirty_memory_manager::_region_group));

This is undefined behavior when the region's group does not come from
dirty manager. It's safer to be explicit about this dependency by
taking a reference to dirty_memory_manager in the constructor.
2016-12-05 12:59:09 +01:00
Glauber Costa
d7256e7b21 database: do not call seal directly from the streaming timer
Streaming memtable have a delayed mode where many flushes are coalesced
together into one, with the actual flush happening later and propagated
to all the previous waiters.

However, the timer that triggers the actual flush was not using the
newly introduced flush infrastructure. This was a minor problem because
those flushes wouldn't try to take the semaphore, and so we could have
many flushes going on at the same time.

What was a potential performance issue became a correctness issue when
we moved the reversal of the dirty memory accounting out of
revert_potentially_cleaned_up_memory() into remove_from_flush_manager().

Since the latter is only called through the flush infrastructure, it
simply wasn't called. So the deferral of the reversal exposed this bug.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0d5755375bc27524b8cfb9970c76d492b14d9eea.1480522742.git.glauber@scylladb.com>
2016-11-30 18:00:55 +01:00
Tomasz Grabiec
b5d5612f98 database: Add counter for timed out writes 2016-11-29 16:40:59 +01:00
Tomasz Grabiec
2c561ecaed db: Allow writes to be timed out 2016-11-29 16:40:58 +01:00
Tomasz Grabiec
b1ae6ad2ad db: Introduce counters for failed reads and writes 2016-11-29 16:40:58 +01:00
Raphael S. Carvalho
f141b0cdae database: atomically add new sstables to cf when refreshing
New sstables are loaded and added in parallel, meaning that scylla can
potentially return stale data if a new sstable containing a tombstone
wasn't loaded yet. Compaction should also not run until all new sstables
are added for similar reasons.

Fix is about separating blocking and non-blocking steps to allow
atomic add of multiple new sstables.

Fixes #1368.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <14283b8a4a69127071d1fabef320a93c91817ec2.1480356073.git.raphaelsc@scylladb.com>
2016-11-28 20:30:48 +02:00
Avi Kivity
28857e42e7 Merge " Virtualize size_estimates system table" from Duarte
"We currently write the size_estimates system table for every schema
on a periodic basis, currently set to 5 minutes, which can interfere
with an ongoing workload.

This patchset virtualizes it such that queries are intercepted and we
calculate the results on the fly, only for the ranges the caller is interested in.

Fixes #1616"

* 'virtual-estimates/v4' of github.com:duarten/scylla:
  size_estimates_virtual_reader: Add unit test
  db: Delete size_estimates_recorder
  size_estimates: Add virtual reader
  column_family: Add support for virtual readers
  storage_service: get_local_tokens() returns a future
  nonwrapping_range: Add slice() function
  range: Find a sequence's lower and upper bounds
  system_keyspace: Build mutations for size estimates
  size_estimates: Store the token range as bytes
  range_estimates: Add schema
  murmur3_partitioner: Convert maximum_token to sstring
2016-11-28 10:12:59 +02:00
Glauber Costa
c32803f2f0 database: move reversion of virtual dirty state closer to update_cache.
When we finish writing a memtable, we revert the dirty memory charges
immediately. When we do that, dirty memory will grow back to what it
was, and soon (we hope) will go down again when we release the requests
for real.

During that time, we may not accept new requests. Sealing can take a
long time, specially in the face of Linux issues like the ones we have
seen in the past. It also will take proportionally more time if the
SSTables end up being small, which is a possibility in some scenarios.

This patch changes the dirty_memory_manager so that the charges won't be
reverted right after we finish the flush. Rather, we will hold on to it,
and revert it right before we update the cache. We don't need to do it
for all classes of memtable writes, because after we finish flushing,
flush_one() will destroy the hashed element anyway.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2d5a8f6ca57d5036f4850ac163557bca59b8063d.1480004384.git.glauber@scylladb.com>
2016-11-24 18:18:15 +01:00
Glauber Costa
0ca8c3f162 database: keep a pointer to the memtable list in a memtable
We current pass a region group to the memtable, but after so many recent
changes, that is a bit too low level. This patch changes that so we pass
a memtable list instead.

Doing that also has a couple of advantages. Mainly, during flush we must
get to a memtable to a memtable_list. Currently we do that by going to
the memtable to a column family through the schema, and from there to
the memtable_list.

That, however, involves calling virtual functions in a derived class,
because a single column family could have both streaming and normal
memtables. If we pass a memtable_list to the memtable, we can keep
pointer, and when needed get the memtable_list directly.

Not only that gets rid of the inheritance for aesthetic reasons, but
that inheritance is not even correct anymore. Since the introduction of
the big streaming memtables, we now have a plethora of lists per column
family and this transversal is totally wrong. We haven't noticed before
because we were flushing the memtables based on their individual sizes,
but it has been wrong all along for edge cases in which we would have to
resort to size-based flush. This could be the case, for instance, with
various plan_ids in flight at the same time.

At this point, there is no more reason to keep the derived classes for
the dirty_memory_manager. I'm only keeping them around to reduce
clutter, although they are useful for the specialized constructors and
to communicate to the reader exactly what they are. But those can be
removed in a follow up patch if we want.

The old memtable constructor signature is kept around for the benefit of
two tests in memtable_tests which have their own flush logic. In the
future we could do something like we do for the SSTable tests, and have
a proxy class that is friends with the memtable class. That too, is left
for the future.

Fixes #1870

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <811ec9e8e123dc5fc26eadbda82b0bae906657a9.1479743266.git.glauber@scylladb.com>
2016-11-21 18:18:27 +02:00
Duarte Nunes
cd7e2fd602 column_family: Add support for virtual readers
Virtual readers allow queries to selected tables, usually system
tables, to be answered by the engine. This is useful for tables which
aren't written by users and whose contents can be calculated on
demand.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:15:05 +00:00
Glauber Costa
461778918b fix shutdown and exception conditions for flush logic
This patch addresses post-merge follow up comments by Tomek.
Basically, what we do is:
- we don't need to signal() from remove_from_flush_manager(), because
  the explicit flushes no longer wait on the condition variable. So we
  don't.
- We now wait on the stop() flushes (regardless of their return status)
  so we can make sure that the _flush_queue will indeed be done with.
- we acquire the semaphore before shutting down the dirty_memory_manager
  to make sure that there are no pending flushes
- the flush manager that holds the semaphore has to match in the exception
  handler

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <a23ab5098934546c660a08de64cd9294bb3a2008.1479400239.git.glauber@scylladb.com>
2016-11-17 21:16:44 +01:00
Glauber Costa
f08162e181 database: rework memtable flush logic
The way we currently flush memtables, we seal the current one but wait
on a semaphore for the actual flush to proceed.

This is pointless, because if the flush is not proceeding we'll use up
memory for the new entries anyway, be them in a newly opened memtable or
not. As a matter of fact, by opening a new memtable we are foregoing
coalescing opportunities.

After recent changes to the flush paths, we are now in a position to do
differently. We move the semaphore earlier, and if we can't acquire it
we keep appending to the current memtable.

For explicit flushes, we'll queue and prioritize them over memory-based
flushes. This has the nice property of potentially coalescing various
flushes for the same CF into one.

Coalescing flushes for the same CF is particularly helpful for
commitlog-initiated flushes that can't complete within the flush period.
What we see currently, is that under heavy load the commitlog will keep
sealing memtables adding to the existing load.

Another interesting property of this approach is that we can keep the
disk utilization higher, by allowing a new flush to start before the
memtable is fully sealed. By design, every time a memtable is finished
flushing it will call revert_potentially_cleaned_up_memory() to revert
the virtual memory charges.  That is the perfect moment for us to act.
It indicates that all the data flushing part is done.

The way we'll do it is by keeping the semaphore_units alive for this
memtable. When the flush ends, we destroy that object. This will
effectively trigger the next flush if there is a next flush that can be
initiated.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-16 21:20:58 -05:00
Glauber Costa
895e838ac0 get rid of max_memtable_size
After recent changes to the memtable code, there is no reason for us to
uphold a maximum memtable size. Now that we only flush one memtable at a
time anyway, and also have soft limit notifications from the
region_group_reclaimer, we can just set the soft limit to the target
size and let all of that be handled by the dirty_memory_manager.

It does have the added property that we'll be flushing when we globally
reach the soft limit threshold. In conditions in which we have multiple
CF writes fighting for memory, that guarantees that we will start
flushing much earlier than the hard limit.

The threshold is set to 1/4 of dirty memory. While in theory we would
prefer the memtables to go as big as 1/2 of dirty memory, in my
experiments I have found 1/4 to be a better fit, at least for the
moment.

The reason for such behavior is that in situations where we have slow
disks, setting the soft limit to 1/2 of dirty will put us in a situation
in which we may not have finished writing down the memtable when we hit
the limit, and then throttle. When set the threshold to 1/4 of dirty, we
don't throttle at all.

This behavior could potentially be fixed by not doing the full
memtable-based throttling after we do the commitlog throttling, but that
is not something realistic for the moment.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-16 21:20:24 -05:00
Glauber Costa
2ed3f342c1 pass a region to dirty_memory_manager accounting API
We would like to know from which region is a particular flush coming
from, and account accordingly. The reasoning behind that, is that soon
we'll be driving the flushes internally from the dirty_memory_manager
without explcitly triggering them.

We need to start a flush before the current one finishes, otherwise
we'll have a period without significant disk activity when the current
SSTable is being sealed, the caches are being updated, etc.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-16 21:20:24 -05:00
Tomasz Grabiec
c1a7e2090e Revert "database: change find_column_families signature so it returns a lw_shared_ptr"
This reverts commit f3528ede65.
2016-11-04 10:48:21 +01:00
Tomasz Grabiec
6366eb5cf8 Revert "correctly calculate latencies for writes"
This reverts commit a382f10fc4.
2016-11-04 10:48:02 +01:00
Glauber Costa
a382f10fc4 correctly calculate latencies for writes
Right now we are calculating latencies only when we are about to add an
item to the memtable.

That's incorrect and misleading, for two reasons. First, it leaves the
commitlog latencies out. But second, it is done after the memtable wall
effect is applied, which means we are not counting throttle time neither
in the memtables or in the commitlog.

To do that, we'll start the latency_counter object as soon as possible
and move it all the way to apply_in_memory(). That should span the
entire write operation.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <4e424780d290fd5938046060df2b17e2b470b717.1478111467.git.glauber@scylladb.com>
2016-11-03 13:27:31 +01:00
Glauber Costa
f3528ede65 database: change find_column_families signature so it returns a lw_shared_ptr
There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed in the deferring point, we need to go and find it
again.

If we use lw_shared_ptr, however, we'll be able to at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of re-finding it just fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
2016-11-03 13:27:31 +01:00
Paweł Dziepak
6755a679f6 drop key readers
key_readers weren't used since introduction of continuity flag to cache
entries.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Glauber Costa
33e9c2bbdd memtable: reduce sstable flush concurrency to one
Limiting the concurrency of memtable flushes to 4 was a temporary
workaround for the fact that we lacked good write behind support. Now
that write behind is properly merged we can reduce the concurrency to
what it should be, one.

This means that memtable flushes will now be serialized, and only when
one of them ends will the next one begin. Disk parallelism is obtained
through the write-behind mechanism.

Fixes #1373

Signed-off-by: Glauber Costa <glauber@scylladb.com>

Message-Id: <528f9ef928b5101bed952df600eb8555c275497a.1475881100.git.glauber@scylladb.com>
2016-10-09 10:48:57 +03:00
Tomasz Grabiec
2a5a90f391 db: Do not timeout streaming readers
There is a limit to concurrency of sstable readers on each shard. When
this limit is exhausted (currently 100 readers) readers queue. There
is a timeout after which queued readers are failed, equal to
read_request_timeout_in_ms (5s by default). The reason we have the
timeout here is primarily because the readers created for the purpose
of serving a CQL request no longer need to execute after waiting
longer than read_request_timeout_in_ms. The coordinator no longer
waits for the result so there is no point in proceeding with the read.

This timeout should not apply for readers created for streaming. The
streaming client currently times out after 10 minutes, so we could
wait at least that long. Timing out sooner makes streaming unreliable,
which under high load may prevent streaming from completing.

The change sets no timeout for streaming readers at replica level,
similarly as we do for system tables readers.

Fixes #1741.

Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>
2016-10-07 15:41:04 +03:00
Glauber Costa
aa6a96d09b database: export virtual dirty bytes region group
Currently, we export the region group where memtables are placed as dirty bytes.
Upcoming patches will optimistically mark some bytes in this region as free, a
scheme we know as "virtual dirty".

We are still interested in knowing the real state of the dirty region, so we
will keep track of the bytes virtually freed and split the counters in two.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-09-27 12:09:08 -04:00
Asias He
b505e34062 database: Introduce make_streaming_reader
The make_streaming_reader returns a combined mutation reader reads
mutations from sstables and memtable. The memtable reader handles
memtable flushing automatically so no special handling is needed here.

It will be used by streaming soon.
2016-09-26 16:02:48 +08:00
Raphael S. Carvalho
67343798cf api: implement api to return sstable count per level
'nodetool cfstats' wasn't showing per-level sstable count because
the API wasn't implemented.

Fixes #1119.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>
2016-09-21 09:13:40 +03:00
Raphael S. Carvalho
b9f67351da db: expose clustering filter info via collectd
That's needed to observe behavior of clustering filter, and to
check if it's worthwhile for a specific workload.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 11:32:23 -03:00
Duarte Nunes
f4cf2f2aef tracing: Make trace_state_ptr argument required
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Duarte Nunes
030db65c62 database: Accept a trace_state_ptr
This patch changes the database and column_family types so a
trace_state_ptr can be passed in when querying. This enables tracing
of the inner components.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:28 +02:00
Glauber Costa
0f413695ac database: make column family stats mutable
The make_reader method is currently a const method, but we would like to start
keeping hit statistics from it.

Instead of relaxing the const condition too much, we can just mark the _stats
field as mutable, indicating that make_reader will not be able to change
anything in the CF, except for keeping statistics.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:24 -04:00
Glauber Costa
5c4d73577a initialize sstables_per_read histogram with 35 instead of 90 buckets
This is to match what Cassandra does. Nodetool may be expecting this on the
other side.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:24 -04:00
Glauber Costa
4310635bae move estimated histogram to utils
Nothing sstable-specific in it, really.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:23 -04:00
Glauber Costa
ffc2131c51 decouple estimated_histogram from sstables
There is nothing really that fundamentally ties the estimated histogram to
sstables. This patch gets rid of the few incidental ties. They are:

 - the namespace name, which is now moved to utils. Users inside sstables/
   now need to add a namespace prefix, while the ones outside have to change
   it to the right one
 - sstables::merge, which has a very non-descriptive name to begin with, is
   changed to a more descriptive name that can live inside utils/
 - the disk_types.hh include has to be removed - but it had no reason to be
   here in the first place.

Todo, is to actually move the file outside sstables/. That is done in a separate
step for clarity.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:23 -04:00
Piotr Jastrzebski
3607d99269 Remove clustering_key_filtering_context.
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-08-30 20:31:55 +02:00
Piotr Jastrzebski
636a4acfd0 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-07-21 09:47:20 +02:00
Duarte Nunes
3518db531e database: Get non-system column_families
This patch adds an utility function that allows fetching the set of
column_families that do not belong to the system keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Duarte Nunes
4bc00c2055 database: Expose selection of sstables by a range
This patch allows a set of a column_family's sstables to be
selected according to a range of ring_positions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Avi Kivity
24e3026e32 Merge "compaction manager refactoring" from Raphael 2016-07-10 17:16:23 +03:00
Tomasz Grabiec
c0233c877d db: Avoid out-of-memory when flushing cannot keep up
memtable_list::seal_on_overlflow() is called on each mutation to check
if current memtable should be flushed. It will call
memtable_list::seal_active_memtable() when that is the case.

The number of concurrent seals is guarded by a semaphore, starting
from commit 0f64eb7e7d, and allows
at most 4 of them.

If there are 4 flushes already pending, every incoming mutation will
enqueue a new flush task on the semaphore's wait list, without waiting
for it. The wait queue can grow without bounds, eventually leading to
out-of-memory.

The fix is to seal the memtable immediately to satisfy should_flush()
condition, but limit concurrency of actual flushes. This way the wait
queue size on the semaphore is limited by memtables pending a flush,
which is fairly limited.

Message-Id: <1467997652-16513-1-git-send-email-tgrabiec@scylladb.com>
2016-07-10 10:53:51 +03:00
Raphael S. Carvalho
e38f66c6fe database: make certain column family functions const qualified
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:05:22 -03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming a special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complte. Also, in case the streaming fails all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
f2ae31711e streaming: inform CF when streaming fails
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4031c0ed8f streaming: pass plan_id to column family for apply and flush
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.

Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
51ec7a7285 db: wait for ongoing flushes at end of streaming
When flush_streaming_mutations() is called at the end of streaming it is
supposed to flush all data and then invalidate cache. ranges However, if
there are already some memtable flushes in progress it won't wait for them.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Glauber Costa
54ce6221a7 allow the dirty memory manager to be used without a database object
Some of our tests don't provide a database object to a CF. Create a default
dirty memory manager object that can be used without a database for them.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <872f8c9232ff87d788e271b1db86c814d7a75d9f.1467832713.git.glauber@scylladb.com>
2016-07-07 10:00:43 +01:00
Glauber Costa
b0932ceb04 database: act on LSA pressure notification
Issue 1195 describes a scenario with a fairly easy reproducer in which we can
freeze the database. That involves writing simultaneously to multiple CFs, such
that the sum of all the memory they are using is larger than the dirty memory
limit, without not any of them individually being larger than the memtable size.
Because we will never reach the individual memtable seal size for any of them,
none of them will initiate a flush leading the database to a halt.

The LSA has now gained infrastructure that allow us to be notified when pressure
conditions mount. What we will do in this case is initiate a flush ourselves.

Fixes #1195

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 17:46:28 -04:00
Glauber Costa
7169b727ea move system tables to its own region
In the spirit of what we are doing for the read semaphore, this patch moves
system writes to its own dirty memory manager. Not only will it make sure that
system tables will not be serialized by its own semaphore, but it will also put
system tables in its own region group.

Moving system tables to its own region group has the advantage that system
requests won't be waiting during throttle behind a potentially big queue of user
requests, since requests are tended to in FIFO order within the same region
group. However, system tables being more controlled and predictable, we can
actually go a step further and give them some extra reservation so they may not
necessarily block even if under pressure (up to 10 MB more).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 17:46:28 -04:00
Glauber Costa
c358947284 database: wrap semaphore and region group into a new dirty memory manager
We currently have a semaphore in the column family level that protects us against
multiple concurrent sstable flushes. However, storing that semaphore into the CF,
not the database, was a (implementation, not design) mistake.

One comment in particular makes it quite clear:

   // Ideally, we'd allow one memtable flush per shard (or per database object), and write-behind
   // would take care of the rest. But that still has issues, so we'll limit parallelism to some
   // number (4), that we will hopefully reduce to 1 when write behind works.

So I aimed for the shard, but ended up coding it into the CF because that's closer to the
flush point - my bad.

This patch fixes this while paving the way for active reclaim to take place. It wraps the semaphore
and the region group in a new structure, the dirty_memory_manager. The immediate benefit is that we
don't need to be passing both the semaphore and the region group downwards in the DB -> CF path. The
long term benefit is that we now have a one unified structure that can hold shared flush data in all
of the CFs.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:29:04 -04:00
Glauber Costa
0c31f3e626 database: move memtable throttler to the LSA throttler
The LSA infrastructure, through the use of its region groups, now have
a throttler mechanism built-in. This patch converts the current throttlers
so that the LSA throttler is used instead.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:05:19 -04:00