Commit Graph

10251 Commits

Author SHA1 Message Date
Avi Kivity
d0308ff488 Merge seastar upstream
* seastar 81df893...ab29b12 (1):
  > core: Fix bug in make_file_impl() which affects directory scanning
2016-08-17 21:57:03 +03:00
Amos Kong
9d53305475 systemd: have the first housekeeping check right after start
Issue: https://github.com/scylladb/scylla/issues/1594

Currently systemd runs the first housekeeping check at the end of the
first timer period. We expected it to run right after start.

This patch makes systemd consistent with upstart.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4cc880d509b0a7b283278122a70856e21e5f1649.1471433388.git.amos@scylladb.com>
2016-08-17 16:02:00 +03:00
Avi Kivity
0033197ba9 Merge seastar upstream
* seastar 823a404...81df893 (3):
  > memory: Do not increase g_allocs on failure in allocate and allocate_aligned
  > memory: Balance the g_frees and g_allocs
  > Merge "thread: explicitly yield on get()" from Glauber

Fixes #1586.
2016-08-17 13:28:30 +03:00
Avi Kivity
4871b19337 Merge "Fixes for streamed_mutation_from_mutation" from Paweł
"This series contains fixes for two memory leaks in
streamed_mutation_from_mutation.

Fixes #1557."
2016-08-17 13:24:22 +03:00
Avi Kivity
e7eb76fc58 Introduce stdx.hh header file
So we don't have to create an stdx = std::experimental alias everywhere.
Message-Id: <1471417039-21391-1-git-send-email-avi@scylladb.com>
2016-08-17 11:19:49 +01:00
Paweł Dziepak
148e9c5608 streamed_mutation_from_mutation: fix destroying bi::sets
Once unlink_leftmost_without_rebalance() has been called on a bi::set no
other method can be used. This includes clear_and_dispose() used by the
mutation_partition destructor.

We like unlink_leftmost_without_rebalance() because it is efficient, so
the solution is to manually finish destroying clustering row and range
tombstone sets in the reader destructor using that function.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-17 11:03:59 +01:00
Paweł Dziepak
fe9575d01d streamed_mutation_from_mutation: fix leak on allocation failure
The mutation_fragment() constructor allocates memory. If it fails, the
already unlinked parts of the mutation (either rows_entry or
range_tombstone) will be leaked.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-17 11:02:24 +01:00
Benoit Canet
90ef150ee9 systemd: Remove WorkingDirectory directive
The WorkingDirectory directive does not support environment variables in
the systemd version shipped with Ubuntu 16.04. Fortunately, not setting
WorkingDirectory implicitly sets it to the user's home directory, which
is the same thing (i.e. /var/lib/scylla).

Fixes #1319

Signed-off-by: Benoit Canet <benoit@scylladb.com>
Message-Id: <1470053876-1019-1-git-send-email-benoit@scylladb.com>
2016-08-17 12:34:11 +03:00
Raphael S. Carvalho
108fd1fade database: close file in lister
After listing is done, close the file. This fixes no bug; it is only
an improvement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2f52d297bcf6a6b6e3429912c28f17e6b37f8842.1471381607.git.raphaelsc@scylladb.com>
2016-08-17 11:01:44 +03:00
Glauber Costa
b361dee488 database: memtables pending flushes tell us nothing
We have two counters that track how many memtable flushes are in progress and
how much memory they are pinning.

The problem is that after we revamped the code to limit the number of flushes
in progress, those counters became useless: as they live inside the section
guarded by the semaphore, they are only incremented once we have passed the
semaphore.

One wouldn't notice when working with CPU-bound problems, where memtables don't
pile up. But as soon as they do, those counters will always show the same numbers:
the depth of the semaphore, which doesn't mean much. The problem is poised to
become much worse: once we enable write behind in full and set the semaphore's
depth to one, that's the number we'll see here all the time.

The fix is to move the counters outside the semaphore, which restores their
old semantics.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c5ae6903e170f3f356cdda7ed78a4c9ba8d5f024.1471370504.git.glauber@scylladb.com>
2016-08-17 10:54:15 +03:00
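The effect described in commit b361dee488 above can be reproduced with a small sketch. This is not Scylla code: it uses Python's asyncio in place of Seastar, and the names `run`, `flush`, `pending` and `count_outside` are hypothetical. A counter incremented only after the semaphore is acquired can never exceed the semaphore depth, while a counter incremented before the acquire reflects the real backlog.

```python
import asyncio


async def run(flushes: int, depth: int, count_outside: bool) -> int:
    """Start `flushes` concurrent flushes and return the peak counter value."""
    sem = asyncio.Semaphore(depth)
    pending = 0
    peak = 0

    async def flush() -> None:
        nonlocal pending, peak
        if count_outside:
            # Counted as soon as the flush is requested.
            pending += 1
            peak = max(peak, pending)
        async with sem:
            if not count_outside:
                # Counted only once the semaphore has been passed.
                pending += 1
                peak = max(peak, pending)
            await asyncio.sleep(0)  # stand-in for the actual flush I/O
            if not count_outside:
                pending -= 1
        if count_outside:
            pending -= 1

    await asyncio.gather(*(flush() for _ in range(flushes)))
    return peak


peak_inside = asyncio.run(run(flushes=10, depth=2, count_outside=False))
peak_outside = asyncio.run(run(flushes=10, depth=2, count_outside=True))
print(peak_inside, peak_outside)
```

With ten flushes and a depth of two, the inside counter is capped at the depth, whereas the outside counter sees all ten queued flushes.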
Piotr Jastrzebski
bb0c4c3c40 Fix compilation errors
The query::range parameter in mutation_partition::range
has to be changed to nonwrapping_range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <36e444bfe90586f8d3b08ca36d8dc13d5898ef97.1471347402.git.piotr@scylladb.com>
2016-08-16 12:49:54 +01:00
Avi Kivity
bf02ca831d Merge "Tracing: change a back pressure scheme" from Vlad
"This series changes the tracing back pressure scheme from limiting the number of traces in
a single session by a fixed number to having a per-shard budget consumed by all active tracing
sessions.

It was really easy to cause traces to be dropped even if there weren't too many
active traces: e.g. if a single active session created more traces
than the per-session limit (30), the traces above the 30th were dropped. Namely,
traces were dropped when there were only 30 active traces, which is ridiculous.

This series introduces two main changes:
   - Changes the records budgeting from per-session to per-shard. This substantially
     increases the number of active records after which new records are dropped.
   - Introduces a flow in which events' records are written BEFORE the corresponding tracing
     session is over (right now traces are written to the I/O back end only when the session object
     is destroyed).

The latter is meant to virtually eliminate trace drops in normal situations.
Of course, if a back end is slow or if there are a lot of small sessions that do not complete we would still have
to drop new sessions/records in order to avoid uncontrolled growth of the memory footprint of Tracing.

If we see the latter case happening a lot in the future we may add lowres timers to each session that would
commit the cached records for writing every X time. But let's not try to optimize something that we
are not completely sure has to be optimized... "
2016-08-16 12:21:02 +03:00
Amnon Heiman
0706db9387 API: use the estimated sum when converting histogram to json
The function that converts a histogram to the json histogram object needs to
use estimated_sum to get the actual sum and not the sampled sum.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467547341-30438-3-git-send-email-amnon@scylladb.com>
2016-08-16 11:06:51 +03:00
Amnon Heiman
4c14b2a527 histogram: Add an estimated sum method
The histogram implementation uses sampling to estimate the mean and sum.
This patch adds a method that returns an estimated sum based on the mean
and the total number of events measured.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467547341-30438-2-git-send-email-amnon@scylladb.com>
2016-08-16 11:06:50 +03:00
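The estimate described in commit 4c14b2a527 above (mean of the samples multiplied by the total number of events) can be sketched as follows. This is an illustration only: the class `SampledHistogram`, its `sample_every` ratio and its method names are assumptions, not the Scylla histogram API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampledHistogram:
    """Hypothetical histogram that counts every event but samples only some."""
    sample_every: int = 4  # assumed sampling ratio, for illustration only
    count: int = 0         # total number of events measured
    samples: List[float] = field(default_factory=list)

    def mark(self, value: float) -> None:
        # Every event is counted, but only every Nth value is retained.
        if self.count % self.sample_every == 0:
            self.samples.append(value)
        self.count += 1

    def sampled_sum(self) -> float:
        # Sum over the retained samples only; it understates the real total.
        return sum(self.samples)

    def mean(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def estimated_sum(self) -> float:
        # Estimate of the full total: sample mean times the number of events.
        return self.mean() * self.count


h = SampledHistogram()
for _ in range(8):
    h.mark(10.0)
print(h.sampled_sum(), h.estimated_sum())
```

Here eight events of value 10 are recorded but only two are sampled, so the sampled sum reflects two events while the estimated sum scales the mean back up to all eight.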
Pekka Enberg
fde8677f1a cql3/query_processor: Clean up code formatting
Currently, query_processor.cc code formatting is all over the place,
which makes the file hard to read. Apply some formatting magic to make
it prettier.

Message-Id: <1470832486-26020-2-git-send-email-penberg@scylladb.com>
2016-08-16 10:39:15 +03:00
Pekka Enberg
ce07822f49 cql3/query_processor: Use type deduction to make code more readable
Use the 'auto' specifier for variables and lambda parameters to make the
code more readable.

Message-Id: <1470832486-26020-1-git-send-email-penberg@scylladb.com>
2016-08-16 10:39:11 +03:00
Avi Kivity
b1f9688432 Merge "range: Add nonwrapping_range" from Duarte
"Ranges that wrap around are a source of complexity and bugs. This patchset
adds a nonwrapping_range class, which specifies the range can't wrap around.
It is the user of the nonwrapping_range that is required to enforce this
constraint.

The idea is to incrementally disallow ranges that wrap around. We do it
for query::clustering_range in this patchset, and it can be done similarly
for other ranges. This moves the burden of unwrapping ranges to the edges.

Fixes #1544"
2016-08-16 10:08:24 +03:00
Duarte Nunes
5161ea283f query: query::clustering_range can't wrap around
This patch changes the type of query::clustering_range to express that
ranges that wrap around are not allowed, and ranges that have the
start bound after the end bound are considered empty.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:50:20 +00:00
Duarte Nunes
3275fabe53 storage_proxy: Short circuit query without clustering ranges
This patch makes the storage_proxy return an empty result when the
query doesn't define any clustering ranges (default or specific).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
56f10abce3 thrift: Don't always validate clustering range
This patch makes make_clustering_range not enforce that the range be
non-wrapping, so that it can be validated differently if needed. A
make_clustering_range_and_validate function is introduced that keeps
the old behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
be4adf212a nonwrapping_range: Add unit tests
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
bb16e194bc range: Add nonwrapping_range class
This patch introduces the nonwrapping_range class. This class is
intended to be used by code that requires non wrapping ranges.
Internally, it uses a wrapping_range. Users are responsible for
ensuring the bounds are correct when creating a nonwrapping_range.

The path proposed here is to incrementally replace usages of
wrapping_range/range by nonwrapping_range, pushing usages of wrapping
ranges as further to the edges as possible.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
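The idea in commit bb16e194bc above, pushing the handling of wrap-around to the edges so that inner code only sees non-wrapping ranges, can be sketched as below. This is a simplified model, not Scylla's classes: it assumes closed integer intervals on a small ring, and `WrappingRange`, `NonwrappingRange` and `unwrap` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WrappingRange:
    """Closed interval [start, end] on a ring; start > end means it wraps."""
    start: int
    end: int

    def is_wrap_around(self) -> bool:
        return self.start > self.end


@dataclass(frozen=True)
class NonwrappingRange:
    """Closed interval [start, end]; start > end is treated as empty."""
    start: int
    end: int

    def is_empty(self) -> bool:
        return self.start > self.end

    def contains(self, value: int) -> bool:
        return self.start <= value <= self.end


def unwrap(r: WrappingRange, ring_min: int, ring_max: int) -> List[NonwrappingRange]:
    # Done once at the edge, so inner code never has to think about wrapping.
    if not r.is_wrap_around():
        return [NonwrappingRange(r.start, r.end)]
    return [NonwrappingRange(ring_min, r.end), NonwrappingRange(r.start, ring_max)]


parts = unwrap(WrappingRange(8, 2), ring_min=0, ring_max=10)
print(parts)
```

The caller that accepts a possibly wrapping range splits it once, and everything downstream can rely on the simpler `start <= end` invariant.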
Duarte Nunes
2bb428973a range: Rename to wrapping_range
This patch renames range to wrapping_range in preparation for adding a
new range type, nonwrapping_range.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
2c0b049176 clustering_key_filter: Don't forward declare range
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Paweł Dziepak
5cae44114f partition_version: handle errors during version merge
Currently, partition snapshot destructor can throw which is a big no-no.
The solution is to ignore the exception and leave versions unmerged and
hope that subsequent reads will succeed at merging.

However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.

This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.

The actual merging and handling of potential errors is done from
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails it gives up and leaves them
unmerged.

Fixes #1578
Fixes #1579.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
2016-08-15 15:56:53 +03:00
Asias He
ef782f0335 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everything is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
2016-08-15 12:32:00 +03:00
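One plausible reading of the 1.0 value in commit ef782f0335 above is that the tool displays the counter as a per-second rate. Under that assumption (the commit does not state how the value is derived), the rate between two samples would be computed as in this sketch; `heart_beat_rate` is a hypothetical helper, not part of scyllatop.

```python
def heart_beat_rate(prev_version: int, prev_time: float,
                    cur_version: int, cur_time: float) -> float:
    """Version increments per second between two samples."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (cur_version - prev_version) / elapsed


# Ten increments over ten seconds: the gossiper keeps up.
healthy = heart_beat_rate(100, 0.0, 110, 10.0)
# Five increments over ten seconds: heart beats are lagging.
lagging = heart_beat_rate(100, 0.0, 105, 10.0)
print(healthy, lagging)
```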
Nadav Har'El
0d00da7f7f sstables: don't forget to read static row
[v2: fix check for static column (don't check if the schema is not compound)
 and move want-static-columns flag inside the filtering context to avoid
 changing all the callers.]

When a CQL request asks to read only a range of clustering keys inside
a partition, we actually need to read not just these clustering rows, but
also the static columns and add them to the response (as explained by Tomek
in issue #1568).

With the current code, that CQL request is translated into an
sstable::read_row() with a clustering-key filter. But this currently
only reads the requested clustering keys - NOT the static columns.

We don't want sstable::read_row() to unconditionally read the static
columns from disk because if, for example, they are already cached, we
might not want to read them from disk. We don't have such partial-partition
cache yet, but we are likely to have one in the future.

This patch adds to the clustering key filter object a flag saying whether we
need to read the static columns (actually, it's a function returning this
flag per partition, to match the API for the clustering-key filtering).

When sstable::read_row() sees the flag for this partition is true, it also
requests to read the static columns.
Currently, the code always passes "true" for this flag - because we don't
have the logic to cache partially-read partitions.

The current find_disk_ranges() code does not yet support returning a non-
contiguous byte range, so this patch, if it notices that this partition
really has static columns in addition to the range it needs to read,
falls back to reading the entire partition. This is a correct solution
(and fixes #1568) but not the most efficient solution. Because static
columns are relatively rare, let's start with this solution (correct
but less efficient when there are static columns); providing
non-contiguous reading support is left as a FIXME.

Fixes #1568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>
2016-08-15 12:30:19 +03:00
Avi Kivity
4fcebd4ca6 random_partitioner: fix overflow in shard_of()
uint128_t will overflow if smp::count > 2.  Replace with a larger type.

Message-Id: <1471188765-30142-1-git-send-email-avi@scylladb.com>
2016-08-15 09:41:54 +03:00
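The overflow in commit 4fcebd4ca6 above can be illustrated with a sketch. The formula below is an assumption made for illustration (tokens in [0, 2**127), shard chosen by scaling the token by the shard count); the commit does not show the actual computation. A 128-bit product is emulated with a mask, and Python's arbitrary-precision integers stand in for the larger type.

```python
MASK_128 = (1 << 128) - 1
TOKEN_BITS = 127  # assumption: tokens lie in [0, 2**127)


def shard_of_128bit(token: int, shards: int) -> int:
    # Emulates a 128-bit product: bits above bit 127 are silently discarded.
    product = (token * shards) & MASK_128
    return product >> TOKEN_BITS


def shard_of_wide(token: int, shards: int) -> int:
    # Python integers are arbitrary precision, standing in for a wider type.
    return (token * shards) >> TOKEN_BITS


max_token = (1 << TOKEN_BITS) - 1
for shards in (2, 3):
    print(shards, shard_of_128bit(max_token, shards), shard_of_wide(max_token, shards))
```

With two shards the product still fits in 128 bits and both functions agree; with three shards the truncated product maps the largest token to the wrong shard.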
Amnon Heiman
612f677283 scylla.spec: conditionally include the housekeeping.cfg in the conf package
When the housekeeping configuration name was changed from conf to cfg it
was no longer included as part of the conf rpm.

This change adds a macro that determines whether the file should be
included and uses that macro to conditionally add the
configuration file to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1471169042-19099-1-git-send-email-amnon@scylladb.com>
2016-08-14 13:25:59 +03:00
Tomasz Grabiec
1b2ea14d0e partition_version: Add missing linearization context
Snapshot removal merges partitions, and cell merging must be done
inside linearization context.

Fixes #1574

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471010625-18019-1-git-send-email-tgrabiec@scylladb.com>
2016-08-12 17:55:23 +03:00
Piotr Jastrzebski
f212a6cfcb Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
2016-08-12 16:34:43 +02:00
Duarte Nunes
918a2939ff docker: If set, broadcast address is seed
This patch configures the broadcast address to be the seed if it is
configured, otherwise Scylla complains about it and aborts.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470863058-1011-1-git-send-email-duarte@scylladb.com>
2016-08-12 11:46:50 +03:00
Avi Kivity
21392cf5fd Merge seastar upstream
* seastar 7fd8d49...823a404 (1):
  > io_priority_class: remove non-explicit operator unsigned
2016-08-11 17:20:23 +03:00
Avi Kivity
65aa9135a1 Merge seastar upstream
* seastar 59613e7...7fd8d49 (1):
  > reactor: Do not test for poll mode default
2016-08-11 14:46:45 +03:00
Amnon Heiman
5a4fc9c503 scylla-housekeeping: rename configuration file from conf to cfg
Files with a conf extension are run by the scylla_prepare on the AMI.
The scylla-housekeeping configuration file is not a bash script and
should not be run.

This patch changes its extension to cfg which is more python like.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470896759-22651-2-git-send-email-amnon@scylladb.com>
2016-08-11 14:44:56 +03:00
Tomasz Grabiec
f1c2481040 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing a partition which generated no index
entry, because block_first_colname was cleared only when there was any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause a parsing error when such an sstable
is read through the promoted index entry because the offset is assumed to
point to a cell, not to the partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
2016-08-11 13:08:48 +03:00
Amnon Heiman
a24941cc5f build_deb: Add dist flag
The dist flag marks the debian package as a distributed package.
As such the housekeeping configuration file will be included in the
package and will not need to be created by the scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
2016-08-11 12:25:07 +03:00
Pekka Enberg
d1a052237d dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>
2016-08-11 11:38:03 +03:00
Nadav Har'El
7409688356 README.md: add another required package
I tried to compile scylladb on a new Fedora 24 system, and the "-lsystemd"
library was missing during link. We need the systemd-devel package for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470865063-12871-1-git-send-email-nyh@scylladb.com>
2016-08-11 10:21:21 +02:00
Avi Kivity
42d8701121 Merge seastar upstream
* seastar 64ae228...59613e7 (20):
  > reactor: fix I/O queue pending requests collectd metric
  > simple_input_stream: introduce copy_to() member function
  > scollectd: Fix merge skew between "disabled" and "descriped" metrics patches
  > Merge "Write-behind for XFS"
  > Merge "Fix the SMP queue poller" from Tomasz
  > Merge "collectd syntactical sugar & descriptions" from Calle
  > Fix build failure introduced by 5b1051ce0de5b772f22444abe6a1d97076b49b1f
  > Add --abort-on-seastar-bad-alloc option
  > Merge "Make arp comply with C++ strict aliasing rules"
  > doc: use install-dependencies.sh on docs
  > add execute bit on install-dependencies.sh
  > core/reactor: Fix use-after-free on io_event's promise
  > tcp: write option length correctly
  > tcp: make tcp options comply with strict aliasing rules
  > tcp: comply with strict aliasing rules
  > add libdl to library list
  > reactor: add exception counter
  > install-dependencies.sh: remove unnecessary sudo
  > install-dependencies.sh: install add-apt-repository when it's not installed
  > install-dependencies.sh: add protobuf to dependencies, for newly added prometheus API support

Fixes #1558.
2016-08-10 15:14:31 +03:00
Tomasz Grabiec
d7f8ce7722 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.
2016-08-10 10:43:35 +02:00
Pekka Enberg
6a5ab6bff4 dist/docker: Add '--smp', '--memory', and '--overprovisioned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
2016-08-10 11:34:08 +03:00
Raphael S. Carvalho
8deb1ca19d tests: add test to check sstables's min and max clustering values
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:54:40 -03:00
Raphael S. Carvalho
ef6ddf2398 sstables: fix tracking of min and max clustering components
Scylla was tracking min and max column names instead. Min and max
clustering components are tracked to optimize reads that use a
clustering filter. For more details:
https://issues.apache.org/jira/browse/CASSANDRA-5514

Also fix potential bug if clustering value is empty.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:01:30 -03:00
Vlad Zolotarov
5deec0e327 tracing::write_complete(): improve a message in case of a logic error
Improve a message if there is a logic error and add logging of such
errors.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
Vlad Zolotarov
67d537ecb5 tracing: issue a write event if a single session creates a lot of events
Currently write events are issued every time a trace session is closed.
However if a single session creates a lot of events we will start dropping them
after the total number of pending records exceeds the limit.

This patch will issue a write event before the session end in that case.

Since now new events may be added to the active tracing session while it's
scheduled for write we have to ensure the following:
   - Not to add the already pending for write session to the pending bulk.
   - Grab all pending data in a specific session in a synchronous way during
     the write event.
   - Serialize creation of events mutations - otherwise the "monotonic nanos"
     logic won't work.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
Vlad Zolotarov
5391bcc5a9 tracing: improve a back pressure policy
Use a per-shard tracing records budget instead
of maintaining a fixed-size per-session records budget and
a per-shard sessions budget.

The original policy could lead to some irrational situations,
when we have a single tracing session that creates a substantial
amount of records that we can handle but we would start dropping
new records after it surpasses the per-session limit.

The new policy handles a per-shard trace records budget that is
being consumed by each trace() call and by a primary session destructor
when a session record is created.

Each active record may only be in one of the following states:
   - cached: stored in its session's object. When record is in this state
             it's not going to be written to I/O during the next write event.
   - pending for write: when record is in this state it's going to be written
             to I/O during the next write event.
   - flushing: the record is being currently written to the I/O.

There are counters of the total amount of records in each state above.

Each record may only be in a specific state at every point of time and
thereby it must be accounted only in one and only one of the three
counters.

The sum of all three counters should not be greater than
(max_pending_trace_records + write_event_records_threshold) at any time
(actually it can get as high as a value above plus (max_pending_sessions)
if all sessions are primary but we won't take this into an account for
simplicity).

The same is about the number of outstanding sessions: it may not be greater
than (max_pending_sessions + write_event_sessions_threshold) at any time.

If the total number of tracing records is greater than or equal to the limit
above, the new trace point is going to be dropped.

If the current number of records plus the expected number of trace records
per session (exp_trace_events_per_session) is greater than the limit
above, new sessions will be dropped. A new session will also be dropped if
there are too many active sessions.

When the record or a session is dropped the appropriate statistics
counters are updated and there is a rate-limited warning message printed
to the log.

Every time a number of records pending for write is greater or equal to
(write_event_records_threshold) or a number of sessions pending for
write is greater or equal to (write_event_sessions_threshold) a write
event is issued.

Every 2 seconds a timer would write all pending for write records
available so far.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
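The record accounting described in commit 5391bcc5a9 above can be sketched as a small model. It is not the Scylla implementation: the class `ShardTraceBudget`, its method names and the numeric limits are made up for illustration, and session budgeting and the periodic timer are left out. Only the three record states and the limit of (max_pending_trace_records + write_event_records_threshold) are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ShardTraceBudget:
    """Per-shard accounting of trace records; all limits are made-up values."""
    max_pending_trace_records: int = 8
    write_event_records_threshold: int = 4
    cached: int = 0    # held in a session object, not yet queued for write
    pending: int = 0   # queued for the next write event
    flushing: int = 0  # currently being written to I/O
    dropped: int = 0

    def limit(self) -> int:
        return self.max_pending_trace_records + self.write_event_records_threshold

    def total(self) -> int:
        # Each record is accounted in exactly one of the three counters.
        return self.cached + self.pending + self.flushing

    def trace(self) -> bool:
        # A new record consumes budget shared by every session on the shard.
        if self.total() >= self.limit():
            self.dropped += 1
            return False
        self.cached += 1
        return True

    def session_done(self, records: int) -> bool:
        # Session records move from "cached" to "pending for write".
        self.cached -= records
        self.pending += records
        # True means a write event should be issued now.
        return self.pending >= self.write_event_records_threshold

    def start_write(self) -> int:
        n, self.pending = self.pending, 0
        self.flushing += n
        return n

    def write_complete(self, records: int) -> None:
        # Budget is released only once the records reach the back end.
        self.flushing -= records


budget = ShardTraceBudget()
accepted = sum(budget.trace() for _ in range(15))
print(accepted, budget.dropped)
```

In this model a single busy session can use the whole shard budget, and records are dropped only when the shard as a whole is over its limit.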
Vlad Zolotarov
d8fe5317d1 tracing::trace_keyspace_helper: make events' mutations applying loop interruptible
When building events' mutation don't apply them in a tight loop
but rather apply each of them in a separate continuation to allow
reactor to interrupt this loop if it takes too long for it to
complete (e.g. where there are a lot of mutations to apply).

Since building all events' mutations is asynchronous now we can
no longer keep the "nanos" state in a global trace_keyspace_helper
object but rather have to move it into the per-session
backend_session_state class.

backend_session_state class is a backend-specific implementation of a
tracing::backend_session_state_base class.

An instance of the above object is created by a
tracing::i_tracing_backend_helper::allocate_session_state() virtual
method and is stored in a tracing::one_session_records object.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:39 +03:00
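The interruptible loop from commit d8fe5317d1 above can be sketched with Python's asyncio standing in for Seastar continuations. The function names `apply_interruptible`, `demo` and `other_work` are hypothetical, and `asyncio.sleep(0)` is used here only as a way to hand control back to the event loop between mutations.

```python
import asyncio
from typing import Callable, Iterable, List


async def apply_interruptible(mutations: Iterable[str],
                              apply: Callable[[str], None]) -> None:
    for m in mutations:
        apply(m)
        # Yield to the event loop so other tasks can run between mutations.
        await asyncio.sleep(0)


async def demo() -> List[str]:
    log: List[str] = []

    async def other_work() -> None:
        for _ in range(2):
            log.append("other")
            await asyncio.sleep(0)

    await asyncio.gather(apply_interruptible(["m1", "m2", "m3"], log.append),
                         other_work())
    return log


log = asyncio.run(demo())
print(log)
```

Because the loop yields after each mutation, unrelated work gets a chance to run before the whole batch has been applied, which a tight loop would not allow.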
Vlad Zolotarov
63a0502ed1 tracing: rework the interface between the tracing/trace_state and the backend
Before this patch the interaction between the layers above was as follows:
   - trace_state was passing the trace event data to a backend object every
     time trace() method was called.
   - trace_state was passing the session data to a backend object in a destructor.
   - A backend object was storing this data in a form of lambda where all data
     above was caught in a capture list. This was primarily done in order to
     delay the call for make_xxx_mutation(). Lambdas were stored in a map by a session
     ID and they were executed when a kick() method was called.
   - A tracing::tracing object was periodically calling a kick() method of a
     backend that was initiating a write of all pending data to the storage.

All backend methods used in the interactions described above were virtual.
Thereby, for instance, for each and every trace record we were calling a virtual method that was
receiving a significant number of parameters, storing a lambda in a map and returning.
This is clearly a suboptimal way of using virtual functions since we prevent the compiler
from inlining obviously inlinable operations.

This patch changes the interaction scheme to be as follows:
   - Trace events and session data are stored and passed around in a form of structs
     that hold all relevant information (no more lambdas).
   - As long as a trace session is active its data is aggregated inside the corresponding
     trace_state object.
   - The object containing all records is passed and stored as a lw_shared_ptr to save extra
     copies and to shorten capture lists.
   - All aggregated data is passed to a tracing::tracing object in a trace_state destructor.
     The data is stored in a std::deque in a tracing::tracing object (instead of a map by a session ID).
   - A single backend's virtual method call writes all data aggregated so far (kick()
     method is not needed any more), every time a write event occurs.
   - Backend has only one virtual method now:
      - Write a bulk of sessions' data aggregated so far.
   - Backend's virtual method receives a records bulk object by reference.

As a result:
   - A latency of a single trace event that has no formatting improved from 0.2us to 0.1us.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 15:25:52 +03:00
Vlad Zolotarov
960b423ce0 tracing/tracing.cc: rename a logger object
s/logger/tracing_logger/

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 15:21:47 +03:00