Currently the commitlog_entry_writer constructor calculates the serialized size
before it knows whether a schema should be included in the entry. The
result is never used, since it is recalculated when schema information is
supplied. This patch removes the needless calculation.
Message-Id: <20170614114607.GA21915@scylladb.com>
Currently, each time a UDT or tuple is parsed, a new object is created. If
those objects are repeatedly used to create container types, this causes a
memory leak: container types are interned, but the lookup in the
cache is done using a pointer to the contained type (which is always
different for UDTs and tuples). This patch also interns UDTs and tuples,
so each time the same type is parsed the same pointer is returned.
Refs #2469
Fixes #2487
Message-Id: <20170612142942.GO21915@scylladb.com>
If we do two truncates in a row, the second will have neither memtable
nor sstable data. Thus we will not write/remove sstables, and thus
get no resulting truncation replay position.
Message-Id: <1497378469-6063-1-git-send-email-calle@scylladb.com>
end_bound() returns a temporary object (end_bound_ref), so it cannot be
taken by reference here and used later. Copy it instead.
Message-Id: <20170612132328.GJ21915@scylladb.com>
Apparently some GDB versions (7.11.1-86.fc24) don't parse double '>'
in a type name, so this:
std::pair<utils::UUID const, seastar::lw_shared_ptr<column_family>>
should be this:
std::pair<utils::UUID const, seastar::lw_shared_ptr<column_family> >
Message-Id: <1497256644-4335-1-git-send-email-tgrabiec@scylladb.com>
The problem is that 'key' is a 'bytes' object now, which doesn't have __format__.
Fixes the following error:
Traceback (most recent call last):
File "~/src/scylla/scylla-gdb.py", line 184, in invoke
TypeError: non-empty format string passed to object.__format__
Error occurred in Python command: non-empty format string passed to object.__format__
Message-Id: <1497253433-374-2-git-send-email-tgrabiec@scylladb.com>
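The failure can be reproduced outside GDB. The sketch below is only an illustration, not the change made to scylla-gdb.py: `format_key` and `format_key_broken` are hypothetical helpers, and decoding as UTF-8 is an assumption about how the key should be displayed.

```python
def format_key_broken(key):
    # Passing a format spec to a bytes object raises TypeError in
    # Python 3, because bytes does not implement format specs.
    return "{:>8}".format(key)


def format_key(key):
    """Render a key right-aligned in 8 columns; accepts str or bytes."""
    if isinstance(key, bytes):
        # Convert to text first so that the format spec applies to a str.
        key = key.decode("utf-8", errors="replace")
    return "{:>8}".format(key)
```

Calling `format_key_broken(b"abc")` raises a `TypeError`, while `format_key(b"abc")` and `format_key("abc")` both return the padded string.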
"This series switches repair to use more stream plans to stream the mismatched
sub ranges and to use a range generator to produce sub ranges.
Tests show that repair of a large data set no longer uses a huge amount of memory.
In addition, the log now reports progress: how many ranges have been processed.
Jun 06 14:18:22 [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]
Jun 06 14:19:55 [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]
Fixes #2430."
* tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev:
repair: Remove unused sub_ranges_max
repair: Reduce parallelism in repair_ranges
repair: Tweak the log a bit
repair: Use more stream_plan
repair: iterator over subranges instead of list
From seastar-dev.git calle/concorde
Normally, we require that all mutations applied to a column family
have replay positions higher than all previously flushed.
The main reason for this is to be able to determine when to drop a
commit log segment, i.e. determine that all replay positions less
than X are now in sstables.
This patch series, small as it is, relaxes this: instead of just
keeping track of the highest RP applied, we keep a reference count for each
segment per CF in memtables and, on flush, release that count.
The only case where we need to keep a water mark for RP is then
for table truncation, for which we simply say that the highest RP
applied to the column family is the lowest allowed henceforth,
and use the old reordering logic for this instead. I.e. very rare.
There is of course one (big?) downside to all this, and this is
"normal" commit log replay on startup after crash/shutdown.
Since we relax RP ordering, we cannot use RPs in sstables as
low marks for replay start, since the commitlog may now contain
non-persisted mutations with lower RPs than those
previously flushed. I.e. we more or less always have to replay
the full commit log.
It is worth noting though that due to compaction and the non-
propagation of RP marks to new sstables, we end up often
doing this anyway, so it is hard to say how much of a regression
this is.
truncate
With the commitlog keeping a use-count per CF id, we can ease the ordering
restriction on replay positions. Previously we required that all
added mutations have a position > previously flushed. However, if
we accept that replay must now cover all data, then by instead keeping
track per CF of the highest RP ever entered, we can just set a
low mark on truncation, since this is the only remaining hard
RP divider.
Use per CF-id reference count instead, and use handles as result of
add operations. These must either be explicitly released or stored
(rp_set), or they will release the corresponding replay_position
upon destruction.
Note: this does _not_ remove the replay positioning ordering requirement
for mutations. It just removes it as a means to track segment liveness.
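The handle semantics can be sketched as follows. This is an illustration under assumptions, not the commitlog implementation: `SegmentTracker`, `RpHandle`, `drop` and `is_live` are hypothetical names, a plain list stands in for rp_set, and a context manager stands in for C++ destruction.

```python
from collections import defaultdict


class SegmentTracker:
    """Tracks, per segment and per CF id, how many entries are unflushed."""

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def add(self, segment_id, cf_id):
        # Every add takes one reference and hands back a handle for it.
        self._counts[segment_id][cf_id] += 1
        return RpHandle(self, segment_id, cf_id)

    def drop(self, segment_id, cf_id):
        # Give back one reference, e.g. when a memtable is flushed.
        per_cf = self._counts[segment_id]
        per_cf[cf_id] -= 1
        if per_cf[cf_id] <= 0:
            del per_cf[cf_id]
        if not per_cf:
            del self._counts[segment_id]

    def is_live(self, segment_id):
        # A segment may be deleted only when no CF references it.
        return segment_id in self._counts


class RpHandle:
    """Owns one reference until it is released or closed."""

    def __init__(self, tracker, segment_id, cf_id):
        self._tracker = tracker
        self._pos = (segment_id, cf_id)

    def release(self):
        # Hand ownership to the caller, who must store the position
        # and drop the reference later.
        pos, self._pos = self._pos, None
        return pos

    def close(self):
        # A handle that was never released gives its reference back.
        if self._pos is not None:
            self._tracker.drop(*self._pos)
            self._pos = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

A handle that goes out of scope without being released frees its reference at once, while a released position keeps the segment alive until the caller drops it on flush.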
The test should
a.) Wait for the flush semaphore
b.) Only compare segment sets between start and end, not start,
end and in between. I.e. the test sort of assumed we started
with < 2 (or so) segments. Not always the case (timing)
Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com>
We currently repair all the ranges in parallel.
1) All the ranges contend for parallelism_semaphore. Instead of
processing multiple ranges in parallel and calculating the sub ranges
(which takes memory) for each range in parallel, we can handle the ranges
one by one.
We can still get enough parallelism because the checksums are calculated on
all the shards.
2) If the repair fails for some reason and we handle ranges one by one, we
can log which ranges were repaired successfully, so next time we can skip
them. If we start all ranges in parallel, there is a high chance that no
single range is completed, because all the ranges are in progress.
Refs #1912
- Count n out of m ranges the repair is running for (kind of progress report)
- Make the 'Found differing range' log debug because there can be millions
of such entries
- Print the failed ranges
In the very beginning, we used a stream_plan for each checksum range.
Later, we changed to use a single stream_plan for all the checksum
ranges. This puts memory pressure on streaming, e.g., millions of ranges
in a vector to send over RPC.
To fix this, we do checksum and streaming in parallel and limit the number of
checksum ranges stored in memory.
Fixes #2430
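The memory bound can be illustrated with a small sketch. It is a sequential simplification (the actual repair code runs checksumming and streaming concurrently), and `repair_range`, `checksums_match`, `stream` and `max_pending` are hypothetical names, not the repair API.

```python
def repair_range(subranges, checksums_match, stream, max_pending=4):
    """Stream mismatched subranges in batches of at most max_pending."""
    pending = []
    for subrange in subranges:
        if not checksums_match(subrange):
            pending.append(subrange)
        if len(pending) >= max_pending:
            # Hand the batch to its own stream plan and forget it, so
            # the list never holds more than max_pending ranges.
            stream(pending)
            pending = []
    if pending:
        # Stream whatever is left over after the last full batch.
        stream(pending)
```

For example, with `max_pending=2` and five mismatched subranges, `stream` is called three times, with two, two and one subranges.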
When starting repair, we divided the large token ranges (vnodes) into small
subranges of a desired length (around 100 partitions), and built a huge list
of those subranges - to iterate over them later and compare checksums of
those chunks.
However, building this list up-front is completely unnecessary, and wastes
a lot of memory: In a test with 1 TB of data, as much as 3 gigabytes was
spent on this list. Instead, what we do in this patch is to find the next
chunk in a DFS-like splitting algorithm, using only the token range
midpoint() function (as before). The amount of memory needed for this is
O(logN), instead of O(N) in the previous implementation.
Refs #2430.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
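A minimal sketch of such a DFS-like splitter is shown below. It is not the patch itself: tokens are plain integers with no wrap-around, and `split_subranges` and the `estimate_partitions` callback are hypothetical names introduced for illustration.

```python
def split_subranges(lo, hi, estimate_partitions, target=100):
    """Lazily yield (lo, hi] subranges in token order.

    Each yielded subrange is estimated to hold at most `target`
    partitions, or cannot be split any further.
    """
    stack = [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= 1 or estimate_partitions(lo, hi) <= target:
            yield (lo, hi)
            continue
        mid = lo + (hi - lo) // 2
        # Keep the right half on the stack and descend into the left
        # half first; the stack holds one pending sibling per level.
        stack.append((mid, hi))
        stack.append((lo, mid))
```

Because the generator yields one subrange at a time and keeps only the pending right halves, its memory use grows with the depth of the split rather than with the number of subranges.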
After the change in boot, read_filter is called by the distributed loader,
so its update to _filter_file_size is lost. It is the load variant
which receives foreign components that must do it. We were also
not updating it for newly created sstables.
Fixes #2449.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170606151129.5477-1-raphaelsc@scylladb.com>
At least on Debian 8, mk-build-deps -i silently finishes with return code 0
even if it fails to install dependencies.
To prevent this, we should manually install the metapackage generated by
mk-build-deps using gdebi.
Fixes #2445
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-2-git-send-email-syuu@scylladb.com>
Installing openjdk-8-jre-headless from jessie-backports breaks texlive from
the jessie main repo.
This causes an 'Unmet build dependencies' error when building the gdb package.
To prevent this, force installing texlive from jessie-backports before
starting to build gdb.
Fixes #2444
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-1-git-send-email-syuu@scylladb.com>
"This series fixes some issues with the thrift_server, namely
ensuring that streams and sockets are properly closed.
Fixes #499
Fixes #2437"
* 'thrift-server-fixes/v1' of github.com:duarten/scylla:
thrift/server: Close connections when stopping server
thrift/server: Move connection class to header
thrift/server: Shutdown connection
thrift/server: Close output_stream when connection is done
This patch adds the shutdown() function to thrift_server::connection,
and calls it after a connection is done.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The column mask identifies the kind of atom in a row in an sstable. Two
definitions of these values were present: one as a C-style enumeration and one
as a C++11-style enumeration.
The C++11-style definition is used elsewhere in `sstables.cc`. It also offers
additional type-safety.
Therefore, this commit removes the inlined C-style enumeration.
Fixes #2214.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <c525b4ae7fad3b54480e133921aa4ffe0dd5d9ce.1496352711.git.jhaberku@scylladb.com>
To reduce unwanted dependencies, we need to switch the dependency from collectd to
collectd-core.
However, collectd provides /etc/collectd/collectd.conf, so without this package
we need to install the configuration file ourselves.
So install the file in the .postinst script.
Fixes #2426
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496231743-7828-1-git-send-email-syuu@scylladb.com>
In the write path we don't wait for view updates, as they happen in
the background.
The view schema tests can fail when running with more than one cpu due
to this inherent race condition: the write to the base table returns
while the view updates are still being processed, after which we issue
a query to the view table. The shard handling the view data is not
guaranteed to finish processing the mutation before handling the query.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170531165726.9212-1-duarte@scylladb.com>
The invariant is broken if the size of the L0 candidates is equal to the max
sstable size, because the overlapping L1 sstables will not be
added to the compacting set, and they will be promoted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170530143708.3775-1-raphaelsc@scylladb.com>
* seastar 68dbf60...b1f69cc (10):
> metrics: fix namespace in documentation
> add special logger for memory allocation failures
> xen: remove
> Merge "sanity checks, fixes and extensions in the perftune.py" from Vlad
> tutorial: more "seastar" namespace
> execution_stages: fix build errors in comments
> tutorial: more "seastar" namespace additions
> tutorial: more minor changes
> tutorial: minor changes to the introduction
> tutorial: start overhauling the examples to use "seastar" namespace
Mimic origin behaviour, iff TLS encryption is enabled, and
native_transport_port_ssl is set and different from
native_transport_port, start both tls- and non-tls
listeners.
Message-Id: <1496061600-24454-2-git-send-email-calle@scylladb.com>
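The rule above can be written as a small decision function. This is a sketch, not the server code: `listeners` is a hypothetical helper, and the fallback branch (a single listener on the main port, encrypted only when TLS is enabled) is an assumption that the commit message does not state.

```python
def listeners(port, tls_enabled, ssl_port=None):
    """Return the (port, uses_tls) pairs the server should listen on."""
    if tls_enabled and ssl_port is not None and ssl_port != port:
        # Dedicated TLS port configured: serve plain and TLS side by side.
        return [(port, False), (ssl_port, True)]
    # Assumption: otherwise there is a single listener on the main
    # port, encrypted exactly when TLS is enabled.
    return [(port, tls_enabled)]
```

For instance, `listeners(9042, True, 9142)` yields one plain listener on 9042 and one TLS listener on 9142.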
Removing non-murmur3 partitioners will allow us to reduce memory footprint
and speed up some code by utilizing the properties of the murmur3 partitioner
token.
Message-Id: <20170528172536.16079-1-avi@scylladb.com>
"
- Introduce a parent span IP and span ID paradigm.
- Introduce time series tables to simplify traces processing.
- Add the "How to get traces?" chapter to the tracing.md.
"
* 'tracing-span-ids-and-time-series-helpers-v4' of github.com:cloudius-systems/seastar-dev:
docs: tracing.md: add a "how to get traces" chapter
tracing::trace_keyspace_helper: introduce a time series helper tables
tracing: cleanup: use nullptr instead of trace_state_ptr()
tracing: introduce a span ID and parent span ID