Commit Graph

12170 Commits

Author SHA1 Message Date
Gleb Natapov
c7a59ab7ff do not calculate serialized size of commitlog_entry_writer before final format is known
Currently the commitlog_entry_writer constructor calculates the serialized size
before it is known whether a schema should be included in the entry. The
result is never used, since it is recalculated when schema information is
supplied. This patch removes the needless calculation.

Message-Id: <20170614114607.GA21915@scylladb.com>
2017-06-14 14:53:07 +03:00
Gleb Natapov
a032078410 intern also tuple and user defined types
Currently, each time a UDT or tuple is parsed, a new object is created. If
those objects are used to create a container type repeatedly, this causes
a memory leak, since container types are interned but the lookup in the
cache is done using a pointer to the contained type (which will always be
different for UDTs and tuples). This patch also interns UDTs and tuples,
so each time the same type is parsed, the same pointer is returned.
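
The leak and the fix can be illustrated with a minimal Python sketch. It is not the Scylla code: `TupleType`, `ListType` and `get_instance` are hypothetical names, and `id()` stands in for the pointer used as the cache key.

```python
class TupleType:
    """Hypothetical stand-in for a parsed tuple type, interned by value."""
    _interned = {}  # element-type names -> the single shared instance

    def __init__(self, element_names):
        self.element_names = element_names

    @classmethod
    def get_instance(cls, element_names):
        key = tuple(element_names)
        inst = cls._interned.get(key)
        if inst is None:
            inst = cls(key)
            cls._interned[key] = inst
        return inst


class ListType:
    """Hypothetical container type, interned by identity of its element type."""
    _interned = {}  # id(element type) -> the single shared instance

    def __init__(self, element_type):
        # Holding the element type keeps it alive, so its id() stays unique.
        self.element_type = element_type

    @classmethod
    def get_instance(cls, element_type):
        key = id(element_type)  # pointer-like lookup
        inst = cls._interned.get(key)
        if inst is None:
            inst = cls(element_type)
            cls._interned[key] = inst
        return inst


# Parsing the same tuple type repeatedly now yields one container entry.
for _ in range(100):
    ListType.get_instance(TupleType.get_instance(["int", "text"]))
```

If `TupleType(...)` were constructed directly on every parse instead of going through `get_instance`, each iteration would add a new entry to `ListType._interned`, which is the growth described above.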

Refs #2469
Fixes #2487

Message-Id: <20170612142942.GO21915@scylladb.com>
2017-06-14 14:41:17 +03:00
Calle Wilund
525730e135 database: Fix assert in truncate to handle empty memtables+sstables
If we do two truncates in a row, the second will have neither memtable
nor sstable data. Thus we will not write/remove sstables, and thus
get no resulting truncation replay position.
Message-Id: <1497378469-6063-1-git-send-email-calle@scylladb.com>
2017-06-14 11:21:21 +02:00
Gleb Natapov
21197981a5 Fix use after free in nonwrapping_range::intersection
end_bound() returns a temporary object (end_bound_ref), so it cannot be
taken by reference here and used later. Copy it instead.

Message-Id: <20170612132328.GJ21915@scylladb.com>
2017-06-12 15:34:36 +01:00
Tomasz Grabiec
20095d7ed6 gdb: Fix "scylla column_families" command
Apparently some GDB versions (7.11.1-86.fc24) don't parse double '>'
in a type name, so this:

 std::pair<utils::UUID const, seastar::lw_shared_ptr<column_family>>

should be this:

 std::pair<utils::UUID const, seastar::lw_shared_ptr<column_family> >

Message-Id: <1497256644-4335-1-git-send-email-tgrabiec@scylladb.com>
2017-06-12 11:39:50 +03:00
Tomasz Grabiec
9e7a040f0c gdb: Fix "scylla keyspaces" command
The problem is that 'key' is a 'bytes' object now, which doesn't have __format__.

Fixes the following error:

  Traceback (most recent call last):
    File "~/src/scylla/scylla-gdb.py", line 184, in invoke
  TypeError: non-empty format string passed to object.__format__
  Error occurred in Python command: non-empty format string passed to object.__format__
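
The class of error can be reproduced outside GDB. The sketch below assumes Python 3 and is not the actual change to scylla-gdb.py; `format_key` is a hypothetical helper that decodes `bytes` before applying a width.

```python
def format_key(key, width=20):
    """Return `key` left-aligned in a field of `width` characters.

    bytes objects inherit object.__format__, which rejects any non-empty
    format spec, so they are decoded to str first.
    """
    if isinstance(key, bytes):
        key = key.decode("utf-8", errors="replace")
    return format(key, "<{}".format(width))


try:
    "{:20}".format(b"ks1")  # raises TypeError on Python 3
except TypeError as e:
    print("formatting bytes failed:", e)

print(format_key(b"ks1"), "|")
```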

Message-Id: <1497253433-374-2-git-send-email-tgrabiec@scylladb.com>
2017-06-12 11:22:59 +03:00
Tomasz Grabiec
230683bdfa gdb: Add missing seastar namespace qualifier
Message-Id: <1497253433-374-1-git-send-email-tgrabiec@scylladb.com>
2017-06-12 11:22:53 +03:00
Asias He
2bcb368a13 repair: Fix range use after free
Capture it by value.

scylla:  [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
scylla:  [shard 0] repair - Failed sync of range ==<runtime_exception
(runtime error: Invalid token. Should have size 8, has size 0#012)>: streaming::stream_exception (Stream failed)

Message-Id: <7fda4432e54365f64b556e7e4c26e36d3a9bb1b7.1497238229.git.asias@scylladb.com>
2017-06-12 11:00:57 +03:00
Avi Kivity
419ad9d6cb Merge "repair memory usage fix" from Asias
"This series switches repair to use more stream plans to stream the mismatched
sub ranges and use a range generator to produce sub ranges.

Testing shows that repair no longer uses a huge amount of memory with a large data set.

In addition, the log now has a progress report showing how many ranges have been processed.

   Jun 06 14:18:22  [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]
   Jun 06 14:19:55  [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]

Fixes #2430."

* tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev:
  repair: Remove unused sub_ranges_max
  repair: Reduce parallelism in repair_ranges
  repair: Tweak the log a bit
  repair: Use more stream_plan
  repair: iterator over subranges instead of list
2017-06-08 14:19:08 +03:00
Tomasz Grabiec
9b7f170121 gdb: Improve error message
Message-Id: <1496849069-21750-1-git-send-email-tgrabiec@scylladb.com>
2017-06-07 18:26:31 +03:00
Tomasz Grabiec
0dfe1ad431 Merge "Relax replay position ordering requirement" from Calle
From seastar-dev.git calle/concorde

Normally, we require that all mutations applied to a column family
have replay positions higher than all previously flushed.
The main reason for this is to be able to determine when to drop a
commit log segment, i.e. determine that all replay positions less
than X are now in sstables.

This patch series, small as it is, relaxes this: instead of just
keeping track of the highest RP applied, we keep a reference count for each
segment per CF in memtables, and on flush, release this very count.

The only case where we need to keep a water mark for RP is then
table truncation, for which we simply say that the highest RP
applied to the column family is the lowest allowed henceforth,
and use the old reordering logic for this instead. This case is very rare.

There is of course one (big?) downside to all this, and this is
"normal" commit log replay on startup after crash/shutdown.
Since we relax RP ordering, we cannot use RPs in sstables as
low marks for replay start, since non-persisted mutations are now
allowed to exist in the commitlog with lower RPs than those
previously flushed. I.e. we more or less always have to replay
the full commit log.
It is worth noting, though, that due to compaction and the
non-propagation of RP marks to new sstables, we often end up
doing this anyway, so it is hard to say how much of a regression
this is.
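
The bookkeeping described in this series can be sketched roughly as follows. This is a simplified Python model, not the commitlog implementation: `SegmentTracker` and its methods are hypothetical names, and a real flush would release only the references held by the flushed memtable.

```python
from collections import Counter


class SegmentTracker:
    """Toy model: count unflushed mutations per (CF id, segment id)."""

    def __init__(self):
        self._refs = Counter()

    def add(self, cf_id, segment_id):
        # Called when a mutation for `cf_id` is written into `segment_id`.
        # No ordering between replay positions is assumed.
        self._refs[(cf_id, segment_id)] += 1

    def flush(self, cf_id):
        # Flushing a CF's memtable releases all references it holds.
        for key in [k for k in self._refs if k[0] == cf_id]:
            del self._refs[key]

    def can_delete(self, segment_id):
        # A segment may be dropped once no CF references it any more.
        return not any(seg == segment_id and n > 0
                       for (_, seg), n in self._refs.items())


t = SegmentTracker()
t.add("cf1", 1)
t.add("cf2", 1)
t.flush("cf1")
print(t.can_delete(1))  # cf2 still references segment 1
t.flush("cf2")
print(t.can_delete(1))
```

With counts instead of a single high-water mark, segment liveness no longer depends on mutations arriving in replay-position order.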
2017-06-07 14:51:28 +02:00
Calle Wilund
18806989b6 database: remove hard rp ordering requirement, set low rp mark on truncate

With the commitlog keeping a use count per CF id, we can ease the ordering
restriction on replay positions. Previously we required that all
added mutations have a position > previously flushed. However, if
we accept that replay must now cover all data, then by keeping track
per CF of the highest RP ever entered, we can instead just set a
low mark on truncation, since this is the only remaining hard
RP divider.
2017-06-07 12:07:01 +00:00
Calle Wilund
d9b8c79eb9 commitlog_replayer: Ignore sstable replay positions
With relaxed position ordering, we cannot use existing sstables as
water mark for replay. We must replay everything above truncation
marks.
2017-06-07 12:07:01 +00:00
Calle Wilund
2913241df1 memtable/commitlog: Change bookkeeping to track individual segments
Use a per CF-id reference count instead, and use handles as the result of
add operations. These must either be explicitly released or stored
(rp_set), or they will release the corresponding replay_position
upon destruction.

Note: this does _not_ remove the replay positioning ordering requirement
for mutations. It just removes it as a means to track segment liveness.
2017-06-07 12:07:01 +00:00
Calle Wilund
0c598e5645 commitlog_test: Fix test_commitlog_delete_when_over_disk_limit
Test should
a.) Wait for the flush semaphore
b.) Only compare segment sets between start and end, not start,
    end and in between. I.e. the test sort of assumed we started
    with < 2 (or so) segments. Not always the case (timing)

Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com>
2017-06-07 12:44:02 +03:00
Avi Kivity
07ff3f68e0 Merge seastar upstream
* seastar b1f69cc...621b7ed (8):
  > net/api: Remove outdated comments
  > Merge "Fixes for Clang 5" from Paweł
  > Merge "Metrics: Safely transfer metadata between shared" from Amnon
  > posix: add missing #include
  > build: add cmake dependency
  > build: add -Wno-maybe-uninitialized
  > rpc: handle messages larger than memory limit
       (Fixes #2453)
  > doxygen: enable macro expansion
2017-06-07 11:04:56 +03:00
Takuya ASADA
7fe63c539a dist/debian: install gdebi when it does not exist
Since we started to use gdebi to install the build-dep metapackage generated by
mk-build-deps, we need to install gdebi in build_deb.sh too.

Fixes #2451

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496819209-30318-1-git-send-email-syuu@scylladb.com>
2017-06-07 10:24:22 +03:00
Asias He
3fdb8a3d3f repair: Remove unused sub_ranges_max
With the sub range iterator, it is not used anymore. Drop it.
2017-06-07 08:52:45 +08:00
Asias He
ca00c10b35 repair: Reduce parallelism in repair_ranges
We currently repair all the ranges in parallel.

1) All the ranges contend for parallelism_semaphore. Instead of
processing multiple ranges in parallel and calculating the sub ranges
(which take memory) for each range in parallel, we can handle the ranges
one by one.

We should still have enough parallelism because the checksums are calculated on
all the shards.

2) If the repair fails for some reason and we handle ranges one by one, we
can log which ranges were repaired successfully. Next time, we can skip
them. If we start the ranges in parallel, there is a high chance that no single
range is completed, because all the ranges are in progress.

Refs #1912
2017-06-07 08:50:57 +08:00
Asias He
3852665156 repair: Tweak the log a bit
- Count n out of m ranges the repair is running for (kind of progress report)
- Make the 'Found differing range' log debug because it can be millions
  of such entries
- Print the failed ranges
2017-06-07 08:50:57 +08:00
Asias He
2043ffc064 repair: Use more stream_plan
In the very beginning, we used a stream_plan for each checksum range.
Later, we changed to use a single stream_plan for all the checksum
ranges. This puts memory pressure on streaming, e.g., millions of ranges
in a vector to send over RPC.

To fix this, we do checksumming and streaming in parallel and limit the number of
checksum ranges stored in memory.

Fixes #2430
2017-06-07 08:50:56 +08:00
Nadav Har'El
b3ff37e67f repair: iterate over subranges instead of list
When starting repair, we divided the large token ranges (vnodes) into small
subranges of a desired length (around 100 partitions), and built a huge list
of those subranges, to iterate over them later and compare checksums of
those chunks.

However, building this list up-front is completely unnecessary, and wastes
a lot of memory: In a test with 1 TB of data, as much as 3 gigabytes was
spent on this list. Instead, what we do in this patch is to find the next
chunk in a DFS-like splitting algorithm, using only the token range
midpoint() function (as before). The amount of memory needed for this is
O(logN), instead of O(N) in the previous implementation.
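
A minimal Python sketch of such a lazy, DFS-like splitter is shown below. It is an illustration under stated assumptions, not the repair code: tokens are plain integers, and `estimate` is a hypothetical callback returning the estimated number of partitions in a range.

```python
def subranges(start, end, target_size, estimate):
    """Lazily yield (lo, hi) pieces of (start, end], left to right.

    Only the pending right halves are kept on the stack, so memory use is
    proportional to the splitting depth rather than to the number of pieces.
    """
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        if estimate(lo, hi) <= target_size or hi - lo <= 1:
            yield (lo, hi)
            continue
        mid = lo + (hi - lo) // 2   # midpoint of the token range
        stack.append((mid, hi))     # right half is visited later
        stack.append((lo, mid))     # left half is visited next


# Usage: pretend each token holds exactly one partition.
for piece in subranges(0, 1000, 100, lambda lo, hi: hi - lo):
    print(piece)
```

Because the generator produces one subrange at a time, the caller can checksum and discard each piece before the next one is computed.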

Refs #2430.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2017-06-07 08:50:56 +08:00
Raphael S. Carvalho
0ca1e5cca3 sstables: fix report of disk space used by bloom filter
After a change in boot, read_filter is called by the distributed loader,
so its update to _filter_file_size is lost. It is the load variant
that receives foreign components that must do the update. We were also
not updating it for newly created sstables.

Fixes #2449.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170606151129.5477-1-raphaelsc@scylladb.com>
2017-06-06 18:20:28 +03:00
Takuya ASADA
a4c392c113 dist/debian: use gdebi instead of mk-build-deps -i
At least on Debian 8, mk-build-deps -i silently finishes with return code 0
even if it fails to install dependencies.
To prevent this, we should manually install the metapackage generated by
mk-build-deps using gdebi.

Fixes #2445

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-2-git-send-email-syuu@scylladb.com>
2017-06-06 11:37:34 +03:00
Takuya ASADA
5608842e96 dist/debian/dep: install texlive from jessie-backports to prevent gdb build fail on jessie
Installing openjdk-8-jre-headless from jessie-backports breaks texlive on
jessie main repo.
It causes 'Unmet build dependencies' error when building gdb package.
To prevent this, force installing texlive from jessie-backports before starting
to build gdb.

Fixes #2444

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-1-git-send-email-syuu@scylladb.com>
2017-06-06 11:37:33 +03:00
Paweł Dziepak
b2b78158f6 mutation_partition: restore formatting
No functional change.

Message-Id: <20170526104119.22075-2-pdziepak@scylladb.com>
2017-06-06 11:20:57 +03:00
Gleb Natapov
f5679e0416 database: remove remnants of no longer existing db::serializer.
Message-Id: <20170604100552.GD8248@scylladb.com>
2017-06-04 13:07:17 +03:00
Raphael S. Carvalho
dcbeb42f67 sstables: explicitly close file in fsync_directory
Otherwise, close is called in the reactor thread when destroying the
file object.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170602024346.7803-1-raphaelsc@scylladb.com>
2017-06-02 21:09:58 +03:00
Pekka Enberg
a6dc21615b Merge "Fixes to thrift/server" from Duarte
"This series fixes some issues with the thrift_server, namely
ensuring that streams and sockets are properly closed.

Fixes #499
Fixes #2437"

* 'thrift-server-fixes/v1' of github.com:duarten/scylla:
  thrift/server: Close connections when stopping server
  thrift/server: Move connection class to header
  thrift/server: Shutdown connection
  thrift/server: Close output_stream when connection is done
2017-06-02 08:15:22 +03:00
Duarte Nunes
c525331e60 thrift/server: Close connections when stopping server
Fixes #499

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-06-02 00:15:20 +02:00
Duarte Nunes
315c69b830 thrift/server: Move connection class to header
No changes in functionality. Required for an upcoming patch.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-06-02 00:15:20 +02:00
Duarte Nunes
22fafd5034 thrift/server: Shutdown connection
This patch adds the shutdown() function to thrift_server::connection,
and calls it after a connection is done.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-06-02 00:15:20 +02:00
Duarte Nunes
0a5ec97b7f thrift/server: Close output_stream when connection is done
Fixes #2437

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-06-02 00:15:20 +02:00
Jesse Haber-Kucharsky
376c661823 Eliminate duplicate definition of sstable column mask values
The column mask identifies the kind of atom in a row in an sstable. Two
definitions of these values were present: one as a C-style enumeration and one
as a C++11-style enumeration.

The C++11-style definition is used elsewhere in `sstables.cc`. It also offers
additional type-safety.

Therefore, this commit removes the inlined C-style enumeration.

Fixes #2214.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <c525b4ae7fad3b54480e133921aa4ffe0dd5d9ce.1496352711.git.jhaberku@scylladb.com>
2017-06-02 00:06:31 +02:00
Michał Matczuk
04da4dbf83 docker support for api-address
Message-Id: <1b5fb2bbba1b879aae825094a0f1b77c865be139.1496318996.git.michal@scylladb.com>
2017-06-01 15:31:45 +03:00
Takuya ASADA
22339bba44 dist/debian: depend on collectd-core instead of collectd, to reduce dependencies
To reduce unwanted dependencies, we need to replace the dependency on collectd with
collectd-core.
However, collectd provides /etc/collectd/collectd.conf, so without this package
we need to install the configuration file ourselves.
So install the file in the .postinst script.

Fixes #2426

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496231743-7828-1-git-send-email-syuu@scylladb.com>
2017-06-01 13:20:37 +03:00
Takuya ASADA
909a9ebf97 dist/debian: provide prebuilt 3rdparty packages for Ubuntu 16.04
Currently we only offer a 14.04 prebuilt package, but we have a 16.04 one on s3, so use it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496301544-15251-1-git-send-email-syuu@scylladb.com>
2017-06-01 10:37:52 +03:00
Duarte Nunes
15a62701f2 test.py: Ensure view_schema_test runs with only one cpu
In the write path we don't wait for view updates, as they happen in
the background.

The view schema tests can fail when running with more than one cpu due
to this inherent race condition: the write to the base table returns
while the view updates are still being processed, after which we issue
a query to the view table. The shard handling the view data is not
guaranteed to finish processing the mutation before handling the query.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170531165726.9212-1-duarte@scylladb.com>
2017-05-31 19:17:51 +01:00
Raphael S. Carvalho
b8091799ca lcs: fix off-by-one comparison
invariant is broken if size of L0 candidates is equal to max
sstable size because the overlapping L1 sstables will not be
added to compacting set, and they will be promoted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170530143708.3775-1-raphaelsc@scylladb.com>
2017-05-30 17:39:51 +03:00
Avi Kivity
15af6acc8b dist: redirect stdout/stderr to the journal on systemd systems
Fixes #2408.

Message-Id: <20170524080729.10085-1-avi@scylladb.com>
2017-05-30 08:47:17 +03:00
Avi Kivity
1c84aae0c1 Merge seastar upstream
* seastar 68dbf60...b1f69cc (10):
  > metrics: fix namespace in documentation
  > add special logger for memory allocation failures
  > xen: remove
  > Merge "sanity checks, fixes and extensions in the perftune.py" from Vlad
  > tutorial: more "seastar" namespace
  > execution_stages: fix build errors in comments
  > tutorial: more "seastar" namespace additions
  > tutorial: more minor changes
  > tutorial: minor changes to the introduction
  > tutorial: start overhauling the examples to use "seastar" namespace
2017-05-29 19:02:02 +03:00
Calle Wilund
3512ed4596 storage_service/config: Add "native_transport_port_ssl" option
Mimic origin behaviour, iff TLS encryption is enabled, and
native_transport_port_ssl is set and different from
native_transport_port, start both tls- and non-tls
listeners.
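
The rule above can be written out as a small decision function. This is a Python sketch, not the storage_service code: `plan_listeners` is a hypothetical name, and the fallback branch (a single listener on native_transport_port that uses TLS whenever encryption is enabled) is an assumption that this commit message does not state.

```python
def plan_listeners(native_port, ssl_port=None, tls_enabled=False):
    """Return a list of (port, use_tls) pairs describing listeners to start."""
    if tls_enabled and ssl_port is not None and ssl_port != native_port:
        # Both a plain and a TLS listener, on different ports.
        return [(native_port, False), (ssl_port, True)]
    # Assumed fallback: one listener, encrypted only if TLS is enabled.
    return [(native_port, tls_enabled)]


print(plan_listeners(9042, ssl_port=9142, tls_enabled=True))
print(plan_listeners(9042, tls_enabled=True))
```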

Message-Id: <1496061600-24454-2-git-send-email-calle@scylladb.com>
2017-05-29 15:53:56 +03:00
Calle Wilund
1b387a1f56 cql server: Allow multiple listeners on different ports
Need to separate "notifiers" to per-port/address and keep
life span as such.

Message-Id: <1496061600-24454-1-git-send-email-calle@scylladb.com>
2017-05-29 15:53:50 +03:00
Avi Kivity
ef98afa748 build: make swagger generated code depend on the code generator
Fixes failures when moving between branches due to the seastar namespace
change.
Message-Id: <20170528100052.29131-1-avi@scylladb.com>
2017-05-29 13:17:42 +02:00
Avi Kivity
8979d7abf0 Deprecate non-murmur3 partitioners
Removing non-murmur3 partitioners will allow us to reduce memory footprint
and speed up some code by utilizing the properties of the murmur3 partitioner
token.
Message-Id: <20170528172536.16079-1-avi@scylladb.com>
2017-05-28 19:35:56 +02:00
Takuya ASADA
36ccbc1539 dist/ami: follow rpm output dir path change
CentOS mock support on build_rpm.sh changed rpm output directory, so follow it.

Fixes #2406

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1495573343-13912-1-git-send-email-syuu@scylladb.com>
2017-05-28 13:02:36 +03:00
Amos Kong
f655639e5a scylla_setup: fix deadloop in inputting invalid option
example: # scylla_setup --invalid-opt

Fixes #2305

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <9a4f631b126d8eaaae479fa99137db7a61a7c869.1493135357.git.amos@scylladb.com>
2017-05-28 13:02:10 +03:00
Takuya ASADA
bdec38d23c dist/common/scripts/scylla_setup: skip SELinux setup when it's already disabled
It doesn't make sense to ask "Do you want to disable SELinux?" when SELinux is
already disabled, so skip the whole question.

Fixes #2411

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1495652423-20806-1-git-send-email-syuu@scylladb.com>
2017-05-28 13:00:10 +03:00
Avi Kivity
c4faa1e202 Merge "tracing: tracing spans and time series helper table" from Vlad
"
 - Introduce a parent span ID and span ID paradigm.
 - Introduce time series tables to simplify traces processing.
 - Add the "How to get traces?" chapter to the tracing.md.
"

* 'tracing-span-ids-and-time-series-helpers-v4' of github.com:cloudius-systems/seastar-dev:
  docs: tracing.md: add a "how to get traces" chapter
  tracing::trace_keyspace_helper: introduce time series helper tables
  tracing: cleanup: use nullptr instead of trace_state_ptr()
  tracing: introduce a span ID and parent span ID
2017-05-28 12:01:35 +03:00
Paweł Dziepak
d9dd798c4f counter_write_query: avoid use-after-free on partition range
Message-Id: <20170526104119.22075-1-pdziepak@scylladb.com>
2017-05-28 11:41:30 +03:00