Commit Graph

19149 Commits

Author SHA1 Message Date
Avi Kivity
ca28fdc37d Revert "dist/docker/redhat: change user of scylla services to 'scylla'"
This reverts commit b1226fb15a. When the
data volume is mounted from the host (as is usual in container
deployments), we can't expect that the files will be owned by the
in-container scylla user. So that commit didn't really fix #4536.

A follow-up patch will relax the check so it passes in a container
environment.
2019-08-13 14:36:00 +03:00
Avi Kivity
0d0ee20f76 Merge "Implement sstable_info API command (info on sstables)" from Calle
"
Refs #4726

Implement the api portion of a "describe sstables" command.

Adds rest types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint)
"

* 'sstabledesc' of https://github.com/elcallio/scylla:
  api/storage_service: Add "sstable_info" command
  sstables/compress: Make compressor pointer accessible from compression info
  sstables.hh: Add attribute description API to file extension
  sstables.hh: Add compression component accessor
  sstables.hh: Make "has_component" public
2019-08-12 21:16:08 +03:00
Dejan Mircevski
8be147d069 cql3: Handle empty LIKE pattern
Match SQL's LIKE in allowing an empty pattern, which matches only
an empty text field.
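
The intended semantics can be sketched as follows. This is only an illustration, assuming the standard SQL wildcards `%` (any run of characters) and `_` (exactly one character) and no escape handling; `like_match` is a hypothetical helper, not the matcher used in cql3.

```cpp
#include <string_view>

// Hypothetical helper, not Scylla's implementation: SQL-style LIKE matching
// with '%' (any run of characters) and '_' (exactly one character).
// Escape sequences are deliberately not handled.
bool like_match(std::string_view pattern, std::string_view text) {
    if (pattern.empty()) {
        // An empty pattern matches only an empty text field.
        return text.empty();
    }
    if (pattern.front() == '%') {
        // '%' either matches nothing, or swallows one more character of text.
        return like_match(pattern.substr(1), text)
            || (!text.empty() && like_match(pattern, text.substr(1)));
    }
    if (text.empty()) {
        return false;
    }
    if (pattern.front() == '_' || pattern.front() == text.front()) {
        return like_match(pattern.substr(1), text.substr(1));
    }
    return false;
}
```

With this sketch, `like_match("", "")` is true while `like_match("", "a")` is false.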

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola
99c7f8457d logalloc: Add a migrators_base that is common to debug and release
This simplifies the debug implementation and it now should work with
scylla-gdb.py.

It is not clear what, if anything, is lost by not using random
ids. They were never being reused in the debug implementation anyway.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190618144755.31212-1-espindola@scylladb.com>
2019-08-12 19:44:55 +03:00
Calle Wilund
2b19bfbfbc types: Remove obsolete "FIXME"
inet_addr_type_impl has supported ipv6 for some time now.
Message-Id: <20190812142731.6384-1-calle@scylladb.com>
2019-08-12 17:30:15 +03:00
Calle Wilund
1afc899e37 type_parser: Fix/improve exception messages
Removes long-standing FIXME for message detail
Also simplifies some code, removing duplication.

Message-Id: <20190812134144.2417-1-calle@scylladb.com>
2019-08-12 17:03:43 +03:00
Calle Wilund
fdf2017487 cql3::term: Remove unneeded const_cast
Removed no longer needed FIXME (to_string became const long ago)

Message-Id: <20190812133943.2011-1-calle@scylladb.com>
2019-08-12 17:00:46 +03:00
Asias He
131acc09cc repair: Adjust parallelism according to memory size (#4696)
After commit 8a0c4d5 (Merge "Repair switch to rpc stream" from Asias),
we increased the row buffer size for repair from 512KiB to 32MiB per
repair instance. We allow repairing 16 ranges (16 repair instances) in
parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB
per user requested repair. In addition, the repair master node can hold
data from all the repair followers, so the memory usage on repair master
can be larger than 512MiB. We need to provide a way to limit the memory
usage.

In this patch, we limit the total memory used by repair to 10% of the
shard memory. The number of ranges that can be repaired in parallel is:

max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range.

For example, if each shard has 4096MiB of memory, then we will have
max_repair_ranges_in_parallel = (4096MiB * 10%) / 32MiB = 12 (rounded down).
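
The arithmetic can be sketched as below: 10% of the shard memory divided by 32MiB per range, rounded down. The function name, the default argument and the lower bound of one range are assumptions made for illustration, not the code in the patch.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t MiB = 1024 * 1024;

// Hypothetical helper illustrating the calculation; names and the minimum of
// one range are assumptions, not Scylla's actual code.
std::size_t max_repair_ranges_in_parallel(std::size_t shard_memory_bytes,
                                          std::size_t memory_per_range_bytes = 32 * MiB) {
    // Budget 10% of the shard's memory for repair.
    const std::size_t max_repair_memory = shard_memory_bytes / 10;
    // Integer division rounds down; always allow at least one range (assumption).
    return std::max<std::size_t>(1, max_repair_memory / memory_per_range_bytes);
}
```

For a shard with 4096MiB, `max_repair_ranges_in_parallel(4096 * MiB)` gives 12, since 409.6MiB / 32MiB is 12.8 before rounding down.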

Fixes #4675
2019-08-12 11:09:27 +03:00
Avi Kivity
e6cde72d2b Merge "Fix cql server admission control to take all leftover work into account" from Gleb
"
Current admission control takes a permit when a cql request starts and
releases it when the reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do, like completing a write or doing a read repair, and some because
a read/write may be stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.

The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.

Fixes #4768
"

* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
  transport: add a metric to follow memory available for service permit.
  storage_proxy: store a permit in a read executor
  storage_proxy: store a permit in a write response handler
  Pass service permit to storage_proxy
  transport: introduce service_permit class and use it instead of semaphore_units
  transport: hold admission a permit until a reply is sent
  transport: remove cql server load balancer
2019-08-12 11:02:37 +03:00
Gleb Natapov
3e27c2198a transport: add a metric to follow memory available for service permit.
Add a metric to follow memory available for service permit. When this
memory is close to zero cql server stops admitting new requests.
2019-08-12 10:20:43 +03:00
Gleb Natapov
7d7b1685aa storage_proxy: store a permit in a read executor
A read executor exists until the read operation completes in its entirety,
so storing a permit there guarantees that it will be freed only after
no background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
d5ced800f0 storage_proxy: store a permit in a write response handler
A write response handler exists until the write operation completes in its
entirety, so storing a permit there guarantees that it will be freed only
after no background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
6a4207f202 Pass service permit to storage_proxy
Current cql transport code acquires a permit before processing a query and
releases it when the query gets a reply, but some queries leave work behind.
If the work is allowed to accumulate without any limit, a server may
eventually run out of memory. To prevent that, the permit system should
account for the background work as well. The patch is a first step in
this direction. It passes a permit down to storage proxy, where it will
later be held by background work.
2019-08-12 10:20:43 +03:00
Raphael S. Carvalho
b436c41128 compaction_manager: Prevent sstable runs from being partially compacted
Manager trims sstables off to allow compaction jobs to proceed in parallel
according to their weights. The problem is that trimming procedure is not
sstable run aware, so it could incorrectly remove only a subset of a sstable
run, leading to partial sstable run compaction.

Partial compaction of a sstable run could lead to inefficiency because the run structure
would be messed up, affecting all amplification factors, and the same generation
could even end up being compacted twice.

This is fixed by making the trim procedure respect the sstable runs.
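
A run-aware trim can be sketched as below. This is a simplified model under stated assumptions: `sstable_sketch`, `trim_by_run` and the count-based `limit` are hypothetical, and the real manager trims according to compaction weights rather than a plain count.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of an sstable: only the fields needed here.
struct sstable_sketch {
    std::uint64_t generation;
    std::uint64_t run_id;   // fragments of the same run share this id
};

// Keep at most `limit` sstables, but only ever keep or drop a run as a whole.
// Runs are visited in ascending run_id order (an arbitrary choice for the sketch).
std::vector<sstable_sketch> trim_by_run(const std::vector<sstable_sketch>& candidates,
                                        std::size_t limit) {
    std::map<std::uint64_t, std::vector<sstable_sketch>> runs;
    for (const auto& sst : candidates) {
        runs[sst.run_id].push_back(sst);
    }
    std::vector<sstable_sketch> kept;
    for (const auto& [id, fragments] : runs) {
        if (kept.size() + fragments.size() > limit) {
            break;  // taking this run would require splitting it, so stop here
        }
        kept.insert(kept.end(), fragments.begin(), fragments.end());
    }
    return kept;
}
```

The point of the sketch is that the result never contains only part of a run: a run that does not fit is left out entirely.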

Fixes #4773.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>
2019-08-11 17:20:20 +03:00
Gleb Natapov
ddff7f48cf transport: introduce service_permit class and use it instead of semaphore_units
service_permit is a new class that allows sharing a permit among
different parts of request processing many of which can complete
at different times.
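
The sharing idea can be sketched with reference-counted ownership of RAII units. Everything below is a hypothetical stand-in: `memory_budget`, `units` and `permit_sketch` are not Seastar's semaphore types or the actual service_permit class.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a semaphore tracking available memory.
struct memory_budget {
    std::size_t available;
};

// Hypothetical RAII units: taken on construction, returned on destruction.
// No bounds check is done in this sketch.
class units {
    memory_budget& _budget;
    std::size_t _count;
public:
    units(memory_budget& b, std::size_t n) : _budget(b), _count(n) { _budget.available -= n; }
    ~units() { _budget.available += _count; }
    units(const units&) = delete;
    units& operator=(const units&) = delete;
};

// Sketch of a shareable permit: copies share ownership of the same units,
// which return to the budget only when the last copy is destroyed.
class permit_sketch {
    std::shared_ptr<units> _units;
public:
    permit_sketch(memory_budget& b, std::size_t n) : _units(std::make_shared<units>(b, n)) {}
};
```

A copy handed to background work keeps the units reserved after the request-side copy goes out of scope, which is the behavior the series relies on.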
2019-08-11 16:08:55 +03:00
Gleb Natapov
2daa72b7dc transport: hold admission a permit until a reply is sent
Current code releases the admission permit too soon. If new requests are
admitted faster than the client reads replies back, the reply queue can grow to
be very big. The patch moves the service permit release until after a reply
is sent.
2019-08-11 16:08:55 +03:00
Gleb Natapov
7e3805ed3d transport: remove cql server load balancer
It is buggy, unused and unnecessarily complicates the code.
2019-08-11 16:08:52 +03:00
Nadav Har'El
f9d6eaf5ff reconcilable_result: switch to chunked_vector
Merged patch series from Avi Kivity:

In rare but valid cases (reconciling many tombstones, paging disabled),
a reconcilable_result can grow large. This triggers large allocation
warnings. Switch to chunked_vector to avoid the large allocation.
In passing, fix chunked_vector's begin()/end() const correctness, and
add the reverse iterator function family which is needed by the conversion.
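
Why chunking avoids the large allocation can be sketched as below. `chunked_sketch` is a hypothetical, minimal container (it requires a default-constructible, copy-assignable `T`); utils::chunked_vector in Scylla is more elaborate.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, minimal illustration: elements live in fixed-size chunks, so
// growth never needs one contiguous allocation for all elements.
template <typename T, std::size_t ChunkSize = 1024>
class chunked_sketch {
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_size % ChunkSize == 0) {
            // Current chunks are full (or there are none): add one more chunk.
            _chunks.push_back(std::make_unique<std::array<T, ChunkSize>>());
        }
        (*_chunks[_size / ChunkSize])[_size % ChunkSize] = value;
        ++_size;
    }
    const T& operator[](std::size_t i) const { return (*_chunks[i / ChunkSize])[i % ChunkSize]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Only the small vector of chunk pointers is contiguous; each element allocation is bounded by the chunk size.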

Fixes #4780.

Tests: unit (dev)

Commit Summary

    utils: chunked_vector: make begin()/end() const correct
    utils::chunked_vector: add rbegin() and related iterators
    reconcilable_result: use chunked_vector to hold partitions
2019-08-11 16:03:13 +03:00
Avi Kivity
ce2b0b2682 Merge "Add listen/rpc "prefer_ipv6" options to DNS lookup #4775" from Calle
"
Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus etc.

Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  init: Use the "prefer_ipv6" options available for rpc/listen address/interface
  inet_address: Add optional "preferred type" to lookup
  config: Add rpc_interface_prefer_ipv6 parameter
  config: Add listen_interface_perfer_ipv6 parameter
  config.cc: Fix enable_ipv6_dns_lookup actual param name
2019-08-11 15:21:45 +03:00
Pekka Enberg
73113c0ea4 utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code
The FIXME was added in the very first commit ("utils: Convert
utils/FBUtilities.java") that introduced the fb_utilities class as a
stub. However, we have long implemented the parts that we actually use,
so drop the FIXME as obsolete. In addition, drop the remaining
uncommented Java code as unused and also obsolete.

Message-Id: <20190808182758.1155-1-penberg@scylladb.com>
2019-08-11 10:26:36 +03:00
Pekka Enberg
547c072f93 dbuild: Make Maven local repository accessible
The Maven build tool ("mvn"), which is used by scylla-jmx and
scylla-tools-java, stores dependencies in a local repository stored at
$HOME/.m2. Make sure it's accessible to dbuild.

Message-Id: <20190808140216.26141-1-penberg@scylladb.com>
2019-08-08 17:36:13 +03:00
Avi Kivity
8f19b16fe4 Update seastar submodule
* seastar ed608e3c9e...fe2b5b0c6b (2):
  > Merge "handle discarded futures or suppress warning" from Benny
  > output_stream: Add close() blurb
2019-08-08 16:22:38 +03:00
Avi Kivity
4a5ec61438 Update seastar submodule
* seastar a1cf07858b...ed608e3c9e (4):
  > core: Add ability to abort on EBADF and ENOTSOCK
  > Revert "Merge "handle discarded futures or suppress warning" from Benny"
  > Merge "handle discarded futures or suppress warning" from Benny
  > reactor: remove replace variadic future<pollable_fd, socket_address> with future<tuple>
2019-08-08 14:22:29 +03:00
Raphael S. Carvalho
76cde84540 sstables/compaction_manager: Fix logic for filtering out partial sstable runs
ignore_partial_runs() brings confusion because i__p__r() equal to true
doesn't mean filter out partial runs from compaction. It actually means
not caring about compaction of a partial run.

The logic was wrong because any compaction strategy that chooses not to ignore
partial sstable run[1] would have any fragment composing it incorrectly
becoming a candidate for compaction.
This problem could make compaction include only a subset of fragments composing
the partial run or even make the same fragment be compacted twice due to
parallel compaction.

[1]: partial sstable run is a sstable that is still being generated by
compaction and as a result cannot be selected as candidate whatsoever.

Fix is about making sure partial sstable run has none of its fragments
selected for compaction. And also renaming i__p__r.

Fixes #4729.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
7d4bf10d87 docs/building-packages.md: Document how to build Scylla packages
This documents the steps needed to build Scylla's Linux packages with
the relocatable package infrastructure we use today.

Message-Id: <20190807134017.4275-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
79cece9f33 toolchain: Fix default command for dbuild Docker image
Running "dbuild" without a build command fails as follows:

  $ ./tools/toolchain/dbuild
  Error: This command has to be run under the root user.

Israel Fruchter discovered that the default command of our Docker image is this:

  "Cmd": [
    "bash",
    "-c",
    "dnf -y install python3-cassandra-driver && dnf clean all"
   ]

Let's make "/bin/bash" the default command instead, which will make
"dbuild" with no build command return to the host shell.

Message-Id: <20190807133955.4202-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
76cdec222f build_reloc.sh: Remove "--with" passed to "configure.py"
The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to
the configure.py script. This is redundant as the
"scylla-package.tar.gz" target of ninja already limits itself to them.

Removing the "--with" options allows building unit tests after a
relocatable package has been built without having to rebuild anything.

Message-Id: <20190807130505.30089-1-penberg@scylladb.com>
2019-08-07 16:28:00 +03:00
Avi Kivity
e548bdb2e8 thrift, transport: switch to new seastar accept() API (#4814)
Seastar switched accept() to return a single struct instead of a variadic future,
adjust the code to the new API to avoid deprecation warnings.
2019-08-07 15:23:26 +02:00
Pekka Enberg
f68fffd99a reloc/build_reloc.sh: Make build mode configurable
Add a '--mode <mode>' command line option to the 'build_reloc.sh' script
so that we can create relocatable packages for debug builds.

The '--mode' command line option defaults to 'release' so existing users
are unaffected.

Message-Id: <20190807120759.32634-1-penberg@scylladb.com>
2019-08-07 16:19:37 +03:00
Asias He
fee26b9f6e repair: Fix use after free in do_estimate_partitions_on_local_shard (#4813)
We need to keep the sstables object alive during the operation of
do_for_each.

Notes: No need to backport to 3.1.

Fixes #4811
2019-08-07 15:19:21 +02:00
Asias He
49a73aa2fc streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
the "streaming: Send error code from the sender to receiver" to the 3.0
branch.

Refs: #4789
2019-08-07 14:59:46 +02:00
Asias He
288371ce75 streaming: Do not call rpc stream flush in send_mutation_fragments
The stream close() guarantees the data sent will be flushed. No need to
call the stream flush() since the stream is not reused.

Follow up fix for commit bac987e32a (streaming: Send error code from
the sender to receiver).

Refs #4789
2019-08-07 14:31:17 +02:00
Avi Kivity
689fc72bab Update seastar submodule
* seastar d199d27681...a1cf07858b (1):
  > Merge 'Do not return a variadic future form server_socket::accept()' from Avi

Seastar configure.py now has --api-level=1, to keep us on the old variadic future
server_socket::accept() API.
2019-08-06 18:37:27 +03:00
Avi Kivity
97f66c72af Update seastar submodule
* seastar d90834443c...d199d27681 (3):
  > sharded: support for non-cooperative service types
  > shared_future: silence warning about discarded future
  > Fix backtrace suppression message in cpu_stall_detector.

Fixes #4560.
2019-08-06 18:00:48 +03:00
Asias He
bac987e32a streaming: Send error code from the sender to receiver
In case of error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violate the mutation_fragment
stream rule.

To fix, we need to propagate the error. However, RPC streaming does not
support propagating errors in the framework. The user has to send an error
code explicitly.
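
Sending an explicit status alongside the data can be sketched as below. The names `stream_cmd`, `stream_item` and `receive_all` are hypothetical, the stream is modeled as a plain vector, and treating a stream that ends without an end marker as an error is an assumption of the sketch.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical command tag sent alongside each item; the names are illustrative.
enum class stream_cmd { data, end_of_stream, error };

struct stream_item {
    stream_cmd cmd;
    std::string payload;  // stands in for a serialized mutation fragment
};

// Consume items until an explicit end_of_stream. An explicit error, or input
// that simply stops (the sender just closed the stream), both raise, so a
// truncated stream is never mistaken for a complete one.
std::vector<std::string> receive_all(const std::vector<stream_item>& wire) {
    std::vector<std::string> fragments;
    for (const auto& item : wire) {
        switch (item.cmd) {
        case stream_cmd::data:
            fragments.push_back(item.payload);
            break;
        case stream_cmd::end_of_stream:
            return fragments;
        case stream_cmd::error:
            throw std::runtime_error("sender reported an error");
        }
    }
    throw std::runtime_error("stream closed without end_of_stream");
}
```

With an explicit marker, the receiver can tell a clean end of stream from a sender-side failure instead of silently generating end of stream.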

Fixes: #4789
2019-08-06 16:54:56 +02:00
Piotr Jastrzebski
24f6d90a45 sstables: add test of sstables_mutation_reader for missing partition_end
Reproduces #4783

Issue was fixed by 9b8ac5ecbc

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-08-06 15:11:19 +03:00
Calle Wilund
6c62e5741e init: Use the "prefer_ipv6" options available for rpc/listen address/interface
Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.
2019-08-06 08:32:10 +00:00
Calle Wilund
6c0c1309b3 inet_address: Add optional "preferred type" to lookup
Allows using prio in address family dns lookup. I.e. prefer ipv4/ipv6 if avail.
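
The "prefer if available" behavior can be sketched as below. The types and `pick_address` are hypothetical and operate on an already resolved list; this is not the inet_address lookup API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
enum class family { ipv4, ipv6 };

struct resolved_address {
    family fam;
    std::string text;
};

// Return the first address of the preferred family if one exists; otherwise
// fall back to the first result of any family.
std::optional<resolved_address> pick_address(const std::vector<resolved_address>& results,
                                             std::optional<family> preferred) {
    if (preferred) {
        for (const auto& addr : results) {
            if (addr.fam == *preferred) {
                return addr;
            }
        }
    }
    if (results.empty()) {
        return std::nullopt;
    }
    return results.front();
}
```

The preference is a priority, not a filter: a name that resolves only to ipv4 still yields an address when ipv6 is preferred.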
2019-08-06 08:32:10 +00:00
Calle Wilund
d3410f0e48 config: Add rpc_interface_prefer_ipv6 parameter
As already existing in scylla.yaml
2019-08-06 08:32:10 +00:00
Calle Wilund
0028cecb8e config: Add listen_interface_perfer_ipv6 parameter
As already existing in scylla.yaml.
https://github.com/apache/cassandra/blob/cassandra-3.11/conf/cassandra.yaml#L622
2019-08-06 08:32:10 +00:00
Calle Wilund
39d18178eb config.cc: Fix enable_ipv6_dns_lookup actual param name
When adding option (and iterating through config refactoring)
the member name and the config param name got out of sync
2019-08-06 08:32:09 +00:00
Calle Wilund
298da3fc4b api/storage_service: Add "sstable_info" command
Assembles information and attributes of sstables in one or more
column families.

v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
2019-08-06 08:14:15 +00:00
Calle Wilund
95a8ff12e7 sstables/compress: Make compressor pointer accessible from compression info
2019-08-06 07:07:44 +00:00
Calle Wilund
d15c63627c sstables.hh: Add attribute description API to file extension
2019-08-06 07:07:44 +00:00
Calle Wilund
4c67d702c2 sstables.hh: Add compression component accessor
2019-08-06 07:07:44 +00:00
Calle Wilund
770f912221 sstables.hh: Make "has_component" public
2019-08-06 07:07:44 +00:00
Avi Kivity
b77c4e68c2 Merge "Add Zstandard compression #4802" from Kamil
"
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).

To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available
compression levels.

Resolves #2613.

This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.

Example usage (assuming there exists a keyspace 'a'):

    create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'};

Notes:

 1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule.
 2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK.
 3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit.
"

* 'zstd' of https://github.com/kbr-/scylla:
  Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
  Add unit tests for Zstd compression
  Enable tests that use compressed sstable files
  Add ZStandard compression
  Fix the value of the chunk length parameter passed to compressors
2019-08-05 16:29:27 +03:00
Botond Dénes
23cc6d6fb2 make_flat_mutation_reader_from_fragments: reader: silence discarded future warning
The fragment reader calls `fast_forward_to()` from its constructor to
discard fragments that fall outside the query range. Move the
fast-forward code into an internal void-returning method, and call
that from both the constructor and `fast_forward_to()`, to avoid a
warning on a discarded future<>.
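
The shape of the refactoring can be sketched as below. `ready_future` is a hypothetical `[[nodiscard]]` stand-in for future<>, and `fragment_reader_sketch` models the range as a single integer; neither is the actual reader code.

```cpp
// Hypothetical stand-in for a future<> type whose result must not be ignored.
struct [[nodiscard]] ready_future {};

class fragment_reader_sketch {
    int _range_start;
    int _position = 0;

    // Plain synchronous helper: it returns void, so there is nothing to discard.
    void do_fast_forward_to(int range_start) {
        _range_start = range_start;
        if (_position < _range_start) {
            _position = _range_start;  // skip everything before the range
        }
    }
public:
    explicit fragment_reader_sketch(int range_start) : _range_start(range_start) {
        do_fast_forward_to(range_start);  // no future is created or dropped here
    }
    // The public method keeps its future-returning signature for callers.
    ready_future fast_forward_to(int range_start) {
        do_fast_forward_to(range_start);
        return ready_future{};
    }
    int position() const { return _position; }
};
```

The constructor and the public method share the helper, so the constructor no longer calls a function whose return value it would have to ignore.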

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>
2019-08-05 16:21:50 +03:00
Kamil Braun
3a0308f76f Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
c3c7c06e10 Add unit tests for Zstd compression
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00