This reverts commit b1226fb15a. When the
data volume is mounted from the host (as is usual in container
deployments), we can't expect that the files will be owned by the
in-container scylla user. So that commit didn't really fix #4536.
A follow-up patch will relax the check so it passes in a container
environment.
"
Refs #4726
Implement the API portion of a "describe sstables" command.
Adds REST types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint)
"
* 'sstabledesc' of https://github.com/elcallio/scylla:
api/storage_service: Add "sstable_info" command
sstables/compress: Make compressor pointer accessible from compression info
sstables.hh: Add attribute description API to file extension
sstables.hh: Add compression component accessor
sstables.hh: Make "has_component" public
Match SQL's LIKE in allowing an empty pattern, which matches only
an empty text field.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
This simplifies the debug implementation and it now should work with
scylla-gdb.py.
It is not clear what, if anything, is lost by not using random
ids. They were never being reused in the debug implementation anyway.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190618144755.31212-1-espindola@scylladb.com>
After commit 8a0c4d5 (Merge "Repair switch to rpc stream" from Asias),
we increased the row buffer size for repair from 512KiB to 32MiB per
repair instance. We allow repairing 16 ranges (16 repair instances) in
parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB
per user requested repair. In addition, the repair master node can hold
data from all the repair followers, so the memory usage on repair master
can be larger than 512MiB. We need to provide a way to limit the memory
usage.
In this patch, we limit the total memory used by repair to 10% of the
shard memory. The number of ranges that can be repaired in parallel is:
max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range.
For example, if each shard has 4096MiB of memory, then max_repair_memory =
4096MiB * 10% = 409.6MiB, and max_repair_ranges_in_parallel = 409.6MiB / 32MiB = 12.
Fixes #4675
"
Current admission control takes a permit when a cql request starts and
releases it when the reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do, like completing a write or doing a read repair, and some because
a read/write may be stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.
The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.
Fixes#4768
"
* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
transport: add a metric to follow memory available for service permit.
storage_proxy: store a permit in a read executor
storage_proxy: store a permit in a write response handler
Pass service permit to storage_proxy
transport: introduce service_permit class and use it instead of semaphore_units
transport: hold admission a permit until a reply is sent
transport: remove cql server load balancer
A read executor exists until the read operation completes in its entirety,
so storing a permit there guarantees that it will be freed only after
no background work is left for the request on this server.
A write response handler exists until the write operation completes in its
entirety, so storing a permit there guarantees that it will be freed only
after no background work is left for the request on this server.
Current cql transport code acquires a permit before processing a query and
releases it when the query gets a reply, but some queries leave work behind.
If the work is allowed to accumulate without any limit, a server may
eventually run out of memory. To prevent that, the permit system should
account for the background work as well. This patch is a first step in
that direction: it passes a permit down to storage proxy, where it will
later be held by background work.
The manager trims sstables off to allow compaction jobs to proceed in parallel
according to their weights. The problem is that the trimming procedure is not
sstable-run aware, so it could incorrectly remove only a subset of an sstable
run, leading to partial sstable run compaction.
Compacting only part of an sstable run could lead to inefficiency because the
run structure would be messed up, affecting all amplification factors, and the
same generation could even end up being compacted twice.
This is fixed by making the trim procedure respect the sstable runs.
Fixes #4773.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>
Current code releases the admission permit too soon. If new requests are
admitted faster than clients read replies back, the reply queue can grow
very big. This patch defers the service permit release until after a reply
is sent.
Merged patch series from Avi Kivity:
In rare but valid cases (reconciling many tombstones, paging disabled),
a reconcilable_result can grow large. This triggers large allocation
warnings. Switch to chunked_vector to avoid the large allocation.
In passing, fix chunked_vector's begin()/end() const correctness, and
add the reverse iterator function family which is needed by the conversion.
Fixes #4780.
Tests: unit (dev)
Commit Summary
utils: chunked_vector: make begin()/end() const correct
utils::chunked_vector: add rbegin() and related iterators
reconcilable_result: use chunked_vector to hold partitions
"
Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus etc.
Fixes #4751
Adds use of a preferred address family in dns name lookups related to the
listen address and rpc address, adhering to the respective "prefer" options.
API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.
Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.
Setting a "prefer_ipv6" option automtially enables ipv6 dns family query.
"
* 'calle/ipv6' of https://github.com/elcallio/scylla:
init: Use the "prefer_ipv6" options available for rpc/listen address/interface
inet_address: Add optional "preferred type" to lookup
config: Add rpc_interface_prefer_ipv6 parameter
config: Add listen_interface_perfer_ipv6 parameter
config.cc: Fix enable_ipv6_dns_lookup actual param name
The FIXME was added in the very first commit ("utils: Convert
utils/FBUtilities.java") that introduced the fb_utilities class as a
stub. However, we have long implemented the parts that we actually use,
so drop the FIXME as obsolete. In addition, drop the remaining
uncommented Java code as unused and also obsolete.
Message-Id: <20190808182758.1155-1-penberg@scylladb.com>
The Maven build tool ("mvn"), which is used by scylla-jmx and
scylla-tools-java, stores dependencies in a local repository stored at
$HOME/.m2. Make sure it's accessible to dbuild.
Message-Id: <20190808140216.26141-1-penberg@scylladb.com>
ignore_partial_runs() brings confusion, because its returning true doesn't
mean partial runs are filtered out of compaction. It actually means not
caring about compaction of a partial run.
The logic was wrong because any compaction strategy that chooses not to
ignore partial sstable runs[1] would incorrectly have every fragment
composing such a run become a candidate for compaction.
This problem could make compaction include only a subset of the fragments
composing the partial run, or even make the same fragment be compacted
twice due to parallel compaction.
[1]: a partial sstable run is an sstable run that is still being generated
by compaction and as a result cannot be selected as a candidate whatsoever.
The fix makes sure a partial sstable run has none of its fragments
selected for compaction, and also renames ignore_partial_runs().
Fixes #4729.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
This documents the steps needed to build Scylla's Linux packages with
the relocatable package infrastructure we use today.
Message-Id: <20190807134017.4275-1-penberg@scylladb.com>
Running "dbuild" without a build command fails as follows:
$ ./tools/toolchain/dbuild
Error: This command has to be run under the root user.
Israel Fruchter discovered that the default command of our Docker image is this:
"Cmd": [
"bash",
"-c",
"dnf -y install python3-cassandra-driver && dnf clean all"
]
Let's make "/bin/bash" the default command instead, which will make
"dbuild" with no build command return to the host shell.
Message-Id: <20190807133955.4202-1-penberg@scylladb.com>
The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to
the configure.py script. This is redundant as the
"scylla-package.tar.gz" target of ninja already limits itself to them.
Removing the "--with" options allows building unit tests after a
relocatable package has been built without having to rebuild anything.
Message-Id: <20190807130505.30089-1-penberg@scylladb.com>
Add a '--mode <mode>' command line option to the 'build_reloc.sh' script
so that we can create relocatable packages for debug builds.
The '--mode' command line option defaults to 'release' so existing users
are unaffected.
Message-Id: <20190807120759.32634-1-penberg@scylladb.com>
Avoid including the lengthy stream_session.hh in messaging_service.
More importantly, fix the build, because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
"streaming: Send error code from the sender to receiver" to the 3.0
branch.
Refs: #4789
The stream close() guarantees the data sent will be flushed. No need to
call the stream flush() since the stream is not reused.
Follow-up fix for commit bac987e32a (streaming: Send error code from
the sender to receiver).
Refs #4789
* seastar d199d27681...a1cf07858b (1):
> Merge 'Do not return a variadic future form server_socket::accept()' from Avi
Seastar configure.py now has --api-level=1, to keep us on the old variadic future
server_socket::accept() API.
In case of an error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass a mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violate the mutation_fragment
stream rule.
To fix this, we need to propagate the error. However, RPC streaming does
not support propagating the error in the framework. The user has to send
an error code explicitly.
Fixes: #4789
Assembles information and attributes of sstables in one or more
column families.
v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
"
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).
To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available
compression levels.
Resolves #2613.
This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.
Example usage (assuming there exists a keyspace 'a'):
create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'};
Notes:
1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule.
2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK.
3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit.
"
* 'zstd' of https://github.com/kbr-/scylla:
Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
Add unit tests for Zstd compression
Enable tests that use compressed sstable files
Add ZStandard compression
Fix the value of the chunk length parameter passed to compressors
The fragment reader calls `fast_forward_to()` from its constructor to
discard fragments that fall outside the query range. Move the
fast-forward code into an internal void-returning method, and call
that from both the constructor and `fast_forward_to()`, to avoid a
warning on a discarded future<>.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>