Currently, calling unfreeze() using the wrong version of the schema
results in undefined behavior. That can cause hard-to-debug
problems. Better to throw in such cases.
Refs #4549.
Tests:
- unit (dev)
Message-Id: <1560459022-23786-1-git-send-email-tgrabiec@scylladb.com>
We saw a node crashing today with nodetool clearsnapshot being called.
After investigation, the reason is that nodetool clearsnapshot ws called
at the same time a new snapshot was created with the same tag. nodetool
clearsnapshot can't delete all files in the directory, because new files
had by then been created in that directory, and crashes on I/O error.
There are, many problems with allowing those operations to proceed in
parallel. Even if we fix the code not to crash and return an error on
directory non-empty, the moment they do any amount of work in parallel
the result of the operation becomes undefined. Some files in the
snapshot may have been deleted by clear, for example, and a user may
then not be able to properly restore from the backup if this snapshot
was used to generate a backup.
Moreover, although we could lock at the granularity of a keyspace or
column family, I think we should use a big hammer here and lock the
entire snapshot creation/deletion to avoid surprises (for example, if a
user requests creation of a snapshot for all keyspaces, and another
process requests clear of a single keyspace)
Fixes#4554
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190614174438.9002-1-glauber@scylladb.com>
In practice, we always want to use the same sanitizer flags with
seastar and scylla. Seastar was already marking its sanitizer flags
public, so what was missing was exporting the link flags via pkgconfig
and dropping the duplicates from scylla.
I am doing this after wasting some time editing the wrong file.
This depends on the seastar patch to export the sanitizer flags in
pkgconfig.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We used to use /opt/scylladb just for Scylla build toolchain and
dependency libraries, not for Scylla main package.
But since we merged relocatable package, Scylla main binary and
dependency libraries are all located under /opt/scylladb, only
setup scripts remained on /usr/lib/scylla.
It strange to keep using both /usr/lib/<app name> and /opt/<app name>,
we should merge them into single place.
Message-Id: <20190614011038.17827-1-syuu@scylladb.com>
Before this patch mc sstables writer was ignoring
empty cellpaths. This is a wrong behaviour because
it is possible to have empty key in a map. In such case,
our writer creats a wrong sstable that we can't read back.
This is becaus a complex cell expects cellpath for each
simple cell it has. When writer ignores empty cellpath
it writes nothing and instead it should write a length
of zero to the file so that we know there's an empty cellpath.
Fixes#4533
Tests: unit(release)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>
Consider
master: row(pk=1, ck=1, col=10)
follower1: row(pk=1, ck=1, col=20)
follower2: row(pk=1, ck=1, col=30)
When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1,
ck=1, col=30) from follower1 and follower2.
Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1,
col=30) to follower1, follower1 will write the row with the same
pk=1, ck=1 twice, which violates uniqueness constraints.
To fix, we apply the row with same pk and ck into the previous row.
We only needs this on repair follower because the rows can come from
multiple nodes. While on repair master, we have a sstable writer per
follower, so the rows feed into sstable writer can come from only a
single node.
Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test
Fixes: #4510
Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>
On repair follower node, only decorated_key_with_hash and the
mutation_fragment inside repair_row are used in apply_rows() to apply
the rows to disk. Allow repair_row to initialize partially and throw if
the uninitialized member is accessed to be safe.
Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>
To provide test reproducibility use the seastar local_random_engine.
To reproduce a run, use the --random-seed command line option
with the seed printed accordingly.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613114307.31038-1-bhalevy@scylladb.com>
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.
This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).
Fixes#4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>
Relocation of python scripts mentions scylla-server in paths explicitly.
It should use {{product}} instead. The current build is failing when
{{product}} is different than scylla-server
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190613012518.28784-1-glauber@scylladb.com>
On branch-3.1 / master, we are getting following error:
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
It seems like owner verification of data directory fails because
scylla-server process is running in root but data directory owned by
scylla, so we should run services as scylla user.
Fixes#4536
Message-Id: <20190611113142.23599-1-syuu@scylladb.com>
To avoid 'Bad permmisons' error when user changed default umask, we need
to verify system umask is acceptable for scylla-server.
Fixes#4157
Message-Id: <20190612130343.6043-1-syuu@scylladb.com>
* seastar 253d6cb...ded50bd (14):
> Only export sanitizer flags if used
> perftune.py: use pyudev.Devices methods instead of deprecated pyudev.Device ones
> Add a Sanitize build mode
> Merge "perftune.py : new tuning modes" from Vlad
> reactor: clarify how submit_to() destroys the function object
> Export the sanitizer flags via pkgconfig
> smp: Delete unprocessed work items
> iotune: fixed finding mountpoint infinite loop
> net: Fix dereferencing moved object
> Always enable the exception scalability hack
> Merge "Simple cleanups in future.hh" from Rafael
> tests: introduce testing::local_random_engine
> core/deleter: Fix abort when append() is called twice with a shared deleter
> rpc stream: do not crash if a stream is used after eos
Currently, REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the
repair_hash object. In the future, we will use set reconciliation
algorithm to decode the full row hashes in working row buf. It is useful
to return the number of rows inside working row buf in addition to the
combined row hashes to make sure the decode is successful.
It is also better to use a wrapper class for the verb response so we can
extend the return values later more easily with IDL.
Fixes#4526
Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>
With this patch, when using asan, we poison segment memory that has
been allocated from the system but should not be accessible to user
code.
Should help with debugging user after free bugs.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190607140313.5988-1-espindola@scylladb.com>
Introduced in 513d01d53e
The script is trying to determine the branch to shallow clone
when an rpm is missing and has to be built.
This functionality in the current implementation assumes it is being run inside
a git repository, but that must not be the case if the script is triggered after
local rpms were placed on the local directory.
This happens when putting all necessary rpm files in: dist/ami/files
And then running: dist/ami/build_ami.sh --localrpm
The dist/ami/ and dist/ami/files are the only ones required for this action so
querying the git repository in that situation makes no sense.
Fixes#4535
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190611112455.13862-1-bhalevy@scylladb.com>
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent legacy
views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before
returning it to the user. That may cause duplicated clustering row
fragments to be emitted by the reader, which may cause undefined
behaviour in consumers. The solution is to keep track of previous
clustering keys for each partition and drop fragments that would cause
duplication. That way if any shard is still building a view, its
progress will be returned, and if many shards are still building, the
returned value will indicate the progress of a single arbitrary shard.
Fixes#4524
Tests:
unit(dev) + custom monotonicity checks from tgrabiec@scylladb.com
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent
legacy views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before returning
it to the user. That may cause duplicated clustering row fragments to be
emitted by the reader, which may cause undefined behaviour in consumers.
The solution is to keep track of previous clustering keys for each
partition and drop fragments that would cause duplication. That way if
any shard is still building a view, its progress will be returned,
and if many shards are still building, the returned value will indicate
the progress of a single arbitrary shard.
Fixes#4524
Tests:
unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>
All Scylla code is written with "using namespace seastar", i.e., no
"seastar::" prefix for Seastar symbols. Document this in the coding style.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190610203948.18075-1-nyh@scylladb.com>
Fixes#4525
req_param uses boost::lexical cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.
Message-Id: <20190610140511.15478-1-calle@scylladb.com>
Currently, each shard protects itself by not reading from rpc and the native
transport if in-flight requests consume too much memory for that shard. However,
if all shards then forward their requests to some other shard, then that shard
can easily run out of memory since its load can be multiplied by the number of
shards that send it requests.
To protect against this, use the new Seastar smp_service_group infrastructure.
We create three groups: read, write, and write ack (the latter is needed to
avoid ABBA deadlocks is shard A exhausts all its resources sending writes to shard B,
and shard B simulateously does the same; neither will be able to send
acknowledgements, so if the writes are throttled, they will never be unthrottled
until a timeout occurs).
Range scans are not addressed by this patch since they are handled by
multishard_mutation_query, which has its own complex cross-shard communication
scheme, but it be a similar solution.
Ref #1105 (missing range scan protection)
Tests: unit (dev)
Message-Id: <20190512142243.17795-1-avi@scylladb.com>
If a port value passed as a string this makes the cluster.connect() to
fail with Python3.4.
Let's fix this by explicitly declaring a 'port' argument as 'int'.
Fixes#4527
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190606133321.28225-1-vladz@scylladb.com>
We want to use the same branch on the other repos build-ami needs
as the one we're building for. Automatically find the current branch
using the `git branch` command.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190606133648.15877-1-bhalevy@scylladb.com>
This patch adds a warning option to the user for situations where
rows count may get bigger than initially designed. Through the
warning, users can be aware of possible data modeling problems.
The threshold is initially set to '100,000'.
Tests: unit (dev)
Message-Id: <20190528075612.GA24671@shenzou.localdomain>
A lot of code in scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR
is not defined. In particular, refill_emergency_reserve in the default
allocator case is empty, but in the seastar allocator case it compacts
segments.
I am trying to debug a crash that seems to involve memory corruption
around the lsa allocator, and being able to use a debug build for that
would be awesome.
This patch reduces the differences between the two cases by having a
common segment_pool that defers only a few operations to different
segment_store implementations.
Tests: unit (debug, dev)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190606020937.118205-1-espindola@scylladb.com>
Prometheus needs to remember which "instance" (node) each measurement
came from. But it doesn't actually need Scylla to tell it the instance
name - it knows which node it got each measurement from.
After Seastar commit 79281ef287
which fixed Seastar issue https://github.com/scylladb/seastar/issues/477,
the "instance" label on measurements no longer comes from Scylla but rather
is added by Prometheus. This patch corrects the documentation to explain the
current situation, instead of incorrectly saying that Scylla adds the
"instance" label itself.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190602074629.14336-1-nyh@scylladb.com>
The relocatable Python is built from Fedora packages. Unfortunately TLS
certificates are in a different location on Debian variants, which
causes "node_exporter_install" to fail as follows:
Traceback (most recent call last):
File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module>
data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
File "/usr/lib/scylla/scylla_util.py", line 40, in curl
with urllib.request.urlopen(req) as res:
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
Unable to retrieve version information
node exporter setup failed.
Fix the problem by overriding the SSL_CERT_FILE environment variable to
point to the correct location of the TLS bundle.
Message-Id: <20190604175434.24534-1-penberg@scylladb.com>
The to_json_string utility implementation was based on const references
instead of views, which can be a source of unnecessary memory copying.
This patch migrates all to_json_string to use bytes_view and leaves
the const reference version as a thin wrapper.
Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>
"
Technically queue_reader already exists, however so far it was a
private utility in `multishard_writer.cc`. This mini-series makes it
public and generally useful. The interface is made safer and simpler and
the implementation is improved so it doesn't have two separate buffers.
Also, unit tests are added.
Tests: mutation_reader_test:debug/test_queue_reader, multishard_writer_test:debug
"
* 'queue_reader/v2' of https://github.com/denesb/scylla:
queue_reader: use the reader's buffer as the queue
Make queue_reader public
The queue reader currently uses two buffers, a `_queue` that the
producer pushes fragments into and its internal `_buffer` where these
fragments eventually end up being served to the consumer from.
This double buffering is not necessary. Change the reader to allow the
producer to push fragments directly into the internal `_buffer`. This
complicates the code a little bit, as the producer logic of
`seastar::queue` has to be folded into the queue reader. On the other
hand this introduces proper memory consumption management, as well as
reduces the amount of consumed memory and eliminates the possibility of
outside code mangling with the queue. Another big advantage of the
change is that there is now an explicit way to communicate the EOS
condition, no need to push a disengaged `mutation_fragment_opt`.
The producer of the queue reader now pushes the fragments into the
reader via an opaque `queue_reader_handle` object, which has the
producer methods of `seastar::queue`.
Existing users of queue readers are refactored to use the new interface.
Since the code is more complex now, unit tests are added as well.
We have a script in tree that fixes the schema for distributed system
tables, like tracing, should they change their schema. We use it all the
time but unfortunately it is not distributed with the scylla package,
which makes it using it harder (we want to do this in the server, but
consistent updates will take a while).
One of the problems with the script today that makes distributing it
harder is that it uses the python3 cassandra driver, that we don't want
to have as a server dependency. But now with the relocatable packages in
place there is no reaso not to just add it.
[avi: adjust tools/toolchain/image to point to a new image with
python3-cassandra-driver]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190603162447.24215-1-glauber@scylladb.com>