Any alter table statement that doesn't explicitly set the default time
to live will reset it to 0.
That can be very dangerous for time series use cases, which rely on
all data being eventually expired, and a default TTL of 0 means
data never being expired.
Fixes#5048.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com>
podman is compatible with docker, but by default emits a manifest
format that is not understood by old docker clients, so give it
an extra flag to generate the old format instead.
Message-Id: <20200406134526.21521-1-avi@scylladb.com>
There is no reason why the table code has to be aware of the efforts of
rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it.
Rewrite is special, because we need to do it one SSTable at a time,
without lumping it together. However, the compaction manager is totally
capable of doing that itself. If we do that, the special
"table::rewrite_sstables" can be killed.
This code would maybe be better off as a thread, where we wouldn't need
to keep state. However there are some methods like maybe_stop_on_error()
that expect a future so I am leaving this be for now. This is a cleanup
that can be done later.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200401162722.28780-2-glauber@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6136 from
Piotr Sarna:
An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test (#5856) resulted in introducing an assertion (7616290)
which breaks when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.
Fixes#6134
Refs #5856
Tests: unit(dev),
dtest(TestSecondaryIndexes.test_truncate_base)
To use install.sh as Scylla install script w/o using .rpm/.deb package,
we need to provide a way to upgrade Scylla version, not just install.
With --upgrade option, install.sh does not overwrite config files.
It will install <filename>.new file on same directory, when old config file and
new config file does not contain same data.
If old one and new one is exactly same, it will nothing.
To implement this, rewriting api_ui_dir/api_doc_dir path on scylla.yaml
moved from .rpm/.deb scriptlet to install.sh.
Fixes#5874
On some environment systemd-coredump does not work with symlink directory,
we can use bind-mount instead.
Also, it's better to check systemd-coredump is working by generating coredump.
To fix#5916, drop scylla_coredump_setup from .rpm %post scriptlet.
Fixes#5753Fixes#5916
"
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.
There are two motivations for this work:
Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.
We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects
In order to achieve that, we start by changing the way resharding works:
it will now work with a read interposer, similar to the one TWCS uses for
streaming data. Once we do that, a lot of assumptions that exist in the
compaction code can be simplified and supporting TWCS major
compactions become a matter of simply enabling its interposer in the
compaction code as well.
There are many further simplifications that this work exposes:
The compaction method create_new_sstable seems out of place. It is not
used by resharding, and it seems duplicated for normal compactions. We
could clean it up with more refactoring in a later patch.
The whole logic of the feed_writer could be part of the consumer code.
Testing details:
scylla unit tests (dev, release)
sstable_datafile_test (debug)
dtests (resharding_test.py)
manual scylla resharding
Fixes#1431
"
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
* 'twcs-major-v3' of github.com:glommer/scylla:
compaction: make major compaction time-aware with TWCS
compaction: do resharding through an interposer
mutation_writer: introduce shard_based splitting writer
mutation_writer: factor out part of the code for the timestamp splitter
compaction: abort if create_new_sstable is called from resharding
If get_schema_for_read() fails "prune" counter will not be decremented.
The patch fixes it by creating RAI object earlier. Also return releasing
of a mutation in release_mutation() which was dropped by mistake.
Fixes#6124
Message-Id: <20200405080233.GA22509@scylladb.com>
Since commit 9948f548a5, the LWT no longer
requires an "experimental" flag, so Alternator documents and scripts
which referred to the need for enabling experimental LWT, are fixed here
to no longer do that.
Fixes#6118.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200405143237.12693-1-nyh@scylladb.com>
When trying to get rid of a large stack warning for gossip test,
I found out that it actually does not run at all for multiple reasons:
1. It segfaults due to wrong initialization order
2. After fixing that, it segfaults on use-after-free (due to capturing
a shared pointer by reference instead of by copy)
3. After that, cleanups are in order:
* seastar thread does not need to be spawned inside another thread;
* default captures are harmful, so they're made explicit instead;
* db::config is moved to heap, to finally get rid of the warning.
Tests: manual(gossip)
Message-Id: <feaca415d0d29a16c541f9987645365310663630.1585128338.git.sarna@scylladb.com>
In order to check regressions related to #6136 and similar issues,
test cases for handling paging state with empty partition/clustering
key pair are added.
An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test resulted in introducing an assertion which breaks
when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.
Fixes#6134
Refs #5856
Tests: unit(dev),
dtest(TestSecondaryIndexes.test_truncate_base)
Specify --pull in order to refresh the base image (some Fedora release).
Usually this is not important, because we run `dnf update`. But if the
cached image happens to be a pre-release version of Fedora, the image
will have the update-testing repository enabled, and we may get some
unwanted updates.
It's sad that we need two separate flags for correctness (the other
is --no-cache.
Message-Id: <20200405164227.8210-1-avi@scylladb.com>
The default serialization path for items was subtly broken -
instead of parsing JSON string representation of objects,
it tried to parse a regular string implementation - which is often
also a valid JSON, but nothing guarantees that it actually is.
Tests: alternator-test(local)
Message-Id: <e1668bf4e9029f2675a4ac28bb4598714575efeb.1586096732.git.sarna@scylladb.com>
Merge patch series from Piotr Sarna:
This series adds extra precautions against potential races
in view building. In particular, it was based on the following scenario:
1. View builder detects that a view V is no longer here, so it schedules
removing its info from bookkeeping, without any semaphores,
and this continuation gets preempted immediately.
2. A view is deleted and recreated with the same name - V.
3. View V building is finished.
4. The continuation from (1.) is finally executed, and it removes old view V
info from bookkeeping - which is a problem, since view building
bookkeeping is based on *names*, not *uuids* - consequently,
the new view bookkeeping info is erroneously removed.
The issue is solved by putting startup code (which also does cleanup
from point (1.)) under the same semaphore as other bookkeeping
operations. With that, it will be impossible to execute step (2.)
before (1.) ends, which effectively prevents the race.
Refs #6094 (possible fixes it too, but since I could not reproduce
the issue...)
Tests: unit(dev)
Piotr Sarna (4):
db,view: fix waiting for a view building future
db,view: remove unneeded implicit capture-by-reference
db,view: nitpick: change & operator to && for booleans
db,view: guard view builder startup with a semaphore
db/view/view.cc | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
This removes the need to include reactor.hh, a source of compile
time bloat.
In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.
Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().
Ref #1
This allows us to drop a #include <reactor.hh>, reducing compile time.
Several translation units that lost access to required declarations
are updated with the required includes (this can be an include of
reactor.hh itself, in case the translation unit that lost it got it
indirectly via logalloc.hh)
Ref #1.
The startup routine performs some bookkeeping operations on views,
and so do these events:
- on_create_view;
- on_drop_view;
- on_update_view.
Since the above events are guarded with a semaphore, the startup
routine should also take the same semaphore - in order to ensure
that all bookkeeping operations are serialized.
Refs #6094
The future was marked with a `FIXME: discarded future`, but there's really
no reason not to wait for it, and it was probably meant to be waited for
since its implementation.
When we switched token representation to int64_t
we added some sanity checks that byte representation
is always 8 bytes long.
It turns out that for token_kind::before_all_keys and
token_kind::after_all_keys bytes can sometimes be empty
because for those tokens they are just ignored. The check
introduced with the change is too strict and sometimes
throws the exception for tokens before/after all keys
created with empty bytes.
This patch relaxes the condition of the check and always
uses 0 as value of _data for special before/after all keys
tokens.
Fixes#6131
Tests: unit(dev, sct)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
* seastar 41c83ec...fd9af3a (7):
> stall_detector: Delete unused member variable
> future: Avoid a move in finally_body
> Merge "Followup cleanups for the apply/invoke split" from Rafael
> Merge "make trivial future related functions noexcept" from Benny
> rpc_test: silence depreceted lambda logger warning
> rpc_demo: stop using variadic futures
> future: Move two static_asserts to the top
Currently we call `on_internal_error()` if `tri_compare()` throws
`marshal_exception`. Some compare paths however might go around
`tri_compare()` and call `abstract_type::compare()` directly. Move the
check there to cover these cases too.
Tests: dev
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200403162530.1175801-1-bdenes@scylladb.com>
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.
There are two motivations for this work:
1. Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.
2. We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects
With the motivation out of the way, let's talk about the implementation:
The implementation is quite simple and builds upon the previous patches.
It simply specializes the interposer implementation for regular compaction
with a table-specific interposer.
Fixes#1431
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Our resharding code is complex, since the compaction object has to keep
track of many output SSTables, the current shard being written.
When implementing TWCS streaming writers, we ran away from such
write-side complexity by implementing an interposer: the interposer
consumes the flat_mutation_reader stream, creating many different writer
streams. We can do a similar thing for resharding SSTables and have each
writer be guaranteed to contain keys for only a specific source shard.
As we do that, we can move the SSTable and sstable_writer information
to the compacting_sstable_writer object. The compaction object will no
longer be responsible for it and can be simplified, paving the way for
TWCS-major, which will go through an interposer as well.
Note that the compaction_writer, which now holds both the SSTable
pointer and the sstable_writer still needs to be optional. This is
because LCS (and potentially others) still want to create more than
one SSTable per source stream. That is done to guarantee that each
SSTable complies with the max_sstable_size parameter, which is
information available in the sstable_writer that is not present at
the level of the flat_mutation_reader. We want to keep it in the writer
side.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The storage_proxy instances hold references to token_metadata ones and
leave unwaited futures continuing to its query_partition_key_range_concurrent
method.
The latter is called from do_query so it's not that easy to find
out who is leaking. Keep the tokens not freed for a while.
Fixes: #6093
Test: manual start-stop
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200402183538.9674-1-xemul@scylladb.com>
Speculative reader has more targets that needed for CL. In case there is
a digest mismatch the repair runs between all of them, but that violates
provided CL. The patch makes it so that repair runs only between
replicas that answered (there will be CL of them).
Fixes#6123
Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
distcc doesn't like the -x c++ flag, so create an empty.cc file for
this purpose and compile it.
Also drop the "=" from "--include=", which is also disliked by
distcc.
Message-Id: <20200402124312.48963-1-avi@scylladb.com>
This is similar to the timestamp based splitting writer, except
that it splits data based on the shard where the partition key
is supposed to be placed.
It is similar to the multishard_writer, in the sense that it
creates n streams for n shards, but it does not want to process
the streams in the owner shards. We want to use that in processes
like resharding where it is fine for a foreign shard to deal
with a mutation.
One option would be to augment the multishard_writer to optionally
achieve these properties, but having a separate splitter is both
simpler and faster.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
I am about to introduce a new splitter. Therefore, move parts of it
that are common to its own file.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
I am about to get rid of the _shard attribute in the compaction object,
as I will create different streams of writers for different shards.
In preparation for that, remove the arbitrary _shard reference. Raphael
confirms that resharding should never be calling this, as this method is
used exclusively for garbage collection component of run-based
compaction. Therefore we'll just throw in this case and remove the shard
reference.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The shard parameter is ignored for SSTable creation on regular
compaction. It is still good practice and good future proofing
to pass something meaningful here instead of zero. This patch
passes the id of the current shard.
Thanks Botond for pointing that out.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402122212.12218-1-glauber@scylladb.com>
"
This patchseries is part of my effort to make resharding less special -
and hopefully less problematic. The next steps are a bit heavy, so I'd
like to, if possible, get this out of the way.
After these two patches, there is no more need to ever call
reshard_sstables: compact_sstables will do, and it will be able to
recognize resharding compactions.
To do that we need to unify the creator function, which is trivially
done by adding a shard parameter to regular compactions as well: they
can just ignore it. I have considered just making the
compaction_descriptor have a virtual create() function and specializing
it, but because we have to store the creator in the compaction object I
decided to keep the virtual function for now.
In a later cleanup step, if we can for instance store the entire
compaction_descriptor object in the compaction object we could do that.
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit tests (dev), dtest (resharding.py)
"
* 'resharding-through-compact-sstables' of github.com:glommer/scylla:
resharding: get rid of special reshard_sstables
compaction: enhance compaction_descriptor with creator and replace function
This reverts commit fdd2d9de3d because it
breaks one heat-weighted load balancing dtest:
FAIL: heat_weighted_load_balancing_cl_QUORUM_test (heat_weighted_load_balancing_test.HeatWeightedLB)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 182, in heat_weighted_load_balancing_cl_QUORUM_test
self.run_heat_weighted_load_balancing('QUORUM')
File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 165, in run_heat_weighted_load_balancing
self.verify_metrics(metrics, cached=False)
File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 73, in verify_metrics
mean_avg, node_mean_avg, key))
AssertionError: 19.0 not found in range(3, 13) : Cache difference between nodes is less then expected: 6469.6/328.2, metric scylla_storage_proxy_coordinator_reads_local_node
I am reverting because it's a test issue, and we should bring this
commit back once the test is fixed.
Gleb Natapov explains:
"dtest result directly depends on replicas we contact. Glauber's patch
make us contacts less replicas, so numbers differ."
The "run" script for the Alternator tests needs to set a system table for
authentication credentials, so we can test this feature.
So far we did this with cqlsh, but cqlsh isn't always installed on build
machines. But install-dependencies.sh already installs the Cassandra driver
for Python, so it makes more sense to use that, so this patch switches to
use it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200331131522.28056-1-nyh@scylladb.com>