Currently, if sstable::write_components() is called to write a new sstable
using the same generation of a sstable that exists, a temporary TOC will
be unconditionally created. Afterwards, the same sstable::write_components()
will fail when it reaches sstable::create_data(), because the data component
already exists for that generation (in this scenario).
After that, the user will not be able to boot Scylla anymore because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with TOC and temporary TOC because user data will be lost (again,
in this scenario). After all, the temporary TOC was only created because
sstable::write_components() was wrongly called with the generation of a
sstable that exists.
The solution proposed by this patch is to throw an exception if a TOC file
already exists for the generation used.
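The guard described above can be sketched as follows. This is a minimal illustration, not the actual sstable code: the file naming in toc_path() and the helper names file_exists() and ensure_generation_unused() are assumptions made up for this example.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical naming scheme, used only for this sketch.
std::string toc_path(const std::string& dir, long generation) {
    return dir + "/sketch-" + std::to_string(generation) + "-TOC.txt";
}

// Treats "can be opened for reading" as "the component exists".
bool file_exists(const std::string& path) {
    return std::ifstream(path).good();
}

// Must be called before any component (including a temporary TOC) is
// created. Throws if a TOC already exists for the generation, so the
// existing sstable is left untouched.
void ensure_generation_unused(const std::string& dir, long generation) {
    const std::string toc = toc_path(dir, generation);
    if (file_exists(toc)) {
        throw std::runtime_error("TOC already exists for generation "
                                 + std::to_string(generation) + ": " + toc);
    }
}
```

The point of checking first is that a failed write then leaves no temporary TOC next to the TOC of the existing sstable.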
Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.
Refs #1014.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
(cherry picked from commit 031bf57c19)
Checking schema::is_dense() is not enough to know whether row marker
should be inserted or not as there may be compact storage tables that
are not considered dense (namely, a table with no clustering key).
Row marker should only be inserted if schema::is_cql3_table() is true.
Fixes #931.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456834937-1630-1-git-send-email-pdziepak@scylladb.com>
corrupt_segment() is meant to write some garbage at an arbitrary position
in the commitlog segment. That position is not necessarily properly
aligned for uint32_t.
Silences ubsan complaints about unaligned write.
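One common way to store a uint32_t at an arbitrary byte offset without an unaligned access is std::memcpy. The sketch below shows that technique only; the helper names write_u32_at() and read_u32_at() are hypothetical, and the patch itself may use a different mechanism.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stores `value` at byte offset `pos`. The caller must ensure that
// pos + sizeof(value) <= buf.size().
void write_u32_at(std::vector<char>& buf, std::size_t pos, std::uint32_t value) {
    // Writing through a uint32_t* obtained by casting buf.data() + pos is
    // undefined behavior when that address is not suitably aligned.
    // memcpy has no alignment requirement.
    std::memcpy(buf.data() + pos, &value, sizeof(value));
}

// Reads back a uint32_t from byte offset `pos`, again without assuming
// any alignment.
std::uint32_t read_u32_at(const std::vector<char>& buf, std::size_t pos) {
    std::uint32_t value;
    std::memcpy(&value, buf.data() + pos, sizeof(value));
    return value;
}
```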
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456827726-21288-1-git-send-email-pdziepak@scylladb.com>
In each gossip round, i.e., gossiper::run(), we do:
1) send syn message
2) peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
         on_change
4) send back ack2 message
5) peer node: process ack2 message
   apply_state_locally
At the moment, syn is a "wait" message that times out after 3 seconds. In
step 3, all the registered gossip callbacks are called, which might take a
significant amount of time to complete.
In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
will no longer get an ack message as the return value of a syn message,
so a GOSSIP_DIGEST_ACK message verb is introduced.
With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. We will no longer delay the gossip
round, which is supposed to run every second, by 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).
Later, we can talk to the 1-3 peer nodes in parallel to reduce
latency even more.
Refs: #900
Test auto-generated and writer-based serialization as well as
deserialization of simple compound types, vectors and variants.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
For simplicity, we want to have keys serializable and deserializable
without schema for now. We will serialize keys in a generic form of a
vector of components where the format of components is specified by
CQL binary protocol. So conversion between keys and vector of
components needs to be possible to do without schema.
We may want to make keys schema-dependent again in the future to apply
space optimizations specific to column types. Existing code should
still pass schema& to construct and access the key when possible.
One optimization had to be reverted in this change - avoidance of
storing key length (2 bytes) for single-component partition keys. One
consequence of this, in addition to a bit larger keys, is that we can
no longer avoid copy when constructing single-component partition keys
from a ready "bytes" object.
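To illustrate what schema-independent conversion between a key and a vector of components can look like, here is a minimal sketch. It assumes a layout in which every component is prefixed by a 2-byte big-endian length; that layout and the names serialize_key() and deserialize_key() are assumptions for this example, not the exact format used by the patch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using bytes = std::string;

// Concatenates components, each prefixed with its 2-byte length.
// No schema is needed: only the raw component bytes are used.
bytes serialize_key(const std::vector<bytes>& components) {
    bytes out;
    for (const bytes& c : components) {
        if (c.size() > 0xFFFF) {
            throw std::length_error("component does not fit a 2-byte length");
        }
        out.push_back(static_cast<char>((c.size() >> 8) & 0xFF));
        out.push_back(static_cast<char>(c.size() & 0xFF));
        out.append(c);
    }
    return out;
}

// Splits the serialized form back into components.
std::vector<bytes> deserialize_key(const bytes& in) {
    std::vector<bytes> components;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) {
            throw std::runtime_error("truncated component length");
        }
        const std::size_t len =
            (static_cast<std::size_t>(static_cast<unsigned char>(in[pos])) << 8)
            | static_cast<std::size_t>(static_cast<unsigned char>(in[pos + 1]));
        pos += 2;
        if (in.size() - pos < len) {
            throw std::runtime_error("truncated component");
        }
        components.push_back(in.substr(pos, len));
        pos += len;
    }
    return components;
}
```

With such a layout, a single-component key also carries the 2-byte length, so it is slightly larger than the raw "bytes" value and has to be copied when it is built from one.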
I haven't noticed any significant performance difference in:
tests/perf/perf_simple_query -c1 --write
It does ~130K tps on my machine.
Fixes #868.
Registering exit hooks while reactor is already iterating over exit
hooks is not allowed and currently leads to undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.
We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting exit sequence in
the middle of initialization sequence would result in use after free
errors. Btw, I'm not sure if there's currently anything which prevents
this.
To solve this problem, move the exit hook to the initialization
sequence. In case of tests, the cleanup has to be called explicitly.
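The ordering guarantee and the restriction on late registration can be sketched with a small stand-alone class. This is not the reactor's implementation: the class exit_hooks and the choice to throw on late registration are assumptions for illustration only.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

class exit_hooks {
    std::vector<std::function<void()>> _hooks;
    bool _exiting = false;
public:
    // Registers a hook. Rejected once the exit sequence has started,
    // because the hook list is being iterated at that point.
    void at_exit(std::function<void()> hook) {
        if (_exiting) {
            throw std::logic_error("exit hook registered after exit started");
        }
        _hooks.push_back(std::move(hook));
    }

    // Runs hooks in reverse order of registration.
    void run() {
        _exiting = true;
        for (auto it = _hooks.rbegin(); it != _hooks.rend(); ++it) {
            (*it)();
        }
        _hooks.clear();
    }
};
```

Because at_exit() refuses to modify the list once run() has started, the iteration in run() never observes a vector that is growing underneath it.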
Time a node waits after sending gossip shutdown message in milliseconds.
Reduces ./cql_query_test execution time
from
real 2m24.272s
user 0m8.339s
sys 0m10.556s
to
real 1m17.765s
user 0m3.698s
sys 0m11.578
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.
In practice that's not a problem because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should have been updated but could not be.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
The whole SSTable read path can now take an io_priority. The public functions
take it as a default parameter, which is Seastar's default priority.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compaction manager was initially created at utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
test_cassandra_hash also sort of expects exceptions. ASan causes false
positives here as well with seastar::thread, so do it with normal
continuations.
Message-Id: <1453295521-29580-2-git-send-email-calle@scylladb.com>
From Paweł:
"This series contains some more fixes for issues related to alter table,
namely: incorrect parsing of collection information in comparator, missing
schema::_raw._collections in equality check, missing compatibility
information for utf8->blob, ascii->blob and ascii->utf8 casts."
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is fully false positive, caused by a combo of
seastar threads, exception throwing and externals.
In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now, let's just rewrite the test in question
to not use seastar::async, but normal continuations. Less easy to read,
but passes ASan.
Message-Id: <1453205136-10308-1-git-send-email-calle@scylladb.com>
When compacting an sstable, a mutation that doesn't belong to the current
shard should be filtered out. Otherwise, the mutation would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for sstables to be compacted, so test could fail because of mutations
being filtered out.
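The per-shard filtering described above can be sketched as follows. The mutation struct, the modulo-based shard_of() mapping and filter_for_shard() are simplified assumptions for this example; the real shard assignment is not shown in this message.

```cpp
#include <cstdint>
#include <vector>

struct mutation {
    std::uint64_t token;
};

// Hypothetical mapping from token to owning shard.
unsigned shard_of(std::uint64_t token, unsigned shard_count) {
    return static_cast<unsigned>(token % shard_count);
}

// Keeps only the mutations owned by `this_shard`. When every shard that
// shares the sstable applies this filter, each mutation is written
// exactly once across all compaction outputs.
std::vector<mutation> filter_for_shard(const std::vector<mutation>& input,
                                       unsigned this_shard, unsigned shard_count) {
    std::vector<mutation> kept;
    for (const mutation& m : input) {
        if (shard_of(m.token, shard_count) == this_shard) {
            kept.push_back(m);
        }
    }
    return kept;
}
```

With a single shard (as with -c1) the filter keeps everything, which is why a test that picks arbitrary keys behaves predictably in that configuration.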
Fixes #527.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
Representation format is an implementation detail of
partition_key. Code which compares a value to representation makes
assumptions about key's representation. Compare keys to keys instead.
Message-Id: <1453136316-18125-1-git-send-email-tgrabiec@scylladb.com>
"This series makes sure that Scylla rejects adding a collection if
its column name is the same as a collection that existed before and
their types are incompatible.
Fixes #782"
"This patch is intended to add support to column family cleanup, which will
make 'nodetool cleanup' possible.
Why is this feature needed? To remove irrelevant data from a node that loses
part of its token range to a newly added node."
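What cleanup has to decide for each piece of data can be sketched as a token-range check. The token_range struct (start exclusive, end inclusive, no wraparound) and the functions owns() and cleanup() are simplifying assumptions for this example, not the interface added by the patch.

```cpp
#include <cstdint>
#include <vector>

// Simplified range (start, end], without ring wraparound.
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

// True if `token` falls in any of the ranges the node still owns.
bool owns(const std::vector<token_range>& owned, std::int64_t token) {
    for (const token_range& r : owned) {
        if (token > r.start && token <= r.end) {
            return true;
        }
    }
    return false;
}

// Returns only the tokens the node still owns. Everything else became
// irrelevant when part of the token range moved to another node.
std::vector<std::int64_t> cleanup(const std::vector<std::int64_t>& tokens,
                                  const std::vector<token_range>& owned) {
    std::vector<std::int64_t> kept;
    for (std::int64_t t : tokens) {
        if (owns(owned, t)) {
            kept.push_back(t);
        }
    }
    return kept;
}
```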