Cassandra-derived tools (such as sstable2json) may write commitlog segments
that Scylla cannot recognize. Since we now write them with a distinct name,
we can recognize the name and ignore these segments, as we know the data they
contain is not interesting.
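The name check can be sketched as follows. This is only an illustration: the prefix "Tool-CommitLog-" and the helper names are hypothetical placeholders for whatever distinct name the tools actually write.

```cpp
#include <cassert>
#include <string>

// Hypothetical prefix used by Cassandra-derived tools. The real distinct
// name is not stated in this message, so treat this as a placeholder.
static const std::string foreign_prefix = "Tool-CommitLog-";

// Returns true if `filename` starts with `prefix`.
inline bool starts_with(const std::string& filename, const std::string& prefix) {
    return filename.compare(0, prefix.size(), prefix) == 0;
}

// True for segments written by external tools, which replay can skip.
inline bool is_foreign_segment(const std::string& filename) {
    return starts_with(filename, foreign_prefix);
}
```

A replay loop would call is_foreign_segment() on each file name and skip the file when it returns true.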
Fixes #1112.
Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>
commitlog's sync period is initialized as the batch period, and not as the
sync period itself as it should be.
I've found this by code inspection, but unless I am missing something
really fundamental, this seems to be completely wrong. It's been working
fine because in our defaults, I have checked that both variables default to
the same value. But it seems to me that as soon as anyone changes one
of them, the behavior won't be as expected.
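A minimal sketch of the intended initialization, assuming a configuration with two separate fields. The struct and field names here are illustrative and not copied from the code.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative subset of the commitlog configuration.
struct commitlog_config {
    uint64_t sync_period_in_ms;
    uint64_t batch_window_in_ms;
};

struct segment_manager_settings {
    uint64_t sync_period_in_ms;

    explicit segment_manager_settings(const commitlog_config& cfg)
        // The bug described above corresponds to reading
        // cfg.batch_window_in_ms here instead.
        : sync_period_in_ms(cfg.sync_period_in_ms) {}
};
```

With equal defaults both variants behave the same, which is why the problem only shows up once the two values differ.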
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>
Seastar wrongly limits the number of concurrent submit_to()s to a single
remote shard. This can cause an ABBA deadlock:
  fiberA                          fiberB (x127)

  submit_to(0) # lock schema
    <- returns
                                  submit_to(0) # lock schema (waits)
  submit_to(0) # do work (waits)
The fiberBs wait for fiberA, which in turn waits for a fiberB to return.
While the correct fix is to remove the client-side limit and replace it
with a server-side per-verb limit, we start with a simpler fix that
replaces the blocking lock call with a non-blocking call, removing the
deadlock.
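The difference between a blocking and a non-blocking lock call can be sketched with a lock that reports failure instead of waiting. This is not Seastar code: the class name is hypothetical, and how the caller reacts to a failed attempt is not specified in this message.

```cpp
#include <atomic>
#include <cassert>

// Illustrative lock whose acquisition never waits.
class schema_lock {
    std::atomic<bool> _held{false};
public:
    // Returns true if the lock was taken, false if it is already held.
    // The caller is never parked, so it cannot hold up other work.
    bool try_acquire() { return !_held.exchange(true); }

    void release() { _held.store(false); }
};
```

Because try_acquire() returns immediately, a fiber that fails to get the lock does not occupy a slot while waiting for the holder.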
Fixes #1088.
Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>
At the moment, the callbacks return void, so it is impossible to wait for
the callbacks to complete. Make the callbacks run inside a seastar
thread, so that if we need to wait for an operation, the callback can call
foo_operation().get(). This is easier than making the
callbacks return future<>.
This patch makes sure that every time we need to create a new generation number -
the very first step in the creation of a new SSTable, the respective CF is already
initialized and populated. Failure to do so can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's GitHub issue #1014.
Nothing should be writing to SSTables before we have the chance to populate the
existing SSTables and calculate what the next generation number should be.
However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.
Note that this *cannot* be a part of add_column_family. That adds a column family
to a db in memory only, and if anybody is about to write to a CF, that was most
likely already called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.
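The unwriteable-until-marked idea can be sketched as below. This is a simplified model under stated assumptions: the class is a stand-in, and the method names and the choice to throw are illustrative rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

class column_family {
    bool _writeable = false;          // every CF starts unwriteable
    int64_t _highest_generation = 0;  // updated while populating from disk
public:
    // Called for each SSTable found on disk during population.
    void note_existing_generation(int64_t gen) {
        if (gen > _highest_generation) {
            _highest_generation = gen;
        }
    }

    // Called explicitly once population has finished.
    void mark_ready_for_writes() { _writeable = true; }

    // Refuses to hand out a generation before the CF is ready, so an
    // existing SSTable can never be overwritten by an early writer.
    int64_t next_generation() {
        if (!_writeable) {
            throw std::logic_error("column family not ready for writes");
        }
        return ++_highest_generation;
    }
};
```

An early caller of next_generation() gets an error instead of a generation number that may collide with a table already on disk.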
Signed-off-by: Glauber Costa <glauber@scylladb.com>
During bootstrapping, additional copies of data have to be made to ensure
that the CL is met (see CASSANDRA-833 for details). Our code does
that, but it does not take into account that the bootstrapping node can be
dead, which may cause a request to proceed even though there are not
enough live nodes for it to complete. In such a case the request neither
completes nor times out, so it appears to be stuck from the CQL layer's POV. The
patch fixes this by taking pending nodes into account when checking
that there are enough live nodes for the operation to proceed.
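One possible reading of the check, as a sketch: each pending node raises the number of live endpoints needed by one. The function name and the exact rule are assumptions; the real code may differ.

```cpp
#include <cassert>
#include <cstddef>

// block_for: acknowledgements required by the consistency level.
// pending:   bootstrapping nodes that must receive an extra copy.
// live:      natural and pending endpoints that are currently alive.
inline bool has_sufficient_live_nodes(std::size_t block_for,
                                      std::size_t pending,
                                      std::size_t live) {
    return live >= block_for + pending;
}
```

For example, with block_for = 2 and one pending node, two live endpoints are no longer enough, so the request can fail up front instead of hanging.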
Fixes #965
Message-Id: <20160303165250.GG2253@scylladb.com>
Currently schema changes are only logged at the coordinator node which
initiates the change. It would be helpful in post-mortem analysis to
also see when and how schema changes are resolved when applied on
other nodes.
Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>
While it is formally better to take a local lock first and
then contend for a global one, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. The old version might
cause us to do a gate::leave even though we never entered.
And since we should really only have one active (contending)
segment per shard anyway, it should not matter.
Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
The segment->segment_manager pointer has, until now, been a raw pointer,
which in a way is sensible, since making circular shared pointer
relations is in general bad. However, since the code and life cycle
of segments have evolved quite a bit since that initial relation
was defined, becoming both more and then suddenly, in a sense,
less, asynchronous over time, the usage of the relation is in fact
more consistent with a shared pointer, in that a segment needs to
access its manager to properly do things like write and flush.
These two ops in particular depend on accessing the segment manager
in a way that might be fine even using raw pointers, if it was not
again for that little annoying thing of continuation reordering.
So, let's just make the relation a shared pointer, solving the issue
of whether the manager is alive when a segment accesses it. If it
has been "released" (shut down), the existing mechanisms (gate)
will then trigger and prevent any actual _actions_ from taking
place. And we don't have to complicate anything else even more.
The only "big" change is that we need to explicitly orphan all
segments in commitlog destructor (segment_manager is essentially
a p-impl).
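The ownership relation can be sketched with std::shared_ptr. This is a simplified stand-in, not the actual commitlog classes: the member names and the shut_down flag (used here in place of the gate) are assumptions made for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct segment_manager;

struct segment {
    // Shared ownership keeps the manager alive while a segment uses it.
    std::shared_ptr<segment_manager> manager;
    explicit segment(std::shared_ptr<segment_manager> m) : manager(std::move(m)) {}
};

struct segment_manager {
    std::vector<std::shared_ptr<segment>> segments;
    bool shut_down = false;  // stand-in for the gate being closed

    // Drops the manager's references to its segments, breaking the cycle.
    void orphan_all() {
        shut_down = true;
        segments.clear();
    }
};

class commitlog {
    std::shared_ptr<segment_manager> _manager = std::make_shared<segment_manager>();
public:
    std::shared_ptr<segment> new_segment() {
        auto s = std::make_shared<segment>(_manager);
        _manager->segments.push_back(s);
        return s;
    }
    std::weak_ptr<segment_manager> manager() const { return _manager; }

    // Without this, manager and segments would keep each other alive forever.
    ~commitlog() { _manager->orphan_all(); }
};
```

A segment that outlives the commitlog still sees a valid manager, marked as shut down, and the manager is freed once the last such segment goes away.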
This fixes some spurious crashes in nightly unit tests.
Fixes #966.
Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>
Fixes #482
See code comment. Reserve segment allocation count sum can temporarily
overflow due to continuation delay/reordering, if we manage to reach the
on_timer code before the finally clauses from the previous reserve allocation
invocation have been processed. However, since these are benign overflows
(just indicating even more that we don't need to do anything right now)
simply capping the count should be fine.
Avoids assert in boost irange.
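The capping amounts to the following sketch. The function and parameter names are hypothetical; the point is only that the count is clamped before it is used as the start of a range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of reserve segments still to allocate. `in_flight` may briefly
// exceed `max_reserve`, so it is capped before subtracting. Without the
// cap, the unsigned subtraction would wrap around.
inline std::size_t reserve_deficit(std::size_t max_reserve, std::size_t in_flight) {
    return max_reserve - std::min(in_flight, max_reserve);
}
```

An overflowed count simply yields a deficit of zero, i.e. nothing to do right now.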
Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>
Fixes #937
In fixing #884 (truncation not truncating memtables properly),
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.
This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.
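The "join" behaviour can be illustrated with a single-threaded sketch: the value is produced once and every caller sees the same result. This is not the actual joinpoint type, which has to coordinate across shards; the interface below is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Produces a value on first use and hands the same value to all callers.
template <typename T>
class joinpoint {
    std::function<T()> _produce;
    bool _ready = false;
    T _value{};
public:
    explicit joinpoint(std::function<T()> produce) : _produce(std::move(produce)) {}

    const T& value() {
        if (!_ready) {
            _value = _produce();
            _ready = true;
        }
        return _value;
    }
};
```

Passing such an object to truncate on every shard would make all shards snapshot under the same time stamp.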
Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
Fixes #884
Time stamps for truncation must be generated after flush, either by
splitting the truncate into two (or more) for-each-shard operations,
or simply by doing time stamping per shard (this solution).
We generate TS on each shard after flushing, and then rely on the
actual stored value to be the highest time point generated.
From the batch replay point of view, however, this should be functionally
equivalent, and not a problem.
Since the table is written from all shards, and we possibly might
have conflicting time stamps, we define the truncated_at time
as the highest time point. I.e. conservative.
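Picking the highest time point is a plain maximum over the per-shard values, as in this sketch. The function name and the use of int64_t for time stamps are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Merges per-shard truncation time stamps into a single value by taking
// the latest one, which is the conservative choice.
inline int64_t merged_truncated_at(const std::vector<int64_t>& per_shard) {
    assert(!per_shard.empty());
    return *std::max_element(per_shard.begin(), per_shard.end());
}
```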
Truncation records are not portable between us and Origin.
We need to detect origin records from a migrated system and ensure that we
neither try to use them nor, more to the point, crash because of a data
format error when loading them.
This problem was seen by Tzach when doing a migration from an origin
setup.
Updated record storage to use IDL-serialized types + added versioning
and magic marking + odd-size-checking to ensure we load only correct
data. The code will also deal with records from an older version of
scylla.
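The validation can be sketched as below. The layout (4-byte magic, 4-byte version, 8-byte payload, big-endian) and all constants are hypothetical, since the real record format is not described here. The sketch only rejects foreign data and does not model the handling of older scylla records.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants, chosen for the illustration only.
static const uint32_t record_magic = 0x54524E43;
static const uint32_t record_version = 1;
static const std::size_t record_size = 16;

// Reads a big-endian 32-bit value starting at `off`.
inline uint32_t read_be32(const std::vector<uint8_t>& b, std::size_t off) {
    return (uint32_t(b[off]) << 24) | (uint32_t(b[off + 1]) << 16) |
           (uint32_t(b[off + 2]) << 8) | uint32_t(b[off + 3]);
}

// Accepts a record only if size, magic and version all match.
inline bool is_loadable_record(const std::vector<uint8_t>& b) {
    if (b.size() != record_size) {
        return false;  // odd size: foreign or corrupt data
    }
    if (read_be32(b, 0) != record_magic) {
        return false;  // not written by us
    }
    return read_be32(b, 4) == record_version;
}
```

Checking the size first means the later reads never go out of bounds, so malformed input is rejected rather than causing a crash.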
I saw the following Boost format string related warning during commitlog
replay:
INFO [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
WARN [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)
While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure triggered,
I wasn't able to verify that that was the only one, though.
Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
Time a node waits after sending gossip shutdown message in milliseconds.
Reduces ./cql_query_test execution time
from
real 2m24.272s
user 0m8.339s
sys 0m10.556s
to
real 1m17.765s
user 0m3.698s
sys 0m11.578
Last series accidentally broke batch mode.
With the new, fancy, potentially blocking ways, we need to treat
batch mode differently, since in this case, sync should always
come _after_ alloc-write.
Previous patch caused infinite loop. Broke jenkins.
Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
Also check closed status in allocate, since alloc queue waiting could
lead to us re-allocating in a segment that gets closed in between
queue enter and us running the continuation.
Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
Configured on start (for now - and dummy values at that).
When the shard write/flush count reaches the limit, incoming ops will queue
until previous ones finish.
Consequently, if an allocation op forces a write, which blocks, any
other incoming allocations will also queue up to provide back pressure.
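The queueing behaviour can be modelled with a small single-threaded limiter. This is a sketch, not the commitlog code: the class name and the explicit complete() call are assumptions standing in for futures resolving.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

class op_limiter {
    std::size_t _limit;
    std::size_t _in_flight = 0;
    std::deque<std::function<void()>> _waiting;
public:
    explicit op_limiter(std::size_t limit) : _limit(limit) {}

    // Starts `op` now if below the limit, otherwise queues it.
    void submit(std::function<void()> op) {
        if (_in_flight < _limit) {
            ++_in_flight;
            op();
        } else {
            _waiting.push_back(std::move(op));
        }
    }

    // Called when a started op finishes; starts the oldest queued op, if any.
    void complete() {
        assert(_in_flight > 0);
        --_in_flight;
        if (!_waiting.empty()) {
            auto next = std::move(_waiting.front());
            _waiting.pop_front();
            ++_in_flight;
            next();
        }
    }

    std::size_t queued() const { return _waiting.size(); }
};
```

Once the limit is reached, every further submit() only grows the queue, which is what pushes back on callers.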
After this patch, our I/O operations will be tagged into a specific priority class.
There are five available classes, defined in the previous patch:
1) memtable flush
2) commitlog writes
3) streaming mutation
4) SSTable compaction
5) CQL query
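As a rough illustration, the five classes could be represented as tags like the ones below. The enum and function names are hypothetical and are not the types used by the patch.

```cpp
#include <cassert>
#include <string>

// Illustrative tags only; the names are not taken from the patch.
enum class io_class_tag {
    memtable_flush,
    commitlog,
    streaming,
    compaction,
    query,
};

// Human-readable name for a tag, matching the list above.
inline std::string describe(io_class_tag tag) {
    switch (tag) {
    case io_class_tag::memtable_flush: return "memtable flush";
    case io_class_tag::commitlog:      return "commitlog writes";
    case io_class_tag::streaming:      return "streaming mutation";
    case io_class_tag::compaction:     return "SSTable compaction";
    case io_class_tag::query:          return "CQL query";
    }
    return "unknown";
}
```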
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Enable the incremental_backups/--incremental-backups option.
When enabled there will be a hard link created in the
<column family directory>/backup directory for every flushed
sstable.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
* Match origin log messages
- Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
- At info level, like origin
Prompted by dtest that expects origin log output.
Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>