We use boost::any to convert to and from database values (stored in
serlialized form) and native C++ values. boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance. We later retrieve the real value using
boost::any_cast.
However, data_value (which has a boost::any member) already has type
information as a data_type instance. By teaching data_type intances about
the corresponding native type, we can elimiante the use of boost::any.
While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type. We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
Since 4641dfff24 "service: Copy client
state to query state" after executing a query client state needs to be
merged back. If that's not done client_state::_last_timestamp_micros
won't be advanced properly and mutations originating from the same
source may have exactly the same timestamp.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Since commit 5613979a85
broadcast address has to be set before it's used for the first
time.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
cql_query_test hasn't configured Broadcast address before
it was used for the first time.
Broadcast address is an essential Node's configuration.
There is an assert in utils::fb_utils::get_broadcast_address()
that ensures that broadcast address has been properly configured
before it's used for the first time and it is triggered without
this patch.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
"This series add code for computing mutation_partition difference.
For mutations A and B:
diffA = A.difference(B);
diffB = B.difference(A);
AB = A.apply(B);
diffA is the minimal mutation that when applied to B makes it equal
to AB and diffB is the minimal mutation that applied to A results in AB.
Fixes #430."
"This patchset implements load_new_sstables, allowing one to move tables inside the
data directory of a CF, and then call "nodetool refresh" to start using them.
Keep in mind that for Cassandra, this is deemed an unsafe operation:
https://issues.apache.org/jira/browse/CASSANDRA-6245
It is still for us something we should not recommend - unless the CF is totally
empty and not yet used, but we can do a much better job in the safety front.
To guarantee that, the process works in four steps:
1) All writes to this specific column family are disabled. This is a horrible thing to
do, because dirty memory can grow much more than desired during this. Throughout out
this implementation, we will try to keep the time during which the writes are disabled
to its bare minimum.
While disabling the writes, each shard will tell us about the highest generation number
it has seen.
2) We will scan all tables that we haven't seen before. Those are any tables found in the
CF datadir, that are higher than the highest generation number seen so far. We will link
them to new generation numbers that are sequential to the ones we have so far, and end up
with a new generation number that is returned to the next step
3) The generation number computed in the previous step is now propagated to all CFs, which
guarantees that all further writes will pick generation numbers that won't conflict with
the existing tables. Right after doing that, the writes are resumed.
4) The tables we found in step 2 are passed on to each of the CFs. They can now load those
tables while operations to the CF proceed normally."
The current codes assumes a particular dir/generation pair. We
will use it for a more generic case. This code could really use some
clean up, by the way. We should do it later.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
There is no reason aside from testing for a table to just change its generation
number.
There will be, however, when we support loading new sstables. The method
however needs to be completely rewritten, so let's make sure the tests are not
using that.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Avoid using long for it, and let's use a fixed size instead. Let's do signed
instead of unsigned to avoid upsetting any code that we may have converted.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
"Fixes: #469
We occasionally generate memtables that are not empty, yet have no
high replay_position set. (Typical case is CL replay, but apparently
there are others).
Moreover, we can do this repeatedly, and thus get caught in the flush
queue ordering restrictions.
Solve this by treating a flush without replay_position as a flush at the
highest running position, i.e. "last" in queue. Note that this will not
affect the actual flush operation, nor CL callbacks, only anyone waiting
for the operation(s) to complete.
To do this, the flush_queue had its restrictions eased, and some introspection
methods added."
test_setup::do_with_test_directory is missing. For some reason,
the test wasn't failing without it until now. Adding it is the
correct thing to do anyway.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"This patchset introduces leveled compaction to Scylla.
We don't handle all corner cases yet, but we already have the strategy
and compaction working as expected. Test cases were written and I also
tested the stability with a load of cassandra-stress.
Leveled compaction may output more than one sstable because there is
a limit on the size of sstables. 160M by default.
Related to handling of partial compaction, it's still something to be
worked on.
Anyway, it will not be a big problem. Why? Suppose that a leveled
compaction will generate 2 sstables, and scylla is interrupted after
the first sstable is completely written but before the second one is
completely written. The next boot will delete the second sstable,
because it was partially written, but will not do anything with the
first one as it was completely written.
As a result, we will have two sstables with redundant data."
Explicitly use up all the memory in the system as best as we can instead
Still not super reliable, but should have less side effects. And work better
with pre-allocated segment files
Explicitly use up all the memory in the system as best as we can instead
Still not super reliable, but should have less side effects. And work better
with pre-allocated segment files
Adapt our compaction code to start writing a new sstable if the
one being written reached its maximum size. Leveled strategy works
with that concept. If a strategy other than leveled is being used,
everything will work as before.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Fixes spurious failures in test_commitlog_discard_completed_segments
* Do explicit sync on all segments to prevent async flushed from keeping
segements alive.
* Use counter instead of actual file counting to avoid racing with
pre-allocation of segments