Reduces coupling. User's should not rely on the fact that it's an
std::map<>. It also allows us to extend row's interface with
domain-specific methods, which are a lot easier to discover than free
functions.
The immediate motivation for introducing frozen_mutation is inability
to deserialize current "mutation" object, which needs schema reference
at the time it's constructed. It needs schema to initialize its
internal maps with proper key comparators, which depend on schema.
frozen_mutation is an immutable, compact form of a mutation. It
doesn't use complex in-memory strucutres, data is stored in a linear
buffer. In case of frozen_mutation schema needs to be supplied only at
the time mutation partition is visited. Therefore it can be trivially
deserialized without schema.
Origin does that, so should we. Both ttl and expiry time are stored in
sstables. The value of ttl seems to be used to calculate the read
digest (expiry is not used for that).
The API for creating atomic_cells changed a bit.
To create a non-expiring cell:
atomic_cell::make_live(timestamp, value);
To create an expiring cell:
atomic_cell::make_live(timestamp, value, expiry, ttl);
or:
// Expiry is calculated based on current clock reading
atomic_cell::make_live(timestamp, value, ttl_optional);
A lookup can cause several data sources to be merged, in which case we will
have to return a temporary (containing data from all the data sources).
For simplicity, we start by always returning a temporary.
Ensure that read-side accessors are const. This is important in preparation
for multiple memtables (and later, sstables) since a read-side
mutation_partition may be a temporary object coming from multiple memtables
(and sstables) while a write-side mutation_partition is guaranteed to belong
to a single memtable (and thus, not be temporary).
Since writers will want non-const mutation_partitions to write to, they won't
be able to use the read-side accessors by accident.
* A commitlog is created in "work" dirs when initing the db
from a datadir. However, since we have neither disk data storage,
nor replay capability yet (and no real db config), the settings
are basically to just write in-memory serialization, write them to
disk and then discard them. So in fact, pointless. But at least using
the log...
* Moved the actual "apply" of mutation into database. If a commitlog
is active, add an entry to it before applying mutation.
Partitions should be ordered using Origin's ordering, which is first
by token, then by Origin's representation of the key. That is the
natural ordering of decorated_key.
This also changes mutation class to hold decorated_key, to avoid
decoration overhead at different layers.
bytes and sstring are distinct types, since their internal buffers are of
different length, but bytes_view is an alias of sstring_view, which makes
it possible of objects of different types to leak across the abstraction
boundary.
Fix this by making bytes a basic_sstring<int8_t, ...> instead of using char.
int8_t is a 'signed char', which is a distinct type from char, so now
bytes_view is a distinct type from sstring_view.
uint8_t would have been an even better choice, but that diverges from Origin
and would have required an audit.
This patch converts (for very small value of 'converts') some
replication related classes. Only static topology is supported (it is
created in keyspace::create_replication_strategy()). During mutation
no replication is done, since messaging service is not ready yet,
only endpoints are calculated.
* database now holds all keyspace + column family object
* column families are mapped by uuid, either generated or explicit
* lookup by name tuples or uuid
* finder functions now return refs + throws on missing obj
Holding keys and their prefixes as "bytes" is error prone. It's easy
to mix them up (or use wrong types). This change adds wrappers for
keys with accessors which are meant to make misuses as difficult as
possible.
Prefix and full keys are now distinguished. Places which assumed that
the representation is the same (it currently is) were changed not to
do so. This will allow us to introduce more compact storage for non-prefix
keys.
We use bytes for many different things, and it is easy to get confused as
to what format the data is actually in.
Fix that for atomic_cell by proving wrappers. atomic_cell::one corresponds
to a bytes object holding exactly one atomic cell, and atomic_cell::view is
a bytes_view to an atomic_cell. The static functions of atomic_cell itself
are privatized to prevent the unwashed masses from using them on the wrong
objects.
Since a row entry can hold either a an atomic cell, or a collection,
depending on the schema, also introduce a variant type
atomic_cell_or_collection and allow the user to pick the type explicitly.
Internally both are stored as bytes object.
Storing cells as boost::any objects makes us use expensive
boost::any_cast to access the data. This change replaces boost::any
with bytes object which holds the value in serialized form (the same
as will be used for on-wire format).
If the cell type is atomic, you use fields accessors defined in
atomic_cell class, eg like this:
if (column.type.is_atomic()) {
if (atomic_cell::is_live(c) {
auto timestamp = atomic_cell::timestamp(c);
...
}
}
Eventually we could switch to a more officient semi-serialized form
with native byte order but I don't want to introduce it just yet for
simplicity.
Add database::shard_of() to compute the shard hosting the partition
(with a simplistic algorithm, but perhaps not too bad).
Convert non-metadata invoke_on_all() and local calls on the database
to use shard_of().
s/database/distributed<database>/ everywhere.
Use simple distribution rules: writes are broadcast, reads are local.
This causes tremendous data duplication, but will change soon.
Futures hold either a value or an exception; thrift uses two separate
function objects to signal completion, one for success, the other for
an exception.
Add a helper to pass the result of a future to either of these.
For simplicity partition data is stored using the same object which is
used for mutations: mutation_partition. Later we can introduce a more
efficient version.
Regular columns may have names of arbitrary type. See
https://issues.apache.org/jira/browse/CASSANDRA-8178
Primary key columns are UTF8.
This change also does some refactoring of the schema object to make
the change easier to digest (more encapsulation).
Cassandra allows even regular columns to be treated as a sorted map
(column name -> value), accessing it with get_slice(), so sort the column
names to support this.