We cannot capture keyspace_metadata by reference because it can be
allocated on the stack. Fixes SIGSEGV while running cassandra-stress.
The bug was introduced in commit cd35617 ("database: Use
keyspace_metadata for creation functions").
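A minimal sketch of the failure mode, not the actual patch: the
keyspace_metadata struct and make_callback() below are hypothetical
stand-ins. The callback outlives the stack frame that owns the metadata,
so it has to capture a copy rather than a reference.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for keyspace_metadata; the real class has more fields.
struct keyspace_metadata {
    std::string name;
};

// Returns a callback that outlives this function's stack frame.
// Capturing `ksm` by reference ([&ksm]) would leave the callback holding a
// dangling reference once make_callback() returns. Capturing by value copies
// the metadata into the closure, so it stays valid.
std::function<std::string()> make_callback() {
    keyspace_metadata ksm{"ks1"};        // lives on the stack
    return [ksm] { return ksm.name; };   // by-value capture
}

// Usage: the callback runs after the original object is gone.
void example_usage() {
    auto cb = make_callback();
    assert(cb() == "ks1");
}
```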
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
From Avi:
"This patchset prepares for adding sstables to the read path. Because sstables
involve I/O, their APIs return futures, which means that APIs that may call
those sstable APIs also need to return futures.
This patchset uses the two-space indent + do_with + reference aliases trick
to make patches more readable. Cleanup patches will follow once it is merged."
Initialize replication strategy when keyspace is being created now that
we have access to keyspace_metadata.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Use the keyspace_metadata type for keyspace creation functions. This is
needed to be able to have a mapping from keyspace name to keyspace
metadata for various call-sites.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Pekka says:
"We are going to keep the ks_meta_data class around and use it in core
code like the migration manager. Therefore, clean up the class and move
it to database.hh, where user_types_metadata is also defined. As a
bonus, this also fixes the circular dependency between ks_meta_data.hh
and database.hh."
Follow the naming convention set by user_types_metadata and rename
ks_meta_data to keyspace_metadata.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This is slightly awkward, since the directory structure is not sharded.
This requires some processing to occur outside the shard, while the rest
is sharded.
Returning a reference to the keyspace is dangerous because the keyspace can
be moved away once we start futurizing the add_keyspace() process. Make
it return void and look up the keyspace at the point of use.
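A rough sketch of the resulting API shape. The database and keyspace
types here are simplified, hypothetical stand-ins, and find_keyspace() is
an assumed name for the lookup. It only illustrates "return void, look up
at the point of use" and does not model futures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified keyspace.
struct keyspace {
    int tables = 0;
};

// Hypothetical, simplified database.
class database {
    std::unordered_map<std::string, keyspace> _keyspaces;
public:
    // Returns void: callers cannot hold on to a reference that may become
    // stale while keyspace creation is still in progress.
    void add_keyspace(const std::string& name) {
        _keyspaces.emplace(name, keyspace{});
    }

    // Callers look the keyspace up when they actually need it.
    // Throws std::out_of_range if the keyspace does not exist.
    keyspace& find_keyspace(const std::string& name) {
        return _keyspaces.at(name);
    }
};

void example_usage() {
    database db;
    db.add_keyspace("ks1");
    db.find_keyspace("ks1").tables = 3;   // lookup at the point of use
    assert(db.find_keyspace("ks1").tables == 3);
}
```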
Reduces coupling. Users should not rely on the fact that it's an
std::map<>. It also allows us to extend row's interface with
domain-specific methods, which are a lot easier to discover than free
functions.
If a method doesn't want to share schema ownership, it doesn't have to
take it by shared pointer. The benefit is that it's slightly cheaper
and those methods may now be called from places which don't own the
schema.
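A small sketch of the idea, with a hypothetical schema struct and a
hypothetical column_count() helper: a function that only reads the schema
takes it by const reference, so it works for callers with or without a
shared pointer and avoids a reference-count update.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified schema.
struct schema {
    std::vector<std::string> columns;
};

// Only reads the schema and does not keep it, so a const reference is
// enough. No shared_ptr copy, no reference-count traffic.
std::size_t column_count(const schema& s) {
    return s.columns.size();
}

void example_usage() {
    // Caller that does not own the schema through a shared pointer.
    schema on_stack;
    on_stack.columns.push_back("pk");
    on_stack.columns.push_back("v");
    assert(column_count(on_stack) == 2);

    // Caller that does hold a shared pointer just dereferences it.
    auto shared = std::make_shared<schema>();
    shared->columns.push_back("pk");
    assert(column_count(*shared) == 1);
}
```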
Deleted cells store deletion time, not expiry time. This change makes
expiry() valid only for live cells with TTL and adds deletion_time(),
which is intended to be used with deleted cells.
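The accessor contract can be sketched roughly as follows. The cell class
below is a hypothetical simplification: timestamps are plain integers and
the factory function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified cell.
class cell {
    bool _live;
    bool _has_ttl;
    std::int64_t _time;  // expiry for live cells with TTL, deletion time for dead cells

    cell(bool live, bool has_ttl, std::int64_t time)
        : _live(live), _has_ttl(has_ttl), _time(time) {}
public:
    static cell make_live() { return cell(true, false, 0); }
    static cell make_live_with_ttl(std::int64_t expiry) { return cell(true, true, expiry); }
    static cell make_dead(std::int64_t deletion_time) { return cell(false, false, deletion_time); }

    bool is_live() const { return _live; }
    bool has_ttl() const { return _has_ttl; }

    // Valid only for live cells with TTL.
    std::int64_t expiry() const {
        assert(_live && _has_ttl);
        return _time;
    }

    // Valid only for deleted cells.
    std::int64_t deletion_time() const {
        assert(!_live);
        return _time;
    }
};
```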
The immediate motivation for introducing frozen_mutation is the inability
to deserialize the current "mutation" object, which needs a schema
reference at the time it's constructed. It needs the schema to initialize
its internal maps with proper key comparators, which depend on the schema.
frozen_mutation is an immutable, compact form of a mutation. It
doesn't use complex in-memory structures; data is stored in a linear
buffer. In the case of frozen_mutation, the schema needs to be supplied
only at the time the mutation partition is visited. Therefore it can be
trivially deserialized without a schema.
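To illustrate the "schema only at visit time" property, here is a toy
sketch. The byte layout (column id, length, value), the append_cell()
helper and the simplified schema are all assumptions made for this
example, not the real frozen_mutation format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema: maps column id to column name.
struct schema {
    std::vector<std::string> column_names;
};

// Toy encoding, assumed for this sketch: [column id][length][value bytes].
void append_cell(std::string& buf, std::uint8_t column_id, const std::string& value) {
    assert(value.size() <= 255);
    buf.push_back(static_cast<char>(column_id));
    buf.push_back(static_cast<char>(value.size()));
    buf += value;
}

class frozen_mutation {
    std::string _bytes;  // linear buffer, never modified after construction
public:
    // "Deserialization" just takes the bytes; no schema is needed here.
    explicit frozen_mutation(std::string bytes) : _bytes(std::move(bytes)) {}

    const std::string& representation() const { return _bytes; }

    // The schema is supplied only when the contents are visited.
    template <typename Visitor>
    void visit(const schema& s, Visitor&& visitor) const {
        std::size_t pos = 0;
        while (pos + 2 <= _bytes.size()) {
            auto id = static_cast<std::uint8_t>(_bytes[pos]);
            auto len = static_cast<std::uint8_t>(_bytes[pos + 1]);
            pos += 2;
            if (pos + len > _bytes.size()) {
                break;  // truncated buffer, stop visiting
            }
            visitor(s.column_names.at(id), _bytes.substr(pos, len));
            pos += len;
        }
    }
};
```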
In preparation for multiple memtables, move column_family::partitions into
its own class, and forward relevant calls from column_family.
A testonly_all_memtables() function was added to support sstable_test.
Currently we use the first byte of the token for determining the local
shard. This is suboptimal for two reasons:
1. the first bytes of the token were already used to select the node,
so they are not randomly distributed
2. using a single byte is not sufficient for large core counts, as the
modulo operation will not return evenly distributed results
Fix by using the final two bytes of the token.
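A hedged sketch of the new selection rule. The shard_of() name, the
byte-vector token representation and the big-endian combination of the
two bytes are assumptions for illustration; the message only states that
the final two bytes are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Picks a shard from the last two bytes of the token, so that the bytes
// already consumed for node selection do not bias the result and there are
// 65536 distinct inputs to the modulo instead of 256.
unsigned shard_of(const std::vector<std::uint8_t>& token, unsigned shard_count) {
    assert(shard_count > 0);
    if (token.size() < 2) {
        return 0;  // assumption: degenerate tokens map to shard 0
    }
    std::size_t n = token.size();
    // Assumed byte order: second-to-last byte is the high byte.
    unsigned last_two = (static_cast<unsigned>(token[n - 2]) << 8) | token[n - 1];
    return last_two % shard_count;
}
```

For the token bytes 12 34 AB CD (hex) and 8 shards, the last two bytes
form 0xABCD, whose low three bits are 101, so this sketch selects shard 5.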
A lookup can cause several data sources to be merged, in which case we will
have to return a temporary (containing data from all the data sources).
For simplicity, we start by always returning a temporary.
Ensure that read-side accessors are const. This is important in preparation
for multiple memtables (and later, sstables) since a read-side
mutation_partition may be a temporary object coming from multiple memtables
(and sstables) while a write-side mutation_partition is guaranteed to belong
to a single memtable (and thus, not be temporary).
Since writers will want non-const mutation_partitions to write to, they won't
be able to use the read-side accessors by accident.
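The read/write split can be sketched as below. The memtable and
mutation_partition types are hypothetical simplifications, and the
accessor names are assumptions: the read-side accessor is const and
returns a temporary by value, while the write-side accessor is non-const
and returns a reference into the memtable.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified partition: column name -> value.
struct mutation_partition {
    std::map<std::string, std::string> cells;
};

// Hypothetical, simplified memtable.
class memtable {
    std::map<std::string, mutation_partition> _partitions;
public:
    // Read side: const, returns a temporary. With several data sources this
    // is where their contents would be merged into the returned object.
    mutation_partition find_partition(const std::string& key) const {
        auto it = _partitions.find(key);
        if (it == _partitions.end()) {
            return mutation_partition{};
        }
        return it->second;
    }

    // Write side: non-const, returns a reference to the partition owned by
    // this memtable. Not callable through a const memtable.
    mutation_partition& find_or_create_partition(const std::string& key) {
        return _partitions[key];
    }
};

void example_usage() {
    memtable mt;
    mt.find_or_create_partition("k1").cells["c"] = "v";

    const memtable& read_only = mt;
    mutation_partition copy = read_only.find_partition("k1");
    assert(copy.cells.at("c") == "v");
    // read_only.find_or_create_partition("k1") would not compile.
}
```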