"This series addresses two things:
1) fixes a long-standing problem with query::partition_range being modeled
inappropriately. It worked on key values only, whereas partition ordering is
determined by token and key. This will be needed by clustering code soon,
which will split a full range into many slices based on token ring topology.
2) refactors mutation reading code in preparation for adding row cache.
mutation_reader merging was extracted to make_combined_reader(). sstables
are now represented by a single mutation_reader."
This change abstracts reading from on-disk data sources behind a single
reader which is then composed with memtable readers. This change also
abstracts all data sources behind a single reader obtained via
column_family::make_reader(). That reader is then used by algorithms
like column_family::for_all_partitions() or
column_family::query(). Having those abstractions will make it easier
to add row cache, because it will be encapsulated in a single place.
Currently, column_family::for_all_partitions() relies on the monotonicity
of keys. Adding a strict monotonicity requirement doesn't hurt
implementations, but it makes some consumers simpler.
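To show what composing sources behind one reader and the strict monotonicity requirement mean together, here is a minimal synchronous sketch. It is not make_combined_reader(): the real readers return futures, and `fake_mutation`, `source` and `combine` are hypothetical names. Each source (think memtable or sstable) is a vector already sorted by key; the sketch does a k-way merge with a min-heap and folds equal keys from different sources into one entry, so consumers see strictly increasing keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mutation: a partition key plus a payload.
struct fake_mutation {
    int key;
    std::string data;
};

// A source is a vector sorted by strictly increasing key.
using source = std::vector<fake_mutation>;

// Merges all sources into one stream with strictly increasing keys.
// When a key appears in several sources, payloads are concatenated in
// source order, standing in for applying one mutation on top of another.
std::vector<fake_mutation> combine(const std::vector<source>& sources) {
    // Heap entry: (key, (source index, position within that source)).
    using entry = std::pair<int, std::pair<std::size_t, std::size_t>>;
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    for (std::size_t s = 0; s < sources.size(); ++s) {
        if (!sources[s].empty()) {
            heap.push({sources[s][0].key, {s, 0}});
        }
    }
    std::vector<fake_mutation> out;
    while (!heap.empty()) {
        auto [key, where] = heap.top();
        heap.pop();
        auto [s, i] = where;
        const fake_mutation& m = sources[s][i];
        // Precondition: every source is itself strictly increasing.
        assert(i == 0 || sources[s][i - 1].key < m.key);
        if (!out.empty() && out.back().key == key) {
            out.back().data += m.data;  // same partition from another source
        } else {
            out.push_back(m);           // new partition, strictly greater key
        }
        if (i + 1 < sources[s].size()) {
            heap.push({sources[s][i + 1].key, {s, i + 1}});
        }
    }
    return out;
}
```

Because the merge already collapses duplicates, a consumer such as a per-partition loop never has to remember the previous key to detect repeats; that is the simplification strict monotonicity buys.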
The current model was not really correct, because Origin doesn't support
querying partition ranges by key value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token, then by key value.
ring_position encapsulates a range constraint. The key value is optional;
when it is absent, only the token is constrained.
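A minimal sketch of the ordering and of an optional-key bound follows. The types are simplified stand-ins (a token is an `int64_t`, a key is a `std::string`), and `compare` and `contains` are hypothetical helpers, not the dht API. The sketch also assumes one particular reading of "only the token is constrained": a token-only position compares equal to every key with that token, so an inclusive bound admits all of them.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Simplified stand-in for dht::decorated_key.
struct decorated_key {
    std::int64_t token;
    std::string key;
};

// Partitions are ordered first by token, then by key value.
bool operator<(const decorated_key& a, const decorated_key& b) {
    return std::tie(a.token, a.key) < std::tie(b.token, b.key);
}

// Simplified range bound: the token is always set, the key is optional.
struct ring_position {
    std::int64_t token;
    std::optional<std::string> key;
};

// Three-way comparison of a partition against a bound: -1, 0 or 1.
// Assumption: a token-only bound is equal to every key with that token.
int compare(const decorated_key& dk, const ring_position& pos) {
    if (dk.token != pos.token) {
        return dk.token < pos.token ? -1 : 1;
    }
    if (!pos.key) {
        return 0;  // only the token is constrained
    }
    int c = dk.key.compare(*pos.key);
    return c < 0 ? -1 : (c > 0 ? 1 : 0);
}

// Inclusive range [start, end] over decorated_key ordering.
bool contains(const ring_position& start, const ring_position& end,
              const decorated_key& dk) {
    return compare(dk, start) >= 0 && compare(dk, end) <= 0;
}
```

Note that a smaller token wins regardless of key: `{1, "zz"}` sorts before `{2, "aa"}`, which is exactly why ranges expressed by key value alone could not describe what Origin can query.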
In theory, when we create a new column family, we should also make sure
that the underlying directory exists. However, this would be quite challenging:
there are many entry points for add_column_family, none of them is futurized,
and futurizing them could prove difficult further up the call chain.
Because we can guarantee that the keyspace directory will exist (now that we
have unified that), it is actually a lot simpler to just make sure that the
directory exists when writing the sstable.
If the keyspace directory did not exist, we would have to recurse through the
path. As said above, this patch assumes that case away.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Currently, Origin generates sstables in the form CF-UUID, where UUID
is a string of numbers.
We also use CF-UUID, but our UUID has dashes separating its components.
Because of the current test, we fail to load our own sstables. That test
really isn't that important, since we currently do nothing with the
UUID. And if we did, we should be able to accept both formats anyway.
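Accepting both formats can be as simple as dropping the dashes and checking for 32 hex digits. The sketch below is illustrative only: `normalize_uuid`, `parse_dir_name` and `sstable_dir_name` are hypothetical names, it tolerates dashes in any position rather than enforcing the 8-4-4-4-12 layout, and it assumes the column family name itself contains no dash, so the first dash separates CF from UUID.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Returns the 32 lowercase hex digits of a UUID written with or without
// dashes, or std::nullopt if the input is not a UUID.
std::optional<std::string> normalize_uuid(const std::string& s) {
    std::string hex;
    for (char c : s) {
        if (c == '-') {
            continue;  // dashes between components are optional
        }
        if (!std::isxdigit(static_cast<unsigned char>(c))) {
            return std::nullopt;
        }
        hex += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (hex.size() != 32) {
        return std::nullopt;
    }
    return hex;
}

// Hypothetical parsed form of a "CF-UUID" directory name.
struct sstable_dir_name {
    std::string cf;
    std::string uuid;
};

// Splits at the first dash; assumes the CF name has no dash in it.
std::optional<sstable_dir_name> parse_dir_name(const std::string& name) {
    auto dash = name.find('-');
    if (dash == std::string::npos) {
        return std::nullopt;
    }
    auto uuid = normalize_uuid(name.substr(dash + 1));
    if (!uuid) {
        return std::nullopt;
    }
    return sstable_dir_name{name.substr(0, dash), *uuid};
}
```

With this, both naming styles map to the same normalized UUID, so code that later does care about the UUID can compare them directly.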
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Because the system keyspace is not created the same way as the others (and it
would be hard to convert, since it is created inside the database
constructor), make sure that it is created when the database boots.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We are currently generating an empty config, which is wrong and won't
propagate important characteristics of the keyspace.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
A lot of our tests run in memory only, but now that our write path is complete,
we may soon start running into problems, since we actually write sstables to disk.
It would be nice to force the database to run in-memory only in some situations.
Even in the real world, some scenarios may benefit from that in the future.
This patch forces durable_writes to always be false when the data
directory list is forced to be empty.
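The rule can be sketched as a single function that every keyspace, system tables included, goes through. The types and the `data_file_directories` / `make_keyspace_config` names are hypothetical, chosen for illustration rather than taken from the actual configuration code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical database-wide configuration.
struct db_config {
    std::vector<std::string> data_file_directories;
};

// Hypothetical per-keyspace configuration.
struct keyspace_config {
    bool durable_writes;
};

// An empty data directory list means "run purely in memory": nothing may
// reach disk, so durable writes are off whatever the keyspace requested.
// Otherwise the keyspace's own setting is honored, instead of being
// hardcoded to false as the system tables used to be.
keyspace_config make_keyspace_config(const db_config& db,
                                     bool requested_durable_writes) {
    keyspace_config cfg;
    cfg.durable_writes =
        requested_durable_writes && !db.data_file_directories.empty();
    return cfg;
}
```

Routing the system keyspace through the same function is what removes the bug described below: its durability then follows the configuration instead of a fixed `false`.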
For system tables, the patch also fixes a bug. Because system tables were
forcibly initialized with durable_writes = false, we would never write them to
disk, even when we were supposed to.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Otherwise, the tester may crash if the _instances destructor is called when the
thread responsible for the allocation (which the tester spawned to run seastar in)
is no longer running.
There are many situations in which we would like to make sure a directory
exists. We can do that by creating the directory we want, and just ignoring
the relevant error.
It takes a fair amount of code, though, and I believe the idiom is common enough
to deserve a helper of its own.
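The idiom looks roughly like this as a blocking POSIX sketch. The name `ensure_directory` is hypothetical, and a Seastar helper would be asynchronous and return a future rather than block. It creates the directory, treats EEXIST as success after confirming that the existing entry really is a directory, and reports every other error.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>
#include <sys/stat.h>
#include <sys/types.h>

// Makes sure `path` exists as a directory. Missing parents are NOT
// created: mkdir fails with ENOENT and that error is rethrown.
void ensure_directory(const std::string& path, mode_t mode = 0755) {
    if (::mkdir(path.c_str(), mode) == 0) {
        return;  // freshly created
    }
    const int err = errno;
    if (err != EEXIST) {
        throw std::system_error(err, std::generic_category(), "mkdir " + path);
    }
    // EEXIST is also returned when a non-directory has this name.
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) {
        throw std::system_error(errno, std::generic_category(), "stat " + path);
    }
    if (!S_ISDIR(st.st_mode)) {
        throw std::system_error(ENOTDIR, std::generic_category(), path);
    }
}
```

Calling it twice is harmless, which is what makes it convenient to invoke right before writing an sstable or at boot.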
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>