Commit Graph

70 Commits

Author SHA1 Message Date
Avi Kivity
788982df33 thrift: improve error messages for exceptions not declared in the thrift interface
When thrift sees an exception that was not declared as part of the interface,
it wraps it using std::exception::what() for the exception text.  This is
often cryptic, so add an "Internal server error" prefix.
2015-05-20 14:26:25 +03:00
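A minimal sketch of the idea in 788982df33, not the actual handler code: an exception that is not declared in the interface has its `what()` text prefixed before it is reported. The helper name `describe_undeclared_exception` and the `": "` separator are assumptions made for illustration.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: build the text reported for an exception that the
// thrift interface does not declare. Only the "Internal server error"
// prefix comes from the commit message; the rest is illustrative.
std::string describe_undeclared_exception(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
        // what() alone is often cryptic, so prepend a fixed prefix.
        return std::string("Internal server error: ") + e.what();
    } catch (...) {
        // Not derived from std::exception, so there is no what() to show.
        return "Internal server error: unknown exception";
    }
}
```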
Tomasz Grabiec
137b3beb2f Merge tag 'avi/readpath-prep/v1' from seastar-dev.git
From Avi:

"This patchset prepares for adding sstables to the read path.  Because sstables
involve I/O, their APIs return futures, which means that APIs that may call
those sstable APIs also need to return futures.

This patchset uses the two-space indent + do_with + reference aliases trick
to make patches more readable.  Cleanup patches will follow once it is merged."
2015-05-19 20:39:36 +02:00
Pekka Enberg
56d6fdacfe database: Simplify replication strategy initialization
Initialize replication strategy when keyspace is being created now that
we have access to keyspace_metadata.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 15:27:47 +03:00
Pekka Enberg
cd35617855 database: Use keyspace_metadata for creation functions
Use the keyspace_metadata type for keyspace creation functions. This is
needed to be able to have a mapping from keyspace name to keyspace
metadata for various call-sites.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 15:27:47 +03:00
Pekka Enberg
f0ad71f9f1 thrift: Fix keyspace metadata init in system_add_keyspace()
Pass replication strategy options to the create_replication_strategy()
function.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 15:16:14 +03:00
Avi Kivity
db04bba208 db: futurize the single partition query path
Prepare for disk reads.
2015-05-19 15:13:09 +03:00
Pekka Enberg
8380df84b4 database: Rename ks_meta_data to keyspace_metadata
Follow the naming convention set by user_types_metadata and rename
ks_meta_data to keyspace_metadata.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 11:24:06 +03:00
Pekka Enberg
f1fc575401 Clean up ks_meta_data construction
Simplify ks_meta_data construction in a few places by using the default
arguments.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 11:16:11 +03:00
Pekka Enberg
032af4d53b database: Move ks_meta_data definition to database.hh
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-19 11:03:28 +03:00
Avi Kivity
e31648c7fd Merge branch 'master' of github.com:cloudius-systems/urchin into db
Conflicts:
	thrift/handler.cc
2015-05-18 17:10:58 +03:00
Avi Kivity
01e1239608 thrift: rename CassandraAsyncHandler to match coding style
2015-05-18 16:31:44 +03:00
Avi Kivity
af15b24902 thrift: fix system_add_keyspace() indentation
2015-05-18 16:20:25 +03:00
Avi Kivity
9b43fe252a Merge branch 'penberg/user-types-cleanup' of github.com:cloudius-systems/seastar-dev into db
Pekka says:

"There's a user_types_metadata type defined in database.hh. Use that and
remove the left-over ut_meta_data from initial CQL translations."

Conflicts:
	thrift/handler.cc
2015-05-18 16:07:19 +03:00
Pekka Enberg
9ee7e21438 Switch to user_types_metadata
There's a user_types_metadata type in database.hh. Use it and drop the
left-over ut_meta_data from Origin.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-05-18 15:54:44 +03:00
Avi Kivity
07d7f410f3 Merge branch 'memtable' into db
Conflicts:
	database.hh
	memtable changes moved to memtable.hh
2015-05-18 15:50:24 +03:00
Avi Kivity
875148dae6 db: create keyspace/column_family directory structure
This is slightly awkward, since the directory structure is not sharded.
This requires some processing to occur outside the shard, while the rest
is sharded.
2015-05-18 15:34:41 +03:00
Avi Kivity
394e0d3a8c db: make database::add_keyspace() return void
Returning a reference to the keyspace is dangerous in that the keyspace can
be moved away, when we start futurizing the add_keyspace() process.  Make
it return void and look up the keyspace at the point of use.
2015-05-18 15:34:25 +03:00
Avi Kivity
dda5cbfd0d db: make column_family and keyspace configurable
Currently used for the data directory.
2015-05-18 15:00:31 +03:00
Avi Kivity
fd4f36e499 thrift: add missing #include
Popped up under gcc 5.1.
2015-05-17 23:37:28 +03:00
Tomasz Grabiec
dbc40dfb09 db: Encapsulate the "row" class
Reduces coupling. Users should not rely on the fact that it's an
std::map<>.  It also allows us to extend row's interface with
domain-specific methods, which are a lot easier to discover than free
functions.
2015-05-13 08:56:54 +02:00
Tomasz Grabiec
4ab66de0ae db: Introduce frozen_mutation
The immediate motivation for introducing frozen_mutation is the inability
to deserialize the current "mutation" object, which needs a schema reference
at the time it's constructed. It needs schema to initialize its
internal maps with proper key comparators, which depend on schema.

frozen_mutation is an immutable, compact form of a mutation. It
doesn't use complex in-memory structures; data is stored in a linear
buffer. In case of frozen_mutation schema needs to be supplied only at
the time mutation partition is visited. Therefore it can be trivially
deserialized without schema.
2015-05-08 09:19:01 +02:00
Tomasz Grabiec
b1e45e4401 db: Store ttl in atomic_cell
Origin does that, so should we. Both ttl and expiry time are stored in
sstables. The value of ttl seems to be used to calculate the read
digest (expiry is not used for that).

The API for creating atomic_cells changed a bit.

To create a non-expiring cell:

  atomic_cell::make_live(timestamp, value);

To create an expiring cell:

  atomic_cell::make_live(timestamp, value, expiry, ttl);

or:

  // Expiry is calculated based on current clock reading
  atomic_cell::make_live(timestamp, value, ttl_optional);
2015-05-06 19:42:38 +02:00
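The three `make_live` forms listed in b1e45e4401 can be sketched as overloads on a toy cell type. This is an illustration under assumptions, not the real `atomic_cell`: the struct `cell_sketch`, the type aliases, the use of `std::optional`, and the extra `now` parameter (added so the clock reading can be supplied explicitly) are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

using clock_type = std::chrono::system_clock;
using ttl_type = std::chrono::seconds;        // ttl is a duration
using expiry_type = clock_type::time_point;   // expiry is a time point

// Toy stand-in for a cell that stores both expiry and ttl.
struct cell_sketch {
    std::int64_t timestamp;
    std::string value;
    std::optional<expiry_type> expiry;
    std::optional<ttl_type> ttl;

    // Non-expiring cell.
    static cell_sketch make_live(std::int64_t timestamp, std::string value) {
        return cell_sketch{timestamp, std::move(value), std::nullopt, std::nullopt};
    }

    // Expiring cell with an explicit expiry and ttl.
    static cell_sketch make_live(std::int64_t timestamp, std::string value,
                                 expiry_type expiry, ttl_type ttl) {
        return cell_sketch{timestamp, std::move(value), expiry, ttl};
    }

    // Optional ttl: expiry is derived from the clock reading `now`.
    static cell_sketch make_live(std::int64_t timestamp, std::string value,
                                 std::optional<ttl_type> ttl,
                                 expiry_type now = clock_type::now()) {
        if (!ttl) {
            return make_live(timestamp, std::move(value));
        }
        return make_live(timestamp, std::move(value), now + *ttl, *ttl);
    }
};
```

Keeping ttl as a duration type and expiry as a time point type also mirrors the naming split made in 5ba1486ae7: the two cannot be swapped by accident because neither converts implicitly to the other.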
Tomasz Grabiec
5ba1486ae7 db: Rename "ttl" to "expiry" when it's used as time point
To avoid confusion with "ttl" the duration.
2015-05-06 17:27:22 +02:00
Tomasz Grabiec
36ad6c9aa8 Merge tag 'avi/memtables/v3' from seastar-dev.git
Multiple memtable support from Avi.
2015-05-06 15:02:42 +02:00
Avi Kivity
bc669add40 schema: const correctness
Make schema accessors const, and make schema_ptr refer to a const schema.
2015-05-06 13:52:59 +02:00
Avi Kivity
e811690588 db: return smart pointers for column_family read-side lookups
A lookup can cause several data sources to be merged, in which case we will
have to return a temporary (containing data from all the data sources).

For simplicity, we start by always returning a temporary.
2015-05-05 20:21:04 +03:00
Avi Kivity
8028fb441a db: make column_family a class, not a struct
Don't expose privates in public.
2015-05-05 20:21:03 +03:00
Avi Kivity
3a0de14aa8 db: more const correctness for column_family and component types
Ensure that read-side accessors are const.  This is important in preparation
for multiple memtables (and later, sstables) since a read-side
mutation_partition may be a temporary object coming from multiple memtables
(and sstables) while a write-side mutation_partition is guaranteed to belong
to a single memtable (and thus, not be temporary).

Since writers will want non-const mutation_partitions to write to, they won't
be able to use the read-side accessors by accident.
2015-05-05 19:37:21 +03:00
Tomasz Grabiec
aec740f895 db: Make decorated_key have ordering compatible with Origin
2015-04-30 12:02:39 +02:00
Calle Wilund
aeb83f2874 Add commitlog to db + use it in storage_proxy/handler
* A commitlog is created in "work" dirs when initing the db
  from a datadir. However, since we have neither disk data storage,
  nor replay capability yet (and no real db config), the settings 
  are basically to just write in-memory serialization, write them to 
  disk and then discard them. So in fact, pointless. But at least using
  the log...
* Moved the actual "apply" of mutation into database. If a commitlog
  is active, add an entry to it before applying mutation.
2015-04-29 10:10:21 +02:00
Tomasz Grabiec
5a7e3d3278 db: Order partitions by decorated_key
Partitions should be ordered using Origin's ordering, which is first
by token, then by Origin's representation of the key. That is the
natural ordering of decorated_key.

This also changes mutation class to hold decorated_key, to avoid
decoration overhead at different layers.
2015-04-24 18:01:01 +02:00
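The ordering described in 5a7e3d3278 (first by token, then by key) can be sketched as a lexicographic comparison. The struct below is a hypothetical stand-in: the real token and key types are not shown in this log, and plain `std::string` comparison is only an assumption standing in for Origin's representation of the key.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for decorated_key: a token plus the key it decorates.
struct decorated_key_sketch {
    std::int64_t token;
    std::string key;
};

// Order by token first; fall back to the key only when tokens are equal.
inline bool operator<(const decorated_key_sketch& a, const decorated_key_sketch& b) {
    return std::tie(a.token, a.key) < std::tie(b.token, b.key);
}
```

With an `operator<` like this, an ordered container such as `std::map<decorated_key_sketch, ...>` would iterate partitions in token order without recomputing the token at each layer.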
Tomasz Grabiec
1c3275c950 mutation: Encapsulate fields
2015-04-24 18:01:01 +02:00
Tomasz Grabiec
731a63e371 schema: Embed raw_schema inside schema
Public fields got encapsulated.
2015-04-24 18:01:01 +02:00
Tomasz Grabiec
4502f01581 thrift: Fix system_add_keyspace()
We should use the same UUID on each core for given column_family,
otherwise they will get different ids on each core.
2015-04-20 12:12:54 +02:00
Tomasz Grabiec
06f198b10c schema: Add id field
It uniquely identifies column_family globally. Will be used for
column_family lookups.
2015-04-15 20:33:48 +02:00
Avi Kivity
30b40bf7b1 db: make bytes even more distinct from sstring
bytes and sstring are distinct types, since their internal buffers are of
different length, but bytes_view is an alias of sstring_view, which makes
it possible for objects of different types to leak across the abstraction
boundary.

Fix this by making bytes a basic_sstring<int8_t, ...> instead of using char.
int8_t is a 'signed char', which is a distinct type from char, so now
bytes_view is a distinct type from sstring_view.

uint8_t would have been an even better choice, but that diverges from Origin
and would have required an audit.
2015-04-07 10:56:19 +03:00
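The type-level argument in 30b40bf7b1 can be checked at compile time. The sketch below uses a hypothetical `basic_view_sketch` template in place of the real view types, and assumes a platform where `std::int8_t` is `signed char`, as the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical view template parameterized on the character type.
template <typename CharT>
struct basic_view_sketch {
    const CharT* data;
    std::size_t size;
};

using sstring_view_sketch = basic_view_sketch<char>;
using bytes_view_sketch = basic_view_sketch<std::int8_t>;

// char and signed char are always distinct types in C++.
static_assert(!std::is_same_v<char, signed char>,
              "char and signed char are distinct types");
// Distinct element types give distinct, non-interconvertible view types.
static_assert(!std::is_same_v<sstring_view_sketch, bytes_view_sketch>,
              "views over char and int8_t are different types");
static_assert(!std::is_convertible_v<sstring_view_sketch, bytes_view_sketch>,
              "a string view does not silently become a bytes view");
```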
Gleb Natapov
47ac784425 replication strategy
This patch converts (for a very small value of 'converts') some
replication related classes. Only static topology is supported (it is
created in keyspace::create_replication_strategy()). During mutation
no replication is done, since messaging service is not ready yet,
only endpoints are calculated.
2015-04-02 16:16:39 +02:00
Calle Wilund
d3fe0c5182 Refactor db/keyspace/column_family topology
* database now holds all keyspace + column family objects
* column families are mapped by uuid, either generated or explicit
* lookup by name tuples or uuid
* finder functions now return refs + throws on missing obj
2015-04-01 10:08:00 +02:00
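The layout listed in d3fe0c5182 (column families mapped by uuid, lookup by name tuple or uuid, finders that return references and throw on a missing object) might look roughly like the sketch below. All names are hypothetical, `uuid` is reduced to an integer, and `std::out_of_range` is an assumed choice of exception.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

using uuid = std::uint64_t;  // stand-in for a real UUID type

struct column_family_sketch {
    std::string name;
};

class database_sketch {
    // The database owns every column family, keyed by uuid.
    std::unordered_map<uuid, column_family_sketch> _column_families;
    // Secondary index: (keyspace name, column family name) -> uuid.
    std::map<std::pair<std::string, std::string>, uuid> _ks_cf_to_uuid;
public:
    void add_column_family(uuid id, const std::string& ks, const std::string& cf) {
        _column_families.emplace(id, column_family_sketch{cf});
        _ks_cf_to_uuid.emplace(std::make_pair(ks, cf), id);
    }

    // Finder by name tuple; throws if the pair is unknown.
    uuid find_uuid(const std::string& ks, const std::string& cf) const {
        auto it = _ks_cf_to_uuid.find(std::make_pair(ks, cf));
        if (it == _ks_cf_to_uuid.end()) {
            throw std::out_of_range("no such column family: " + ks + "." + cf);
        }
        return it->second;
    }

    // Finder by uuid; returns a reference and throws if missing.
    column_family_sketch& find_column_family(uuid id) {
        auto it = _column_families.find(id);
        if (it == _column_families.end()) {
            throw std::out_of_range("no column family with this uuid");
        }
        return it->second;
    }

    // Name lookup is layered on top of the uuid lookup.
    column_family_sketch& find_column_family(const std::string& ks, const std::string& cf) {
        return find_column_family(find_uuid(ks, cf));
    }
};
```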
Tomasz Grabiec
bdbd5547e3 db: Cleanup key names
clustering_key::one -> clustering_key
clustering_key::prefix::one -> clustering_key_prefix
partition_key::one -> partition_key
clustering_prefix -> exploded_clustering_prefix
2015-03-20 18:59:29 +01:00
Tomasz Grabiec
90298af614 db: Cleanup atomic_cell naming
atomic_cell -> atomic_cell_type
atomic_cell::one -> atomic_cell
atomic_cell::view -> atomic_cell_view
2015-03-20 18:59:29 +01:00
Tomasz Grabiec
1b1af8cdfd db: Introduce types to hold keys
Holding keys and their prefixes as "bytes" is error prone. It's easy
to mix them up (or use wrong types). This change adds wrappers for
keys with accessors which are meant to make misuses as difficult as
possible.

Prefix and full keys are now distinguished. Places which assumed that
the representation is the same (it currently is) were changed not to
do so. This will allow us to introduce more compact storage for non-prefix
keys.
2015-03-17 15:56:29 +01:00
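One common way to get the effect described in 1b1af8cdfd is a tagged wrapper around the raw bytes, so that full keys and prefixes become unrelated types. The sketch below is an assumption about the general technique, not the project's actual classes; `key_wrapper_sketch`, the tag structs, and `representation()` are hypothetical names, and `std::string` stands in for `bytes`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

using bytes_sketch = std::string;  // stand-in for the project's bytes type

// Wrapper whose Tag parameter makes each kind of key a distinct type.
template <typename Tag>
class key_wrapper_sketch {
    bytes_sketch _bytes;
public:
    // explicit: raw bytes never turn into a key by accident.
    explicit key_wrapper_sketch(bytes_sketch b) : _bytes(std::move(b)) {}
    const bytes_sketch& representation() const { return _bytes; }
    bool operator==(const key_wrapper_sketch& o) const { return _bytes == o._bytes; }
};

struct partition_key_tag {};
struct clustering_key_tag {};
struct clustering_key_prefix_tag {};

using partition_key_sketch = key_wrapper_sketch<partition_key_tag>;
using clustering_key_sketch = key_wrapper_sketch<clustering_key_tag>;
using clustering_key_prefix_sketch = key_wrapper_sketch<clustering_key_prefix_tag>;

// Mixing kinds of keys, or passing raw bytes, is rejected at compile time.
static_assert(!std::is_convertible_v<clustering_key_prefix_sketch, clustering_key_sketch>,
              "a prefix is not a full key");
static_assert(!std::is_convertible_v<partition_key_sketch, clustering_key_sketch>,
              "partition and clustering keys do not mix");
static_assert(!std::is_convertible_v<bytes_sketch, partition_key_sketch>,
              "raw bytes need an explicit conversion");
```

Because callers only see `representation()`, the stored form of a full key could later diverge from that of a prefix without touching call sites, which is the flexibility the commit message mentions.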
Tomasz Grabiec
89aa2f75e5 thrift: Fix name clash between unimplemented() and namespace "unimplemented"
2015-03-11 14:56:10 +01:00
Avi Kivity
a49330095a db: wrap bytes in atomic_cell format
We use bytes for many different things, and it is easy to get confused as
to what format the data is actually in.

Fix that for atomic_cell by providing wrappers.  atomic_cell::one corresponds
to a bytes object holding exactly one atomic cell, and atomic_cell::view is
a bytes_view to an atomic_cell.  The static functions of atomic_cell itself
are privatized to prevent the unwashed masses from using them on the wrong
objects.

Since a row entry can hold either an atomic cell or a collection,
depending on the schema, also introduce a variant type
atomic_cell_or_collection and allow the user to pick the type explicitly.
Internally both are stored as bytes objects.
2015-03-04 15:49:35 +02:00
Tomasz Grabiec
74295a9759 db: Use opaque bytes for cell values instead of boost::any
Storing cells as boost::any objects makes us use expensive
boost::any_cast to access the data. This change replaces boost::any
with bytes object which holds the value in serialized form (the same
as will be used for on-wire format).

If the cell type is atomic, you use the field accessors defined in the
atomic_cell class, e.g. like this:

if (column.type.is_atomic()) {
   if (atomic_cell::is_live(c)) {
      auto timestamp = atomic_cell::timestamp(c);
      ...
   }
}

Eventually we could switch to a more efficient semi-serialized form
with native byte order but I don't want to introduce it just yet for
simplicity.
2015-02-27 10:59:43 +01:00
Tomasz Grabiec
a61d9ee18e schema: Add static columns to schema
2015-02-27 10:48:56 +01:00
Avi Kivity
2720ba34bf db: shard data
Add database::shard_of() to compute the shard hosting the partition
(with a simplistic algorithm, but perhaps not too bad).

Convert non-metadata invoke_on_all() and local calls on the database
to use shard_of().
2015-02-23 11:37:12 +02:00
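The log does not say which algorithm `database::shard_of()` in 2720ba34bf uses, only that it is simplistic. Purely as an assumed illustration of the general idea, a partition key can be mapped to a shard by hashing it and reducing modulo the shard count; `shard_of_sketch` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Assumed illustration only: hash the partition key, then reduce it to a
// shard index. The real shard_of() may compute this differently.
inline unsigned shard_of_sketch(const std::string& partition_key, unsigned shard_count) {
    std::size_t h = std::hash<std::string>{}(partition_key);
    return static_cast<unsigned>(h % shard_count);
}
```

The property that matters for routing is that the same key always lands on the same shard, so a write and a later read for that partition can both be sent to one place instead of being broadcast.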
Avi Kivity
0db67ff121 thrift: add foreign_ptr<> variant to complete()
Some calls will return complex types, so allow them to return a foreign_ptr<>
to ensure cleanup will happen in the correct place.
2015-02-23 11:37:12 +02:00
Avi Kivity
cb63d16b40 thrift: get rid of useless try/catch
Exceptions are now handled with then_wrapped(), nothing is left to catch.
2015-02-19 18:00:03 +02:00
Avi Kivity
70381a6da5 db: distribute database object
s/database/distributed<database>/ everywhere.

Use simple distribution rules: writes are broadcast, reads are local.
This causes tremendous data duplication, but will change soon.
2015-02-19 17:53:13 +02:00
Avi Kivity
3ec83658f3 thrift: store the keyspace name in set_keyspace()
The keyspace pointer is only valid for the local shard.
2015-02-19 15:55:17 +02:00