Commit Graph

82 Commits

Author SHA1 Message Date
Pekka Enberg
3150bb5b78 database: Initialize system keyspace in database constructor
System keyspace is used for things like keyspace and table metadata.
Initialize it in database constructor so that they're always available.
Needed for CQL create keyspace test case, for example.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-03-26 12:41:00 +02:00
Tomasz Grabiec
b26b39504a db: Add find_or_create_keyspace()
Needed for tests.
2015-03-25 10:36:19 +01:00
Tomasz Grabiec
9eafa69d43 db: Avoid unnecessary lookup of row key when applying range tombstones 2015-03-25 10:36:19 +01:00
Tomasz Grabiec
7bd076ed85 db: Extract range tombstone lookup to separate method
While at it, convert affected methods to take a schema by const& instead
of a shared pointer to save on unnecessary shared ptr copies.
2015-03-25 10:36:19 +01:00
Glauber Costa
1880baa873 database: read-in sstables metadata
Now that the code for sstable metadata is ready, we can read it when we are
loading the keyspaces.

At this moment, only the system tables are processed. This is because we will
require the schema to be already determined in order to properly read the
sstables. The system schema is known at compile time. The others will have to
be derived when we are able to read it from the system tables themselves.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-24 15:52:24 +02:00
Tomasz Grabiec
e738b213ed schema: Fix default copy constructor
Schema has containers which hash pointers to column definitions
embedded in the schema. It's not safe to just copy those, we need to
rehash them using new locations.
2015-03-24 12:06:58 +01:00
Tomasz Grabiec
e3422525c0 Use column_definition via const reference 2015-03-24 12:03:00 +01:00
Tomasz Grabiec
0330568977 db: Handle range queries on clustering key
That also includes prefix range queries (partially constrained keys).
2015-03-20 19:20:59 +01:00
Tomasz Grabiec
bdbd5547e3 db: Cleanup key names
clustering_key::one -> clustering_key
clustering_key::prefix::one -> clustering_key_prefix
partition_key::one -> partition_key
clustering_prefix -> exploded_clustering_prefix
2015-03-20 18:59:29 +01:00
Tomasz Grabiec
90298af614 db: Cleanup atomic_cell naming
atomic_cell -> atomic_cell_type
atomic_cell::one -> atomic_cell
atomic_cell::view -> atomic_cell_view
2015-03-20 18:59:29 +01:00
Tomasz Grabiec
300a9572bd types: De-virtualize tuple_type
tuple_type is for managing our internal representation of keys. It
shares some interface with abstract_type, but the latter is a basis
for types of data stored in cells. tuple_type does not need to hide
behind a virtual interface.

Note: there is a TupleType in Origin, but it serves a different purpose.
2015-03-19 12:55:28 +01:00
Tomasz Grabiec
a8ce730842 schema: Remove partition_key_prefix_type
We don't need it.
2015-03-19 12:55:28 +01:00
Tomasz Grabiec
6197c5306d db: Optimize range tombstone lookups
From O(N) to O(log(N)) where N is the number of range tombstones.
2015-03-17 15:56:29 +01:00
Tomasz Grabiec
9f60853271 db: Switch clustering key map and row tombstones to boost::intrusive::set
std::map<> does not support lookup using different comparator than the
one used to compare keys. For range prefix queries and for row prefix
tombstone queries we will need to perform lookups using different
comparators.
2015-03-17 15:56:29 +01:00
Tomasz Grabiec
1b1af8cdfd db: Introduce types to hold keys
Holding keys and their prefixes as "bytes" is error prone. It's easy
to mix them up (or use wrong types). This change adds wrappers for
keys with accessors which are meant to make misuses as difficult as
possible.

Prefix and full keys are now distinguished. Places which assumed that
the representation is the same (it currently is) were changed not to
do so. This will allow us to introduce more compact storage for non-prefix
keys.
2015-03-17 15:56:29 +01:00
Tomasz Grabiec
ecf0db17ce db: Drop comment which doesn't seem to be relevant any more 2015-03-17 15:56:28 +01:00
Avi Kivity
1ac75b1609 db: add to_hex(bytes_view) variant
Useful for debugging.
2015-03-16 16:36:14 +02:00
Tomasz Grabiec
2f6d9a4113 db: Introduce query interface 2015-03-11 16:01:13 +01:00
Tomasz Grabiec
acda112314 db: Register system keyspace
This also changes populate() interface a bit. They now work on
existing objects, so that system keyspace definition is not
overriden. For non-system keyspace, the keyspace definition would come
from the data in the system tables.
2015-03-11 16:01:13 +01:00
Tomasz Grabiec
fc00cf4f0f db: Do not fail when creating a table with composite partition key 2015-03-11 16:01:13 +01:00
Tomasz Grabiec
0f1b6b079a schema: Store partition_key_prefix_type
single_column_primary_key_restrictions may generate partition key
prefixes.
2015-03-11 14:56:10 +01:00
Avi Kivity
b77a52398f db: fix merge_cells using wrong column_definition
merge_cells() always used the regular column_definition, even when called
for a static row.

Fix by parametrizing it with a method to get the column_definition.
2015-03-05 19:59:59 +02:00
Avi Kivity
de2e9f9eea db: fix wrong row updated by merge_cells()
merge_cells() is called for both static and clustered rows, yet it always
updates the static row.

Fix by updating the row passed by the caller.
2015-03-05 19:57:34 +02:00
Avi Kivity
42a9c0f7d3 atomic_cell: export merge_column 2015-03-05 19:03:29 +02:00
Avi Kivity
98f2a51df9 db: implement collection mutation merging
Only for maps, as they are the only collection implemented at present.
2015-03-05 18:11:37 +02:00
Avi Kivity
df22293baf atomic_cell: export compare_atomic_cell_for_merge
Will be used for merging maps.
2015-03-05 18:11:37 +02:00
Avi Kivity
ded878212c db: simplify mutation_partition::apply()
Since merging cells is a different operation for atomic cells and
collections, move it into compare_for_merge(), which is where we check
the column type.  Rename compare_for_merge to merge_column(), since it
now does more than compares.
2015-03-05 18:11:37 +02:00
Avi Kivity
a49330095a db: wrap bytes in atomic_cell format
We use bytes for many different things, and it is easy to get confused as
to what format the data is actually in.

Fix that for atomic_cell by proving wrappers.  atomic_cell::one corresponds
to a bytes object holding exactly one atomic cell, and atomic_cell::view is
a bytes_view to an atomic_cell.  The static functions of atomic_cell itself
are privatized to prevent the unwashed masses from using them on the wrong
objects.

Since a row entry can hold either a an atomic cell, or a collection,
depending on the schema, also introduce a variant type
atomic_cell_or_collection and allow the user to pick the type explicitly.
Internally both are stored as bytes object.
2015-03-04 15:49:35 +02:00
Nadav Har'El
8265a13dbd schema: add "comment" string
Add a comment string to a schema, which may be set but is currently
not further used.

The originals Cassandra code has a comment for each of the builtin
schemas, and it's a shame not to remember them.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-03-02 15:18:23 +01:00
Pekka Enberg
99a09020e8 types: Fix bytes_type_impl string conversion
Tomek points out that:

  Origin calls org.apache.cassandra.utils.Hex#hexToBytes here, which is
  not what to_bytes() does. BytesType.getSerializer().toString() calls
  ByteBufferUtil.bytesToHex(value), so you should call to_hex() here.

Fix that up.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-03-02 10:38:27 +01:00
Tomasz Grabiec
74295a9759 db: Use opaque bytes for cell values instead of boost::any
Storing cells as boost::any objects makes us use expensive
boost::any_cast to access the data. This change replaces boost::any
with bytes object which holds the value in serialized form (the same
as will be used for on-wire format).

If the cell type is atomic, you use fields accessors defined in
atomic_cell class, eg like this:

if (column.type.is_atomic()) {
   if (atomic_cell::is_live(c) {
      auto timestamp = atomic_cell::timestamp(c);
      ...
   }
}

Eventually we could switch to a more officient semi-serialized form
with native byte order but I don't want to introduce it just yet for
simplicity.
2015-02-27 10:59:43 +01:00
Tomasz Grabiec
1a0ffdfb99 schema: Encapsulate column sets 2015-02-27 10:48:56 +01:00
Tomasz Grabiec
a61d9ee18e schema: Add static columns to schema 2015-02-27 10:48:56 +01:00
Tomasz Grabiec
8b9078c86a schema: Make column_kind an enum class 2015-02-27 10:48:56 +01:00
Tomasz Grabiec
609e893055 unimplemented: Separate subject from behavior
You can now do:

  fail(unimplemented::cause::PAGING);

and:

  warn(unimplemented::cause::PAGING);
2015-02-27 10:48:56 +01:00
Avi Kivity
d39c844ea3 Merge branch 'master' of github.com:cloudius-systems/seastar into db
Conflicts:
	configure.py
	database.cc (added missing include)
	utils/serialize.hh (added missing inlines)
2015-02-26 17:57:18 +02:00
Avi Kivity
2720ba34bf db: shard data
Add database::shard_of() to compute the shard hosting the partition
(with a simplistic algorithm, but perhaps not too bad).

Convert non-metadata invoke_on_all() and local calls on the database
to use shard_of().
2015-02-23 11:37:12 +02:00
Avi Kivity
70381a6da5 db: distribute database object
s/database/distributed<database>/ everywhere.

Use simple distribution rules: writes are broadcast, reads are local.
This causes tremendous data duplication, but will change soon.
2015-02-19 17:53:13 +02:00
Avi Kivity
8f9f794a73 db: make column_family::apply(mutation) not steal the contents
With replication, we want the contents of the mutation to be available
to multiple replicas.

(In this context, we will replicate the mutation to all shards in the same
node, as a temporary step in sharding a node; but the issue also occurs
when replicating to other nodes).
2015-02-19 16:23:09 +02:00
Avi Kivity
a2519926a6 db: add some iostream output operators
Helps debugging
2015-02-19 15:56:26 +02:00
Tomasz Grabiec
73b143c491 db: Compare serialized bytes when reconciling cells
That's what Origin does, it does not use cell's actual type.
2015-02-16 12:00:03 +01:00
Tomasz Grabiec
aaf9463568 db: Take names by const& in find_*() functions 2015-02-12 19:40:58 +01:00
Tomasz Grabiec
6cd524988d db: Add more methods to schema 2015-02-12 19:40:58 +01:00
Tomasz Grabiec
06ccaa3b5b db: Move method definitions to source file 2015-02-12 19:40:56 +01:00
Glauber Costa
cba2d24210 keyspace: use emplace instead of the [] operator
It inflicted a lot of pain recently. Avoid it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-02-11 20:26:20 +02:00
Tomasz Grabiec
1b66f33455 db: Apply mutations locally from storage_proxy
Eventually we should rather send them to replicas, but for now we just
apply locally.
2015-02-09 10:28:44 +01:00
Tomasz Grabiec
2244eab6c1 db: Steal data from mutations when applying
Taking mutations by r-value reference allows us to avoid copies.
2015-02-09 10:28:44 +01:00
Tomasz Grabiec
e20cc1c1f9 db: Avoid storing schema pointer with each partition 2015-02-09 10:28:44 +01:00
Tomasz Grabiec
48c11a01db db: Add ability to apply mutations into the database
For simplicity partition data is stored using the same object which is
used for mutations: mutation_partition. Later we can introduce a more
efficient version.
2015-02-09 10:28:44 +01:00
Tomasz Grabiec
19e89a6057 db: Introduce mutation_partition::apply(const mutation_partition&)
It merges two partition mutations together. It is assumed that the first
one (invocation target) is much larger.
2015-02-09 10:28:44 +01:00