126 Commits

Author SHA1 Message Date
Tomasz Grabiec
aec740f895 db: Make decorated_key have ordering compatible with Origin 2015-04-30 12:02:39 +02:00
Tomasz Grabiec
51d26620ca db: Remove comment above partitions map
I think the types are explicit enough now.
2015-04-30 11:16:53 +02:00
Calle Wilund
2f4e7a00f6 Use db/config object in main, database etc
* Uses config object to augment/implement options parsing
* Database now holds config obj
* Commitlog can now be inited with global config obj.
2015-04-29 18:01:17 +02:00
Avi Kivity
3162873d7f Merge branch 'calle/commitlog' of github.com:cloudius-systems/seastar-dev into db
Use commit log in database, from Calle:

"Initial" usage of the commitlog in database mutation path.
A commitlog is created in "work" dirs when initing the db
from a datadir. However, since we have neither disk data storage,
nor replay capability yet (and no real db config), the settings
are basically to just write in-memory serialization, write them to
disk and then discard them. So in fact, pointless. But at least using
the log...
2015-04-29 11:28:05 +03:00
Calle Wilund
aeb83f2874 Add commitlog to db + use it in storage_proxy/handler
* A commitlog is created in "work" dirs when initing the db
  from a datadir. However, since we have neither disk data storage,
  nor replay capability yet (and no real db config), the settings 
  are basically to just write in-memory serialization, write them to 
  disk and then discard them. So in fact, pointless. But at least using
  the log...
* Moved the actual "apply" of mutation into database. If a commitlog
  is active, add an entry to it before applying mutation.
2015-04-29 10:10:21 +02:00
Tomasz Grabiec
cae462c534 Merge remote-tracking branch 'dev/penberg/keyspace-merging/v5' from seastar-dev.git
From Pekka:

"This patch series converts LegacySchemaTables keyspace merging code to
C++. After this series, keyspaces are actually created as demonstrated
by the newly added test in cql_query_test.cc."
2015-04-28 18:06:23 +02:00
Pekka Enberg
33ceac5643 database: add database::delete_keyspace() stub
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-04-28 15:49:33 +03:00
Tomasz Grabiec
6e78344c87 Merge tag 'avi/usertypes-addendum/v1' from seastar-dev.git 2015-04-27 12:53:00 +02:00
Avi Kivity
f779c54d75 db: rename tuple_type family to compound_type
tuples already have a meaning in Cassandra and in C++, let's not overload
the word even more.  Use compound, which is the word used in Origin as well.
2015-04-27 12:27:18 +02:00
Pekka Enberg
cf1d6197d6 database: add database::update_keyspace() stub
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-04-27 11:39:57 +03:00
Avi Kivity
ba0afecf2e db: implement user_types_metadata
This is a simple map of type names to types, with the slight complication
of checking for compatibility when replacing a type.
2015-04-26 18:33:25 +03:00
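The "map of type names to types" with a compatibility check on replacement could look roughly like the sketch below. This is not the actual `user_types_metadata` class: `user_type_sketch`, `user_types_registry_sketch`, `add_type` and the "new type may only append fields" rule are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a user-defined type: a name plus ordered field names.
struct user_type_sketch {
    std::string name;
    std::vector<std::string> fields;

    // Assumed rule for this sketch: a replacement may only append fields
    // to the type it replaces.
    bool is_compatible_with(const user_type_sketch& previous) const {
        if (fields.size() < previous.fields.size()) {
            return false;
        }
        for (std::size_t i = 0; i < previous.fields.size(); ++i) {
            if (fields[i] != previous.fields[i]) {
                return false;
            }
        }
        return true;
    }
};

// Hypothetical registry: type name -> type, rejecting incompatible replacements.
class user_types_registry_sketch {
    std::map<std::string, user_type_sketch> _types;
public:
    void add_type(user_type_sketch t) {
        auto it = _types.find(t.name);
        if (it != _types.end() && !t.is_compatible_with(it->second)) {
            throw std::invalid_argument("incompatible replacement for type " + t.name);
        }
        std::string key = t.name;  // copy the key before moving the value
        _types[key] = std::move(t);
    }

    const user_type_sketch& get_type(const std::string& name) const {
        return _types.at(name);
    }
};
```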
Tomasz Grabiec
5a7e3d3278 db: Order partitions by decorated_key
Partitions should be ordered using Origin's ordering, which is first
by token, then by Origin's representation of the key. That is the
natural ordering of decorated_key.

This also changes mutation class to hold decorated_key, to avoid
decoration overhead at different layers.
2015-04-24 18:01:01 +02:00
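A minimal sketch of the ordering described above, token first and then the key representation. The struct layout, the `std::int64_t` token and the use of `std::string` for the key bytes are assumptions for illustration, not the real `decorated_key`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical simplified decorated key: a token plus the serialized key.
struct decorated_key_sketch {
    std::int64_t token;  // assumed token representation
    std::string key;     // stand-in for Origin's representation of the key
};

// Orders first by token, then by key bytes.
struct decorated_key_less {
    bool operator()(const decorated_key_sketch& a, const decorated_key_sketch& b) const {
        return std::tie(a.token, a.key) < std::tie(b.token, b.key);
    }
};

// A partitions map ordered by decorated key; the mapped type is a placeholder.
using partitions_sketch = std::map<decorated_key_sketch, std::string, decorated_key_less>;
```

Keeping the token inside the key object means it is computed once when the key is decorated, rather than at every layer that needs to compare partitions.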
Tomasz Grabiec
1c3275c950 mutation: Encapsulate fields 2015-04-24 18:01:01 +02:00
Tomasz Grabiec
4641bc6f95 database: Move implementation to source file 2015-04-24 18:01:01 +02:00
Tomasz Grabiec
0d4821009c db: Move mutation and mutation_partition to separate headers and compilation units 2015-04-22 18:42:33 +02:00
Tomasz Grabiec
a5c201a685 db: Move column_family::get_partition_slice() to mutation_partition::query()
There's nothing column_family-specific there.
2015-04-22 17:40:02 +02:00
Tomasz Grabiec
de5bea90fe db: Add const qualifiers to mutation_partition methods 2015-04-22 17:37:40 +02:00
Tomasz Grabiec
00f99cefd4 db: split query.hh to reduce header dependencies 2015-04-15 20:44:59 +02:00
Tomasz Grabiec
878a740b9d db: Write query results in serialized form
This gives about 30% increase in tps in:

  build/release/tests/perf/perf_simple_query -c1 --query-single-key

This patch switches query result format from a structured one to a
serialized one. The problems with structured format are:

  - high level of indirection (vector of vectors of vectors of blobs), which
    is not CPU cache friendly

  - high allocation rate due to fine-grained object structure

On replica side, the query results are probably going to be serialized
in the transport layer anyway, so this change only subtracts
work. There is no processing of the query results on replica other
than concatenation in case of range queries. If query results are
collected in serialized form from different cores, we can concatenate
them without copying by simply appending the fragments into the
packet. This optimization is not implemented yet.

On coordinator side, the query results would have to be parsed from
the transport layer buffers anyway, so this also doesn't add work, but
again saves allocations and copying. The CQL server doesn't need
complex data structures to process the results, it just goes over it
linearly consuming it. This patch provides views, iterators and
visitors for consuming query results in serialized form. Currently the
iterators assume that the buffer is contiguous but we could easily
relax this in future so that we can avoid linearization of data
received from seastar sockets.

The coordinator side could be optimized even further for CQL queries
which do not need processing (e.g. select * from cf where ...): we
could make the replica send the query results in the format which is
expected by the CQL binary protocol client. So in the typical case the
coordinator would just pass the data using zero-copy to the client,
prepending a header.

We do need structure for prefetched rows (needed by list
manipulations), and this change adds query result post-processing
which converts serialized query result into a structured one, tailored
particularly for prefetched rows needs.

This change also introduces partition_slice options. In some queries
(maybe even in typical ones), we don't need to send partition or
clustering keys back to the client, because they are already specified
in the query request, and not queried for. The query results now hold
keys as optional elements. Also, meta-data like cell timestamp and
ttl is now also optional. It is only needed if the query has
writetime() or ttl() functions in it, which it typically won't have.
2015-04-15 20:44:50 +02:00
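To illustrate the difference between a structured result and a serialized one, here is a rough sketch that writes cells into one contiguous buffer and consumes them linearly through a visitor. The length-prefixed layout and the names `write_cell`, `for_each_cell` and `collect_cells` are assumptions for illustration; the actual wire format used by the patch is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Appends a 32-bit length prefix (host byte order, sketch only) and the cell bytes.
void write_cell(std::string& out, const std::string& cell) {
    std::uint32_t len = static_cast<std::uint32_t>(cell.size());
    char prefix[sizeof(len)];
    std::memcpy(prefix, &len, sizeof(len));
    out.append(prefix, sizeof(len));
    out.append(cell);
}

// Walks the contiguous buffer once and hands each cell to the visitor
// as (pointer, length), without building any intermediate structure.
template <typename Visitor>
void for_each_cell(const std::string& buf, Visitor&& visit) {
    std::size_t pos = 0;
    while (pos + sizeof(std::uint32_t) <= buf.size()) {
        std::uint32_t len;
        std::memcpy(&len, buf.data() + pos, sizeof(len));
        pos += sizeof(len);
        if (len > buf.size() - pos) {
            break;  // truncated buffer: stop rather than read past the end
        }
        visit(buf.data() + pos, len);
        pos += len;
    }
}

// Post-processing step for callers that do need structure: copies the
// cells out of the serialized form into a vector.
std::vector<std::string> collect_cells(const std::string& buf) {
    std::vector<std::string> cells;
    for_each_cell(buf, [&cells](const char* data, std::uint32_t len) {
        cells.emplace_back(data, len);
    });
    return cells;
}
```

Only `collect_cells` allocates per cell; a consumer that streams results straight to the client can use `for_each_cell` directly.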
Tomasz Grabiec
b34cdd76ae db: Make the whole database printable
For debugging purposes.
2015-04-15 20:33:48 +02:00
Tomasz Grabiec
0be6cec13f db: Add const qualifier to mutation_partition::range() 2015-04-15 20:33:48 +02:00
Avi Kivity
a190f2db79 db: drop compile-time dependency on sstables
Move #include "sstables.hh" to .cc file.  Need to explicitly define
destructor for this.
2015-04-11 11:27:48 +03:00
Calle Wilund
bfa9b860a8 db: make database lookup functions explicitly non-modifying
To be more precise, do not take schema_ptr by value.
Fixes crashes when running with smp > 1, where mutations applied across shards
(i.e. foreign memory) would cause schema_ptrs to get out of sync (using
another shard's ptr)
2015-04-08 12:25:05 +03:00
Gleb Natapov
47ac784425 replication strategy
This patch converts (for very small value of 'converts') some
replication related classes. Only static topology is supported (it is
created in keyspace::create_replication_strategy()). During mutation
no replication is done, since messaging service is not ready yet,
only endpoints are calculated.
2015-04-02 16:16:39 +02:00
Calle Wilund
6f6f924c9c Serializer object(s) for internal use
For serializing to commit log, and potentially internal wire messaging.

Note: intentionally incompatible with stock C wire/serial format.

Note: intentionally separate from the CQL-centric serialization
for a few reasons.

1.) Need "bulk serializers" for internal objects (mutation etc)
which might not fit well into the "types.hh" serializer schemes.
2.) No need for polymorphism/virtual type parameters since we know
exactly what we serialize and to where.
2015-04-01 10:08:00 +02:00
Calle Wilund
d3fe0c5182 Refactor db/keyspace/column_family topology
* database now holds all keyspace + column family objects
* column families are mapped by uuid, either generated or explicit
* lookup by name tuples or uuid
* finder functions now return refs + throw on missing obj
2015-04-01 10:08:00 +02:00
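The layout in the bullets above can be sketched as follows. All names here (`database_sketch`, `uuid_sketch`, `add_column_family`) and the choice of `std::out_of_range` for the "missing object" case are assumptions for illustration, not the real `database` class.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using uuid_sketch = unsigned long;  // stand-in for a real UUID type

struct column_family_sketch {
    std::string name;
};

class database_sketch {
    // Column families are owned by the database and keyed by uuid.
    std::map<uuid_sketch, column_family_sketch> _column_families;
    // Secondary index: (keyspace name, column family name) -> uuid.
    std::map<std::pair<std::string, std::string>, uuid_sketch> _ks_cf_to_uuid;
public:
    void add_column_family(uuid_sketch id, const std::string& ks, const std::string& cf) {
        _column_families[id] = column_family_sketch{cf};
        _ks_cf_to_uuid[std::make_pair(ks, cf)] = id;
    }

    // Finder by uuid: returns a reference, throws if the object is missing.
    const column_family_sketch& find_column_family(uuid_sketch id) const {
        auto it = _column_families.find(id);
        if (it == _column_families.end()) {
            throw std::out_of_range("no such column family");
        }
        return it->second;
    }

    // Finder by name tuple: resolves the uuid, then delegates.
    const column_family_sketch& find_column_family(const std::string& ks,
                                                   const std::string& cf) const {
        auto it = _ks_cf_to_uuid.find(std::make_pair(ks, cf));
        if (it == _ks_cf_to_uuid.end()) {
            throw std::out_of_range("no such column family: " + ks + "." + cf);
        }
        return find_column_family(it->second);
    }
};
```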
Tomasz Grabiec
b52cd91281 db: Properly determine row liveness
In CQL a row is considered as present if its row marker is live or it
has any cells live. The 'insert' statement creates a row
marker. Internally Origin handles that by inserting a special cell
whose name shares the prefix with other cells in that row.

One consequence of this approach is that when we query a column
slice from sstables we will have to read the whole CQL row, even if
not all columns are queried. We won't have to include the data, but we
will need liveness information in order to commute it with other
mutations, so that we can finally determine if the row is live or not.
2015-03-30 09:07:01 +02:00
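The liveness rule stated above (row marker live, or any cell live) reduces to a short predicate. This sketch uses hypothetical types and a plain `bool` per cell; it deliberately ignores tombstones, timestamps and expiry, which the real code has to take into account.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical simplified cell: only its liveness is modelled.
struct cell_sketch {
    bool live;
};

// Hypothetical simplified row: a row marker plus its cells.
struct row_sketch {
    bool marker_live = false;
    std::vector<cell_sketch> cells;
};

// A row is present if its marker is live or at least one cell is live.
bool is_row_live(const row_sketch& r) {
    return r.marker_live ||
           std::any_of(r.cells.begin(), r.cells.end(),
                       [](const cell_sketch& c) { return c.live; });
}
```

Because any single cell can decide the outcome, liveness information is needed for every cell in the row, including columns that were not queried.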
Tomasz Grabiec
4aa74f1312 db: Make mutation_partition::clustered_row() return deletable_row reference 2015-03-30 09:07:00 +02:00
Tomasz Grabiec
2bcc368138 db: Move implementations to source file 2015-03-30 09:01:59 +02:00
Tomasz Grabiec
b8063cd76e cql3: Support for querying of static columns 2015-03-26 14:58:36 +01:00
Pekka Enberg
3150bb5b78 database: Initialize system keyspace in database constructor
System keyspace is used for things like keyspace and table metadata.
Initialize it in database constructor so that they're always available.
Needed for CQL create keyspace test case, for example.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-03-26 12:41:00 +02:00
Pekka Enberg
fd8e92ab07 database: Add mutation::set_cell() variant
This adds a set_cell() variant which accepts a column name and a
boost::any value and does column definition lookup and decomposition
under the hood. Simplifies code that manipulates system tables directly
in db/legacy_schema_tables.cc.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-03-26 12:40:58 +02:00
Tomasz Grabiec
b26b39504a db: Add find_or_create_keyspace()
Needed for tests.
2015-03-25 10:36:19 +01:00
Tomasz Grabiec
9eafa69d43 db: Avoid unnecessary lookup of row key when applying range tombstones 2015-03-25 10:36:19 +01:00
Tomasz Grabiec
7bd076ed85 db: Extract range tombstone lookup to separate method
While at it, convert affected methods to take a schema by const& instead
of a shared pointer to save on unnecessary shared ptr copies.
2015-03-25 10:36:19 +01:00
Tomasz Grabiec
866dd449db db: Remove unused methods 2015-03-25 10:36:18 +01:00
Glauber Costa
1880baa873 database: read-in sstables metadata
Now that the code for sstable metadata is ready, we can read it when we are
loading the keyspaces.

At this moment, only the system tables are processed. This is because we will
require the schema to be already determined in order to properly read the
sstables. The system schema is known at compile time. The others will have to
be derived when we are able to read it from the system tables themselves.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-24 15:52:24 +02:00
Avi Kivity
1d5f94bad1 db: add mutation_partition::get_cell()
While most mutations don't need existing column values, some (SET my_list[3]=2)
do.  Add an accessor function for them.
2015-03-23 16:04:29 +02:00
Tomasz Grabiec
0330568977 db: Handle range queries on clustering key
That also includes prefix range queries (partially constrained keys).
2015-03-20 19:20:59 +01:00
Tomasz Grabiec
bdbd5547e3 db: Cleanup key names
clustering_key::one -> clustering_key
clustering_key::prefix::one -> clustering_key_prefix
partition_key::one -> partition_key
clustering_prefix -> exploded_clustering_prefix
2015-03-20 18:59:29 +01:00
Tomasz Grabiec
6197c5306d db: Optimize range tombstone lookups
From O(N) to O(log(N)) where N is the number of range tombstones.
2015-03-17 15:56:29 +01:00
Tomasz Grabiec
9f60853271 db: Switch clustering key map and row tombstones to boost::intrusive::set
std::map<> does not support lookup using different comparator than the
one used to compare keys. For range prefix queries and for row prefix
tombstone queries we will need to perform lookups using different
comparators.
2015-03-17 15:56:29 +01:00
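The kind of lookup meant here, searching an ordered collection of full keys with a comparator that only understands prefixes, can be sketched with the standard library alone. Note the swap: this uses `std::equal_range` over a sorted `std::vector` rather than `boost::intrusive::set`, and every type and function name below is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using key_sketch = std::vector<int>;  // full clustering key components

// A distinct type for prefixes, so the comparator overloads stay unambiguous.
struct prefix_sketch {
    std::vector<int> components;
};

// Compares a full key against a prefix, looking only at the prefix's components.
struct prefix_compare {
    static int cmp(const key_sketch& k, const prefix_sketch& p) {
        for (std::size_t i = 0; i < p.components.size(); ++i) {
            if (i == k.size() || k[i] < p.components[i]) {
                return -1;  // key sorts before every key with this prefix
            }
            if (k[i] > p.components[i]) {
                return 1;   // key sorts after every key with this prefix
            }
        }
        return 0;           // key starts with the prefix
    }
    bool operator()(const key_sketch& k, const prefix_sketch& p) const { return cmp(k, p) < 0; }
    bool operator()(const prefix_sketch& p, const key_sketch& k) const { return cmp(k, p) > 0; }
};

using key_iterator = std::vector<key_sketch>::const_iterator;

// Returns the range of keys that start with the prefix. `sorted_keys` must be
// sorted with the keys' own lexicographic ordering.
std::pair<key_iterator, key_iterator>
find_by_prefix(const std::vector<key_sketch>& sorted_keys, const prefix_sketch& p) {
    return std::equal_range(sorted_keys.begin(), sorted_keys.end(), p, prefix_compare{});
}
```

The search comparator differs from the one that defines the container's order; it only has to agree with that order on where the matching range begins and ends.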
Tomasz Grabiec
1b1af8cdfd db: Introduce types to hold keys
Holding keys and their prefixes as "bytes" is error prone. It's easy
to mix them up (or use wrong types). This change adds wrappers for
keys with accessors which are meant to make misuses as difficult as
possible.

Prefix and full keys are now distinguished. Places which assumed that
the representation is the same (it currently is) were changed not to
do so. This will allow us to introduce more compact storage for non-prefix
keys.
2015-03-17 15:56:29 +01:00
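One common way to get this kind of separation is a tagged wrapper around the raw bytes, sketched below. The template, the tag names and `representation()` are assumptions for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

using bytes_sketch = std::string;  // stand-in for the raw "bytes" type

// Wraps raw bytes in a type that is distinct per Tag, so different kinds
// of keys cannot be passed in place of each other by accident.
template <typename Tag>
class key_wrapper_sketch {
    bytes_sketch _bytes;
public:
    explicit key_wrapper_sketch(bytes_sketch b) : _bytes(std::move(b)) {}
    const bytes_sketch& representation() const { return _bytes; }
};

struct partition_key_tag {};
struct clustering_key_tag {};
struct clustering_key_prefix_tag {};

using partition_key_sketch = key_wrapper_sketch<partition_key_tag>;
using clustering_key_sketch = key_wrapper_sketch<clustering_key_tag>;
using clustering_key_prefix_sketch = key_wrapper_sketch<clustering_key_prefix_tag>;

// Neither raw bytes nor a prefix converts implicitly to a full key.
static_assert(!std::is_convertible<bytes_sketch, clustering_key_sketch>::value,
              "raw bytes must be wrapped explicitly");
static_assert(!std::is_convertible<clustering_key_prefix_sketch, clustering_key_sketch>::value,
              "a prefix is not a full key");

// Accepts only a full clustering key; passing a prefix fails to compile.
std::size_t key_size_in_bytes(const clustering_key_sketch& k) {
    return k.representation().size();
}
```

Since callers go through the wrapper rather than the bytes, the representation of full keys can later diverge from that of prefixes without touching call sites.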
Avi Kivity
0d0a4192f4 db: add mutation::set_cell() helper
Rather than checking for a static vs. clustered cell at the call site,
do this in one place.
2015-03-16 16:36:14 +02:00
Tomasz Grabiec
2f6d9a4113 db: Introduce query interface 2015-03-11 16:01:13 +01:00
Tomasz Grabiec
acda112314 db: Register system keyspace
This also changes populate() interface a bit. They now work on
existing objects, so that system keyspace definition is not
overridden. For non-system keyspace, the keyspace definition would come
from the data in the system tables.
2015-03-11 16:01:13 +01:00
Tomasz Grabiec
262fed73f5 db: Add stub for secondary_index_manager class 2015-03-11 14:56:10 +01:00
Avi Kivity
bb0d2a4f03 db: fix mutation::set_*_cell() applied twice to same column
With a collection, setting two separate elements in a collection would
cause the second to override the first.  This also applies, with much
smaller effect, to normal cells (for example, updating the same counter
twice, or issuing two updates to the same cell but with different timestamps,
via thrift).

Fix by merging the two values rather than replacing the old one.
2015-03-05 19:04:02 +02:00
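The difference between replacing and merging can be sketched for atomic cells as below. The reconciliation rule (higher timestamp wins, ties broken by the greater value) and all names are assumptions for illustration; collections would need an element-wise merge that is not shown here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified atomic cell.
struct atomic_cell_sketch {
    long timestamp;
    std::string value;
};

// Assumed reconciliation rule: the cell with the higher timestamp wins;
// on equal timestamps the greater value wins, so the result does not
// depend on the order in which the two cells arrive.
void merge_cell(atomic_cell_sketch& existing, const atomic_cell_sketch& incoming) {
    if (std::tie(incoming.timestamp, incoming.value) >
        std::tie(existing.timestamp, existing.value)) {
        existing = incoming;
    }
}

// Hypothetical row fragment: column id -> cell.
class row_cells_sketch {
    std::map<int, atomic_cell_sketch> _cells;
public:
    // Setting the same column twice merges with the old cell instead of
    // blindly overwriting it.
    void set_cell(int column_id, const atomic_cell_sketch& cell) {
        auto result = _cells.emplace(column_id, cell);
        if (!result.second) {
            merge_cell(result.first->second, cell);
        }
    }

    const atomic_cell_sketch& get_cell(int column_id) const {
        return _cells.at(column_id);
    }
};
```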
Avi Kivity
b14d9f1f02 mutation: support for collections
We simply store the collection mutation as we do atomic cells -- merging
will be done by the consumer.
2015-03-05 14:03:36 +02:00
Avi Kivity
6d18aa8f20 Decompose database.hh, types.hh into smaller headers
Avoid include hell for new code.
2015-03-04 16:18:48 +02:00