scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	00f99cefd4	db: split query.hh to reduce header dependencies	2015-04-15 20:44:59 +02:00
Tomasz Grabiec	878a740b9d	db: Write query results in serialized form This gives about 30% increase in tps in: build/release/tests/perf/perf_simple_query -c1 --query-single-key This patch switches query result format from a structured one to a serialized one. The problems with structured format are: - high level of indirection (vector of vectors of vectors of blobs), which is not CPU cache friendly - high allocation rate due to fine-grained object structure On replica side, the query results are probably going to be serialized in the transport layer anyway, so this change only subtracts work. There is no processing of the query results on replica other than concatenation in case of range queries. If query results are collected in serialized form from different cores, we can concatenate them without copying by simply appending the fragments into the packet. This optimization is not implemented yet. On coordinator side, the query results would have to be parsed from the transport layer buffers anyway, so this also doesn't add work, but again saves allocations and copying. The CQL server doesn't need complex data structures to process the results, it just goes over it linearly consuming it. This patch provides views, iterators and visitors for consuming query results in serialized form. Currently the iterators assume that the buffer is contiguous but we could easily relax this in future so that we can avoid linearization of data received from seastar sockets. The coordinator side could be optimized even further for CQL queries which do not need processing (eg. select * from cf where ...) we could make the replica send the query results in the format which is expected by the CQL binary protocol client. So in the typical case the coordinator would just pass the data using zero-copy to the client, prepending a header. We do need structure for prefetched rows (needed by list manipulations), and this change adds query result post-processing which converts serialized query result into a structured one, tailored particularly for prefetched rows needs. This change also introduces partition_slice options. In some queries (maybe even in typical ones), we don't need to send partition or clustering keys back to the client, because they are already specified in the query request, and not queried for. The query results hold now keys as optional elements. Also, meta-data like cell timestamp and ttl is now also optional. It is only needed if the query has writetime() or ttl() functions in it, which it typically won't have.	2015-04-15 20:44:50 +02:00
Tomasz Grabiec	ecc5d23456	db: Avoid copying of column_definition Spotted in the perf profile.	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	7ebc7830b7	db: Optimize column family lookup in query path	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	06f198b10c	schema: Add id field It uniquely identifies column_family globally. Will be used for column_family lookups.	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	b34cdd76ae	db: Make the whole database printable For debugging purposes.	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	0be6cec13f	db: Add const qualifier to mutation_partition::range()	2015-04-15 20:33:48 +02:00
Avi Kivity	a190f2db79	db: drop compile-time dependency on sstables Move #include "sstables.hh" to .cc file. Need to explicitly define destructor for this.	2015-04-11 11:27:48 +03:00
Calle Wilund	bfa9b860a8	db: make database lookup functions explicitly non-modifying To be more precise, do not take schema_ptr by value. Fixes crashes in running smp > 1 where mutations applied across shards (i.e. foreign memory) would cause schema_ptr:s to get out of sync (using other shards ptr)	2015-04-08 12:25:05 +03:00
Avi Kivity	30b40bf7b1	db: make bytes even more distinct from sstring bytes and sstring are distinct types, since their internal buffers are of different length, but bytes_view is an alias of sstring_view, which makes it possible for objects of different types to leak across the abstraction boundary. Fix this by making bytes a basic_sstring<int8_t, ...> instead of using char. int8_t is a 'signed char', which is a distinct type from char, so now bytes_view is a distinct type from sstring_view. uint8_t would have been an even better choice, but that diverges from Origin and would have required an audit.	2015-04-07 10:56:19 +03:00
Gleb Natapov	47ac784425	replication strategy This patch converts (for very small value of 'converts') some replication related classes. Only static topology is supported (it is created in keyspace::create_replication_strategy()). During mutation no replication is done, since messaging service is not ready yet, only endpoints are calculated.	2015-04-02 16:16:39 +02:00
Tomasz Grabiec	66924090c6	Merge tag 'avi/functions/v1' From Avi: This patchset completes the conversion of scalar functions (TOKEN is still missing, and maybe others, but the infrastructure is there). Conflicts: database.cc	2015-04-02 12:48:21 +02:00
Avi Kivity	955f1ebf06	db: fix to_hex(bytes_opt) Result was inverted.	2015-04-01 20:16:00 +03:00
Avi Kivity	a9ce81a2f8	db: add ostream operator for exploded_clustering_prefix	2015-04-01 20:12:39 +03:00
Avi Kivity	bb4b303bba	db: add ostream operators for atomic_cell	2015-04-01 20:12:39 +03:00
Calle Wilund	d3fe0c5182	Refactor db/keyspace/column_family topology * database now holds all keyspace + column family object * column families are mapped by uuid, either generated or explicit * lookup by name tuples or uuid * finder functions now return refs + throws on missing obj	2015-04-01 10:08:00 +02:00
Tomasz Grabiec	9e5a02421a	db: Fix static row not being populated when query limit kicks in Spotted during code review.	2015-03-30 18:38:26 +02:00
Tomasz Grabiec	b52cd91281	db: Properly determine row liveness In CQL a row is considered as present if its row marker is live or it has any cells live. The 'insert' statement creates a row marker. Internally Origin handles that by inserting a special cell whose name shares the prefix with other cells in that row. One consequence of this way of things is that when we query a column slice from sstables we will have to read the whole CQL row, even if not all columns are queried. We won't have to include the data, but we will need liveness information in order to commute it with other mutations, so that we can finally determine if the row is live or not.	2015-03-30 09:07:01 +02:00
Tomasz Grabiec	f155da622f	db: Move row limit check to the right place Could have let in more rows than requested in range queries.	2015-03-30 09:07:01 +02:00
Tomasz Grabiec	70341ceb0a	db: Return only live cells in query::result::row The coordinator filters out dead data anyway.	2015-03-30 09:07:01 +02:00
Tomasz Grabiec	4aa74f1312	db: Make mutation_partition::clustered_row() return deletable_row reference	2015-03-30 09:07:00 +02:00
Tomasz Grabiec	2bcc368138	db: Move implementations to source file	2015-03-30 09:01:59 +02:00
Tomasz Grabiec	b8063cd76e	cql3: Support for querying of static columns	2015-03-26 14:58:36 +01:00
Tomasz Grabiec	35b4199374	Merge remote-tracking branch 'dev/penberg/create-keyspace/v4' From Pekka: This series adds support for creating keyspaces. We already have the CQL front-end implemented so all that remains is converging mutations in legacy_schema_tables.cc as well as parts of migration_manager.hh and wiring that up to the CQL execution path.	2015-03-26 14:25:54 +01:00
Avi Kivity	1c1c4f923a	db: fix collection_type_impl::deserialize_mutation_form() types It accepts a bytes_view instead of the type-safe wrapper.	2015-03-26 14:31:01 +02:00
Pekka Enberg	3150bb5b78	database: Initialize system keyspace in database constructor System keyspace is used for things like keyspace and table metadata. Initialize it in database constructor so that they're always available. Needed for CQL create keyspace test case, for example. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-03-26 12:41:00 +02:00
Avi Kivity	bfa37eb2f8	db: implement get_partition_slice() for collections	2015-03-26 12:14:01 +02:00
Avi Kivity	30c3348702	db: add ostream support to consistency_level	2015-03-26 09:34:49 +02:00
Tomasz Grabiec	b26b39504a	db: Add find_or_create_keyspace() Needed for tests.	2015-03-25 10:36:19 +01:00
Tomasz Grabiec	9eafa69d43	db: Avoid unnecessary lookup of row key when applying range tombstones	2015-03-25 10:36:19 +01:00
Tomasz Grabiec	7bd076ed85	db: Extract range tombstone lookup to separate method While at it, convert affected methods to take a schema by const& instead of a shared pointer to save on unnecessary shared ptr copies.	2015-03-25 10:36:19 +01:00
Glauber Costa	1880baa873	database: read-in sstables metadata Now that the code for sstable metadata is ready, we can read it when we are loading the keyspaces. At this moment, only the system tables are processed. This is because we will require the schema to be already determined in order to properly read the sstables. The system schema is known at compile time. The others will have to be derived when we are able to read it from the system tables themselves. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-03-24 15:52:24 +02:00
Tomasz Grabiec	e738b213ed	schema: Fix default copy constructor Schema has containers which hash pointers to column definitions embedded in the schema. It's not safe to just copy those, we need to rehash them using new locations.	2015-03-24 12:06:58 +01:00
Tomasz Grabiec	e3422525c0	Use column_definition via const reference	2015-03-24 12:03:00 +01:00
Tomasz Grabiec	0330568977	db: Handle range queries on clustering key That also includes prefix range queries (partially constrained keys).	2015-03-20 19:20:59 +01:00
Tomasz Grabiec	bdbd5547e3	db: Cleanup key names clustering_key::one -> clustering_key clustering_key::prefix::one -> clustering_key_prefix partition_key::one -> partition_key clustering_prefix -> exploded_clustering_prefix	2015-03-20 18:59:29 +01:00
Tomasz Grabiec	90298af614	db: Cleanup atomic_cell naming atomic_cell -> atomic_cell_type atomic_cell::one -> atomic_cell atomic_cell::view -> atomic_cell_view	2015-03-20 18:59:29 +01:00
Tomasz Grabiec	300a9572bd	types: De-virtualize tuple_type tuple_type is for managing our internal representation of keys. It shares some interface with abstract_type, but the latter is a basis for types of data stored in cells. tuple_type does not need to hide behind a virtual interface. Note: there is a TupleType in Origin, but it serves a different purpose.	2015-03-19 12:55:28 +01:00
Tomasz Grabiec	a8ce730842	schema: Remove partition_key_prefix_type We don't need it.	2015-03-19 12:55:28 +01:00
Tomasz Grabiec	6197c5306d	db: Optimize range tombstone lookups From O(N) to O(log(N)) where N is the number of range tombstones.	2015-03-17 15:56:29 +01:00
Tomasz Grabiec	9f60853271	db: Switch clustering key map and row tombstones to boost::intrusive::set std::map<> does not support lookup using different comparator than the one used to compare keys. For range prefix queries and for row prefix tombstone queries we will need to perform lookups using different comparators.	2015-03-17 15:56:29 +01:00
Tomasz Grabiec	1b1af8cdfd	db: Introduce types to hold keys Holding keys and their prefixes as "bytes" is error prone. It's easy to mix them up (or use wrong types). This change adds wrappers for keys with accessors which are meant to make misuses as difficult as possible. Prefix and full keys are now distinguished. Places which assumed that the representation is the same (it currently is) were changed not to do so. This will allow us to introduce more compact storage for non-prefix keys.	2015-03-17 15:56:29 +01:00
Tomasz Grabiec	ecf0db17ce	db: Drop comment which doesn't seem to be relevant any more	2015-03-17 15:56:28 +01:00
Avi Kivity	1ac75b1609	db: add to_hex(bytes_view) variant Useful for debugging.	2015-03-16 16:36:14 +02:00
Tomasz Grabiec	2f6d9a4113	db: Introduce query interface	2015-03-11 16:01:13 +01:00
Tomasz Grabiec	acda112314	db: Register system keyspace This also changes populate() interface a bit. They now work on existing objects, so that system keyspace definition is not overridden. For non-system keyspace, the keyspace definition would come from the data in the system tables.	2015-03-11 16:01:13 +01:00
Tomasz Grabiec	fc00cf4f0f	db: Do not fail when creating a table with composite partition key	2015-03-11 16:01:13 +01:00
Tomasz Grabiec	0f1b6b079a	schema: Store partition_key_prefix_type single_column_primary_key_restrictions may generate partition key prefixes.	2015-03-11 14:56:10 +01:00
Avi Kivity	b77a52398f	db: fix merge_cells using wrong column_definition merge_cells() always used the regular column_definition, even when called for a static row. Fix by parametrizing it with a method to get the column_definition.	2015-03-05 19:59:59 +02:00
Avi Kivity	de2e9f9eea	db: fix wrong row updated by merge_cells() merge_cells() is called for both static and clustered rows, yet it always updates the static row. Fix by updating the row passed by the caller.	2015-03-05 19:57:34 +02:00

109 Commits