scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-20 16:40:35 +00:00

Author	SHA1	Message	Date
Avi Kivity	0eb842dc5b	db: write memtable after sealing it Still missing handling after write completes.	2015-05-18 15:00:33 +03:00
Avi Kivity	ca49d73f97	db: allow configuring a column family to be memory-only Useful for tests.	2015-05-18 15:00:33 +03:00
Avi Kivity	dda5cbfd0d	db: make column_family and keyspace configurable Currently used for the data directory.	2015-05-18 15:00:31 +03:00
Avi Kivity	7842113cb6	db: prune some unused column_family methods Made redundant by switching tests to using memtable directly.	2015-05-18 14:59:02 +03:00
Avi Kivity	40c2d91cd8	db: add memtable::find_or_create_row_slow() Useful for tests that do not need a column_family.	2015-05-17 10:31:22 +03:00
Tomasz Grabiec	f656ae8ed4	db: Encapsulate deletable_row fields	2015-05-13 08:56:54 +02:00
Tomasz Grabiec	dbc40dfb09	db: Encapsulate the "row" class Reduces coupling. Users should not rely on the fact that it's an std::map<>. It also allows us to extend row's interface with domain-specific methods, which are a lot easier to discover than free functions.	2015-05-13 08:56:54 +02:00
Tomasz Grabiec	56bea440a7	mutation_partition: Pass schema by const& where applicable If method doesn't want to share schema ownership it doesn't have to take it by shared pointer. The benefit is that it's slightly cheaper and those methods may now be called from places which don't own schema.	2015-05-13 08:56:54 +02:00
Avi Kivity	7630fe332d	db: pass correct mutation size to commitlog Use serialized_size() instead of representation().size(), to account for the size header.	2015-05-11 19:19:23 +03:00
Tomasz Grabiec	eaceb61801	db: Add atomic_cell::deletion_time() Deleted cells store deletion time not expiry time. This change makes expiry() valid only for live cells with TTL and adds deletion_time(), which is intended to be used with deleted cells.	2015-05-10 12:03:26 +03:00
Tomasz Grabiec	f7abbda156	db: Apply frozen_mutation directly We don't convert it back to mutation before applying. mutation_partition has now apply() which works on mutation_partition_view.	2015-05-08 09:19:02 +02:00
Tomasz Grabiec	bdcd11efe9	db: Use operator<< for partition printing	2015-05-08 09:19:02 +02:00
Tomasz Grabiec	4ab66de0ae	db: Introduce frozen_mutation The immediate motivation for introducing frozen_mutation is inability to deserialize current "mutation" object, which needs schema reference at the time it's constructed. It needs schema to initialize its internal maps with proper key comparators, which depend on schema. frozen_mutation is an immutable, compact form of a mutation. It doesn't use complex in-memory structures, data is stored in a linear buffer. In case of frozen_mutation schema needs to be supplied only at the time mutation partition is visited. Therefore it can be trivially deserialized without schema.	2015-05-08 09:19:01 +02:00
Tomasz Grabiec	f43836eb68	db: Handle expired cells in compare_atomic_cell_for_merge() While at it, clarify some comments.	2015-05-06 18:31:21 +02:00
Tomasz Grabiec	5ba1486ae7	db: Rename "ttl" to "expiry" when it's used as time point To avoid confusion with "ttl" the duration.	2015-05-06 17:27:22 +02:00
Tomasz Grabiec	36ad6c9aa8	Merge tag 'avi/memtables/v3' from seastar-dev.git Multiple memtable support from Avi.	2015-05-06 15:02:42 +02:00
Avi Kivity	ef5c661d11	db: add variant of column_family::for_all_partitions() for unit tests Since it's for tests, we can pass a slower std::function<>.	2015-05-06 15:43:06 +03:00
Avi Kivity	1d6ac071c0	db: add API to seal current active memtable	2015-05-06 15:39:31 +03:00
Avi Kivity	22969aeb18	db: support for multiple memtables Each column family now contains multiple memtables, with one designated as "active" receiving all writes, while the others only serve reads.	2015-05-06 15:39:29 +03:00
Avi Kivity	5e81b92dc0	db: split column_family::partitions into a new memtable class In preparation for multiple memtables, move column_family::partitions into its own class, and forward relevant calls from column_family. A testonly_all_memtables() function was added to support sstable_test.	2015-05-06 15:35:14 +03:00
Avi Kivity	cc291d7e3b	db: improve sharding Currently we use the first byte of the token for determining the local shard. This is suboptimal for two reasons: 1. the first bytes of the token were already used to select the node, so they are not randomly distributed 2. using a single byte is not sufficient for large core counts, as the modulo operation will not return evenly distributed results Fix by using the final two bytes of the token.	2015-05-06 13:19:44 +02:00
Avi Kivity	e811690588	db: return smart pointers for column_family read-side lookups A lookup can cause several data sources to be merged, in which case we will have to return a temporary (containing data from all the data sources). For simplicity, we start by always returning a temporary.	2015-05-05 20:21:04 +03:00
Avi Kivity	8028fb441a	db: make column_family a class, not a struct Don't expose privates in public.	2015-05-05 20:21:03 +03:00
Avi Kivity	3a0de14aa8	db: more const correctness for column_family and component types Ensure that read-side accessors are const. This is important in preparation for multiple memtables (and later, sstables) since a read-side mutation_partition may be a temporary object coming from multiple memtables (and sstables) while a write-side mutation_partition is guaranteed to belong to a single memtable (and thus, not be temporary). Since writers will want non-const mutation_partitions to write to, they won't be able to use the read-side accessors by accident.	2015-05-05 19:37:21 +03:00
Tomasz Grabiec	aec740f895	db: Make decorated_key have ordering compatible with Origin	2015-04-30 12:02:39 +02:00
Calle Wilund	2f4e7a00f6	Use db/config object in main, database etc * Uses config object to augment/impl options parsing * Database now holds config obj * Commitlog can now be inited with global config obj.	2015-04-29 18:01:17 +02:00
Tomasz Grabiec	2693dd2c7b	db: Extract bytes related stuff from database.cc to bytes.cc Some tests (eg murmur_hash_test) need only byte manipulation functions. By specifying dependencies precisely we can drastically reduce recompilation times, which speeds up development cycle. I managed to reduce recompilation time for murmur_hash_test from 5 minutes to 4 seconds by breaking dependency on whole urchin object set.	2015-04-29 15:50:16 +03:00
Avi Kivity	6290dee438	db: const correctness for abstract_type and friends Types are immutable.	2015-04-29 15:40:38 +03:00
Avi Kivity	3162873d7f	Merge branch 'calle/commitlog' of github.com:cloudius-systems/seastar-dev into db Use commit log in database, from Calle: "Initial" usage of the commitlog in database mutation path. A commitlog is created in "work" dirs when initing the db from a datadir. However, since we have neither disk data storage, nor replay capability yet (and no real db config), the settings are basically to just write in-memory serialization, write them to disk and then discard them. So in fact, pointless. But at least using the log...	2015-04-29 11:28:05 +03:00
Calle Wilund	aeb83f2874	Add commitlog to db + use it in storage_proxy/handler * A commitlog is created in "work" dirs when initing the db from a datadir. However, since we have neither disk data storage, nor replay capability yet (and no real db config), the settings are basically to just write in-memory serialization, write them to disk and then discard them. So in fact, pointless. But at least using the log... * Moved the actual "apply" of mutation into database. If a commitlog is active, add an entry to it before applying mutation.	2015-04-29 10:10:21 +02:00
Pekka Enberg	33ceac5643	database: add database::delete_keyspace() stub Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-04-28 15:49:33 +03:00
Pekka Enberg	cf1d6197d6	database: add database::update_keyspace() stub Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-04-27 11:39:57 +03:00
Tomasz Grabiec	5a7e3d3278	db: Order partitions by decorated_key Partitions should be ordered using Origin's ordering, which is first by token, then by Origin's representation of the key. That is the natural ordering of decorated_key. This also changes mutation class to hold decorated_key, to avoid decoration overhead at different layers.	2015-04-24 18:01:01 +02:00
Tomasz Grabiec	1c3275c950	mutation: Encapsulate fields	2015-04-24 18:01:01 +02:00
Tomasz Grabiec	4641bc6f95	database: Move implementation to source file	2015-04-24 18:01:01 +02:00
Tomasz Grabiec	731a63e371	schema: Embed raw_schema inside schema Public fields got encapsulated.	2015-04-24 18:01:01 +02:00
Tomasz Grabiec	c963821e1d	db: Extract schema-specific code to schema.cc	2015-04-23 20:54:12 +02:00
Avi Kivity	da8782b9e5	Merge branch 'tgrabiec/code-moves' of github.com:cloudius-systems/seastar-dev into db Cleanups in preparation for memtables, from Tomasz.	2015-04-23 18:44:40 +03:00
Tomasz Grabiec	0d4821009c	db: Move mutation and mutation_partition to separate headers and compilation units	2015-04-22 18:42:33 +02:00
Tomasz Grabiec	a5c201a685	db: Move column_family::get_partition_slice() to mutation_partition::query() There's nothing column_family-specific there.	2015-04-22 17:40:02 +02:00
Tomasz Grabiec	de5bea90fe	db: Add const qualifiers to mutation_partition methods	2015-04-22 17:37:40 +02:00
Tomasz Grabiec	631dad8a29	schema: Add const qualifiers to lookup methods	2015-04-22 17:36:27 +02:00
Gleb Natapov	57ac231cd2	convert some snitch related classes	2015-04-21 18:24:35 +03:00
Tomasz Grabiec	ef05c5b919	db: Lookup column family by UUID It's a bit faster.	2015-04-20 12:12:55 +02:00
Tomasz Grabiec	5693f73b7a	db: Implement generate_legacy_id() properly	2015-04-17 14:22:29 +02:00
Tomasz Grabiec	00f99cefd4	db: split query.hh to reduce header dependencies	2015-04-15 20:44:59 +02:00
Tomasz Grabiec	878a740b9d	db: Write query results in serialized form This gives about 30% increase in tps in: build/release/tests/perf/perf_simple_query -c1 --query-single-key This patch switches query result format from a structured one to a serialized one. The problems with structured format are: - high level of indirection (vector of vectors of vectors of blobs), which is not CPU cache friendly - high allocation rate due to fine-grained object structure On replica side, the query results are probably going to be serialized in the transport layer anyway, so this change only subtracts work. There is no processing of the query results on replica other than concatenation in case of range queries. If query results are collected in serialized form from different cores, we can concatenate them without copying by simply appending the fragments into the packet. This optimization is not implemented yet. On coordinator side, the query results would have to be parsed from the transport layer buffers anyway, so this also doesn't add work, but again saves allocations and copying. The CQL server doesn't need complex data structures to process the results, it just goes over it linearly consuming it. This patch provides views, iterators and visitors for consuming query results in serialized form. Currently the iterators assume that the buffer is contiguous but we could easily relax this in future so that we can avoid linearization of data received from seastar sockets. The coordinator side could be optimized even further for CQL queries which do not need processing (eg. select * from cf where ...) we could make the replica send the query results in the format which is expected by the CQL binary protocol client. So in the typical case the coordinator would just pass the data using zero-copy to the client, prepending a header. 
We do need structure for prefetched rows (needed by list manipulations), and this change adds query result post-processing which converts serialized query result into a structured one, tailored particularly for prefetched rows needs. This change also introduces partition_slice options. In some queries (maybe even in typical ones), we don't need to send partition or clustering keys back to the client, because they are already specified in the query request, and not queried for. The query results hold now keys as optional elements. Also, meta-data like cell timestamp and ttl is now also optional. It is only needed if the query has writetime() or ttl() functions in it, which it typically won't have.	2015-04-15 20:44:50 +02:00
Tomasz Grabiec	ecc5d23456	db: Avoid copying of column_definition Spotted in the perf profile.	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	7ebc7830b7	db: Optimize column family lookup in query path	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	06f198b10c	schema: Add id field It uniquely identifies column_family globally. Will be used for column_family lookups.	2015-04-15 20:33:48 +02:00

154 Commits