scylladb

Author	SHA1	Message	Date
Pekka Enberg	791031fbc7	database: Extract update_schema_version_and_announce() function It's needed in storage proxy. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-22 11:57:00 +03:00
Raphael S. Carvalho	6ae3ffa319	database: add get_sstables to column_family Returns all sstables added to a given column_family. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-07-20 10:08:09 -03:00
Raphael S. Carvalho	ebbc7aa43e	database: add compact_sstables to column_family compact_all_sstables is about selecting all available sstables for compaction and executing a compaction code on them. This compaction code was moved to a more generic function called compact_sstables, which will compact a list of given sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-07-20 10:08:02 -03:00
Pekka Enberg	81cddec777	database: Add versioning support Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-16 14:53:30 +03:00
Raphael S. Carvalho	719898d0e5	introduce automatic compaction As the name implies, this patch introduces the concept of automatic compaction for sstables. Compaction task is triggered whenever a new sstable is written. Concurrent compaction on the same column family isn't supported, so compaction may be postponed if there is an ongoing compaction. In addition, seastar::gate is used both to prevent a new compaction from starting and to wait for an ongoing compaction to finish, when the system is asked for a shutdown. This patch also introduces an abstract class for compaction strategy, which is really useful for supporting multiple strategies. Currently, null and major compaction strategies are supported. As the name implies, null compaction strategy does nothing. Major compaction strategy is about compacting all sstables into one. This strategy may end up being helpful when adding support to major compaction via nodetool. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-07-16 12:00:12 +03:00
Glauber Costa	04c0fbcb8c	remove calls to seal_active_memtable It should not be called directly: external callers should be calling flush() instead. To be sure it doesn't happen again, make seal_active_memtable private. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-07-15 10:24:20 -04:00
Glauber Costa	9c464aff9b	database: clean up various APIs In much of our column_families APIs, we need to pass a pointer to the database. The only reason we do that, is so we can properly handle the commit log entries after we seal the current memtables into sstables. Now that we store a pointer to the commit log in the CF itself at the time it is created, we no longer have to do it. As a result, the APIs are a lot cleaner, with no gratuitous parameters. My motivation for this was the flush method, but as a result, apply() also gets cleaner. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-07-15 10:24:20 -04:00
Glauber Costa	ad46daa6aa	column_family: add the commitlog as a parameter When we create a column family, we can pass as an extra parameter, the commitlog - or lack thereof. Because the commitlog is optional to begin with - it won't exist if we don't call init_commitlog, we can have this to be empty meaning no commit log. The creation of a column family should be always done through add_column_family. And if that is the case, we have the database's commitlog right there and can get the pointer through the db. Only tests are not creating the column family this way, and for them, it is fine. We want to do that, because some column family operations will use the commit log. Right now, they are forcing us to add parameters to APIs that would be much cleaner without it. So while separation is good, this level of coupling is a net win as it allows us to clean up some visible APIs. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-07-15 10:24:20 -04:00
Tomasz Grabiec	9bea6aa0a3	db: Introduce mutation query interface Mutation query differs from data query in that it returns information needed to reconcile data slice with that returned by other data sources. There is a generic mutation_query() algorithm introduced, which can work with any mutation_source. database::query_mutations() is a shard-local interface for mutation queries. The reconcilable_result is introduced as a medium for mutation query results. It piggy backs on frozen_mutation as a medium for reconcilable data.	2015-07-12 12:51:38 +02:00
Glauber Costa	5545e08bf7	database: introduce flush method We will have to flush it from other places as well, so wrap the flushing code into a method - especially because the current code has issues and it will be easier to deal with it if it is in a single place. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-07-07 11:38:23 -04:00
Avi Kivity	b27f93af97	Merge "Adding implementation to the storage_service" from Amnon "The storage_service API contains many functions; the actual implementation will be added gradually."	2015-06-29 14:57:45 +03:00
Glauber Costa	b3a4eb83d3	database: keyspaces loading This patch bootstraps the keyspaces found in system sstables and makes our in-memory structures reflect them. It tries to reuse as much code as we can from db::legacy_system_tables, but keeping everything local and without applying any mutations to the database - since the latter would only unnecessarily complicate the write path. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-25 09:50:55 -04:00
Glauber Costa	c0ad0dc1d3	database: pass storage proxy to init_data_directory In order for us to call some functions from db::legacy_schema_tables, we need a working storage proxy. We will use those functions in order to leverage the work done in keyspace / table creation from mutations. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-25 09:50:55 -04:00
Amnon Heiman	0151ba4372	database: changing the signature of get_config get_config returns a const reference to the configuration object inside the database object. Because it returns a const reference, the method itself can be const. This is helpful when the call is made from a const reference to the database. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-06-25 11:48:58 +03:00
Nadav Har'El	3f5114e415	compaction: compact_all_sstables demo function This is an example of how to use the low-level compact_sstable() function to compact all the sstables of one column family into one. It is not a full-fledged "compaction strategy" but the real ones can be based on this example. Among the things that this code doesn't do yet is to delete the old sstables. In the future, this should happen automatically in the sstable destructor when all the references to the sstable get deleted. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2015-06-25 11:05:35 +03:00
Glauber Costa	4d07c952cc	remove global create_keyspace There is only one user in tree, and as Tomek pointed out, it is buggy. The reason is that ksm is a shard-local structure, and it is currently used indiscriminately by all shards which can easily lead to problems. We could fix it with some tricks, but it is way better and safer to make sure the callers are doing it right instead. There is only one caller, so let's fix that. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-24 12:44:05 -04:00
Glauber Costa	f0f04892d3	database: no longer replicate create_keyspace code Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-24 12:44:05 -04:00
Avi Kivity	3d22623a6b	Merge "Flush schema changes to disk" from Glauber "This is the current patchset to flush and persist schema changes to disk. It is not perfect, in the sense that older changes still in flight won't be waited for. But as we discussed - at this moment we'll just note that, and leave the fix for later"	2015-06-24 17:08:33 +03:00
Gleb Natapov	7d846e842c	use write_request_timeout_in_ms for write request timeout Fixes another fixme. Also change default value to 2000 which seems to be what Origin uses.	2015-06-24 12:51:33 +03:00
Glauber Costa	a6ef9815e9	database: futurize seal_active_memtables By doing this, it is possible to synchronously wait for the seal to complete by waiting on this future. This is useful in situations where we want to synchronously flush data to disk. Existing callers will not be patched, so their current behavior, namely asynchronously initiating a write, is preserved. TODO: A better interface would guarantee that all writes before this one are also complete Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-23 15:23:37 -04:00
Tomasz Grabiec	d4e0e5957b	db: Integrate cache with the read path	2015-06-23 13:49:25 +02:00
Tomasz Grabiec	b9288d9fa7	db: Make column_family managed by lw_shared_ptr<> It will be share-owned by readers.	2015-06-23 13:49:24 +02:00
Vlad Zolotarov	3520d4de10	locator: introduce a global distributed<snitch_ptr> i_endpoint_snitch::snitch_instance() Snitch class semantics defined to be per-Node. To make it so we introduce here a static member in an i_endpoint_snitch class that has to contain the pointer to the relevant snitch class instance. Since the snitch contents are not always pure const it has to be per shard, therefore we'll make it a "distributed". All the I/O is going to take place on a single shard and if there are changes - they are going to be propagated to the rest of the shards. The application is responsible to initialize this distributed<snitch_ptr> before it's used for the first time. This patch effectively reverts most of the "locator: futurize snitch creation" `a2594015f9` patch - the part that modifies the code that was creating the snitch instance. Since snitch is created explicitly by the application and all the rest of the code simply assumes that the above global is initialized we won't need all those changes any more and the code will get back to be nice and simple as it was before the patch above. So, to summarize, this patch does the following: - Reverts the changes introduced by `a2594015f9` related to the fact that every time a replication strategy was created there should have been created a snitch that would have been stored in this strategy object. More specifically, methods like keyspace::create_replication_strategy() do not return a future<> any more and this allows to simplify the code that calls it significantly. - Introduce the global distributed<snitch_ptr> object: - It belongs to the i_endpoint_snitch class. - There has been added a corresponding interface to access both global and shard-local instances. - locator::abstract_replication_strategy::create_replication_strategy() does not accept snitch_ptr&& - it'll get and pass the corresponding shard-local instance of the snitch to the replication strategy's constructor by itself. 
- Adjusted the existing snitch infrastructure to the new semantics: - Modified the create_snitch() to create and start all per-shard snitch instances and update the global variable. - Introduced a static i_endpoint_snitch::stop_snitch() function that properly stops the global distributed snitch. - Added the code to the gossiping_property_file_snitch that distributes the changed data to all per-shard snitch objects. - Made all existing snitches classes properly maintain their state in order to be able to shut down cleanly. - Patched both urchin and cql_query_test to initialize a snitch instance before all other services. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v6: - Rebased to the current master. - Extended a commit message a little - the summary. New in v5: - database::create_keyspace(): added a missing _keyspaces.emplace() New in v4: - Kept the database::create_keyspace() to return future<> by Glauber's request and added a description to this method that needs to be changed when Glauber adds his bits that require this interface.	2015-06-22 23:18:31 +03:00
Shlomi Livne	0ce374a853	Add support for setting keyspace replication_strategy To support initialization of system tables keyspace replication_strategy without the need of having snitch creation. Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>	2015-06-22 13:19:55 +03:00
Glauber Costa	f4a167670a	database: seal active memtables when we close the database Failing to do so can lead to data not being written to disk when we terminate. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-21 09:39:31 +03:00
Glauber Costa	1f13d3e38f	database: gate seal_active_memtable We need to do that in order to close the database cleanly, flushing all pending data before we do. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-21 09:39:29 +03:00
Avi Kivity	f221301d5e	Merge "preparation work - system table handling" from Glauber	2015-06-18 17:49:29 +03:00
Tomasz Grabiec	51cae834e3	db: Put all sstables behind single reader This change abstracts reading from on-disk data sources behind a single reader which is then composed with memtable readers. This change also abstracts all data sources behind a single reader obtained via column_family::make_reader(). That reader is then used by algorithms like column_family::for_all_partitions() or column_family::query(). Having those abstractions will make it easier to add row cache, because it will be encapsulated in a single place.	2015-06-18 16:33:33 +02:00
Tomasz Grabiec	7f1ff0401e	db: Move mutation_reader definition to separate header	2015-06-18 15:47:40 +02:00
Glauber Costa	057c38b61c	only populate system keyspace Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-06-18 09:22:20 -04:00
Pekka Enberg	8345874dda	database: Add database::has_schema() helper Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-06-17 15:45:45 +03:00
Gleb Natapov	2d409250f2	remove ad-hoc token_metadata creation	2015-06-15 12:51:09 +03:00
Avi Kivity	446731cf88	Merge "column family API" Column family API, from Amnon.	2015-06-15 10:50:23 +03:00
Vlad Zolotarov	e045d8465c	db: use snitch name from the configuration file Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-06-14 15:31:58 +03:00
Gleb Natapov	b7155ad862	pass partitions_ranges separately from read_command partitions_ranges will be manipulated to be split for different destinations, so provide it separately from read_command to avoid copying the latter for each destination.	2015-06-11 15:18:07 +03:00
Avi Kivity	ce6cd4b67e	Merge "Store keyspace strategy options to database" From Pekka: "This series fixes up schema management code to store keyspace strategy options to database. The map is stored as JSON just like in Origin."	2015-06-11 14:21:53 +03:00
Pekka Enberg	d088cb8181	Fix keyspace strategy options to preserve key-value ordering Fix keyspace strategy options to preserve key-value ordering by switching to std::map. We need this to be able to store the map in database as JSON because unordered maps can cause the schema merging code to attempt a keyspace update, which we don't support, even though the values did not change. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-06-11 13:02:42 +03:00
Pekka Enberg	b7a23ddadd	database: Memtable flush batching Currently, we flush out memtables very aggressively which results in lots of small sstable writes. The proper fix here is to do accounting on the memtable size but before that happens, bump up the threshold to another magic number which gives better batching: $ ./build/release/seastar --smp 1 --data-file-directories data --commitlog-directory commitlog/ $ tools/bin/cassandra-stress write -mode cql3 native prepared -rate threads=32 Before: Results: op rate : 37280 partition rate : 37280 row rate : 37280 latency mean : 0.8 latency median : 0.6 latency 95th percentile : 1.1 latency 99th percentile : 7.6 latency 99.9th percentile : 11.9 latency max : 50.5 Total operation time : 00:00:30 END After: Results: op rate : 46721 partition rate : 46721 row rate : 46721 latency mean : 0.7 latency median : 0.5 latency 95th percentile : 0.9 latency 99th percentile : 1.3 latency 99.9th percentile : 5.8 latency max : 96.3 Total operation time : 00:00:39 END Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-06-11 10:24:35 +03:00
Amnon Heiman	b9e3a03483	Expose the column family info in the database The API needs the column family information in the database object. This adds function to the database to expose the column family information.	2015-06-11 09:50:52 +03:00
Calle Wilund	8b9a63a3c6	Database/commitlog: guard against replay position reordering Commit log guarantees that once an RP is assigned to a data frame/caller, it will not block before returning the result via future. However, this is not enough, since we could a.) Have blocked earlier, in which case the return value processing will be async anyway b.) Even if no blocking takes place, future chaining mechanism could decide it has to reorder execution. Assuming though that the case where this happens is rare, and cases where it actually affects the rule of replay position ordering is even rarer, we can guard against it by simply keeping track of the highest RP _discarded_ (sent to sstable flush), and if we attempt to apply a mutation with a higher RP, simply re-do the operation (i.e. write same entry to commit log again). Signed-off-by: Calle Wilund <calle@cloudius-systems.com>	2015-06-10 11:56:45 +03:00
Vlad Zolotarov	a2594015f9	locator: futurize snitch creation - Forbid explicit snitch creation with constructor. - Allow the creation of snitches only with locator::make_snitch() template function. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v4: - Make sure the snitch is stopped before it's destroyed when _snitch_is_ready is returned in an exceptional state. New in v2: - Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch> - abstract_replication_strategy::create_replication_strategy(): explicitly specify (template) types of create_object() parameters. - Re-arrange the loop in marge_keyspaces() so that lambdas that depend on "this" complete before there is a chance that "this" gets destroyed. - create_keyspace(): Don't add a new keyspace if a keyspace with this name already exists. - i_endpoint_snitch: added a stop() virtual method - Added a stop() pure virtual method. - Added an enum class snitch_state and a _state member initialized to snitch_state::initializing, added an assert() in a destructor requiring _state to become snitch_state::stopped, which should be set when stop() is complete. - rack_inferring_snitch: added a stop() method. - simple_snitch: added a stop() method. - Added stop() methods to abstract_replication_strategy and keyspace. - Updated database::stop() to wait for all keyspaces in _keyspaces to stop.	2015-06-09 15:33:38 +03:00
Vlad Zolotarov	c1f0d285bb	database: make the create_keyspace() function declaration match the definition. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-06-09 15:18:46 +03:00
Pekka Enberg	87e525b6b5	database: Add update and drop column family stubs They're needed by table merging in db/legacy_schema_tables.cc. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-06-08 14:42:36 +03:00
Tomasz Grabiec	b2549a7b14	Merge branch 'calle/secondary_index' from seastar-dev.git	2015-06-03 13:22:01 +02:00
Calle Wilund	293dbf66e3	Forward and use replay_position when applying mutation * Forward commitlog replay_position to column_family.memtable, updating highest RP if needed * When flushing memtable, signal back to commitlog that RP has been dealt with to potentially remove finished segment(s) Note: since memtable flushing right now is _not_ explicitly ordered, this does not actually work, since we need to guarantee ordering with regards to RP. I.e. if we flush N blocks, we must guarantee that: a.) We report "flushed RP" in RP order b.) For a given RP1, all RP* lower than RP1 must also have been flushed. (The latter means that it is fine to say, flush X tables at the same time, as long as we report a single RP that is the highest, and no lower RP:s exist in non-flushed tables) I am however letting someone else deal with ensuring MT->sstable flush order. Signed-off-by: Calle Wilund <calle@cloudius-systems.com>	2015-06-03 12:38:13 +03:00
Calle Wilund	724a33c11d	Database: add "existing_index_names"	2015-06-03 10:13:53 +02:00
Paweł Dziepak	8e66bfc9d4	db: add getter for database::_keyspaces Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-02 14:11:34 +02:00
Paweł Dziepak	d50859907f	db: update keyspace_metadata when column family is added Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-02 14:11:34 +02:00
Pekka Enberg	4dc488afb2	database: Store metadata in 'struct keyspace' Store a lw_shared_ptr<keyspace_metadata> in struct keyspace so callers in migration manager, for example, can look it up. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-05-25 09:12:29 +02:00
Avi Kivity	ff42d58881	db: use CoW to modify the memtable list in column_family Allow memtables to be removed from a column_family while a running query continues to use them.	2015-05-20 16:00:00 +03:00
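
The copy-on-write idea described in commit ff42d58881 can be illustrated with a minimal, single-threaded sketch. This is not the code from that commit: `memtable`, `column_family_sketch`, `snapshot()`, `add_memtable()` and `remove_oldest_memtable()` are hypothetical names chosen for illustration, and `std::shared_ptr` stands in for whatever pointer type the real code uses. The point is only that a writer replaces the list instead of mutating it, so a running query keeps the list it started with.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memtable; the real class is far richer.
struct memtable {
    std::string name;
};

using memtable_ptr = std::shared_ptr<memtable>;
using memtable_list = std::vector<memtable_ptr>;

// Hypothetical model of a column family's memtable list (single-threaded,
// as within one shard). Readers hold a pointer to an immutable list;
// writers build a modified copy and swap the pointer.
class column_family_sketch {
    std::shared_ptr<const memtable_list> _memtables = std::make_shared<memtable_list>();
public:
    // A query takes a snapshot of the list it will read from.
    std::shared_ptr<const memtable_list> snapshot() const { return _memtables; }

    void add_memtable(std::string name) {
        auto copy = std::make_shared<memtable_list>(*_memtables);  // copy the list
        copy->push_back(std::make_shared<memtable>(memtable{std::move(name)}));
        _memtables = std::move(copy);                              // publish the copy
    }

    void remove_oldest_memtable() {
        if (_memtables->empty()) {
            return;
        }
        // Build a new list without the first element; old snapshots are untouched.
        auto copy = std::make_shared<memtable_list>(_memtables->begin() + 1, _memtables->end());
        _memtables = std::move(copy);
    }
};
```

A query that called `snapshot()` before `remove_oldest_memtable()` still sees the removed memtable, and the memtable itself stays alive until the last snapshot referring to it is dropped.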

208 Commits