"This series adds more information (i.e. keys and tombstones) to the
query result digest in order to ensure correctness and increase the
chances of early detection of disagreement between replicas.
The digest is no longer computed by hashing query::result but is built
using the query result builder. That is necessary because the query
result itself doesn't contain all the information required to compute
the digest. Another consequence is that replicas asked
for a result now need to send both the result and the digest to
the coordinator, as it won't be able to compute the digest itself.
Unfortunately, these patches change our on-wire communication:
1) hash computation is different
2) the format of query::result is changed (and it is made non-final)
Fixes #182."
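A toy sketch of the idea, assuming a simple FNV-1a hasher. The names (digest_hasher, feed_*) are illustrative, not the actual Scylla interface; the point is that keys and tombstone information are fed into the hash alongside cell values, which the serialized query::result alone cannot provide:

```cpp
#include <cstdint>
#include <string>

// Toy FNV-1a hasher standing in for the real digest computation.
struct digest_hasher {
    uint64_t state = 14695981039346656037ULL; // FNV-1a offset basis
    void feed_bytes(const std::string& bytes) {
        for (unsigned char c : bytes) {
            state ^= c;
            state *= 1099511628211ULL; // FNV-1a prime
        }
    }
    void feed_u64(uint64_t v) {
        feed_bytes(std::string(reinterpret_cast<const char*>(&v), sizeof(v)));
    }
};

// The result builder feeds every component of the result into the hasher,
// including information (partition keys, tombstone timestamps) that the
// serialized query::result does not carry.
uint64_t digest_with_keys_and_tombstones(const std::string& key,
                                         uint64_t tombstone_timestamp,
                                         const std::string& cell_value) {
    digest_hasher h;
    h.feed_bytes(key);               // key now participates in the digest
    h.feed_u64(tombstone_timestamp); // so do tombstones
    h.feed_bytes(cell_value);
    return h.state;
}
```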
This patch makes sure that every time we need to create a new generation number
(the very first step in the creation of a new SSTable), the respective CF is already
initialized and populated. Failure to ensure this can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's GitHub issue #1014.
Nothing should be writing to SSTables before we have had the chance to populate the
existing SSTables and calculate what the next generation number should be.
However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.
Note that this *cannot* be part of add_column_family: that function adds a column
family to a db in memory only, and by the time anybody is about to write to a CF,
it has most likely already been called. We need to call this explicitly when we are
sure we can issue disk operations safely.
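A minimal sketch of that guard, under assumed names (mark_ready_for_writes, the _readonly flag); the real column_family is far richer:

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative stand-in for the real column_family class.
class column_family {
    bool _readonly = true;       // every CF starts in an unwriteable state
    int64_t _next_generation = 0;
public:
    // Called explicitly once existing sstables have been populated and the
    // next generation number is known -- not from add_column_family().
    void mark_ready_for_writes(int64_t next_gen) {
        _next_generation = next_gen;
        _readonly = false;
    }
    int64_t new_generation() {
        if (_readonly) {
            // refuse to hand out a generation that could overwrite data
            throw std::runtime_error("column family not ready for writes");
        }
        return _next_generation++;
    }
};
```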
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are no longer using the in_flight_seals gate, but forgot to remove it.
To guarantee that all seal operations have finished when we're done,
we use the memtable_flush_queue, which also guarantees ordering. But
that gate was never removed.
The FIXME code should also be removed, since such an interface does exist now.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We already have a function that wraps this, re-use it. This FIXME is still
relevant, so just move it there. Let's not lose it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We use memory usage as a threshold these days, and _mutation_count is
no longer checked anywhere. Get rid of it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Store the "incremental_backups" configuration value in the database
class (and use it when creating a keyspace::config) in order to be
able to modify it at runtime.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
To report disk usage, scylla was only taking into account the size of
the sstable data component. Other components, such as index and filter,
may be relatively big too. Therefore, 'nodetool status' would
report an inaccurate disk usage. That is fixed by taking into
account the size of all sstable components.
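The fix amounts to summing all component sizes rather than just the Data file. A hedged sketch (the function and the map-based representation are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Sum the sizes of all sstable components instead of only Data.
// Component names follow the sstable file naming scheme.
uint64_t bytes_on_disk(const std::map<std::string, uint64_t>& component_sizes) {
    uint64_t total = 0;
    for (const auto& kv : component_sizes) {
        total += kv.second; // Data, Index, Filter, Summary, Statistics, TOC, ...
    }
    return total;
}
```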
Fixes #943.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <08453585223570006ac4d25fe5fb909ad6c140a5.1456762244.git.raphaelsc@scylladb.com>
Fixes #934 - faulty assert in discard_sstables
run_with_compaction_disabled clears a CF out of the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.
pending_compactions is an indicator of how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.
This modifies the usage slightly so that:
1) The counter is always incremented, even if compaction is disallowed.
The counter's value at the end of run_with_compaction_disabled is then
used as an indicator of whether a compaction should be
re-triggered. (If compactions finished, it will be zero.)
2) The use and purpose of the pending counter are documented, and a
method is added to re-add a CF to compaction for r_w_c_d above.
3) discard_sstables now asserts on the right things.
Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
Fixes #937
In fixing #884 (truncation not truncating memtables properly),
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.
This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.
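A rough sketch of the "join" idea, with illustrative names (the real code uses a joinpoint utility type): the timestamp is produced once, and every shard invoking the resulting functor sees the identical value.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// The value is computed once, at the join point; the same functor is then
// handed to every shard, so all shards snapshot to the same location.
std::function<int64_t()> make_shared_timestamp() {
    auto ts = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    // every shard invoking the functor sees the identical value
    return [ts] { return ts; };
}
```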
Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
Avi says:
"Something like unordered_set<unsigned long> is error prone, because ints
tend to mix up (also, need to use a sized type, unsigned long varies among
machines)."
With that in mind, it's better to keep track of compacting sstables in
an unordered_set<shared_sstable>.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <249f0fd4cfcf786cf3c37a79978f7743d07f48ad.1455120811.git.raphaelsc@scylladb.com>
Fixes #884
Timestamps for truncation must be generated after flush, either by
splitting the truncate into two (or more) for-each-shard operations,
or simply by doing the time stamping per shard (this solution).
We generate the timestamp on each shard after flushing, and then rely on the
actual stored value being the highest time point generated.
From a batch-replay point of view, this should be functionally
equivalent, and not a problem.
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.
Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.
Right now, whenever we need to pass a class, we pass Seastar's default priority
class.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compaction manager was initially created at utils because it was
more generic and wasn't intended only for compaction.
It was more like a task handler based on futures, but now it is
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where the compaction code is located.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Cleanup is a procedure that discards irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla cleans up an sstable by using the compaction code, with
that sstable as the only input.
Compaction manager was changed to become aware of cleanup, so
that it can schedule cleanup requests and knows how to handle
them properly.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The implementation stores the generations of compacting sstables
in an unordered set per column family, so that before the strategy is
called, the compaction manager can filter out the sstables being compacted.
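A simplified sketch of the filtering step, using plain generation numbers as stand-ins for sstables:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Before the compaction strategy sees the CF's sstables, drop the ones
// already being compacted. Generations stand in for shared_sstable here.
std::vector<int64_t>
filter_out_compacting(const std::vector<int64_t>& candidates,
                      const std::unordered_set<int64_t>& compacting) {
    std::vector<int64_t> eligible;
    for (auto gen : candidates) {
        if (!compacting.count(gen)) {
            eligible.push_back(gen);
        }
    }
    return eligible;
}
```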
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, the compaction strategy is responsible both for selecting the
sstables to compact and for running the compaction.
Moving the code that runs compaction from the strategy to the manager is a big
improvement, which also makes it possible for the compaction manager
to keep track of which sstables are being compacted at any moment.
This change is also needed for cleanup and for concurrent compaction
on the same column family.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That code will be used by column family cleanup, so let's move it
into a function. This change also improves code readability.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :
"1. Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column.
3. Execute the prepared statement.
Expected result - get all the columns, including the new column.
Actual result - get the columns except the new column"
There is one current schema for given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they're
always upgraded to current schema on access.
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.
Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.
Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.
Schema requesting across nodes is currently stubbed (throws a runtime
exception).
Schema is tracked per-entry in the memtable and cache. Entries are
upgraded lazily on access. Incoming mutations are upgraded to the
table's current schema on the given shard.
Mutating nodes need to keep a schema_ptr alive in case the schema
version is requested by the target node.
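A toy sketch of the lookup described above (schema_registry, learn and get are illustrative stand-ins for the real interfaces): on shard level a schema pointer is passed, across nodes only the version identifier travels, and an unknown version would have to be requested from the sender, who must still hold a live schema_ptr.

```cpp
#include <map>
#include <optional>
#include <string>

// Illustrative stand-in: versions identify schemas across nodes.
struct schema { std::string version; };

class schema_registry {
    std::map<std::string, schema> _known;
public:
    void learn(schema s) { _known[s.version] = s; }
    // Returns the schema for a version, or nullopt -- in which case the
    // caller would have to ask the requestor for the definition.
    std::optional<schema> get(const std::string& version) const {
        auto it = _known.find(version);
        if (it == _known.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```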
service::storage_service::clear_snapshot() was built around _db.local()
calls, so it makes more sense to move its code into the 'database' class
instead of calling _db.local().bla_bla() all the time.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
"When a node gains or regains responsibility for certain token ranges,
streaming is performed; upon receipt of the streamed data, the row cache
is invalidated for that range.
Refs #484."
The add interface of the estimated histogram is confusing, as it is not
clear what units are used.
This patch removes the general add method and replaces it with an add_nano
method that takes nanoseconds, and an add method that takes a duration.
To be compatible with origin, nanosecond values are translated to
microseconds.
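A sketch of the disambiguated interface, using a simplified stand-in class (the real estimated_histogram buckets its samples):

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Simplified stand-in: add_nano() takes raw nanoseconds and converts to
// microseconds for compatibility with origin; add() takes a typed duration,
// so the units can no longer be confused at the call site.
class estimated_histogram {
    std::vector<int64_t> _samples_us; // stored in microseconds
public:
    void add_nano(int64_t nanos) {
        _samples_us.push_back(nanos / 1000); // nanoseconds -> microseconds
    }
    template <typename Rep, typename Period>
    void add(std::chrono::duration<Rep, Period> d) {
        add_nano(std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
    }
    int64_t last() const { return _samples_us.back(); }
};
```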
When analyzing a recent performance issue, I found it helpful to keep track of
the number of memtables that are currently in flight, as well as how much memory
they are consuming in the system.
Although those are memtable statistics, I am grouping them under the "cf_stats"
structure: the column family being a central piece of the puzzle, it is reasonable
to assume that many more metrics about it will be welcome in the future.
Note that we don't want to reuse the "stats" structure in the column family: for
one, the fields don't always map precisely (pending flushes, for instance, only
tracks explicit flushes), and the stats structure is also a lot more complex than
we need.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Get the initial tokens specified by initial_token in scylla.conf.
E.g.,
--initial-token "-1112521204969569328,1117992399013959838"
--initial-token "1117992399013959838"
Multiple tokens can be given, separated by commas.
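Parsing the comma-separated value might be sketched as follows (an illustrative helper, not the actual option-parsing code):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated initial_token value into individual tokens.
std::vector<std::string> parse_initial_tokens(const std::string& value) {
    std::vector<std::string> tokens;
    std::stringstream ss(value);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        if (!tok.empty()) {
            tokens.push_back(tok);
        }
    }
    return tokens;
}
```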
"The snapshots API needs to expose GET methods so people can
query information about snapshots. Now that taking snapshots is supported,
this relatively simple series implements get_snapshot_details, a
column family method, and wires it up through the storage_service."
"This patchset implements load_new_sstables, allowing one to move tables into the
data directory of a CF and then call "nodetool refresh" to start using them.
Keep in mind that for Cassandra this is deemed an unsafe operation:
https://issues.apache.org/jira/browse/CASSANDRA-6245
It is still something we should not recommend (unless the CF is totally
empty and not yet used), but we can do a much better job on the safety front.
To guarantee that, the process works in four steps:
1) All writes to this specific column family are disabled. This is a horrible
thing to do, because dirty memory can grow much more than desired meanwhile.
Throughout this implementation, we try to keep the time during which the writes
are disabled to a bare minimum.
While disabling the writes, each shard tells us the highest generation number
it has seen.
2) We scan for all tables we haven't seen before: any tables found in the
CF datadir with a generation higher than the highest seen so far. We link
them to new generation numbers that are sequential to the ones we have so far,
and end up with a new generation number that is returned to the next step.
3) The generation number computed in the previous step is propagated to all CFs,
which guarantees that all further writes will pick generation numbers that won't
conflict with the existing tables. Right after doing that, the writes are resumed.
4) The tables found in step 2 are passed on to each of the CFs. They can now load
those tables while operations on the CF proceed normally."
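The four steps above might be orchestrated roughly like this; every function here is a hypothetical stand-in for the per-shard operations described in the cover letter:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative orchestration of the four-step refresh flow.
int64_t refresh(std::function<int64_t()> disable_writes_and_get_highest_gen,
                std::function<std::vector<int64_t>(int64_t)> scan_new_tables,
                std::function<void(int64_t)> set_next_gen_and_resume_writes,
                std::function<void(const std::vector<int64_t>&)> load_tables) {
    // 1) disable writes; each shard reports the highest generation it has seen
    int64_t highest = disable_writes_and_get_highest_gen();
    // 2) scan for tables above that generation; they get sequential new gens
    std::vector<int64_t> new_tables = scan_new_tables(highest);
    // 3) propagate the new next-generation number, then resume writes
    int64_t next = highest + static_cast<int64_t>(new_tables.size()) + 1;
    set_next_gen_and_resume_writes(next);
    // 4) hand the newly found tables to each CF while operations proceed
    load_tables(new_tables);
    return next;
}
```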
This series adds histograms to the column family for live cells scanned and
tombstones scanned.
It exposes those histograms via the API, replacing the stub implementation
that currently exists.
The code that updates the histograms will be added in a different
series.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
CF-level code to load new SSTables. There isn't really a lot of complication
here. We don't even need to repopulate the entire SSTable directory: by
requiring that the external service that is coordinating this tell us explicitly
about the new SSTables found in the scan process, we can just load them
specifically and add them to the SSTable map.
All new tables will start their lives as shared tables, and will be unshared
if it is possible to do so: this all happens inside add_sstable, and there isn't
really anything special on this front.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Before loading new SSTables into the node, we need to make sure that their
generation numbers are sequential (at least if we want to follow in Cassandra's
footsteps here).
Note that this is unsafe by design. More information can be found at:
https://issues.apache.org/jira/browse/CASSANDRA-6245
However, we can already do slightly better in two ways:
Unlike Cassandra, this method takes a generation number as a parameter. We
will not touch tables before that number at all. That number must be
calculated across all shards as the highest generation number any of them has seen.
Calling load_new_sstables in the absence of new tables will therefore do nothing,
and will be completely safe.
It will also return the highest generation number found after the reshuffling
process. New writers should start writing after that. Therefore, new tables
that are created will have generation numbers higher than any of these,
and will therefore be safe.
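A simplified simulation of the reshuffling contract described above (names are illustrative; the real code renames files on disk):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct reshuffle_result {
    std::vector<std::pair<int64_t, int64_t>> renames; // old -> new generation
    int64_t next_generation;                          // propagated to all CFs
};

// Only tables above the highest generation seen by any shard are touched;
// they are assigned fresh sequential generations, and the next free
// generation is returned for new writers.
reshuffle_result reshuffle(std::vector<int64_t> found_on_disk,
                           int64_t highest_seen) {
    reshuffle_result r;
    r.next_generation = highest_seen + 1;
    std::sort(found_on_disk.begin(), found_on_disk.end());
    for (auto gen : found_on_disk) {
        if (gen > highest_seen) { // tables we have not seen before
            r.renames.emplace_back(gen, r.next_generation++);
        }
    }
    return r; // with no new tables, this does nothing and is completely safe
}
```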
Signed-off-by: Glauber Costa <glommer@scylladb.com>
During certain operations we need to stop writing SSTables. This is needed when
we want to load new SSTables into the system. They will have to be scanned by all
shards, agreed upon, and in most cases even renamed. Letting SSTables be written
at that point makes it inherently racy, especially with the rename.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Avoid using long for it; let's use a fixed-size type instead. Let's do signed
instead of unsigned to avoid upsetting any code that we may have converted.
Signed-off-by: Glauber Costa <glommer@scylladb.com>