scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	1c05d7b927	mutation_partition: fix row_marker::apply() for equal timestamps Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
Paweł Dziepak	7fab0ee867	mutation_partition: add compare_row_marker_for_merge() A compare_atomic_cell_for_merge() equivalent intended to be used with row markers. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
Asias He	04291dec28	storage_service: Enable call to excise	2015-10-22 17:41:02 +08:00
Asias He	ee551d070f	storage_service: Enable add_expire_time_if_found in excise and handle_state_removing	2015-10-22 17:41:02 +08:00
Asias He	2137ab5522	storage_service: Implement add_expire_time_if_found	2015-10-22 17:41:02 +08:00
Asias He	e2fbd146d7	storage_service: Print keyspace info unbootstrap in debug	2015-10-22 17:41:02 +08:00
Asias He	affab296b0	storage_service: Fix ranges in stream_hints Use range = make_open_ended_both_sides to represent the entire ring.	2015-10-22 17:41:02 +08:00
Asias He	0d2fb9c99d	storage_service: Add extract_expire_time	2015-10-22 17:41:02 +08:00
Asias He	58225216b3	storage_service: Fix immediate return for get_changed_ranges_for_leaving It is a leftover from when get_changed_ranges_for_leaving was stubbed.	2015-10-22 17:41:02 +08:00
Asias He	fb27d682ad	storage_service: Fix use after free for stream_plan sp is a stack variable, it is gone when the function returns. Fix it using a shared pointer.	2015-10-22 17:41:02 +08:00
Asias He	69b7028f84	storage_service: Fix token contains in handle_state_leaving std::includes requires a sorted container. get_tokens_for returns std::unordered_set. Fix by putting the tokens into a std::set.	2015-10-22 17:41:02 +08:00
Asias He	4785798904	storage_service: Kill unimplemented in decommission	2015-10-22 17:41:02 +08:00
Asias He	ce6dd0f8f8	storage_service: Implement start_leaving	2015-10-22 17:41:02 +08:00
Paweł Dziepak	513ab87b47	row_cache: update hit and miss stats in scanning reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:25:02 +03:00
Paweł Dziepak	b1b830bcbb	row_cache: merge cache_entry::compare and ring_position_compare Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:25:02 +03:00
Tomasz Grabiec	f1306d3771	tests: Add tests for reversed mutation queries	2015-10-22 10:32:08 +02:00
Tomasz Grabiec	cc5cc7117d	mutation_query: Respect 'reversed' partition_slice option Fixes #480	2015-10-22 10:32:08 +02:00
Tomasz Grabiec	9dbd5a92d0	partition_slice_builder: Introduce reversed()	2015-10-22 10:32:08 +02:00
Amnon Heiman	c130381284	Adding live_scanned and tombstone scanned histograms to column family This series adds histograms to the column family for live scanned and tombstone scanned. It exposes those histograms via the API instead of the stub implementation that currently exists. The implementation of the histogram update will be added in a different series. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-10-22 11:13:28 +03:00
Amnon Heiman	378a97b66b	API: Add row cache hits and misses per column family Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-10-22 11:12:14 +03:00
Glauber Costa	fd8e5c7e4c	api: load new sstables Just a wrapper into the storage_service's homonymous call. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:30:04 +02:00
Glauber Costa	673788ed46	storage_service load_new_sstables. This is the storage_service implementation of load_new_sstables, and this is where most of the complication lives. Keep in mind that for Cassandra, this is deemed an unsafe operation: https://issues.apache.org/jira/browse/CASSANDRA-6245 It is still for us something we should not recommend - unless the CF is totally empty and not yet used, but we can do a much better job on the safety front. To guarantee that, the process works in four steps: 1) All writes to this specific column family are disabled. This is a horrible thing to do, because dirty memory can grow much more than desired during this. Throughout this implementation, we will try to keep the time during which the writes are disabled to its bare minimum. While disabling the writes, each shard will tell us about the highest generation number it has seen. 2) We will scan all tables that we haven't seen before. Those are any tables found in the CF datadir, that are higher than the highest generation number seen so far. We will link them to new generation numbers that are sequential to the ones we have so far, and end up with a new generation number that is returned to the next step. 3) The generation number computed in the previous step is now propagated to all CFs, which guarantees that all further writes will pick generation numbers that won't conflict with the existing tables. Right after doing that, the writes are resumed. 4) The tables we found in step 2 are passed on to each of the CFs. They can now load those tables while operations to the CF proceed normally. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	36cea4313e	column family: load new sstables CF-level code to load new SSTables. There isn't really a lot of complication here. We don't even need to repopulate the entire SSTable directory: by requiring that the external service that is coordinating this tell us explicitly about the new SSTables found in the scan process, we can just load them specifically and add them to the SSTable map. All new tables will start their lives as shared tables, and will be unshared if it is possible to do so: this all happens inside add_sstable and there isn't really anything special on this front. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	54aaa58899	sstable_tests: test reshuffle operation Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	a8db2b28c7	sstable tests: test set_generation No code works until it's been tested. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	c5950c7bf7	sstable_test: get rid of frees They exist. They shouldn't. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	f60021f87f	sstable_tests: commonize code to compare two components. The current code assumes a particular dir/generation pair. We will use it for a more generic case. This code could really use some clean up, by the way. We should do it later. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	61be9fb02d	reshuffle tables: mechanism to adjust new sstables' generation number Before loading new SSTables into the node, we need to make sure that their generation numbers are sequential (at least if we want to follow Cassandra's footsteps here). Note that this is unsafe by design. More information can be found at: https://issues.apache.org/jira/browse/CASSANDRA-6245 However, we can already do slightly better in two ways: Unlike Cassandra, this method takes as a parameter a generation number. We will not touch tables that are before that number at all. That number must be calculated from all shards as the highest generation number they have seen themselves. Calling load_new_sstables in the absence of new tables will therefore do nothing, and will be completely safe. It will also return the highest generation number found after the reshuffling process. New writers should start writing after that. Therefore, new tables that are created will have a generation number that is higher than any of this, and will therefore be safe. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	1351c1cc13	database: mechanism to stop writing sstables During certain operations we need to stop writing SSTables. This is needed when we want to load new SSTables into the system. They will have to be scanned by all shards, agreed upon, and in most cases even renamed. Letting SSTables be written at that point makes it inherently racy - especially with the rename. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	29e2ad7fd8	column family: commonize code to calculate the desired SSTable generation We will reuse this for load_new_sstables. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:43 +02:00
Glauber Costa	3f6d47f1f2	sstables: change the current level of an sstable This will be used, for instance, when importing an SSTable. We would like to force all new SSTables to sit at level 0 for compaction purposes. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	e11b828b6f	sstables: allow an sstable to set its generation number In some situations (restoring a backup from load_new_sstables), we want to change the SSTable generation number. This patch provides a procedure to achieve that. It does so by linking the old files to new ones, and then removing the old ones. The reason we link instead of removing, is that we want to make sure that in case there is a crash in the middle, the old data is still accessible. If the crash happens after the link is done but before we start removing the old files, that is fine: we will end up with duplicated data that will disappear after the next compaction. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	a4d8e99f1c	sstables: fix create_links so a TemporaryTOC is generated That is the way to generate groups of files for the SSTables, so we must do it. Because the links were mostly used by processes like snapshots and backups where an external tool would (hopefully) verify the results, it was not that serious. But we now plan to use links to bring things into the main directory. It must absolutely be done right. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	876e770df6	sstables: allow create_links to work with an arbitrary generation During some situations (restoring a snapshot for instance) we may want a file to get a different generation. This patch changes the code in create_links slightly, so that it is able to link not only to a different location, but to files with a different name, possibly in the same location - that is equivalent to a generation change. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	27efe8bde9	sstables: make read_toc public This is done on behalf of load_new_sstables: we would like to know which components are present in the file, but without triggering the read for the rest of the metadata. As noted by Avi, using this directly can leave the SSTable in an inconsistent state. We will have to fix it later since this is not the first offender. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	fcebf6f72d	sstable tests: don't use set_generation method There is no reason aside from testing for a table to just change its generation number. There will be, however, when we support loading new sstables. The method however needs to be completely rewritten, so let's make sure the tests are not using that. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Glauber Costa	f3bad2032d	database: fix type for sstable generation. Avoid using long for it, and let's use a fixed size instead. Let's do signed instead of unsigned to avoid upsetting any code that we may have converted. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:01:20 +02:00
Calle Wilund	412c2a1e5b	storage_proxy: mutate_atomically fix for consistency and BL removal The change to use consistency_level::ONE in send_batchlog_mutation sort of fixes #478, but is not 100% correct. When doing async_remove_from_batchlog, the CL is actually supposed to be ANY. Also, we should _not_ remove the batch log mutation from any nodes if the mutate fails, since having it there in case of failure is sort of the whole point of it. I.e. async_remove_from_batchlog should not be called from a "finally", but from a "then". Refs #478	2015-10-21 16:48:27 +02:00
Tomasz Grabiec	764d913d84	Merge branch 'pdziepak/row-cache-range-query/v4' from seastar-dev.git From Pawel: This series enables row cache to serve range queries. In order to achieve that row cache needs to know whether there are some other partitions in the specified range that are not cached and need to be read from the sstables. That information is provided by key_readers, which work very similarly to mutation_readers, but return only the decorated key of partitions in range. In case of sstables key_readers is implemented to use partition index. Approach like this has the disadvantage of needing to access the disk even if all partitions in the range are cached. There are (at least) two ways of dealing with that problem: - cache partition index - that will also help in all other places where it is needed - add a flag to cache_entry which, when set, indicates that the immediate successor of the partition is also in the cache. Such flag would be set by mutation reader and cleared during eviction. It will also allow newly created mutations from memtable to be moved to cache provided that both their successors and predecessors are already there. The key_reader part of this patchset adds a lot of new code that probably won't be used in any other place, but the alternative would be to always interleave reads from cache with reads from sstables and that would be more heavy on partition index, which isn't cached. Fixes #185.	2015-10-21 15:26:45 +02:00
Gleb Natapov	6a2a0d628b	storage_proxy: use CL=ONE to write logged batch This is a regression created by logged batch code rework. Fixes #478.	2015-10-21 15:29:49 +03:00
Avi Kivity	c69c02c162	Merge	2015-10-21 15:17:32 +03:00
Avi Kivity	c49dd5c576	Merge "move dependencies to /opt/scylladb" from Takuya	2015-10-21 15:17:04 +03:00
Glauber Costa	71c1b2fe69	api: get true snapshot size Thin wrapper around storage service's facility. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 13:48:44 +02:00
Glauber Costa	c0630bedc2	api: get_snapshot_details That's basically conversion work between what the storage_service returns and the json types. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 13:48:44 +02:00
Glauber Costa	cf96d68478	storage_service: true snapshot size For CFStats, one of the things needed is the size used by the snapshots. Since the bulk of the work is map-reducing it and adding them together, we will just call get_snapshot_details for the column family, and just selectively add just what we need. No need for a separate method here. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 13:48:44 +02:00
Glauber Costa	718fea7048	storage_service: get_snapshot_details The column family object can, for each column family, provide us with a map between each snapshot it knows about, and two sizes: the total size, and the "real" (or live) size, which is how much extra space the snapshot is costing us. This patch map-reduces all CFs to accumulate that system-wide, and then formats that into a map of "snapshot_details". That is a more convenient format to be consumed by our json generator. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 13:48:44 +02:00
Glauber Costa	77513a40db	database: get_snapshot_details For each of the snapshots available, the api may query for some information: the total size on disk, and the "real" size. As far as I could understand, the real size is the size that is used by the SSTables themselves, while the total size includes also the metadata about the snapshot - like the manifest.json file. Details follow: In the original Cassandra code, total size is: long sizeOnDisk = FileUtils.folderSize(snapshot); folderSize recurses on directories, and adds file.length() on files. Again, my understanding is that file_size() would give us the same as the length() method for Java. The other value, real (or true) size is: long trueSize = getTrueAllocatedSizeIn(snapshot); getTrueAllocatedSizeIn seems to be a tree walker, whose visitor is an instance of TrueFilesSizeVisitor. What that visitor does, is add up the size of the files within the tree that are "acceptable". An acceptable file is a file which: starts with the same prefix as we want (IOW, belongs to the same SSTable, we will just test that directly), and is not "alive". The alive list is just the list of all SSTables in the system that are used by the CFs. What this tries to do, is to make sure that the trueSnapshotSize is just the extra space on disk used by the snapshot. Since the snapshots are links, then if a table goes away, it adds to this size. If it would be there anyway, it does not. We can do that in a much simpler fashion: for each file, we will just look at the original CF directory, and see if we can find the file there. If we can't, then it counts towards the trueSize. Even for files that are deleted after compaction, that "eventually" works, and that simplifies the code tremendously given that we have to neither list all files in the system - as Cassandra does - nor go check other shards for liveness information - as we would otherwise have to do.
The scheme I am proposing may need some tweaks when we support multiple data directories, as the SSTables may not be directly below the snapshot level. Still, it would be trivial to inform the CF about their possible locations. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 13:48:44 +02:00
Avi Kivity	16006949d0	logalloc: make migrator an object, not a function pointer The migrator tells lsa how to move an object when it is compacted. Currently it is a function pointer, which means we must know how to move the object at compile time. Making it an object allows us to build the migration function at runtime, making it suitable for runtime-defined types (such as tuples and user-defined types). In the future, we may also store the size there for fixed-size types, reducing lsa overhead. C++ variable templates would have made this patch smaller, but unfortunately they are only supported on gcc 5+.	2015-10-21 11:24:56 +02:00
Avi Kivity	e2cd40e3bc	Merge "remove and decommission node support part 2" from Asias "More preparatory patches for remove and decommission node support: - stream hints and ranges - unbootstrap - replication finished notification"	2015-10-21 12:24:14 +03:00
Takuya ASADA	1bf18679bb	dist: add more build dependencies for binutils Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>	2015-10-21 09:02:40 +00:00

53948 Commits