The sstables write path has been partially de-futurized, but now creates a
ton of threads, yet does not exploit them since everything is serialized.
Remove those extra threads and futures and use a single thread to write
everything. If needed, we'll employ write-behind in output_stream to
increase parallelism.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
range::is_wrap_around() and range::contains() rely on a total ordering
on values to work properly. The current ring_position_comparator imposed
only a weak ordering (a token position compared equal to all key
positions with that token).
range::before() and range::after() can't work with a weak ordering: if
the bound is exclusive, we don't know whether a user-provided token
position is inside or outside.
Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:
(1) ]A; B]
(2) [A; B]
For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.
I think the simplest solution is to define a total ordering on
ring_position by placing a token position either before or after all
keys with that token.
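The idea can be sketched as follows, assuming a deliberately simplified
model of ring_position (the field names and types are illustrative, not
the actual scylla code): a key-less position carries a "bound" telling
whether it sorts before or after all keys sharing its token.

```cpp
#include <optional>
#include <string>

// Hypothetical simplified ring position: a token, an optional partition
// key, and -- for key-less positions -- whether the position sorts
// before (-1) or after (+1) every key sharing that token.
struct ring_position {
    int token;                       // stand-in for the real token type
    std::optional<std::string> key;  // partition key, if any
    int bound = 0;                   // -1: before all keys, +1: after all keys
};

// Total order: compare tokens first; for equal tokens, a key-less
// position is placed before or after all keys according to its bound.
int tri_compare(const ring_position& a, const ring_position& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    int wa = a.key ? 0 : a.bound;    // keyed positions sit in the middle
    int wb = b.key ? 0 : b.bound;
    if (wa != wb) {
        return wa < wb ? -1 : 1;
    }
    if (a.key && b.key) {
        return a.key->compare(*b.key);
    }
    return 0;                        // both key-less with the same bound
}
```

With A = a key-less lower bound and B = (same token, some key), A now
compares strictly less than B, so is_wrap_around() can tell ]A; B] and
[A; B] apart.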
Support for compaction strategy options was recently added.
Previously, we were using default values in compaction strategy for
options, but now we can use the options defined in the schema.
Currently, we only support size-tiered strategy, so let's start
with it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
All sharded services "should" define a stop method, and calling it is
also good practice. For this one specifically, though, we will not call
stop. We lack a good way to add a deleter to a shared_ptr class, and
that would be the only reliable way to tie into its lifetime.
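For illustration only: std::shared_ptr (unlike the seastar shared_ptr
this message refers to) does accept a custom deleter, which is exactly
the kind of lifetime hook that would let stop() run when the last
reference goes away. The service type here is hypothetical.

```cpp
#include <memory>

// Hypothetical sharded service with a stop() method.
struct service {
    bool stopped = false;
    void stop() { stopped = true; }
};

// Tie stop() into the pointer's lifetime via a custom deleter: it runs
// exactly once, when the final reference is dropped.
bool stop_called_on_last_release() {
    bool stopped = false;
    {
        std::shared_ptr<service> p(new service, [&stopped](service* s) {
            s->stop();               // lifetime hook: stop before freeing
            stopped = s->stopped;
            delete s;
        });
        auto q = p;                  // extra reference; deleter not run yet
    }                                // last copy dies here; deleter runs
    return stopped;
}
```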
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We still want to wrap it instead of writing the column name directly, so
that we are able to update the statistics.
It is better to have a separate function for this, because write_column_name
doesn't have enough information to decide when to do what. Augmenting it so
that it did would require passing the schema, or an extra parameter, which
would then spread to all callers.
Keep in mind that testing for an empty clustering key is not enough, since
composite types will serialize the empty clustering key in this case.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Get values from cf->schema instead of using hardcoded threshold
values. In addition, move DEFAULT_MIN_COMPACTION_THRESHOLD and
DEFAULT_MAX_COMPACTION_THRESHOLD to schema.hh so as not to have
knowledge duplicated.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Since parsing involves a unique_ptr<metadata> holding a pointer to a
subclass of metadata, metadata must define a virtual destructor;
otherwise deleting the object can leak memory, or, with C++14 sized
deallocation, free it through the wrong memory pool.
Seen on EC2.
Define a virtual destructor to tell the compiler how to destroy
and free the object.
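A minimal reproduction of the pattern (the names are illustrative, not
the actual sstables code): the subclass's state is only destroyed
through the base pointer because the base destructor is virtual.

```cpp
#include <memory>
#include <vector>

static int live_objects = 0;         // counts constructed subclass objects

struct metadata {
    virtual ~metadata() = default;   // the fix: a virtual destructor
};

struct validation_metadata : metadata {
    std::vector<char> partitioner;   // subclass state that must be freed
    validation_metadata() { ++live_objects; }
    ~validation_metadata() override { --live_objects; }
};

// Parsing returns the subclass through a base-class unique_ptr, which is
// exactly the situation that requires the virtual destructor above.
std::unique_ptr<metadata> parse() {
    return std::make_unique<validation_metadata>();
}

// Returns 0 when destruction through unique_ptr<metadata> correctly ran
// the subclass destructor (i.e. nothing leaked).
int leaked_after_parse() {
    { auto m = parse(); }            // destroyed via the base pointer
    return live_objects;
}
```

Without `virtual` on ~metadata(), deleting through the base pointer is
undefined behavior and the subclass destructor never runs.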
It is now necessary to close() a file before destroying it, otherwise a big
ugly warning message is printed by the reactor. Our sstable read path was
especially careless about closing the countless files it opens, and the
sstable test generated as many as 400 (!) of these warning messages, despite
running correctly. This patch adds the missing close() calls.
After this patch, the sstable test still shows 3 warning messages.
Those are unavoidable: They happen while broken sstables are being
tested, and an exception is thrown in the middle of the sstable processing,
causing us to destroy a file object without calling close() on it first.
This, in my opinion, proves that requiring close() in the read path is not
a good thing: it is un-RAII-like and not exception-safe. But it is benign
apart from the warning message, so whatever. 3 scary warning messages from
the test are better than 400...
If these 3 remaining messages really bother us, I guess we can fix it by
catching the exceptions in the sstable code, closing the file and rethrowing
the exception, but it will be quite ugly...
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
There is some missing information in the last log printout, because
it's currently hard to generate such information.
Anyway, this patch is a good start towards providing the same log
messages as origin.
Addresses issue #12
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
data_consume_rows(0, 0) was returning all partitions instead of no
partitions, because -1 was passed as the count in that case and then
cast to uint64_t.
Special-casing it that way is problematic for code which calculates the
bounds: when the key is not found, we simply end up with 0 as the upper
bound. Instead of convoluting the range lookup code to special-case 0,
let's simplify the interface so that (0, 0) returns no rows, same as
(1, 1). There is a new overload of data_consume_rows() without bounds,
which returns all data.
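The bug class is easy to reproduce in isolation (this is a sketch, not
the actual reader code): a special-cased count of -1 meaning "all rows",
once cast to uint64_t, becomes 2^64 - 1, so asking for the empty range
(0, 0) requested everything.

```cpp
#include <cstdint>

// Old behavior, reduced to its essence: (0, 0) was mapped to a count of
// -1, and the unsigned cast turned that into a huge "read everything".
uint64_t rows_requested(int64_t begin, int64_t end) {
    int64_t count = (begin == 0 && end == 0) ? -1 : end - begin; // old special case
    return static_cast<uint64_t>(count);                         // -1 -> 2^64 - 1
}
```

The fix avoids the sentinel entirely: (0, 0) simply means zero rows, and
"all rows" gets its own overload with no bounds to misinterpret.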
Recently, "file" started to use a shared_ptr internally, making it
copyable and reference counted, so there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
I will need those abstractions later to handle
inclusiveness/exclusiveness of both starting and ending bounds.
They're also familiar abstractions, so the code is hopefully easier to
comprehend now.
The entry contains not only the key but other stuff like the
position. Why would casting to bytes_view give a view of just the key
and not the whole entry? Better to be explicit.
That's helpful for the purpose of testing, and leveled compaction may
also end up using size-tiered compaction strategy for selecting
candidates.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
If the old average equals the new average, then we would remove the
new average's entry. That's totally wrong.
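An illustrative pitfall of this shape, not the actual histogram code: if
entries are keyed by their average and the merge "removes the old entry"
by key, an old average equal to the new one erases the entry we just
inserted as well.

```cpp
#include <cstddef>
#include <map>

// Insert the merged entry, then remove the old one by key -- the buggy
// order of operations described above.
std::size_t entries_left_after_merge(long old_avg, long new_avg) {
    std::multimap<long, unsigned long> buckets;
    buckets.insert({new_avg, 1});    // insert the merged (new) entry
    buckets.erase(old_avg);          // erases EVERY entry with this key
    return buckets.size();           // 0 when old_avg == new_avg: the bug
}
```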
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Size-tiered strategy basically consists of creating buckets with
sstables of nearly the same size.
Afterwards, it will find the most interesting bucket, whose size must be
between the min threshold and the max threshold. The bucket with the
smallest average size is the most interesting one.
Bucket hotness is also considered when finding the most interesting
bucket, but we don't support this yet.
We are also missing the code that discards an sstable based on its
coldness, i.e. one that is hardly ever read.
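The two steps above can be sketched like this (a simplification: the
parameter names and defaults are assumptions, and real size-tiered
bucketing compares against a bucket_low bound as well):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Step 1: sort sstable sizes and start a new bucket whenever a size
// strays too far above the current bucket's running average.
std::vector<std::vector<uint64_t>>
make_buckets(std::vector<uint64_t> sizes, double bucket_high = 1.5) {
    std::sort(sizes.begin(), sizes.end());
    std::vector<std::vector<uint64_t>> buckets;
    double avg = 0;
    for (uint64_t s : sizes) {
        if (buckets.empty() || s > avg * bucket_high) {
            buckets.push_back({s});              // start a new bucket
        } else {
            buckets.back().push_back(s);         // close enough: join it
        }
        auto& b = buckets.back();
        avg = std::accumulate(b.begin(), b.end(), 0.0) / b.size();
    }
    return buckets;
}

// Step 2: the most interesting bucket is the one with the smallest
// average size among buckets holding at least min_threshold sstables;
// at most max_threshold sstables are compacted at once.
std::vector<uint64_t>
most_interesting(const std::vector<std::vector<uint64_t>>& buckets,
                 std::size_t min_threshold = 4,
                 std::size_t max_threshold = 32) {
    for (const auto& b : buckets) {  // buckets are ordered by size, so the
        if (b.size() >= min_threshold) {         // first eligible one has
            return {b.begin(),                   // the smallest average
                    b.begin() + std::min(b.size(), max_threshold)};
        }
    }
    return {};                       // nothing worth compacting
}
```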
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Returns the sum of the sizes of all sstable components.
It will be used by the size-tiered strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
As the name implies, this patch introduces the concept of automatic
compaction for sstables.
A compaction task is triggered whenever a new sstable is written.
Concurrent compaction on the same column family isn't supported, so
compaction may be postponed if there is an ongoing compaction.
In addition, seastar::gate is used both to prevent a new compaction
from starting and to wait for an ongoing compaction to finish when the
system is asked to shut down.
This patch also introduces an abstract class for compaction strategy,
which is really useful for supporting multiple strategies.
Currently, null and major compaction strategies are supported.
As the name implies, null compaction strategy does nothing.
Major compaction strategy is about compacting all sstables into one.
This strategy may end up being helpful when adding support for major
compaction via nodetool.
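The gate usage can be sketched synchronously (the real seastar::gate is
future-based; this minimal stand-in only illustrates the pattern): each
compaction holds the gate while running, and shutdown closes the gate,
which both rejects new compactions and reports when ongoing ones drain.

```cpp
// Minimal synchronous sketch of the seastar::gate pattern.
class gate {
    long _count = 0;                 // compactions currently running
    bool _closed = false;            // shutdown requested?
public:
    bool try_enter() {               // a compaction is about to start
        if (_closed) {
            return false;            // shutting down: refuse new work
        }
        ++_count;
        return true;
    }
    void leave() { --_count; }       // a compaction finished
    void close() { _closed = true; } // stop admitting new compactions
    bool drained() const {           // safe to complete shutdown?
        return _closed && _count == 0;
    }
};
```

Shutdown closes the gate first, then waits until drained() holds, so no
compaction is ever cut off mid-flight.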
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
When passing tokens corresponding to the 129th key in the sstable to
read_range_rows(), it failed with a heap-buffer-overflow pointing to:
return make_ready_future<uint64_t>(index_list[min_index_idx].position);
The scenario is as follows. We pass the lower bound token, which
corresponds to the first partition of some (not the first) summary
page. That token will compare less than any entry in that page (even
less than the key we took it from, since we want all partitions with
that token), so min_idx will point to the previous summary page
(correct). Then this code tries to locate the position in the previous
page:
auto m = adjust_binary_search_index(this->binary_search(index_list, minimum_key(), min_token));
auto min_index_idx = m >= 0 ? m : 0;
binary_search() will return ((-index_list.size()) - 1), because the
token is greater than anything in that page. So "m" and
"min_index_idx" will be (index_list.size() - 1) after adjusting.
Then the code tried this:
auto candidate = key_view(bytes_view(index_list[min_index_idx]));
auto tcandidate = dht::global_partitioner().get_token(candidate);
if (tcandidate < min_token) {
min_index_idx++;
}
The last key also compared less than the token, so min_index_idx was
bumped up to index_list.size(). The code then used this out-of-range
index on index_list, which caused the buffer overflow.
We clearly need to return the first position of the next page in this
case, and this change does it indirectly by calling
data_end_position(), which also handles edge cases like if there is no
next summary page.
I reimplemented the logic top-down, and found that the last special
casing for tcandidate was not needed, so I removed it.
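The binary_search() return convention involved here is the classic
"negative insertion point" encoding (the same contract as Java's
Collections.binarySearch); a self-contained sketch, with the adjustment
step written as I understand it from the message:

```cpp
#include <vector>

// On a hit, return the index; on a miss, return (-insertion_point - 1),
// so the caller can recover where the key would be inserted.
int binary_search(const std::vector<int>& v, int key) {
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (v[mid] < key) {
            lo = mid + 1;
        } else if (v[mid] > key) {
            hi = mid - 1;
        } else {
            return mid;              // exact hit
        }
    }
    return -lo - 1;                  // miss: lo is the insertion point
}

// Map a miss to the index of the entry *before* the insertion point.
// For a key greater than every entry, binary_search returns
// (-size - 1), and this yields (size - 1) -- the last entry -- which is
// how the old code ended up one step from walking off the page.
int adjust_binary_search_index(int m) {
    return m >= 0 ? m : -m - 2;
}
```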
The logger class constructor registers itself with the logger registry
in order to enable dynamically setting log levels. However, since
thread_local variables may be (and are) initialized at the point of
first use, no loggers are registered when the program starts up.
Fix by making loggers global, not thread_local. This requires that the
registry use locking to prevent registration happening on different threads
from corrupting the registry.
Note that technically global variables can also be initialized at the
point of first use, and there is no portable way for classes to
self-register. However, this is the best we can do.
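A sketch of the resulting shape (names are illustrative, not the actual
logging code): a single global, mutex-protected registry shared by all
threads, instead of per-thread thread_local registries that may not
exist yet when log levels are set.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Global registry; the mutex makes registration from any thread safe.
class logger_registry {
    std::mutex _mutex;
    std::unordered_map<std::string, int> _levels; // logger name -> level
public:
    void register_logger(const std::string& name) {
        std::lock_guard<std::mutex> lock(_mutex);
        _levels.emplace(name, 0);    // default level for a new logger
    }
    std::size_t count() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _levels.size();
    }
};

// Global (NOT thread_local), so every logger lands in the same registry
// and dynamic level changes can find it.
logger_registry& registry() {
    static logger_registry r;        // constructed once, on first use
    return r;
}

// Each logger self-registers on construction, as the message describes.
struct logger {
    explicit logger(const std::string& name) {
        registry().register_logger(name);
    }
};
```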
This method was incomplete, and thus would fail when the map size was
greater than max_bin_size, bringing the application down.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>