When passing a token corresponding to the 129th key in the sstable to
read_range_rows(), it failed with a heap-buffer-overflow pointing to:
    return make_ready_future<uint64_t>(index_list[min_index_idx].position);
The scenario is as follows. We pass the lower-bound token, which
corresponds to the first partition of some (not the first) summary
page. That token will compare less than any entry in that page (even
less than the key we took it from, because we want all partitions with
that token), so min_idx will point to the previous summary page
(correct). Then this code tries to locate the position in the previous
page:
    auto m = adjust_binary_search_index(this->binary_search(index_list, minimum_key(), min_token));
    auto min_index_idx = m >= 0 ? m : 0;
binary_search() will return ((-index_list.size()) - 1), because the
token is greater than anything in that page. So "m" and
"min_index_idx" will be (index_list.size() - 1) after adjusting.
Then the code tried this:
    auto candidate = key_view(bytes_view(index_list[min_index_idx]));
    auto tcandidate = dht::global_partitioner().get_token(candidate);
    if (tcandidate < min_token) {
        min_index_idx++;
    }
The last key also compares less than the token, so min_index_idx is
bumped up to index_list.size(). The code then used this too-large
index on index_list, which caused the heap-buffer-overflow.
We clearly need to return the first position of the next page in this
case, and this change does so indirectly by calling
data_end_position(), which also handles edge cases such as there being
no next summary page.
I reimplemented the logic top-down, and found that the final special
casing for tcandidate was not needed, so I removed it.
The logger class constructor registers itself with the logger registry
in order to enable setting log levels dynamically. However, since
thread_local variables may be (and in practice are) initialized at the
time of first use, no loggers are registered when the program starts up.
Fix by making loggers global, not thread_local. This requires that the
registry use locking to prevent registration happening on different threads
from corrupting the registry.
Note that technically global variables can also be initialized at the
point of first use, and there is no portable way for classes to self-register.
However this is the best we can do.
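A minimal sketch of the pattern (illustrative names only, not the actual seastar/scylla API): loggers are plain globals, so they register before main() runs, and the registry takes a mutex because registration may now happen from any thread.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical registry; the real one also supports looking up loggers
// by name to change their levels at runtime.
class logger_registry {
    std::mutex _mutex; // registration can race across threads
    std::vector<std::string> _names;
public:
    void register_logger(const std::string& name) {
        std::lock_guard<std::mutex> g(_mutex);
        _names.push_back(name);
    }
    size_t count() {
        std::lock_guard<std::mutex> g(_mutex);
        return _names.size();
    }
};

// Function-local static: constructed on first use, so it is ready even
// when logger globals are initialized before main().
logger_registry& registry() {
    static logger_registry r;
    return r;
}

class logger {
public:
    explicit logger(const std::string& name) {
        registry().register_logger(name); // self-registration
    }
};

// Global (not thread_local) loggers: registered at program startup.
logger db_log("database");
logger io_log("io");
```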
This method was incomplete, and would thus fail when the map size was
greater than max_bin_size, bringing the application down.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
It is a better fit for things that are names, not blobs. We have a user that expects
a bytes parameter, but for no other reason than that the field used
to be of type bytes.
Let's fix that, so future users will be able to use sstrings.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Persist column family's "is_dense" value to system tables. Please note
that we throw an exception if "is_dense" is null upon read. That needs
to be fixed later by inferring the value from other information like
Origin does.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
In C*, every compacted sstable keeps track of its ancestors in the
statistics file. Supposedly, that info is used to discard sstable
files from ancestors which for some odd reason weren't deleted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
metadata_collector was made a member of class sstable, so that the
compaction procedure can use the add_generation method of an
sstable object.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
After compaction, remove the source sstables. This cannot be done
immediately, as ongoing reads might be using them, so we mark the sstable
as "to be deleted", and when all references to this sstable are gone and
the object is destroyed, we see this flag and delete the on-disk files.
This patch doesn't change the low-level compact_sstables() (which doesn't
mark its input sstables for deletion), but rather the higher-level example
"strategy", column_family::compact_all_sstables(). This leaves room for
future strategies that mark the input sstables for deletion only after
performing other steps, once they are sure they don't want to abort the
compaction and return to the old files. If we decide this isn't needed,
we can easily move the mark_for_deletion() call into compact_sstables().
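The lifetime scheme can be sketched like this (hedged toy model, not the real sstables::sstable class; the counter stands in for unlinking the component files):

```cpp
#include <cassert>
#include <memory>

// Toy sstable: deletion of on-disk files is deferred to the destructor,
// which only runs once the last reference is dropped.
class sstable {
    bool _marked_for_deletion = false;
    int& _files_deleted; // stand-in for removing the component files
public:
    explicit sstable(int& files_deleted) : _files_deleted(files_deleted) {}
    void mark_for_deletion() { _marked_for_deletion = true; }
    ~sstable() {
        if (_marked_for_deletion) {
            ++_files_deleted; // real code would unlink the files here
        }
    }
};

// Compaction marks the source sstable, but an ongoing read keeps it
// alive; the files disappear only when the last reference is gone.
int simulate_compaction_then_read_finishes() {
    int files_deleted = 0;
    auto sst = std::make_shared<sstable>(files_deleted);
    auto reader_ref = sst;      // an ongoing read holds a reference
    sst->mark_for_deletion();   // compaction finished; mark the source
    sst.reset();                // compaction drops its reference
    assert(files_deleted == 0); // still alive: the reader holds it
    reader_ref.reset();         // read completes; last reference gone
    return files_deleted;
}
```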
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Following Nadav's discovery of the problem with large writes to the output
stream, it turns out that compressed_file_output_stream also needs the
trim_to_size option enabled. Otherwise, a write to compressed_file_output_stream
larger than _size would result in a buffer larger than the chunk size being
flushed, which is definitely wrong.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
A base class with virtual functions should also have a virtual destructor,
so that if someone deletes an object through a base-class pointer, the
concrete class's destructor will be called.
I thought this missing virtual destructor was to blame for a bug I was
hunting; it turned out not to be, but the missing definition is still
worth adding.
The seemingly silly "default" definition of the move constructor is also
necessary, because when you define the destructor explicitly, the compiler
no longer implicitly defines any constructors for you.
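An illustrative sketch of both rules (hypothetical class names, not the classes from this patch): the virtual destructor makes delete-through-base run the concrete destructor, and the defaulted move constructor restores what declaring the destructor suppressed.

```cpp
#include <cassert>

struct base {
    virtual ~base() = default; // deletes via base* now run the concrete dtor
    base() = default;
    base(base&&) = default;    // declaring the destructor suppresses the
                               // implicit move constructor, so default it back
};

struct concrete : base {
    int& _destroyed;           // counts destructor runs, for demonstration
    explicit concrete(int& destroyed) : _destroyed(destroyed) {}
    ~concrete() override { ++_destroyed; }
};

int destroy_via_base_pointer() {
    int destroyed = 0;
    base* p = new concrete(destroyed);
    delete p;                  // with a virtual dtor, ~concrete() runs;
                               // without it, this would be undefined behavior
    return destroyed;
}
```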
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Our sstable code currently has a bug (not solved by this patch) in writing
large summary files, where several aio write operations are done and one of
them fails with an EINVAL.
Unfortunately and inexplicably, sstable::write_simple simply *hides* this
exception (catches it and ignores it), so the writer never knows the write
failed, and we only get an exception later when sstable::write_components()
tries to load() the sstable it just created.
So in this patch, I remove the hiding of the exception, and now when writing
an sstable with 1,000,000 partitions, I see this in the output:
failed to write sstable: Invalid argument
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The compressor type is only a part of the information kept in the
compression parameters, and expressions like
schema.get_compressor().get_compressor() do not read very well.
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
"This patch series introduces compression_parameters class which is used
to handle compression options specified at table creation. Such information
is now properly propagated to the database internals."
The two separate functions can now be merged. As a result, the code
that generates statistics data is now much easier to understand.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Asserts should be avoided entirely. Instead, we should throw an
exception, allowing the db to proceed with a "smooth" shutdown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Apparently, after writing a new sstable, with write_components(), it
is necessary to load() it. I'm not sure why, but we get a crash on
an aio to a closed file descriptor if we don't.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Instead of requiring the user to subclass an "sstable_creator" class to
specify how to create a new sstable (or, in the future, several of them),
switch to an std::function.
In practice, it is much easier to specify a lambda than a class, especially
since C++11 made it easy to capture variables into lambdas - but not into
local classes.
The "commit()" function is also unnecessary. The intention there was to
provide a function to "commit" the new sstables (i.e., rename them).
But the caller doesn't need to supply this function - it can just wait
for the future marking the end of compaction, and run its own commit code
right then.
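The interface change can be sketched as follows (illustrative names only, not the actual compaction API); note how the lambda captures local state, which a local class could not do as conveniently:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical descriptor for a newly created output sstable.
struct sstable_desc { std::string dir; long generation; };

// Instead of an abstract sstable_creator class, take a callable:
using sstable_creator_fn = std::function<sstable_desc(long generation)>;

// The compaction loop asks the callable for each new output sstable.
std::vector<sstable_desc> compact_into(size_t outputs, sstable_creator_fn create) {
    std::vector<sstable_desc> result;
    for (size_t i = 0; i < outputs; ++i) {
        result.push_back(create(static_cast<long>(i) + 1));
    }
    return result;
}

// Callers pass a lambda capturing whatever context they need (here, the
// output directory) - no subclass boilerplate required.
std::vector<sstable_desc> make_outputs(const std::string& dir, size_t n) {
    return compact_into(n, [&dir](long gen) { return sstable_desc{dir, gen}; });
}
```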
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Column stats min_timestamp, max_timestamp and max_local_deletion_time
were being updated incorrectly.
max_local_deletion_time should be std::numeric_limits<int>::max() by
default, and then keep track of the maximum local deletion time, if any.
This bug prevented an sstable generated by us from being compacted by
C* because max_local_deletion_time was storing
std::numeric_limits<int>::min(), and thus the sstable would be
considered fully expired.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Compaction metadata is composed of ancestors and cardinality.
Ancestors data is generated by the compaction process, so it is left
empty for the time being.
Cardinality data is generated by hashing the keys, offering the
values to hyperloglog, and retrieving a buffer with the data to be
stored.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
The first change was to add the function get_bytes, which creates
a temporary buffer with the format expected by compaction metadata's
cardinality. To create that format, I had to import
write_unsigned_var_int from stream-lib.
write_unsigned_var_int uses fewer bytes to encode smaller integer
values, at the cost of slightly more bytes for larger values.
The last change was to add the function offer_hashed, which receives
a 64-bit hashed value instead. The hash algorithm used by C* is
murmur hash (hash2_64).
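The encoding can be sketched like this (a hedged re-implementation modeled on stream-lib's writeUnsignedVarInt, not the imported code itself): each output byte carries 7 payload bits, least-significant group first, with the high bit set on every byte except the last.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Unsigned variable-length integer encoding: small values take one byte,
// larger values take one extra byte per additional 7 bits.
std::vector<uint8_t> write_unsigned_var_int(uint64_t value) {
    std::vector<uint8_t> out;
    while (value & ~UINT64_C(0x7f)) {
        // More bits remain: emit low 7 bits with the continuation bit set.
        out.push_back(static_cast<uint8_t>((value & 0x7f) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value)); // final byte, high bit clear
    return out;
}
```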
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
This patch adds the basic compaction function sstables::compact_sstables,
which takes a list of input sstables, and creates several (currently one)
merged sstable. This implementation is pretty simple once we have all
the infrastructure in place (combining reader, writer, and a pipe between
them to reduce context switches).
This is already a working compaction, but it is not quite complete: we'll
need to add compaction strategies (which sstables to compact, and when),
a better cardinality estimator, sstable management and renaming, and a lot
of other details, and we'll probably still need to change the API.
But we can already write a test for compacting existing sstables (see
the next patch), and I wanted to get this patch out of the way, so we can
start working on applying compaction in a real use case.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The sstable has a lot of data, but surprisingly, an accurate count of the
number of partitions isn't available. We can get a good estimate by looking
at the number of summary entries.
Based on Origin's IndexSummary.getEstimatedKeyCount().
We need this estimate for compaction if we can't get (yet) a better
estimate from the cardinality estimator algorithm.
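The estimate amounts to the following (a hedged sketch after Origin's IndexSummary.getEstimatedKeyCount(); the function name and parameters are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Each summary entry stands for roughly one index interval's worth of
// partitions, so the entry count times the interval approximates the
// total partition count.
uint64_t estimated_partition_count(uint64_t summary_entry_count,
                                   uint64_t min_index_interval) {
    return summary_entry_count * min_index_interval;
}
```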
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Some versions of Origin will write 0 instead of -1 as the start-of-range
marker for a range tombstone. I've just come across one such table, which
ended up breaking our code. Let's be more flexible in what we accept. We
don't really have a choice.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
As Avi suggested, we should use size_t only for memory sizes, not disk
sizes, as some hypothetical 32-bit machine could have 32-bit size_t
but still support 64-bit file sizes.
So this patch changes a number of places where we used size_t in sstables/
to use uint64_t instead. It doesn't change *all* uses of size_t: where
size_t refers to the size of an object in memory (or of an object that
should fit into memory, like the summary file), I left size_t.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The current sstable write interface only knows how to write a memtable.
For compaction, we also want it to be able to write the compaction's
output, which we can represent as a mutation_reader. So this patch
changes the sstable::write_components() method to accept a mutation_reader,
and whatever else is needed (a schema and the number of partitions in
the reader - or an estimate thereof).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
If compression is used, we should provide both the uncompressed and
compressed lengths to the metadata collector, so that the compression
ratio can be computed. Stats metadata stores the compression ratio.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
This class will be used to generate filter hit/miss statistics to be
consumed by upper layers.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
In theory, when we create a new column family, we should also make sure
that the underlying directory exists. However, this would be quite challenging:
there are a lot of entry points to add_column_family, none of them are
futurized, and futurizing them could prove challenging up the call chain.
Because we can guarantee that the keyspace directory will exist - now that we
have unified that - it is actually a lot simpler to just make sure that the
directory exists when writing the sstable.
If the keyspace directory didn't exist, we would have to recurse through the
path. As previously said, this patch assumes that away.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>