Commit Graph

6076 Commits

Pekka Enberg
ae9e3e049c schema: Improve column_definition operator<< output
Make operator<< for column_definition print more information.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-31 13:35:26 +03:00
Pekka Enberg
61d7e8de1c schema: Add to_string() for column_kind and index_type enums
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-31 13:35:26 +03:00
Pekka Enberg
03e0bcd8cb database: Add operator<< for keyspace_metadata
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-31 13:35:19 +03:00
Nadav Har'El
f6ae567ab1 repair: implement primaryRange and ranges options
This patch implements repair's "primaryRange" and "ranges" options:

Without these options, a repair defaults to repairing all the ranges for which
this node holds a replica (each range is repaired by contacting the other
replicas of this range).

If the "primaryRange" option is passed, instead of repairing all ranges, only
the "primary ranges" of this node are repaired - for each range, only one node
has this range as its "primary range". The intention is that a user can start
a "primaryRange" repair on all nodes, and the result is that each range
will only be repaired once.

If the "ranges" option is passed, it can explicitly specify the list of ranges to
repair, overriding the automatic determination of ranges explained above.

Fixes #212.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-31 10:02:03 +03:00
Nadav Har'El
cc4117d6c1 repair: do not use an atomic integer
Avi asked not to use an atomic integer to produce ids for repair
operations. The existing code had another bug: It could return some
id immediately, but because our start_repair() hasn't started running
code on cpu 0 yet, the new id was not yet registered and if we were to
call repair_get_status() for this id too quickly, it could fail.

The solution for both issues is that start_repair() should return not
an int, but a future<int>: the integer id is incremented on cpu 0 (so
no atomics are needed), and then returned and the future is fulfilled.

Note that the future returned by start_repair() does not wait for the
repair to be over - just for its id to be registered and usable
in a call to repair_get_status().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-31 09:31:19 +03:00
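The shape of the fix can be sketched in plain C++ (a simplified stand-in, not the actual Seastar code: a mutex plays the role of "always run on cpu 0", and all names are hypothetical). The key point survives the translation: start_repair() returns a future<int> that only resolves after the id is registered, so a quick repair_get_status() cannot observe an unregistered id, and no atomic integer is needed.

```cpp
#include <future>
#include <map>
#include <mutex>
#include <string>

static std::map<int, std::string> repair_status;
static std::mutex status_mutex;
static int next_repair_id = 0;   // only ever touched under the lock

std::future<int> start_repair() {
    return std::async(std::launch::async, [] {
        std::lock_guard<std::mutex> g(status_mutex);
        int id = ++next_repair_id;      // id allocated in one place
        repair_status[id] = "RUNNING";  // registered before the id is handed out
        return id;  // the future resolves here; the repair itself keeps running
    });
}

std::string repair_get_status(int id) {
    std::lock_guard<std::mutex> g(status_mutex);
    auto it = repair_status.find(id);
    return it == repair_status.end() ? "UNKNOWN" : it->second;
}
```

A caller does `int id = start_repair().get();` and can then immediately query the status without racing against registration.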
Gleb Natapov
821d81786e fix timeout of background read repair request
Do not set _cl_promise on timeout if timeout happens after cl is
achieved. It may happen for background read repair requests.
2015-08-30 19:07:29 +03:00
Gleb Natapov
5bb37bc92e fix race between speculating read timer and request completion
The speculating timer may expire after the request is complete, but before the
continuation that cancels it runs. In this case the timer should not
initiate an additional request; it should do nothing instead.
2015-08-30 19:07:29 +03:00
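The guard being described might look like this minimal sketch (hypothetical names, no real timer machinery): the timer callback re-checks a completion flag before speculating.

```cpp
#include <cassert>

// Toy model of the race fix: if the request finished before the timer
// callback ran, the callback becomes a no-op instead of issuing an
// extra speculative read.
struct speculating_read {
    bool complete = false;
    int extra_reads = 0;

    void on_complete() { complete = true; }

    void on_timer() {
        if (complete) {
            return;        // request finished first: do nothing
        }
        extra_reads++;     // otherwise, speculate with an additional request
    }
};
```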
Avi Kivity
2ef5816996 Merge seastar upstream
* seastar a503442...9cc5cd0 (3):
  > fstream: fix write-behind on filesystems that don't support fallocate()
  > fstream: return correct error
  > fstream: reinitialize _background_writes_done after an error
2015-08-30 15:18:28 +03:00
Avi Kivity
4ec4a4b53c Merge seastar upstream
* seastar 2e041c2...a503442 (4):
  > fstream: write-behind
  > output_stream: improve flush() support
  > thread: initialize stack in debug mode
  > sharded: do not capture remote service pointer on remote invocation lambda
2015-08-30 12:09:51 +03:00
Avi Kivity
554645db91 Revert "Merge "Move the API configuration from command line to configuration" from Amnon"
See issue #59 for details.

This reverts commit 5aa0244d32, reversing
changes made to 7fb109a58d.
2015-08-30 12:09:00 +03:00
Avi Kivity
15987f80cf Merge "Avoid allocations in the read indexes path" from Glauber
"We can avoid small allocations when doing read_index. Doing that will yield
us another 4 % gain.

Before:
839484.65 +- 585.52 partitions / sec (30 runs, 1 concurrent ops)

After:
873323.18 +- 442.52 partitions / sec (30 runs, 1 concurrent ops)"
2015-08-30 08:43:18 +03:00
Glauber Costa
b1bfcda38c column helper: loop once only while gathering statistics.
The code that gathers statistics about the column_name shows up in the
benchmark profile.

If we really want to collect those statistics, I guess they will never be free,
because they involve a byte copy, which implies an allocation.

But one easy thing we can do to make it better is to collect both min and max
statistics in the same loop. There is also no need to special-case an empty
vector, since may_grow will already take care of that.

That yields us a ~ 0.77 % boost, which although not earth-shattering, is too
easy a win to pass up.

Before:
200582.94 +- 293.91 partitions / sec (30 runs, 1 concurrent ops)

After:
202120.06 +- 341.95 partitions / sec (30 runs, 1 concurrent ops)

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-30 08:43:02 +03:00
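A minimal sketch of the single-loop idea (illustrative names, not the actual Scylla code): both bounds are updated in one pass over the sizes.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct column_stats {
    uint64_t min_size = UINT64_MAX;  // sentinel for "no data seen yet"
    uint64_t max_size = 0;
};

// One loop updates both min and max, instead of two separate passes.
column_stats gather_stats(const std::vector<uint64_t>& column_sizes) {
    column_stats s;
    for (auto sz : column_sizes) {
        s.min_size = std::min(s.min_size, sz);
        s.max_size = std::max(s.max_size, sz);
    }
    return s;
}
```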
Glauber Costa
aab1ae9dc1 index_entry: don't generate a temporary bytes element
The one thing that is still showing pretty high in the read_indexes flamegraph
is allocations.

We can, however, do better. Since most of the index is the keys anyway - and we need
all of them - the amount of memory we use by copying the buffers over is about the same
as the space we would use by just keeping the buffers around.

So we can change index_entry to just keep the shared_buffers, and since we always access
it through views anyway, that is perfectly fine. The index_entry destructor will then
release() the temporary_buffer, instead of doing this after the buffer copy.

This gives us a nice additional 4 %.

perf_sstable_g  --smp 1 --iterations 30 --parallelism 1 --mode index_read

Before:
839484.65 +- 585.52 partitions / sec (30 runs, 1 concurrent ops)

After:
873323.18 +- 442.52 partitions / sec (30 runs, 1 concurrent ops)

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:09:53 -05:00
Glauber Costa
a9ab31dd9c index_entry: move its fields to private visibility
And provide accessors. This will give us the freedom to change their internal
storage.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:05:36 -05:00
Glauber Costa
1fbd14354f index_entry: provide a constructor
This is preparation for making its internal fields private.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:05:36 -05:00
Glauber Costa
13d59c9618 index_entry: do away with the disk_string<> fields
Now that we are using the NSM, and not the general parser, for the index, there
is no reason to keep using disk_string<>s in it. Since it is standing in the way
of further optimizations, let's get rid of it and use bytes directly.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:05:36 -05:00
Glauber Costa
b53511b422 sstables: don't return after processing collections
The code as is is blatantly wrong, and is an artifact of the seastar-thread
conversion.

This happened because the way we move to the next element in a do_for_each
future loop is by returning from the current lambda, and so the code was converted
this way. Since we are now using a for loop, we should not return: we should continue.

I found this while searching for a bug, which is unfortunately not fixed by this.
But this is totally wrong, and has to be fixed.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 20:37:39 +03:00
Glauber Costa
2623362d20 continuous_data_consumer: do not pass reference to child
Since this class is the child's base class, we don't need to pass a reference: we can
just cast our 'this' pointer.

By doing that, the move constructor can come back.

Welcome back, move constructor.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 20:32:56 +03:00
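The pattern being described - a base class reaching its child by casting `this`, with no stored reference and no virtual dispatch - is essentially CRTP. A minimal sketch with hypothetical names:

```cpp
#include <cstddef>

// The base is templated on the derived type, so static_cast<Derived*>(this)
// is safe by construction. No reference member means the implicitly
// generated move constructor is available again.
template <typename Derived>
struct continuous_data_consumer {
    size_t total = 0;
    void consume(const char* buf, size_t len) {
        // Reach the child's method without virtual function overhead.
        total += static_cast<Derived*>(this)->process(buf, len);
    }
};

struct row_consumer : continuous_data_consumer<row_consumer> {
    size_t process(const char*, size_t len) { return len; }
};
```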
Avi Kivity
5aa0244d32 Merge "Move the API configuration from command line to configuration" from Amnon
"This series address issues #59 and #23.

It moves the API configuration from the command line argument to the general
config, it also move the api-doc directory to be configurable instead of hard
coded."

Fixes #59
Fixes #23
2015-08-29 12:34:04 +03:00
Avi Kivity
7fb109a58d Merge "Types cleanup" from Pekka
"Remove type name duplication in types.cc."
2015-08-29 11:48:41 +03:00
Glauber Costa
0dd57fbca8 checksummed file writer: some cleanups
- no need to mark us as a friend of file_writer
- we should construct the fields directly instead of assigning them in the constructor's body.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 11:44:48 +03:00
Glauber Costa
66cc546781 sstable writer: compute checksum at larger chunks
What we are doing now is computing the checksum at every write() operation, possibly
over a small byte quantity - like 2 or 4 bytes, since we write those a lot as sizes.

While adler32 allows those computations and makes them very easy, that doesn't mean
they are efficient. It is a lot more efficient to compute the checksum over a larger
buffer.

We can do that by computing it at put() time in a data_sink_impl, instead of
keeping it in the file abstraction. The code for the checksum itself also
becomes remarkably simpler, since there is no longer any need to keep state:
we'll always be presented with full buffers.

The data sink implementation and the file_writer share the full_checksum and
the checksum struct variables: and with that in place, the file writer can
still expose the final results of the computation in the same way it does at
present.

Benchmarked with:
perf_sstable_g  --smp 1 --iterations 30 --parallelism 1 --mode write --num_columns 5 --partitions 500000

Before:
178829.07 +- 141.28 partitions / sec (30 runs, 1 concurrent ops)
After:
199744.71 +- 201.64 partitions / sec (30 runs, 1 concurrent ops)

gain: 11.70 %

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 11:44:47 +03:00
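What makes the chunking safe is that Adler-32 is resumable: feeding the previous checksum back in as the seed continues the computation, so checksumming one large buffer and checksumming it in pieces give the same result. A self-contained sketch of the checksum (per RFC 1950; not the Scylla code):

```cpp
#include <cstddef>
#include <cstdint>

// Adler-32 as defined in RFC 1950. Pass 1 as the initial seed; pass a
// previous result as the seed to continue over the next chunk.
uint32_t adler32(uint32_t seed, const uint8_t* data, size_t len) {
    uint32_t a = seed & 0xffff;          // low half: running byte sum
    uint32_t b = (seed >> 16) & 0xffff;  // high half: sum of the sums
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;       // 65521 is the largest prime < 2^16
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```

Because of the seed-continuation property, moving the computation from per-write() to per-put() buffers changes only how often the function is called, not the final checksum.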
Avi Kivity
e9917a5862 Merge "Improve read index performance further" from Glauber
"This patch improves the read_indexes performance by an extra 16 %.
The total gain so far is now 98 %, and although there are still things
I believe we can do to improve it further, I consider a 2-fold increase
sufficient to declare Issue #94 fixed.

So:

Fixes #94

The speed up is achieved by converting the reader to the NSM. To do that, I had
to commonize most parts of the NSM. I had attempted this before, and for this
new cycle, I had a new tool to aid me in this task: the sstable performance
microbenchmark.

Every change to the NSM was individually tested to make sure the performance
of the read path was not regressing. When it did regress, I took alternate
approaches and tried my best to discuss the whys in the changelogs, with
the appropriate result.

So I can be quite confident in affirming that we are not taking any drop
here, while read_index performance is increased significantly"
2015-08-29 11:28:03 +03:00
Amnon Heiman
f1cda74c15 API: storage_service - return an error for wrong keyspace name
This patch addresses issue #155. It adds a helper function that throws a
bad parameter exception if a keyspace does not exist.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-29 11:22:27 +03:00
Glauber Costa
babccb1112 read_indexes: convert to the NSM
Reading each member individually is not as efficient. Better to convert to
the NSM.

Before:
717101.20 +- 649.77 partitions / sec (30 runs, 1 concurrent ops)
After:
838169.80 +- 575.04 partitions / sec (30 runs, 1 concurrent ops)

Gains:
16.88 %

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 19:07:39 -05:00
Glauber Costa
4b174c754d commonize the NSM
In order to reuse the NSM in other scenarios, we need to push as much code
as possible into a common class.

This patch does that, making the continuous_data_consumer class now the main
placeholder for the NSM class. The actual readers will have to inherit from it.

However, despite using inheritance, I am not using virtual functions at all.
Instead, we let the continuous_data_consumer receive an instance of the derived
class, and then it can safely call its methods without paying the cost of
virtual functions.

In an earlier attempt, I had kept the main process() function in the derived class,
which then had the responsibility of coding the loop.

With the use of the new pattern, we can keep the loop logic in the base class,
which is a lot cleaner. There is a performance penalty associated with it, but
it is fairly small: 0.5 % in the sequential_read perf_sstable test. I think we
can live with it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 18:56:26 -05:00
Glauber Costa
f8d35ef5ec sstables: move exception to its own file.
I am moving the malformed exception here, to avoid circular dependencies.
But since the file now exists, let's move them all.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 17:30:44 -05:00
Glauber Costa
d9b7f4bde3 row consumer: separate processing of buffers from the main loop
In my previous attempt, I separated the state processor from the main loop,
leaving it to be filled in by a derived class.

That felt a lot more natural, because then we don't have to replicate the loop
logic in the derived classes.

But, oh well, life is hard. Especially on fast paths. Doing that makes us
insert an extra call in this loop, and that is noticeable: we would be 1.5 %
slower, and that is not even counting the cost of making the state processing a
virtual function later on.

I could just argue that this is acceptable due to decoupling gains, but why
would I argue that, if I can just rewrite it in a way that no performance is
lost?

And then I did. The disadvantage is that the derived class will now
have to re-code the loop, but no performance is lost. The advantage is that
the derived class will now be able to call into process_buffer
directly, without using virtual functions in this path.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 17:30:44 -05:00
Glauber Costa
fbd68c3b01 row consumer: move consume_be to consumer.hh
It will be reused by the continuous_data_consumer

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 17:30:43 -05:00
Glauber Costa
e1945e473b row consumer: make non_consuming an instance member
It is currently a static member that gets the instance members as parameters. There
is no reason for that, and it will complicate the decoupling, since the
prestate reader won't know about state.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-28 17:18:19 -05:00
Glauber Costa
f45b807f34 row consumer: move proceed class to a separate class
Continuing the work of decoupling the prestate and state parts of the NSM
so we can reuse it, move the proceed class to a different holding class.
Proceeding or not has nothing to do with "rows".

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-28 17:18:06 -05:00
Glauber Costa
49ac04a60a row consumer: fall through more often
Because we previously had no way to know whether or not the read completed,
we would always go back to the main loop, and would only optimize sequential
reads for some kinds of data.

However, as one can see in the previous patch, the new read_X functions
notify completion, allowing us to just fall through to the next case if that is
the only possibility. In most cases, it isn't. With this, we can apply this
optimization throughout all cases where we don't branch states, with very
elegant resulting code.

The performance actually increases by 0.75 %. It is not much, but it is more
than the error margin (which sits at 0.20 %), and because the code is not made
unreadable by it, this is a clear win to me.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 16:30:22 -05:00
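The fall-through pattern can be sketched with a toy two-state reader (hypothetical names, not the actual consumer): when a read_* helper reports it completed from the current buffer, the switch falls through to the next state instead of bouncing back to the main loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class state { KEY_LEN, KEY, DONE };

struct reader {
    state st = state::KEY_LEN;
    uint16_t key_len = 0;
    std::string key;

    // Returns true when the value was fully read from the buffer.
    bool read_u16(const uint8_t*& p, size_t& avail, uint16_t& out) {
        if (avail < 2) return false;
        out = uint16_t(p[0]) << 8 | p[1];
        p += 2;
        avail -= 2;
        return true;
    }

    void process(const uint8_t* p, size_t avail) {
        switch (st) {
        case state::KEY_LEN:
            if (!read_u16(p, avail, key_len)) break;  // need more data
            st = state::KEY;
            [[fallthrough]];   // read completed: keep consuming, no loop bounce
        case state::KEY:
            if (avail < key_len) break;
            key.assign(reinterpret_cast<const char*>(p), key_len);
            st = state::DONE;
            break;
        case state::DONE:
            break;
        }
    }
};
```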
Glauber Costa
1f930cda4a row consumer: extend use of read for multi-value fields
In an attempt to gain some cycles, we test whether we can read many
values at once, and if so, use consume_be directly for those.

What we can do in this situation is read the first value, and let the read
fall through to the next case if the read succeeds.

The code actually looks a lot more elegant this way.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 16:30:22 -05:00
Glauber Costa
0ad8afb0ec row consumer: extend usage of the read_* functions
In some places, we cannot use our read_* functions, because we don't know
whether or not the read succeeded, and that is important when passing the state
along.

The fix for this is trivial, since we can just return it from the reader.

Note for reviewers: the comment in one of the functions says we should use
"read_bytes(data, _u32, _key ...". But in the actual code, the buffer is
_val, not _key.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-28 16:30:22 -05:00
Glauber Costa
13af0ffbd2 row consumer: fix read_bytes temporary len
It shouldn't be _u16, but rather whatever we passed as len. It currently works
because all callers pass _u16 as len. But this will soon change.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-28 16:30:22 -05:00
Glauber Costa
62a26ef411 row consumer: don't switch state implicitly
Soon enough, all the state machine will be separated from the prestate handling.
To make it easier, we will decouple them as much as we can.

Not automatically switching states in the read functions is part of this.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-28 16:30:22 -05:00
Avi Kivity
c734ef2b72 Merge seastar upstream
* seastar 10e09b0...2e041c2 (7):
  > Merge "Change app_template::run() to terminate when callback is done" from Tomasz
  > resource: Fix compilation for hwloc version 1.8.0
  > memory: Fix infinite recursion when throwing std::bad_alloc
  > core/reactor: Throw the right error code when connect() fails
  > future: improve exception safety
  > xen: add missing virtual destructors
  > circular_buffer: do not destroy uninitialized object

app_template::run() users updated to call app_template::run_deprecated().
2015-08-28 23:52:49 +03:00
Amnon Heiman
800578f164 API: Take the API doc directory from configuration
The API doc directory will now be taken from configuration instead of
being hard coded.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-28 20:26:30 +03:00
Amnon Heiman
9ef7d1ee69 main: Take the http configuration from the configuration object
This replaces the http configuration to use the general configuration
object instead of the command line arguments. This makes it possible to
configure the API from a configuration file and not just from the command
line.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-28 20:24:59 +03:00
Amnon Heiman
dd77f7e288 configuration: Add the API configuration to the general configuration
This adds the API configuration parameters to the configuration, so it
will be possible to take them from the configuration file or from the
command line.

The following configuration options were defined:
api_port
api_address
api_ui_dir
api_doc_dir

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-28 20:22:55 +03:00
Amnon Heiman
7b1c973884 API: Add doc directory parameter to the http context
Adding a parameter to the http context so it will not be hard coded and
can be configured.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-28 20:20:20 +03:00
Avi Kivity
012fd41fc0 db: hard dirty memory limit
Unlike cache, dirty memory cannot be evicted at will, so we must limit it.

This patch establishes a hard limit of 50% of all memory.  Above that,
new requests are not allowed to start.  This allows the system some time
to clean up memory.

Note that we will need more fine-grained bandwidth control than this;
the hard limit is the last line of defense against running out of reclaimable
memory.

Tested with a mixed read/write load; after reads start to dominate writes
(due to the proliferation of small sstables, and the inability of compaction
to keep up), dirty memory usage starts to climb until the hard stop prevents
it from climbing further and OOMing the server.
2015-08-28 14:47:17 +02:00
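The admission policy described above can be sketched as follows (illustrative only, not the actual Scylla accounting; names are hypothetical): above the hard threshold of 50% of memory held dirty, new requests are refused until cleanup frees some.

```cpp
#include <cstddef>

// Toy dirty-memory admission control: dirty memory cannot be evicted at
// will, so past the hard limit new requests must wait for cleanup.
struct dirty_memory_limiter {
    explicit dirty_memory_limiter(size_t total) : total_memory(total) {}

    bool try_admit(size_t bytes) {
        if (dirty + bytes > total_memory / 2) {  // hard limit: 50% of all memory
            return false;                        // caller must wait for cleanup
        }
        dirty += bytes;
        return true;
    }

    void cleaned(size_t bytes) { dirty -= bytes; }

    size_t total_memory;
    size_t dirty = 0;
};
```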
Pekka Enberg
78b8ca1a2c types: Unify type names
Fix duplicate type names in the types map and the classes themselves.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-28 14:39:46 +03:00
Pekka Enberg
dfbf84ce18 types: Introduce ascii_type_impl and utf8_type_impl classes
In preparation for reducing type name duplication, introduce classes for
ascii and utf8 types.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-28 14:01:55 +03:00
Avi Kivity
f171d71c16 utils: optimize murmur3_hash data fetch
By using a recognized idiom, gcc can optimize the unaligned little endian
load as a single instruction (actually less than an instruction, as it
combines it with a succeeding arithmetic operation).
2015-08-28 12:37:43 +03:00
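The recognized idiom is a byte-by-byte shift-and-or little-endian load: gcc pattern-matches it and emits a single (possibly unaligned) load plus the following arithmetic on little-endian targets. A sketch of the idiom (not the actual murmur3_hash code):

```cpp
#include <cstdint>

// Assemble a 64-bit little-endian value from bytes. Endian-independent,
// alignment-safe, and compiled by gcc/clang to one mov on x86.
static inline uint64_t load_le64(const unsigned char* p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= uint64_t(p[i]) << (8 * i);  // byte i lands at bit position 8*i
    }
    return v;
}
```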
Avi Kivity
cb1372523a Merge "CQL code cleanups" from Pekka
"Here's another round of cleanups to the CQL code. Nothing exciting here,
mostly moving code to source files which makes changing the code less
painful in terms of compilation times."
2015-08-27 18:32:45 +03:00
Pekka Enberg
28aad6fa67 cql3: Move ks_prop_defs implementation to source file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-27 18:16:28 +03:00
Avi Kivity
7e8c6eddbb Merge "Buffer related read performance improvement" from Glauber
"As we could see, the flamegraphs show a lot of performance still left on the
table. However, from the I/O point of view, we have determined through our
write performance testing that 128k is the sweet spot for buffers. Worse yet:
reads are still trapped at 8k.

While it is true that when we want to read just a little data, smaller is
better, it is also true that reads (and now that includes the index), tend to
give hints about the size they want read.

So we can read the whole thing at once if smaller than 128k, or chop it into 128k
increments if it is not.

The performance gains coming from doing this are considerable: 39 % for data,
67 % for index."
2015-08-27 18:07:27 +03:00
Pekka Enberg
c2ff7b67ce cql3: Move user_types implementation to source file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-27 17:50:54 +03:00
Avi Kivity
cf0825182e Merge "New modes for sstable perf tests" from Glauber
"index_read, sequential_read, and write"
2015-08-27 17:26:42 +03:00