scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-19 16:15:07 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	ed530dfb3a	tests: sstables: Add test for skipping within a compressed stream Refs #2143.	2017-03-13 13:08:24 +01:00
Paweł Dziepak	04b80272f2	cell_locker: add metrics for lock acquisition	2017-03-02 09:05:12 +00:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Avi Kivity	1d9ee358f1	Revert "Merge "Reduce the size of mutation_partition" from Piotr" This reverts commit `aa392810ff`, reversing changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses boost::intrusive::detail directly, which it must not, and doesn't compile on all boost versions as a consequence.	2016-12-25 16:07:48 +02:00
Piotr Jastrzebski	2af6ff68d9	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Gleb Natapov	9222a47fed	sstable test: add test for generated summary data Message-Id: <20161117155051.GV6765@scylladb.com>	2016-11-20 19:50:45 +02:00
Avi Kivity	7faf2eed2f	build: support for linking statically with boost Remove assumptions in the build system about dynamically linked boost unit tests. Includes seastar update which would have otherwise broken the build.	2016-10-26 08:51:21 +03:00
Paweł Dziepak	25b91c51e2	ssables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Raphael S. Carvalho	1f31223f32	sstables: store schema in sstable object That will be needed for optimization that will store decorated keys in the sstable object, and also for a subsequent work that will detect wrong metadata (min/max column names) by looking at columns in the schema. As schema is stored in sstable, there's no longer a need to store ks and cf names in it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:17 -03:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Nadav Har'El	fc063ae62d	tests: add test for promoted index writing In this unit test, we create using Scylla C++ code, the same large partition with 13520 CQL rows as we previously imported from Cassandra for the large partition test. We then verify that the sstable index file we just wrote is byte-for-byte identical to the one previously created by Cassandra. They should indeed be identical, because the data file has the same layout (even if timestamps are different) and our default promoted- index block size is the same (64K) so the sample of columns should be identical. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:15 +03:00
Nadav Har'El	1975abbfd6	sstables: disk_read_range Currently, the main sstable data parsing entry point data_consume_rows() takes a contiguous range of bytes to read from disk and parse. This range is supposed to be an entire partition or contiguous group of partitions. and is self contained (can be parsed without extra information about the identity of these partitions). For the promoted index feature (which we will add in a following patch) we will want the range to span only a part of a partition, and will need the caller to provide some information not available to the parser (such as the partition's key). In the future, we will also want to support a vector of byte ranges, instead of just one. So in preparation for this, this patch simply replaces the start/end pair by a new class disk_read_range, which can be easily extended in later patches. No new functionality is introduced in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:02 +03:00
Nadav Har'El	3dd079fb7a	tests: add test for reading parts of a large partition This patch adds a test that takes an sstable with one partition of 13,520 clustering rows (spanning 700 KB in the data file), and attempts to read various slices CQL rows, counting that we got back the expected number of rows. The sstable included here was generated by Cassandra, and includes a promoted index. Promoted index reading is not supported yet (we will add it in the next patch), so for now the code will always read the entire partition from disk; But still the clustering-key filtering is already functional, and will drop some of the rows as requested, so this test will pass. Later, when we add promoted index support, we should check that this test still passes - promoted index will make the reads in this test more efficient (which the test cannot verify), but the important thing to check is that it doesn't break any of these tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:46:59 +03:00
Paweł Dziepak	9e8db53c46	sstables: allow row consumer to stop at any point Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Raphael S. Carvalho	70b793e4d3	tests: add test for statistics rewrite Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-20 17:26:12 -03:00
Raphael S. Carvalho	5aeeb0b3e8	compaction: add support to parallel compaction on the same column family It was noticed that small sstables will accumulate for a column family because scylla was limited to two compaction per shard, and a column family could have at most one compaction running at a given shard. With the number of sstables increasing rapidly, read performance is degraded. At the moment, our compaction manager works by running two compaction task handlers that run in parallel to the rest of the system. Each task handler gets to run when needed, gets a column family from compaction manager queue, runs compaction on it, and goes to sleep again. That's basically its cycle. Compaction manager only allows one instance of a column family to be on its queue, meaning that it's impossible for a column family to be compacted in parallel. One compaction starts after another for a given column family. To solve the problem described, we want to concurrently run compaction jobs of a column family that have different "size tier" (or "weight"). For those unfamiliar, compaction job contains a list of sstables that will be compacted together. The "size tier" of a compaction job is the log of the total size of the input sstables. So a compaction job only gets to run if its "size tier" is not the same of an ongoing compaction. There is no point in compacting concurrently at the same "size tier", because that slows down both compactions. We will no longer queue column families in compaction manager. Instead, we create a new fiber to run compaction on demand. This fiber that runs asynchronously will do the following: 1) Get a compaction job from compaction strategy. 2) Calculate "size tier" of compaction job. 3) Run compaction job if its "size tier" is not the same of an ongoing compaction for the given column family. As before, it may decide to re-compact a column family based on a stat stored in column family object. Ran all compaction-related dtests. Fixes #1216. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d30952ff136192a522bde4351926130addec8852.1462311908.git.raphaelsc@scylladb.com>	2016-05-04 11:46:09 +03:00
Glauber Costa	60ab3b3f50	sstable_tests: make sure the generation of the Summary is sane When we recreate the summary from a missing Summary, we should make sure it is generated sanely, and that it resembles the Summary that would have otherwise been there. In this tests we'll grab one of the Summary tests we've been doing, and just apply them to the non-existent Summary file. We expect the same results on those cases. Plus, a new test is added with some sanity checking. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Raphael Carvalho	370b1336fe	service: fix refresh Vlad and I were working on finding the root of the problems with refresh. We found that refresh was deleting existing sstable files because of a bug in a function that was supposed to return the maximum generation of a column family. The intention of this function is to get generation from last element of column_family::_sstables, which is of type std::map. However, we were incorrectly using std::map::end() to get last element, so garbage was being read instead of maximum generation. If the garbage value is lower than the minimum generation of a column family, then reshuffle_sstables() would set generation of all existing sstables to a lower value. That would confuse our mechanism used to delete sstables because sstables loaded at boot stage were touched. Solution to this problem is about using rbegin() instead of end() to get last element from column_family::_sstables. The other problem is that refresh will only load generations that are larger than or equal to X, so new sstables with lower generation will not be loaded. Solution is about creating a set with generation of live SSTables from all shards, and using this set to determine whether a generation is new or not. The last change was about providing an unused generation to reshuffle procedure by adding one to the maximum generation. That's important to prevent reshuffle from touching an existing SSTable. Tested 'refresh' under the following scenarios: 1) Existing generations: 1, 2, 3, 4. New ones: 5, 6. 2) Existing generations: 3, 4, 5, 6. New ones: 1, 2. 3) Existing generations: 1, 2, 3, 4. New ones: 7, 8. 4) No existing generation. No new generation. 5) No existing generation. New ones: 1, 2. I also had to adapt existing testcase for reshuffle procedure. Fixes #1073. Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com> Message-Id: <1c7b8b7f94163d5cd00d90247598dd7d26442e70.1458694985.git.raphaelsc@scylladb.com>	2016-03-23 10:21:58 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla the only solution is to shutdown the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Raphael S. Carvalho	031bf57c19	sstables: bail out if toc exists for generation used by write_components Currently, if sstable::write_components() is called to write a new sstable using the same generation of a sstable that exists, a temporary TOC will be unconditionally created. Afterwards, the same sstable::write_components() will fail when it reaches sstable::create_data(). The reason is obvious because data component exists for that generation (in this scenario). After that, user will not be able to boot scylla anymore because there is a generation with both a TOC and a temporary TOC. We cannot simply remove a generation with TOC and temporary TOC because user data will be lost (again, in this scenario). After all, the temporary TOC was only created because sstable::write_components() was wrongly called with the generation of a sstable that exists. Solution proposed by this patch is to trigger exception if a TOC file exists for the generation used. Some SSTable unit tests were also changed to guarantee that we don't try to overwrite components of an existing sstable. Refs #1014. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>	2016-03-11 09:22:51 +02:00
Glauber Costa	a339296385	database: turn sstable generation number into an optional This patch makes sure that every time we need to create a new generation number - the very first step in the creation of a new SSTable, the respective CF is already initialized and populated. Failure to do so can lead to data being overwritten. Extensive details about why this is important can be found in Scylla's Github Issue #1014 Nothing should be writing to SSTables before we have the chance to populate the existing SSTables and calculate what should the next generation number be. However, if that happens, we want to protect against it in a way that does not involve overwriting existing tables. This is one of the ways to do it: every column family starts in an unwriteable state, and when it can finally be written to, we mark it as writeable. Note that this cannot be a part of add_column_family. That adds a column family to a db in memory only, and if anybody is about to write to a CF, that was most likely already called. We need to call this explicitly when we are sure we're ready to issue disk operations safely. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Glauber Costa	8e4bf025ae	sstables: wire priority for read path All the SSTable read path can now take an io_priority. The public functions will take a default parameter which is Seastar's default priority. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Glauber Costa	74fbd8fac0	do not call open_file_dma directly We have an API that wraps open_file_dma which we use in some places, but in many other places we call the reactor version directly. This patch changes the latter to match the former. It will have the added benefit of allowing us to make easier changes to these interfaces if needed. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>	2016-01-05 10:37:57 +02:00
Avi Kivity	47499dcf18	data_value: make conversion from bytes explicit Since bytes is a very generic value that is returned from many calls, it is easy to pass it by mistake to a function expecting a data_value, and to get a wrong result. It is impossible for the data_value constructor to know if the argument is a genuine bytes variable, a data_value of another type, but serialized, or some other serialized data type. To prevent misuse, make the data_value(bytes) constructor (and complementary data_value(optional<bytes>) explicit.	2015-11-13 17:12:29 +02:00
Avi Kivity	2c3591cbd9	data_value de-any-fication We use boost::any to convert to and from database values (stored in serlialized form) and native C++ values. boost::any captures information about the data type (how to copy/move/delete etc.) and stores it inside the boost::any instance. We later retrieve the real value using boost::any_cast. However, data_value (which has a boost::any member) already has type information as a data_type instance. By teaching data_type intances about the corresponding native type, we can elimiante the use of boost::any. While boost::any is evil and eliminating it improves efficiency somewhat, the real goal is growing native type support in data_type. We will use that later to store native types in the cache, enabling O(log n) access to collections, O(1) access to tuples, and more efficient large blob support.	2015-10-30 17:38:51 +01:00
Glauber Costa	54aaa58899	sstable_tests: test reshuffle operation Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	a8db2b28c7	sstable tests: test set_generation No code works until it's been tested. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	c5950c7bf7	sstable_test: get rid of frees They exist. They shouldn't. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	f60021f87f	sstable_tests: commonize code to compare two components. The current codes assumes a particular dir/generation pair. We will use it for a more generic case. This code could really use some clean up, by the way. We should do it later. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:06:22 +02:00
Glauber Costa	fcebf6f72d	sstable tests: don't use set_generation method There is no reason aside from testing for a table to just change its generation number. There will be, however, when we support loading new sstables. The method however needs to be completely rewritten, so let's make sure the tests are not using that. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-21 18:02:42 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Glauber Costa	a9ab31dd9c	index_entry: move its fields to private visibility And provide accessors. This will give us the freedom to change their internal storage. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-29 14:05:36 -05:00
Glauber Costa	13d59c9618	index_entry: do away with the disk_string<> fields Now that we are using the NSM, and not the general parser for the index, there is no reason to keep using disk_string<>s in it. Since it is staying in the way of further optimizations, let's get rid of it and use bytes directly. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-29 14:05:36 -05:00
Glauber Costa	93e55969f2	sstables: modify read_indexes so it no longer takes a quantity read_indexes was one of the first functions coded in the sstable read path. At the time, I made the (now so obviously) wrong decision to code it generic enough so that we could specify the number of items to be read, instead of an upper bound in the file. The main reason for that, was that without the Summary, we have no way to know where to stop reading, and the Summary is a relatively new addition to the C* codebase: while I didn't really check when it got in, the code is full of tests for its presence. That turned out to be totally useless: we always read the indexes with the help of the Summary. While the Summary is a relatively new addition to C*, it is present in all version we aim to support. Meaning that reads without the Summary will never happen in our codebase. Even if, in the future, we happen to ditch the Summary file, we are very likely to do so in favor of some other structure that also allows us to manipulate precise borders in the Index. The code as it is, however, would not be too big of a problem if that wasn't causing us performance problems. But it is, and the majority of it is caused by the fact that our underlying read_indexes do not know in advance how many bytes to read, forcing us to do an element-per-element read. It's time for a change. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 16:44:25 +03:00
Glauber Costa	976de6f6f4	sstables: get cf and ks strings for filename We will need them to properly build names in some situations. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:55 -05:00
Glauber Costa	cd8c9ad288	sstables: add ks and cf name to sstable constructor When a schema is available, we use it. However, we have, by now, way too many tests. Some of them use tables for which we don't even know the schema. It would have been a massive amount of work to require a schema for all of them - so I am keeping both constructors around. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:55 -05:00
Raphael S. Carvalho	004af400de	tests: add method in sstables::test to write components Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-06 17:39:05 +03:00
Avi Kivity	c720cddc5c	tests: mv tests/urchin/* -> tests/ Now that seastar is in a separate repository, we can use the tests/ directory.	2015-08-05 14:16:52 +03:00

41 Commits