scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 09:30:45 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	66292c0ef0	sstables: Fix bug in promoted index generation maybe_flush_pi_block, which is called for each cell, assumes that block_first_colname will be empty when the first cell is encountered for each partition. This didn't hold after writing a partition which generated no index entry, because block_first_colname was cleared only when there was any data written into the promoted index. Fix by always clearing the name. The effect was that the promoted index entry for the next partition would be flushed sooner than necessary (still counting since the start of the previous partition) and with offset pointing to the start of the current partition. This will cause a parsing error when such an sstable is read through the promoted index entry, because the offset is assumed to point to a cell, not to the partition start. Fixes #1567 Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `f1c2481040`)	2016-08-11 13:09:05 +03:00
Nadav Har'El	47bf8181af	Avoid some warnings in debug build The sanitizer of the debug build warns when a "bool" variable is read when containing a value not 0 or 1. In particular, if a class has an uninitialized bool field, which class logic allows to only be set later, then "move"ing such an object will read the uninitialized value and produce this warning. This patch fixes four of these warnings seen in sstable_test by initializing some bool fields to false, even though the code doesn't strictly need this initialization. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com> (cherry picked from commit `c2e4f5ba16`)	2016-08-09 16:58:27 +03:00
Nadav Har'El	8d542221eb	Fix failing tests Commit `0d8463aba5` broke some of the tests with an assertion failure about local_is_initialized(). It turns out that there is more than one level of local_is_initialized() we need to check... For some tests, neither locals were initialized, but for others, one was and the other wasn't, and the wrong one was tested. With this patch, all unit tests except "flush_queue_test.cc" pass on my machine. I doubt this test is relevant to the promoted index patches, but I'll continue to investigate it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com> (cherry picked from commit `bce020efbd`)	2016-08-09 16:58:27 +03:00
Nadav Har'El	c0e387e1ac	sstables: promoted index write support This patch adds writing of promoted index to sstables. The promoted index is basically a sample of columns and their positions for large partitions: The promoted index appears in the sstable's index file for partitions which are larger than 64 KB, and divides the partition into 64 KB blocks (as in Cassandra, this interval is configurable through the column_index_size_in_kb config parameter). Beyond modifying the index file, having a promoted index may also modify the data file: Since each of the blocks may be read independently, we need to add in the beginning of each block the list of range tombstones that are still open at that position. See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File Fixes #959 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0d8463aba5`)	2016-08-09 16:58:27 +03:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Duarte Nunes	ad8ff1df7e	sstables: Replace composite class This patch replaces the sstables::composite class with the one in compound_compat.hh. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:11 +02:00
Avi Kivity	8dab93a853	sstables: fix low disk utilization with compression and small chunk lengths As Nadav notes we use the chunk length as the buffer size for the compressed stream too. Fix by using it only for the outer (uncompressed) stream; the inner (compressed) stream uses the sstable buffer size, 128 kiB. Fixes #1402. Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2016-07-07 18:13:30 +01:00
Paweł Dziepak	5bc51821fe	sstables: allow writing unsealed sstables The purpose of this patch is to split the actions of writing sstable and sealing it. As long as the sstable is unsealed it is considered incomplete and is going to be removed on reboot. Such functionality is needed in order to defer visibility of sstables created during streaming until the streaming is complete. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Paweł Dziepak	a7b6c1110f	sstables: do not require seal_sstable() to be run in thread Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Raphael S. Carvalho	cc6c383249	sstables: properly keep track of max local deletion time We weren't updating max local deletion time for cells that contain ttl, or for tombstone cells. If there is a live cell with no ttl, then max local deletion time is supposed to store maximum value, which means that the sstable will not be fully expired later on. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:24 -03:00
Paweł Dziepak	4acf77d755	sstables: drop unused data_stream_at() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:43 +01:00
Paweł Dziepak	2cdf498bbd	sstables: close input stream in sstable::data_read() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	8931b939a1	sstables: use finally() to close input streams Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	b150720361	sstable: enable read ahead Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 13:18:24 +01:00
Paweł Dziepak	55a6911d7a	sstables: close input_stream<> properly If read ahead is going to be enabled it is important to close input_stream<> properly (and wait for completion) before destroying it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	e44e12c74a	sstables: drop no longer needed code Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	18a9ee105f	sstables: add consumer-style sstable writer sstable_writer encapsulates all logic related to writing sstable. Previously introduced component_writer is used to write actual mutations. sstable_writer is intended to be used with consume_flattened_in_thread(). Its purpose is to be used by higher-level consumer that needs to write possibly more than one sstable (sstable compaction is an example of such consumer). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	0e8b8463ba	sstables: introduce consumer-style components writer This patch rewrites do_write_components() so that it can use consume_flattened_in_thread(). All components-writing code is moved to a new consumer: component_writer. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	737eb73499	mutation_reader: make readers return streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Duarte Nunes	70083efee2	sstables: Read and write range tombstone bounds This patch uses the composite_marker to add inclusiveness information to the prefixes of a range tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Pekka Enberg	94c35cc135	sstables/sstables: Add sstable filename to thrown malformed_sstable_exceptions	2016-06-01 17:11:05 +03:00
Raphael S. Carvalho	74c8a87777	sstables: fix statistics rewrite It's not working because it tries to overwrite existing statistics file with exclusive flag. It's fixed by writing new statistics into temporary file and renaming it into place. If Scylla failed in middle of rewrite, a temporary file is left over. So boot code was adjusted to delete a temporary file created by this rewrite procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-20 17:24:15 -03:00
Raphael S. Carvalho	ee0f66eef6	db: fix migration of sstables with level greater than 0 Refresh will rewrite statistics of any migrated sstable with level > 0. However, this operation is currently not working because O_EXCL flag is used, meaning that create will fail. It turns out that we don't actually need to change on-disk level of a sstable by overwriting statistics file. We can only set in-memory level of a sstable to 0. If Scylla reboots before all migrated sstables are compacted, leveled strategy is smart enough to detect sstables that overlap, and set their in-memory level to 0. Fixes #1124. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-17 11:08:08 -03:00
Avi Kivity	ee7225a9cb	sstables: silence atomic deletion cancellation logs during sstable deletion Those logs are expected during shutdown.	2016-05-07 20:37:49 +03:00
Avi Kivity	43221fc7e2	sstables: make delete_atomically() throw a distinct exception when cancelled Throwing a runtime_error makes it impossible to catch the cancellation exception, so replace it with a distinct exception class.	2016-05-07 20:37:46 +03:00
Calle Wilund	49d3d79dfe	sstables: Fix compilation error on boost 1.55 Message-Id: <1461067254-526-2-git-send-email-calle@scylladb.com>	2016-04-25 12:54:44 +03:00
Pekka Enberg	3f2286d02e	Merge "Delete compacted sstables atomically" from Avi "If we compact sstables A, B into a new sstable C we must either delete both A and B, or none of them. This is because a tombstone in B may delete data in A, and during compaction, both the tombstone and the data are removed. If only B is deleted, then the data gets resurrected. Non-atomic deletion occurs because the filesystem does not support atomic deletion of multiple files; but the window for that is small and is not addressed in this patchset. Another case is when A is shared across multiple shards (as is the case when changing shard count, or migrating from existing Cassandra sstables). This case is covered by this patchset. Fixes #1181."	2016-04-14 22:04:15 +03:00
Avi Kivity	3798d04ae8	sstables: convert sstable::mark_for_deletion() to atomic deletion infrastructure All deletions must go through the same data structure, or some atomic deletions will never be satisfied.	2016-04-14 17:14:26 +03:00
Avi Kivity	2ba584db8d	sstables: add delete_atomically(), for atomically deleting multiple sstables When we compact a set of sstables, we have to remove the set atomically, otherwise we can resurrect data if the following happens: insert data to sstable A; insert tombstone to sstable B; compact A+B -> C (removing both data and tombstone); delete B only; read data from A. Since an sstable may be shared by multiple shards, and each shard performs compaction at a different time, we need to defer deletion of an sstable set until all shards agree that the set can be deleted. An additional atomicity issue exists because POSIX does not provide a way to atomically delete multiple files. This issue is not addressed by this patch.	2016-04-14 17:14:26 +03:00
Pekka Enberg	60352f810a	Merge "Fixes for the reading of missing Summary" from Glauber "This patchset contains some fixes spotted during post-merged review by {Nad,}av{,i}. I don't consider any of them a must for backport to 1.0, but since we haven't yet even backported the main series, might as well backport everything. It also includes some unit tests to make sure that they will be kept working in the future."	2016-04-13 11:32:05 +03:00
Raphael S. Carvalho	15246f31f7	sstables: fix incorrect sstable size when compression is enabled Size of uncompressed sstable was being unconditionally used to determine when to stop writing a table. When compression is enabled, compressed size should be used instead. Problem affected Scylla when compression and leveled strategy were used. Fixes #1177. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d9bf26def41fb33ca297f4127ce042b7f67adf96.1460484529.git.raphaelsc@scylladb.com>	2016-04-13 09:01:01 +03:00
Glauber Costa	114ba5e3a8	be robust against broken summary files Now that we can boot without a Summary file, we can just as easily boot with a broken one. Suggested by Nadav, and it is actually very easy to do, so do it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	72dc45999d	review fixes for generate_summary Spotted by Avi post-merge 1) Need to close the file 2) Should be using the parameter pc instead of the default_class Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	f78f43850d	clear components if reading toc fails This shouldn't be a problem in practice, because if read_toc() fails, the users will just tend to discard the sstable object altogether, and not insist on using it. However, if somebody does try to keep using it, a subsequent read_toc() could theoretically have some components filled up leading the new reader to believe the toc was populated successfully. It is easier to just clear the _components set and never worry about it, than trying to reason about whether or not that could happen. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	8a50b027aa	summary: generate one if it is not present There are cases in which a Summary file will not be present, and imported SSTables will have just the Index and Data files. In earlier versions of Cassandra, a Summary didn't exist, so one may not be generated when migrating. In Issue #1170, we can see an example of tables generated by CQLSSTableWriter, and they lack a Summary. Cassandra is robust against this and can cope perfectly with the Summary not existing. I will argue that we should do the same. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Glauber Costa	4de26fdec8	sstables: allow read_toc to be called more than once We do that by bailing immediately if we detect that the components map is already populated. This allows us to call read_toc() earlier if we need to - for instance, to inquire about the existence of the Summary - without the need to re-read the components again later. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Glauber Costa	736e21222e	sstables: avoid passing schema unnecessarily for prepare_summary we can just pass the min interval as a parameter and avoid having the schema do yet another hop. For sealing the summary, it is completely unused and we can do away with it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Glauber Costa	0de3a32147	index reader: make index_consumer a template parameter This is done so we can use other consumers. An example of that, is regeneration of the Summary from an existing Index. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Glauber Costa	8453ff7788	make get_sstable_key_range an instance method Because just creating an SSTable object does not generate any I/O, get_sstable_key_range should be an instance method. The main advantage of doing that is that we won't have to read the summary twice. The way we're doing it currently, if it happens to be a shard-relevant table we'll call load() - which reads the summary again. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Glauber Costa	6ae601a025	do not re-read the summary There are times in which we read the Summary file twice. That actually happens every time during normal boot (it doesn't during refresh). First during get_sstable_key_range and then again during load(). Every summary will have at least one entry, so we can easily test for whether or not this is properly initialized. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Raphael S. Carvalho	e15ce5eb4d	api: Add support to get column family compression ratio After this change, user can query compression ratio on a per column family basis with 'nodetool cfstats'. look at 'nodetool cfstats' output: ./bin/nodetool cfstats ks.test5 Keyspace: ks Read Count: 0 Read Latency: NaN ms. Write Count: 0 Write Latency: NaN ms. Pending Flushes: 0 Table: test5 SSTable count: 1 Space used (live): 4774 Space used (total): 4774 Space used by snapshots (total): 0 Off heap memory used (total): 131384 SSTable Compression Ratio: 0.833333 ... Fixes #636. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <a1bee5a23fe63787df3e387a88f2d216ba4a4134.1459802771.git.raphaelsc@scylladb.com>	2016-04-05 12:46:40 +03:00
Gleb Natapov	70575699e4	commitlog, sstables: enlarge XFS extent allocation for large files With big rows I see contention in XFS allocations which cause reactor thread to sleep. Commitlog is a main offender, so enlarge extent to commitlog segment size for big files (commitlog and sstable Data files). Message-Id: <20160404110952.GP20957@scylladb.com>	2016-04-04 14:15:00 +03:00
Raphael Carvalho	d515a7fd85	sstables: fix deletion of sstable with temporary TOC After `4e52b41a4`, remove_by_toc_name() became aware of temporary TOC files, however, it doesn't consider that some components may be missing if temporary TOC is present. When creating a new sstable, the first thing we do is to write all components into temporary TOC, so content of a temporary TOC isn't reliable until it is renamed. Solution is about implementing the following flow (described by Avi): "Flow should be: - remove all components in parallel - forgive ENOENT, since the component may not have been written; otherwise deletion error should be raised - fsync the directory - delete the temporary TOC " This problem can be reproduced by running compaction without disk space, so compaction would fail and leave a partial sstable that would be marked for deletion. Afterwards, remove_by_toc_name() would try to delete a component that doesn't exist because it looked at the content of temporary TOC. Fixes #1095. Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com> Message-Id: <0cfcaacb43cc5bad3a8a7ea6c1fa6f325c5de97d.1459194263.git.raphaelsc@scylladb.com>	2016-03-29 10:38:01 +03:00
Calle Wilund	4e52b41a46	sstables: Add delete func to rename TOC ensuring table is marked dead Note: "normal" remove_by_toc_name must now be prepared for and check if the TOC of the sstable is already moved to temp file when we get to the juicy delete parts. Message-Id: <1458575440-505-1-git-send-email-calle@scylladb.com>	2016-03-24 12:01:53 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Vlad Zolotarov	ce47fcb1ba	sstables: properly account removal requests The same shard may create an sstables::sstable object for the same SStable that doesn't belong to it more than once and mark it for deletion (e.g. in a 'nodetool refresh' flow). In that case the destructor of sstables::sstable accounted the deletion requests from the same shard more than once since it was a simple counter incremented each time there was a deletion request while it should account request from the same shard as a single request. This is because the removal logic waited for all shards to agree on a removal of a specific SStable by comparing the counter mentioned above to the total number of shards and once they were equal the SStable files were actually removed. This patch fixes this by replacing the counter by an std::unordered_set<unsigned> that will store the ids of the shards requesting the deletion of the sstable object and will compare the size() of this set to smp::count in order to decide whether to actually delete the corresponding SStable files. Fixes #1004 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1457886812-32345-1-git-send-email-vladz@cloudius-systems.com>	2016-03-14 11:45:08 +02:00
Raphael S. Carvalho	1ff7d32272	sstables: make write_simple() safer by using exclusive flag We should guarantee that write_simple() will not try to overwrite an existing file. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <194bd055f1f2dc1bb9766a67225ec38c88e7b005.1457818073.git.raphaelsc@scylladb.com>	2016-03-14 11:45:00 +02:00
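The promoted index bug fixed in 66292c0ef0 can be sketched with a small, hypothetical builder. This is not Scylla's maybe_flush_pi_block(): promoted_index_builder, pi_block and their members are invented for illustration, and the range tombstones that c0e387e1ac writes at block starts are ignored. It assumes a block is flushed once it spans at least block_size bytes and that a partition which never fills one block gets no promoted index. The line that matters is the unconditional reset of the block's first name in end_partition(); without it, a small partition leaves the name set, so the next partition's first block keeps counting from the previous partition's start.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One promoted-index block (hypothetical layout): first and last column name
// covered, offset from the partition start, and width in bytes.
struct pi_block {
    std::string first_name;
    std::string last_name;
    uint64_t offset;
    uint64_t width;
};

// Hypothetical builder that cuts a partition's cells into blocks of at least
// block_size bytes (the role of column_index_size_in_kb).
class promoted_index_builder {
public:
    explicit promoted_index_builder(uint64_t block_size) : _block_size(block_size) {}

    void start_partition(uint64_t partition_start) {
        _partition_start = partition_start;
        _pos = partition_start;
    }

    // Called for every cell; flushes the current block once it is big enough.
    void add_cell(const std::string& name, uint64_t size) {
        if (!_first_name) {
            // First cell of a new block. This is only correct if the previous
            // partition left _first_name empty.
            _first_name = name;
            _block_start = _pos;
        }
        _last_name = name;
        _pos += size;
        if (_pos - _block_start >= _block_size) {
            flush_block();
        }
    }

    // Returns the partition's blocks; a partition that never filled a block
    // gets an empty result (no promoted index).
    std::vector<pi_block> end_partition() {
        std::vector<pi_block> result;
        if (!_blocks.empty()) {
            if (_first_name) {
                flush_block();  // trailing partial block
            }
            result = std::move(_blocks);
            _blocks.clear();
        }
        // Clear the block state unconditionally, even if this partition
        // produced no index entry (the essence of the 66292c0ef0 fix).
        _first_name.reset();
        return result;
    }

private:
    void flush_block() {
        _blocks.push_back({*_first_name, _last_name,
                           _block_start - _partition_start, _pos - _block_start});
        _first_name.reset();
    }

    uint64_t _block_size;
    uint64_t _partition_start = 0;
    uint64_t _pos = 0;
    uint64_t _block_start = 0;
    std::optional<std::string> _first_name;
    std::string _last_name;
    std::vector<pi_block> _blocks;
};
```

With block_size 100, a 10-byte partition yields no blocks, and the following partition's first block correctly starts at offset 0 of that partition.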
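The max local deletion time rule described in cc6c383249 is small enough to sketch directly. This is an illustration under assumptions, not Scylla's metadata collection code: the tracker class is hypothetical, a cell is reduced to an optional local deletion time (present for TTL'd cells and tombstones, absent for a live cell without a TTL), and the initial value of INT32_MIN is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical tracker for an sstable's max local deletion time.
class max_local_deletion_time_tracker {
public:
    // local_deletion_time: expiry for TTL'd cells, deletion time for
    // tombstones, std::nullopt for a live cell without a TTL.
    void update(std::optional<int32_t> local_deletion_time) {
        if (!local_deletion_time) {
            // A live, non-expiring cell: the sstable can never become fully
            // expired, so the statistic saturates at the maximum value.
            _max = std::numeric_limits<int32_t>::max();
        } else {
            _max = std::max(_max, *local_deletion_time);
        }
    }

    int32_t get() const { return _max; }

private:
    int32_t _max = std::numeric_limits<int32_t>::min();
};
```

Once a non-expiring live cell has been seen, later TTL'd cells or tombstones cannot lower the value again.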
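Commits 1ff7d32272 and 74c8a87777 together describe a common POSIX pattern: create files with O_EXCL so nothing is silently overwritten, and replace an existing file by writing a temporary file and renaming it into place. The sketch below assumes plain blocking POSIX calls rather than Seastar's asynchronous file API; write_exclusive(), rewrite_via_temp() and the ".tmp" suffix are hypothetical. It does not fsync the directory after the rename, which a durable implementation would also need.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <system_error>
#include <unistd.h>

// Creates `path` with O_EXCL and writes `contents`; throws (EEXIST) if the
// file already exists instead of overwriting it.
void write_exclusive(const std::string& path, const std::string& contents) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        int err = errno;
        throw std::system_error(err, std::system_category(), "open " + path);
    }
    std::size_t done = 0;
    while (done < contents.size()) {
        ssize_t n = ::write(fd, contents.data() + done, contents.size() - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            int err = errno;
            ::close(fd);
            throw std::system_error(err, std::system_category(), "write " + path);
        }
        done += static_cast<std::size_t>(n);
    }
    if (::fsync(fd) != 0) {
        int err = errno;
        ::close(fd);
        throw std::system_error(err, std::system_category(), "fsync " + path);
    }
    ::close(fd);
}

// Rewrites an existing file: write a temporary file exclusively, then rename
// it over the original, so readers see either the old or the new contents.
// A crash can leave `path + ".tmp"` behind; it must be removed at boot,
// otherwise the next exclusive create fails with EEXIST.
void rewrite_via_temp(const std::string& path, const std::string& contents) {
    const std::string tmp = path + ".tmp";
    write_exclusive(tmp, contents);
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        int err = errno;
        throw std::system_error(err, std::system_category(), "rename " + tmp);
    }
}
```

Calling write_exclusive() directly on an existing statistics file reproduces the failure described in 74c8a87777; going through rewrite_via_temp() avoids it.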
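Regenerating a missing Summary from the Index (8a50b027aa, with the index consumer made pluggable in 0de3a32147) essentially samples index entries. A minimal sketch, assuming an already-parsed index held in memory and sampling every min_interval-th entry starting from the first; the min interval parameter is mentioned in 736e21222e, but this exact sampling rule is an assumption. index_entry, summary_entry and sample_index_into_summary() are hypothetical names, not Scylla's generate_summary(). Because the first entry is always sampled, a non-empty index always yields at least one summary entry, the property 6ae601a025 relies on to tell whether a summary is initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical parsed Index entry: partition key plus the byte offset at
// which this entry starts inside the Index file.
struct index_entry {
    std::string key;
    uint64_t index_offset;
};

// A Summary entry points back into the Index file.
struct summary_entry {
    std::string key;
    uint64_t index_offset;
};

// Samples every `min_interval`-th index entry, starting with the first.
std::vector<summary_entry> sample_index_into_summary(const std::vector<index_entry>& entries,
                                                     std::size_t min_interval) {
    assert(min_interval > 0);
    std::vector<summary_entry> summary;
    for (std::size_t i = 0; i < entries.size(); i += min_interval) {
        summary.push_back({entries[i].key, entries[i].index_offset});
    }
    return summary;
}
```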
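The deletion flow quoted in d515a7fd85 can be sketched with plain POSIX calls. Assumptions: components are removed one after another instead of in parallel, remove_file(), fsync_directory() and remove_partial_sstable() are hypothetical helpers, and the component file names in the usage test are made up rather than Scylla's real naming scheme. Deleting the temporary TOC last means that, until every component is gone, the directory still shows the sstable as incomplete.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <system_error>
#include <unistd.h>
#include <vector>

// Unlinks `path`. With forgive_enoent, a missing file counts as success,
// because a partially written sstable may lack some components.
void remove_file(const std::string& path, bool forgive_enoent) {
    if (::unlink(path.c_str()) != 0) {
        int err = errno;
        if (forgive_enoent && err == ENOENT) {
            return;
        }
        throw std::system_error(err, std::system_category(), "unlink " + path);
    }
}

// fsyncs a directory so that the preceding unlinks are durable.
void fsync_directory(const std::string& dir) {
    int fd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        int err = errno;
        throw std::system_error(err, std::system_category(), "open " + dir);
    }
    int rc = ::fsync(fd);
    int err = errno;
    ::close(fd);
    if (rc != 0) {
        throw std::system_error(err, std::system_category(), "fsync " + dir);
    }
}

// 1. remove all components, forgiving ENOENT
// 2. fsync the directory
// 3. delete the temporary TOC (any error, including ENOENT, is raised)
void remove_partial_sstable(const std::string& dir,
                            const std::vector<std::string>& components,
                            const std::string& temp_toc) {
    for (const auto& c : components) {
        remove_file(dir + "/" + c, /*forgive_enoent=*/true);
    }
    fsync_directory(dir);
    remove_file(dir + "/" + temp_toc, /*forgive_enoent=*/false);
}
```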
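The accounting fix in ce47fcb1ba replaces a counter with a set of shard ids. A minimal sketch, in which deletion_tracker is a hypothetical class and shard_count stands in for Seastar's smp::count: with two shards, the same shard asking twice leaves the set at size 1, while the old counter would already have reached 2 and triggered deletion.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical per-SStable record of which shards requested deletion.
class deletion_tracker {
public:
    explicit deletion_tracker(unsigned shard_count) : _shard_count(shard_count) {}

    // Records a request from `shard`; repeated requests from the same shard
    // count once. Returns true once every shard has asked for deletion.
    bool request_deletion(unsigned shard) {
        _requesting_shards.insert(shard);
        return _requesting_shards.size() == _shard_count;
    }

private:
    unsigned _shard_count;
    std::unordered_set<unsigned> _requesting_shards;
};
```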


304 Commits