checksum_combine() is much slower than re-feeding the buffer to
checksum() for the zlib CRC32 checksummer.
Introduce Checksum::prefer_combine() to determine this and select
the more optimal behavior for a given checksummer.
Improves memtable flush performance with compression enabled by 30%.
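A sketch of how such a hook can drive the selection (the interface below is illustrative, not Scylla's actual Checksum class; only the prefer_combine() name comes from the commit, and the toy sum-of-bytes checksummer exists just to make the dispatch runnable):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksummer interface; prefer_combine() lets callers pick
// between combining two partial checksums and re-feeding the raw bytes.
struct checksummer {
    virtual uint32_t checksum(uint32_t init, const char* buf, size_t len) const = 0;
    virtual uint32_t combine(uint32_t a, uint32_t b, size_t b_len) const = 0;
    // True when combine() is cheaper than re-running checksum() over the buffer.
    virtual bool prefer_combine() const = 0;
    virtual ~checksummer() = default;
};

// Toy sum-of-bytes checksum, for which combining is trivially cheap.
struct sum_checksummer : checksummer {
    uint32_t checksum(uint32_t init, const char* buf, size_t len) const override {
        for (size_t i = 0; i < len; i++) init += (unsigned char)buf[i];
        return init;
    }
    uint32_t combine(uint32_t a, uint32_t b, size_t) const override { return a + b; }
    bool prefer_combine() const override { return true; }
};

// Caller-side dispatch: combine partial checksums only when the
// checksummer says that is faster; otherwise re-feed the buffer.
uint32_t extend(const checksummer& c, uint32_t acc,
                uint32_t part, const char* buf, size_t len) {
    return c.prefer_combine() ? c.combine(acc, part, len)
                              : c.checksum(acc, buf, len);
}
```

For zlib CRC32, the same dispatch would return false from prefer_combine() and take the re-feed branch.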
do_process_buffer had two unreachable default cases and a long
if-else-if chain.
This converts the if-else-if chain to a switch and a helper
function.
This moves the error checking from run time to compile time. If we
were to add a 128-bit integer, for example, gcc would complain about it
missing from the switch.
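A generic illustration of the technique (not the actual do_process_buffer code): with `-Wswitch -Werror`, a `default`-free switch over an enum fails to compile as soon as an enumerator is missing a case:

```cpp
#include <cstddef>

enum class int_kind { i8, i16, i32, i64 };  // adding i128 here...

// ...makes gcc -Wswitch -Werror reject this function until a case is
// added, because no default: swallows unknown enumerators.
constexpr size_t width(int_kind k) {
    switch (k) {
    case int_kind::i8:  return 1;
    case int_kind::i16: return 2;
    case int_kind::i32: return 4;
    case int_kind::i64: return 8;
    }
    __builtin_unreachable();  // quiets "control reaches end" after full coverage
}
```

This is exactly why the unreachable default cases had to go: a default arm would silence the warning and push the error back to run time.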
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181125221451.106067-1-espindola@scylladb.com>
This is needed for parallel compaction to work with the sstable-run-based
approach. That's because regular compaction clones a set containing all
sstables of its column family, so compaction A can potentially hold a
reference to a compacting sstable of compaction B, preventing compaction B
from releasing its exhausted sstables.
So all replacements are propagated to all compactions of a given column
family, and the compactions in turn, including the one which initiated the
propagation, will perform the replacement.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Motivation is that we need a more efficient way to find the compactions
that belong to a given column family in the compaction list.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Motivation is that it will be useful for catching regressions in compaction
when releasing exhausted sstables early. That's because an sstable's space
is only released once it's closed. So this will allow us to write a test
case and possibly use it for entities holding an exhausted sstable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Filter out sstables belonging to a partial run being generated by an ongoing
compaction. Otherwise, that could lead to wrong decisions by the compaction
strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
SSTables composing the same run will share the same run identifier.
Therefore, a new compaction strategy will be able to get all sstables
belonging to the same run from the sstable_set, which now keeps track of
existing runs.
The same UUID is passed to all writers of a given compaction; otherwise,
a new UUID would be picked for every sstable created by the compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
An sstable run is a structure that will hold all sstables that have the
same run identifier. Sstables belonging to the same run do not overlap
with one another.
It can be used by a compaction strategy to work on runs instead of
individual sstables.
The sstable_set structure, which holds all sstables for a given column
family, will be responsible for providing its users an interface to work
with runs instead of individual sstables.
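A rough sketch of the shape these structures could take (all types and names below are illustrative stand-ins, not the actual Scylla API; the run identifier is reduced to an int in place of the UUID):

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

using run_id = int;               // stands in for the UUID run identifier
struct sstable { run_id run; };   // real sstables carry much more state

// Holds the fragments (sstables) of one run; by construction the
// fragments of a run do not overlap in key range.
class sstable_run {
    std::vector<std::shared_ptr<sstable>> _fragments;
public:
    void insert(std::shared_ptr<sstable> sst) {
        _fragments.push_back(std::move(sst));
    }
    const std::vector<std::shared_ptr<sstable>>& fragments() const {
        return _fragments;
    }
};

// The set groups sstables by run identifier, so a compaction strategy
// can iterate over whole runs rather than individual sstables.
class sstable_set {
    std::unordered_map<run_id, sstable_run> _runs;
public:
    void insert(std::shared_ptr<sstable> sst) {
        _runs[sst->run].insert(std::move(sst));
    }
    const std::unordered_map<run_id, sstable_run>& runs() const {
        return _runs;
    }
};
```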
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's important so that a reference to an sstable is not kept throughout
the compaction procedure, which would defeat the goal of releasing
space during compaction.
The manager passes a callback to compaction, and compaction calls it
whenever there's an sstable replacement.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Motivation is that we want to release the space of exhausted sstables, and
that will only happen when all references to an sstable are gone *and* the
backlog tracker takes the early replacement into account.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
By doing that, we'll be able to release an exhausted sstable from both
simultaneously.
That's achieved by sharing the set containing input sstables with the
incremental reader selector and removing exhausted sstables from the
shared set when the time has come.
This is a step towards reducing the disk requirement for compaction by
making it delete an sstable once all of its data is in a sealed new
sstable. For that to happen, all references must be gone.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, compaction only replaces input sstables at the end of
compaction, meaning compaction must be finished for all the space of
those sstables to be released.
What we can do instead is delete some input sstables earlier, under
these conditions:
1) The sstable's data must be committed to a new, sealed output sstable,
meaning it's exhausted.
2) An exhausted sstable mustn't overlap with a non-exhausted one,
because a tombstone in the exhausted sstable could have been purged,
and the shadowed data in the non-exhausted one could be resurrected
if the system crashes.
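Condition 2 boils down to an interval-overlap test over the sstables' key ranges; schematically (keys reduced to integers, all names hypothetical):

```cpp
#include <vector>

struct key_range { int first, last; };  // stand-in for token/key bounds

// Two sstables overlap iff their inclusive [first, last] ranges intersect.
bool ranges_overlap(key_range a, key_range b) {
    return a.first <= b.last && b.first <= a.last;
}

// An exhausted sstable may be deleted early only if its key range does
// not intersect the range of any non-exhausted input sstable.
bool can_delete_early(key_range exhausted,
                      const std::vector<key_range>& non_exhausted) {
    for (const auto& r : non_exhausted) {
        if (ranges_overlap(exhausted, r)) {
            return false;
        }
    }
    return true;
}
```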
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The motivation is that compaction may remove an sstable from the set while
the incremental selector is alive, and for that to work, we need to
invalidate the iterators stored by the selector. We could have added a
method to notify it, but there will be cases where the one keeping the set
cannot forward the notification to the selector, so it's better for the
selector to take care of itself. A change-counter approach is used, which
allows the selector to know when to invalidate its iterators.
After invalidation, the selector will move the iterator back into its
right place by looking for the lower bound of the current position.
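The change-counter idea in a generic sketch (illustrative classes, not the actual incremental selector): the selector remembers the counter value at capture time and re-seeks with lower_bound whenever the set has changed underneath it:

```cpp
#include <map>
#include <string>

// A set that bumps a generation counter on every mutation.
class tracked_set {
    std::map<int, std::string> _m;
    unsigned _generation = 0;
public:
    void insert(int k, std::string v) { _m.emplace(k, std::move(v)); ++_generation; }
    void erase(int k) { _m.erase(k); ++_generation; }
    unsigned generation() const { return _generation; }
    const std::map<int, std::string>& contents() const { return _m; }
};

// Caches an iterator; if the set's generation changed, it re-seeks to
// the lower bound of the requested position instead of trusting a
// possibly dangling iterator.
class selector {
    const tracked_set& _set;
    std::map<int, std::string>::const_iterator _it;
    unsigned _seen;
public:
    explicit selector(const tracked_set& s)
        : _set(s), _it(s.contents().begin()), _seen(s.generation()) {}

    const std::string* select(int pos) {
        if (_seen != _set.generation()) {  // set mutated: iterator invalid
            _it = _set.contents().lower_bound(pos);
            _seen = _set.generation();
        } else {
            while (_it != _set.contents().end() && _it->first < pos) ++_it;
        }
        return (_it != _set.contents().end() && _it->first == pos)
            ? &_it->second : nullptr;
    }
};
```

Note the stale iterator is never dereferenced: the generation check runs first, which is what makes the "selector takes care of itself" scheme safe.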
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Older sstables must have an identifier for them to be associated
with their own run.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It identifies the run which a particular sstable belongs to.
Existing sstables will have a random UUID associated with them
in memory.
UUID is the correct choice because it allows sstables to be
exported without conflicts between identifiers generated by
different nodes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These switches are fully covered. We can be sure they will stay this
way because of -Werror and gcc's -Wswitch warning.
We can also be sure that we never have an invalid enum value since the
state machine values are not read from disk.
The patch also removes a superfluous ';'.
Message-Id: <20181124020128.111083-1-espindola@scylladb.com>
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
The format message was using the new style formatting markers ("{}")
which are understood by format() but not by sprint() (the latter is
basically deprecated).
The old code was serializing the row twice. Once to get the size of
its block on disk, which is needed to write the block length, and then
to actually write the block.
This patch avoids this by serializing once into a temporary buffer and
then appending that buffer to the data file writer.
I measured about 10% improvement in memtable flush throughput with
this for the small-part dataset in perf_fast_forward.
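In outline, the change amounts to something like this (buffer and writer types are simplified stand-ins for the real sstable data-file writer):

```cpp
#include <cstdint>
#include <string>
#include <vector>

using buffer = std::vector<uint8_t>;

// Toy row serializer standing in for the real row-writing code.
void serialize_row(buffer& out, const std::string& row) {
    out.insert(out.end(), row.begin(), row.end());
}

// Simplified varint length encoding (LEB128-style).
void write_varint(buffer& out, size_t n) {
    do {
        out.push_back((n & 0x7f) | (n > 0x7f ? 0x80 : 0));
        n >>= 7;
    } while (n);
}

// Old approach: serialize once into a throwaway sink just to learn the
// block size, then serialize again for real. New approach below:
// serialize once into a temporary buffer, write its length, then append
// the buffer to the output.
void write_block(buffer& file, const std::string& row) {
    buffer tmp;
    serialize_row(tmp, row);        // single serialization pass
    write_varint(file, tmp.size()); // block length prefix
    file.insert(file.end(), tmp.begin(), tmp.end());
}
```

The cost of one temporary buffer replaces a full second serialization pass per row, which is where the flush-throughput win comes from.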
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
In commit a33f0d6, we changed the way we handle arrays during the write
and parse code to avoid reactor stalls. Some potentially big loops were
transformed into futurized loops, and also some calls to vector resizes
were replaced by a reserve + push_back idiom.
The latter broke parsing of the estimated histogram. The reason being
that the vectors that are used here are already initialized internally
by the estimated_histogram object. Therefore, when we push_back, we
don't fill the array all the way from index 0, but end up with a zeroed
beginning and only push back some of the elements we need.
We could revert this array to a resize() call. After all, the reason we
are using reserve + push_back is to avoid calling the constructor for
each element, but we don't really expect the integer specialization
to do any of that.
However, to avoid confusion with future developers who may feel tempted
to convert this as well for the sake of consistency, it is safer to
just make sure these arrays are zeroed.
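The failure mode is easy to reproduce in isolation: for a vector the object has already sized, reserve + push_back appends after the pre-existing zeros instead of overwriting them (a sketch, not the actual estimated_histogram code):

```cpp
#include <cstdint>
#include <vector>

struct histogram {
    // Pre-sized by the object itself, as estimated_histogram does.
    std::vector<int64_t> buckets = std::vector<int64_t>(4);
};

// Buggy parse: reserve + push_back grows the already-sized vector,
// keeping the zeroed prefix and doubling the length.
void parse_buggy(histogram& h, const std::vector<int64_t>& disk) {
    h.buckets.reserve(disk.size());
    for (auto v : disk) h.buckets.push_back(v);
}

// The reserve + push_back idiom is only valid when the vector starts
// out empty, so clear the pre-initialized contents first.
void parse_fixed(histogram& h, const std::vector<int64_t>& disk) {
    h.buckets.clear();
    h.buckets.reserve(disk.size());
    for (auto v : disk) h.buckets.push_back(v);
}
```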
Fixes #3918
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181116130853.10473-1-glauber@scylladb.com>
When moving sstables between directories, this helper function
will create links and update generation and dir accordingly.
It's expected to be called in thread context.
"
It appears that in the case when there are any static columns in the
serialization header, Cassandra writes a (possibly empty) static row to
every partition in the SSTable file.
This patchset aligns Scylla's logic with that of Cassandra.
Note that Cassandra optimizes the case when no partition contains a static
row by keeping track of updated columns, which Scylla currently does
not do - see #3901 for details.
Fixes #3900.
"
* 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla:
tests: Test writing empty static rows for partitions in tables with static columns.
sstables: Ignore empty static rows on reading.
sstables: Write empty static rows when there are static columns in the table.
The MC format lacks ancestors metadata, so we need to work around it by
using the ancestors in the metadata collector, which is only available for
an sstable written during this instance. It works fine here because we
only want to know if a recently compacted sstable has an ancestor which
wasn't yet deleted.
Fixes #3852.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
It is possible to have collections in a static row so we need to check
for collection-wide tombstones like with clustering rows.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Collections are permitted in static rows, so the same partitioning as for
regular columns is required.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
- all atomic columns go first, then all multi-cell ones
- within each group (atomic and multi-cell), columns are
lexicographically ordered by name
Since the schema already has all columns lexicographically sorted by name,
we only need to stably partition them by atomicity.
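With std::stable_partition this is a one-liner over the name-sorted column list (illustrative types; the atomic flag stands in for the real multi-cell check):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct column {
    std::string name;
    bool atomic;  // false for multi-cell (collection) columns
};

// Reorder name-sorted columns to Cassandra's BTree order: atomic
// columns first, then multi-cell ones, preserving the lexicographic
// name order within each group (stable_partition keeps relative order).
void to_cassandra_order(std::vector<column>& cols) {
    std::stable_partition(cols.begin(), cols.end(),
                          [](const column& c) { return c.atomic; });
}
```

The stability guarantee is the whole point here: a plain std::partition would satisfy the grouping but could scramble the name order within each group.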
Fixes #3853
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This representation makes it easier to operate with compound structures
instead of separate values that were stored in multiple containers.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Previously, we've been writing the wrong missing-columns indices for
static rows because write_missing_columns() explicitly used regular
columns internally.
Now it takes the proper column kind into account.
Fixes #3892
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
This patchset addresses two problems with shadowable deletion handling
in SSTables 3.x ('mc' format).
Firstly, we previously did not set the flag indicating the presence of an
extended flags byte with the HAS_SHADOWABLE_DELETION bitmask on writing.
This would break subsequent reading and cause all kinds of failures, up
to crashes.
Secondly, when reading rows with this extended flag set, we need to
preserve that information and create a shadowable_tombstone for the row.
Tests: unit {release}
+
Verified manually with 'hexdump' and a modified 'sstabledump' that the
second (shadowable) tombstone is written for MV tables by Scylla.
+
DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test)
that originally failed due to this issue has successfully passed locally.
"
* 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla:
tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
sstables: Support Scylla-specific extension for writing shadowable tombstones.
sstables: Introduce a feature for shadowable tombstones in Scylla.db.
memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
sstables: Support checking row extension flags for Cassandra shadowable deletion.
Even when we're using a full clustering range, need_skip() will return
true when we start a new partition and advance_context() will be
called with position_in_partition::before_all_clustered_rows(). We
should detect that there is no need to skip to that position before
the call to advance_to(*_current_partition_key), which will read the
index page.
Fixes #3868.
Message-Id: <1539881775-8578-1-git-send-email-tgrabiec@scylladb.com>
After the new in-memory representation of cells was introduced, there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more inconvenient
and time-consuming. This patch fixes the problem: the schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes #3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
The original SSTables 'mc' format, as defined in Cassandra, does not provide
a way to store shadowable deletion in addition to regular row deletion
for materialized views.
It is essential to store it because of known corner-case issues that
otherwise appear.
For this to work, we introduce a Scylla-specific extended flag to be set
in SSTables in 'mc' format that indicates a shadowable tombstone is
written after the regular row tombstone.
This is deemed to be safe because shadowable tombstones are specific to
materialized views and MV tables are not supposed to be imported or
exported.
Note that a shadowable tombstone can be written without a regular
tombstone as well as along with it.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This is used to indicate that the SSTables being read may contain the
Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag.
If the feature is not enabled, we should not honour this flag.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This flag can only be used in MV tables, which are not supposed to be
imported to Scylla.
Since Scylla representation of shadowable tombstones differs from that
of Cassandra, such SSTables are rejected on read and Scylla never sets
this flag on writing.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Previously we were making assumptions about missing columns
(the size of their values, whether they're collections or counters),
but those assumptions weren't always true. Now we use the column type
from the serialization header to get the right values.
Fixes #3859
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Without that, we don't know where to look for the problems.
Before:
compaction failed: sstables::malformed_sstable_exception (Too big ttl: 3163676957)
After:
compaction_manager - compaction failed: sstables::malformed_sstable_exception (Too big ttl: 4294967295 in sstable /var/lib/scylla/data/system_traces/events-8826e8e9e16a372887533bc1fc713c25/mc-832-big-Data.db)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181016181004.17838-1-glauber@scylladb.com>