scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Benny Halevy	1ccd72f115	sstables: mc: use int64_t for local_deletion_time and ttl In preparation for changing gc_clock::duration::rep to int64_t. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	156f9ffa11	sstables: add capped_local_deletion_time stats counter Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	7609a04565	sstables: mc: metadata collector: cap local_deletion_time at max max local_deletion_time_tracker in stats is int32_t so just track the limit of (max int32_t - 1) if time_point is greater than the limit. This corresponds to Cassandra's MAX_DELETION_TIME. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	bd6861989d	sstables: mc: use proper gc_clock types for local_deletion_time and ttl Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	6465a673f5	sstables: mc: define expired_liveness_ttl as signed int32_t Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	c4c2133e3e	sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time mc format only writes delta local_deletion_time of tombstones. Conventional deletion_time is written only for the partition header. Restructure the code to pass a tombstone to write_delta_deletion_time rather than struct deletion_time to prepare for using 64-bit deletion times. The tombstone uses gc_clock::time_point while struct deletion_time is limited to int32_t local_deletion_time. Note that for "live" tombstones we encode <api::missing_timestamp, no_deletion_time> as was previously evaluated by to_deletion_time(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	844a2de263	sstables: mc: prevent signed integer overflow Fix runtime error: signed integer overflow introduced by `2dc3776407` Delta-encoded values may wrap around if the encoded value is less than the base value. This could happen in two places: In the mc-format serialization header itself, where the base values are implicit Cassandra epoch time, and in the sstables data files, where the base values are taken from the encoding_stats (later written to the serialization_header). In these cases, when the calculation is done using signed integer/long we may see "runtime error: signed integer overflow" messages in debug mode (with -fsanitize=undefined / -fsanitize=signed-integer-overflow). Overflow here is expected and harmless since we guarantee neither that the base values in the serialization header are greater than or equal to Cassandra's epoch nor that the delta-encoded values are always greater than or equal to the respective base values in the serialization header. To prevent these warnings, the subtraction/addition should be done with unsigned (two's complement) arithmetic and the result converted to the signed type. Note that to keep the code simple where possible, we also rely on implicit conversion of signed integers to unsigned when one of the added values is unsigned and the other is signed. Fixes: #4098 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>	2019-01-20 16:59:46 +02:00
Benny Halevy	2dc3776407	sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base Refs #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>	2019-01-10 16:23:20 +02:00
Benny Halevy	60323b79d1	sstables: mc: sign-extend delta local_deletion_time and delta ttl Follow Cassandra's encoding so that values that are less than the baseline encoding_stats will wrap-around in 64-bits rather than 32. Fixes #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>	2019-01-09 21:43:30 +02:00
Rafael Ávila de Espíndola	26ac2c23ef	Change _row_ names that refer to partitions This renames some variables and functions to make it clear that they refer to partitions and not rows. Old versions of sstablemetadata used to refer to a row histogram, but current versions now mention a partition histogram instead. This patch doesn't change the exposed API names. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181229223311.4184-2-espindola@scylladb.com>	2019-01-09 14:53:42 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Benny Halevy	40410465d7	sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE; Fixes #4060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>	2019-01-07 18:41:37 +01:00
Tomasz Grabiec	a4721b4d50	sstables: types: Extract sstable_enabled_features::all()	2018-12-12 12:06:45 +01:00
Tomasz Grabiec	fad4fba4bc	sstables: Templatize write() functions on the writer Will allow writing to both a file_writer, or an in-memory writer like a bytes_ostream.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	aa19f98d18	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>	2018-12-03 16:40:24 +02:00
Raphael S. Carvalho	a66b1954cc	sstables: use a random uuid for sstables without run identifier Older sstables must have an identifier for them to be associated with their own run. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:01 -02:00
Raphael S. Carvalho	62025fa52c	sstables: add run identifier to scylla metadata It identifies a run which a particular sstable belongs to. Existing sstables will have a random uuid associated with it in memory. UUID is the correct choice because it allows sstables to be exported without having conflicts when using identifier generated by different nodes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:52:44 -02:00
Raphael S. Carvalho	d29482dce8	sstables: deprecate sstable metadata's ancestors The reason for that is that it's not available in sstable format mc, so we can no longer rely on it in common code for the currently supported formats. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>	2018-11-23 19:38:32 +01:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Vladimir Krivopalov	759d36a26e	sstables: Support Scylla-specific extension for writing shadowable tombstones. The original SSTables 'mc' format, as defined in Cassandra, does not provide a way to store shadowable deletion in addition to regular row deletion for materialized views. It is essential to store it because of known corner-case issues that otherwise appear. For this to work, we introduce a Scylla-specific extended flag to be set in SSTables in 'mc' format that indicates a shadowable tombstone is written after the regular row tombstone. This is deemed to be safe because shadowable tombstones are specific to materialized views and MV tables are not supposed to be imported or exported. Note that a shadowable tombstone can be written without a regular tombstone as well as along with it. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	e168433945	sstables: Introduce a feature for shadowable tombstones in Scylla.db. This is used to indicate that the SSTables being read may contain a Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set. If the feature is not enabled, we should not honour this flag. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	8f79f76116	sstables: Support checking row extension flags for Cassandra shadowable deletion. This flag can be only used in MV tables that are not supposed to be imported to Scylla. Since Scylla representation of shadowable tombstones differs from that of Cassandra, such SSTables are rejected on read and Scylla never sets this flag on writing. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	e71cc5ab20	sstables: Introduce TTL limitation and special 'expired TTL' value. This allows to store expired liveness info in SSTables 3.x format without introducing a possible conflict with real TTL values. As per Cassandra, TTL cannot exceed 20 years so taking the maximum value as a special value for indicating expired liveness info is safe. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
Vladimir Krivopalov	bdca27ae41	sstables: Always store only min bases in serialization_header. There previously was an inconsistency in treating min values stored in a serialization_header. They are written to or read from a Statistics.db as deltas against fixed bases, but when we parse timeouts from the data file, we need the full bases, not just deltas. This inconsistency causes wrong timestamp values if we write an sstable and then read from it using one and the same sstable object because we turn min values into bases on write and then don't adjust them back because we already have them in memory. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	48fa088ec6	sstables: Do not parse ancestors from compaction metadata for SSTables 3.x Ancestors array has been removed starting from 'ma' format (CASSANDRA-7066). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-19 17:11:43 -07:00
Vladimir Krivopalov	4bf1e9de3f	sstables: Support resetting data_consume_rows_context_m to indexable_element::cell. Set the proper parsing state when resetting to indexable_element::cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 10:09:19 -07:00
Vladimir Krivopalov	a497edcbda	sstables: Move promoted_index_block from types.hh to index_entry.hh. It is only being used by index_reader internally and never exposed so should not be listed in commonly used types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-28 12:28:59 -07:00
Piotr Jastrzebski	a3683d6e0f	sstables 3: add serialization_header::adjust In SSTables 3, min timestamp and min deletion time in serialization header are not stored normally but instead the difference between their value and the cassandra "epoch" is stored. This is supposed to make SSTables smaller. As a consequence, we have to add the "epoch" after reading the values to obtain the actual values of min timestamp and min deletion time. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:10:48 +02:00
Piotr Jastrzebski	2b8ff15f9f	column_flags_m: add HAS_COMPLEX_DELETION Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:47:19 +02:00
Piotr Jastrzebski	f6e1c38486	Introduce column_flags_m This will be used for reading columns from data file. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 19:54:16 +02:00
Piotr Jastrzebski	d8cd8e04ed	Add unfiltered_flags_m::has_all_columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	b849eefc8c	Use disk_string_vint_size for bytes_array_vint_size Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	5ca4bfd69a	disk_array_vint_size: Remove unused Size template parameter Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:15:44 +02:00
Vladimir Krivopalov	56ac941a2e	Fix the order of items in stats_metadata. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:45:10 -07:00
Vladimir Krivopalov	5db6002720	Write serialization header to Statistics.db for SSTables 3.x. Serialization header is a new component in Statistics.db introduced in SSTables 3.0 ('ma') format. It is essential for reading data file as it contains the base values used for delta-encoded values (timestamps, TTLs, local deletion times) and description of column types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:43:17 -07:00
Vladimir Krivopalov	3e471116b4	Separate statistics for count of cells, columns and rows in column_stats. SSTables 3.0 format makes a distinction between count of cells and count of columns. In that sense, a column of a collection type counts as one column but every atomic cell in it counts as a separate cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Piotr Jastrzebski	2ee3d8b87b	Introduce consumer_m and data_consume_rows_context_m Those classes can handle SSTables in MC format. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:38 +02:00
Piotr Jastrzebski	df457166b0	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	e1e23ec555	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	1cc1f9af5f	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	08da518dae	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Piotr Jastrzebski	b683870644	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 15:06:51 +02:00
Piotr Jastrzebski	26ab3056ae	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:11 +02:00
Piotr Jastrzebski	0022c309ee	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:10 +02:00
Piotr Jastrzebski	65fe564cd2	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:40:04 +02:00
Glauber Costa	b2f9958071	large_bitset: use a chunked_vector internally and simplify API save and load functions for the large_bitset were introduced by Avi with `d590e327c0`. In that commit, Avi says: "... providing iterator-based load() and save() methods. The methods support partial load/save so that access to very large bitmaps can be split over multiple tasks." The only user of this interface is SSTables. And turns out we don't really split the access like that. What we do instead is to create a chunked vector and then pass its begin() method with position = 0 and let it write everything. The problem here is that this requires the chunked vector to be fully initialized, not just reserved. If the bitmap is large enough that in itself can take a long time without yielding (up to 16ms seen in my setup). We can simplify things considerably by moving the large_bitset to use a chunked vector internally: it already uses a poor man's version of it by allocating chunks internally (it predates the chunked_vector). By doing that, we can turn save() into a simple copy operation, and do away with load altogether by adding a new constructor that will just copy an existing chunked_vector. Fixes #3341 Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180409234726.28219-1-glauber@scylladb.com>	2018-04-10 10:25:06 +03:00
Glauber Costa	f5c32423b8	summary: don't go through all entries when computing memory size. Summary has a function, memory_size(), that estimates the amount of memory the summary takes. It is my understanding that this is called to serve information to tooling. First, this function is inaccurate because it doesn't take into account the tokens per each entry, just the keys. But more importantly, it has to iterate over all keys which can be pretty expensive if the entries list is long. We are now keeping that in a memory area, with just pointers in the entry. So instead of iterating through the entries, we can iterate through the memory areas, which is much cheaper. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180316120915.16809-1-glauber@scylladb.com>	2018-03-16 12:57:19 +00:00
Glauber Costa	e680c7c8cc	abstract summary entry version of the token with a token view dht::token doesn't have a trivial destructor, so destroying an array full of those can be quite expensive. If we use the same trick as we used for the summary - storing the token data in a stable memory location - we can leave the entries with a trivial destructor and destroy the chunks themselves. Those being larger, they will be more efficient to delete. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-15 12:24:15 -04:00
Glauber Costa	091b0f9d41	summary_entry: do not store key bytes in each summary entry If we store a bytes_view instead of bytes, that has a trivial destructor and then we don't need to destroy each element individually. To do that, we allocate the data in a couple of large arrays which can be disposed of easily and point to it. We still can't destroy trivially because of the token. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-14 10:46:20 -04:00
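
Commits `844a2de263` and `60323b79d1` above describe delta encoding that is allowed to wrap around when a value is smaller than its base. A minimal sketch of that idea follows, assuming plain integer inputs. The helper names (`encode_delta`, `decode_delta`, `encode_delta32`, `decode_delta32`) are hypothetical and are not taken from the ScyllaDB sources. The arithmetic is done on unsigned 64-bit integers, so wrap-around is well defined and the undefined-behavior sanitizer has nothing to report. The conversion back to a signed type is modular on two's-complement targets (and is guaranteed to be from C++20 onward).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not ScyllaDB's actual API.

// Encode a signed 64-bit value as a delta against `base`.
// Unsigned subtraction wraps modulo 2^64 instead of overflowing.
constexpr uint64_t encode_delta(int64_t value, int64_t base) {
    return static_cast<uint64_t>(value) - static_cast<uint64_t>(base);
}

// Decode a delta: add in unsigned arithmetic, then convert to signed.
constexpr int64_t decode_delta(uint64_t delta, int64_t base) {
    return static_cast<int64_t>(static_cast<uint64_t>(base) + delta);
}

// For 32-bit fields, sign-extend both operands to 64 bits first,
// so a value below the base wraps around in 64 bits rather than 32.
constexpr uint64_t encode_delta32(int32_t value, int32_t base) {
    return encode_delta(int64_t{value}, int64_t{base});
}

// Decode a 32-bit field; assumes the decoded value fits in int32_t.
constexpr int32_t decode_delta32(uint64_t delta, int32_t base) {
    return static_cast<int32_t>(decode_delta(delta, int64_t{base}));
}

// Small self-check: a value below its base survives a round trip.
inline void check_round_trip() {
    assert(decode_delta32(encode_delta32(5, 10), 10) == 5);
}
```

With these helpers, encoding 5 against a base of 10 yields a very large unsigned delta, and decoding that delta against the same base gives 5 back.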

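Commits `7609a04565` and `156f9ffa11` above describe capping the tracked local_deletion_time at (max int32_t - 1) and counting how often that happens. The sketch below shows one way such a cap could look. It assumes the input is a count of seconds held in an `int64_t` that is not below the `int32_t` minimum. The names `max_tracked_deletion_time`, `capped_result`, and `cap_local_deletion_time` are hypothetical, not ScyllaDB identifiers.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names for illustration only; not ScyllaDB's actual API.

// The limit described in the commit message: max int32_t - 1.
constexpr int64_t max_tracked_deletion_time =
    std::numeric_limits<int32_t>::max() - 1;

struct capped_result {
    int32_t value;  // value that fits the 32-bit stats field
    bool capped;    // true if the input exceeded the limit
};

// Cap a 64-bit deletion time (in seconds) to the 32-bit limit.
// Assumes `seconds` is not below the int32_t minimum.
constexpr capped_result cap_local_deletion_time(int64_t seconds) {
    if (seconds > max_tracked_deletion_time) {
        return {static_cast<int32_t>(max_tracked_deletion_time), true};
    }
    return {static_cast<int32_t>(seconds), false};
}
```

A caller could increment a stats counter whenever `capped` is true, which is the role the commit message gives to the capped_local_deletion_time counter.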

137 Commits