scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Avi Kivity	cb549c767a	database: rename column_family to table The name "column_family" is both awkward and obsolete. Rename to the modern and accurate "table". An alias is kept to avoid huge code churn. To prevent a One Definition Rule violation, a preexisting "table" type is moved to a new namespace row_cache_stress_test. Tests: unit (release) Message-Id: <20180624065238.26481-1-avi@scylladb.com>	2018-06-24 14:54:46 +03:00
Tomasz Grabiec	2d4177355a	Merge "Support for writing range tombstones to SSTables 3.x" from Vladimir This patchset brings support for writing range tombstones to SSTables 3.x ('mc' format). In SSTables 3.x, range tombstones are represented by so-called range tombstone markers (hereafter RT markers) that denote range tombstone start and end bounds. So each range tombstone is represented in the data file by two ordered RT markers. There are also markers that both close the previous range tombstone and open the new one when two range tombstones are adjacent. This is done to consume less disk space on such occasions. Range tombstones written as RT markers are naturally non-overlapping. * github.com:argenet/scylla projects/sstables-30/write-range-tombstones/v6 range_tombstone_stream: Remove an unused boolean flag. Revert "Add missing enum values to bound_kind." sstables: Move to_deletion_time helper up and make it static. sstables: Write end-of-partition byte before flushing the last index block. sstables: Add support for writing range tombstones in SSTables 3.x format. tests: Add unit test covering simple range tombstone. tests: Add unit test covering adjacent range tombstones. tests: Add test to cover non-adjacent RTs. tests: Add test covering mixed rows and range tombstones. tests: Add test covering SSTables 3.x with many RTs. tests: Add unit test covering overlapping RTs and rows. tests: Add tests writing a range tombstone and a row overlapping with its start. tests: Add tests writing a range tombstone and a row overlapping with its end. tests: Add function that writes from multiple memtable into SSTables. tests: Add test where 2nd range tombstone covers the remainder of the 1st one. tests: Add test writing two non-adjacent range tombstones with same clustering key prefix at their bounds. tests: Add test covering overlapped range tombstones.	2018-06-22 15:47:18 +02:00
Vladimir Krivopalov	5559fc2121	sstables: Add support for writing range tombstones in SSTables 3.x format. For SSTables 3.x ('mc' format), range tombstones are represented by their bounds that are written to the data file as so-called RT markers. For adjacent range tombstones, an RT marker can be of a 'boundary' type which means it closes the previous range tombstone and opens the new one. Internally, sstable_writer_m relies on range_tombstone_stream to both de-overlap incoming range tombstones and order them so that when they are drained they can be easily thought of as just pairs of their bounds.	2018-06-20 18:08:36 -07:00
Avi Kivity	b97e1aeff5	Merge "Consume row marker correctly" from Piotr " Make sure we properly handle row marker and row tombstone when reading a row. Tests: unit {release} " * 'haaawk/sstables3/read-liveness-info-v4' of ssh://github.com/scylladb/seastar-dev: sstable: consume row marker in data_consume_rows_context_m sstable: Add consumer_m::consume_row_marker_and_tombstone sstable: add is_set and to_row_marker to liveness_info	2018-06-20 14:44:03 +03:00
Piotr Jastrzebski	75edaff7b6	sstable: consume row marker in data_consume_rows_context_m Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-20 13:13:29 +02:00
Piotr Jastrzebski	cbfc741d70	sstable: Add consumer_m::consume_row_marker_and_tombstone Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-20 13:13:16 +02:00
Tomasz Grabiec	5548eb96f7	Merge "store prepared statements parameters values" from Vlad * https://github.com/vladzcloudius/scylla.git tracing_prepared_parameters-v6: cql3::query_options: add get_names() method tracing::trace_state: hide the internals of params_values tracing: store queries statements for BATCH tracing: store the prepared statements parameters values	2018-06-19 19:12:26 +02:00
Vladimir Krivopalov	100eb03f29	sstables: Write end-of-partition byte before flushing the last index block. This is to stay compliant with the Origin for SSTables 3.x. It differs from SSTables 2.x (ka/la) as for those the last promoted index block is pushed first and the end-of-partition byte is written after. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-18 14:28:25 -07:00
Vladimir Krivopalov	ad0b911b03	sstables: Move to_deletion_time helper up and make it static. It is used for writing end_open_marker for promoted index. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-18 14:25:13 -07:00
Vladimir Krivopalov	03cf20676c	Revert "Add missing enum values to bound_kind." This reverts commit `3ecc9e9ce4`. It also adds another enum to be used instead.	2018-06-18 14:22:12 -07:00
Piotr Jastrzebski	4c261d2e51	sstable: add is_set and to_row_marker to liveness_info Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-18 20:26:39 +02:00
Glauber Costa	fd51ff3d9e	STCS: bypass min_threshold unless configured to enforce strictly If we fail to produce a SizeTiered compaction with the configured min_threshold, we can try again to compact any two - unless there is a global bypass telling us not to. This will still privilege doing larger compactions in size buckets where that is possible, but if we are idle will try to compact any two. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-15 14:27:22 -04:00
Piotr Jastrzebski	2942f6eecc	data_consume_rows_context_m: support reading counters Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:11:09 +02:00
Piotr Jastrzebski	785e14dfb9	Add consumer_m::consume_counter_column Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:11:09 +02:00
Piotr Jastrzebski	6f559445d0	Extract make_counter_cell It will be used by both consumers. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:11:09 +02:00
Piotr Jastrzebski	88b66189b7	row.hh & mp_row_consumer.hh: Add required includes Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:11:09 +02:00
Piotr Jastrzebski	369e4a4987	Use serialization_header::adjust in read_statistics Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:11:09 +02:00
Piotr Jastrzebski	a3683d6e0f	sstables 3: add serialization_header::adjust In SSTables 3, min timestamp and min deletion time in serialization header are not stored normally but instead the difference between their value and the cassandra "epoch" is stored. This is supposed to make SSTables smaller. As a consequence, we have to add the "epoch" after reading the values to obtain the actual values of min timestamp and min deletion time. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:10:48 +02:00
Piotr Jastrzebski	42d2a162dd	data_consume_rows_context_m: add is_column_counter Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-13 09:27:58 +02:00
Piotr Jastrzebski	d4d3e6f8eb	data_consume_rows_context_m: Remove unused CELL_PATH_SIZE state Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-13 09:27:58 +02:00
Piotr Jastrzebski	ca7ede7eaf	column_translation: add is_counter Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-13 09:27:58 +02:00
Vlad Zolotarov	818b5b75ba	tracing: store the prepared statements parameters values Store the prepared statement positional parameters values in the corresponding system_traces.sessions entry in the 'parameters' column (which has a map<text,text> type). Parameters are stored as a pair of "param[X]" : "value", where X is the index of the parameter starting from 0 and the "value" is the first 64 characters of the parameter's value string representation. If parameters were given with their names attached (see the description on bit 0x40 of QUERY flags in the CQL binary protocol specification) then parameters are going to be stored in the "param[X](<bound variable name>)" : "value" form. If the value's string representation is longer than 64 characters then the "value" will contain only first 64 characters of it and will have the "..." at the end. For a BATCH of prepared statements the parameter "name" will have a form of param[Y][X] where Y is the index of the corresponding prepared statement in the BATCH and X is the index of the parameter. Both X and Y start from 0. Note: Had to switch to boost::range::find() in sstables::big_sstable_set in order to address the "ambiguous overload" compilation error. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-06-12 10:57:05 -04:00
Gleb Natapov	59da525e0d	Provide available memory size to compaction_manager object during creation	2018-06-11 15:34:14 +03:00
Avi Kivity	74469ecc09	Merge "Support reading collections" from Piotr " Implement and test support for reading collections in SSTables 3. Tests: unit {release} " * 'haaawk/sstables3/read-collections-v1' of ssh://github.com/scylladb/seastar-dev: sstables 3: Add tests for reading collections flat_mutation_reader_assertions: add more flexible asserts data_consume_rows_context_m: add support for collections mp_row_consumer_m: Add support for collections data_consume_rows_context_m: introduce cell_path Use column_translation::_is_collection in reading column_translation: add _column_is_collection() column_flags_m: add HAS_COMPLEX_DELETION Use read_unsigned_vint_length_bytes for COLUMN_VALUE Use read_unsigned_vint_length_bytes for CK_BLOCKS Implement read_unsigned_vint_length_bytes	2018-06-10 17:10:52 +03:00
Avi Kivity	2582f53b44	Merge "database and API: Add column_family::get_sstables_by_key" from Amnon " This series is for nodetool getsstables. This patch is based on: `8daaf9833a` With some minor adjustments because of the code change in sstables. The idea is to allow searching for all the sstables that contain a given key. After this patch if there is a table t1 in keyspace k1 and it has a key called aa. curl -X GET "http://localhost:10000/column_family/sstables/by_key/k1%3At1?key=aa" Will return the list of sstable file names that contain that key. " * 'amnon/sstable_for_key_v4' of github.com:scylladb/seastar-dev: Add the API implementation to get_sstables_by_key api: column_family.json make the get_sstables_for_key doc clearer column_family: Add the get_sstables_by_partition_key method sstable test: add has_partition_key test sstable: Add has_partition_key method keys_test: add a test for nodetool_style string keys: Add from_nodetool_style_string factory method	2018-06-10 16:53:56 +03:00
Piotr Jastrzebski	f9c62b8188	data_consume_rows_context_m: add support for collections Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 23:39:07 +02:00
Piotr Jastrzebski	fd89f42b09	mp_row_consumer_m: Add support for collections Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 23:35:12 +02:00
Piotr Jastrzebski	ffb6b9ed24	data_consume_rows_context_m: introduce cell_path Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 23:30:40 +02:00
Piotr Jastrzebski	5e1dd89d4d	Use column_translation::*_is_collection in reading Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:50:23 +02:00
Piotr Jastrzebski	7bb25a2dd9	column_translation: add *_column_is_collection() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:48:43 +02:00
Piotr Jastrzebski	2b8ff15f9f	column_flags_m: add HAS_COMPLEX_DELETION Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:47:19 +02:00
Piotr Jastrzebski	f7a1d5a437	Use read_unsigned_vint_length_bytes for COLUMN_VALUE Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-06 15:54:17 +02:00
Piotr Jastrzebski	3b8b165053	Use read_unsigned_vint_length_bytes for CK_BLOCKS Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-06 15:44:53 +02:00
Piotr Jastrzebski	21a0e95a06	Implement read_unsigned_vint_length_bytes It's a common operation that's used in multiple places so it's best to have it implemented once. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-06 15:44:06 +02:00
Paweł Dziepak	24764712b6	sstable: fix capture by reference of stack variable in continuation Message-Id: <20180604102542.21799-1-pdziepak@scylladb.com>	2018-06-04 14:35:49 +03:00
Piotr Jastrzebski	0b72594c1f	data_consume_rows_context_m: Use find_first and find_next Those methods of boost::dynamic_bitset allow much more efficient implementation of skip_absent_columns and move_to_next_column. Also fix some indentation and variable naming. Test: unit {release} Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <8a4dea51060c5a02bb774eac43e9eb67d316049a.1528100153.git.piotr@scylladb.com>	2018-06-04 11:18:03 +03:00
Avi Kivity	9b21fbc055	Merge "LCS: enable compaction controller" from Glauber " In preparation, we change LCS so that it tries harder to push data to the last level, where the backlog is supposed to be zero. The backlog is defined as: backlog_of_stcs_in_l0 + Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out where: * the fan_out is the amount of SSTables we usually compact with the next level (usually 10). * max_levels is the number of levels currently populated * sizeof(L) is the total amount of data in a particular level. Tests: unit (release) " * 'lcs-backlog-v2' of github.com:glommer/scylla: LCS: implement backlog tracker for compaction controller LCS: don't construct property in the body of constructor LCS: try harder to move SSTables to highest levels. leveled manifest: turn 10 into a constant backlog: add level to write progress monitor	2018-06-04 10:29:56 +03:00
Glauber Costa	6317bd45d7	LCS: implement backlog tracker for compaction controller This is the last missing tracker among the major strategies. After this, only DTCS is left. To calculate the backlog, we will define the point of zero-backlog as having all data in the last level. The backlog is then: Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out, where: * the fan_out is the amount of SSTables we usually compact with the next level (usually 10). * max_levels is the number of levels currently populated * sizeof(L) is the total amount of data in a particular level. Care is taken for the backlog not to jump when a new level has been just recently created. Aside from that, SSTables that accumulate in L0 can be subject to STCS. We will then add a STCS backlog in those SSTables to represent that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:14:09 -04:00
Glauber Costa	04546df55c	LCS: don't construct property in the body of constructor Right now we are constructing the _max_sstable_size_in_mb property in the body of the constructor, which makes it hard for us to use from other properties. We are doing that because we'd like to test for bounds of that value. So a cleaner way is to have a helper function for that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:14:09 -04:00
Glauber Costa	28382cb25c	LCS: try harder to move SSTables to highest levels. Our current implementation of LCS can end up with situations in which just a bit of data is in the highest levels, with the majority in the lowest levels. That happens because we will only promote things to highest levels if the amount of data in the current level is higher than the maximum. This is a pre-existing problem in itself, but became even clearer when we started trying to define what is the backlog for LCS. We have discussed ways to fix this by redefining the criteria on when to move data to the next levels. That would require us to change the way things are today considerably, allowing parallel compactions, etc. There is significant risk that we'll increase write amplification and we would need to carefully validate that. For now I will propose a simpler change, that essentially solves the "inverted pyramid" problem of current LCS without major disruption: keep selecting compaction candidates with the same criteria that we do today, which should help make sure we are not compacting high levels for no reason; but if there is nothing to do, use the idle time to push data to higher levels. As an added benefit, old data that is in the higher level can also be compacted away faster. With this patch we see that in an idle, post-load system all data is eventually pushed to the last level. Systems under constant writes keep behaving the same way they did before. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:12:19 -04:00
Glauber Costa	e64b471e3d	leveled manifest: turn 10 into a constant We increase levels in powers of 10 but that is a parameter of the algorithm. At least make it into a constant so that we can reuse it somewhere else. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 16:55:58 -04:00
Avi Kivity	6f2d3b7f9f	Merge "Fix previous row size calculation for SSTables 3.x" from Vladimir " SSTables 3.x format ('m') stores the size of previous row or RT marker inside each row/marker. That potentially allows to traverse rows/markers in reverse order. The previous code calculating those sizes appeared to produce invalid values for all rows except the first one. The problem with detecting this bug was that neither Cassandra itself nor the sstabledump tool use those values, they are simply rejected on reading. From UnfilteredSerializer.deserializeRowBody() method, https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java#L562 : if (header.isForSSTable()) { in.readUnsignedVInt(); // Skip row size in.readUnsignedVInt(); // previous unfiltered size } So while the previous test files were technically correct in that they contained valid data readable by Cassandra/sstabledump, they didn't follow the format specification. This patchset fixes the code to produce correct values and replaces incorrect data files with correct ones. The newly generated data files have been validated to be identical to files generated with Cassandra using same data and timestamps as unit tests. Tests: Unit {release} " * 'projects/sstables-30/fix-prev-row_size/v1' of https://github.com/argenet/scylla: tests: Fix test files to use correct previous row sizes. sstables: Fix calculation of previous row size for SSTables 3.x sstables: Factor out code building promoted index blocks into separate helpers.	2018-06-03 11:38:22 +03:00
Avi Kivity	a43b3e22fc	Merge "Fix clustering blocks serialization for SSTables 3.x" from Vladimir " This patchset contains two fixes to the clustering key prefixes serialization logic for SSTables 3.x. First, it fixes a vexing typo: a bitwise-and (&) has been used instead of a remainder operator (%) for truncating the shift value. This did not show up in existing tests because they all had non-empty clustering columns values. Added tests to cover empty clustering columns values. Second, it fixes the logic of serialization to write values up to the prefix length, not the length of the clustering key as defined by schema. This matches the way it is done by the Origin. There is, however, a special case where the prefix size is smaller than that of a clustering key but we still need to serialize up to the full size. This is the case when a compact table is being used and some rows in it are added using incomplete clustering keys (containing null for trailing columns). In Cassandra, these prefixes still have a full length and missing columns are just set to 'null'. In our code those prefixes have their real length, but since we need to serialize beyond it, we pass a flag to indicate this. " * 'projects/sstables-30/fix-clustering-blocks/v1' of https://github.com/argenet/scylla: tests: Add test covering compact table with non-full clustering key. sstables: Improve clustering blocks writing, use logical clustering prefix size. tests: Add test covering large clustering keys (>32 columns) for SSTables 3.x tests: Add unit test covering empty values in clustering key. sstables: Fix typo in clustering blocks write helper.	2018-06-03 11:35:49 +03:00
Avi Kivity	1071e481ed	Merge "Implement support for missing columns in SSTable 3.0" from Piotr " Add handling for missing columns and tests for it. There are 3 cases: 1. Number of columns in a table is smaller than 64 2. Number of columns in a table is greater than 64 2a. and less than half of all possible columns are present in sstable 2b. and at least half of all possible columns are present in sstable Case 1 is implemented using a bit mask, and a column is present if (mask & (1 << <column number>)) == 0. Case 2a is implemented by storing a list of column numbers for each present column. Case 2b is implemented by storing a list of column numbers for each absent column. " * 'haaawk/sstables3/read-missing-columns-v3' of ssh://github.com/scylladb/seastar-dev: sstables 3: add test for reading big dense subset of columns sstables 3: support reading big dense subsets of columns sstables 3: add test for reading big sparse subset of columns sstables 3: support reading big sparse subsets of columns sstables 3: add test for reading small subset of columns sstables 3: support reading small subsets of columns	2018-06-03 10:42:00 +03:00
Piotr Jastrzebski	829f0c5f80	sstables 3: support reading big dense subsets of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-02 10:41:18 +02:00
Piotr Jastrzebski	e5fb499736	sstables 3: support reading big sparse subsets of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-01 21:35:28 +02:00
Piotr Jastrzebski	63d45c4f24	sstables 3: support reading small subsets of columns A small subset contains no more than 63 elements. Support for large subsets will come in the following patches. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-01 21:33:50 +02:00
Glauber Costa	7e3093709a	backlog: add level to write progress monitor For SSTables being written, we don't know their level yet. Add that information to the write monitor. New SSTables will always be at L0. Compacted SSTables will have their level determined by the compaction process. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-31 21:09:38 -04:00
Vladimir Krivopalov	47a7e78bc8	sstables: Improve clustering blocks writing, use logical clustering prefix size. In the Origin, the size of the clustering key prefix used during serialization is the actual length of the prefix and not the full size as defined in schema. So the code is fixed to align with that logic. This, in particular, is needed to write clustering blocks for RT markers. There is, however, a special case where the prefix size is smaller than that of a clustering key but we still need to serialize up to the full size. This is the case when a compact table is being used and some rows in it are added using incomplete clustering keys (containing null for trailing columns). In Cassandra, these prefixes still have a full length and missing columns are just set to 'null'. In our code those prefixes have their real length, but since we need to serialize beyond it, we pass a flag to indicate this. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 17:30:36 -07:00
Vladimir Krivopalov	0dadd4fdf3	sstables: Fix typo in clustering blocks write helper. What was supposed to be a remainder operation turned out to be a bitwise 'and'. This didn't show up in existing tests only because they all had non-empty clustering values. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 15:12:40 -07:00
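
The LCS backlog formula quoted in commits 9b21fbc055 and 6317bd45d7 can be illustrated with a minimal C++ sketch. This is not ScyllaDB's tracker code: the function name `lcs_backlog`, its parameters, and the flat vector of per-level sizes are all hypothetical. It also assumes that `max_levels` is the index of the highest populated level, so that data already in the last level contributes zero, which matches the stated zero-backlog point but is only one reading of the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the backlog formula from the commit messages:
//   backlog = backlog_of_stcs_in_l0
//           + Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out
//
// level_sizes[L] is the total amount of data (in bytes) held in level L.
// ASSUMPTION: max_levels is taken as the index of the highest non-empty
// level, so data in that level adds nothing to the backlog.
double lcs_backlog(const std::vector<std::uint64_t>& level_sizes,
                   double backlog_of_stcs_in_l0,
                   unsigned fan_out = 10) {
    // Find the highest level that currently holds data.
    std::size_t max_levels = 0;
    for (std::size_t l = 0; l < level_sizes.size(); ++l) {
        if (level_sizes[l] > 0) {
            max_levels = l;
        }
    }

    // Start from the STCS backlog accumulated in L0, then add each
    // level's contribution, weighted by its distance from the last level.
    double backlog = backlog_of_stcs_in_l0;
    for (std::size_t l = 0; l < level_sizes.size() && l <= max_levels; ++l) {
        backlog += static_cast<double>(level_sizes[l])
                 * static_cast<double>(max_levels - l)
                 * static_cast<double>(fan_out);
    }
    return backlog;
}
```

Under these assumptions, a table with all of its data in the last populated level and no STCS backlog in L0 yields a backlog of zero, while data sitting in lower levels is weighted by how many levels it still has to travel.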

1478 Commits