scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Vlad Zolotarov	2dde372ae6	locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting() ec2_snitch::gossiper_starting() calls the base class (default) method that sets _gossip_started to TRUE and thereby prevents the subsequent reconnectable_snitch_helper registration. Fixes #3454 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>	2018-06-06 12:00:17 +03:00
Asias He	6496cdf0fb	db: Get rid of the streaming memtable delayed flush In `455d5a5` (streaming memtables: coalesce incoming writes), we introduced the delayed flush to coalesce incoming streaming mutations from different stream_plan. However, most of the time there will be one stream plan at a time, and the next stream plan won't start until the previous one is finished. So, the current coalescing does not really work. The delayed flush adds 2s of delay for each stream session. If we have lots of tables to stream, we will waste a lot of time. We stream a keyspace in around 10 stream plans, i.e., 10% of ranges at a time. If we have 5000 tables, even if the tables are almost empty, the delay will waste 5000 * 10 * 2 = 100,000 seconds, or about 27 hours. To stream a keyspace with 4 tables, each table has 1000 rows. Before: [shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master [shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 125.21 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 8.233 seconds After: [shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master [shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 4772.32 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 0.216 seconds Fixes #3436 Message-Id: <cb2dde263782d2a2915ddfe678c74f9637ffd65b.1526979175.git.asias@scylladb.com>	2018-06-06 10:16:02 +03:00
Tomasz Grabiec	f775fc2e4c	mvcc: Fix partition_entry::open_version() After `70c72773be` it's possible that open_version() is called with a phase which is smaller than the phase of the latest version, because the latest version belongs to the in-progress cache update. In such a case we must return the existing non-latest snapshot and not create a new version on top of the in-progress update. Not doing this violates several invariants, and may lead to inconsistencies, including violation of write atomicity or temporary loss of writes. partition_entry::read() was already adjusted by the aforementioned commit. Do a similar adjustment for open_version(). Fixes sporadic failures of row_cache_test.cc::test_concurrent_reads_and_eviction Message-Id: <1528211847-22825-1-git-send-email-tgrabiec@scylladb.com>	2018-06-05 18:22:38 +03:00
Takuya ASADA	60844ae67b	dist/common/scripts/scylla_coredump_setup: don't run sysctl on Ubuntu 18.04 Since 99-scylla.conf is not included on Ubuntu 18.04, skip running it. Fixes #3494 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180605093619.9197-1-syuu@scylladb.com>	2018-06-05 12:47:46 +03:00
Takuya ASADA	222b8588ee	dist/common/systemd/scylla-server.service.in: add local-fs.target as dependency We mistakenly added only network-online.target, which doesn't promise to wait for the /var/lib/scylla mount. To do this we need local-fs.target. Fixes #3441 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180521083349.8970-1-syuu@scylladb.com>	2018-06-05 12:26:21 +03:00
Piotr Sarna	6130a00597	dist: add scylla/hints directory to scripts /var/lib/scylla/hints directory was missing from dist-specific scripts, which may cause package installations to fail. Package building scripts and descriptions are updated. Fixes #3495 Message-Id: <0f5596cb49500416820ece023b7f76a4e2427799.1528184949.git.sarna@scylladb.com>	2018-06-05 11:33:29 +03:00
Avi Kivity	4aaf7bbc1d	Merge "Add test for compression" from Piotr " It turns out that compression just works for SSTables 3.x. Thanks to the previous work done on the write path. This series cleans up tests a bit and introduces test for compression on the read path. " * 'haaawk/sstables3/read-compression-v1' of ssh://github.com/scylladb/seastar-dev: Add test for compression in sstables 3.x Extract test_partition_key_with_values_of_different_types_read sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE Drop UNCOMPRESSD_ when code will be used for compressed too	2018-06-04 20:33:50 +03:00
Piotr Jastrzebski	25a7f03f7f	Add test for compression in sstables 3.x Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-04 18:41:10 +02:00
Piotr Jastrzebski	be9c7391aa	Extract test_partition_key_with_values_of_different_types_read It will be used also for testing compression. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-04 18:41:10 +02:00
Piotr Jastrzebski	1f324b7fc8	sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-04 18:40:52 +02:00
Piotr Jastrzebski	3e3ccdb323	Drop UNCOMPRESSD_ when code will be used for compressed too Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-04 18:29:02 +02:00
Avi Kivity	6d6c355dc0	Merge "augment system.local with sharding information" from Glauber " This patch adds nr_shards, msb_ignore, and the actual sharding algorithm to the system.local table. Drivers and other tools can then make use of this information to talk to scylla in an optimal way " * 'system_tables-v3' of github.com:glommer/scylla: system_keyspace: add sharding information to local table partitioner: export the name of the algorithm used to do intra-node sharding	2018-06-04 18:50:28 +03:00
Glauber Costa	bdce561ada	system_keyspace: add sharding information to local table We would like the clients to be able to route work directly to the right shards. To do that, they need to know the sharding algorithm and its parameters. The algorithm can be copied into the client, but the parameters need to be exported somewhere. Let's use the local table for that. Signed-off-by: Glauber Costa <glauber@scylladb.com> --- v2: force msb to zero on non-murmur	2018-06-04 11:25:58 -04:00
Glauber Costa	250d9332dc	partitioner: export the name of the algorithm used to do intra-node sharding We will export this on system tables. To avoid hard-coding it in the system table level, keep it at least in the dht layer where it belongs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-04 11:25:58 -04:00
Takuya ASADA	ad4ca1e166	dist: simplified build script templates Currently, build_deb.sh looks very complicated because each distribution requires different parameters, and we apply them with sed commands one by one. This patch replaces them with Mustache, a template language with a simple syntax. Both .rpm and .deb distributions ship pystache (a Python implementation of Mustache), so we will use it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180604104026.22765-1-syuu@scylladb.com>	2018-06-04 14:38:52 +03:00
Paweł Dziepak	24764712b6	sstable: fix capture by reference of stack variable in continuation Message-Id: <20180604102542.21799-1-pdziepak@scylladb.com>	2018-06-04 14:35:49 +03:00
Duarte Nunes	dfa779ebe7	Merge 'Separate hinted handoff manager for materialized views' from Piotr " This series introduces a separate hinted handoff manager for materialized views. Steps: * decouple resource limits from hinted handoff, so multiple instances can share space and throughput limits in order to avoid internal fragmentation for every instance's reservations * add a subdirectory to data/, responsible for storing materialized view hints * decouple registering global metrics from hinted handoff constructor, now that there can be more than one instance - otherwise 'registering metrics twice' errors are going to occur * add a hints_for_views_manager to storage proxy and route failed view updates to use it instead of the original hints_manager * restore previous semantics for enabling/disabling hinted handoff - regular hinted handoff can be disabled or enabled just for specific datacenters without influencing materialized views flow " * 'separate_hh_for_mv_4' of https://github.com/psarna/scylla: storage_proxy: restore optional hinted handoff storage_proxy: add hints manager for views hints: decouple hints manager metrics from constructor db, config: add view_pending_updates directory hints: move space_watchdog to resource manager hints: move send limiter to resource manager hints: move constants to resource_manager	2018-06-04 12:03:59 +01:00
Vlad Zolotarov	e759803f48	cql3::authorized_prepared_statements_cache: properly set the expiration timeout Because authorized_prepared_statements_cache caches the information that comes from the permissions cache and from the prepared statements cache, it should have its entry expiration period set to the minimum of the expiration periods of these caches. The same goes for the entry refresh period, but since the prepared statements cache does not have a refresh period, authorized_prepared_statements_cache's entry refresh period is simply equal to that of the permissions cache. Fixes #3473 Tests: dtest{release} auth_test.py Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1527789716-6206-1-git-send-email-vladz@scylladb.com>	2018-06-04 10:34:54 +02:00
Piotr Jastrzebski	0b72594c1f	data_consume_rows_context_m: Use find_first and find_next Those methods of boost::dynamic_bitset allow much more efficient implementation of skip_absent_columns and move_to_next_column. Also fix some indentation and variable naming. Test: unit {release} Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <8a4dea51060c5a02bb774eac43e9eb67d316049a.1528100153.git.piotr@scylladb.com>	2018-06-04 11:18:03 +03:00
Piotr Sarna	f12fdcffdb	storage_proxy: restore optional hinted handoff Since hinted handoff for materialized views is now a separate entity, regular hinted handoff can go back to being optional.	2018-06-04 09:46:06 +02:00
Piotr Sarna	a6aae369da	storage_proxy: add hints manager for views This commit adds a separate hints manager that serves only failed materialized view updates.	2018-06-04 09:46:06 +02:00
Piotr Sarna	204bc17bd7	hints: decouple hints manager metrics from constructor Now that more than one instance of hints manager can be present at the same time, registering metrics is moved out of the constructor to prevent 'registering metrics twice' errors.	2018-06-04 09:46:06 +02:00
Piotr Sarna	a791dce0ae	db, config: add view_pending_updates directory Hints for materialized view updates need to be kept somewhere, because their dedicated hints manager has to have a root directory. view_pending_updates directory resides in /data and is used for that purpose.	2018-06-04 09:46:06 +02:00
Piotr Sarna	f345efc79a	hints: move space_watchdog to resource manager Space watchdog is decoupled from hints manager and moved to resource manager, so it can be shared among different hints manager instances.	2018-06-04 09:46:01 +02:00
Piotr Sarna	ef40f7e628	hints: move send limiter to resource manager Send limiting semaphore is moved from hints manager to resource manager. In consequence, hints manager now keeps a reference to its resource manager.	2018-06-04 09:35:58 +02:00
Piotr Sarna	2315937854	hints: move constants to resource_manager Constants related to managing resources are moved to newly created resource_manager class. Later, this class will be used to manage (potentially shared) resources of hints managers.	2018-06-04 09:35:58 +02:00
Avi Kivity	9b21fbc055	Merge "LCS: enable compaction controller" from Glauber " In preparation, we change LCS so that it tries harder to push data to the last level, where the backlog is supposed to be zero. The backlog is defined as: backlog_of_stcs_in_l0 + Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out where: * the fan_out is the amount of SSTables we usually compact with the next level (usually 10). * max_levels is the number of levels currently populated * sizeof(L) is the total amount of data in a particular level. Tests: unit (release) " * 'lcs-backlog-v2' of github.com:glommer/scylla: LCS: implement backlog tracker for compaction controller LCS: don't construct property in the body of constructor LCS: try harder to move SSTables to highest levels. leveled manifest: turn 10 into a constant backlog: add level to write progress monitor	2018-06-04 10:29:56 +03:00
Amos Kong	364c2551c8	scylla_setup: fix conditional statement of silent mode Commit `300af65555` introduced a problem in the conditional statement: the script always aborts in silent mode, regardless of the return value. Fixes #3485 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <1c12ab04651352964a176368f8ee28f19ae43c68.1528077114.git.amos@scylladb.com>	2018-06-04 10:14:06 +03:00
Glauber Costa	6317bd45d7	LCS: implement backlog tracker for compaction controller This is the last missing tracker among the major strategies. After this, only DTCS is left. To calculate the backlog, we will define the point of zero-backlog as having all data in the last level. The backlog is then: Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out, where: * the fan_out is the amount of SSTables we usually compact with the next level (usually 10). * max_levels is the number of levels currently populated * sizeof(L) is the total amount of data in a particular level. Care is taken for the backlog not to jump when a new level has been just recently created. Aside from that, SSTables that accumulate in L0 can be subject to STCS. We will then add a STCS backlog in those SSTables to represent that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:14:09 -04:00
Glauber Costa	04546df55c	LCS: don't construct property in the body of constructor Right now we are constructing the _max_sstable_size_in_mb property in the body of the constructor, which makes it hard for us to use it from other properties. We are doing that because we'd like to test for bounds of that value. So a cleaner way is to have a helper function for that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:14:09 -04:00
Glauber Costa	28382cb25c	LCS: try harder to move SSTables to highest levels. Our current implementation of LCS can end up with situations in which just a bit of data is in the highest levels, with the majority in the lowest levels. That happens because we will only promote things to highest levels if the amount of data in the current level is higher than the maximum. This is a pre-existing problem in itself, but became even clearer when we started trying to define what is the backlog for LCS. We have discussed ways to fix this by redefining the criteria on when to move data to the next levels. That would require us to change the way things are today considerably, allowing parallel compactions, etc. There is significant risk that we'll increase write amplification and we would need to carefully validate that. For now I will propose a simpler change, that essentially solves the "inverted pyramid" problem of current LCS without major disruption: keep selecting compaction candidates with the same criteria that we do today, which should help make sure we are not compacting high levels for no reason; but if there is nothing to do, use the idle time to push data to higher levels. As an added benefit, old data that is in the higher level can also be compacted away faster. With this patch we see that in an idle, post-load system all data is eventually pushed to the last level. Systems under constant writes keep behaving the same way they did before. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 18:12:19 -04:00
Glauber Costa	e64b471e3d	leveled manifest: turn 10 into a constant We increase levels in powers of 10 but that is a parameter of the algorithm. At least make it into a constant so that we can reuse it somewhere else. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-03 16:55:58 -04:00
Avi Kivity	6f2d3b7f9f	Merge "Fix previous row size calculation for SSTables 3.x" from Vladimir " SSTables 3.x format ('m') stores the size of previous row or RT marker inside each row/marker. That potentially allows to traverse rows/markers in reverse order. The previous code calculating those sizes appeared to produce invalid values for all rows except the first one. The problem with detecting this bug was that neither Cassandra itself nor the sstabledump tool use those values, they are simply rejected on reading. From UnfilteredSerializer.deserializeRowBody() method, https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java#L562 : if (header.isForSSTable()) { in.readUnsignedVInt(); // Skip row size in.readUnsignedVInt(); // previous unfiltered size } So while the previous test files were technically correct in that they contained valid data readable by Cassandra/sstabledump, they didn't follow the format specification. This patchset fixes the code to produce correct values and replaces incorrect data files with correct ones. The newly generated data files have been validated to be identical to files generated with Cassandra using same data and timestamps as unit tests. Tests: Unit {release} " * 'projects/sstables-30/fix-prev-row_size/v1' of https://github.com/argenet/scylla: tests: Fix test files to use correct previous row sizes. sstables: Fix calculation of previous row size for SSTables 3.x sstables: Factor out code building promoted index blocks into separate helpers.	2018-06-03 11:38:22 +03:00
Avi Kivity	a43b3e22fc	Merge "Fix clustering blocks serialization for SSTables 3.x" from Vladimir " This patchset contains two fixes to the clustering key prefixes serialization logic for SSTables 3.x. First, it fixes a vexing typo: a bitwise-and (&) has been used instead of a remainder operator (%) for truncating the shift value. This did not show up in existing tests because they all had non-empty clustering columns values. Added tests to cover empty clustering columns values. Second, it fixes the logic of serialization to write values up to the prefix length, not the length of the clustering key as defined by schema. This matches the way it is done by the Origin. There is, however, a special case where the prefix size is smaller than that of a clustering key but we still need to serialize up to the full size. This is the case when a compact table is being used and some rows in it are added using incomplete clustering keys (containing null for trailing columns). In Cassandra, these prefixes still have a full length and missing columns are just set to 'null'. In our code those prefixes have their real length, but since we need to serialize beyond it, we pass a flag to indicate this. " * 'projects/sstables-30/fix-clustering-blocks/v1' of https://github.com/argenet/scylla: tests: Add test covering compact table with non-full clustering key. sstables: Improve clustering blocks writing, use logical clustering prefix size. tests: Add test covering large clustering keys (>32 columns) for SSTables 3.x tests: Add unit test covering empty values in clustering key. sstables: Fix typo in clustering blocks write helper.	2018-06-03 11:35:49 +03:00
Avi Kivity	1071e481ed	Merge "Implement support for missing columns in SSTable 3.0" from Piotr " Add handling for missing columns and tests for it. There are 3 cases: 1. Number of columns in a table is smaller than 64 2. Number of columns in a table is greater than 64 2a. and less than half of all possible columns are present in sstable 2b. and at least half of all possible columns are present in sstable Case 1 is implemented using a bit mask, and a column is present if mask & (1 << <column number>) == 0 Case 2a is implemented by storing a list of column numbers for each present column Case 2b is implemented by storing a list of column numbers for each absent column " * 'haaawk/sstables3/read-missing-columns-v3' of ssh://github.com/scylladb/seastar-dev: sstables 3: add test for reading big dense subset of columns sstables 3: support reading big dense subsets of columns sstables 3: add test for reading big sparse subset of columns sstables 3: support reading big sparse subsets of columns sstables 3: add test for reading small subset of columns sstables 3: support reading small subsets of columns	2018-06-03 10:42:00 +03:00
Avi Kivity	78182a704b	partition_snapshot_row_cursor: initialize _dummy and _continuous Debug mode view_schema_test sometimes complains that a bool member doesn't contain in-range values, apparently in the move constructor. Initialize them for its benefit to avoid false-positive test failures. Message-Id: <20180602184934.31258-1-avi@scylladb.com>	2018-06-02 19:51:36 +01:00
Avi Kivity	187ebdbe46	auth: fix possible use of disengaged optional in has_salted_hash() untyped_result_set_row's cell data type is bytes_opt, and the get_blob() accessor accesses the value assuming it's engaged (relying on the caller to call has()). has_salted_hash() calls get_blob() without calling has() beforehand, potentially triggering undefined behavior. Fix by using get_or() instead, which also simplifies the caller. I observed failures in Jenkins in this area. It's hard to be sure this is the root cause, since the failures triggered an internal consistency assertion in asan rather than an asan report. However, the error is hard to reproduce and the fix makes sense even if it doesn't prevent the error. See #3480 for the asan error. Fixes #3480 (hopefully). Message-Id: <20180602181919.29204-1-avi@scylladb.com>	2018-06-02 19:46:32 +01:00
Piotr Jastrzebski	2fd0566eb7	sstables 3: add test for reading big dense subset of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-02 10:41:18 +02:00
Piotr Jastrzebski	829f0c5f80	sstables 3: support reading big dense subsets of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-02 10:41:18 +02:00
Piotr Jastrzebski	4e4972ffea	sstables 3: add test for reading big sparse subset of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-02 10:40:56 +02:00
Piotr Jastrzebski	e5fb499736	sstables 3: support reading big sparse subsets of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-01 21:35:28 +02:00
Piotr Jastrzebski	24e9ab4ab6	sstables 3: add test for reading small subset of columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-01 21:34:03 +02:00
Piotr Jastrzebski	63d45c4f24	sstables 3: support reading small subsets of columns A small subset contains no more than 63 elements. Support for large subsets will come in the following patches. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-01 21:33:50 +02:00
Glauber Costa	7e3093709a	backlog: add level to write progress monitor For SSTables being written, we don't know their level yet. Add that information to the write monitor. New SSTables will always be at L0. Compacted SSTables will have their level determined by the compaction process. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-31 21:09:38 -04:00
Vladimir Krivopalov	b6511d1b07	tests: Add test covering compact table with non-full clustering key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 17:30:36 -07:00
Vladimir Krivopalov	47a7e78bc8	sstables: Improve clustering blocks writing, use logical clustering prefix size. In the Origin, the size of the clustering key prefix used during serialization is the actual length of the prefix and not the full size as defined in schema. So the code is fixed to align with that logic. This, in particular, is needed to write clustering blocks for RT markers. There is, however, a special case where the prefix size is smaller than that of a clustering key but we still need to serialize up to the full size. This is the case when a compact table is being used and some rows in it are added using incomplete clustering keys (containing null for trailing columns). In Cassandra, these prefixes still have a full length and missing columns are just set to 'null'. In our code those prefixes have their real length, but since we need to serialize beyond it, we pass a flag to indicate this. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 17:30:36 -07:00
Vladimir Krivopalov	3f404f19dc	tests: Add test covering large clustering keys (>32 columns) for SSTables 3.x Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 17:30:36 -07:00
Vladimir Krivopalov	487796de85	tests: Add unit test covering empty values in clustering key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 17:30:36 -07:00
Vladimir Krivopalov	0dadd4fdf3	sstables: Fix typo in clustering blocks write helper. What was supposed to be a remainder operation turned out to be a bitwise 'and'. This didn't show up in existing tests only because they all had non-empty clustering values. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-31 15:12:40 -07:00
Avi Kivity	aab6b0ee27	Merge "Introduce new in-memory representation for cells" from Paweł " This is the first part of the first step of switching Scylla. It covers converting cells to the new serialisation format. The actual structure of the cells doesn't differ much from the original one with a notable exception of the fact that large values are now fragmented and linearisation needs to be explicit. Counters and collections still partially rely on their old, custom serialisation code and their handling is not optimal (although not significantly worse than it used to be). The new in-memory representation allows objects to be of varying size and makes it possible to provide deserialisation context so that we don't need to keep in each instance of an IMR type all the information needed to interpret it. The structure of IMR types is described in C++ using some metaprogramming with the hopes of making it much easier to modify the serialisation format than it would be in case of open-coded serialisation functions. Moreover, IMR types can own memory thanks to a limited support for destructors and movers (the latter are not exactly the same thing as C++ move constructors hence a different name). This makes it (relatively) easy to ensure that there is an upper bound on the size of all allocations. For now the only things that are converted to the IMR are atomic_cells and collections, which means that the reduction in the memory footprint is not as big as it can be, but introducing the IMR is a big step on its own and also paves the way towards complete elimination of unbounded memory allocations. The first part of this patchset contains miscellaneous preparatory changes to various parts of the Scylla codebase. They are followed by introduction of the IMR infrastructure. Then the structure of cells is defined and all helper functions are implemented. Next are several treewide patches that mostly deal with propagating type information to the cell-related operations.
Finally, atomic_cell and collections are switched to use the new IMR-based cell implementation. The IMR is described in much more detail in imr/IMR.md added in "imr: add IMR documentation". Refs #2031. Refs #2409. perf_simple_query -c4, medians of 30 results: ./perf_base ./perf_imr diff read 308790.08 309775.35 0.3% write 402127.32 417729.18 3.9% The same with 1 byte values: ./perf_base1 ./perf_imr1 diff read 314107.26 314648.96 0.2% write 463801.40 433255.96 -6.6% The memory footprint is reduced, but that is partially due to removal of small buffer optimisation (whether it will be restored depends on the exact measurements of the performance impact). Generally, this series was not expected to make a huge difference as this would require converting whole rows to the IMR. Memory footprint: Before: mutation footprint: - in cache: 1264 - in memtable: 986 After: mutation footprint: - in cache: 1104 - in memtable: 866 Tests: unit (release, debug) " * tag 'imr-cells/v3' of https://github.com/pdziepak/scylla: (37 commits) tests/mutation: add test for changing column type atomic_cell: switch to new IMR-based cell reperesentation atomic_cell: explicitly state when atomic_cell is a collection member treewide: require type for creating collection_mutation_view treewide: require type for comparing cells atomic_cell: introduce fragmented buffer value interface treewide: require type to compute cell memory usage treewide: require type to copy atomic_cell treewide: require type info for copying atomic_cell_or_collection treewide: require type for creating atomic_cell atomic_cell: require column_definition for creating atomic_cell views tests: test imr representation of cells types: provide information for IMR data: introduce cell data: introduce type_info imr/utils: add imr object holder imr: introduce concepts imr: add helper for allocating objects imr: allow creating lsa migrators for IMR objects imr: introduce placeholders ...	2018-05-31 19:21:15 +03:00
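
Note on 6317bd45d7 (LCS backlog tracker): the backlog formula quoted in that commit, Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out plus an STCS term for L0, can be tried out with a small calculation. The following is only a minimal Python sketch, not Scylla's implementation. The function name `lcs_backlog`, the list-of-sizes input, and the choice to read max_levels as the index of the highest populated level (so that data in the last level contributes zero, as the commit message requires) are assumptions made for illustration. The smoothing that keeps the backlog from jumping when a new level appears is not modelled.

```python
def lcs_backlog(level_sizes, fan_out=10, stcs_backlog_in_l0=0):
    """Illustrative LCS backlog estimate (hypothetical helper, not Scylla code).

    level_sizes[L] is the total amount of data in level L (index 0 is L0).
    stcs_backlog_in_l0 stands in for the STCS backlog of SSTables piling up in L0.
    """
    populated = [lvl for lvl, size in enumerate(level_sizes) if size > 0]
    if not populated:
        return stcs_backlog_in_l0
    # Assumption: the zero-backlog point is "all data in the last populated level".
    max_level = populated[-1]
    leveled = sum(size * (max_level - lvl) * fan_out
                  for lvl, size in enumerate(level_sizes))
    return stcs_backlog_in_l0 + leveled


if __name__ == "__main__":
    # All data already in the last level: nothing left to push down.
    print(lcs_backlog([0, 0, 0, 1000]))
    # Some data still in L0 has three levels to travel.
    print(lcs_backlog([100, 0, 0, 1000]))
```

Under these assumptions, data sitting far from the last level dominates the estimate, which matches the intent described in 28382cb25c of using idle time to push data to higher levels.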

15660 Commits
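
Note on 1071e481ed (missing columns in SSTables 3.x): case 1 of that merge message, tables with fewer than 64 columns, uses a bit mask in which a column is present when `mask & (1 << <column number>) == 0`, i.e. a set bit marks an absent column. The following is a minimal Python sketch of that rule only. The helper names `encode_missing_mask` and `present_columns` are hypothetical, and the sketch ignores how the mask is actually serialized on disk and the list-based encodings used for larger tables.

```python
def encode_missing_mask(present, column_count):
    """Build a mask with bit n set for every column n that is absent.

    Hypothetical helper for illustration; only the small-table case
    (fewer than 64 columns) is covered.
    """
    if column_count >= 64:
        raise ValueError("bit mask encoding only applies to fewer than 64 columns")
    mask = 0
    for n in range(column_count):
        if n not in present:
            mask |= 1 << n
    return mask


def present_columns(mask, column_count):
    """Return the column numbers whose bit is clear, i.e. the present columns."""
    return [n for n in range(column_count) if (mask & (1 << n)) == 0]


if __name__ == "__main__":
    mask = encode_missing_mask({0, 2}, column_count=4)
    print(bin(mask))
    print(present_columns(mask, column_count=4))
```

With this convention a row that carries every column has a mask of zero, so the common "nothing missing" case is the cheapest one to represent.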