scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Duarte Nunes	6cb0bbd978	tests/mutation_test: Test xx_hasher alongside md5_hasher Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	20132fe1b5	schema: Remove unneeded include Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	d7af8ff0e0	service/storage_proxy: Enable hash caching Set the option that enables the underlying memtable and cache readers to request caching of a cell's hash, for requests that require a digest. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	0bab3e59c2	service/storage_service: Add and use xxhash feature We add a cluster feature that informs whether the xxHash algorithm is supported, and allow nodes to switch to it. We use a cluster feature because older versions are not ready to receive a different digest algorithm than MD5 when answering a data request. If we ever should add a new hash algorithm, we would also need to add a new cluster feature for that algorithm. The alternative would be to add code so a coordinator could negotiate what digest algorithm to use with the set of replicas it is contacting. Fixes #2884 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
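The feature-gated choice described in 0bab3e59c2 can be sketched roughly as below. This is a Python illustration, not Scylla's C++ code: the feature name `XXHASH`, the `node_features` mapping, and the shape of `digest_algorithm()` are assumptions made for the example. The point is only that a coordinator keeps using MD5 until every node advertises support for the new algorithm.

```python
from enum import Enum


class DigestAlgorithm(Enum):
    MD5 = "md5"
    XXHASH = "xxhash"


# Hypothetical feature name; the real cluster feature identifier may differ.
XXHASH_FEATURE = "XXHASH"


def digest_algorithm(node_features):
    """Pick the digest algorithm for the whole cluster.

    node_features maps a node name to the set of feature names it advertises.
    xxHash is used only if every known node supports it; otherwise we stay on
    MD5, because older nodes cannot answer with any other digest.
    """
    if node_features and all(XXHASH_FEATURE in f for f in node_features.values()):
        return DigestAlgorithm.XXHASH
    return DigestAlgorithm.MD5
```

During a rolling upgrade, `digest_algorithm({"n1": {"XXHASH"}, "n2": set()})` stays on MD5 and switches only once `n2` is upgraded as well.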
Duarte Nunes	440ea56010	message/messaging_service: Specify algorithm when requesting digest While not strictly needed, specify which algorithm to use when requesting a digest from a remote node. This is more flexible than relying on a cluster wide feature, although that's what we'll do in subsequent patches. It also makes the verb more consistent with the data request. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	1ee7413b6e	storage_proxy: Extract decision about digest algorithm to use Introduce the digest_algorithm() function, which encapsulates the decision of which digest algorithm to use. Right now it is set to MD5, but future patches will change this. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	712c051de6	cache_flat_mutation_reader: Pre-calculate cell hash When digest is requested, pre-calculate the cell's hash. We consider the case when the cell is already in the cache, and the case when it is added by the underlying reader. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	ec5b7fb553	partition_snapshot_reader: Pre-calculate cell hash When digest is requested, pre-calculate the cell's hash. A downside of this approach is that more work will be done when there are multiple versions of a row that contain values for the same cell, but we expect these cases to be rare and the upside of caching a cell's hash to compensate for the extra work. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	4ea2f52ddb	query::partition_slice: Add option to specify when digest is requested Having this option enables us to communicate from the upper to the lower layers whether a digest was requested, so that we can pre-calculate and cache a cell's hash in the readers that have access to the actual in-memory cells (within the memtable and the row cache). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	42f407ad9e	row: Use cached hash for hash calculation This entails doing the cell hash calculation slightly differently, where the cell is hashed individually, the resulting hash being added to the running one. Instead of propagating a flag all through the call chain, we detect whether we are in the new mode by the employed hash algorithm. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:49 +00:00
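The cached-hash scheme from 42f407ad9e and 99a3e3aa76 (each cell hashed individually, the result fed into the running hash, and the per-cell hashes stored beside the cells rather than inside them) might look like this in outline. This is a Python sketch under stated assumptions: `blake2b` stands in for xxHash, and `Row`, `cached_hashes`, and `hash_computations` are hypothetical names.

```python
import hashlib


def cell_hash(value):
    # Stand-in for a fast 64-bit hash; the commits use xxHash instead.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


class Row:
    def __init__(self, cells):
        self.cells = list(cells)
        # Hashes live next to the cells, so cell equality and the cell
        # buffers themselves are left untouched.
        self.cached_hashes = [None] * len(self.cells)
        self.hash_computations = 0

    def prepare_hashes(self):
        # Compute only the hashes that are not cached yet.
        for i, cell in enumerate(self.cells):
            if self.cached_hashes[i] is None:
                self.cached_hashes[i] = cell_hash(cell)
                self.hash_computations += 1

    def feed_hash(self, running):
        # Feed each cell's own hash into the running digest instead of
        # feeding the raw cell bytes.
        self.prepare_hashes()
        for h in self.cached_hashes:
            running.update(h.to_bytes(8, "little"))
```

Feeding the same row into two digests produces identical results while hashing each cell only once, which is the saving the series is after for repeated digest reads.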
Duarte Nunes	d773e4b9d4	mutation_partition: Replace hash_row_slice with appending_hash This enables us to only branch once per row on the actual hash algorithm, instead of once per row data item. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:49 +00:00
Duarte Nunes	99a3e3aa76	mutation_partition: Allow caching cell hashes We add storage to a row to hold the cached hashes of each individual cell. We don't store the hash in each cell because that would a) change the cell equality function, and b) require us to change a cell in a potentially fragmented buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:47 +00:00
Duarte Nunes	71ba99d53e	mutation_partition: Force vector_storage internal storage size This patch forces the size of vector_storage's internal storage to 5, meaning that the underlying managed_vector will ensure it doesn't need to externally allocate a buffer to hold the row, if only its first 5 cells are set. We define this size explicitly so we can change the vector's value type in upcoming patches without affecting the optimization. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:51 +00:00
Duarte Nunes	996e47a6f9	test.py: Increase memory for row_cache_stress_test Cells and rows will require more memory when we start caching the cell hash. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:51 +00:00
Duarte Nunes	7ba63b1521	atomic_cell_hash: Add specialization for atomic_cell_or_collection Replace the atomic_cell_or_collection::feed_hash() member function with the specialization of appending_hash, and use that instead. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:51 +00:00
Duarte Nunes	b2e1a91f4d	query-result: Use digester instead of md5_hasher Use the digester class instead of md5_hasher to encapsulate the decision of which hash algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	a0d748c71c	range_tombstone: Replace feed_hash() member function with appending_hash Replace range_tombstone::feed_hash() with the specialization of appending_hash, so that we can use the general feed_hash() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	12507fb9ce	keys: Replace feed_hash() member function with appending_hash Replace the feed_hash() member function of partition_key and clustering_key_prefix with the specialization of appending_hash, so that we can use the general feed_hash() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	6b4b429883	query-result: Introduce class result_options Introduce class result_options to carry result options through the request pipeline, which at this point mean the result type and the digest algorithm. This class allows us to encapsulate the concrete digest algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	041acb7aea	query: Add class to encapsulate digest algorithm This patch paves the way for us to encapsulate the actual digest algorithm used for a query. The digester class dispatches to a concrete implementation based on the digest algorithm being used. It wraps the xxHash algorithm to provide a 128 bit hash, which is the size of digest expected by the inter-node protocol. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	839ed4e3a4	md5_hasher: Extract hash size Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	5f6aab832b	digest_algorithm: Add xxHash option Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	c803ae24fc	digest: Introduce xxHash hash algorithm This patch introduces xx_hasher, a class conforming to the Hasher concept, which will be used to calculate the data digest in subsequent patches. It is expected to be an order of magnitude faster than md5. We use the 64 bit variant of the algorithm, the 128 bit one still being under development. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	4f0295a35c	CMakeLists: Add xxhash directory Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	edb9193c9c	configure.py: Configure xxhash Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Duarte Nunes	102cf40bb7	Add xxhash (fast non-cryptographic hash) as submodule Signed-off-by: Duarte Nunes <duarte@scylladb.com> Note: xxhash repo should be cloned to Scylla organization, and that git url should be used instead.	2018-02-01 00:22:50 +00:00
Avi Kivity	4463e9071a	Merge "Adding the API V2 Swagger definition file" from Amnon "This series adds the base for the V2 Swagger definition file. After the series, the definition file will be at: http://localhost:10000/v2 It can be used with the swagger ui, by replacing the url in the search path." * 'amnon/swagger_20' of github.com:scylladb/seastar-dev: Register the API V2 swagger file Adding the header part of the swagger2.0 API	2018-01-31 14:47:50 +02:00
Duarte Nunes	cf6110d840	tests/cell_locker_test: Ensure timeout test finishes in useful time Use saturating_subtract to prevent a really long timeout and having the test hang. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180130221336.1773-1-duarte@scylladb.com>	2018-01-31 11:34:08 +01:00
Duarte Nunes	01a8e5abb9	Merge 'Materialized views: add local locking' from Nadav "Before this patch set, our Materialized Views implementation can produce incorrect results when given concurrent updates of the same base-table row. Such concurrent updates may result, in certain cases, with two different rows in the view table, instead of just one with the latest data. In this series we add locking which serializes the two conflicting updates, and solves this problem. I explain in more detail why such locking is needed, and what kinds of locks are needed, in the third patch." * 'master' of https://github.com/nyh/scylla: Materialized views: serialize read-modify-update of base table Materialized views: test row_locker class Materialized views: implement row and partition locking mechanism	2018-01-30 17:40:12 +00:00
Tomasz Grabiec	cdd31918d0	Merge 'Make memtable reads exception safe' from Paweł These patches change the memtable reader implementation (in particular partition_snapshot_reader) so that the existing exception safety problems are fixed, but also in a way that, hopefully, would make it easier to reason about the error handling and avoid future bugs in that area. The main difficulty related to exception safety is that when an exception is thrown out of an allocating section that code is run again with increased memory reserved. If the retryable code has side effects it is very easy to get incorrect behaviour. In addition to that, entering an allocating section is not exactly cheap which encourages doing so rarely and having large sections. The approach taken by this series is to, first, make entering allocating sections cheaper and then reduce the amount of logic that runs inside of them to a minimum. This means that instead of entering a section once per call to flat_mutation_reader::fill_buffer() the allocation section is entered once for each emitted row. The only state modified from within the section are cached iterators to the current row, which are dropped on retry. Hopefully, this would make the reader code easier to reason about. The optimisations to the allocating sections and managed_bytes linearised context have successfully eliminated any penalty caused by much more fine grained allocating sections. Fixes #3123. Fixes #3133. 
Tests: unit-tests (release)

BEFORE
test                                iterations      median         mad         min         max
memtable.one_partition_one_row         1155362   869.139ns     0.282ns   868.465ns   873.253ns
memtable.one_partition_many_rows        127252     7.871us    15.252ns     7.851us     7.886us
memtable.many_partitions_one_row         58715    17.109us     2.765ns    17.013us    17.112us
memtable.many_partitions_many_rows        4839   206.717us   212.385ns   206.505us   207.448us

AFTER
test                                iterations      median         mad         min         max
memtable.one_partition_one_row         1194453   839.223ns     0.503ns   834.952ns   842.841ns
memtable.one_partition_many_rows        133785     7.477us     4.492ns     7.473us     7.507us
memtable.many_partitions_one_row         60267    16.680us    18.027ns    16.592us    16.700us
memtable.many_partitions_many_rows        4975   201.048us   144.929ns   200.822us   201.699us

         ./before_sq   ./after_sq   diff
read       337373.86    353694.24   4.8%
write      388759.99    394135.78   1.4%

* https://github.com/pdziepak/scylla.git memtable-exception-safety/v2:
  tests/perf: add microbenchmarks for memtable reader
  flat_mutation_reader: add allocation point in push_mutation_fragment
  linearization_context: remove non-trivial operations from fast path
  lsa: split alloc section into reserving and reclamation-disabled parts
  lsa: optimise disabling reclamation and invalidation counter
  mutation_fragment: allow creating clustering row in place
  paratition_snapshot_reader: minimise amount of retryable code
  memtable: drop memtable_entry::read()
  tests/memtable: add test for reader exception safety	2018-01-30 18:33:27 +01:00
Paweł Dziepak	1406ac5088	tests/memtable: add test for reader exception safety	2018-01-30 18:33:26 +01:00
Paweł Dziepak	ea7248056f	memtable: drop memtable_entry::read()	2018-01-30 18:33:26 +01:00
Paweł Dziepak	0420ca48a5	paratition_snapshot_reader: minimise amount of retryable code Retryable code that has side effects is a recipe for bugs. This patch reworks the snapshot reader so that the amount of logic run with reclamation disabled is minimal and has very limited side effects.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	b1cb7d214e	mutation_fragment: allow creating clustering row in place Moving clustering_row is expensive due to the amount of data stored internally. Adding a mutation_fragment constructor that builds a clustering_row in-place saves some of that moving.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	dcd79af8ed	lsa: optimise disabling reclamation and invalidation counter Most of the lsa gory details are hidden in utils/logalloc.cc. That includes the actual implementation of a lsa region: region_impl. However, there is code in the hot path that often accesses the _reclaiming_enabled member as well as its base class allocation_strategy. In order to optimise those accesses another class is introduced: basic_region_impl that inherits from allocation_strategy and is a base of region_impl. It is defined in utils/logalloc.hh so that it is publicly visible and its member functions are inlineable from anywhere in the code. This class is supposed to be as small as possible, but contain all members and functions that are accessed from the fast path and should be inlined.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	d825ae37bf	lsa: split alloc section into reserving and reclamation-disabled parts An allocating section reserves a certain amount of memory, then disables reclamation and attempts to perform the given operation. If that fails due to std::bad_alloc the reserve is increased and the operation is retried. Reserving memory is expensive while just disabling reclamation isn't. Moreover, the code that runs inside the section needs to be safely retryable. This means that we want the amount of logic running with reclamation disabled to be as small as possible, even if it means entering and leaving the section multiple times. In order to reduce the performance penalty of such a solution the memory reserving and reclamation disabling parts of the allocating sections are separated.	2018-01-30 18:33:26 +01:00
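The split described in d825ae37bf can be pictured with the following Python sketch. It is only a model under assumptions: `AllocatingSection`, its two methods, the doubling policy, and `read_rows()` are illustrative names, and `MemoryError` stands in for `std::bad_alloc`. The expensive reserve-and-retry loop is entered once, while the cheap reclamation-disabled part is entered once per emitted row.

```python
class AllocatingSection:
    def __init__(self, initial_reserve=1):
        self.reserve = initial_reserve
        self.reclaim_enabled = True

    def with_reserve(self, fn):
        # Expensive part: run fn, and on allocation failure grow the
        # reserve and run fn again from the start (fn must be retryable).
        while True:
            try:
                return fn()
            except MemoryError:
                self.reserve *= 2  # growth policy is an assumption

    def with_reclaiming_disabled(self, fn):
        # Cheap part: only flips a flag around the call.
        previous = self.reclaim_enabled
        self.reclaim_enabled = False
        try:
            return fn()
        finally:
            self.reclaim_enabled = previous


def read_rows(section, rows, needed):
    def emit(row):
        # Simulated allocation: fails while the reserve is too small.
        if section.reserve < needed:
            raise MemoryError
        return row

    def body():
        # One reclamation-disabled section per emitted row.
        return [section.with_reclaiming_disabled(lambda r=row: emit(r)) for row in rows]

    return section.with_reserve(body)
```

With an initial reserve of 1 and `needed=4`, the body is retried twice before succeeding, and reclamation is enabled again after every row.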
Paweł Dziepak	eb2e88e925	linearization_context: remove non-trivial operations from fast path Since linearization_context is thread_local every time it is accessed the compiler needs to emit code that checks if it was already constructed and does so if it wasn't. Moreover, upon leaving the context from the outermost scope the map needs to be cleared. All these operations impose some performance overhead and aren't really necessary if no buffers were linearised (the expected case). This patch rearranges the code so that linearization_context is trivially constructible and the map is cleared only if it was modified.	2018-01-30 18:33:25 +01:00
Paweł Dziepak	a1278b4d6a	flat_mutation_reader: add allocation point in push_mutation_fragment Exception safety tests inject a failure at every allocation and verify whether the error is handled properly. push_mutation_fragment() adds a mutation fragment to a circular_buffer; in theory any call to that function can result in a memory allocation, but in practice that depends on the implementation details. In order to improve the effectiveness of the exception safety tests this patch adds an explicit allocation point in push_mutation_fragment().	2018-01-30 18:33:25 +01:00
Paweł Dziepak	486e0d8740	tests/perf: add microbenchmarks for memtable reader	2018-01-30 18:33:25 +01:00
Avi Kivity	00d70080af	Merge "Consume promoted index incrementally" from Vladimir "This patchset makes index_reader consume promoted index incrementally on demand as the reader advances through the current partition instead of storing the entire promoted index which can be huge. When the current page is parsed, data for promoted indices are turned into input streams that are only read and parsed if a particular position within a partition is seeked for. This avoids potentially large allocations for big partitions." * 'issues/2981/v10' of https://github.com/argenet/scylla: Use advance_past for single partition upper bound. Remove obsolete types and methods. Simplify continuous_data_consumer::consume_input() interface. Parse promoted index entries lazily upon request rather than immediately. Add helper input streams: buffer_input_stream and prepended_input_stream. Support skipping over bytes from input stream in parsers based on continuous_data_consumer Add performance tests for large partition slicing using clustering keys.	2018-01-30 18:22:28 +02:00
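The incremental consumption described in 00d70080af can be illustrated with a small Python sketch. The fixed-size `(start_key, data_offset)` entry layout, the class name, and the `consumed` counter are assumptions made for the example; the real promoted index format is different. What the sketch shows is the idea of keeping the index as raw bytes and decoding entries only as far as a seek requires.

```python
import struct


class LazyPromotedIndex:
    # Hypothetical layout: consecutive (start_key, data_offset) pairs of
    # little-endian unsigned 64-bit integers, sorted by start_key.
    ENTRY = struct.Struct("<QQ")

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0          # byte position of the next unconsumed entry
        self._last_offset = 0  # offset of the last block at or before the key
        self.consumed = 0      # number of entries consumed so far

    def advance_to(self, key):
        """Return the data offset of the last block whose start key <= key.

        Entries are decoded one at a time and decoding stops at the first
        entry past the key, so the tail of a huge index is never touched.
        """
        while self._pos < len(self._raw):
            start, offset = self.ENTRY.unpack_from(self._raw, self._pos)
            if start > key:
                break
            self._last_offset = offset
            self._pos += self.ENTRY.size
            self.consumed += 1
        return self._last_offset
```

Because the reader only moves forward through a partition, each call resumes where the previous one stopped instead of re-reading earlier entries.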
Nadav Har'El	2ea1922a4d	Materialized views: serialize read-modify-update of base table Before this patch, our Materialized Views implementation can produce incorrect results when given concurrent updates of the same base-table row. Such concurrent updates may result, in certain cases, in two different rows added to the view table, instead of just one with the latest data. In this patch we add locking which serializes the two conflicting updates, and solves this problem. The locking for a single base-table column_family is implemented by the row_locker class introduced in a previous patch. A long comment in the code of this patch explains in more detail why this locking is needed, when, and what types of locks are needed: We sometimes need to lock a single clustering row, sometimes an entire partition, sometimes an exclusive lock and sometimes a shared lock. Fixes #3168 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:21:43 +02:00
Nadav Har'El	52e91623ce	Materialized views: test row_locker class This is a unit test for the row_locker facility. It tests various combinations of shared and exclusive locks on rows and on partitions, some should succeed immediately and some should block. This tests the row_locker's API only, it does not use or test anything in Materialized Views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:19:43 +02:00
Nadav Har'El	31d0a1dd0c	Materialized views: implement row and partition locking mechanism This patch adds a "row_locker" class providing locking (shard-locally) of individual clustering rows or entire partitions, and both exclusive and shared locks (a.k.a. reader/writer lock). As we'll see in a following patch, we need this locking capability for materialized views, to serialize the read-modify-update modifications which involve the same rows or partitions. The new row_locker is significantly different from the existing cell_locker. The two main differences are that 1. row_locker also supports locking the entire partition, not just individual rows (or cells in them), and that 2. row_locker supports also shared (reader) locks, not just exclusive locks. For this reason we opted for a new implementation, instead of making large modifications to the existing cell_locker. And we put the source files in the view/ directory, because row_locker's requirements are pretty specific to the needs of materialized views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:16:27 +02:00
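The combination of row-level and partition-level locks in shared and exclusive modes from 31d0a1dd0c can be modelled with the Python sketch below. It is a non-blocking toy (try-lock only, unlocking omitted), not the row_locker API. One labelled assumption: a row lock first takes its partition's lock in shared mode, which is what makes an exclusive partition lock conflict with any locked row. The commit message does not spell this out.

```python
class RWLock:
    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_lock(self, exclusive):
        # A writer excludes everyone; readers exclude only writers.
        if self.writer or (exclusive and self.readers):
            return False
        if exclusive:
            self.writer = True
        else:
            self.readers += 1
        return True


class RowLocker:
    def __init__(self):
        self._partitions = {}  # partition key -> RWLock
        self._rows = {}        # (partition key, clustering key) -> RWLock

    def try_lock_partition(self, pk, exclusive):
        return self._partitions.setdefault(pk, RWLock()).try_lock(exclusive)

    def try_lock_row(self, pk, ck, exclusive):
        # Assumption: take the partition lock in shared mode first.
        partition = self._partitions.setdefault(pk, RWLock())
        if not partition.try_lock(exclusive=False):
            return False
        if not self._rows.setdefault((pk, ck), RWLock()).try_lock(exclusive):
            partition.readers -= 1  # roll back the shared partition lock
            return False
        return True
```

Two exclusive locks on different rows of one partition can coexist here, while an exclusive lock on the whole partition has to wait for both, which is the kind of combination the accompanying unit test exercises.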
Takuya ASADA	bec2b015e3	dist/debian: link yaml-cpp statically To avoid incompatibility between distribution provided libyaml-cpp, link it statically. Fixes #3164 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1517313320-10712-1-git-send-email-syuu@scylladb.com>	2018-01-30 14:22:02 +02:00
Botond Dénes	b7d902a9e9	database: remove unused concurrency config members Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b257c7e9d403c55aaec34fc48863c18f9c9ae11a.1517314398.git.bdenes@scylladb.com>	2018-01-30 14:21:25 +02:00
Botond Dénes	71be2e1d0d	test.py: don't fail if test's exit code is not 0 on --help test.py invokes all test executables once with --help to determine whether it needs a -- to separate scylla args or not. For this check it doesn't matter what exit code the test exits with, so don't fail if it's not 0. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d05be7c3819349e3b22b6249bb83fbf9269d14cb.1517314408.git.bdenes@scylladb.com>	2018-01-30 14:21:01 +02:00
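A probe in the spirit of 71be2e1d0d might look like the following. This is a sketch, not the code from test.py: the function name, the marker text searched for in the help output, and the demo command are assumptions. The relevant detail is that the exit status of the `--help` run is deliberately not checked.

```python
import subprocess
import sys


def wants_double_dash(argv, marker=b"after the -- are ignored"):
    """Run `argv --help` and report whether the help text contains marker.

    subprocess.run() is called without check=True, so a non-zero exit code
    from the test executable does not raise and does not affect the result.
    """
    result = subprocess.run(
        argv + ["--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return marker in result.stdout


# Demo command: prints a help-like line and then exits with status 3.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; print('All the arguments after the -- are ignored'); sys.exit(3)",
]
```

`wants_double_dash(DEMO_CMD)` still reports a match even though the demo process exits with a failure status.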
Piotr Jastrzebski	d9415e8ed0	Remove unused consume_streamed_mutation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Tests: units (release) Message-Id: <fec7f2d01d42921270c90198a7b77b76960ff705.1517310923.git.piotr@scylladb.com>	2018-01-30 13:24:55 +02:00
Duarte Nunes	1e3fae5bef	db/schema_tables: Only drop UDTs after merging tables Dropping a user type requires that all tables using that type also be dropped. However, a type may appear to be dropped at the same time as a table, for instance due to the order in which a node receives schema notifications, or when dropping a keyspace. When dropping a table, if we build a schema in a shard through a global_schema_pointer, then we'll check for the existence of any user type the schema employs. We thus need to ensure types are only dropped after tables, similarly to how it's done for keyspaces. Fixes #3068 Tests: unit-tests (release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180129114137.85149-1-duarte@scylladb.com>	2018-01-30 12:07:04 +01:00
Avi Kivity	e1f4b06295	Merge seastar upstream * seastar 770c450...19efbd9 (3): > configure.py: add --static-yaml-cpp option to link libyaml-cpp statically > Merge 'Avoid kernel stalls due to fsync' from Avi > rwlock: add exception-safe lock/unlock alternative	2018-01-30 11:44:00 +02:00
Pekka Enberg	da06339b13	scripts/find-maintainer: Find subsystem maintainer This patch adds a scripts/find-maintainer script, similar to scripts/get_maintainer.pl in Linux, which looks up maintainers and reviewers for a specific file from a MAINTAINERS file. Example usage looks as follows: $ ./scripts/find-maintainer cql3/statements/create_view_statement.cc CQL QUERY LANGUAGE Tomasz Grabiec <tgrabiec@scylladb.com> [maintainer] Pekka Enberg <penberg@scylladb.com> [maintainer] MATERIALIZED VIEWS Duarte Nunes <duarte@scylladb.com> [maintainer] Pekka Enberg <penberg@scylladb.com> [maintainer] Nadav Har'El <nyh@scylladb.com> [reviewer] Duarte Nunes <duarte@scylladb.com> [reviewer] The main objective of this script is to make it easier for people to find reviewers and maintainers for their patches. Message-Id: <20180119075556.31441-1-penberg@scylladb.com>	2018-01-30 09:42:35 +00:00


14454 Commits