While working on another patch I noticed gcc producing dead functions. I am not sure
why gcc is doing that. Some of those functions are already placed in
independent sections, and so can be garbage collected by the linker.
This is a 1% text section reduction in scylla, from 39363380 to
38974324 bytes. There is no difference in the tps reported by
perf_simple_query.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200723152511.8214-1-espindola@scylladb.com>
In the patch "Add exception overloads for Dynamo types", Alternator's single
api_error exception type was replaced by a more complex hierarchy of types.
The implementation was not only longer and more complex to understand -
I believe it also negated an important observation:
The "api_error" exception type is special. It is not an exception created
by code for other code. It is not meant to be caught in Alternator code.
Instead, it is supposed to contain an error message created for the *user*,
containing one of the few supported exception "names" described
in the DynamoDB documentation, and a user-readable text message. Throwing
such an exception in Alternator code means the thrower wants the request
to abort immediately, and this message to reach the user. These exceptions
are not designed to be caught in Alternator code. Code should use other
exceptions - or alternatives to exceptions (e.g., std::optional) for
problems that should be handled before returning a different error to the
user. Moreover, "api_error" isn't just thrown as an exception - it can
also be returned by value in an executor::request_return_type - which is
another reason why it should not be subclassed.
For these reasons, I believe we should have a single api_error type, and
it's wrong to subclass it. So in this patch I am reverting the subclasses
and template added in the aforementioned patch.
Still, one correct observation made in that patch was that it is
inconvenient to type in DynamoDB exception names (no help from the editor
in completing those strings) and also error-prone. In this patch we
propose a different - simpler - solution to the same problem:
We add trivial factory functions, e.g., api_error::validation(std::string)
as a shortcut to api_error("ValidationException"). The new implementation
is easy to understand, and also more self-explanatory to readers:
It is now clear that "api_error::validation()" is actually a user-visible
"api_error", something which was obscured by the name validation_exception()
used before this patch.
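For illustration, here is a minimal sketch of what such a factory can look like (member names and exact signatures are illustrative; the real error.hh may differ):
    #include <string>
    #include <utility>
    // Sketch only: a single, final api_error type with trivial factory
    // functions instead of subclasses.
    class api_error final {
    public:
        std::string _type;  // one of the DynamoDB error "names"
        std::string _msg;   // user-readable message
        api_error(std::string type, std::string msg)
            : _type(std::move(type)), _msg(std::move(msg)) {}
        // Shortcut for api_error("ValidationException", msg):
        static api_error validation(std::string msg) {
            return api_error("ValidationException", std::move(msg));
        }
        // Shortcut for api_error("ResourceNotFoundException", msg):
        static api_error resource_not_found(std::string msg) {
            return api_error("ResourceNotFoundException", std::move(msg));
        }
    };
    // Usage: throw api_error::validation("missing key attribute");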
Finally, this patch also improves the comment in error.hh explaining the
purpose of api_error and the fact it can be returned or thrown. The fact
it should not be subclassed is legislated with a "final". There is also
no point in this class inheriting from std::exception or having virtual
functions, or an empty constructor - so all these are dropped as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Ninja has a special pool called console that causes programs in that
pool to output directly to the console instead of being logged. By
putting test.py in it, it is now possible to run just
$ ninja dev-test
And see the test.py output while it is running.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200716204048.452082-1-espindola@scylladb.com>
The targets {dev|debug|release}-test run all unit tests, including
alternator/run. But this test requires the Scylla executable, which
wasn't among the dependencies. Fix it by adding build/$mode/scylla to
the dependency list.
Fixes #6855.
Tests: `ninja dev-test` after removing build/dev/scylla
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6694
by Calle Wilund:
Implementation of DynamoDB streams using Scylla CDC.
Fixes #5065
Initial, naive implementation, insofar as it uses a 1:1 mapping of CDC stream to
DynamoDB shard, i.e., there are a lot of shards.
Includes tests verified against both local DynamoDB server and actual AWS
remote one.
Note:
Because of how data put is implemented in alternator, currently we do not
get "proper" INSERT labels for first write of data, because to CDC it looks
like an update. The test compensates for this, but actual users might not
like it.
The data model is now
bplus::tree<Key = int64_t, T = array<entry>>
where entry can be cache_entry or memtable_entry.
The whole thing is encapsulated into a collection called "double_decker"
from patch #3. The array<T> is an array of T-s with 0-bytes overhead used
to resolve hash conflicts (patch #2).
branch:
tests: unit(debug)
tests before v7:
unit(debug) for new collections, memtable and row_cache
unit(dev) for the rest
perf(dev)
* https://github.com/xemul/scylla/commits/row-cache-over-bptree-9:
test: Print more sizes in memory_footprint_test
memtable: Switch onto B+ rails
row_cache: Switch partition tree onto B+ rails
memtable: Count partitions separately
token: Introduce raw() helper and raw comparator
row-cache: Use ring_position_comparator in some places
dht: Detach ring_position_comparator_for_sstables
double-decker: A combination of B+tree with array
intrusive-array: Array with trusted bounds
utils: B+ tree implementation
test: Move perf measurement helpers into header
Add typed exception overloads for ValidationException, ResourceNotFoundException, etc.,
to avoid writing the explicit error type as a string everywhere (with the
potential for spelling errors ever present).
Also allows IntelliSense etc. to complete the exception name while coding.
The collection is a K:V store
bplus::tree<Key = K, Value = array_trusted_bounds<V>>
It will be used as partitions cache. The outer tree is used to
quickly map token to cache_entry, the inner array -- to resolve
(expected to be rare) hash collisions.
It also must be equipped with two comparators -- a less-only one for
keys and a full one for values. The latter is not kept on board,
but is required on all calls.
The core API consists of just 2 calls
- Heterogeneous lower_bound(search_key) -> iterator : finds the
element that's greater or equal to the provided search key
Other than the iterator the call returns a "hint" object
that helps the next call.
- emplace_before(iterator, key, hint, ...) : the call constructs
the element right before the given iterator. The key and hint
are needed for a more optimal algorithm, but are, strictly speaking, not
required.
Adding an entry to the double_decker may result in growing the
node's array. This is where the B+ iterator's .reconstruct() method
comes into play. The new array is created, old elements are
moved onto it, then the fresh node replaces the old one.
// TODO: Ideally this should be turned into the
// template <typename OuterCollection, typename InnerCollection>
// but for now the double_decker still has some intimate knowledge
// about what outer and inner collections are.
Insertion into this collection _may_ invalidate iterators, but
may leave them intact. Invalidation only happens in case of a hashing
conflict, which can be clearly seen from the hint object, so
there's good room for improvement.
The main usage by row_cache (the find_or_create_entry) looks like
    cache_entry find_or_create_entry() {
        bound_hint hint;
        auto it = lower_bound(decorated_key, &hint);
        if (!hint.found) {
            it = emplace_before(it, decorated_key.token(), hint,
                                <constructor args>);
        }
        return *it;
    }
Now the hint. It contains 3 booleans:
- match: set to true when the "greater or equal" condition
evaluated to "equal". This frees the caller from the need
to manually check whether the entry returned matches the
search key or the new one should be inserted.
This is the "!found" check from the above snippet.
To explain the next 2 bools, here's a small example. Consider
the tree containing two elements {token, partition key}:
{ 3, "a" }, { 5, "z" }
As the collection is sorted they go in the order shown. Next,
this is what the lower_bound would return for some cases:
{ 3, "z" } -> { 5, "z" }
{ 4, "a" } -> { 5, "z" }
{ 5, "a" } -> { 5, "z" }
As seen, the lower bound for those 3 elements is the same,
but the code flows for emplacing them before it differ drastically.
{ 3, "z" } : need to get previous element from the tree and
push the element to its vector's back
{ 4, "a" } : need to create new element in the tree and populate
its empty vector with the single element
{ 5, "a" } : need to put the new element in the found tree
element right before the found vector position
To make one of the above decisions the .emplace_before would need
to perform another set of comparisons of keys and elements.
Fortunately, the needed information was already known inside the
lower_bound call and can be reported via the hint.
That said,
- key_match: set to true if tree.lower_bound() found the element
for the Key (which is token). For above examples this will be
true for cases 3z and 5a.
- key_tail: set to true if the tree element was found, but when
comparing values from array the bounding element turned out
to belong to the next tree element and the iterator was ++-ed.
For above examples this would be true for case 3z only.
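To illustrate, here is a rough sketch of the decision emplace_before can make from these booleans (the struct and strings are illustrative, not the actual implementation):
    #include <cstdio>
    struct bound_hint {
        bool match;      // lower_bound landed on an equal element
        bool key_match;  // a tree element for the key (token) already exists
        bool key_tail;   // the bound fell past that element's array (iterator was ++-ed)
    };
    const char* emplace_action(const bound_hint& h) {
        if (h.match) {
            return "element already present, nothing to insert";
        }
        if (!h.key_match) {
            return "create a new tree element with a single-entry array";          // case { 4, "a" }
        }
        if (h.key_tail) {
            return "append to the back of the previous tree element's array";      // case { 3, "z" }
        }
        return "insert into the found tree element's array before the found spot"; // case { 5, "a" }
    }
    int main() {
        bound_hint h{false, true, true};
        std::printf("%s\n", emplace_action(h));
    }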
And last, but not least -- the "erase self" feature, which, given
only the cache_entry pointer at hand, removes it from the
collection. To make this happen we need to take two steps:
1. get the array the entry sits in
2. get the b+ tree node the vector sits in
Both methods are provided by array_trusted_bounds and bplus::tree.
So, when we need to get an iterator from the given T pointer, the algorithm
looks like
- Walk back the T array until hitting the head element
- Call array_trusted_bounds::from_element() getting the array
- Construct b+ iterator from obtained array
- Construct the double_decker iterator from b+ iterator and from
the number of "steps back" from above
- Call double_decker::iterator.erase()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A plain array of elements that grows and shrinks by
constructing the new instance from an existing one and
moving the elements from it.
Behaves similarly to vector's external array, but has
zero bytes of overhead. The array bounds (the 0-th and N-th
elements) are determined by checking the flags on the
elements themselves. For this the type must support
getters and setters for the flags.
To remove an element from the array there's also a nothrow
option that drops the requested element from the array,
shifts the elements to its right leftwards, and keeps the trailing
unused memory (the so-called "train") until reconstruction
or destruction.
Also comes with a lower_bound() helper that helps keep
the elements sorted, and a from_element() one that
returns a reference back to the array in which the element
sits.
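As a rough illustration of the "trusted bounds" idea (the flags and helpers below are illustrative, not the real intrusive-array interface):
    #include <cstddef>
    #include <cstdio>
    struct entry {
        int value;
        bool head;  // set on element 0
        bool tail;  // set on element N-1
    };
    // Walk back from any element to the start of its array -- the basis of
    // the from_element() helper mentioned above.
    const entry* array_begin(const entry* e) {
        while (!e->head) {
            --e;
        }
        return e;
    }
    std::size_t array_size(const entry* begin) {
        std::size_t n = 1;
        while (!begin[n - 1].tail) {
            ++n;
        }
        return n;
    }
    int main() {
        entry arr[3] = {{10, true, false}, {20, false, false}, {30, false, true}};
        const entry* b = array_begin(&arr[1]);
        std::printf("size=%zu first=%d\n", array_size(b), b->value);
    }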
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
// The story is at
// https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ
This is a B+ tree implementation which satisfies several specific requirements
to be suitable for row-cache usage.
1. Insert/Remove doesn't invalidate iterators
2. Elements should be LSA-compactable
3. Low overhead of data nodes (1 pointer)
4. External less-only comparator
5. As little actions on insert/delete as possible
6. Iterator walks the sorted keys
The design, briefly, is:
There are 3 types of nodes: inner, leaf and data. Inner and leaf
nodes keep a built-in array of N keys and N(+1) child nodes. Leaf nodes
sit in a doubly linked list. Data nodes live separately from the leaf
ones and keep pointers to them. The tree handle keeps pointers to the
root and the left-most and right-most leaves. Nodes do _not_ keep
pointers or references to the tree (except 3 of them, see below).
changes in v9:
- explicitly marked keys/kids indices with type aliases
- marked the whole erase/clear stuff noexcept
- disposers now accept object pointer instead of reference
- clear tree in destructor
- added more comments
- style/readability review comments fixed
Prior changes
**
- Add noexcepts where possible
- Restrict Less-comparator constraint -- it must be noexcept
- Generalized node_id
- Packed code for begin()/cbegin()
**
- Unsigned indices everywhere
- Cosmetics changes
**
- Const iterators
- C++20 concepts
**
- The index_for() implementation is templatized the other way
to make it possible for AVX key search specialization (further
patching)
**
- Insertion tries to push kids to siblings before split
Before this change insertion into a full node resulted in this
node being split into two equal parts. For a random-key stress
this behaviour gives a tree with ~2/3 of the nodes half-filled.
With this change, before splitting, the full node tries to push one
element to each of its siblings (if they exist and are not full).
This slows insertion down a bit (but it's still way faster than
std::set), but gives a 15% smaller total number of nodes.
- Iterator method to reconstruct the data at the given position
The helper creates a new data node, emplaces data into it and
replaces the iterator's one with it. Needed to keep arrays of
data in the tree.
- Milli-optimize erase()
- Return back an iterator that will likely not be re-validated
- Do not try to update ancestors' separation key for the leftmost kid
This caused clear()-like workloads to perform poorly as compared to
std::set. In particular the row_cache::invalidate() method does
exactly this and this change improves its timing.
- Perf test to measure drain speed
- Helper call to collect tree counters
**
- Fix corner case of iterator.emplace_before()
- Clean heterogeneous lookup API
- Handle exceptions from nodes allocations
- Explicitly mark places where the key is copied (for future)
- Extend the tree.lower_bound() API to report back whether
the bound hit the key or not
- Addressed style/cleanness review comments
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The default ninja build target now builds artifacts and packages. Let's
add a 'build' target that only builds the artifacts.
Message-Id: <20200714105042.416698-1-penberg@scylladb.com>
This adds a new "dist-<mode>" target, which builds the server package in
selected build mode together with the other packages, and wires it to
the "<mode>" target, which is built as part of default "ninja"
invocation.
This allows us to perform a full build, package, and test cycle across
all build modes with:
./configure.py && ninja && ./test.py
Message-Id: <20200713101918.117692-1-penberg@scylladb.com>
"
This is the first stage of replacing the existing restrictions code with a new representation. It adds a new class `expression` to replace the existing class `restriction`. Lots of the old code is deleted, though not all -- that will come in subsequent stages.
Tests: unit (dev, debug restrictions_test), dtest (next-gating)
"
* dekimir-restrictions-rewrite:
cql3/restrictions: Drop dead code
cql3/restrictions: Use free functions instead of methods
cql3/restrictions: Create expression objects
cql3/restrictions: Add free functions over new classes
cql3/restrictions: Add new representation
Instead of `restriction` class methods, use the new free functions.
Specific replacement actions are listed below.
Note that class `restrictions` (plural) remains intact -- both its
methods and its type hierarchy remain intact for now.
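To give a feel for the style, here is a hedged sketch (the types below are invented for the example and are not the actual cql3 classes): predicates become free functions over a value-type expression instead of virtual methods on a class hierarchy.
    #include <string>
    #include <variant>
    #include <vector>
    struct binary_operator { std::string column; std::string op; std::string rhs; };
    struct token_relation  { std::string op; std::string rhs; };
    using restriction_expr = std::variant<binary_operator, token_relation>;
    // Free function instead of a restriction::is_slice() virtual method.
    bool is_slice(const restriction_expr& e) {
        return std::visit([](const auto& r) { return r.op == "<" || r.op == ">"; }, e);
    }
    // has_slice() built on top of it, over a whole WHERE clause.
    bool has_slice(const std::vector<restriction_expr>& where_clause) {
        for (const auto& r : where_clause) {
            if (is_slice(r)) {
                return true;
            }
        }
        return false;
    }
    int main() {
        std::vector<restriction_expr> where = { binary_operator{"ck", ">", "5"}, token_relation{"=", "42"} };
        return has_slice(where) ? 0 : 1;
    }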
Ensure full test coverage of the replacement code with new file
test/boost/restrictions_test.cc and some extra testcases in
test/cql/*.
Drop some existing tests because they codify buggy behaviour
(reference #6369, #6382). Drop others because they forbid relation
combinations that are now allowed (eg, mixing equality and
inequality, comparing to NULL, etc.).
Here are some specific categories of what was replaced:
- restriction::is_foo predicates are replaced by using the free
function find_if; sometimes it is used transitively (see, eg,
has_slice)
- restriction::is_multi_column is replaced by dynamic casts (recall
that the `restrictions` class hierarchy still exists)
- utility methods is_satisfied_by, is_supported_by, to_string, and
uses_function are replaced by eponymous free functions; note that
restrictions::uses_function still exists
- restriction::apply_to is replaced by free function
replace_column_def
- when checking infinite_bound_range_deletions, the has_bound check is
replaced by the local free function bounded_ck
- restriction::bounds and restriction::value are replaced by the more
general free function possible_lhs_values
- using free functions allows us to simplify the
multi_column_restriction and token_restriction hierarchies; their
methods merge_with and uses_function became identical in all
subclasses, so they were moved to the base class
- single_column_primary_key_restrictions<clustering_key>::needs_filtering
was changed to reuse num_prefix_columns_that_need_not_be_filtered,
which uses free functions
Fixes #5799.
Fixes #6369.
Fixes #6371.
Fixes #6372.
Fixes #6382.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Merged patch series by Piotr Sarna:
The alternator project was in need of a more optimized
JSON library, which resulted in creating "rjson" helper functions.
Scylla generally used libjsoncpp for its JSON handling, but in order
to reduce the dependency hell, the usage is now migrated
to rjson, which is faster and offers the same functionality.
The original plan was to be able to drop the dependency
on libjsoncpp-lib altogether and remove it from install-dependencies.sh,
but one last usage of it remains in our test suite,
namely cql_repl. The tool compares its output JSON textually,
so it depends on how a library presents JSON - the delimiters,
indentation, etc. It's possible to provide a layer of translation
to force rjson to print in an identical format, but the other issue
is that libjsoncpp keeps subobjects sorted by their name,
while rjson uses an unordered structure.
There are two possible solutions for the last remaining usage
of libjsoncpp:
1. change our test suite to compare JSON documents with a JSON parser,
so that we don't rely on internal library details
2. provide a layer of translation which forces rjson to print
its objects in a format identical to libjsoncpp.
(1.) would be preferred, since now we're also vulnerable to changes
inside libjsoncpp itself - if they change anything in their output
format, tests would start failing. The issue is not critical however,
so it's left for later.
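For approach (1.), a minimal sketch of what the comparison could look like, assuming a rapidjson-based helper (rapidjson's document equality ignores object member order):
    #include <rapidjson/document.h>
    // Sketch only: parse both outputs and compare the parsed documents, so
    // member ordering and formatting differences no longer matter.
    bool json_equal(const char* a, const char* b) {
        rapidjson::Document da, db;
        da.Parse(a);
        db.Parse(b);
        if (da.HasParseError() || db.HasParseError()) {
            return false;
        }
        return da == db;
    }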
Tests: unit(dev), manual(json_test),
dtest(partitioner_tests.TestPartitioner.murmur3_partitioner_test)
Piotr Sarna (8):
alternator,utils: move rjson.hh to utils/
alternator: remove ambiguous string overloads in rjson
rjson: add parse_to_map helper function
rjson: add from_string_map function
rjson: add non-throwing parsing
rjson: move quote_json_string to rjson
treewide: replace libjsoncpp usage with rjson
configure: drop json.cc and json.hh helpers
alternator/base64.hh | 2 +-
alternator/conditions.cc | 2 +-
alternator/executor.hh | 2 +-
alternator/expressions.hh | 2 +-
alternator/expressions_types.hh | 2 +-
alternator/rmw_operation.hh | 2 +-
alternator/serialization.cc | 2 +-
alternator/serialization.hh | 2 +-
alternator/server.cc | 2 +-
caching_options.hh | 9 +-
cdc/log.cc | 4 +-
column_computation.hh | 5 +-
configure.py | 3 +-
cql3/functions/functions.cc | 4 +-
cql3/statements/update_statement.cc | 24 ++--
cql3/type_json.cc | 212 ++++++++++++++++++----------
cql3/type_json.hh | 7 +-
db/legacy_schema_migrator.cc | 12 +-
db/schema_tables.cc | 1 -
flat_mutation_reader.cc | 1 +
index/secondary_index.cc | 80 +++++------
json.cc | 80 -----------
json.hh | 113 ---------------
schema.cc | 25 ++--
test/boost/cql_query_test.cc | 9 +-
test/manual/json_test.cc | 4 +-
test/tools/cql_repl.cc | 1 +
{alternator => utils}/rjson.cc | 75 +++++++++-
{alternator => utils}/rjson.hh | 40 +++++-
29 files changed, 344 insertions(+), 383 deletions(-)
delete mode 100644 json.cc
delete mode 100644 json.hh
rename {alternator => utils}/rjson.cc (86%)
rename {alternator => utils}/rjson.hh (81%)
Thrift 0.12 includes a change [1] that avoids writing the generated output
if it has not changed. As a result, if you touch cassandra.thrift
(but do not change it), the generated files will not be updated, and
ninja will try to rebuild them every time. The compilation
of thrift files will be fast due to ccache, but still we will re-link
everything.
This touching of cassandra.thrift can happen naturally when switching
to a different git branch and then switching back. The net result
is that cassandra.thrift's contents have not changed, but its timestamp
has.
Fix by adding the "restat" option to the thrift rule. This instructs
ninja to check whether the output has changed as expected or not, and to
avoid unneeded rebuilds if it has not.
[1] https://issues.apache.org/jira/browse/THRIFT-4532
This patch set adds a few new features in order to fix issue
The list of changes is briefly as follows:
- Add a new `LWT` flag to `cql3::prepared_metadata`,
which allows clients to clearly distinguish between lwt and
non-lwt statements without the need to execute some custom parsing
logic (e.g. parsing the prepared query with regular expressions),
which is obviously quite fragile.
- Introduce the negotiation procedure for cql protocol extensions.
This is done via `cql_protocol_extension` enum and is expected
to have an appropriate mirroring implementation on the client
driver side in order to work properly.
- Implement a `LWT_ADD_METADATA_MARK` cql feature on top of the
aforementioned algorithm to make the feature negotiable and use
it conditionally (iff both server and client agree with each
other on the set of cql extensions).
The feature is meant to be further utilized by client drivers
to use primary replicas consistently when dealing with conditional
statements.
* git@github.com:ManManson/scylla feature/lwt_prepared_meta_flag_2:
lwt: introduce "LWT" flag in prepared statement metadata
transport: introduce `cql_protocol_extension` enum and cql protocol extensions negotiation
"
The snapshotting code is already well isolated from the rest of
the storage_service, so it's relatively easy to move it into an
independent component, thus de-bloating the storage_service.
As a side effect this allows painless removal of calls to the global
get_storage_service() from schema::describe code.
Test: unit(debug), dtest.snapshot_test(dev), manual start-stop
"
* 'br-snapshot-controller-4' of https://github.com/xemul/scylla:
snap: Get rid of storage_service reference in schema.cc
main: Stop http server
snapshot: Make check_snapshot_not_exist a method
snapshots: Move ops gate from storage_service
snapshot: Move lock from storage_service
snapshot: Move all code into db::snapshot_ctl class
storage_service: Move all snapshot code into snapshot-ctl.cc
snapshots: Initial skeleton
snapshots: Properly shutdown API endpoints
api: Rewrap set_server_snapshot lambda
As HACKING.md suggests, we now require gcc version >= 10. Set the
minimum at 10.1.1, as that is the first official 10 release:
https://gcc.gnu.org/releases.html
Tests: manually run configure.py and ensure it passes/fails
appropriately.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
A placeholder for snapshotting code that will be moved into it
from the storage_service.
Also -- pass it through the API for future use.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This patch series attempts to decouple package build and release
infrastructure, which is internal to Scylla (the company). The goal of
this series is to make it easy for humans and machines to build the full
Scylla distribution package artifacts, and make it easy to quickly
verify them.
The improvements to build system are done in the following steps.
1. Make scylla.git a super-module, which has git submodules for
scylla-jmx and scylla-tools. A clone of scylla.git is now all that
is needed to access all source code of all the different components
that make up a Scylla distribution, which is a preparatory step to
adding a "dist" ninja build target. A scripts/sync-submodules.sh helper
script is included, which allows easy updating of the submodules to the
latest head of the respective git repositories.
2. Make builds reproducible by moving the remaining relocatable package
specific build options from reloc/build_reloc.sh to the build system.
After this step, you can build the exact same binaries from the git
repository by using the dbuild version from scylla.git.
3. Add a "dist" target to ninja build, which builds all .rpm and .deb
packages with one command. To build a release, run:
$ ./tools/toolchain/dbuild ./configure.py --mode release
$ ./tools/toolchain/dbuild ninja-build dist
and you will now have .rpm and .deb packages to all the components of
a Scylla distribution.
4. Add a "dist-check" target to ninja build for verification of .rpm and
.deb packages in one command. To verify all the built packages, run:
$ ninja-build dist-check
Please note that you must run this step on the host, because the
target uses Docker under the hood to verify packages by installing
them on different Linux distributions.
Currently only CentOS 7 verification is supported.
All these improvements are done so that backward compatibility is
retained. That is, any existing release infrastructure or other build
scripts are completely unaffected.
Future improvements to consider:
- Package repository generation: add a "ninja repo" command to generate
.rpm and .deb repositories, which can be uploaded to a web site.
This makes it possible to build a downloadable Scylla distribution
from scylla.git. The target requires some configuration, which the user
has to provide. For example, download URL locations and package
signing keys.
- Amazon Machine Image (AMI) support: add a "ninja ami" command to
simplify the steps needed to generate a Scylla distribution AMI.
- Docker image support: add a "ninja docker" command to simplify the
steps needed to generate a Scylla distribution Docker image.
- Simplify and unify package build: simplify and unify the various shell
scripts needed to build packages in different git repositories. This
step will break backward compatibility and can be done only after the
relevant build scripts and release infrastructure are updated.
"
* 'penberg/packaging/v5' of github.com:penberg/scylla:
docs: Update packaging documentation
build: Add "dist-check" target
scripts/testing: Add "dist-check" for package verification
build: Add "dist" target
reloc: Add '--builddir' option to build_deb.sh
build: Add "-ffile-prefix-map" to cxxflags
docs: Document sync-submodules.sh script in maintainer.md
sync-submodules.sh: Add script for syncing submodules
Add scylla-tools submodule
Add scylla-jmx submodule
"
The "promoted index" is how the sstable format calls the clustering key index within a given partition.
Large partitions with many rows have it. It's embedded in the partition index entry.
Currently, lookups in the promoted index are done by scanning the index linearly, so a lookup
is O(N). For large partitions that's inefficient: it consumes a lot of both CPU and I/O.
We could do better and use binary search in the index. This patch series switches the mc-format
index reader to do that. Other formats use the old way.
The "mc" format promoted index has an extra structure at the end of the index called "offset map".
It's a vector of offsets of consecutive promoted index entries. This allows us to access random
entries in the index without reading the whole index.
The location of the offset entry for a given promoted index entry can be derived by knowing where
the offset vector ends in the index file, so the offset map also doesn't have to be read completely
into memory.
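To make the arithmetic concrete, here is an illustrative sketch (field widths and names are assumptions, not the exact "mc" on-disk layout) of how the offset map lets us locate and bisect blocks without parsing their predecessors:
    #include <cstdint>
    #include <functional>
    struct promoted_index_layout {
        uint64_t blocks_start;   // file position of the first promoted index block
        uint64_t offsets_end;    // file position right past the offset map
        uint32_t num_blocks;     // number of promoted index blocks
        // File position of the offset-map slot describing block i.
        uint64_t offset_slot_pos(uint32_t i) const {
            return offsets_end - uint64_t(num_blocks - i) * sizeof(uint32_t);
        }
    };
    // Classic lower-bound bisection over block indices. read_first_key(i)
    // stands for: read slot i, seek to the block it points at, parse the
    // block's first clustering position -- each probe touches O(1) pages.
    template <typename Key, typename ReadFirstKey, typename Less>
    uint32_t lower_bound_block(uint32_t num_blocks, const Key& key, ReadFirstKey read_first_key, Less less) {
        uint32_t lo = 0, hi = num_blocks;
        while (lo < hi) {
            uint32_t mid = lo + (hi - lo) / 2;
            if (less(read_first_key(mid), key)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;  // first block whose first key is >= key
    }
    int main() {
        // Toy example: 8 blocks whose first keys are 0, 10, 20, ...
        auto first_key = [](uint32_t i) { return int(i) * 10; };
        return lower_bound_block(8u, 25, first_key, std::less<int>{});
    }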
The most tricky part is caching. We need to cache blocks read from the index file to amortize the
cost of binary search:
- if the promoted index fits in the 32 KiB which was read from the index when looking for
the partition entry, we don't want to issue any additional I/O to search the promoted index.
- with large promoted indexes, the last few bisections will fall into the same I/O block and we
want to reuse that block.
- we don't want the cache to grow too big, we don't want to cache the whole promoted index
as the read progresses over the index. Scanning reads may skip multiple times.
This series implements a rather simple approach which meets all the
above requirements and is not worse than the current state of affairs:
- Each index cursor has its own cache of the index file area which corresponds to promoted index
This is managed by the cached_file class.
- Each index cursor has its own cache of parsed blocks. This allows the upper bound estimation to
reuse information obtained during lower bound lookup. This estimation is used to limit
read-aheads in the data file.
- Each cursor drops entries that it walked past so that memory footprint stays O(log N)
- Cached buffers are accounted to read's reader_permit.
Later, we could have a single cache shared by many readers. For that, we need to come up with an eviction
policy.
Fixes #4007.
TESTING RESULTS
* Point reads, large promoted index:
Config: rows: 10000000, value size: 2000
Partition size: 20 GB
Index size: 7 MB
Notes:
- Slicing read into the middle of partition (offset=5000000, read=1) is a clear win for the binary search:
time: 1.9ms vs 22.9ms
CPU utilization: 8.9% vs 92.3%
I/O: 21 reqs / 172 KiB vs 29 reqs / 3'520 KiB
It's 12x faster, CPU utilization is 10x smaller, disk utilization is 20x smaller.
- Slicing at the front (offset=0) is a mixed bag.
time is similar: 1.8ms
CPU utilization is 6.7x smaller for bsearch: 8.5% vs 57.7%
disk bandwidth utilization is smaller for bsearch but uses more IOs: 4 reqs / 320 KiB (scan) vs 17 reqs / 188 KiB (bsearch)
bsearch uses less bandwidth because the series reduces buffer size used for index file I/O.
scan is issuing:
2 * 128 KB (index page)
2 * 32 KB (data file)
bsearch is issuing:
1 * 64 KB (index page)
15 * 4 KB (promoted index)
1 * 64 KB (data file)
The 1 * 64 KB is chosen dynamically by seastar. Sometimes it chooses 2 * 32 KB (with read-ahead).
32 KB is the minimum I/O currently.
Disk utilization could be further improved by changing the way seastar's dynamic I/O adjustments work
so that it uses 1 * 4 KB when it suffices. This is left for the follow-up.
Command:
perf_fast_forward --datasets=large-part-ds1 \
--run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1
Before:
offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001836 172 1 545 9 563 175 4.0 4 320 2 2 0 1 1 0 0 0 57.7% 0
0 32 0.001858 502 32 17220 126 17776 11526 3.2 3 324 2 1 0 1 1 0 0 0 56.4% 0
0 256 0.002833 339 256 90374 427 91757 85931 7.0 7 776 3 1 0 1 1 0 0 0 41.1% 0
0 4096 0.017211 58 4096 237984 2011 241802 233870 66.1 66 8376 59 2 0 1 1 0 0 0 21.4% 0
5000000 1 0.022952 42 1 44 1 45 41 29.2 29 3520 22 2 0 1 1 0 0 0 92.3% 0
5000000 32 0.023052 43 32 1388 14 1414 1331 31.1 32 3588 26 2 0 1 1 0 0 0 91.7% 0
5000000 256 0.024795 41 256 10325 129 10721 9993 43.1 39 4544 29 2 0 1 1 0 0 0 86.4% 0
5000000 4096 0.038856 27 4096 105414 398 106918 103162 95.2 95 12160 78 5 0 1 1 0 0 0 61.4% 0
After (v2):
offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001831 248 1 546 21 581 252 17.6 17 188 2 0 0 1 1 0 0 0 8.5% 0
0 32 0.001910 535 32 16751 626 17770 13896 17.9 19 160 3 0 0 1 1 0 0 0 8.8% 0
0 256 0.003545 266 256 72207 2333 89076 62852 26.9 24 764 7 0 0 1 1 0 0 0 9.7% 0
0 4096 0.016800 56 4096 243812 524 245430 239736 83.6 83 8700 64 0 0 1 1 0 0 0 16.6% 0
5000000 1 0.001968 351 1 508 19 538 380 21.3 21 172 2 0 0 1 1 0 0 0 8.9% 0
5000000 32 0.002273 431 32 14077 436 15503 11551 22.7 22 268 3 0 0 1 1 0 0 0 8.9% 0
5000000 256 0.003889 257 256 65824 2197 81833 57813 34.0 37 652 18 0 0 1 1 0 0 0 11.2% 0
5000000 4096 0.017115 54 4096 239324 834 241310 231993 88.3 88 8844 65 0 0 1 1 0 0 0 16.8% 0
After (v1):
offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001886 259 1 530 4 545 261 18.0 18 376 2 2 0 1 1 0 0 0 9.1% 0
0 32 0.001954 513 32 16381 93 16844 15618 19.0 19 408 3 2 0 1 1 0 0 0 9.3% 0
0 256 0.003266 318 256 78393 1820 81567 61663 30.8 26 1272 7 2 0 1 1 0 0 0 10.4% 0
0 4096 0.017991 57 4096 227666 855 231915 225781 83.1 83 8888 55 5 0 1 1 0 0 0 15.5% 0
5000000 1 0.002353 232 1 425 2 432 232 23.0 23 396 2 2 0 1 1 0 0 0 8.7% 0
5000000 32 0.002573 384 32 12437 47 12571 429 25.0 25 460 4 2 0 1 1 0 0 0 8.5% 0
5000000 256 0.003994 259 256 64101 2904 67924 51427 37.0 35 1484 11 2 0 1 1 0 0 0 10.6% 0
5000000 4096 0.018567 56 4096 220609 448 227395 219029 89.8 89 9036 59 5 0 1 1 0 0 0 15.1% 0
* Point reads, small promoted index (two blocks):
Config: rows: 400, value size: 200
Partition size: 84 KiB
Index size: 65 B
Notes:
- No significant difference in time
- the same disk utilization
- similar CPU utilization
Command:
perf_fast_forward --datasets=large-part-ds1 \
--run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1
Before:
offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.000279 470 1 3587 31 3829 478 3.0 3 68 2 1 0 1 1 0 0 0 21.1% 0
0 32 0.000276 3498 32 116038 811 122756 104033 3.0 3 68 2 1 0 1 1 0 0 0 24.0% 0
0 256 0.000412 2554 256 621044 1778 732150 559221 2.0 2 72 2 0 0 1 1 0 0 0 32.6% 0
0 4096 0.000510 1901 400 783883 4078 819058 665616 2.0 2 88 2 0 0 1 1 0 0 0 36.4% 0
200 1 0.000339 2712 1 2951 8 3001 2569 2.0 2 72 2 0 0 1 1 0 0 0 17.8% 0
200 32 0.000352 2586 32 91019 266 92427 83411 2.0 2 72 2 0 0 1 1 0 0 0 20.8% 0
200 256 0.000458 2073 200 436503 1618 453945 385501 2.0 2 88 2 0 0 1 1 0 0 0 29.4% 0
200 4096 0.000458 2097 200 436475 1676 458349 381558 2.0 2 88 2 0 0 1 1 0 0 0 29.0% 0
After (v1):
Testing slicing of large partition using clustering keys:
offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.000278 492 1 3598 30 3831 500 3.0 3 68 2 1 0 1 1 0 0 0 19.4% 0
0 32 0.000275 3433 32 116153 753 122915 92559 3.0 3 68 2 1 0 1 1 0 0 0 22.5% 0
0 256 0.000458 2576 256 559437 2978 728075 504375 2.1 2 88 2 0 0 1 1 0 0 0 29.0% 0
0 4096 0.000506 1888 400 790064 3306 822360 623109 2.0 2 88 2 0 0 1 1 0 0 0 36.6% 0
200 1 0.000382 2493 1 2619 10 2675 2268 2.0 2 88 2 0 0 1 1 0 0 0 16.3% 0
200 32 0.000398 2393 32 80422 333 84759 22281 2.0 2 88 2 0 0 1 1 0 0 0 19.0% 0
200 256 0.000459 2096 200 435943 1608 453989 380749 2.0 2 88 2 0 0 1 1 0 0 0 30.5% 0
200 4096 0.000458 2097 200 436410 1651 455779 382485 2.0 2 88 2 0 0 1 1 0 0 0 29.2% 0
* Scan with skips, large index:
Config: rows: 10000000, value size: 2000
Partition size: 20 GB
Index size: 7 MB
Notes:
- Similar time, slightly worse for binary search: 36.1 s (scan) vs 36.4 (bsearch)
- Slightly more I/O for bsearch: 153'932 reqs / 19'703'260 KiB (scan) vs 155'651 reqs / 19'704'088 KiB (bsearch)
Binary search reads 828 KB and 1719 IOs more.
It does more I/O to read the promoted index offset map.
- similar (low) memory footprint. The danger here is that by caching index blocks which we touch as we scan
we would end up caching the whole index. But this is protected against by eviction as demonstrated by the
last "mem" column.
Command:
perf_fast_forward --datasets=large-part-ds1 \
--run-tests=large-partition-skips -c1 --test-case-duration=1
Before:
read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 36.103451 4 5000000 138491 38 138601 138453 153932.0 153932 19703260 153561 1 0 1 1 0 0 0 31.5% 502690
After (v2):
read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 37.000145 4 5000000 135135 6 135146 135128 155651.0 155651 19704088 138968 0 0 1 1 0 0 0 34.2% 0
After (v1):
read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 36.965520 4 5000000 135261 30 135311 135231 155628.0 155628 19704216 139133 1 0 1 1 0 0 0 33.9% 248738
Also in:
git@github.com:tgrabiec/scylla.git sstable-use-index-offset-map-v2
Tests:
- unit (all modes)
- manual using perf_fast_forward
"
* tag 'sstable-use-index-offset-map-v2' of github.com:tgrabiec/scylla:
sstables: Add promoted index cache metrics
position_in_partition: Introduce external_memory_usage()
cached_file, sstables: Add tracing to index binary search and page cache
sstables: Dynamically adjust I/O size for index reads
sstables, tests: Allow disabling binary search in promoted index from perf tests
sstables: mc: Use binary search over the promoted index
utils: Introduce cached_file
sstables: clustered_index: Relax scope of validity of entry_info
sstables: index_entry: Introduce owning promoted_index_block_position
compound_compat: Allow constructing composite from a view
sstables: index_entry: Rename promoted_index_block_position to promoted_index_block_position_view
sstables: mc: Extract parser for promoted index block
sstables: mc: Extract parser for clustering out of the promoted index block parser
sstables: consumer: Extract primitive_consumer
sstables: Abstract the clustering index cursor behavior
sstables: index_reader: Rearrange to reduce branching and optionals
This patch adds "-ffile-prefix-map" to cxxflags for all build modes.
This has two benefits:
1. Relocatable packages no longer have any special build flags, which
makes deeper integration with the build system possible (e.g.
targets for packages).
2. Builds are now reproducible, which makes debugging easier in case you
only have a backtrace, but no artifacts. Rafael explains:
"BTW, I think I found another argument for why we should always build
with -ffile-prefix-map=.
There was a use-after-free test failure on next promotion. I am unable
to reproduce it locally, so it would be super nice to be able to
decode the backtrace.
I was able to do it, but I had to create a
/jenkins/workspace/scylla-master/next/ directory and build from there
to get the same results as the bot."
Acked-by: Botond Dénes <bdenes@scylladb.com>
Acked-by: Nadav Har'El <nyh@scylladb.com>
Acked-by: Rafael Avila de Espindola <espindola@scylladb.com>
It is a read-through cache of a file.
Will be used to cache contents of the promoted index area from the
index file.
Currently, cached pages are evicted manually using the invalidate_*()
method family, or when the object is destroyed.
The cached_file represents a subset of the file. The reason for this
is to satisfy two requirements. One is that we have page-aligned
caching, where pages are aligned relative to the start of the
underlying file. This matches the requirements of the seastar I/O engine
for I/O requests. Another requirement is to have an effective way to
populate the cache using an unaligned buffer which starts in the
middle of the file when we know that we won't need to access bytes
located before the buffer's position. See populate_front(). If we
couldn't assume that, we wouldn't be able to insert an unaligned
buffer into the cache.
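A tiny sketch of the alignment rule (4 KiB page size and the names are just for illustration): pages are addressed relative to the start of the file, so a buffer that starts mid-page can only populate its first, partially covered page under the populate_front() promise that the earlier bytes will never be read.
    #include <cstdint>
    constexpr uint64_t page_size = 4096;  // illustrative page size
    // Page index of a file offset -- pages are aligned to the file start,
    // not to the start of the cached area.
    uint64_t page_of(uint64_t file_offset) {
        return file_offset / page_size;
    }
    // An unaligned buffer starting at buf_pos covers page page_of(buf_pos)
    // only from this in-page offset onward; skipping the leading bytes is
    // fine because populate_front() promises they won't be read.
    uint64_t first_covered_byte_in_page(uint64_t buf_pos) {
        return buf_pos % page_size;
    }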
The patch introduces two new features to aid with negotiating
protocol extensions for the CQL protocol:
- `cql_protocol_extensions` enum, which holds all supported
extensions for the CQL protocol (currently it contains only the
`LWT_ADD_METADATA_MARK` extension, which will be mentioned
below).
- An additional mechanism of negotiating cql protocol extensions
to be used in a client connection between a scylla server
and a client driver.
These extensions are propagated in the SUPPORTED message sent from the
server side with a "SCYLLA_" prefix and received back as a response
from the client driver in order to determine the intersection between
the cql extensions that are both supported by the server and
acknowledged by a client driver.
This intersection of features is later determined to be a working
set of cql protocol extensions in use for the current `client_state`,
which is associated with a particular client connection.
This way we can easily settle on the used extensions set on
both sides of the connection.
Currently there is only one value: `LWT_ADD_METADATA_MARK`, which
regulates whether to set a designated bit in prepared statement
metadata indicating if the statement at hand is an lwt statement
or not (actual implementation for the feature will be in a later
patch).
Each extension can also propagate some custom parameters to the
corresponding key. The CQL protocol specification allows sending
a list of values with each key in the SUPPORTED message; we use
that to pass parameters to extensions as `PARAM=VALUE` strings.
In the case of `LWT_ADD_METADATA_MARK` it's
`SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK`, which designates the
bitmask for the LWT flag in prepared statement metadata, to be
used for lookup in a client library. The associated bits of code in
`cql3::prepared_metadata` are adjusted to accommodate the feature.
The value for the flag is chosen on purpose to be the last bit
in the flags bitset since we don't want to clash with the
C* implementation in case they add more possible flag values to
prepared metadata (though there is an issue regarding that:
https://issues.apache.org/jira/browse/CASSANDRA-15746).
If it's fixed in upstream Cassandra, then we could synchronize
the value for the flag with them.
Also extend the underlying type of the `flag` enum in
`cql3::prepared_metadata` to be `uint32_t` instead of `uint8_t`,
because in either case the flags mask is serialized as a 32-bit integer.
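An illustrative sketch of the layout described above (the first flag value is a placeholder, not the real cql3::prepared_metadata contents):
    #include <cstdint>
    enum class flag : uint32_t {
        GLOBAL_TABLES_SPEC = 1u << 0,   // placeholder for the existing flags
        LWT                = 1u << 31,  // Scylla extension: the statement is conditional (LWT)
    };
    // The mask advertised via SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK would then
    // simply be the numeric value of the LWT bit:
    constexpr uint32_t lwt_mask = static_cast<uint32_t>(flag::LWT);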
In theory, shard-awareness extension support should also be
reworked in terms of the provided minimal infrastructure, but for the
sake of simplicity, this is left for a later follow-up.
This solution eliminates the need to assume that all the client
drivers follow the CQL spec carefully because scylla-specific
features and protocol extensions are enabled only when both the
server and the client driver negotiate the supported feature set.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
"
This series adds a pseudo-floating-point histogram implementation.
The histogram is used for time_estimated_histogram, a histogram for latency tracking, which is then used in storage_proxy as a more efficient, higher-resolution histogram.
A follow-up series will use the new histogram in other places in the system and will add an implementation that supports lower values.
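For intuition, here is a generic sketch of how such a pseudo-floating-point histogram typically maps a value to a bucket (the parameters are illustrative, not Scylla's actual layout): a power-of-two exponent plus a few mantissa bits gives exponentially growing bucket widths at a fixed relative resolution.
    #include <cstdint>
    constexpr unsigned mantissa_bits = 4;  // 16 sub-buckets per power of two
    unsigned bucket_index(uint64_t value) {
        if (value < (1u << mantissa_bits)) {
            return static_cast<unsigned>(value);  // small values map 1:1
        }
        unsigned exponent = 63 - __builtin_clzll(value);  // floor(log2(value)), gcc builtin
        uint64_t mantissa = (value >> (exponent - mantissa_bits)) & ((1u << mantissa_bits) - 1);
        return static_cast<unsigned>(((exponent - mantissa_bits + 1) << mantissa_bits) | mantissa);
    }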
Fixes #5815
Fixes #4746
"
* amnonh-quicker_estimated_histogram:
storage_proxy: use time_estimated_histogram for latencies
test/boost/estimated_histogram_test
utils/histogram_metrics_helper Adding histogram converter
utils/estimated_histogram: Adding approx_exponential_histogram
5ceb20c439 switched --enable-dpdk
to a tristate switch, but forgot that add_tristate() prepends
--enable and --disable itself; so now the switch looks like
--enable-enable-dpdk and --disable-enable-dpdk.
Fix by removing the "enable-" prefix.
"
This is part of the work for replacing global sstring variables with
constexpr std::string_view ones.
To have std::string_view values we have to convert a few APIs to take
std::string_view instead of sstring references.
The API conversions are complicated by the fact that
std::unordered_map doesn't support heterogeneous lookup, so we need
another hash map.
The one provided by abseil seems like a natural choice since it has an
API that looks like what is being proposed for c++
(http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1690r0.html)
but is also much faster.
A nice side effect is that this series is a 0.46% win in
perf_simple_query --duration 16 --smp 1 -m4G
measured over 500 runs with randomized section layout and environment
on each run.
"
* 'espindola/absl-v10' of https://github.com/espindola/scylla:
database: Use a flat_hash_map for _ks_cf_to_uuid
database: Use flat_hash_map for _keyspaces
Add absl wrapper headers
build: Link with abseil
cofigure: Don't overwrite seastar_cflags
Add abseil as a submodule
Using these instead of the absl headers directly adds support
for heterogeneous lookup with sstring as key.
There is no gain from having the hash function inline, so this
implements it in a .cc file.
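A rough sketch of the idea (using std::string in place of sstring, with illustrative functor names):
    #include <absl/container/flat_hash_map.h>
    #include <absl/hash/hash.h>
    #include <cstddef>
    #include <string>
    #include <string_view>
    // Transparent hash/eq let find() take a std::string_view (or, in the real
    // code, an sstring or string_view) without materializing a key object.
    struct sv_hash {
        using is_transparent = void;
        std::size_t operator()(std::string_view v) const { return absl::Hash<std::string_view>{}(v); }
    };
    struct sv_eq {
        using is_transparent = void;
        bool operator()(std::string_view a, std::string_view b) const { return a == b; }
    };
    int main() {
        absl::flat_hash_map<std::string, int, sv_hash, sv_eq> m;
        m.emplace("system", 1);
        std::string_view key = "system";
        return m.find(key) != m.end() ? 0 : 1;  // heterogeneous lookup: no std::string created
    }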
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
It is a pity we have to list so many libraries, but abseil doesn't
provide a .pc file.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The variable seastar_cflags was being used for flags passed to seastar
and for flags extracted from the seastar.pc file.
This introduces a new variable for the flags extracted from the
seastar.pc file.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
The cql_server and thrift are "owned" by storage_service for
the sake of managing those, i.e. starting and stopping. Since
other services (still) need the storage_service this creates
dependency loops.
This set makes the client services independent from the storage
service. As a consequence, the auth service is also removed
from storage_service and made standalone. This, in turn, frees
some tests from the need to start and stop auth and makes
one step towards NOT join_cluster()-ing in unit tests.
Also the set fixes a few weird races on scylla start and stop
that can trigger local_is_initialized() asserts, and one case of
unclear aborted shutdown where client services remain running
until the scylla process exits.
Yet another benefit is localization of "isolating" functionality
that sits deeper in storage_service than it should.
One thing that's not completely clean after this is the need for the cql
server to continue referencing the service_memory_limiter semaphore
from the storage_service, but this will go away with one of the
next sets.
tests: unit(debug), manual start-stop,
nodetool check of cql/thrift start/stop
"
* 'br-split-transport-1' of https://github.com/xemul/scylla:
storage_service: Isolate isolator
auth: Move away from storage_service
auth: Move start-stop code into main
main: Don't forget to stop cql/thrift when start is aborted
thrift_controller: Switch on standalone
thrift_controller: Pass one through management API
thrift_controller: Move the code into thrift/
thrift_controller: Introduce own lock for management
thrift: Wrap start/stop/is_running code into a class
cql_controller: Switch on standalone
cql_controller: Pass one through management API
cql_controller: Move the code into transport/
cql_controller: Introduce own lock for management
cql: Wrap start/stop/is_running code into a class
api: Tune reg/unreg of client services control endpoints
LCS and SCTS already have their own files, reducing the clutter in
compaction_strategy.cc. Do the same for TWCS. I am doing this in
preparation for adding more functions.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200611230906.409023-6-glauber@scylladb.com>