scylladb

Author	SHA1	Message	Date
Yaron Kaikov	4ae9a56466	release: prepare for 4.0.11	2020-10-26 18:12:47 +02:00
Avi Kivity	0374c1d040	Update seastar submodule * seastar 065a40b34a...748428930a (1): > append_challenged_posix_file_impl: allow destructing file with no queued work Fixes #7285.	2020-10-19 15:06:24 +03:00
Botond Dénes	9cb0fe3b33	reader_permit: reader_resources: make true RAII class Currently in all cases we first deduct the to-be-consumed resources, then construct the `reader_resources` class to protect them (release them on destruction). This is error prone as it relies on no exception being thrown while constructing the `reader_resources`. Although the `reader_resources` constructor is `noexcept` right now, this might change in the future, and as the call sites relying on this are disconnected from the declaration, whoever modifies it might not notice. To make this safe going forward, make the `reader_resources` a true RAII class, consuming the units in its constructor and releasing them in its destructor. Refs: #7256 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com> (cherry picked from commit `a0107ba1c6`) Message-Id: <20200924081408.236353-1-bdenes@scylladb.com>	2020-10-19 15:05:13 +03:00
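A minimal sketch of the RAII shape described above, assuming a simplified resource pool that tracks only a count and a memory total (`resource_pool` and `reader_resources_sketch` are hypothetical names, not Scylla's actual types). Deduction happens in the constructor and release in the destructor, so no code can run between deducting the units and protecting them.

```cpp
#include <cassert>

// Hypothetical stand-in for a semaphore's resource pool (count + memory).
struct resource_pool {
    long count = 0;
    long memory = 0;
};

// Sketch of a "true RAII" holder: units are consumed in the constructor and
// returned in the destructor, so there is no unprotected window in between.
class reader_resources_sketch {
    resource_pool* _pool;
    long _count;
    long _memory;
public:
    reader_resources_sketch(resource_pool& pool, long count, long memory)
        : _pool(&pool), _count(count), _memory(memory) {
        _pool->count -= _count;   // consume in the constructor
        _pool->memory -= _memory;
    }
    ~reader_resources_sketch() {
        _pool->count += _count;   // release in the destructor
        _pool->memory += _memory;
    }
    // Copying would release the same units twice, so forbid it.
    reader_resources_sketch(const reader_resources_sketch&) = delete;
    reader_resources_sketch& operator=(const reader_resources_sketch&) = delete;
};
```

Because the only way to obtain a `reader_resources_sketch` is through its constructor, a call site can no longer deduct units and then fail before the protecting object exists.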
Takuya ASADA	a813ff4da2	install.sh: set LC_ALL=en_US.UTF-8 on python3 thunk scylla-python3 segfaults when a non-default locale is specified. As a workaround, we need to set LC_ALL=en_US.UTF-8 in the python3 thunk. Fixes #7408 Closes #7414 (cherry picked from commit `ff129ee030`)	2020-10-18 15:03:04 +03:00
Avi Kivity	d5936147f4	Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz " The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table, so it is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get a schema version mismatch here after the base table was altered. That's because the view schema did not change when the base table was altered. Another problem was that view building was using the current table's schema to interpret the fragments and invoke view building. That's incorrect for two reasons. First, fragments generated by a reader must be accessed only using the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded view ptrs may no longer match the current base table schema, which is used to generate the view updates. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during the update. 
Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Fixes #7061. Tests: - unit (dev) - [v1] manual (reproduced using scylla binary and cqlsh) " * tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla: db: view: Refactor view_info::initialize_base_dependent_fields() tests: mv: Test dropping columns from base table db: view: Fix incorrect schema access during view building after base table schema changes schema: Call on_internal_error() when out of range id is passed to column_at() db: views: Fix undefined behavior on base table schema changes db: views: Introduce has_base_non_pk_columns_in_view_pk() (cherry picked from commit `3daa49f098`)	2020-10-06 17:12:28 +03:00
Juliusz Stasiewicz	a3d3b4e185	tracing: Fix error on slow batches `trace_keyspace_helper::make_slow_query_mutation_data` expected a "query" key in its parameters, which does not appear in the case of, e.g., batches of prepared statements. This is an example of a failing `record.parameters`: ``` ...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}, {"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}... ``` In such a case, Scylla recorded no trace and logged: ``` ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No "query" parameter set for a session requesting a slow_query_log record ``` The fix is to leave the query empty if it is not found. Users can still retrieve the query contents from the existing info. Fixes #5843 Closes #7293 (cherry picked from commit `0afa738a8f`)	2020-10-04 18:05:00 +03:00
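A sketch of the "leave the query empty if not found" fallback, assuming the trace parameters can be modeled as a plain `std::map<std::string, std::string>` (an illustrative stand-in; `query_or_empty` is a hypothetical helper, not the function the commit changed).

```cpp
#include <cassert>
#include <map>
#include <string>

// Look up the "query" parameter, returning an empty string instead of
// failing when it is absent (batches record "query[0]", "query[1]", ...).
std::string query_or_empty(const std::map<std::string, std::string>& parameters) {
    auto it = parameters.find("query");
    return it != parameters.end() ? it->second : std::string();
}
```

A batch of prepared statements then yields an empty query string, and the slow-query record is still written.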
Tomasz Grabiec	4ca2576c98	Merge "evictable_reader: validate buffer on reader recreation" from Botond This series backports the evictable reader validation patchset (merged as `97c99ea9f` to master) to 4.1. I only had to do changes to the tests. Tests: unit(dev), some exception safety tests are failing with or without my patchset Fixes: #7208 * https://github.com/denesb/scylla.git denesb/evictable-reader-validate-buffer/backport-4.1: mutation_reader_test: add unit test for evictable reader self-validation evictable_reader: validate buffer after recreation the underlying evictable_reader: update_next_position(): only use peek'd position on partition boundary mutation_reader_test: add unit test for evictable reader range tombstone trimming evictable_reader: trim range tombstones to the read clustering range position_in_partition_view: add position_in_partition_view before_key() overload flat_mutation_reader: add buffer() accessor (cherry picked from commit `7f3ffbc1c8`)	2020-10-02 11:52:57 +02:00
Tomasz Grabiec	e99a0c7b89	schema: Fix race in schema version recalculation leading to stale schema version in gossip Migration manager installs several feature change listeners: if (this_shard_id() == 0) { _feature_listeners.push_back(_feat.cluster_supports_view_virtual_columns().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_digest_insensitive_to_expiry().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_cdc().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_per_table_partitioners().when_enabled(update_schema)); } They will call update_schema_version_and_announce() when features are enabled, which does this: return update_schema_version(proxy, features).then([] (utils::UUID uuid) { return announce_schema_version(uuid); }); So it first updates the schema version and then publishes it via gossip in announce_schema_version(). It is possible that the announce_schema_version() part of the first schema change will be deferred and will execute after the other four calls to update_schema_version_and_announce(). It will install the old schema version in gossip instead of the more recent one. The fix is to serialize schema digest calculation and publishing. Fixes #7200 (cherry picked from commit `1a57d641d1`)	2020-10-01 18:18:53 +02:00
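A synchronous sketch of the serialization idea, with hypothetical names: computing the version and announcing it happen under one lock, so a delayed announcement can never overwrite a newer version. The commit message does not say which primitive the fix uses; in Seastar's asynchronous code a blocking `std::mutex` cannot be held across continuations, so an asynchronous lock held across both steps would have to play this role instead.

```cpp
#include <cassert>
#include <mutex>

// Both steps run under the same lock, so "compute" and "publish" for one
// caller cannot interleave with another caller's steps.
class schema_version_publisher {
    std::mutex _mutex;
    int _version = 0;
    int _announced = 0;
public:
    void update_and_announce() {
        std::lock_guard<std::mutex> guard(_mutex);
        int v = ++_version;   // step 1: compute the new version (digest)
        _announced = v;       // step 2: publish it (gossip)
    }
    int announced() {
        std::lock_guard<std::mutex> guard(_mutex);
        return _announced;
    }
};
```

After any sequence of calls, the announced value equals the most recently computed one.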
Yaron Kaikov	f8c7605657	release: prepare for 4.0.10	2020-09-28 20:33:24 +03:00
Avi Kivity	7b9e33dcd4	Update seastar submodule * seastar e87ce4941c...065a40b34a (1): > lz4_fragmented_compressor: Fix buffer requirements Fixes #6925.	2020-09-23 12:07:11 +03:00
Yaron Kaikov	d86a31097a	release: prepare for 4.0.9	2020-09-17 14:24:32 +03:00
Nadav Har'El	bd9d6f8e45	alternator: fix corruption of PutItem operation in case of contention This patch fixes a bug noted in issue #7218 - where PutItem operations sometimes lose part of the item's data - some attributes were lost, and the names of other attributes were replaced by empty strings. The problem happened when the write-isolation policy was LWT and there was contention of writes to the same partition (not necessarily the same item). To use CAS (a.k.a. LWT), Alternator builds an alternator::rmw_operation object with an apply() function which takes the old contents of the item (if needed) and a timestamp, and builds a mutation that the CAS should apply. In the case of the PutItem operation, we wrongly assumed that apply() will be called only once - so as an optimization the strings saved in the put_item_operation were moved into the returned mutation. But this optimization is wrong - when there is contention, apply() may be called again when the change proposed by the previous call was not accepted by the Paxos protocol. The fix is to change the one place where put_item_operation moved strings out of the saved operations into the mutations, to be a copy. But to prevent this sort of bug from recurring in future code, this patch enlists the compiler to help us verify that it can't happen: The apply() function is marked "const" - it can use the information in the operation to build the mutation, but it can never modify this information or move things out of it, so it will be fine to call this function twice. The single output field that apply() does write (_return_attributes) is marked "mutable" to allow the const apply() to write to it anyway. Because apply() might be called twice, it is important that if some apply() implementation sometimes sets _return_attributes, then it must always set it (even if to the default, empty, value) on every call to apply(). 
The const apply() means that the compiler verifies for us that I didn't forget to fix additional wrong std::move()s. Additionally, a test I wrote to easily reproduce issue #7218 (which I will submit as a dtest later) passes after this fix. Fixes #7218. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200916064906.333420-1-nyh@scylladb.com> (cherry picked from commit `5e8bdf6877`)	2020-09-16 23:05:23 +03:00
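A sketch of the `const apply()` technique with a hypothetical `put_item_operation_sketch` class (not Alternator's real type). One detail worth knowing: `std::move` on a member inside a `const` member function yields a `const` rvalue, which selects the copy constructor, so an overlooked `std::move` silently copies instead of stealing the saved data.

```cpp
#include <cassert>
#include <string>
#include <utility>

class put_item_operation_sketch {
    std::string _item;                        // saved request data
    mutable std::string _return_attributes;   // the only field apply() writes
public:
    explicit put_item_operation_sketch(std::string item) : _item(std::move(item)) {}

    // const: may read _item but cannot modify it or move out of it, so it is
    // safe to call again when a Paxos round is retried.
    std::string apply() const {
        std::string mutation = _item;   // copy, never move
        _return_attributes.clear();     // set on every call, even to "empty"
        return mutation;
    }
};
```

Calling `apply()` twice produces the same mutation both times, which is exactly what a retried LWT round needs.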
Benny Halevy	11ef23e97a	test: cql_query_test: test_cache_bypass: use table stats test is currently flaky since system reads can happen in the background and disturb the global row cache stats. Use the table's row_cache stats instead. Fixes #6773 Test: cql_query_test.test_cache_bypass(dev, debug) Credit-to: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200811140521.421813-1-bhalevy@scylladb.com> (cherry picked from commit `6deba1d0b4`)	2020-09-16 18:20:30 +03:00
Asias He	2c0eac09ae	migration_manager: Make sync_schema return error when node is down sync_schema is supposed to make sure that this node knows about all schema changes known by "nodes" that were made prior to this call. Currently, when a node is down, the sync is silently skipped. To fix, add a flag to migration_task::run_may_throw to indicate that it should fail if a node is down. Fixes #4791 (cherry picked from commit `7ba821cbc0`)	2020-09-16 16:01:44 +03:00
Dejan Mircevski	713a7269d0	cql3: Fix NULL reference in get_column_defs_for_filtering There was a typo in get_column_defs_for_filtering(): it checked the wrong pointer before dereferencing. Add a test exposing the NULL dereference and fix the typo. Tests: unit (dev) Fixes #7198. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `9d02f10c71`)	2020-09-16 15:47:09 +03:00
Avi Kivity	1724301d4d	reconcilable_result_builder: don't aggravate out-of-memory condition during recovery Consider an unpaged query that consumes all available memory, despite `fea5067dfa` which limits them (perhaps the user raised the limit, or this is a system query). Eventually we will see a bad_alloc which will abort the query and destroy this reconcilable_result_builder. During destruction, we first destroy _memory_accounter, and then _result. Destroying _memory_accounter resumes some continuations which can then allocate memory synchronously when increasing the task queue to accommodate them. We will then crash. Had we not crashed, we would immediately afterwards release _result, freeing all the memory that we would ever need. Fix by making _result the last member, so it is freed first. Fixes #7240. (cherry picked from commit `9421cfded4`)	2020-09-16 15:41:10 +03:00
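A small demonstration of the C++ rule this fix relies on: non-static data members are destroyed in reverse order of declaration, so the member declared last is destroyed first. The types and the logging helper are hypothetical scaffolding, not the real builder.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which members are destroyed.
inline std::vector<std::string>& destruction_log() {
    static std::vector<std::string> log;
    return log;
}

struct named_member {
    std::string name;
    explicit named_member(std::string n) : name(std::move(n)) {}
    ~named_member() { destruction_log().push_back(name); }
};

// _result is declared last, so it is destroyed (and its memory freed) first.
struct builder_sketch {
    named_member _memory_accounter{"_memory_accounter"};
    named_member _result{"_result"};
};
```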
Avi Kivity	9971f2f5db	Merge "Fix repair stalls in get_sync_boundary and apply_rows_on_master_in_thread" from Asias " This path set fixes stalls in repair that are caused by std::list merge and clear operations during test_latency_read_with_nemesis test. Fixes #6940 Fixes #6975 Fixes #6976 " * 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla: repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower repair: Use clear_gently in get_sync_boundary to avoid stall utils: Add clear_gently repair: Use merge_to_gently to merge two lists utils: Add merge_to_gently (cherry picked from commit `4547949420`)	2020-09-10 13:15:01 +03:00
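A sketch of the "gentle" clearing idea: destroy a large list in bounded batches and offer a yield point between batches, so no single step runs long enough to stall the reactor. The helper name and the plain callback are assumptions for illustration; the actual `clear_gently` and `merge_to_gently` utilities in this series are not shown in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>

// Hypothetical helper: pops at most `batch` elements at a time and calls
// `yield` between batches (in real async code, a preemption point).
template <typename T>
void clear_in_batches(std::list<T>& l, std::size_t batch, const std::function<void()>& yield) {
    if (batch == 0) {
        batch = 1;   // avoid an infinite loop on a zero batch size
    }
    while (!l.empty()) {
        for (std::size_t i = 0; i < batch && !l.empty(); ++i) {
            l.pop_front();
        }
        if (!l.empty()) {
            yield();
        }
    }
}
```

With 10 elements and a batch of 3, the list is cleared in four batches with three yields in between.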
Avi Kivity	ee328c22ca	repair: apply_rows_on_follower(): remove copy of repair_rows list We copy a list, which was reported to generate a 15ms stall. This is easily fixed by moving it instead, which is safe since this is the last use of the variable. Fixes #7115. (cherry picked from commit `6ff12b7f79`)	2020-09-10 11:53:55 +03:00
Avi Kivity	3a9c9a8a12	Update seastar submodule * seastar 861b7edd61...e87ce4941c (1): > core/reactor: complete_timers(): restore previous scheduling group Fixes #7184.	2020-09-07 11:28:55 +03:00
Raphael S. Carvalho	c03445871a	compaction: Prevent non-regular compaction from picking compacting SSTables After `8014c7124`, cleanup can potentially pick a compacting SSTable. Upgrade and scrub can also pick a compacting SSTable. The problem is that table::candidates_for_compaction() was badly named. It misleads the user into thinking that the SSTables returned are perfect candidates for compaction, but the manager still needs to filter out the compacting SSTables from the returned set. So it's being renamed. When the same SSTable is compacted in parallel, the strategy invariant can be broken (for example, overlapping SSTables can be introduced in LCS), and deletion can also fail, as more than one compaction process would try to delete the same files. Let's fix scrub, cleanup and upgrade by calling the manager function which gets the correct candidates for compaction. Fixes #6938. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com> (cherry picked from commit `11df96718a`)	2020-09-06 18:41:12 +03:00
Takuya ASADA	565ac1b092	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6991 (cherry picked from commit `7cccb018b8`)	2020-09-06 18:21:46 +03:00
Yaron Kaikov	7d1180b98f	release: prepare for 4.0.8	2020-08-30 09:42:34 +03:00
Piotr Sarna	f258e6f6ee	Merge 'counters: Fix filtering of counters' from Juliusz Queries with `ALLOW FILTERING` and constraints on counter values used to be rejected as "unimplemented". The reason was a missing tri-comparator, which is added in this patch. Fixes #5635 * jul-stas-5635-filtering-on-counters: cql/tests: Added test for filtering on counter columns counters: add comparator and remove `unimplemented` from restrictions (cherry picked from commit `c32faee657`)	2020-08-27 18:42:30 +03:00
Avi Kivity	2708b0d664	Merge "repair: row_level: prevent deadlocks when repairing homogenous nodes" from Botond " This series backports the series "repair: row_level: prevent deadlocks when repairing homogenous nodes" (merged as `a9c7a1a86`) to branch-4.1. " Fixes #6272 * 'repair-row-level-evictable-local-reader/branch-4.1' of https://github.com/denesb/scylla: repair: row_level: destroy reader on EOS or error repair: row_level: use evictable_reader for local reads mutation_reader: expose evictable_reader mutation_reader: evictable_reader: add auto_pause flag mutation_reader: make evictable_reader a flat_mutation_reader mutation_reader: s/inactive_shard_read/inactive_evictable_reader/ mutation_reader: move inactive_shard_reader code up mutation_reader: fix indentation mutation_reader: shard_reader: extract remote_reader as evictable_reader mutation_reader: reader_lifecycle_policy: make semaphore() available early (cherry picked from commit `59aa1834a7`)	2020-08-27 17:44:27 +03:00
Asias He	e31ffbf2e6	compaction_manager: Avoid stall in perform_cleanup The following stall was seen during a cleanup operation: scylla: Reactor stalled for 16262 ms on shard 4. | std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158 | (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602 | locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:? | (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56 | locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158 | locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158 | service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158 | service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158 | (inlined by) operator() at ./sstables/compaction_manager.cc:691 | (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286 | std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158 | (inlined by) compaction_manager::rewrite_sstables(table, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>) at ./sstables/compaction_manager.cc:604 | compaction_manager::perform_cleanup(table) at /usr/include/fmt/format.h:1158 To fix, we futurize the function that gets the local ranges and sstables. In addition, this patch removes the dependency on the global storage_service object. Fixes #6662 (cherry picked from commit `07e253542d`)	2020-08-27 13:11:39 +03:00
Raphael S. Carvalho	801994e299	sstables: optimize procedure that checks if a sstable needs cleanup needs_cleanup() returns true if a sstable needs cleanup. Turns out it's very slow because it iterates through all the local ranges for all sstables in the set, making its complexity: O(num_sstables * local_ranges) We can optimize it by taking into account that abstract_replication_strategy documents that get_ranges() will return a list of ranges that is sorted and non-overlapping. Compaction for cleanup already takes advantage of that when checking if a given partition can be actually purged. So needs_cleanup() can be optimized into O(num_sstables * log(local_ranges)). With num_sstables=1000, RF=3, then local_ranges=256(num_tokens)*3, it means the max # of checks performed will go from 768000 to ~9584. Fixes #6730. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629171355.45118-2-raphaelsc@scylladb.com> (cherry picked from commit `cf352e7c14`)	2020-08-27 13:11:37 +03:00
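A sketch of the O(log(local_ranges)) check, assuming simplified closed token intervals that do not wrap around the ring (`token_range` and `needs_cleanup_sketch` are hypothetical). Because the ranges are sorted and non-overlapping, `std::lower_bound` finds the only range that could contain the sstable's first token, and a single containment test decides the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified type: a closed token interval [start, end].
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

// Returns true if the sstable's token span [first, last] is not fully inside
// a single local range, i.e. it may hold data this node no longer owns.
bool needs_cleanup_sketch(const std::vector<token_range>& sorted_local_ranges,
                          std::int64_t first, std::int64_t last) {
    // First range whose end is >= first (ranges sorted, non-overlapping).
    auto it = std::lower_bound(sorted_local_ranges.begin(), sorted_local_ranges.end(), first,
        [] (const token_range& r, std::int64_t token) { return r.end < token; });
    if (it == sorted_local_ranges.end()) {
        return true;
    }
    return !(it->start <= first && last <= it->end);
}
```

With 768 local ranges, each sstable costs about log2(768), roughly 10 comparisons, instead of up to 768, which matches the reduction quoted in the message.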
Raphael S. Carvalho	3b932078bf	sstables: export needs_cleanup() May be needed elsewhere, like in a unit test. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629171355.45118-1-raphaelsc@scylladb.com> (cherry picked from commit `a9eebdc778`)	2020-08-27 13:11:24 +03:00
Asias He	608f62a0e9	abstract_replication_strategy: Add get_ranges_in_thread Add a version that runs inside a seastar thread. The benefit is that get_ranges can yield to avoid stalls. Refs #6662 (cherry picked from commit `94995acedb`)	2020-08-27 13:10:32 +03:00
Asias He	d8619d3320	abstract_replication_strategy: Add get_ranges which takes token_metadata It is useful when the caller wants to calculate ranges using a custom token_metadata. It will be used soon in do_rebuild_replace_with_repair for replace operation. Refs: #5482 (cherry picked from commit `b640614aa6`)	2020-08-27 13:10:26 +03:00
Asias He	4f0c99a187	gossip: Fix race between shutdown message handler and apply_state_locally 1. Node1 is shut down. 2. Node1 sends a shutdown message to node2. 3. Node2 receives the gossip shutdown message, but the handler yields. 4. Node1 is restarted. 5. Node1 sends a new gossip endpoint_state to node2; node2 applies the state in apply_state_locally, calls gossiper::handle_major_state_change and then calls gossiper::mark_alive. 6. The shutdown message handler from step 3 resumes and sets the status of node1 to SHUTDOWN. 7. The gossiper::mark_alive fiber from step 5 resumes and calls gossiper::real_mark_alive; node2 skips marking node1 as alive because the status of node1 is SHUTDOWN. As a result, node1 is alive but it is not marked as UP by node2. To fix, we serialize the two operations. Fixes #7032 (cherry picked from commit `e6ceec1685`)	2020-08-27 11:16:10 +03:00
Nadav Har'El	ada79df082	alternator test: configurable temporary directory The test/alternator/run script creates a temporary directory for the Scylla database in /tmp. The assumption was that this is the fastest disk (usually even a ramdisk) on the test machine, and we didn't need anything else from it. But it turns out that on some systems, /tmp is actually a slow disk, so this patch adds a way to configure the temporary directory - if the TMPDIR environment variable exists, it is used instead of /tmp. As before this patch, a temporary subdirectory is created in $TMPDIR, and this subdirectory is automatically deleted when the test ends. The test.py script already passes an appropriate TMPDIR (testlog/$mode), which after this patch the Alternator test will use instead of /tmp. Fixes #6750 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200713193023.788634-1-nyh@scylladb.com> (cherry picked from commit `8e3be5e7d6`)	2020-08-26 19:48:45 +03:00
Nadav Har'El	1935f2b480	alternator: fix order conditions on binary attributes We implemented the order operators (LT, GT, LE, GE, BETWEEN) incorrectly for binary attributes: DynamoDB requires that the bytes be treated as unsigned for the purpose of order (so byte 128 is higher than 127), but our implementation uses Scylla's "bytes" type which has signed bytes. The solution is simple - we can continue to use the "bytes" type, but we need to use its compare_unsigned() function, not its "<" operator. This bug affected conditional operations ("Expected" and "ConditionExpression") and also filters ("QueryFilter", "ScanFilter", "FilterExpression"). The bug did not affect Query's key conditions ("KeyConditions", "KeyConditionExpression") because those already used Scylla's key comparison functions - which correctly compare binary blobs as unsigned bytes (in fact, this is why we have the compare_unsigned() function). The patch also adds tests that reproduce the bugs in conditional operations, and show that the bug did not exist in key conditions. Fixes #6573 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200603084257.394136-1-nyh@scylladb.com> (cherry picked from commit `f6b1f45d69`) Manually removed tests in test_key_conditions.py which didn't exist in this branch.	2020-08-26 19:28:47 +03:00
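A sketch contrasting signed and unsigned byte ordering, using `std::string` as a stand-in for Scylla's `bytes` type (both helper names are hypothetical). With signed bytes 0x80 is -128 and sorts before 0x7f; unsigned comparison puts it after, which is the order DynamoDB requires.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Unsigned byte order: the ordering DynamoDB requires for Binary values.
bool binary_less_unsigned(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
        [] (char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}

// Signed byte order: the behavior the fix removed, shown for contrast.
bool binary_less_signed(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
        [] (char x, char y) {
            return static_cast<signed char>(x) < static_cast<signed char>(y);
        });
}
```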
Avi Kivity	44a76ed231	Merge "Unregister RPC verbs on stop" from Pavel E " There are 5 services that register their RPC handlers in the messaging service, but only a few of them unregister them on stop. Unregistering is somewhat critical, not just because it makes the code look clean, but also because unregistration does wait for the message processing to complete, thus avoiding use-after-frees in the handlers. In particular, several handlers call service::get_schema_for_write() which, in turn, may end up in service::maybe_sync() calling for the local migration manager instance. All those handlers' processing must be waited for before stopping the migration manager. The set brings the RPC handlers unregistration in sync with the registration part. tests: unit (dev) dtest (dev: simple_boot_shutdown, repair) start-stop by hands (dev) fixes: #6904 " * 'br-rpc-unregister-verbs' of https://github.com/xemul/scylla: main: Add missing calls to unregister RPC handlers messaging: Add missing per-service unregistering methods messaging: Add missing handlers unregistration helpers streaming: Do not use db->invoke_on_all in vain storage_proxy: Detach rpc unregistration from stop main: Shorten call to storage_proxy::init_messaging_service (cherry picked from commit `01b838e291`)	2020-08-26 14:42:40 +03:00
Raphael S. Carvalho	aeb49f4915	cql3/statements: verify that counter column cannot be added into non-counter table A check validating that a counter column cannot be added to a non-counter table is missing for the alter table statement. Validation is performed when building the new schema, but it's limited to checking that a schema will not contain both counter and non-counter columns. Due to the lack of validation, the added counter column could be incorrectly persisted to the schema, and this results in a crash when setting the new schema to its table. On restart, it can be confirmed that the schema change was indeed persisted when describing the table. This problem is fixed by doing proper validation for the alter table statement, which consists of making sure a new counter column cannot be added to a non-counter table. The test cdc_disallow_cdc_for_counters_test is adjusted because one of its tests was built on the assumption that a counter column can be added into a non-counter table. Fixes #7065. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200824155709.34743-1-raphaelsc@scylladb.com> (cherry picked from commit `1c29f0a43d`)	2020-08-25 18:46:01 +03:00
Gleb Natapov	8d6b35ad20	lwt: fix possible leak of "prune" counter If get_schema_for_read() fails, the "prune" counter will not be decremented. The patch fixes it by creating the RAII object earlier. Also restore the release of a mutation in release_mutation(), which was dropped by mistake. Fixes #6124 Message-Id: <20200405080233.GA22509@scylladb.com> (cherry picked from commit `e5f7ccc4c8`)	2020-08-23 19:29:06 +03:00
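A sketch of why constructing the guard earlier fixes the leak, using a hypothetical counter, guard, and throwing stand-in for `get_schema_for_read()`. Once the guard exists before the call that may throw, stack unwinding runs its destructor, so the counter is decremented on the exception path as well.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical global standing in for the "prune" in-flight counter.
inline int& prune_counter() {
    static int counter = 0;
    return counter;
}

// RAII guard: increments on construction, decrements on destruction.
struct prune_guard {
    prune_guard() { ++prune_counter(); }
    ~prune_guard() { --prune_counter(); }
    prune_guard(const prune_guard&) = delete;
    prune_guard& operator=(const prune_guard&) = delete;
};

// Stand-in for a schema lookup that may throw.
void get_schema_sketch(bool fail) {
    if (fail) {
        throw std::runtime_error("schema lookup failed");
    }
}

// The guard is created before the call that can throw.
void handle_prune(bool fail) {
    prune_guard guard;
    get_schema_sketch(fail);
    // ... prune work would go here ...
}
```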
Takuya ASADA	b123700ebe	dist/debian: disable debuginfo compression on .deb Since older binutils on some distributions are not able to handle compressed debuginfo generated on Fedora, we need to disable it. However, the Debian packager forces debuginfo compression since debian/compat = 9, so we have to uncompress the debuginfo after it is automatically compressed. Fixes #6982 (cherry picked from commit `75c2362c95`)	2020-08-23 19:03:13 +03:00
Botond Dénes	6786b521f9	scylla-gdb.py: find_db(): don't return current shard's database for shard=0 The `shard` parameter of `find_db()` is optional and is defaulted to `None`. When missing, the current shard's database instance is returned. The problem is that the if condition checking this uses `not shard`, which also evaluates to `True` if `shard == 0`, resulting in returning the current shard's database instance for shard 0. Change the condition to `shard is None` to avoid this. Fixes: #7016 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200812091546.1704016-1-bdenes@scylladb.com> (cherry picked from commit `4cfab59eb1`)	2020-08-23 18:56:39 +03:00
Botond Dénes	fda0d1ae8e	table: get_sstables_by_partition_key(): don't make a copy of selected sstables Currently we assign the reference to the vector of selected sstables to `auto sst`. This makes a copy and we pass this local variable to `do_for_each()`, which will result in a use-after-free if the latter defers. Fix by not making a copy and instead just keep the reference. Fixes: #7060 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200818091241.2341332-1-bdenes@scylladb.com> (cherry picked from commit `78f94ba36a`)	2020-08-19 00:02:22 +03:00
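A sketch of the `auto` deduction pitfall behind this bug, with a hypothetical `table_sketch`: `auto` drops the reference and makes a copy, whereas `const auto&` binds to the long-lived original. Only the binding is shown; the asynchronous `do_for_each()` deferral that turned the copy into a use-after-free is not modeled.

```cpp
#include <cassert>
#include <vector>

struct table_sketch {
    std::vector<int> selected;   // hypothetical long-lived sstable list
    const std::vector<int>& get_selected() const { return selected; }
};

// `auto` deduces std::vector<int>: a new local object that dies with the frame.
bool auto_makes_copy(const table_sketch& t) {
    auto copy = t.get_selected();
    return &copy != &t.get_selected();
}

// `const auto&` binds to the table's own vector; no copy is made.
bool auto_ref_aliases(const table_sketch& t) {
    const auto& ref = t.get_selected();
    return &ref == &t.get_selected();
}
```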
Yaron Kaikov	e7cffb978a	release: prepare for 4.0.7	2020-08-17 00:38:43 +03:00
Benny Halevy	79a1c74921	db::commitlog: close file if wrapping failed When an I/O error (e.g. EMFILE / ENOSPC) happens, we hit an assert in ~append_challenged_posix_file_impl(): Assertion `_closing_state == state::closed' failed. Commit `6160b9017d` added close on failure of the lambda defined in allocate_segment_ex, but it doesn't handle an error after the file is opened/created while it is wrapped with commitlog_file_extensions. Refs #5657 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Calle Wilund <calle@scylladb.com> Message-Id: <20200414115231.298632-1-bhalevy@scylladb.com> (cherry picked from commit `35892e4557`)	2020-08-16 19:58:23 +03:00
Calle Wilund	3ee854f9fc	cdc::log: Missing "preimage" check in row deletion pre-image Fixes #6561 Pre-image generation in the row deletion case only checked whether we had a pre-image result set row. But that row can come from the post-image. Also check the actual existence of the pre-image CK. Message-Id: <20200608132804.23541-1-calle@scylladb.com> (cherry picked from commit `5105e9f5e1`)	2020-08-12 13:55:10 +03:00
Avi Kivity	2b65984d14	Merge "Fix GCC-10 related bugs and fix deletion of temporary garbage-collected sstables" from Raphael " Temporary garbage-collected SSTables, involved in the incremental compaction process which can be enabled for LCS, were incorrectly invalidating the cache when added to the set of SSTables. Also, those same temporary SSTables could be incorrectly removed, causing deletion warnings. The patchset "Don't invalidate row cache when adding GC SSTable" fixes those two issues by using the SSTable replacement mechanism, which is the correct method for replacing SSTables in the set. " * 'backport_fix_issue_6275_for_branch_4_0' of github.com:raphaelsc/scylla: row_cache_alloc_stress_test: Make sure GCC can't delete a new tests: Wait for a few futures sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set sstables/compaction: Change meaning of compaction_completion_desc input and output fields sstables/compaction: Clean up code around garbage_collected_sstable_writer compaction: enhance compaction_descriptor with creator and replace function	2020-08-11 18:16:41 +03:00
Nadav Har'El	52d1099d09	Update Seastar submodule > http: add "Expect: 100-continue" handling Fixes #6844	2020-08-11 13:33:45 +03:00
Asias He	3a03906377	repair: Switch to btree_set for repair_hash. In one of the longevity tests, we observed a 1.3s reactor stall which came from repair_meta::get_full_row_hashes_source_op. It traced back to a call to std::unordered_set::insert() which triggered a big memory allocation and reclaim. I measured std::unordered_set, absl::flat_hash_set, absl::node_hash_set and absl::btree_set. The absl::btree_set was the only one that the seastar oversized allocation checker did not warn about in my tests, where around 300K repair hashes were inserted into the container. - unordered_set: hash_sets=295634, time=333029199 ns - flat_hash_set: hash_sets=295634, time=312484711 ns - node_hash_set: hash_sets=295634, time=346195835 ns - btree_set: hash_sets=295634, time=341379801 ns The btree_set is a bit slower than unordered_set, but it does not make huge memory allocations. I did not measure a real difference in the total time to finish repair of the same dataset with unordered_set and btree_set. To fix, switch to the absl btree_set container. Fixes #6190 (cherry picked from commit `67f6da6466`) (cherry picked from commit `a27188886a`)	2020-08-11 12:35:34 +03:00
Rafael Ávila de Espíndola	2395a240b4	build: Link with abseil It is a pity we have to list so many libraries, but abseil doesn't provide a .pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> (cherry picked from commit `7d1f6725dd`) Ref #6190.	2020-08-11 12:35:32 +03:00
Rafael Ávila de Espíndola	d182c595a1	Add abseil as a submodule This adds the https://abseil.io library as a submodule. The patch series that follows needs a hash table that supports heterogeneous lookup, and abseil has a really good hash table that supports that (https://abseil.io/blog/20180927-swisstables). The library is still not available in Fedora, but it is fairly easy to use it directly from a submodule. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> (cherry picked from commit `383a9c6da9`) Ref #6190	2020-08-11 12:35:31 +03:00
Rafael Ávila de Espíndola	fe9c4611b3	configure: Don't overwrite seastar_cflags The variable seastar_cflags was being used for flags passed to seastar and for flags extracted from the seastar.pc file. This introduces a new variable for the flags extracted from the seastar.pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> (cherry picked from commit `2ad09aefb6`) Ref #6190.	2020-08-11 12:35:28 +03:00
Calle Wilund	29df416720	database: Do not assert on replay positions if truncate does not flush Fixes #6995 In `c2c6c71` the assert on replay positions in flushed sstables discarded by truncate was broken by the fact that we no longer flush all sstables unless auto snapshot is enabled. This means the low_mark assertion does not hold, because we probably never got around to creating the sstables that would hold said mark. Note that the (old) change to not create sstables only to delete them is in itself good, but in that case we should not try to verify the rp mark. (cherry picked from commit `9620755c7f`)	2020-08-10 23:28:00 +03:00
Nadav Har'El	1d3c00572c	Update Seastar submodule with some backported fixes Fixes #7008 > futures_test: Don't use * on an optional without a value > net: Use offsetof instead of accessing a null pointer > allocator_test: Avoid undefined conversion > http: Don't use moved value > circular_buffer_fixed_capacity_test: Fix indentation > circular_buffer_fixed_capacity: Always mask indexes > rpc: Fix a use-after-return	2020-08-10 20:39:35 +03:00
Avi Kivity	9d6e2c5a71	Update seastar submodule * seastar 4ee384e15f...2dbd81d5db (1): > memory: fix small aligned free memory corruption Fixes #6831	2020-08-09 18:39:01 +03:00
Pavel Emelyanov	386741e3b7	storage_proxy_stats: Make get_ep_stat() noexcept The .get_ep_stat(ep) call can throw when registering metrics (we have an issue for it, #5697). This is not expected by its callers; in particular, abstract_write_response_handler::timeout_cb breaks in the middle and calls neither on_timeout() nor _proxy->remove_response_handler(), which leaves a response handler that is never removed or released. In turn, the unreleased response handler never sets the _ready future that response_wait() waits on, so the wait gets stuck. Although the issue with .get_ep_stat() should be fixed, an exception in it mustn't lead to deadlocks, so the fix is to make get_ep_stat() noexcept by catching the exception and returning a dummy stat object instead, to let the caller(s) finish. Fixes #5985 Tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200430163639.5242-1-xemul@scylladb.com> (cherry picked from commit `513ce1e6a5`)	2020-08-09 18:18:50 +03:00
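A minimal C++ sketch of the pattern described in the commit above: a lookup that may throw while registering metrics is wrapped so the function can be `noexcept`, handing back a shared dummy stats object on failure so callers always run to completion. The names `stats_registry`, `ep_stats`, `register_endpoint()` and the `fail_registration` switch are hypothetical stand-ins, not Scylla's actual storage_proxy_stats types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-endpoint statistics.
struct ep_stats {
    uint64_t writes = 0;
};

class stats_registry {
    std::map<std::string, ep_stats> _stats;
    ep_stats _dummy;            // absorbs updates when registration fails
    bool _fail_registration;    // hypothetical switch to simulate the failure

    // Stand-in for metric registration, which may throw.
    ep_stats& register_endpoint(const std::string& ep) {
        if (_fail_registration) {
            throw std::runtime_error("metric registration failed");
        }
        return _stats[ep];
    }

public:
    explicit stats_registry(bool fail_registration = false)
        : _fail_registration(fail_registration) {}

    // Never throws: a caller such as a timeout callback cannot be cut short.
    ep_stats& get_ep_stat(const std::string& ep) noexcept {
        try {
            auto it = _stats.find(ep);
            return it != _stats.end() ? it->second : register_endpoint(ep);
        } catch (...) {
            return _dummy;
        }
    }

    bool has_endpoint(const std::string& ep) const { return _stats.count(ep) != 0; }
};
```

Returning a reference to a member dummy keeps the call site unchanged; updates made during a registration failure are simply lost rather than aborting the caller.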
Avi Kivity	d0fdc3960a	Merge 'hinted handoff: fix commitlog memory leak' from Piotr D " When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This PR prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, Fixes #6776 " * piodul-fix-commitlog-memory-leak-in-hinted-handoff: hinted handoff: disable warnings about segments left on disk hinted handoff: release memory on commitlog termination (cherry picked from commit `4c221855a1`)	2020-08-09 17:26:17 +03:00
Tomasz Grabiec	4035cf4f9f	thrift: Fix crash on unsorted column names in SlicePredicate The column names in SlicePredicate can be passed in arbitrary order. We converted them to clustering ranges in read_command preserving the original order. As a result, the clustering ranges in read command may appear out of order. This violates the storage engine's assumptions and leads to undefined behavior. It was seen manifesting as a SIGSEGV or an abort in sstable reader when executing a get_slice() thrift verb: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed. Fixes #6486. Tests: - added a new dtest to thrift_tests.py which reproduces the problem Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `bfd129cffe`)	2020-08-08 19:48:46 +03:00
Rafael Ávila de Espíndola	09367742b1	row_cache_alloc_stress_test: Make sure GCC can't delete a new We want to test that a std::bad_alloc is thrown, but GCC 10 has a new optimization (-fallocation-dce) that removes dead allocations. This patch assigns the value returned by new to a global so that GCC cannot delete it. With this all tests in a dev build pass with GCC 10. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200424201531.225807-1-espindola@scylladb.com> (cherry picked from commit `0d89bbd57f`)	2020-08-07 16:49:33 -03:00
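A sketch of the technique in the commit above: storing the pointer returned by `new` in a global makes the allocation observable, so an optimization such as GCC 10's `-fallocation-dce` cannot drop it and a test can still observe `std::bad_alloc`. The names `g_allocation_sink` and `allocation_throws_bad_alloc()` are hypothetical; the real test code lives in row_cache_alloc_stress_test. It assumes a 64-bit platform where an allocation of about half the address space fails.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Global sink: the allocation escapes the function, so it is not dead code.
char* g_allocation_sink = nullptr;

// Returns true if allocating n bytes throws std::bad_alloc.
bool allocation_throws_bad_alloc(std::size_t n) {
    try {
        g_allocation_sink = new char[n];
    } catch (const std::bad_alloc&) {
        return true;
    }
    delete[] g_allocation_sink;
    g_allocation_sink = nullptr;
    return false;
}
```

Without the store, `new char[n]` followed by nothing that reads the result is a candidate for removal, and the expected exception would never be thrown.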
Rafael Ávila de Espíndola	a18ff57b29	tests: Wait for a few futures GCC 10 now warns on these. This fixes the dev build with gcc 10. backport note: remove unneeded change which is not compatible with the branch in error_injection_test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200424161006.17857-1-espindola@scylladb.com> (cherry picked from commit `543a9ebd9b`)	2020-08-07 16:32:12 -03:00
Raphael S. Carvalho	4734ba21a7	sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set Garbage collected SSTable is incorrectly added to SSTable set with a function that invalidates row cache. This problem is fixed by adding GC SSTable to the set using the mechanism which replaces old sstables with new sstables. Also, adding GC SSTable to set in a separate call is not correct. We should make sure that GC SSTable reaches the SSTable set at the same time its respective old (input) SSTable is removed from the set, and that's done using a single request call to table. Fixes #5956. Fixes #6275. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `a214ccdf89`)	2020-08-06 19:08:46 -03:00
Raphael S. Carvalho	425af4c543	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTables that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to the SSTable set using the same call which replaces input SSTables with output SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `8f4458f1d5`)	2020-08-06 18:51:21 -03:00
Raphael S. Carvalho	55f096d01b	sstables/compaction: Clean up code around garbage_collected_sstable_writer This cleanup allows us to get rid of the ugly compaction::create_new_sstable(), and reduce complexity by getting rid of observable. garbage_collected_sstable_writer::data is introduced to allow compaction to directly communicate with the GC writer, which is stored in mutation_compaction, making it unreachable after the compaction has started. By making compaction store GC writer's data and using that same data to create garbage_collected_sstable_writer, compaction is able to communicate with GC writer without the complexity of observable utility. This move is important for the subsequent work which will fix a couple of issues regarding management of GC SSTables. [Backport note: there were a few conflicts as this patch was written after interposer consumer, but the conflicts weren't hard to solve] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `cc5e0d8da8`)	2020-08-06 18:01:12 -03:00
Glauber Costa	fc79da5912	compaction: enhance compaction_descriptor with creator and replace function There are many differences between resharding and compaction that are artificial, arising more from the way we ended up implementing it than necessity. This patch attempts to pass the creator and replacer functions through the compaction_descriptor. There is a difference between the creator function for resharding and regular compaction: resharding has to pass the shard number on behalf of which the SSTable is created. However regular compactions can just ignore this. No need to have a special path just for this. After this is done, the constructor for the compaction object can be greatly simplified. In further patches I intend to simplify it a bit further, but some more cleanup has to happen first. To make that happen we have to construct a compaction_descriptor object inside the resharding function. This is temporary: resharding currently works with a descriptor, but at some point that descriptor is lost and broken into pieces to be passed to this function. The overarching goal of this work is exactly to be able to keep that descriptor for as long as possible, which should simplify things a lot. Callers are patched, but there are plenty in sstable_datafile_test.cc. For their benefit, a helper function is provided to keep the previous signature (test only). Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit `e8801cd77b`)	2020-08-06 17:45:40 -03:00
Yaron Kaikov	da9e7080ca	release: prepare for 4.0.6	2020-08-06 14:18:11 +03:00
Takuya ASADA	01b0195c22	scylla_util.py: always use relocatable CLI tools On some CLI tools, command options may differ between the latest version and older versions. To maximize compatibility of setup scripts, we should always use relocatable CLI tools instead of the distribution version of the tool. Related #6954 (cherry picked from commit `a19a62e6f6`)	2020-08-03 10:42:14 +03:00
Takuya ASADA	d05b567a40	create-relocatable-package.py: add lsblk for relocatable CLI tools We need the latest version of lsblk, which supports the partition type UUID. Fixes #6954 (cherry picked from commit `6ba2a6c42e`)	2020-08-03 10:42:12 +03:00
Juliusz Stasiewicz	2c11efbbae	aggregate_fcts: Use per-type comparators for dynamic types For collections and UDTs the `MIN()` and `MAX()` functions are generated on the fly. Until now they worked by comparing just the byte representations of arguments. This patch uses specific per-type comparators to provide semantically sensible, dynamically created aggregates. Fixes #6768 (cherry picked from commit `5b438e79be`)	2020-08-03 10:26:28 +03:00
Calle Wilund	c60d71dc69	cql3::lists: Fix setter_by_uuid not handling null value Fixes #6828 When using the scylla list index from UUID extension, null values were not handled properly, causing throws from the underlying layer. (cherry picked from commit `3b74b9585f`)	2020-08-03 10:20:28 +03:00
Takuya ASADA	79930048db	scylla_post_install.sh: generate memory.conf for CentOS7 On CentOS7, systemd does not support percentage-based parameter. To apply memory parameter on CentOS7, we need to override the parameter in bytes, instead of percentage. Fixes #6783 (cherry picked from commit `3a25e7285b`)	2020-07-30 16:41:40 +03:00
Tomasz Grabiec	82b4f4a6c2	commitlog: Fix use-after-free on mutation object during replay The mutation object may be freed prematurely during commitlog replay in the schema upgrading path. We will hit the problem if the memtable is full and apply_in_memory() needs to defer. This will typically manifest as a segfault. Fixes #6953 Introduced in `79935df` Tests: - manual using scylla binary. Reproduced the problem then verified the fix makes it go away Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `3486eba1ce`)	2020-07-30 16:37:08 +03:00
Avi Kivity	5b99195d21	dist: debian: do not require root during package build Debian package builds provide a root environment for the installation scripts, since that's what typical installation scripts expect. To avoid providing actual root, a "fakeroot" system is used where syscalls are intercepted and any effect that requires root (like chown) is emulated. However, fakeroot sporadically fails for us, aborting the package build. Since our install scripts don't really require root (when operating in the --packaging mode), we can just tell dpkg-buildpackage that we don't need fakeroot. This ought to fix the sporadic failures. As a side effect, package builds are faster. Fixes #6655. (cherry picked from commit `b608af870b`)	2020-07-29 16:03:53 +03:00
Takuya ASADA	edde256228	scylla_setup: skip boot partition On GCE, /dev/sda14 is reported as an unused disk, but it is the BIOS boot partition; it should not be used for the scylla data partition, and it cannot be used anyway since it is too small. It's better to exclude such partitions from the unused disk list. Fixes #6636 (cherry picked from commit `d7de9518fe`)	2020-07-29 09:51:05 +03:00
Asias He	3cf28ac18e	repair: Fix race between create_writer and wait_for_writer_done We saw scylla hit a use after free in repair with the following procedure during tests: - n1 and n2 in the cluster - n2 ran decommission - n2 sent data to n1 using repair - n2 was killed forcibly - n1 tried to remove repair_meta for n1 - n1 hit use after free on repair_meta object This was what happened on n1: 1) data was received -> do_apply_rows was called -> yield before create_writer() was called 2) repair_meta::stop() was called -> wait_for_writer_done() / do_wait_for_writer_done was called with _writer_done[node_idx] not engaged 3) step 1 resumed, create_writer() was called and _repair_writer object was referenced 4) repair_meta::stop() finished, and the repair_meta object and its member _repair_writer were destroyed 5) The fiber created by create_writer() at step 3 hit use after free on _repair_writer object To fix, we should call wait_for_writer_done() only after all pending operations protected by repair_meta::_gate are done. This prevents wait_for_writer_done() from finishing while the writer is still in the process of being created. Fixes: #6853 Fixes: #6868 Backports: 4.0, 4.1, 4.2 (cherry picked from commit `e6f640441a`)	2020-07-29 09:51:02 +03:00
Raphael S. Carvalho	58b65f61c0	sstable: index_reader: Make sure streams are all properly closed on failure It turns out the fix `f591c9c710` wasn't enough to make sure all input streams are properly closed on failure. It only closes the main input stream that belongs to context, but it misses all the input streams that can be opened in the consumer for promoted index reading. Consumer stores a list of indexes, where each of them has its own input stream. On failure, we need to make sure that every single one of them is properly closed before destroying the indexes as that could cause memory corruption due to read ahead. Fixes #6924. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200727182214.377140-1-raphaelsc@scylladb.com> (cherry picked from commit `0d70efa58e`)	2020-07-29 09:50:48 +03:00
Yaron Kaikov	466cfb0ca6	release: prepare for 4.0.5	2020-07-28 09:13:02 +03:00
Raphael S. Carvalho	1cd6f50806	table: Fix Staging SSTables being incorrectly added or removed from the backlog tracker Staging SSTables can be incorrectly added or removed from the backlog tracker, after an ALTER TABLE or TRUNCATE, because the add and removal don't take into account if the SSTable requires view building, so a Staging SSTable can be added to the tracker after an ALTER TABLE, or removed after a TRUNCATE, even though it was not added previously, potentially causing the backlog to become negative. Fixes #6798. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200716180737.944269-1-raphaelsc@scylladb.com> (cherry picked from commit `b67066cae2`)	2020-07-21 12:57:41 +03:00
Asias He	3f6fe7328a	repair: Relax size check of get_row_diff and set_diff In case of a row hash conflict, a hash in set_diff will get more than one row from get_row_diff. For example, Node1 (Repair master): row1 -> hash1 row2 -> hash2 row3 -> hash3 row3' -> hash3 Node2 (Repair follower): row1 -> hash1 row2 -> hash2 We will have set_diff = {hash3} between node1 and node2, while get_row_diff({hash3}) will return two rows: row3 and row3'. And the error below was observed: repair - Got error in row level repair: std::runtime_error (row_diff.size() != set_diff.size()) In this case, node1 should send both row3 and row3' to the peer node instead of failing the whole repair. This is safe because node2 has neither row3 nor row3'; otherwise node1 would not send rows with hash3 to node2 in the first place. Refs: #6252 (cherry picked from commit `a00ab8688f`)	2020-07-15 14:49:29 +03:00
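The hash-collision example from the commit above can be modelled in a few lines of C++. This is a hypothetical sketch: `row`, `get_row_diff()` and `row_diff_is_consistent()` only mimic the idea that one hash in set_diff may select several rows, so an exact `row_diff.size() == set_diff.size()` check is too strict and is relaxed here to `>=`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical row with a precomputed repair hash.
struct row {
    std::string name;
    uint64_t hash;
};

// Every local row whose hash appears in set_diff. With a hash collision,
// a single hash can select more than one row.
std::vector<row> get_row_diff(const std::vector<row>& local_rows,
                              const std::set<uint64_t>& set_diff) {
    std::vector<row> out;
    for (const auto& r : local_rows) {
        if (set_diff.count(r.hash) != 0) {
            out.push_back(r);
        }
    }
    return out;
}

// Relaxed sanity check: at least one row per requested hash, not exactly one.
bool row_diff_is_consistent(const std::vector<row>& row_diff,
                            const std::set<uint64_t>& set_diff) {
    return row_diff.size() >= set_diff.size();
}
```

With node1 holding row3 and row3' under the same hash, `get_row_diff` returns two rows for a one-element set_diff, which the strict equality check would have rejected.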
Hagit Segev	f9dd8608eb	release: prepare for 4.0.4	2020-07-14 14:10:39 +03:00
Avi Kivity	24a80cbf47	Update seastar submodule * seastar a73b92ff2e...4ee384e15f (2): > futures: Add a test for a broken promise in a parallel_for_each > future: Call set_to_broken_promise earlier Fixes #6749 (probably)	2020-07-13 20:32:27 +03:00
Dmitry Kropachev	6e4edc97ad	dist/common/scripts/scylla-housekeeping: wrap urllib.request with try ... except We could hit "cannot serialize '_io.BufferedReader' object" when the request gets a 404 error from the server. Now you will get a legitimate error message in that case. Fixes #6690 (cherry picked from commit `de82b3efae`)	2020-07-09 18:25:35 +03:00
Dejan Mircevski	81df28b6f3	cql/restrictions: Handle `WHERE a>0 AND a<0` WHERE clauses with start point above the end point were handled incorrectly. When the slice bounds are transformed to interval bounds, the resulting interval is interpreted as wrap-around (because start > end), so it contains all values above 0 and all values below 0. This is clearly incorrect, as the user's intent was to filter out all possible values of a. Fix it by explicitly short-circuiting to false when start > end. Add a test case. Fixes #5799. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `921dbd0978`)	2020-07-08 13:25:06 +03:00
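A minimal sketch of why `WHERE a>0 AND a<0` misbehaved and how short-circuiting fixes it, using a hypothetical `bound` type on an int column rather than Scylla's real restriction classes. `wrapping_contains()` shows what a wrap-around interpretation of start > end yields (nearly every value), while `slice_contains()` first checks `slice_is_empty()` and returns false.

```cpp
#include <cassert>

// Hypothetical, simplified model of one side of a slice restriction.
struct bound {
    int value;
    bool inclusive;
};

// True when no value can satisfy the restriction. Equal values are empty
// unless both sides are inclusive (a >= 3 AND a <= 3 matches 3).
bool slice_is_empty(const bound& start, const bound& end) {
    if (start.value > end.value) {
        return true;
    }
    if (start.value == end.value) {
        return !(start.inclusive && end.inclusive);
    }
    return false;
}

// What a wrap-around interval reports for start > end: everything above
// start OR below end, i.e. almost every value.
bool wrapping_contains(const bound& start, const bound& end, int a) {
    bool above_start = start.inclusive ? a >= start.value : a > start.value;
    bool below_end = end.inclusive ? a <= end.value : a < end.value;
    return above_start || below_end;
}

// Fixed behavior: short-circuit to false for empty slices.
bool slice_contains(const bound& start, const bound& end, int a) {
    if (slice_is_empty(start, end)) {
        return false; // e.g. WHERE a > 0 AND a < 0 matches nothing
    }
    bool above_start = start.inclusive ? a >= start.value : a > start.value;
    bool below_end = end.inclusive ? a <= end.value : a < end.value;
    return above_start && below_end;
}
```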
Juliusz Stasiewicz	ea6620e9eb	counters: Read the state under timeout Counter update is a RMW operation. Until now the "Read" part was not guarded by a timeout, which is changed in this patch. Fixes #5069 (cherry picked from commit `e04fd9f774`)	2020-07-07 20:45:26 +03:00
Takuya ASADA	19be84dafd	scylla_setup: don't add same disk device twice We shouldn't accept adding the same disk twice in the RAID prompt. Fixes #6711 (cherry picked from commit `835e76fdfc`)	2020-07-07 13:08:36 +03:00
Pavel Emelyanov	2ff897d351	main: Keep feature_service for storage_proxy Fixes #6250 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423165608.32419-1-xemul@scylladb.com> (cherry picked from commit `98635b74a6`)	2020-07-07 12:42:52 +03:00
Botond Dénes	8fc3300739	sstables: sstable_reader: fix read range upper bound calculation for reverse slices The single-key sstable reader uses the clustering ranges from the slice to determine the upper bound of the disk read-range using the index. For this it simply uses the end bound of the last clustering range. For reverse reads, however, the clustering ranges in the slice are in reverse order, so this will in fact be the upper bound of the smallest range. Depending on whether the distance between the clustering ranges is big enough for the sstable reader to use the index to skip between them, this will lead to either reading too little data or an assert failure. This patch fixes the problematic function `get_slice_upper_bound()` to consider reverse reads as well. Initially I thought there would be more mishandling of reverse slices, but actually `mutation_fragment_filter`, the component doing the actual slicing of rows, is already reverse-slice aware. A unit test which reproduces the assert failure is also added. Fixes: #6171 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200507114956.271799-1-bdenes@scylladb.com> (cherry picked from commit `791acc7f38`)	2020-07-05 16:02:15 +03:00
Raphael S. Carvalho	d2ac7d4b18	compaction: Fix partition estimation with TWCS interposer Max and min windows are microsecond timestamps, which should be divided by window size in microseconds to properly estimate window count based on provided mutation_source_metadata. Found this problem after properly setting mutation_source_metadata with min and max metadata on behalf of regular compaction. Fixes #6214. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200409194235.6004-2-raphaelsc@scylladb.com> (cherry picked from commit `3edff36cd2`)	2020-07-05 15:27:40 +03:00
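The unit mismatch fixed above comes down to simple arithmetic: timestamps are in microseconds, so the window size must be converted to microseconds before dividing. The helpers below are a hypothetical sketch (not the TWCS interposer's actual code), assuming non-negative timestamps and an even spread of partitions across windows.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Number of TWCS windows spanned by [min_ts_us, max_ts_us], both in
// microseconds (the unit used for cell timestamps).
int64_t estimate_window_count(int64_t min_ts_us, int64_t max_ts_us,
                              std::chrono::seconds window_size) {
    // Convert the window size to the timestamps' unit before dividing.
    const int64_t window_us =
        std::chrono::duration_cast<std::chrono::microseconds>(window_size).count();
    const int64_t first_window = min_ts_us / window_us;
    const int64_t last_window = max_ts_us / window_us;
    return last_window - first_window + 1;
}

// Spread the estimated partition count evenly across the windows.
uint64_t estimate_partitions_per_window(uint64_t total_partitions, int64_t windows) {
    return windows > 0 ? total_partitions / static_cast<uint64_t>(windows)
                       : total_partitions;
}
```

Dividing the microsecond span by a window size expressed in seconds would instead overestimate the window count by a factor of about one million.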
Avi Kivity	61706a6789	Update seastar submodule * seastar 0dc0fec831...a73b92ff2e (1): > rpc::compressor: Fix static init fiasco with names Fixes #5963	2020-07-02 18:08:52 +03:00
Piotr Sarna	65aa531010	db: set gc grace period to 0 for local system tables Local system tables from `system` namespace use LocalStrategy replication, so they do not need to be concerned about gc grace period. Some system tables already set gc grace period to 0, but other ones, including system.large_partitions, did not. That may result in millions of tombstones being needlessly kept for these tables, which can cause read timeouts. Fixes #6325 Tests: unit(dev), local(running cqlsh and playing with system tables) (cherry picked from commit `bf5f247bc5`)	2020-07-01 13:13:57 +03:00
Benny Halevy	4bffd0f522	api: storage_service: serialize true_snapshot_size Following up on `91b71a0b1a` We also need to serialize storage_service::true_snapshots_size with snapshot-modifying operations. It seems like it was assumed that get_snapshot_details is done under run_snapshot_list_operation, but the one called here is the table method, not the api::storage_service::get_snapshot_details. Fixes #5603 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506115732.483966-1-bhalevy@scylladb.com> (cherry picked from commit `682fb3acfd`)	2020-07-01 13:09:43 +03:00
Rafael Ávila de Espíndola	9409fc7290	gms: Don't keep references to reallocated vector entries These callbacks can block a seastar thread and the underlying vector can be reallocated concurrently. This is no different than if it was a plain std::vector and the solution is similar: use values instead of references. Fixes #6230 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200422182304.120906-1-espindola@scylladb.com> (cherry picked from commit `d8555513a9`)	2020-07-01 12:58:56 +03:00
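A sketch of the "values instead of references" fix in the commit above. In Scylla the reallocation happens while a seastar thread is blocked; here a callback that appends to the same vector stands in for that concurrent modification. `registry` and `notify_all()` are hypothetical names. Binding `auto& entry = entries[i]` would dangle once `push_back()` reallocates; copying each entry keeps it valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct registry {
    std::vector<std::string> entries;

    // Visits the entries present at the start; `notify` may append more.
    std::vector<std::string> notify_all(
            const std::function<void(registry&, const std::string&)>& notify) {
        std::vector<std::string> seen;
        const std::size_t n = entries.size();
        for (std::size_t i = 0; i < n; ++i) {
            std::string entry = entries[i]; // copy, not a reference into the vector
            notify(*this, entry);           // may push_back() and reallocate
            seen.push_back(entry);          // still valid: we own the copy
        }
        return seen;
    }
};
```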
Pavel Solodovnikov	86faf1b3ca	cql3: avoid using shared_ptr's in unrecognized_entity_exception Using shared_ptr's in `unrecognized_entity_exception` can lead to cross-cpu deletion of a pointer which will trigger an assert `_cpu == std::this_thread::get_id()' when shared_ptr is disposed. Copy `column_identifier` to the exception object and avoid using an instance of `cql3::relation`: just get a string representation from it since nothing more is used in associated exception handling code. Fixes: #6287 Tests: unit(dev, debug), dtest(lwt_destructive_ddl_test.py:LwtDestructiveDDLTest.test_rename_column) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200506155714.150497-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `1d3f9174c5`)	2020-07-01 12:54:09 +03:00
Raphael S. Carvalho	426295bda9	compaction: Fix the 2x disk space requirement in SSTable upgrade SSTable upgrade is requiring 2x the space of input SSTables because we aren't releasing references of the SSTables that were already upgraded. So if we're upgrading 1TB, it means that up to 2TB may be required for the upgrade operation to succeed. That can be fixed by moving all input SSTables when rewrite_sstables() asks for the set of SSTables to be compacted, thus allowing their space to be released as soon as there is no longer any ref to them. Spotted while auditing code. Fixes #6682. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200619205701.92891-1-raphaelsc@scylladb.com> (cherry picked from commit `52180f91d4`)	2020-07-01 12:37:38 +03:00
Raphael S. Carvalho	c6fde0e562	cql3: don't reset default TTL when not explicitly specified in alter table statement Any alter table statement that doesn't explicitly set the default time to live will reset it to 0. That can be very dangerous for time series use cases, which rely on all data being eventually expired, and a default TTL of 0 means data never expires. Fixes #5048. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com> (cherry picked from commit `044f80b1b5`)	2020-06-30 19:28:50 +03:00
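A hypothetical sketch of the behavior this fix restores: an ALTER TABLE starts from the table's current options and overrides only the properties the statement actually mentions. Using `std::optional` makes "not specified" distinct from an explicit 0. `table_options`, `alter_table_properties` and `apply_alter()` are illustrative names, not Scylla's cql3 classes.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical subset of table options.
struct table_options {
    std::chrono::seconds default_time_to_live{0}; // 0 means "never expire"
    int gc_grace_seconds = 864000;
};

// Parsed ALTER TABLE ... WITH properties; an empty optional means "not mentioned".
struct alter_table_properties {
    std::optional<std::chrono::seconds> default_time_to_live;
    std::optional<int> gc_grace_seconds;
};

// Override only what the statement specifies; keep everything else.
table_options apply_alter(table_options current, const alter_table_properties& props) {
    if (props.default_time_to_live) {
        current.default_time_to_live = *props.default_time_to_live;
    }
    if (props.gc_grace_seconds) {
        current.gc_grace_seconds = *props.gc_grace_seconds;
    }
    return current;
}
```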
Avi Kivity	d9f9e7455b	Merge "Fix handling of decimals with negative scales" from Rafael " Before this series scylla would effectively infinite loop when, for example, casting a decimal with a negative scale to float. Fixes #6720 " * 'espindola/fix-decimal-issue' of https://github.com/espindola/scylla: big_decimal: Add a test for a corner case big_decimal: Correctly handle negative scales big_decimal: Add a as_rational member function big_decimal: Move constructors out of line (cherry picked from commit `3e2eeec83a`)	2020-06-29 12:26:06 +03:00
Piotr Sarna	e95bcd0f8f	alternator: fix propagating tags Updating tags was erroneously done locally, which means that the schema change was not propagated to other nodes. The new code announces new schema globally. Fixes #6513 Branches: 4.0,4.1 Tests: unit(dev) dtest(alternator_tests.AlternatorTest.test_update_condition_expression_and_write_isolation) Message-Id: <3a816c4ecc33c03af4f36e51b11f195c231e7ce1.1592935039.git.sarna@scylladb.com> (cherry picked from commit `f4e8cfe03b`)	2020-06-24 14:10:36 +03:00
Asias He	2ff6e2e122	streaming: Do not send end of stream in case of error The current sender sends stream_mutation_fragments_cmd::end_of_stream to the receiver when an error is received from a peer node. To be safe, send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to prevent end_of_stream from being written into the sstable when a partition is not closed yet. In addition, use mutation_fragment_stream_validator to validate the mutation fragments emitted from the reader, e.g., check if partition_start and partition_end are paired when the reader is done. If not, fail the stream session and send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to isolate the problematic sstables on the sender node. Refs: #6478 (cherry picked from commit `a521c429e1`)	2020-06-23 12:48:01 +03:00
Hagit Segev	1fcf38abd9	release: prepare for 4.0.3	2020-06-21 21:46:49 +03:00
Alejo Sanchez	3375b8b86c	lwt: validate before constructing metadata LWT batch conditions can't span multiple tables. This was detected in batch_statement::validate() called in ::prepare(). But ::cas_result_set_metadata() was built in the constructor, causing a bitset assert/crash in a reported scenario. This patch moves validate() to the constructor before building metadata. Closes #6332 Tested with https://github.com/scylladb/scylla-dtest/pull/1465 [avi: adjust spelling of exception message to 4.0 spelling] Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> (cherry picked from commit `d1521e6721`)	2020-06-21 18:22:08 +03:00
Gleb Natapov	586546ab32	cql transport: do not log broken pipe error when a client closes its side of a connection abruptly Fixes #5661 Message-Id: <20200615075958.GL335449@scylladb.com> (cherry picked from commit `7ca937778d`)	2020-06-21 13:09:10 +03:00
Amnon Heiman	e1d558cb01	api/storage_service.cc: stream result of token_range The get token range API result can become big, which can cause large allocations and stalls. This patch replaces the implementation so that it streams the results using the http stream capabilities instead of serializing and sending one big buffer. Fixes #6297 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `7c4562d532`)	2020-06-21 12:57:34 +03:00
Avi Kivity	b0a8f396b4	Update seastar submodule * seastar 447aad8d78...0dc0fec831 (1): > membarrier: fix madvise(MADV_DONTNEED) failure and crash with --lock-memory Fixes #6346.	2020-06-21 12:35:39 +03:00
Rafael Ávila de Espíndola	48e7ee374a	configure: Reduce the dynamic linker path size gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path size. Fixes: #6494 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528202741.398695-2-espindola@scylladb.com> (cherry picked from commit `aa778ec152`)	2020-06-21 12:27:19 +03:00
Piotr Sarna	3e85ecd1bd	alternator: fix the return type of PutItem Even if there are no attributes to return from PutItem requests, we should return a valid JSON object, not an empty string. Fixes #6568 Tests: unit(dev) (cherry picked from commit `8fc3ca855e`)	2020-06-21 12:21:30 +03:00
Piotr Sarna	930a4af8b3	alternator: fix returning UnprocessedKeys unconditionally Client libraries (e.g. PynamoDB) expect the UnprocessedKeys and UnprocessedItems attributes to appear in the response unconditionally - it's hereby added, along with a simple test case. Fixes #6569 Tests: unit(dev) (cherry picked from commit `3aff52f56e`)	2020-06-21 12:19:34 +03:00
Tomasz Grabiec	6a6d36058a	row_cache: Fix undefined behavior on key linearization This is relevant only when using partition or clustering keys which have a representation in memory which is larger than 12.8 KB (10% of LSA segment size). There are several places in code (cache, background garbage collection) which may need to linearize keys because of performing key comparison, but it's not done safely: 1) the code does not run with the LSA region locked, so pointers may get invalidated on linearization if it needs to reclaim memory. This is fixed by running the code inside an allocating section. 2) LSA region is locked, but the scope of with_linearized_managed_bytes() encloses the allocating section. If allocating section needs to reclaim, linearization context will contain invalidated pointers. The fix is to reorder the scopes so that linearization context lives within an allocating section. Example of 1 can be found in range_populating_reader::handle_end_of_stream() where it performs a lookup: auto prev = std::prev(it); if (prev->key().equal(_cache._schema, _last_key->_key)) { it->set_continuous(true); but handle_end_of_stream() is not invoked under allocating section. Example of 2 can be found in mutation_cleaner_impl::merge_some() where it does: return with_linearized_managed_bytes([&] { ... return _worker_state->alloc_section(region, [&] { Fixes #6637. Refs #6108. Tests: - unit (all) Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `e81fc1f095`)	2020-06-21 11:58:39 +03:00
Yaron Kaikov	ce57d0174d	release: prepare for 4.0.2	2020-06-15 20:52:58 +03:00
Avi Kivity	cd11f210ad	tools: toolchain: regenerate for gnutls 3.6.14 CVE-2020-13777. Fixes #6627. Toolchain source image registry disambiguated due to tighter podman defaults.	2020-06-15 07:58:31 +03:00
Calle Wilund	1e2e203cf0	gms::inet_address: Fix sign extension error in custom address formatting Fixes #5808 It seems some GCC versions generate sign-extending code here. Mine does not, but this should be more correct anyhow. Added a small stringify test for inet_address to serialization_test (cherry picked from commit `a14a28cdf4`)	2020-06-09 20:16:37 +03:00
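A minimal sketch of the sign-extension pitfall fixed above: where `char` is signed, a raw byte such as 0xC0 (192) widens to a negative `int`, so an octet must go through an unsigned 8-bit type before formatting. `format_ipv4()` is a hypothetical helper, not the actual gms::inet_address formatter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats an IPv4 address held as 4 raw bytes in network order.
std::string format_ipv4(const char bytes[4]) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        // Converting via uint8_t keeps every octet in [0, 255]; widening a
        // signed char directly would turn 192 into -64.
        const unsigned octet = static_cast<uint8_t>(bytes[i]);
        if (i != 0) {
            out += '.';
        }
        out += std::to_string(octet);
    }
    return out;
}
```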
Takuya ASADA	1a98c93a25	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6540 (cherry picked from commit `969c4258cf`)	2020-06-09 16:02:27 +03:00
Calle Wilund	4f4845c94c	commitlog_test: Ensure "when_over_disk_limit" reads segment list only once Fixes #6195 test_commitlog_delete_when_over_disk_limit reads the current segment list in the flush handler, to compare it with the result after allowing deletion of a segment. However, in rare cases the handler might be called more than once, because of timing and our use of rather small sizes. Reading the list a second time is not a good idea: it may well be exactly the same as what we read in the test check code, so we end up overwriting the list we want to check against. This happens because the callback runs on a timer and the test does not. Message-Id: <20200414114322.13268-1-calle@scylladb.com> [ penberg: backported fix random failures in commitlog_test ] (cherry picked from commit `a62d75fed5`)	2020-06-01 18:41:18 +03:00
Nadav Har'El	ef745e1ce7	alternator: fix support for bytes type in Query's KeyConditions Our parsing of values in a KeyConditions parameter of Query was done naively. As a result, we got bizarre error messages "condition not met: false" when these values had an incorrect type (this is issue #6490). Worse - the naive conversion did not decode base64-encoded bytes values as needed, so KeyConditions on bytes-typed keys did not work at all. This patch fixes these bugs by using our existing utility function get_key_from_typed_value(), which takes care of throwing sensible errors when types don't match, and decoding base64 as needed. Unfortunately, we didn't have test coverage for many of the KeyConditions features including bytes keys, which is why this issue escaped detection. A patch will follow with much more comprehensive tests for KeyConditions, which also reproduce this issue and verify that it is fixed. Refs #6490 Fixes #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-1-nyh@scylladb.com> (cherry picked from commit `6b38126a8f`)	2020-05-31 14:02:18 +03:00
Calle Wilund	ae32aa970a	commitlog::read_log_file: Preserve subscription across reading Fixes #6265 The return type of read_log_file was previously changed from subscription to future<>, returning the result of the previously returned subscription's done(). But it did not preserve the subscription itself, which in turn causes us (in work::stream) to call back into a deleted object. Message-Id: <20200422090856.5218-1-calle@scylladb.com> (cherry picked from commit `525b283326`)	2020-05-25 13:07:33 +03:00
Eliran Sinvani	a3eb12c5f1	Auth: return correct error code when role is not found Scylla returns the wrong error code (0000 - server internal error) in response to authentication/authorization operations that involve a non-existent role. This commit changes those cases to return error code 2200 (invalid query), which is the correct one and also the one that Cassandra returns. Tests: Unit tests (Dev) All auth and auth_role dtests (cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214) Fixes #6363.	2020-05-25 12:58:09 +03:00
Amnon Heiman	b5cedfc177	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour: when given an empty keyspace, it uses the first non-system keyspace ("first" here is essentially arbitrary; it is just some keyspace). The current implementation has two issues. First, it uses a reference to a string that is held on the stack of another function. In other words, there is a use-after-free, and it is not clear why we never hit it. Second, it calls get_non_system_keyspaces twice. Though this is not a bug, it is redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues by changing the implementation to hold a string instead of a reference to a string, and by storing the result of get_non_system_keyspaces and reusing it, which is more efficient and keeps the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `69a46d4179`)	2020-05-25 12:48:26 +03:00
Hagit Segev	8d9bc57aca	release: prepare for 4.0.1	2020-05-24 21:39:44 +03:00
Tomasz Grabiec	1cbda629a2	sstables: index_reader: Fix overflow when calculating promoted index end When the index file is larger than 4GB, the offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, the promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com> (cherry picked from commit `a6c87a7b9e`)	2020-05-24 09:45:55 +03:00
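A sketch of the overflow class described above, under the assumption that the end offset is computed as "entry offset plus promoted index size". The function names are hypothetical, not the real index_reader code; the point is only that a 32-bit intermediate silently wraps past 4 GiB while a 64-bit one does not.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: the offset is truncated to 32 bits before the addition,
// so any index-file position beyond 4 GiB wraps around modulo 2^32.
std::uint32_t promoted_index_end_32(std::uint64_t entry_offset, std::uint32_t promoted_index_size) {
    return static_cast<std::uint32_t>(entry_offset) + promoted_index_size;
}

// Fixed variant: keep the whole calculation in 64 bits.
std::uint64_t promoted_index_end_64(std::uint64_t entry_offset, std::uint32_t promoted_index_size) {
    return entry_offset + promoted_index_size;
}
```

With an entry at 5 GiB, the 32-bit result lands about 4 GiB too early, which is why the subsequent size calculation underflows.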
Rafael Ávila de Espíndola	baf0201a6e	repair: Make sure sinks are always closed In a recent next failure I got the following backtrace function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101 at ./seastar/include/seastar/core/shared_ptr.hh:463 at repair/row_level.cc:2059 This patch changes a few functions to use finally to make sure the sink is always closed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200515202803.60020-1-espindola@scylladb.com> (cherry picked from commit `311fbe2f0a`) Ref #6414	2020-05-20 09:00:44 +03:00
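The patch uses Seastar's future-based `finally()` so the sink is closed on every path. The sketch below is a synchronous RAII analog of the same idea, not Scylla or Seastar code: `scope_exit` and `fake_sink` are hypothetical stand-ins for the RPC sink that asserts if destroyed while still open.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs the stored callable when the guard is destroyed,
// including during stack unwinding after an exception.
template <typename F>
class scope_exit {
public:
    explicit scope_exit(F f) : _f(std::move(f)) {}
    ~scope_exit() { _f(); }
    scope_exit(const scope_exit&) = delete;
    scope_exit& operator=(const scope_exit&) = delete;
private:
    F _f;
};

// Hypothetical sink that must be closed before it is destroyed.
struct fake_sink {
    bool closed = false;
    void close() { closed = true; }
};

// Sends rows through the sink; the guard closes it on both the success
// path and the exception path.
void send_rows(fake_sink& sink, bool fail) {
    scope_exit guard([&sink] { sink.close(); });
    if (fail) {
        throw std::runtime_error("send failed");
    }
}
```

In asynchronous Seastar code the equivalent is attaching the close to the returned future, since a destructor cannot wait for an asynchronous close to finish.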
Asias He	7dcffb963c	repair: Fix race between write_end_of_stream and apply_rows Consider: n1, n2, n1 is the repair master, n2 is the repair follower. === Case 1 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after row r1 is written. data: partition_start, r1 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream() data: partition_start, r1, partition_end 5) Step 2 resumes to apply the rows. data: partition_start, r1, partition_end, partition_end, partition_start, r2 === Case 2 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after partition_start for r2 is written but before _partition_opened is set to true. data: partition_start, r1, partition_end, partition_start 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream(). Since _partition_opened[node_idx] is false, partition_end is skipped, end_of_stream is written. data: partition_start, r1, partition_end, partition_start, end_of_stream This causes unbalanced partition_start and partition_end in the stream written to sstables. To fix, serialize the write_end_of_stream and apply_rows with a semaphore. Fixes: #6394 Fixes: #6296 Fixes: #6414 (cherry picked from commit `b2c4d9fdbc`)	2020-05-20 08:08:11 +03:00
Piotr Dulikowski	dcfaf4d035	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, the rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are currently being sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we choose the position of the earliest hint that failed to be sent, and retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence the calculation of _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes the commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422 (cherry picked from commit `85d5c3d5ee`)	2020-05-20 08:06:04 +03:00
Piotr Dulikowski	f974a54cbd	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433 (cherry picked from commit `0c5ac0da98`)	2020-05-20 08:03:44 +03:00
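The rule established by the two hinted-handoff fixes above (a position stays in rps_set only while a hint is in flight or after a genuine send failure) can be sketched as a small single-threaded model. Everything here is hypothetical: the real send_one_file_ctx is asynchronous and sends up to 128 hints concurrently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical outcome of attempting to send one hint.
enum class hint_outcome { sent, failed, expired, discarded };

// Simplified bookkeeping modelled on the commit messages above.
struct send_one_file_ctx_sketch {
    std::set<std::uint64_t> rps_set;

    void on_hint(std::uint64_t pos, hint_outcome outcome) {
        rps_set.insert(pos);  // position recorded when sending starts
        switch (outcome) {
        case hint_outcome::sent:       // delivered: nothing to retry
        case hint_outcome::expired:    // older than gc_grace_seconds (#6422)
        case hint_outcome::discarded:  // e.g. table was dropped (#6433)
            rps_set.erase(pos);
            break;
        case hint_outcome::failed:     // must be retried from this position
            break;
        }
    }

    // Earliest position to resume from on the next attempt, if any failed.
    std::optional<std::uint64_t> last_not_complete_rp() const {
        if (rps_set.empty()) {
            return std::nullopt;
        }
        return *rps_set.begin();  // std::set is ordered, so this is the minimum
    }
};
```

If expired or discarded positions were left in the set, the minimum could point before the first real failure, causing more hints than necessary to be resent.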
Piotr Sarna	30a96cc592	db, view: remove duplicate entries from pending endpoints When generating view updates, an endpoint can appear both as a primary paired endpoint for the view update, and as a pending endpoint (due to range movements). In order not to generate the same update twice for the same endpoint, the paired endpoint is removed from the list of pending endpoints if present. Fixes #5459 Tests: unit(dev), dtest(TestMaterializedViews.add_dc_during_mv_insert_test) (cherry picked from commit `86b0dd81e3`)	2020-05-17 19:09:58 +02:00
Avi Kivity	faf300382a	Update seastar submodule * seastar 8bc24f486a...447aad8d78 (1): > timer: add scheduling_group awareness Fixes #6170.	2020-05-10 18:12:32 +03:00
Gleb Natapov	55400598ff	storage_proxy: limit read repair only to replicas that answered during speculative reads A speculative read has more targets than needed for the CL. If there is a digest mismatch, the repair runs between all of them, which violates the requested CL. The patch makes it so that repair runs only between replicas that answered (there will be CL of them). Fixes #6123 Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200402132245.GA21956@scylladb.com> (cherry picked from commit `36a24bbb70`)	2020-05-07 19:48:24 +03:00
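A sketch of the target selection described above, assuming endpoints can be represented as strings; `repair_targets` is a hypothetical helper, not the storage_proxy API. It keeps the contacted replicas that actually answered, preserving their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Keep only the replicas that answered the speculative read; read repair
// then runs among those instead of among every replica that was contacted.
std::vector<std::string> repair_targets(const std::vector<std::string>& contacted,
                                        const std::set<std::string>& answered) {
    std::vector<std::string> out;
    std::copy_if(contacted.begin(), contacted.end(), std::back_inserter(out),
                 [&answered](const std::string& ep) { return answered.count(ep) != 0; });
    return out;
}
```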
Mike Goltsov	c177295bce	fix error in fstrim service (scylla_util.py) On a CentOS 7 machine, fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation. When trying to run the scylla-fstrim service manually you get an error: Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module> main() File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main cfg = parse_scylla_dirs_with_default(conf=args.config) File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default if key not in y or not y[k]: NameError: name 'k' is not defined It is caused by an error in scylla_util.py: the loop variable is named key, not k. Fixes #6294. (cherry picked from commit `068bb3a5bf`)	2020-05-07 19:45:35 +03:00
Hagit Segev	d95aa77b62	release: prepare for 4.0.0	2020-05-05 18:58:39 +03:00
Pekka Enberg	fe54009855	scripts/jobs: Keep memory reserve when calculating parallelism The "jobs" script is used to determine the amount of compilation parallelism on a machine. It attempts to ensure each GCC process has at least 4 GB of memory per core. However, in the worst case scenario, we could end up having the GCC processes take up all the system memory, forcing swapping or the OOM killer to kick in. For example, on a 4 core machine with 16 GB of memory, this worst case scenario seems easy to trigger in practice. Fix up the problem by keeping 1 GB of memory in reserve for other processes and calculating parallelism based on the rest. Message-Id: <20200423082753.31162-1-penberg@scylladb.com> (cherry picked from commit `7304a795e5`)	2020-05-04 19:01:14 +03:00
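The arithmetic described above can be sketched as follows. This is not the "jobs" script itself, whose source is not shown here; `compile_jobs` is a hypothetical function that applies the stated rule: reserve 1 GB, allow one job per 4 GB of the remainder, cap at the core count, and never go below one job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of parallel compile jobs under the rule described above.
unsigned compile_jobs(unsigned cores, std::uint64_t mem_bytes) {
    constexpr std::uint64_t gb = 1ULL << 30;
    constexpr std::uint64_t reserve = 1 * gb;  // kept for other processes
    constexpr std::uint64_t per_job = 4 * gb;  // budget per GCC process
    const std::uint64_t usable = mem_bytes > reserve ? mem_bytes - reserve : 0;
    const std::uint64_t by_mem = usable / per_job;
    return static_cast<unsigned>(
        std::max<std::uint64_t>(1, std::min<std::uint64_t>(cores, by_mem)));
}
```

For the 4-core, 16 GB example from the message, this yields 3 jobs instead of 4: 15 GB of usable memory divided by 4 GB per job.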
Piotr Sarna	bbe82236be	clocks-impl: switch to thread-safe time conversion std::gmtime() has a sad property of using a global static buffer for returning its value. This is not thread-safe, so its usage is replaced with gmtime_r, which can accept a local buffer. While no regressions were observed in this particular area of code, a similar bug caused failures in alternator, so it's better to simply replace all std::gmtime calls with their thread-safe counterpart. Message-Id: <39e91c74de95f8313e6bb0b12114bf12c0e79519.1588589151.git.sarna@scylladb.com> (cherry picked from commit `05ec95134a`)	2020-05-04 17:14:28 +03:00
Piotr Sarna	abd73cab78	alternator: fix signature timestamps Generating timestamps for auth signatures used a non-thread-safe ::gmtime function instead of thread-safe ::gmtime_r. Tests: unit(dev) Fixes #6345 (cherry picked from commit `fb7fa7f442`)	2020-05-04 17:05:39 +03:00
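A sketch of the gmtime_r pattern described in the two commits above. It assumes a POSIX system (gmtime_r is POSIX, not ISO C++) and assumes the compact UTC timestamp format used by AWS Signature Version 4; `amz_timestamp` is a hypothetical helper, not the Alternator function.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Formats a Unix timestamp as a compact UTC string such as
// "20200504T120000Z". gmtime_r() writes into the caller's std::tm instead
// of a shared static buffer, so concurrent callers on different threads
// cannot overwrite each other's result.
std::string amz_timestamp(std::time_t t) {
    std::tm tm_buf{};
    if (::gmtime_r(&t, &tm_buf) == nullptr) {
        return {};  // conversion failed (e.g. year out of range)
    }
    char out[32];
    const std::size_t n = std::strftime(out, sizeof(out), "%Y%m%dT%H%M%SZ", &tm_buf);
    return std::string(out, n);
}
```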
Nadav Har'El	8fd7cf5cd1	alternator test: drastically reduce time to boot Scylla The alternator test, test/alternator/run, runs Scylla and runs the various tests against it. Before this patch, just booting Scylla took about 26 seconds (for a dev build, on my laptop). This patch reduces this delay to less than one second! It turns out that almost the entire delay was artificial, two periods of 12 seconds "waiting for the gossip to settle", which are completely unnecessary in the one-node cluster used in the Alternator test. So a simple "--skip-wait-for-gossip-to-settle 0" parameter eliminates these long delays completely. Amusingly, the Scylla boot is now so fast, that I had to change a "sleep 2" in the test script to "sleep 1", because 2 seconds is now much more than it takes to boot Scylla :-) Fixes #6310. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200428145035.22894-1-nyh@scylladb.com> (cherry picked from commit `ff5615d59d`)	2020-05-04 16:10:27 +03:00
Alejo Sanchez	dd88b2dd18	utils: error injection allocate string for remote invoke Allocate string before sending to other shards. Reported by Pavel Solodovnikov. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200328204454.1326514-2-alejo.sanchez@scylladb.com> (cherry picked from commit `e5a2ba32b9`) Ref #6342.	2020-05-03 19:33:34 +03:00
Hagit Segev	eee4c00e29	release: prepare for 4.0.rc3	2020-05-01 00:46:40 +03:00
Avi Kivity	85071ceeb1	Merge 'Fix hang in multishard_writer' from Asias " This series fix hang in multishard_writer when error happens. It contains - multishard_writer: Abort the queue attached to consumers when producer fails - repair: Fix hang when the writer is dead Fixes #6241 Refs: #6248 " * asias-stream_fix_multishard_writer_hang: repair: Fix hang when the writer is dead mutation_writer_test: Add test_multishard_writer_producer_aborts multishard_writer: Abort the queue attached to consumers when producer fails (cherry picked from commit `8925e00e96`)	2020-04-30 19:32:12 +03:00
Asias He	4cf201fc24	config: Do not enable repair based node operations by default Give it some more time to mature. Use the old stream plan based node operations by default. Fixes: #6305 Backports: 4.0 (cherry picked from commit `b8ac10c451`)	2020-04-30 17:57:55 +03:00
Raphael S. Carvalho	c6ad5cf556	api/service: fix segfault when taking a snapshot without keyspace specified If no keyspace is specified when taking snapshot, there will be a segfault because keynames is unconditionally dereferenced. Let's return an error because a keyspace must be specified when column families are specified. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com> (cherry picked from commit `02e046608f`) Fixes #6336.	2020-04-30 12:49:13 +03:00
Piotr Sarna	51e3e6c655	Update seastar submodule * seastar 251bc8f2...8bc24f48 (1): > http: make headers case-insensitive Fixes #6319	2020-04-30 08:18:01 +02:00
Nadav Har'El	8ac6579b30	test.py: run Alternator test with the correct Scylla binary The Alternator test's run script, test/alternator/run, runs Scylla. By default, it chooses the last built Scylla executable build/*/scylla. However, test.py has a "mode" option, that should be able to choose which build mode to run. Before this patch, this mode option wasn't honored by the Alternator test, so a "test.py alternator/run" would run the same Scylla binary (the one last built) three times, instead of running each of the three build modes. We fix this in this patch: test.py now passes the "SCYLLA" environment variable to the test/alternator/run script, indicating the location of the Scylla binary with the appropriate build mode. The script already supported this environment variable to override its default choice of Scylla binary. In test.py, we add to the run_test() function an optional "env" parameter which can be used to pass additional environment variables to the test. Fixes #6286 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200427131958.28248-1-nyh@scylladb.com> (cherry picked from commit `858a12755b`)	2020-04-28 16:19:07 +03:00
Piotr Sarna	3744e66244	alternator: fix integer overflow warning in token generation When generating tokens for parallel scan, debug mode undefined behavior sanitizer complained that integer overflow sometimes happens when multiplying two big values - delta and segment number. In order to mitigate this warning, the multiplication is now split into two smaller ones, and the generated machine code remains identical (verified on gcc and clang via compiler explorer). Fixes #6280 Tests: unit(dev) (cherry picked from commit `e17c237feb`)	2020-04-28 16:15:31 +03:00
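The patch itself splits the multiplication into two smaller ones; that code is not reproduced here. The sketch below shows a different, commonly used technique for the same problem, plainly swapped in: do the arithmetic in `uint64_t`, where wrap-around is well defined, and convert back at the end. `segment_start_token` is hypothetical, and the final unsigned-to-signed conversion is modular in C++20 (implementation-defined, but modular on GCC and Clang, in earlier standards).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// First token of `segment` when the signed 64-bit token ring is split into
// `total_segments` equal parts. For segment < total_segments the unsigned
// product cannot exceed 2^64 - 1, and adding the offset of INT64_MIN wraps
// modulo 2^64, which is well defined for unsigned types. Doing the same in
// int64_t could overflow, which is undefined behavior.
std::int64_t segment_start_token(std::uint64_t segment, std::uint64_t total_segments) {
    const std::uint64_t delta = std::numeric_limits<std::uint64_t>::max() / total_segments;
    const std::uint64_t base = static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::min());
    return static_cast<std::int64_t>(base + segment * delta);
}
```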
Piotr Sarna	d3bf349484	alternator: allow parallel scan Parallel scans can be performed by providing Segment and TotalSegments attributes to a Scan request, which can be used to split the work among many workers. This commit makes the parallel scan test succeed, so the xfail is removed. Fixes #5059 (cherry picked from commit `dbb9574aa2`)	2020-04-28 16:07:43 +03:00
Nadav Har'El	3e6a8ba5bd	test/alternator: increase timeout on Scylla boot The Alternator test boots Scylla to test against it. We set an arbitrary timeout for this boot to succeed: 100 seconds. This 100 seconds is significantly more than the 25 seconds it takes on my laptop, and I thought we would never reach it. But it turns out that in some setups - running the very slow debug build on slow and overcommitted nodes - 100 seconds is not enough. So this patch doubles the timeout to 200 seconds. Note that this "200 seconds" is just a timeout, and doesn't affect normal runs: Both a successful boot and a failed boot are recognized as soon as they happen, and we never unnecessarily wait the entire 200 seconds. Fixes #6271. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422193920.17079-1-nyh@scylladb.com> (cherry picked from commit `92e36c5df5`)	2020-04-28 16:04:12 +03:00
Nadav Har'El	5f1785b9cf	alternator: use RF=3 even if some nodes are temporarily down Alternator is supposed to use RF=3 for new tables. Only when the cluster is smaller than 3 nodes do we use RF=1 (and warn about it) - this is useful for testing. However, our implementation incorrectly tested the number of live nodes in the cluster instead of the total number of nodes. As a result, if a 3-node cluster had one node down, and a new table was created, it was created with RF=1, and immediately could not be written because when RF=1, any node down means part of the data is unavailable. This patch fixes this: The total number of nodes in the cluster - not the number of live nodes - is consulted. The three-node-cluster-with-a-dead-node setup above creates the table with RF=3, and it can be written because two living nodes out of three are enough when RF=3 and we do quorum writes and reads. We have a dtest to reproduce this bug (and its fix), and it's also easy to reproduce manually by starting a 3-node cluster, killing one of the nodes, and then running "pytest". Before this patch, the tests could create tables but then failed to write to them. After this patch, the tests succeed on the same cluster with the dead node. Fixes #6267 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422182035.15106-2-nyh@scylladb.com> (cherry picked from commit `1f75efb556`)	2020-04-28 15:52:06 +03:00
Nadav Har'El	e1fd6cf989	gossiper: add convenience function for getting number of nodes The gossiper has the convenience functions get_up_endpoint_count() and get_down_endpoint_count(), but strangely no function to get the total number. Even though it's easy to calculate the total by summing up their results, it is inefficient and also inconvenient because one of these functions returns a future. So let's add another function, get_all_endpoint_count(), to get the total number of nodes. We will use this function in the next patch. Signed-off-by: Nadav Har'El <n...@scylladb.com> Message-Id: <20200422182035.15106-1-nyh@scylladb.com> (cherry picked from commit `08c39bde1a`)	2020-04-28 15:51:37 +03:00
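The replication-factor rule from the two commits above can be sketched as follows. `endpoint_counts` and `choose_replication_factor` are hypothetical stand-ins, not the gossiper or Alternator API; the point is that the decision uses the total node count (what get_all_endpoint_count() provides), not only the live nodes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the gossiper's view of the cluster.
struct endpoint_counts {
    std::size_t up = 0;
    std::size_t down = 0;
    std::size_t all() const { return up + down; }
};

// RF=1 only for clusters smaller than 3 nodes; otherwise RF=3. The buggy
// version effectively used `c.up`, so a 3-node cluster with one node down
// produced tables with RF=1.
int choose_replication_factor(const endpoint_counts& c) {
    return c.all() < 3 ? 1 : 3;
}
```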
Piotr Sarna	b7328ff1e4	alternator: implement ScanIndexForward The ScanIndexForward parameter is now fully implemented and can accept ScanIndexForward=false in order to query the partitions in reverse clustering order. Note that reading partition slices in reverse order is less efficient than forward scans and may put a strain on memory usage, especially for large partitions, since the whole partition is currently fetched in order to be reversed. Fixes #5153 (cherry picked from commit `09e4f3b917`)	2020-04-28 15:30:01 +03:00
Avi Kivity	602ed43ac7	Update seastar submodule * seastar 76260705ef...251bc8f25d (1): > http server: fix "Date" header format Fixes #6253.	2020-04-26 19:30:08 +03:00
Tomasz Grabiec	c42c91c5bb	Merge "Drop only learnt value on PRUNE" from Gleb It is unsafe to remove the entire row, so only drop the learned value from the system.paxos table. Fixes: #6154 (cherry picked from commit `e648e314e5`)	2020-04-21 18:30:12 +03:00
Avi Kivity	cf017b320a	test: alternator: configure scylla for test environment in terms of cpu and disk Currently, the alternator tests configure scylla to use all the logical cores in the host system, but only 1GB of RAM. This can lead to a small amount of memory per core. It also uses the default disk configuration, which is safe, but can be very slow on mechanical or non-enterprise disks. Change to use a fixed --smp 2 configuration, and add --overprovisioned for maximum flexibility (no spinning). Use --unsafe-bypass-fsync for faster performance on non-enterprise or mechanical disks, assuming that the test data is not important. Fixes #6251. Message-Id: <20200420154112.123386-1-avi@scylladb.com> (cherry picked from commit `2482e53de9`)	2020-04-21 18:25:28 +03:00
Hagit Segev	89e79023ae	release: prepare for 4.0.rc2	2020-04-21 16:26:09 +03:00
Nadav Har'El	bc67da1a21	alternator-test: comment out an error-path test that doesn't work on newer boto3 Unfortunately, the boto3 library doesn't allow us to check some of the input error cases because it unnecessarily tests its input instead of just passing it to Alternator and allowing Alternator to report the error. In this patch we comment out a test case which used to work fine - i.e., the error was reported by Alternator - until recent changes to boto3 made it catch the problem without passing it to Alternator :-( Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200330190521.19526-2-nyh@scylladb.com> (cherry picked from commit `fe6cecb26d`)	2020-04-21 07:19:54 +02:00
Botond Dénes	0c7643f1fe	schema: schema(): use std::stable_sort() to sort key columns When multiple key columns (clustering or partition) are passed to the schema constructor, all having the same column id, the expectation is that these columns will retain the order in which they were passed to `schema_builder::with_column()`. Currently however this is not guaranteed, as the schema constructor sorts key columns by column id with `std::sort()`, which doesn't guarantee that equally comparing elements retain their order. This can be an issue for indexes, the schemas of which are built independently on each node. If there is any room for variance in the key column order, this can result in different nodes having incompatible schemas for the same index. The fix is to use `std::stable_sort()`, which guarantees that the order of equally comparing elements won't change. This is a suspected cause of #5856, although we don't have hard proof. Fixes: #5856 Signed-off-by: Botond Dénes <bdenes@scylladb.com> [avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes unstable at 17 elements, and the failing schema had a clustering key with 23 elements] Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com> (cherry picked from commit `a4aa753f0f`)	2020-04-19 18:18:45 +03:00
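A minimal sketch of the fix's idea, using a hypothetical `key_column` struct in place of Scylla's column definitions: `std::stable_sort` orders by id while keeping columns with equal ids in the order they were added, which is exactly the guarantee `std::sort` does not give.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a key column definition.
struct key_column {
    std::string name;
    int id;  // position within the key; several columns may share one id
};

// Sort by column id while preserving the insertion order of columns whose
// ids compare equal, so every node derives the same order.
void sort_key_columns(std::vector<key_column>& cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const key_column& a, const key_column& b) { return a.id < b.id; });
}
```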
Rafael Ávila de Espíndola	c563234f40	dht: Use get_random_number<uint64_t> instead of int64_t in token::get_random_token I bisected the cause of issue 6193 to the opposite change in `9c202b52da`. I don't know why. Maybe get_random_number<signed_type> is buggy? In any case, reverting to uint64_t solves the issue. Fixes #6193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200418001611.440733-1-espindola@scylladb.com> (cherry picked from commit `f3fd466156`)	2020-04-19 16:20:40 +03:00
Nadav Har'El	77b7a48a02	alternator: remove mentions of experimental status of LWT Since commit `9948f548a5`, the LWT no longer requires an "experimental" flag, so Alternator documents and scripts which referred to the need for enabling experimental LWT, are fixed here to no longer do that. Fixes #6118. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200405143237.12693-1-nyh@scylladb.com> (cherry picked from commit `d9d50362af`)	2020-04-19 15:10:32 +03:00
Piotr Sarna	b2b1bfb159	alternator: fix failure on incorrect table name with no indexes If a table name is not found, it may still exist as a local index, but the check tried to fetch a local index name regardless of whether it was present in the request, which was a nullptr dereference bug. Fixes #6161 Tests: alternator-test(local, remote) Message-Id: <428c21e94f6c9e450b1766943677613bd46cbc68.1586347130.git.sarna@scylladb.com> (cherry picked from commit `123edfc10c`)	2020-04-19 15:07:25 +03:00
Nadav Har'El	d72cbe37aa	docs/alternator/alternator.md: fix typos Fix a couple of typos in the Alternator documentation. Fixes scylladb/scylla-doc-issues#280 Fixes scylladb/scylla-doc-issues#281 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200419091900.23030-1-nyh@scylladb.com> (cherry picked from commit `7e7c688946`)	2020-04-19 15:03:22 +03:00
Nadav Har'El	9f7b560771	docs, alternator: alternator.md cleanup Clean up the alternator.md document, by: * Updating out-of-date information that outstayed its welcome. * When Scylla does have a feature but it's just not supported via the DynamoDB API (e.g., CDC and on-demand backups) mention that. * Remove mention of Alternator being experimental and users should not store important data on it :-) * Miscellaneous cleanups. Fixes #6179. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200412094641.27186-1-nyh@scylladb.com> (cherry picked from commit `606ae0744c`)	2020-04-19 15:00:53 +03:00
Nadav Har'El	06af9c028c	alternator-test: make Alternator tests runnable from test.py To make the tests in alternator-test runnable by test.py, we need to move the directory alternator-test/ to test/alternator, because test.py only looks for tests in subdirectories of test/. Then, we need to create a test/alternator/suite.yaml saying that this test directory is of type "Run", i.e., it has a single run script "run" which runs all its tests. The "run" script had to be slightly modified to be aware of its new location relative to the source directory. To run the Alternator tests from test.py, do: ./test.py --mode dev alternator Note that in this version, the "--mode" has no effect - test/alternator/run always runs the latest compiled Scylla, regardless of the chosen mode. The Alternator tests can still be run manually and individually against a running Scylla or DynamoDB as before - just go to the test/alternator directory (instead of alternator-test previously) and run "pytest" with the desired parameters. Fixes #6046 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `4e2bf28b84`)	2020-04-19 11:19:15 +03:00
Nadav Har'El	c74ab3ae80	test.py: add xunit XML output file for "Run" tests Assumes that "Run" tests can take the --junit-xml=<path> option, and pass it to ask the test to generate an XML summary of the run to a file like testlog/dev/xml/run.1.xunit.xml. This option is honored by the Alternator tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0cccb5a630`)	2020-04-19 11:19:06 +03:00
Nadav Har'El	32cd3a070a	test.py: add new test type "Run" This patch adds a new test type, "Run". A test subdirectory of type "Run" has a script called "run" which is expected to run all the tests in that directory. This will be used, in the next patch, by the Alternator functional tests. These tests indeed have a "run" script, which runs Scylla and then runs all of Alternator's tests, finishing fairly quickly (in less than a minute). All of that will become one test.py test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0ae3136900`)	2020-04-19 11:18:01 +03:00
Nadav Har'El	bb1554f09e	test.py: flag for aborting tests with SIGTERM, not SIGKILL Today, if test.py is interrupted with SIGINT or SIGTERM, the ongoing test is killed with SIGKILL. Some types of tests - such as Alternator's test - may depend on being killed politely (e.g., with SIGTERM) to clean up files. We cannot yet change the signal to SIGTERM for all tests, because Seastar tests often don't deal well with signals, but we can at least add a flag that certain test types - that know they can be killed gently - will use. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `36e44972f1`)	2020-04-19 11:17:51 +03:00
Nadav Har'El	2037d7550e	alternator-test: change "run" script to pick random IP address Before this patch, the Alternator tests "run" script ran Scylla on a fixed listening address, 127.0.0.1. The problem is that there might be other concurrent runs of Scylla using the same IP address - e.g., CCM (used by dtest) uses exactly this IP address for its first node. Luckily, Linux's loopback device actually allows us to pick any of over a million addresses in 127.0.0.0/8 to listen on - we don't need to use 127.0.0.1 specifically. So the code in this patch picks an address in 127.1.*.*, so it cannot collide with CCM (which uses 127.0.0.* for up to 255 nodes). Moreover, the last two bytes of the listen address are picked based on the process ID of the run script; this allows multiple copies of this script to run concurrently - in case anybody wishes to do that. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `24fcc0c0ff`)	2020-04-19 11:17:39 +03:00
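A sketch of deriving a listen address from a process ID as described above. The exact mapping used by the run script is not given here, so the bit layout below (high and low byte of the PID's low 16 bits) is an assumption, and `listen_address_for_pid` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Derive the last two octets of a 127.1.x.y loopback address from the low
// 16 bits of a process ID, staying clear of 127.0.0.*, which CCM uses.
std::string listen_address_for_pid(unsigned pid) {
    const unsigned hi = (pid >> 8) & 0xff;
    const unsigned lo = pid & 0xff;
    return "127.1." + std::to_string(hi) + "." + std::to_string(lo);
}
```

Only the low 16 bits are used, so two processes whose IDs differ by a multiple of 65536 would still collide; with typical PID ranges and few concurrent runs this is unlikely.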
Nadav Har'El	c320c3f6da	install-dependencies.sh: add dependencies for Alternator tests To run Alternator tests, only two additional dependencies need to be added to install-dependencies.sh: pytest, and python3-boto3. We also need python3-cassandra-driver, but this dependency is already listed. This patch only updates the dependencies for Fedora, which is what we need for dbuild and our Jenkins setups. Tested by building a new dbuild docker image and verifying that the Alternator tests pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> [avi: update toolchain image; note this upgrades gcc to 9.3.1] Message-Id: <20200330181128.18582-1-nyh@scylladb.com> (cherry picked from commit `8627ae42a6`)	2020-04-19 11:17:07 +03:00
Nadav Har'El	0ed70944aa	alternator-test: run: use the Python driver, not cqlsh The "run" script for the Alternator tests needs to write authentication credentials into a system table, so we can test this feature. So far we did this with cqlsh, but cqlsh isn't always installed on build machines. Since install-dependencies.sh already installs the Cassandra driver for Python, it makes more sense to use the driver, and this patch switches to it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200331131522.28056-1-nyh@scylladb.com> (cherry picked from commit `55f02c00f2`)	2020-04-19 11:16:54 +03:00
Nadav Har'El	89f860d409	alternator-test: add "--url" option to choose Alternator's URL The "--aws" and "--local" test options choose between two useful default URLs - Amazon's, or http://localhost:8000 for a local installation. However, sometimes one wants to run Scylla on a different IP address or port, so in this patch we add a "--url" option to choose a specific URL to connect to. For example, "--url http://127.1.2.3:1234". We will later use this option in the alternator-test/run script to pick a random IP address on which to run Scylla, and then run the test against this address. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `1aec4baa51`)	2020-04-19 11:13:13 +03:00
Piotr Sarna	0819d221f4	test: add cases for empty paging state for index queries To check for regressions related to #6136 and similar issues, test cases are added for handling a paging state with an empty partition/clustering key pair. (cherry picked from commit `88913e9d44`)	2020-04-19 10:35:26 +03:00
Piotr Sarna	53f47d4e67	cql3: fix generating base keys from empty index paging state An empty partition/clustering key pair is a valid state of the query paging state. Unfortunately, recent attempts at debugging a flaky test introduced an assertion which fails when trying to generate a key from such a pair. To keep the assertion (since it still makes sense in its scope) while still translating empty keys properly, empty keys are now explicitly handled at the beginning of the function. This behaviour was 100% reproducible with the secondary index dtest listed below. Fixes #6134 Refs #5856 Tests: unit(dev), dtest(TestSecondaryIndexes.test_truncate_base) (cherry picked from commit `45751ee24f`)	2020-04-19 10:35:09 +03:00
Kamil Braun	21ad12669a	sstables: freeze types nested in collection types in legacy sstables Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect serialization headers, which don't wrap frozen UDTs nested inside collections with the FrozenType<...> tag. When reading such an SSTable, Scylla would detect a mismatch between the schema saved in schema tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema from the serialization header (which doesn't have these tags). Scylla versions 3.1 and above, in particular Scylla versions that contain this commit, create SSTables with correct serialization headers (which wrap UDTs in the FrozenType<...> tag). This commit does two things: 1. For all SSTables created after this commit, include a new feature flag, CorrectUDTsInCollections, the presence of which implies that frozen UDTs inside collections have the FrozenType<...> tag. 2. When reading a Scylla SSTable without the feature flag, we assume that UDTs nested inside collections are always frozen, even if they don't have the tag. This assumption is safe to make, because at the time of this commit Scylla does not allow non-frozen (multi-cell) types inside collections or UDTs, and because of point 1 above. There is one edge case not covered: when we don't know whether the SSTable comes from Scylla or from C*. In that case we won't make the assumption described in 2. Therefore, if we get a mismatch between the schema and the serialization header of a table which we couldn't confirm to come from Scylla, we will still reject the table. If any user encounters such an issue (unlikely), we will have to use another solution, e.g. a separate tool that rewrites the SSTable. Fixes #6130. (cherry picked from commit `3d811e2f95`)	2020-04-17 09:11:53 +03:00
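The three cases above (flag present, confirmed Scylla origin without the flag, unknown origin) reduce to a small decision. This is a hypothetical Python sketch of that logic only, not the C++ reader code:

```python
from typing import Optional


def assume_frozen_udts_in_collections(written_by_scylla: Optional[bool],
                                      has_correct_udts_flag: bool) -> bool:
    """Decide whether untagged UDTs inside collections are treated as frozen.

    written_by_scylla: True/False if the SSTable's origin is known, None if unknown.
    has_correct_udts_flag: the SSTable carries CorrectUDTsInCollections.
    """
    if has_correct_udts_flag:
        # Headers are already correct, so no assumption is needed.
        return False
    # Only a confirmed Scylla SSTable gets the "always frozen" assumption;
    # for C* or unknown origins a schema mismatch is still rejected.
    return written_by_scylla is True
```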
Kamil Braun	c812359383	sstables: move definition of column_translation::state::build to a .cc file Ref #6130	2020-04-17 09:11:38 +03:00
Piotr Sarna	1bd79705fb	alternator: use partition tombstone if there's no clustering key As @tgrabiec helpfully pointed out, creating a row tombstone for a table which does not have a clustering key in its schema creates something that looks like an open-ended range tombstone. That's problematic for KA/LA sstable formats, which are incapable of writing such tombstones, so a workaround is provided in order to allow using KA/LA in alternator. Fixes #6035 (cherry picked from commit `0a2d7addc0`)	2020-04-16 12:01:51 +03:00
Avi Kivity	7e2ef386cc	Update seastar submodule * seastar 92c488706...76260705e (1): > rpc: always shutdown socket when stopping a client Fixes #6060.	2020-04-16 10:56:31 +03:00
Avi Kivity	51bad7e72c	Point seastar submodule at scylla-seastar.git branch-4.0 This allows us to backport seastar patches to Scylla 4.0.	2020-04-16 10:10:40 +03:00
Asias He	0379d0c031	repair: Send reason for node operations Since `956b092012` (Merge "Repair based node operation" from Asias), repair is used by other node operations like bootstrap, decommission and so on. Send the reason for the repair, so that we can handle the materialized view update correctly according to the reason for the operation. We want to trigger the view update only if the repair is run as a regular repair operation. Otherwise, the view table would be handled twice: 1) when the view table is synced using repair, and 2) when the base table is synced using repair and the view table update is triggered. Fixes #5930 Fixes #5998 (cherry picked from commit `066934f7c4`)	2020-04-16 10:06:17 +03:00
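A sketch of the reason-based decision, in Python for illustration. The enum and its member names are hypothetical stand-ins for whatever reason type the patch sends:

```python
from enum import Enum


class RepairReason(Enum):
    # Hypothetical names for the node operations that reuse repair.
    REPAIR = "repair"
    BOOTSTRAP = "bootstrap"
    DECOMMISSION = "decommission"
    REMOVENODE = "removenode"


def should_trigger_view_update(reason: RepairReason) -> bool:
    """Generate view updates from base-table data only for a plain repair.

    For node operations the view table is synced by repair on its own, so
    also generating view updates from the base table would apply them twice.
    """
    return reason is RepairReason.REPAIR
```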
Gleb Natapov	a8ef820f27	lwt: fix cas_now_pruning counter Due to a copy-paste error, the cas_now_pruning counter is incremented instead of decremented after an operation completes. Fix it. Fixes #6116 Message-Id: <20200401142859.GA16953@scylladb.com> (cherry picked from commit `4d9d226596`)	2020-04-06 13:06:11 +02:00
Yaron Kaikov	9908f009a4	release: prepare for 4.0.rc1	2020-04-06 10:22:45 +03:00
Pavel Emelyanov	48d8a075b4	main: Do not destroy token_metadata The storage_proxy instances hold references to the token_metadata instances and leave behind unwaited futures that continue into the query_partition_key_range_concurrent method. The latter is called from do_query, so it's not easy to find out who is leaking. Do not free token_metadata for now. Fixes: #6093 Test: manual start-stop Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200402183538.9674-1-xemul@scylladb.com> (cherry picked from commit `86296ba557`)	2020-04-05 13:47:57 +03:00
Konstantin Osipov	e3ddd607bc	lwt: remove Paxos from experimental list Always enable lightweight transactions. Remove the check for the command line switch from the feature service, assuming LWT is always enabled. Remove the check for LWT from Alternator. Note that for the cluster to work with LWT, all nodes need to support it. Rename LWT to UNUSED in db/config.hh, to keep accepting the lwt keyword in the --experimental-features command line option while ignoring it. Changes in v2: * remove enable_lwt feature flag, it's always there Closes #6102 test: unit (dev, debug) Message-Id: <20200401071149.41921-1-kostja@scylladb.com> (cherry picked from commit `9948f548a5`)	2020-04-05 08:56:42 +03:00
Piotr Jastrzebski	511773d466	token: relax the condition of the sanity check When we switched the token representation to int64_t, we added sanity checks that the byte representation is always 8 bytes long. It turns out that for token_kind::before_all_keys and token_kind::after_all_keys the bytes can sometimes be empty, because they are ignored for those tokens. The check introduced with the change is too strict and sometimes throws an exception for before/after-all-keys tokens created with empty bytes. This patch relaxes the check and always uses 0 as the value of _data for the special before/after-all-keys tokens. Fixes #6131 Tests: unit(dev, sct) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `a15b32c9d9`)	2020-04-04 20:19:10 +03:00
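The relaxed check can be sketched in Python as follows. The big-endian signed encoding is an assumption made for illustration, and `TokenKind`/`token_data` are hypothetical mirrors of the C++ names:

```python
import struct
from enum import Enum


class TokenKind(Enum):
    BEFORE_ALL_KEYS = 0
    KEY = 1
    AFTER_ALL_KEYS = 2


def token_data(kind: TokenKind, raw: bytes) -> int:
    """Decode a token's int64 value under the relaxed sanity check."""
    if kind is not TokenKind.KEY:
        # Bytes are ignored for the special tokens and may be empty: use 0.
        return 0
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes for a key token, got {len(raw)}")
    # Assumption: big-endian signed 64-bit encoding.
    return struct.unpack(">q", raw)[0]
```

Ordinary key tokens still get the strict 8-byte check; only the two special kinds skip it.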
Gleb Natapov	121cd383fa	lwt: remove entries from system.paxos table after successful learn stage The learning stage of the Paxos protocol leaves behind an entry in the system.paxos table with the last learned value (which can be large). If not all participants learned it successfully, the next round on the same key may complete the learning using this info. But if all nodes learned the value, the entry no longer serves any useful purpose. The patch adds another round, "prune", which is executed in the background (limited to 1000 simultaneous instances) and removes the entry if all nodes replied successfully to the "learn" round. It uses the ballot's timestamp for the deletion, so as not to interfere with the next round. Since the deletion happens very close to the previous writes, it will likely be applied in the memtable and never reach an sstable, which reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com> (cherry picked from commit `8a408ac5a8`)	2020-04-02 15:36:52 +02:00
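A simplified, synchronous Python sketch of the prune decision. Scylla runs prune in the background; this sketch also assumes that a prune over the 1000-instance limit is skipped rather than queued, which the message does not state. `PaxosPruner` and `after_learn` are hypothetical names:

```python
PRUNE_LIMIT = 1000  # maximum simultaneous prunes, per the commit message


class PaxosPruner:
    def __init__(self, limit: int = PRUNE_LIMIT) -> None:
        self.limit = limit
        self.cas_now_pruning = 0  # prunes currently in flight
        self.dropped = 0          # prunes skipped because the limit was reached

    def after_learn(self, learn_acks: int, replicas: int,
                    ballot_timestamp: int, prune) -> bool:
        if learn_acks < replicas:
            # Some replica may not have learned the value; a later round may
            # still need the system.paxos entry to finish learning.
            return False
        if self.cas_now_pruning >= self.limit:
            self.dropped += 1  # assumption: over-limit prunes are skipped
            return False
        self.cas_now_pruning += 1
        try:
            # Delete with the ballot's timestamp so the deletion cannot
            # shadow writes made by a later round on the same key.
            prune(ballot_timestamp)
        finally:
            self.cas_now_pruning -= 1  # decrement (not increment) on completion
        return True
```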
Gleb Natapov	90639f48e5	lwt: rename "in_progress_ballot" cell to "promise" in system.paxos table The value stored in the "in_progress_ballot" cell is the promised ballot, so name the cell accordingly to avoid confusion, especially as the code has a notion of an "in progress" proposal that is not the same as in_progress_ballot here. We can still do this without worrying about backward compatibility, since LWT is still marked as experimental. Fixes #6087. Message-Id: <20200326095758.GA10219@scylladb.com> (cherry picked from commit `b3db6f5b04`)	2020-04-02 15:36:49 +02:00
Calle Wilund	8d029a04aa	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zeros so that the reader has an empty region after the last used chunk as an end marker. This is needed because we use recycled, pre-allocated segments with potentially non-zero data extending past the point where we end the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach the end of the segment while writing the final block (typically with many small mutations), the file ends naturally after the data written, and any trailing zero block would just extend the file further. While this happens only once per recycled segment (independent of how many times it is recycled), it still slightly breaks the disk usage contract and can also cause some disk stalls due to metadata changes (though of course very infrequently). We should only write the trailing zero block if we are below the max_size file size when terminating. Adds a small size check to the commitlog test to verify size bounds (which fails without the patch). v2: - Fix test to take into account that files might be deleted behind our backs. v3: - Fix test better, by doing verification _before_ segments are queued for delete. Message-Id: <20200226121601.15347-2-calle@scylladb.com> Message-Id: <20200324100235.23982-1-calle@scylladb.com> (cherry picked from commit `9fee712d62`)	2020-03-31 14:22:20 +03:00
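The size rule can be sketched in Python with a `bytearray` standing in for a recycled, pre-allocated segment file. This assumes `position` is block-aligned, so a zero block always fits when `position < max_size`. The function is hypothetical, not the commitlog API:

```python
def close_segment(data: bytearray, position: int, max_size: int,
                  block_size: int) -> None:
    """Terminate a recycled segment whose used region ends at `position`."""
    if position >= max_size:
        # The segment is full: the file ends naturally here, and a zero block
        # would only grow the file beyond max_size.
        return
    # Stale bytes from the segment's previous use may follow `position`, so
    # write a zero block as the end-of-data marker for the reader.
    data[position:position + block_size] = bytes(block_size)
```

Note that slice assignment past the end of a `bytearray` grows it, just as an unconditional zero block would grow a full file on disk.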
Asias He	67995db899	gossip: Add an option to force gossip generation Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation numbers g1, g2, g3, running a Scylla version without commit `0a52ecb6df` (gossip: Fix max generation drift measure). One year later, the user wants to upgrade n1, n2, n3 to a new version. When n3 does a rolling restart with the new version, n3 will use a generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's gossip update and mark n3 as down. Such unnecessary marking of nodes as down can cause availability issues. For example: DC1: n1, n2 DC2: n3, n4 When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which makes the whole DC2 unavailable. To fix this, we can start the restarting node with a forced gossip generation that stays within MAX_GENERATION_DIFFERENCE of the other nodes' generations. Once all the nodes run a version with commit `0a52ecb6df`, the option is no longer needed. Fixes #5164 (cherry picked from commit `743b529c2b`)	2020-03-30 12:36:20 +02:00
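A Python sketch of the drift check described above, where an old node compares a peer's new generation against its own (year-old) generation. The value of `MAX_GENERATION_DIFFERENCE` (one year in seconds) is an assumption borrowed from Cassandra, and the function is hypothetical:

```python
MAX_GENERATION_DIFFERENCE = 86400 * 365  # assumption: one year in seconds


def accepts_generation(remote_generation: int, reference_generation: int,
                       max_diff: int = MAX_GENERATION_DIFFERENCE) -> bool:
    """Return True if a peer's new generation is within the allowed drift."""
    return remote_generation - reference_generation <= max_diff
```

A generation derived from the restart time more than a year after g1 fails this check on the old node. A forced generation close to g1 passes it.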
Yaron Kaikov	282cd0df7c	dist/docker: Update SCYLLA_REPO_URL and VERSION defaults Update the SCYLLA_REPO_URL and VERSION defaults to point to the latest unstable 4.0 version. This will be used if someone runs "docker build" locally. For the releases, the release pipelines will pass the stable version repository URL and a specific release version.	2020-03-26 09:54:44 +02:00
Nadav Har'El	ce58994d30	sstable: default to LA format instead of KA format Over the years, Scylla updated the sstable format from the KA format to the LA format, and most recently to the MC format. On a mixed cluster - as occurs during a rolling upgrade - we want all the nodes, even new ones, to write sstables in the format preferred by the old version. The thinking is that if the upgrade fails and we want to downgrade all nodes back to the older version, we don't want to lose data because we already have too-new sstables. So the current code starts by selecting the oldest format we ever had - KA - and only switches this choice to LA and MC after we verify that all the nodes in the cluster support these newer formats. But before an agreement is reached on the new format, sstables may already be created in the antique KA format. This is usually harmless - we can read this format just fine. However, the KA format cannot represent table or keyspace names containing the "-" character, because this character is used to separate the keyspace and table names in the file name. For CQL, a "-" is not allowed anyway in keyspace or table names; but for Alternator, this character is allowed - and if a KA sstable happens to be created by accident (before the LA or MC formats are chosen), it cannot be read again during boot, and Scylla cannot reboot. The solution that this patch takes is to change Scylla's default sstable format to LA (and, as before, if the entire cluster agrees, the newer MC format will be used). From now on, new KA sstables will never be written. But we still fully support reading the KA format - this is important in case some very old sstables never underwent compaction. The old code had, confusingly, two places where the default KA format was chosen. This patch fixes this so the new default (LA) is specified in only one place. Fixes #6071. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200324232607.4215-2-nyh@scylladb.com> (cherry picked from commit `91aba40114`)	2020-03-25 13:27:51 +01:00
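A Python sketch of the new write-format rule and of why KA cannot hold Alternator names. The feature name `MC_SSTABLE_FORMAT` is a hypothetical placeholder for the cluster feature the real code checks:

```python
def choose_write_format(node_features: list) -> str:
    """Pick the sstable format to write, given each node's feature set."""
    if node_features and all("MC_SSTABLE_FORMAT" in f for f in node_features):
        return "mc"  # the whole cluster agrees on the newer format
    return "la"      # new default: KA is never written again


def ka_can_represent(keyspace: str, table: str) -> bool:
    # KA file names use "-" to separate the keyspace and table names.
    return "-" not in keyspace and "-" not in table
```

Reading remains format-agnostic: KA sstables on disk are still loaded, only writing them stops.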
Yaron Kaikov	78f5afec30	release: prepare for 4.0.rc0	2020-03-24 23:33:23 +02:00
Nadav Har'El	f1aaa91e21	merge: add metrics Merged pull request https://github.com/scylladb/scylla/pull/6030 from Piotr Dulikowski: Adds CDC-related metrics. The following counters are added, both for total and failed operations: the total number of CDC operations that did/did not perform splitting, the total number of CDC operations that touched a particular mutation part, and the total number of preimage selects. Fixes #6002. Tests: unit(dev, debug) * 'cdc-metrics' of github.com:piodul/scylla: storage_proxy: track CDC operations in LWT flow storage_proxy: track CDC operations in logged batches storage_proxy: track CDC operations in standard flow storage_proxy: add cdc tracker hooks to write response handlers storage_proxy: move "else if" remainder into "else" block cdc: create an operation_result_tracker object cdc: add an object for tracking progress of cdc mutations cdc: count touched mutation parts in transformer::transform cdc: track preimage selects in metrics cdc: register metric counters cdc: fix non-atomic updates in splitting	2020-03-23 21:55:58 +02:00
Botond Dénes	ec36c7cb2f	test: random_schema: remove redundant gc grace period from tombstone expiry Compaction already adds the gc grace period to expiry times automatically, so there is no need to add it when creating the tombstones. Remove the redundant additions from the code. The direct impact is really minor as this is only used in tests, but it might confuse readers who are looking at how tombstones are created across the codebase. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200323120948.92104-1-bdenes@scylladb.com>	2020-03-23 15:12:25 +02:00
Piotr Dulikowski	736c1c6056	storage_proxy: track CDC operations in LWT flow Register cdc operation result tracker during LWT flow.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	f7fd6f4607	storage_proxy: track CDC operations in logged batches Register cdc operation result tracker in logged batch flow.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	ef1c62aa04	storage_proxy: track CDC operations in standard flow Register cdc operation result tracker for write response handlers coming from the usual write requests.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	cccc33f0fd	storage_proxy: add cdc tracker hooks to write response handlers Adds a field to abstract_write_response_handler that points to the cdc operation result tracker, and a function for registering the tracker in the handlers that currently write to a CDC log table.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	dc05d30fd3	storage_proxy: move "else if" remainder into "else" block In the following commit, more code will be added to the newly created "else" block.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	5a5cc57878	cdc: create an operation_result_tracker object An `operation_result_tracker` object is now returned as a second return value from the `augment_mutation_call` function.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	1b92cbeabe	cdc: add an object for tracking progress of cdc mutations CDC metrics, apart from tracking "total" metrics for all performed CDC operations, also track metrics for "failed" operations. Because the result of a CDC operation depends on whether all CDC mutations were written successfully by storage_proxy, checking for failure and incrementing the appropriate counters is deferred until all write response handlers finish. The `cdc::operation_result_tracker` object was created for that purpose. It contains all the details needed to accurately update the metrics based on what actually happened in the `augment_mutation_call` function, and holds a flag which tells whether any of the write response handlers failed. This object is supposed to be referenced by the write response handlers for CDC mutations created by the same `augment_mutation_call`. After all write response handlers are destroyed, the destructor of `operation_result_tracker` updates the appropriate metrics. Actually creating this object and attaching it to write response handlers will be done in subsequent commits.	2020-03-23 14:05:25 +01:00
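In C++ the tracker is shared by the handlers and does its work in its destructor. This Python sketch models that lifetime with an explicit reference count instead. The class and the metric keys are hypothetical:

```python
class OperationResultTracker:
    """Update CDC metrics once every attached write handler has finished."""

    def __init__(self, metrics: dict) -> None:
        self._metrics = metrics
        self._refs = 0
        self._failed = False

    def attach(self) -> None:
        # Called when a write response handler starts referencing the tracker.
        self._refs += 1

    def detach(self, failed: bool) -> None:
        # Called when a handler is destroyed; remember any failure.
        self._failed = self._failed or failed
        self._refs -= 1
        if self._refs == 0:
            # Last reference gone: the outcome is now known.
            self._metrics["total"] += 1
            if self._failed:
                self._metrics["failed"] += 1
```

Counting only at the last detach means a single failed handler marks the whole CDC operation as failed, and the metrics are touched exactly once per operation.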
Piotr Dulikowski	98e5fdc7ac	cdc: count touched mutation parts in transformer::transform Modifies the transformer::transform so that it also returns a set of flags indicating what parts of the mutation (e.g. rows, tombstones, collections, etc.) were processed during transforming.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	53570d8657	cdc: track preimage selects in metrics This commit increments the preimage select counter after a preimage select is performed.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	e7062de02b	cdc: register metric counters This patch defines a CDC metrics object and registers all of its counters. storage_proxy is chosen as the owner of the metrics object. Because in subsequent commits it will become possible for CDC metrics to be updated after a write operation ends, and because the cdc_service has shorter lifetime than storage_proxy, we could risk a use-after-free if we placed this object inside cdc_service.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	338e473946	cdc: fix non-atomic updates in splitting This patch fixes a bug in mutation splitting logic of CDC. In the part that handles updates of non-atomic clustering columns, the column definition was fetched from a static column of the same id instead of the actual definition of the clustering column. It could cause the value to be written to a wrong column. Tests: unit(dev)	2020-03-23 13:47:23 +01:00
Ivan Prisyazhnyy	5ec7e77b2e	api: /column_family/major_compaction/{keyspace:table} implementation This implements support for triggering major compactions through the REST API. Please note that "split_output" is not supported, and Glauber Costa confirmed that this is fine: "We don't support splits, nor do I think we should." Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>	2020-03-23 13:48:29 +02:00
Avi Kivity	0d885dbb00	Merge "Make all headers standalone" from Botond " Make sure all headers compile on their own, without requiring any additional includes externally. Even though this requirement is not documented in our coding guides, it is still quasi-enforced, and we semi-regularly get and merge patches adding missing includes to headers. This patch-set fixes all headers and adds a `{mode}-headers` target that can be used to verify each header. This target should be built by promotion to ensure no new non-conforming code sneaks in. Individual headers can be verified using the `build/dev/path/to/header.hh.o` target, which is generated for every header. The majority of the headers were just missing `seastarx.hh`. I think we should just include this via a compiler flag to remove the noise from our code (in a followup). " * 'compiling-headers/v2' of https://github.com/denesb/scylla: configure.py: add {mode}-headers phony target treewide: add missing headers and/or forward declarations test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstables: size_tiered_backlog_tracker: move methods out-of-line sstables: date_tiered_compaction_strategy.hh: move methods out-of-line	2020-03-23 13:09:09 +02:00
Avi Kivity	c6a441f9c2	Update seastar submodule * seastar 3c498abcab...92c488706c (14): > dpdk: restore including reactor.hh > tests: distributed_test: add missing #include <mutex> > reactor: un-static-ify make_pollfn() > merge: Reduce inclusions of reactor.hh A few #includes added to compensate for this > sharded: delete move constructor > future: Avoid a move constructor call > future: Erase types a bit more in then_wrapped > memory: Drop a never nullopt optional > semaphore: specify get_units and with_semaphore as noexcept > spinlock.hh: Add include for <cassert> header > dpdk: Avoid a variable sized array > future: Add an explicit promise member to continuation > net: remove smart pointer wrappers around pollable_fd > Merge "cleanup reactor file functions" from Benny	2020-03-23 11:59:30 +02:00
Piotr Dulikowski	a693e6ff6c	cdc: fix non-atomic updates in splitting This patch fixes a bug in mutation splitting logic of CDC. In the part that handles updates of non-atomic clustering columns, the schema for serializing that column was looked up incorrectly in the table schema - instead of a `regular_column`, a `static_column` was looked up. Due to how the `column_at` function works, a correct schema was always returned if the table had no static columns. Therefore, in order for this bug to manifest, a table with a static column and a regular column with non-atomic collection was needed.	2020-03-23 10:20:24 +01:00
Piotr Sarna	602a771105	Merge 'utils: error injector API' from Alejo Closes #3295 The error_injection class allows injecting custom handlers into normal control flow at pre-determined injection points. This is especially useful in various testing scenarios: * Throwing an exception at some rare and extreme corner-cases * Injecting a delay to test that timeouts are handled correctly * More advanced uses with a custom lambda as an injection handler Injection points are defined by `inject` calls. Enabling and disabling injections is done by the corresponding `enable` and `disable` calls. A REST frontend API is provided for convenience. Branch URL: https://github.com/alecco/scylla/tree/as_error_injection Tests: unit {{dev}}, unit {{debug}} * 'as_error_injection' of github.com:alecco/scylla: api: add error injection to REST API utils: add error injection	2020-03-23 08:39:22 +01:00
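A minimal Python model of named injection points with `enable`/`disable`/`inject`, plus the listing and disable-all operations the REST API exposes. This only mirrors the described behavior. It is not the `utils::error_injection` C++ interface, and it ignores per-shard instances and the compile-time flag:

```python
from typing import Callable, List, Set


class ErrorInjector:
    """Named injection points that run a handler only while enabled."""

    def __init__(self) -> None:
        self._enabled: Set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def disable_all(self) -> None:
        self._enabled.clear()

    def enabled_injections(self) -> List[str]:
        return sorted(self._enabled)

    def inject(self, name: str, handler: Callable[[], None]) -> bool:
        """Call `handler` if `name` is enabled; return whether it ran."""
        if name not in self._enabled:
            return False
        handler()  # e.g. raise an exception or sleep to simulate a stall
        return True
```

Production code calls `inject` unconditionally, and nothing happens unless a test has enabled that point, so a handler that raises models the "throw at a rare corner case" scenario.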
Botond Dénes	5174acb359	configure.py: add {mode}-headers phony target	2020-03-23 09:29:45 +02:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Botond Dénes	575466b2cf	test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstable_test.hh started as collection of utilities shared between the various `_sstable_test.cc` files. Predictably other tests started using it as well, among them some that are non boost unit tests. This poses a problem as if we add the missing boost/test/unit_test.hpp include to sstable_test.hh these tests will suddenly have missing symbols from boost::test. To avoid linking boost::test into all these users, extract utilities more widely used into sstable_utils.hh	2020-03-23 09:29:45 +02:00
Botond Dénes	84329a16ee	sstables: size_tiered_backlog_tracker: move methods out-of-line	2020-03-23 09:29:45 +02:00
Botond Dénes	d58ec632e3	sstables: date_tiered_compaction_strategy.hh: move methods out-of-line	2020-03-23 09:26:19 +02:00
Glauber Costa	dd65f7dcbb	tests: move token_generation_for_shard to common code We now have a utils file for SSTables, and token_generation_for_shard is potentially useful for other tests. As a matter of fact, this function is currently repeated for the resharding test. And to add insult to injury, the version in the resharding test has the shard and number-of-tokens parameters flipped, which, although extremely confusing, is the predictable outcome of such repetition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-22 19:00:26 +02:00
Asias He	be1a196988	repair: Handle keyspace with zero table The following error was seen in materialized_views_test.py:TestMaterializedViews.decommission_node_during_mv_insert_4_nodes_test INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=started repair/repair.cc:662:36: runtime error: member call on null pointer of type 'const struct schema' Aborting on shard 0. The problem is that in the test a keyspace was created without any tables. Since db19a76b1f (selective_token_range_sharder: stop calling global_partitioner()), get_partitioner_for_tables dereferences a nullptr when no table is present: schema_ptr last_s; for (auto t: tables) { // set last_s } last_s->get_partitioner() To fix: 1) Skip the repair in sync_data_using_repair if there is no table in the keyspace. 2) Throw if no schema_ptr is found in get_partitioner_for_tables, to be defensive. After: INFO [shard 0] repair - decommission_with_repair: started with keyspace=ks, leaving_node=127.0.0.2, nr_ranges=744 INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=started WARN [shard 0] repair - repair id 3 to sync data for keyspace=ks, no table in this keyspace INFO [shard 0] repair - repair id 3 completed successfully INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=succeeded Tests: materialized_views_test.py:TestMaterializedViews.decommission_node_during_mv_insert_4_nodes_test Fixes: #6022	2020-03-22 13:46:36 +02:00
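The two-part fix can be sketched in Python, with tables modeled as plain dicts. These functions mirror the C++ names from the message only for readability. They are simplified illustrations, not the repair code:

```python
from typing import List, Optional


def get_partitioner_for_tables(tables: List[dict]) -> str:
    last: Optional[dict] = None
    for t in tables:
        last = t  # mirrors "set last_s" in the original loop
    if last is None:
        # Defensive: previously this dereferenced a null schema_ptr.
        raise RuntimeError("no schema found for the given tables")
    return last["partitioner"]


def sync_data_using_repair(keyspace: str, tables: List[dict],
                           log: List[str]) -> None:
    if not tables:
        # Fix 1: skip the repair entirely for a keyspace with no tables.
        log.append(f"sync data for keyspace={keyspace}, no table in this keyspace")
        return
    partitioner = get_partitioner_for_tables(tables)
    log.append(f"repairing keyspace={keyspace} using {partitioner}")
```

With the early return in place, the exception in `get_partitioner_for_tables` should never fire in this path. It only guards against future callers.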
Avi Kivity	d310e7c7ea	Merge 'repair: Ignore keyspace that is removed in sync_data_using_repair' from Asias repair: Ignore keyspace that is removed in sync_data_using_repair When a keyspace is removed during node operations, we should not fail the whole operation. Ignore the keyspace that is removed. Fixes #5942 * asias-repair_fix_5942: repair: Stop the nodes that have run repair_row_level_start repair: Ignore keyspace that is removed in sync_data_using_repair	2020-03-22 13:19:51 +02:00
Takuya ASADA	005211bad6	redis: add lolwut command Add lolwut command that shows redis version and ascii art. see: https://redis.io/commands/lolwut	2020-03-22 13:16:20 +02:00
Takuya ASADA	2ab366e653	install.sh: create user/group correctly on redhat variants It seems that adduser on Red Hat variants and on Debian variants is incompatible, and there is no addgroup on Red Hat variants. Since the adduser usage in install.sh was written for Debian variants, it does not work on Red Hat-compatible distributions. To fix this, we need to use 'useradd' / 'groupadd' instead. Fixes #6018	2020-03-22 13:13:00 +02:00
Avi Kivity	7ed083a6a7	Merge "test.py: Allow to change the tests starting order" from Pavel E " In debug mode some tests take a veeery looong time to finish; such tests are better started first. This set does so by marking such long tests in suite.yaml files. Tests: unit(dev) " * 'br-split-unit-tests-sorting-2' of https://github.com/xemul/scylla: test.py: Mark some tests as "run_first" test.py: Generate list with short names test.py: Rename "long" to "skip_in_debug_mode"	2020-03-21 19:53:23 +02:00
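Starting the long tests first shortens a parallel run because they no longer begin last and trail behind everything else. A minimal ordering sketch, assuming the `run_first` names from suite.yaml have already been loaded into a set (the loading step and the function name are hypothetical):

```python
def order_tests(tests: list, run_first: set) -> list:
    """Move tests listed as run_first to the front, keeping relative order."""
    first = [t for t in tests if t in run_first]
    rest = [t for t in tests if t not in run_first]
    return first + rest
```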
Rafael Ávila de Espíndola	482fbfcfdb	build: Use more strict stack frame limits A recent seastar update has resolved the worst offenders, so we can lower the limit a bit to warn on the next set of functions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317183209.1664860-1-espindola@scylladb.com>	2020-03-21 19:51:57 +02:00
Rafael Ávila de Espíndola	01ac4aef3a	everywhere: Use futurize_apply instead of futurize<void>::apply No functionality change, just simpler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200318234149.283090-1-espindola@scylladb.com>	2020-03-21 19:51:38 +02:00
Rafael Ávila de Espíndola	0d7281ca06	sstable: Move sstables_manager constructor out of line There is no reason to have it in a header. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200320005225.178381-1-espindola@scylladb.com>	2020-03-21 19:47:29 +02:00
Piotr Dulikowski	6c5c745e25	cdc: add cdc log schema test	2020-03-21 07:33:35 +01:00
Piotr Dulikowski	3bfb044bf1	cdc: do not create cdc$deleted columns for pks and cks Partition key and clustering key columns should not have a corresponding "cdc$deleted_<name>" column in the cdc log table, because it does not make sense to delete such a column from a row. Fixes: #6049 Tests: unit(dev)	2020-03-21 07:33:23 +01:00
Pekka Enberg	6b2cd1bd7d	Revert "db::commitlog: Don't write trailing zero block unless needed" This reverts commit `0b34d88957`. According to Rafael Avila de Espindola: "I have bisected the recent failures [in commitlog_test] on next to this patch."	2020-03-20 22:30:58 +02:00
Pekka Enberg	12b6092ac2	Revert "sstables: Fix incorrect calculation of Compaction Backlog" This reverts commit `458ef4bb06`. According to Glauber Costa: "It may give us the illusion that fixes something for a particular case but this fix is wrong. I am trying to help Raphael figure out why the backlog is wrong but this patch is not the answer."	2020-03-20 22:28:57 +02:00
Piotr Sarna	331ddf41e5	api: add error injection to REST API A simple REST API for error injection is implemented. The API allows the following operations: * injecting an error at a given injection name * listing injections * disabling an injection * disabling all injections Currently the API enables/disables injections on all shards. Closes #3295 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 20:49:03 +01:00
Pavel Solodovnikov	057adc8b4d	utils: add error injection An error injection class is implemented to allow injecting various errors (exceptions, stalls, etc.) into code for testing purposes. Error injection is enabled via the compile flag SCYLLA_ENABLE_ERROR_INJECTION. TODO: manage shard instances Enable error injection in debug/dev/sanitize modes. Unit tests for the error injection class. Closes #3295 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 19:37:48 +01:00
Rafael Ávila de Espíndola	9445608df6	gms: Add a default constructor to feature_config Also move it out of line while at it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200316180321.45914-1-espindola@scylladb.com>	2020-03-20 13:34:26 +01:00
Nadav Har'El	df8b3cd5dc	alternator-test: a "run" script Running the Alternator tests is easy after you manually run Scylla, but sometimes it's convenient to have a script which just does everything automatically: start Scylla in a temporary directory, set it up properly for the tests (especially the authentication), run all the tests, and remove the temporary directory. This is what this alternator-test/run script does. This script can be run by Jenkins, for example, to check all the Alternator tests. The script assumes some things (including cqlsh, pytest and the boto3 library) are already installed, and that Scylla has been compiled - by default it uses the most recently built build/*/scylla, but this can be overridden by a command like SCYLLA=build/release/scylla alternator-test/run Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200311091918.16170-1-nyh@scylladb.com>	2020-03-19 15:49:46 +01:00
Nadav Har'El	2deba4035a	merge: Hook alternator to admission control Merged patch series from Piotr Sarna: This series hooks alternator to admission control, similarly to how the CQL server uses it. The estimated memory consumption is set to 2x the raw JSON request, since that seems to be the upper limit of how much more memory rapidjson allocates during parsing. Note that since Seastar HTTP currently reads the whole contents upfront, there's no easy way to apply admission control before reading the request - that would involve some changes to our HTTP API. Note 2: currently, admission control in CQL does not properly pass memory consumption information for requests that are bounced to another shard - that would require either transferring semaphore units between shards or keeping a foreign pointer to the original units. As a result, alternator also does not pass correct admission control info between shards, and all places in code which do that are marked with clear FIXMEs. Fixes #5029 Piotr Sarna (5): storage_service: add memory limiter semaphore getter alternator: add service permit to callbacks alternator: add memory limiter to alternator server alternator: add addmission control stats entry alternator: hook admission control to alternator server alternator/executor.cc | 113 ++++++++++++++++++++-------------- alternator/executor.hh | 32 +++++----- alternator/rmw_operation.hh | 1 + alternator/server.cc | 83 +++++++++++++++++----------- alternator/server.hh | 8 ++- alternator/stats.cc | 2 + alternator/stats.hh | 1 + main.cc | 3 +- service/storage_service.hh | 4 ++ 9 files changed, 149 insertions(+), 98 deletions(-)	2020-03-19 15:51:17 +02:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Piotr Dulikowski	59727fb34b	cdc: remove result_callback The `result_callback` was a callback returned by `augment_mutation_call` that was supposed to be used in the CDC postimage implementation. Because CDC postimage was implemented without using this callback, and currently a no-op function is always returned, this callback can safely be removed.	2020-03-19 14:55:07 +02:00
Pavel Emelyanov	7af3bbd57b	test.py: Mark some tests as "run_first" These tests take a long time to finish, so it makes sense to start them earlier than the others. The provided list of long tests consists of those running for more than 10 minutes in debug mode. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 12:52:18 +03:00
Rafael Ávila de Espíndola	e28b17de88	auth: Make create_metadata_table_if_missing noexcept It returns a future, so converting an exception to an exceptional future simplifies error handling in the caller. Without this, code like the one in standard_role_manager::create_metadata_tables_if_missing has surprising behavior: return when_all_succeed( create_metadata_table_if_missing(...), create_metadata_table_if_missing(...)); since it might not wait for both futures: if either call throws instead of returning an exceptional future, when_all_succeed is never reached and any future already returned by the other call is not waited for. We could use the lambda version of when_all_succeed, but changing create_metadata_table_if_missing seems a nice API improvement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-4-espindola@scylladb.com>	2020-03-19 10:22:50 +01:00
Piotr Sarna	0c11e07faf	view,table: fix waiting for view updates during building View updates sent as part of the view building process should never be ignored, but `fd49fd7` introduced a bug which may cause exactly that: the updates are mistakenly sent to the background, so the view builder does not receive negative feedback if an update fails, and therefore does not retry it. Consequently, view building may report that it "finished" building a view, while some of the updates were lost. A simple fix is to restore the previous behaviour - all updates triggered by view building are now waited for. Fixes #6038 Tests: unit(dev), dtest: interrupt_build_process_with_resharding_low_to_half_test	2020-03-19 10:50:54 +02:00
Pavel Emelyanov	59bc116695	test.py: Generate list with short names The list will be sorted a bit differently, and for that the short names are needed up front. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 11:46:02 +03:00
Pavel Emelyanov	30c540aae1	test.py: Rename "long" to "skip_in_debug_mode" The "long" test will mean that it is to be started first, not skipped, so rename "long" to avoid additional confusion Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 11:45:55 +03:00
Piotr Sarna	62c34a9085	cql: fix qualifying indexed columns for filtering When qualifying columns to be fetched for filtering, we also check whether the target column is used as an index - in which case there's no need to fetch it. However, the check incorrectly assumed that any restriction is eligible for indexing, while currently this is only true for EQ. The fix makes a more specific check and contains many dynamic casts, but these will hopefully be gone once our long-planned "restrictions rewrite" is done. This commit comes with a test. Fixes #5708 Tests: unit(dev)	2020-03-19 10:34:16 +02:00
Tomasz Grabiec	5fe626a887	sstables: Release reserved space for sharding metadata The intention of the code was to clear the sharding metadata chunked_vector so that it doesn't bloat memory. The type of c is `chunked_vector*`. Assigning `{}` clears the pointer, while the intended behavior was to reset the `chunked_vector` instance. The original instance is left unmodified with all its reserved space. Because of this, the previous fix had no effect, because token ranges are stored entirely inline and popping them doesn't release memory. Fixes #4951 Tests: - sstable_mutation_test (dev) - manual using scylla binary on customer data on top of 2019.1.5 Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>	2020-03-19 09:46:27 +02:00
Pekka Enberg	0d2b70798f	reloc/build_reloc.sh: Remove unused functions The is_redhat_variant() and is_debian_variant() functions are not used, so let's remove them. Message-Id: <20200317155740.12916-1-penberg@scylladb.com>	2020-03-19 08:39:57 +01:00
Rafael Ávila de Espíndola	7401a63e92	auth: Handle permission cache not being initialized auth::service::start can fail before _permissions_cache is initialized, so we should not assume that it is always set. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-3-espindola@scylladb.com>	2020-03-18 20:21:24 +01:00
Rafael Ávila de Espíndola	3c2851aafc	test: Make sure auth_service is always stopped An exception thrown after the start of auth_service and before init_server_without_the_messaging_service_part returns would cause the sharded<auth_service> destructor to assert. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-2-espindola@scylladb.com>	2020-03-18 20:17:55 +01:00
Botond Dénes	e6e894d871	scylla-gdb.py: introduce scylla small-objects When investigating OOM related cores, a common thing to do is trying to identify the objects in a particularly heavily populated size-class. This command is meant to help with that, providing a way to list the objects in any size-class, in a paginated way. Traversing the objects of a pool is done through a `small_object_iterator` object which is also exposed to python code, to be used in custom scripts wanting to scan all objects belonging to a pool. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200318085437.452906-1-bdenes@scylladb.com>	2020-03-18 13:33:59 +02:00
Raphael S. Carvalho	0df8faeaa2	sstables: make delete_atomically() work with empty set If delete_atomically() is called with an empty set for any reason, it fails, because it relies on one of the sstables in the set to get the sstable directory. This will be needed in the future, when the sstable replacement function is used only with new sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200305144657.9440-1-raphaelsc@scylladb.com>	2020-03-18 13:29:42 +02:00
Pavel Emelyanov	da3bf20e71	main: Respect config start_native_transport option The start_native_transport option exists, but it is not taken into account on Scylla start, whereas the symmetrical start_rpc option is, so make both act similarly. The default value for the option is true, so default setups will not be broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200310140518.29410-1-xemul@scylladb.com>	2020-03-18 11:17:56 +02:00
Avi Kivity	164881696b	Merge "scylla-gdb.py: scylla_memory: handle per-sg coordinator stats" from Botond " Since `b783d40aa` storage-proxy maintains separate coordinator stats per scheduling group. This broke scylla_memory, which was still trying to access the old global stats. This mini-series updates it to be able to handle per-sg coordinator stats, while preserving backward compatibility with older versions still using global stats. " * 'scylla-memory-per-sg-coordinator-stats/v1' of https://github.com/denesb/scylla: scylla-gdb.py: scylla_memory: update w.r.t. per-sg coordinator stats scylla-gdb.py: scylla_memory: move coordinator code to print_coordinator_stats()	2020-03-18 12:38:44 +02:00
Avi Kivity	c766f50491	Merge "Split some unit tests into smaller pieces" from Pavel E " The debug mode unit tests take ~half-an-hour to complete. Here's the top list of test run times Test: Time (seconds): ... steady tail goes here ... test/boost/user_function_test 496 test/boost/row_cache_test 502 test/boost/view_schema_test 932 test/boost/cql_query_test 997 test/boost/mutation_reader_test 1048 test/boost/sstable_mutation_test 1417 test/boost/secondary_index_test 1468 Splitting the spike (top-5) is the primary goal. However, the distribution of test cases in 3 of those tests is also _very_ non-uniform, so just cutting them into equal parts doesn't work. For example, test_index_with_paging from the slowest one takes ~14 minutes on its own and is the slowest test case out there. So the set does this: - moves the champion test_index_with_paging into a separate file - detaches the heaviest parts of sstable_mutation_test and mutation_reader_test into their own tests too. The resulting split is still non-uniform, but each of the 4 resulting tests runs notably faster than the 14-minute record - splits cql_query_test and view_schema_test into several parts in a wildcard manner to get under the 14-minute threshold - moves some shared code into lib/ As a result, the debug mode test run takes 14.5 minutes =) which is almost 2 times faster than it was. The dev mode run time is not affected noticeably. Test: well, unit(debug) and unit(dev) " * 'br-split-unit-tests-3-next' of https://github.com/xemul/scylla: test: Split view_schema_test test: Split cql_query_test test: Split mutation_reader_test test: Split sstable_mutation_test test: Split secondary_index test	2020-03-18 12:19:32 +02:00
Pavel Emelyanov	96e3d0fa36	mutation_partition: Debloat header from others Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200317191051.12623-1-xemul@scylladb.com>	2020-03-18 11:53:36 +02:00
Asias He	cdcedf5eb9	gossip: Make is_safe_for_bootstrap more strict Consider 1. Start n1, n2 in the cluster 2. Stop n2 and delete all data for n2 3. Start n2 to replace itself with replace_address_first_boot: n2 4. Kill n2 before n2 finishes the replace operation 5. Remove replace_address_first_boot: n2 from scylla.yaml of n2 6. Delete all data for n2 7. Start n2 At step 7, n2 will be allowed to bootstrap as a new node, because the application state of n2 in the cluster is HIBERNATE, which is not rejected by the check in is_safe_for_bootstrap. As a result, n2 will replace n2 with different tokens and a different host_id, as if the old n2 node had been silently removed from the cluster. Fixes #5172	2020-03-17 17:37:16 +01:00
Tomasz Grabiec	488482c55a	Merge "lwt: ensure unqualified SELECT works with SERIAL cl" from Kostja Ensure unqualified SELECT throws an appropriate exception with SERIAL consistency level. Since such query touches multiple partitions, we don't support it in SERIAL mode. Branch URL: https://github.com/kostja/scylla/tree/gh-6016-crash-lwt-select	2020-03-17 17:24:06 +01:00
Konstantin Osipov	4978bb513d	test: add a test case for SERIAL read consistency Pass custom query options to execute_prepared and add a test case for custom SERIAL consistency.	2020-03-17 18:58:12 +03:00
Konstantin Osipov	f5180725df	lwt: check SELECT restricts partition key before accessing it Check that the SELECT statement restricts the partition key before accessing it when determining the shard to execute the query on. Essentially, move the check for a properly restricted partition key from storage_proxy.cc to select_statement.cc, now that we access it earlier in the call stack. Keep the check in storage_proxy.cc, since storage_proxy::query() has other call sites (views), which today should never use serial consistency for their queries, but this can change in the future. Please note that Cassandra only partially enforces SERIAL consistency and can silently downgrade SERIAL consistency to the default non-serial one when doing unbounded SELECTs (https://issues.apache.org/jira/browse/CASSANDRA-15641) Fixes #6016	2020-03-17 16:55:11 +03:00
Pavel Emelyanov	86c712a340	test: Split view_schema_test Detach the partition_key and clustering_key ones into their own files. The resulting 2 tests run ~4 minutes each; the leftover ones complete within 11 minutes. As before, the goal of getting under 14 minutes is reached; further splitting needs more thinking than just wildcarding. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:45 +03:00
Pavel Emelyanov	e848d63510	test: Split cql_query_test This detaches the like_operator, group_by, functions and large cases into their own files. The split is not uniform -- the resulting 4 tests run in less than 3 minutes each, while what's left in the original runs ~11 minutes. But since the goal was to get under the 14-minute threshold, and this file contains 126 cases (the champion), I just did a "wildcard" selection that worked. It also required moving the require_rows() helpers into a local header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:45 +03:00
Pavel Emelyanov	3fbd88b226	test: Split mutation_reader_test Detach test_multishard_combining_reader_as_mutation_source into individual file. This particular test runs ~13 minutes. What's left in the origin completes a bit faster. The split also requires moving the reader_lifecycle_policy and the dummy_partitioner into lib/ Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:44 +03:00
Pavel Emelyanov	3577fa2bb8	test: Split sstable_mutation_test Detach test_schema_changes and test_sstable_conforms_to_mutation_source into individual files. These two take ~10 minutes each; what's left in the original finishes within 4 minutes altogether. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:26:34 +03:00
Pavel Emelyanov	5b86f4be9a	test: Split secondary_index test Detach test_index_with_paging into an individual file. This particular test case is the longest one in the suite; it takes ~14 minutes to run. Further splitting of this test is pointless (for now), and all subsequent splits in this set just bring the resulting times below this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:26:34 +03:00
Pavel Emelyanov	14de126ff8	migration_manager: Run background schema merge in gate The call to merge_schema_from is in some cases run in the background and thus is not aborted or waited for on shutdown. This may result in use-after-free bugs, one of which is merge_schema_from -> read_schema_for_keyspace -> db::system_keyspace::query -> storage_proxy::query -> query_partition_key_range_concurrent; in the latter function proxy._token_metadata is accessed, while the respective object may already be freed (unlike the storage_proxy itself, which is still leaked on shutdown). Related bug: #5903, #5999 (cannot reproduce though) Tests: unit(dev), manual start-stop dtest(consistency.TestConsistency, dev) dtest(schema_management, dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Pekka Enberg <penberg@scylladb.com> Message-Id: <20200316150348.31118-1-xemul@scylladb.com>	2020-03-16 17:41:23 +01:00
Avi Kivity	342c967b6a	Merge "Introduce compacting reader" from Botond " Allow adding compaction to any reader pipeline. The intended users are streaming and repair, with the goal of not wasting transfer bandwidth on purgeable data. No current user in the tree. Tests: unit(dev), mutation_reader_test.compacting_reader_(debug) " 'compacting-reader/v3' of https://github.com/denesb/scylla: test: boost/mutation_reader_test: add unit test for compacting_reader test: lib/flat_mutation_reader_assertions: be more lenient about empty mutations test: lib/mutation_source_test: make data compaction friendly test: random_mutation_generator: add generate_uncompactable mode mutation_reader: introduce compacting_reader	2020-03-16 16:41:50 +02:00
Botond Dénes	837b79c265	test: boost/mutation_reader_test: add unit test for compacting_reader	2020-03-16 13:58:13 +02:00
Botond Dénes	3b482af33d	test: lib/flat_mutation_reader_assertions: be more lenient about empty mutations When expecting a mutation that compacts to an empty one, allow it not to be produced at all. After all, compaction normally doesn't even emit empty partitions.	2020-03-16 13:58:13 +02:00
Botond Dénes	1ab45e15a0	test: lib/mutation_source_test: make data compaction friendly Currently the mutation source test suite may generate data that is compactable. This poses a problem for the next patch, where we want to use it to test `compacting_reader`, a reader which compacts data as it reads it. When the input is compactable, this will introduce artificial differences, failing the tests. To also allow testing such readers, make sure the data is not compactable, i.e. compacting it will not change it. The goal of the mutation source test suite is not to exercise compaction logic, so this will not take anything away from its value.	2020-03-16 13:58:13 +02:00
Botond Dénes	c4fab16723	test: random_mutation_generator: add generate_uncompactable mode The random mutation generator currently generates data and tombstones with random timestamps selected from a pre-determined range. This results in mutations where tombstones often cover each other and data. There is nothing wrong with this, as this is how real data is too. However, for certain tests this is problematic, as compacting the mutations will result in different mutations. To cater for these users too, introduce a `generate_uncompactable` option. When set to `yes`, the generated mutations will be uncompactable, i.e. no tombstone will cover lower-level tombstones and no tombstone will cover data. The mutations will not change when compacted.	2020-03-16 13:58:13 +02:00
Botond Dénes	8286a0b1bd	mutation_reader: introduce compacting_reader Compacting reader compacts the output of another reader on-the-fly. It performs compaction-type compaction (`compact_for_sstables::yes`). It will be used in streaming and repair to eliminate purgeable data from the stream, thus preventing wasted transfer bandwidth.	2020-03-16 13:58:13 +02:00
Nadav Har'El	35d95d6887	merge: Add postimage implementation Merged pull request https://github.com/scylladb/scylla/pull/5996 from Calle Wilund: Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains all columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 13:42:07 +02:00
Calle Wilund	0a3383c090	cdc: Add postimage implementation Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains _all_ columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 09:21:06 +00:00
Calle Wilund	40114f8233	cql3::untyped_result_set: Add bytes_view_opt access to fields For quick access and convenient live-checks	2020-03-16 09:21:06 +00:00
Calle Wilund	ca7046256f	schema: Add "columns" accessor for columns by kind To prevent switch-code everywhere.	2020-03-16 09:21:06 +00:00
Avi Kivity	ee9df91a76	Merge "Allow setting partitioner per table" from Piotr " This PR makes it possible to use a different partitioner for each table. If no table-specific partitioner is set for a given table, then a default partitioner is used. The PR is composed of the following parts: - Introduction of schema::get_partitioner that still returns dht::global_partitioner - Replacement of all usages of dht::global_partitioner with schema::get_partitioner - Making it possible to set a table-specific partitioner in a schema_builder - Remove all the places that were setting the default partitioner except for main.cc (mostly tests) - Move the default partitioner from i_partitioner to schema.cc and hide it from the rest of the codebase - Remove dht::global_partitioner After this PR there's no such thing as a global partitioner at all. There is only a default partitioner, but it still has to be accessed through schema::get_partitioner. There are some intermediate states in which i_partitioner is stored as shared_ptr in the schema, but the final version keeps it by const&. The PR does not enable per-table partitioners end-to-end. Just the internals of a single node are covered. I still have to deal with: - Making sure a table has the same partitioner on each node - Allowing the user to set up a table-specific partitioner on a table - Signaling the driver about which partitioner is used by a given table - Persisting partitioner info for each table that does not use the default partitioner. 
Fixes #5493 Tests: unit(dev, release, debug), dtest(byo) " * 'per_table_partitioner' of https://github.com/haaawk/scylla: schema: drop optional from _partitioner field make_multishard_combining_reader: stop taking partitioner split_range_to_single_shard: stop taking partitioner as argument tests: remove unused murmur3 includes partitioner: move default_partitioner to schema.cc partitioner: hide dht::default_partitioner schema: include partitioner name in scylla tables mutation schema: make it possible to set custom partitioner scylla_tables: add partitioner column schema_features: add PER_TABLE_PARTITIONERS feature features: add PER_TABLE_PARTITIONERS feature	2020-03-16 11:13:47 +02:00
Avi Kivity	cb523c48cd	Update seastar submodule * seastar 47d929dd1...3c498abca (5): > reactor: Use do_with to save stack space > reactor: Extract code into a schedule_retry helper > reactor: Move an io_event buffer out of the stack > temporary_buffer: fix typo in argument type in comparison operators > tests: tls_test: add missing include <iostream>	2020-03-16 11:02:50 +02:00
Rafael Ávila de Espíndola	69874f4330	feature_service: Remove default constructor This makes sure that feature_config_from_db_config is used for both tests and main.cc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200312153453.37282-2-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	7c26eb61a3	feature_service: Initialize local variable The use of an uninitialized variable was not being noticed because this is only used by main.cc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200312153453.37282-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	517a01a3f6	utils: Use sstring as keys in nonstatic_class_registry Now that seastar::string::compare has been updated, it is possible to use sstring for this. This reverts commit `01fe766f1f`. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200311005219.280737-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	624573a219	configure: Warn on large stacks This adds a warning with a different limit in each mode. The limit is picked as 1KiB lower than the value at which no warning would be printed. This makes it easy to spot the worst offender. With that we can either fix it or silence the warning once we are sure we can handle large frames in that context. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200311205300.324383-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Piotr Sarna	f43e68b383	alternator: hook admission control to alternator server From now on, alternator requests use the memory limiter semaphore to control the amount of memory used by alternator requests.	2020-03-16 08:43:49 +01:00
Piotr Sarna	7eb6d5545d	alternator: add addmission control stats entry The entry will be bumped if admission control was forced to block the request from being served.	2020-03-16 07:44:26 +01:00
Piotr Sarna	a1ea650d83	alternator: add memory limiter to alternator server With the memory limiter semaphore, the server will be able to apply admission control to alternator requests.	2020-03-16 07:44:26 +01:00
Piotr Sarna	781fbe8070	alternator: add service permit to callbacks As a first step towards introducing admission control, the API of alternator callbacks is extended with an additional 'permit' parameter.	2020-03-16 07:44:25 +01:00
Piotr Sarna	cb5fded9c2	storage_service: add memory limiter semaphore getter The memory limiter semaphore is going to be useful for limiting alternator memory as well, so it's hereby exposed via a getter.	2020-03-16 07:34:23 +01:00
Raphael S. Carvalho	458ef4bb06	sstables: Fix incorrect calculation of Compaction Backlog The bug is that we failed to implement this part of the formula: (T - C) * log4(T) We were incorrectly implementing it as: (T - C) * log4(T - C) So the backlog could be calculated as negative when it should actually be positive, or be lower than expected. BTW, we do protect against negative backlog after commit `3e08bd17f0`. Given that the STCS backlog tracker is inherited by the TWCS and LCS trackers, all compaction strategies are affected. The formula to calculate the aggregate backlog is: A = (T - C) * log4(T) - Sum(i = 0...N) { (Si - Ci) * log4(Si) }. For example, a negative backlog is calculated in a tested scenario where T was 3129, C was 2337 and Sum(i = 0...N) { (Si - Ci) * log4(Si) } resulted in 4222.53. (T - C) * log4(T - C) = (3129 - 2337) * log4(3129 - 2337) = 3813.23 So the backlog is negative because A = 3813.23 - 4222.53 = -409.302 But it should actually be calculated as follows: (T - C) * log4(T) = (3129 - 2337) * log4(3129) = 4598.15 And the correct backlog is positive, as A = 4598.15 - 4222.53 = 375.621 Fixes #6021. tests: unit(dev) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200315153711.23302-1-raphaelsc@scylladb.com>	2020-03-15 18:16:01 +02:00
Kamil Braun	aa72a1c556	cql3: when altering table, keep old values of unchanged extensions When the user performed alter ks.t with compaction = {...} the values of most other options, which were not specified in the statement, e.g. compression, were left unchanged. That wasn't true for extension options however: for example, the "cdc" option was removed. This commit fixes the behavior to keep the old values of extension options not specified in the alter statement.	2020-03-15 17:45:30 +02:00
Piotr Dulikowski	b1e8170bf9	cdc: add tracing Adds information about the stages of CDC mutation augmentation to tracing sessions.	2020-03-15 11:54:10 +01:00
Asias He	7ac9e0f2a1	gossip: Print CDC_STREAMS_TIMESTAMP correctly I saw UNKNOWN application state in the logs: INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=CACHE_HITRATES, versioned_value=Value(,14) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=SCHEMA_TABLES_VERSION, versioned_value=Value(3,15) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=RPC_READY, versioned_value=Value(0,16) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(,17) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=SHARD_COUNT, versioned_value=Value(1,30) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=IGNOR_MSB_BITS, versioned_value=Value(12,31) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=UNKNOWN, versioned_value=Value(1583371936128,20) It turned out it was CDC_STREAMS_TIMESTAMP. $ nodetool gossipinfo | grep 1583371936128 X8:1583371936128 X8:1583371936128 Fixes #5992	2020-03-15 11:51:35 +01:00
Piotr Jastrzebski	5bbb826c49	schema: drop optional from _partitioner field Always set the field to the default value if no table specific partitioner has been set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:21 +01:00
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	4b7fb323c3	split_range_to_single_shard: stop taking partitioner as argument The function already takes schema so we don't need partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	f99fd35f53	tests: remove unused murmur3 includes Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	22daa262ee	partitioner: move default_partitioner to schema.cc Make it inaccessible to other compilation units. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	7064f6b831	partitioner: hide dht::default_partitioner Remove last usage of this global outside i_partitioner.cc and hide it inside the compilation unit. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	57b69fb804	schema: include partitioner name in scylla tables mutation There are two results of this patch: 1. New partitioner name column is persisted on node's disk in scylla_tables 2. New partitioner name column is included into schema digest This is achieved by including this new column in scylla tables mutation. For that we: 1. Add partitioner name to the result of make_scylla_tables_mutation. If table does not have a specific partitioner set and uses default partitioner then we don't include the name of such default partitioner. Only the name of custom partitioner is added if a table has one. 2. In create_table_from_mutations we check whether scylla tables mutation has a partitioner name set. If so then we use it as a parameter for schema_builder. Note that previous patches have ensured that this new column will be included into schema digest only after the whole cluster supports per table partitioners. Before that, during rolling upgrade, new partitioner name column is hidden and not shared with other nodes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	1d6cec1b0a	schema: make it possible to set custom partitioner schema_builder::with_partitioner can be used now to set custom partitioner on a table. If no such partitioner is set, global partitioner is still used. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	f83ff8fda1	scylla_tables: add partitioner column Following commits make it possible to set a specific partitioner for a table. We want to persist that information and include it into schema digest. For that a new column in scylla_tables is needed. This commit adds such column. We add the new column to scylla_tables because it's a Scylla specific extension. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	782f2caf41	schema_features: add PER_TABLE_PARTITIONERS feature With per table partitioners, partitioner name will be a part of table schema. To allow rolling upgrade we need to perform special logic that hides new partitioner name schema column during the upgrade. This commit adds new schema feature that controls this logic. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	90df9a44ce	features: add PER_TABLE_PARTITIONERS feature This new feature is required because we now allow setting partitioner per table. This will influence the digest of table schema so we must not include partitioner name into the digest unless we know that the whole cluster already supports per table partitioners. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Botond Dénes	5207f530ba	scylla-gdb.py: scylla smp-queues: ignore unresolvable/unmatching symbols Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160444.320253-1-bdenes@scylladb.com>	2020-03-15 10:41:16 +02:00
Botond Dénes	a85c3aa839	scylla-gdb.py: introduce sharded_local convenience function To conveniently retrieve the local instance of a sharded object. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160106.319743-1-bdenes@scylladb.com>	2020-03-15 10:41:16 +02:00
Botond Dénes	0e9df01ba3	scylla-gdb.py: downcast_vptr(): make compatible with python < 3.6 Subscript operation `__getitem__()` was only added to re.match objects in 3.6. To support previous versions, use `groups()` method to obtain the desired group. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160025.319464-1-bdenes@scylladb.com>	2020-03-15 10:41:15 +02:00
Nadav Har'El	635e6d887c	materialized views: fix corner case of view updates used by Alternator While CQL does not allow creation of a materialized view with more than one base regular column in the view's key, in Alternator we do allow this - both partition and clustering key may be a base regular column. We had a bug in the logic handling this case: If the new base row is missing a value for one of the view key columns, we shouldn't create a view row. Similarly, if the existing base row was missing a value for one of the view key columns, a view row does not exist and doesn't need to be deleted. This was done incorrectly, and made decisions based on just one of the key columns, and the logic is now fixed (and I think, simplified) in this patch. With this patch, the Alternator test which previously failed because of this problem now passes. The patch also includes new tests in the existing C++ unit test test_view_with_two_regular_base_columns_in_key. This test was already supposed to be testing various cases of two-new-key-columns updates, but missed the cases explained above. These new tests failed badly before this patch - some of them had clean write errors, others caused crashes. With this patch, they pass. Fixes #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312162503.8944-1-nyh@scylladb.com>	2020-03-15 07:57:33 +01:00
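The rule this fix enforces can be sketched as a minimal model, not Scylla's view code: `base_row`, `view_row_exists()`, `decide()` and `view_action` are hypothetical names, and a base row is assumed to be a map from column name to an optional value. Whether a view row exists depends on all view-key columns taken from base regular columns, evaluated for both the old and the new base row.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a base row maps column names to (possibly missing) values.
using base_row = std::map<std::string, std::optional<std::string>>;

// A view row exists only if every view-key column that comes from a base
// regular column has a value in the base row.
bool view_row_exists(const std::optional<base_row>& row,
                     const std::vector<std::string>& view_key_regular_cols) {
    if (!row) {
        return false;  // no base row at all
    }
    return std::all_of(view_key_regular_cols.begin(), view_key_regular_cols.end(),
                       [&](const std::string& col) {
                           auto it = row->find(col);
                           return it != row->end() && it->second.has_value();
                       });
}

enum class view_action { none, create, remove, update };

// Decide what to do with the view by looking at all key columns together,
// for both the existing and the updated base row.
view_action decide(const std::optional<base_row>& existing,
                   const std::optional<base_row>& updated,
                   const std::vector<std::string>& key_cols) {
    bool had = view_row_exists(existing, key_cols);
    bool has = view_row_exists(updated, key_cols);
    if (had && has) return view_action::update;  // or delete + create if the key changed
    if (had) return view_action::remove;
    if (has) return view_action::create;
    return view_action::none;  // nothing to create, nothing to delete
}
```

Deciding from only one key column would, for example, "create" a view row when the second key column is still missing.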
Avi Kivity	07ddbf6e54	Merge "Reduce our dependence on sstring" from Rafael " It doesn't look like we will be able to switch to std::string just yet, but when it is not too inconvenient we can try to reduce our dependence so that attempting the switch again in the future is easier. " * 'espindola/sstring-api' of https://github.com/espindola/scylla: redis: Use scattered_message::append(std::string_view) everywhere: Use uninitialized_string instead of sstring::initialized_later compressor: Add an explicit cast to const sstring& everywhere: Be more explicit that we don't want std::make_shared cql3: Don't use sstring::reset everywhere: Don't assume sstring::begin() and sstring::end() are pointers	2020-03-14 16:29:42 +02:00
Avi Kivity	6b747f4673	database: avoid creating thread in make_directory_for_column_family() make_directory_for_column_family() is used in a parallel_for_each() in parse_system_tables(). Because parallel_for_each does not preempt in the initial execution of its input function, and because each thread allocates 128k for the stack, we end up allocating many hundreds of megabytes if there are many tables. This happens early during boot and will only cause problems if there are 5,000 tables per gigabyte of shard memory, an unlikely combination that will probably fail later, but still it is better to avoid unnecessary large allocations. This was developed in order to fix #6003, until it was discovered that `c020b4e5e2` ("logalloc: increase capacity of _regions vector outside reclaim lock") is the real fix. Message-Id: <20200313093603.1366502-1-avi@scylladb.com>	2020-03-13 13:46:45 +02:00
Rafael Ávila de Espíndola	a1ca83b067	gms: Fix static initialization order problem In test_services.cc there is gms::feature_service test_feature_service; And the feature_service constructor has , _lwt_feature(*this, features::LWT) But features::LWT is a global sstring constructed in another file. Solve the problem by making the feature strings constexpr std::string_view. I found the issue while trying to benchmark the std::string switch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200309225749.36661-1-espindola@scylladb.com>	2020-03-13 12:37:22 +02:00
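A minimal sketch of the fix pattern, assuming simplified stand-ins for `gms::feature` and `gms::feature_service` (the struct layouts below are hypothetical). A `constexpr std::string_view` is constant-initialized, so it is already valid when any translation unit's global constructors run; a global `sstring` is dynamically initialized and may not be constructed yet. In this single-file sketch the ordering problem cannot actually occur; it only shows the shape of the declarations.

```cpp
#include <cassert>
#include <string_view>

namespace features {
// Constant-initialized at compile time: no constructor has to run first.
constexpr std::string_view LWT = "LWT";
}

struct feature_service;

// Hypothetical stand-in for a feature registered with the service.
struct feature {
    std::string_view name;
    feature(feature_service&, std::string_view n) : name(n) {}
};

// Hypothetical stand-in for gms::feature_service.
struct feature_service {
    feature _lwt_feature;
    feature_service() : _lwt_feature(*this, features::LWT) {}
};

// A global instance, like test_feature_service in test_services.cc.
feature_service test_feature_service;
```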
Botond Dénes	13e20fe6be	scylla-gdb.py: scylla_memory: update w.r.t. per-sg coordinator stats Since `b783d40aa` storage-proxy maintains separate coordinator stats per scheduling group. This broke scylla_memory, which was still trying to access the old global stats. Update it to print the new per-scheduling group stats when they are available and the old global ones when not. Scheduling groups for which all relevant metrics are 0 are omitted from the printout to reduce noise.	2020-03-13 10:57:51 +02:00
Botond Dénes	ca84c2f566	scylla-gdb.py: scylla_memory: move coordinator code to print_coordinator_stats() This code will have to be revamped. While at it move it to its own method to reduce the clutter in `invoke()`.	2020-03-13 10:54:01 +02:00
Avi Kivity	7311d1b177	Update seastar submodule * seastar 664c911b4c...47d929dd1b (6): > sstring: Simplify operator= > sstring: Deprecate reset > sstring: Pass string_view to compare > sstring: Move exception code out of line > reactor: remove unused variable > reactor: always initialize smp_poller	2020-03-12 21:37:05 +01:00
Piotr Sarna	e8871181eb	scripts: add a script for pulling GitHub pull requests In order to avoid the UI merge button which tends to mess up commit authors, a simple script for pulling a PR from GitHub is added. Example usage: $ git fetch; git checkout origin/next $ ./scripts/pull_github_pr.sh 6007 Message-Id: <1fa79c8be47b5660fc24a81fc0ab381aa26d98af.1584014944.git.sarna@scylladb.com>	2020-03-12 21:37:05 +01:00
Raphael S. Carvalho	34426d1497	sstables: Fix off-by-one when checking for max_data_segregation_window_count Make sure max size of known windows will respect max_d_s_w_c. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200305165014.16022-1-raphaelsc@scylladb.com>	2020-03-12 14:11:18 +02:00
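The commit does not show the check itself, so the following only illustrates the general off-by-one shape, under that assumption. `window_tracker`, `try_add()` and `_max_windows` are hypothetical names; the limit plays the role of max_data_segregation_window_count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical tracker that must never know about more than max_windows
// distinct time windows at once.
class window_tracker {
    std::set<int64_t> _windows;
    std::size_t _max_windows;
public:
    explicit window_tracker(std::size_t max_windows) : _max_windows(max_windows) {}

    // Returns false when accepting `w` would exceed the limit.
    bool try_add(int64_t w) {
        if (_windows.count(w)) {
            return true;  // already known: no new window is created
        }
        // Checking `size() > max` before inserting lets the set reach max + 1;
        // checking `>=` keeps size() <= max.
        if (_windows.size() >= _max_windows) {
            return false;
        }
        _windows.insert(w);
        return true;
    }

    std::size_t size() const { return _windows.size(); }
};
```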
Nadav Har'El	8e4520b2b3	alternator-test: add xfailing test for issue 6008 This patch adds a test, test_gsi.py::test_gsi_missing_attribute_3, reproducing issue #6008. The issue is about a GSI with two regular base columns becoming key columns in a view, and we have a write failure when writing an item with one of these attributes missing. The test passes on DynamoDB, currently xfails on Alternator. Refs #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312064131.16046-1-nyh@scylladb.com>	2020-03-12 10:07:58 +01:00
Nadav Har'El	77444a38a1	alternator: allow consistent reads on LSI - but not on GSI Recently, Materialized Views were modified (see issue #4365) so that local view updates (when both base and view replicas are the same node) are synchronous. In particular, when the view's partition key is the same as the base table's, view writes are synchronous: A write now only returns after CL copies of the view data have been written. Alternator's LSI have exactly this case (same partition key as the base). This makes strongly-consistent (CL=LOCAL_QUORUM) reads in Alternator work correctly, so we update the documentation accordingly to no longer say that we don't support this DynamoDB feature. However unlike LSIs, for GSIs strongly-consistent reads are still not supported, and should not be supported (they are also not supported by DynamoDB). Such reads should generate an error. So this patch fixes this too. A GSI test which tested that strongly consistent reads are forbidden, which used to xfail, now passes so the patch removes the "xfail". Finally, we can simplify the LSI tests by using consistent reads instead of eventually-consistent reads with retries. Beyond simplifying the test, it's also an opportunity to use strongly-consistent reads and make sure that they work (while, as mentioned above, similar reads for GSIs are refused). Fixes #5007 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200311170446.28611-1-nyh@scylladb.com>	2020-03-12 09:18:00 +01:00
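The validation described above can be sketched with hypothetical names (`index_kind`, `check_consistent_read()`, `validation_exception`) and an illustrative error message. Alternator's actual request handling and error wording are not shown in the commit.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical classification of the table or index a read targets.
enum class index_kind { none, lsi, gsi };

struct validation_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Strongly-consistent reads are accepted on the base table and on an LSI
// (same partition key as the base, so its view updates are synchronous),
// but refused on a GSI, matching DynamoDB's behaviour.
void check_consistent_read(index_kind kind, bool consistent_read) {
    if (consistent_read && kind == index_kind::gsi) {
        // Illustrative message only.
        throw validation_exception("ConsistentRead is not supported on a GSI");
    }
}
```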
Takuya ASADA	086f0ffd5a	scylla_raid_setup: create missing directories We need to create hints, view_hints, saved_caches directories on RAID volume. Fixes #5811	2020-03-12 09:29:29 +02:00
Takuya ASADA	399ff24efd	docker: apply scylla-jmx sysconfig file on scylla-jmx service Apply the scylla-jmx sysconfig file to the scylla-jmx service, to allow customizing jmx parameters. Fixes #5939	2020-03-12 09:27:23 +02:00
Avi Kivity	86415cf98a	Update seastar submodule * seastar 95f4277c16...664c911b4c (4): > tls_test: Use uninitialized_string instead of initialized_later > tls: Fix race and stale memory use in delayed shutdown Fixes #5759 (maybe) > tls: Re-enable TLS test and fix build+run > tls: Set server name for client connection if available	2020-03-11 19:25:36 +02:00
Avi Kivity	c020b4e5e2	logalloc: increase capacity of _regions vector outside reclaim lock Reclaim consults the _regions vector, so we don't want it moving around while allocating more capacity. For that we take the reclaim lock. However, that can cause a false-positive OOM during startup: 1. all memory is allocated to LSA as part of priming (`2baa16b371`) 2. the _regions vector is resized from 64k to 128k, requiring a segment to be freed (plenty are free) 3. but reclaiming_lock is taken, so we cannot reclaim anything. To fix, resize the _regions vector outside the lock. Fixes #6003. Message-Id: <20200311091217.1112081-1-avi@scylladb.com>	2020-03-11 12:29:31 +02:00
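The ordering fix can be modelled with a plain `std::vector`. This is a sketch under stated assumptions, not logalloc's code: `region_registry`, `add()` and the boolean `_reclaim_locked` flag are hypothetical stand-ins for the `_regions` vector and the reclaim lock. Capacity is grown while reclaim is still allowed. After that, `push_back` is guaranteed not to reallocate, so the vector does not move while the lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct region {};

class region_registry {
    std::vector<region*> _regions;
    bool _reclaim_locked = false;  // hypothetical stand-in for the reclaim lock

    // RAII guard that marks reclaim as disabled for its lifetime.
    struct reclaim_lock {
        bool& flag;
        explicit reclaim_lock(bool& f) : flag(f) { flag = true; }
        ~reclaim_lock() { flag = false; }
    };
public:
    void add(region* r) {
        // Grow capacity before taking the lock: this allocation may need to
        // reclaim memory, and reclaim is still permitted here.
        if (_regions.size() == _regions.capacity()) {
            _regions.reserve(_regions.empty() ? 16 : _regions.capacity() * 2);
        }
        reclaim_lock guard(_reclaim_locked);
        // size() < capacity() now, so push_back cannot reallocate.
        auto* data_before = _regions.data();
        _regions.push_back(r);
        if (_regions.data() != data_before) {
            throw std::logic_error("vector reallocated under reclaim lock");
        }
    }

    std::size_t size() const { return _regions.size(); }
};
```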
Botond Dénes	931d2fca45	scylla-gdb.py: std_list: __len__(): support C++11 ABI In theory the C++11 ABI should already have a size field but it does not in the version of the C++ standard library shipped with scylla 2019.1. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225162337.112582-1-bdenes@scylladb.com>	2020-03-11 10:51:05 +02:00
Botond Dénes	0909dd3d11	scylla-gdb.py: scylla_sstables: fix copypasta in name passed to argparse The description is probably from the command this snippet was copied from originally. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310141025.90051-1-bdenes@scylladb.com>	2020-03-11 10:49:34 +02:00
Botond Dénes	10944689bc	scylla-gdb.py: resolve(): don't attempt to match failed symbols Currently if `startswith` is passed to `resolve()` it will unconditionally try to match the resolved symbol name against it. This will of course fail when the symbol fails to resolve and `name` is `None`. Return early when this happens to prevent the unnecessary prefix matching. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310140918.88928-1-bdenes@scylladb.com>	2020-03-11 10:48:44 +02:00
Botond Dénes	0da517ca93	scylla-gdb.py: get_text_range(): make compatible with >=3.0 The current method of obtaining the text range based on a known vptr (`reactor::_backend`) was based on branch-2019.1, where `reactor::_backend` is a value member. However in >=3.0 `reactor::_backend` is a `std::unique_ptr<>`. Adapt the code to work for both. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310135957.86261-1-bdenes@scylladb.com>	2020-03-11 10:46:40 +02:00
Nadav Har'El	8d161cac87	merge: Allow synchronous view updates for local views Merged patch series by Piotr Sarna: This series makes view updates synchronous, as long as the update is going to be applied locally. With this feature, local secondary indexes and, more generally, materialized views with partition keys same as in the base table could enjoy more robust consistency. This series comes with a cql test, not common for materialized views, which usually require eventual consistency checks. With synchronous updates however, the test can simply check view values right after updating the base table. Fixes #4365 Refs #5007 Tests: unit(dev), manually via inserting sleeps and debug messages, to make sure that local view updates are actually waited for Piotr Sarna (4): db,view: drop default parameter for mutate_MV::allow_hints db,view: move putting view updates to background to mutate_MV db,view: perform local view updates synchronously test: add a simple test for synchronous local view updates	2020-03-11 10:29:16 +02:00
Piotr Sarna	8d2555673f	test: add a simple test for synchronous local view updates With synchronous local view updates enabled, local materialized views can be queried right after base table insertions, without the risk of reading stale values.	2020-03-11 09:15:57 +01:00
Piotr Sarna	2061e6a9cc	db,view: perform local view updates synchronously Local view updates (updates applied to a local node, without remote communication) are from now on performed synchronously - which adds consistency guarantees, as a local write failure will be returned to the client instead of being silently ignored.	2020-03-11 09:05:56 +01:00
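Here is a single-threaded sketch of the local/remote split. It assumes seastar futures are replaced by simpler mechanisms: a "local" update runs inline, standing in for a future the caller waits on, and a "remote" update is queued, standing in for a fire-and-forget background future. `view_update`, `mutate_views()`, `background_queue` and `drain_background()` are hypothetical names, and the real code waits on local updates concurrently rather than one by one.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical view update: `local` means the view replica is this node.
struct view_update {
    bool local;
    std::function<void()> apply;  // performs the view write
};

// Stand-in for unwaited background work.
std::vector<std::function<void()>> background_queue;

// Local updates are waited for, so their failures reach the caller (and thus
// the client); remote updates are still fired in the background.
void mutate_views(std::vector<view_update> updates) {
    for (auto& u : updates) {
        if (u.local) {
            u.apply();  // an exception here fails the base write
        } else {
            background_queue.push_back(std::move(u.apply));
        }
    }
}

// Runs queued background work; errors are swallowed here, standing in for
// being logged rather than returned to the client.
void drain_background() {
    for (auto& f : background_queue) {
        try {
            f();
        } catch (...) {
        }
    }
    background_queue.clear();
}
```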
Piotr Sarna	fd49fd773c	db,view: move putting view updates to background to mutate_MV Currently, launching view updates as an asynchronous background job is done via not waiting for mutate_MV() future in table::generate_and_propagate_view_updates. That has a big downside, since mutate_MV() handles all view updates for all views of a table, so it's not possible to wait for each view independently. Per-view granularity is required in order to implement synchronous view updates of local views - because then we'll synchronously wait for all views that write to a local node (due to having a matching partition key with the base), while remote view updates will still be sent asynchronously. In order to do that, instead of not waiting for mutate_MV, we do wait for it properly, but instead launch the asynchronous, unwaited-for futures inside mutate_MV. Effectively that means no changes for view updates so far - all updates will be fired in the background. Later, another patch will introduce a way to wait for selected updates to finish.	2020-03-11 09:05:56 +01:00
Piotr Sarna	3b3659e8cd	db,view: drop default parameter for mutate_MV::allow_hints Default parameters are considered harmful, and as part of a cleanup before editing view.cc code, a default value for allow_hints parameter is removed.	2020-03-11 09:05:56 +01:00
Rafael Ávila de Espíndola	d5bcb5a974	redis: Use scattered_message::append(std::string_view) This just moves the copy to append instead of doing it in the caller. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:18:54 -07:00
Rafael Ávila de Espíndola	80d969ce31	everywhere: Use uninitialized_string instead of sstring::initialized_later This is just a trivial wrapper over initialized_later when using sstring, but also works when std::string is used. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:17:49 -07:00
Rafael Ávila de Espíndola	76f4fee65b	compressor: Add an explicit cast to const sstring& Some difference in how exactly operator== is declared for sstring versus std::string requires this change if we convert from sstring to std::string. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Rafael Ávila de Espíndola	c0072eab30	everywhere: Be more explicit that we don't want std::make_shared If sstring is made an alias to std::string ADL causes std::make_shared to be found. Explicitly ask for ::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
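The ADL problem can be sketched in isolation. This assumes a hypothetical global `make_shared` template and `project_shared_ptr` wrapper in place of the helper Scylla actually uses. Once an argument's type lives in namespace `std` (as `sstring` would if it became an alias of `std::string`), an unqualified call also finds `std::make_shared`, and the call is typically ambiguous. Qualifying it as `::make_shared` restricts lookup to the global template.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical wrapper so we can tell which make_shared was called.
template <typename T>
struct project_shared_ptr {
    std::shared_ptr<T> ptr;
};

// Hypothetical project-wide helper in the global namespace.
template <typename T, typename... Args>
project_shared_ptr<T> make_shared(Args&&... args) {
    return {std::shared_ptr<T>(new T(std::forward<Args>(args)...))};
}

struct entry {
    std::string key;
    explicit entry(std::string k) : key(std::move(k)) {}
};

project_shared_ptr<entry> make_entry(std::string key) {
    // Unqualified `make_shared<entry>(std::move(key))` would also consider
    // std::make_shared via ADL (the argument is a std::string) and would be
    // ambiguous. The qualified call only sees the global template.
    return ::make_shared<entry>(std::move(key));
}
```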
Rafael Ávila de Espíndola	ad9f17bd92	cql3: Don't use sstring::reset There is no reset in std::string, so don't depend on a sstring only feature. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Rafael Ávila de Espíndola	caef2ef903	everywhere: Don't assume sstring::begin() and sstring::end() are pointers If we switch to using std::string we have to handle begin and end returning iterators. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Avi Kivity	0cb7182768	Update seastar submodule * seastar 5eaec672a2...95f4277c16 (1): > Merge "Add an option for making sstring an alias to std::string" from Rafael	2020-03-10 18:38:37 +02:00
Gleb Natapov	cd73f552b9	storage_service, database: do not move sharded services It may not be safe to move sharded services, so it will be prohibited in a future seastar update. Remove all current cases where we do it. Fixes #5814. Message-Id: <20200301095423.GY434@scylladb.com>	2020-03-10 12:51:02 +02:00
Tomasz Grabiec	3548e85ff7	Merge "features: Properly resolve when_enabled futures on stop" from Pavel E. If the feature service is stopped without enabling some features, the latter may end up with "broken promise" exception on futures attached to the _pr promise. Fix this by switching the only user of it onto 'listener' API and remove future-based one. Tests: unit(debug), manual start-stop and aborted-start	2020-03-10 10:09:24 +02:00
Juliusz Stasiewicz	3cc3233281	test/cdc: test that LWT generates CDC logs Tests #5952 Refs #5869	2020-03-10 08:33:49 +01:00
Raphael S. Carvalho	899bb230e2	sstable_resharding_test: fix sstable_resharding_strategy_tests with odd smp count leveled_compaction_strategy_strategy::get_resharding_jobs() returns compaction jobs, each containing at most smp::count ssts, so calculation is wrong if smp count is an odd number. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Acked-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200305161003.14424-1-raphaelsc@scylladb.com>	2020-03-09 17:52:53 +02:00
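The commit does not show the test's formula, so the following only illustrates the arithmetic. It assumes the test needs the number of jobs when each job holds at most `smp_count` sstables. `expected_resharding_jobs()` is a hypothetical helper. Ceiling division is exact for any shard count, whereas a formula that silently assumes an even split can be off by one job when the count is odd.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: jobs needed when each job holds at most smp_count
// sstables. Integer ceiling division, valid for any smp_count > 0.
constexpr std::size_t expected_resharding_jobs(std::size_t n_sstables, std::size_t smp_count) {
    return (n_sstables + smp_count - 1) / smp_count;
}

// Compile-time checks, including odd shard counts.
static_assert(expected_resharding_jobs(8, 4) == 2, "even split");
static_assert(expected_resharding_jobs(9, 3) == 3, "odd smp, exact fit");
static_assert(expected_resharding_jobs(10, 3) == 4, "odd smp, last job partial");
```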
Raphael S. Carvalho	d895f5e131	sstables/stcs: kill FIXME For the purpose of determining size tiers, it doesn't matter whether bytes_on_disk() or data_size() is used. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200302142513.10136-1-raphaelsc@scylladb.com>	2020-03-09 15:47:48 +02:00
Avi Kivity	8af6dabbf0	Merge "Decouple cql_config from storage_service" from Pavel E " The cql_config is needed by storage_service to feed it to thrift/transport servers. These servers, in turn, put the config onto query_options. The final goal of this config reference is the guts of query_processor (but currently it's only used by restrictions) This way is rather long and confusing. It seems more natural to keep the cql_config on its main "user" -- query processor. This patch set does so. However, in order to push the config into its current usage places a huge refactoring is needed -- most of the classes in cql3/statements and cql3/restrictions. It's much more handy to continue keeping it via query_options, so the query_processor is equipped with the method to return the reference on the config to those initializing query_options. Tests: unit(debug) " * 'br-clean-client-services-from-cql-config-2' of https://github.com/xemul/scylla: storage_service: Forget cql_config transport: Forget cql_config thrift: Forget cql_config query_processor: Carry reference on cql_config	2020-03-09 15:06:59 +02:00
Calle Wilund	5c743bfd53	cdc: rename inner "process_cells" to avoid confusion Two lambdas should not share name in same function.	2020-03-09 13:06:32 +00:00
Konstantin Osipov	9c009441e0	test.py: do not override environment options Do not reset user-defined environment options for ASAN with test.py flags. Message-Id: <20200306135714.3380-1-kostja@scylladb.com>	2020-03-09 14:56:09 +02:00
Piotr Dulikowski	5f652e58c1	cdc: allow dropping manually created tables with cdc log suffix The is_log_for_some_table function incorrectly assumed that database::find_schema would return a null pointer in case the queried schema does not exist. This patch fixes that, and now this function checks for existence of the schema using database::has_schema. Tests: unit(dev)	2020-03-09 12:17:13 +01:00
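Here is a sketch of the corrected check under simplified assumptions. `database`, `schema`, `no_such_column_family` and the `_scylla_cdc_log` suffix are stand-ins chosen for illustration, and the real function also inspects the base table's CDC options. The key point is that `find_schema()` signals a missing table by throwing, never by returning null, so existence must be tested with `has_schema()`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct schema { std::string name; };
using schema_ptr = std::shared_ptr<const schema>;

struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for the database's schema lookup.
class database {
    std::map<std::string, schema_ptr> _schemas;
public:
    void add(const std::string& name) { _schemas[name] = std::make_shared<schema>(schema{name}); }
    bool has_schema(const std::string& name) const { return _schemas.count(name) > 0; }
    // Throws when the table does not exist; never returns nullptr.
    schema_ptr find_schema(const std::string& name) const {
        auto it = _schemas.find(name);
        if (it == _schemas.end()) {
            throw no_such_column_family(name);
        }
        return it->second;
    }
};

bool is_log_for_some_table(const database& db, const std::string& log_name) {
    const std::string suffix = "_scylla_cdc_log";  // assumed log-table suffix
    if (log_name.size() <= suffix.size() ||
        log_name.compare(log_name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return false;
    }
    auto base = log_name.substr(0, log_name.size() - suffix.size());
    // The buggy shape, `db.find_schema(base) != nullptr`, throws instead of
    // returning false for a manually created table with no matching base.
    return db.has_schema(base);
}
```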
Asias He	6a7c3f0af0	repair: Stop the nodes that have run repair_row_level_start It is ok to run repair_row_level_stop unconditionally. The node that hasn't received the repair_row_level_start will simply return an error that the repair_meta_id is not found. To avoid the unnecessary repair_row_level_stop verb, we can stop only the nodes that have run repair_row_level_start. This also makes the error message less confusing. For example: Before: INFO 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 on shard 0 failed: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist) INFO 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist)}) WARN 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 to sync data for keyspace=ks, status=failed, keyspace does not exist any more, ignoring it: std::runtime_error ({shard 0: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist)}) After: INFO 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 on shard 0 failed: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376]) INFO 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376])}) WARN 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 to sync data for keyspace=ks, status=failed, keyspace does not exist any more, ignoring it: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376])}) Refs #5942	2020-03-09 18:24:02 +08:00
Asias He	75cf255c67	repair: Ignore keyspace that is removed in sync_data_using_repair When a keyspace is removed during node operations, we should not fail the whole operation. Ignore the keyspace that is removed. Fixes #5942	2020-03-09 18:24:02 +08:00
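The behaviour described above can be sketched as follows: attempt the repair, and on failure ignore the error only if the keyspace no longer exists. This `sync_data_using_repair()` is a hypothetical, synchronous simplification taking callbacks, not the real signature.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: repair each keyspace; if a repair fails and the
// keyspace has meanwhile been dropped, record it as ignored instead of
// failing the whole node operation.
void sync_data_using_repair(const std::vector<std::string>& keyspaces,
                            const std::function<void(const std::string&)>& repair_keyspace,
                            const std::function<bool(const std::string&)>& keyspace_exists,
                            std::vector<std::string>& ignored) {
    for (const auto& ks : keyspaces) {
        try {
            repair_keyspace(ks);
        } catch (const std::exception&) {
            if (keyspace_exists(ks)) {
                throw;  // a genuine failure: propagate it
            }
            ignored.push_back(ks);  // removed during the operation: ignore it
        }
    }
}
```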
Pavel Emelyanov	0298a6270e	storage_service: Forget cql_config It needs the config purely to feed one into the thrift/transport servers; since the latter two no longer need one, neither does the former. As a nice side effect -- some tests no longer have to carry the cql_config on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:58:06 +03:00
Pavel Emelyanov	1af8ab80eb	transport: Forget cql_config The cql_server already works with query_processor from which it can get the cql_config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:30 +03:00
Pavel Emelyanov	d551f0323a	thrift: Forget cql_config The thrift handlers already mess with query_processor which has the config in question. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:30 +03:00
Pavel Emelyanov	0a9a5a2dd7	query_processor: Carry reference on cql_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:28 +03:00
Pavel Emelyanov	7f2fc837cb	config: Place timeout_config() into own .cc file It's a generic helper that's used by transport, thrift and redis (redis has its own copy of the code). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200306114022.8070-1-xemul@scylladb.com>	2020-03-08 17:57:58 +02:00
Avi Kivity	de1b20ff7c	Update seastar submodule * seastar affc3a5107...5eaec672a2 (12): > test_thread_custom_stack_size_failure: Use a larger custom stack > test_thread_custom_stack_size: Use a larger custom stack > log: correct help message > perftune.py: verify NIC existence > Merge "Fix various memory issues in http" from Rafael > build: Fix IN_LIST usage > future: Disable -Wuninitialized on a particular memcpy > build: use IN_LIST for shorter cmake > build: check support of "-fstack-clash-protection" before using it > configure.py: Add "--verbose" flag > configure.py: Make "cmake" command line human-readable > net: dynamically adjust buffer sizes for posix connected_socket read operations	2020-03-08 17:34:16 +02:00
Benny Halevy	a89fb0abd9	main: log "Startup failed" message as error To make it stand out and be detectable by dtests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303160725.235959-1-bhalevy@scylladb.com>	2020-03-08 17:33:50 +02:00
Konstantin Osipov	ac6f64a885	locator: correctly select endpoints if RF=0 SimpleStrategy creates a list of endpoints by iterating over the set of all configured endpoints for the given token, until we reach keyspace replication factor. There is a trivial coding bug when we first add at least one endpoint to the list, and then compare list size and replication factor. If RF=0 this never yields true. Fix by moving the RF check before at least one endpoint is added to the list. Cassandra never had this bug since it uses a less fancy while() loop. Fixes #5962 Message-Id: <20200306193729.130266-1-kostja@scylladb.com>	2020-03-08 16:53:01 +02:00
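Here is a minimal sketch of the corrected loop. It assumes a simplified ring given as a vector of endpoint names plus a starting index for the token. `select_endpoints()` is a hypothetical name, not Scylla's or Cassandra's API. The key detail is that the replication-factor check happens before an endpoint is added, so RF=0 yields an empty list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical SimpleStrategy-like selection: walk the ring from `start`
// and collect distinct endpoints until replication_factor are chosen.
std::vector<std::string> select_endpoints(const std::vector<std::string>& ring,
                                          std::size_t start,
                                          std::size_t replication_factor) {
    std::vector<std::string> endpoints;
    for (std::size_t i = 0; i < ring.size(); ++i) {
        // Check before adding. Adding first and then testing
        // size() == replication_factor can never succeed when RF is 0.
        if (endpoints.size() >= replication_factor) {
            break;
        }
        const auto& ep = ring[(start + i) % ring.size()];
        if (std::find(endpoints.begin(), endpoints.end(), ep) == endpoints.end()) {
            endpoints.push_back(ep);
        }
    }
    return endpoints;
}
```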
Calle Wilund	0b34d88957	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zeros so the reader can have an empty region after the last used chunk as an end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach end of segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent of how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write the trailing zero block if we are below the max_size file size when terminating. Adds a small size check to commitlog test to verify size bounds. (Which breaks without the patch) Message-Id: <20200226121601.15347-2-calle@scylladb.com>	2020-03-08 16:51:53 +02:00
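A small model of the termination rule, assuming positions and `max_size` are multiples of `block_size`, so a zero block written below `max_size` never crosses it. `segment` and `terminate()` are hypothetical names, not the commitlog API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of closing a commitlog segment.
struct segment {
    uint64_t position;    // end of the last written chunk
    uint64_t max_size;    // segment size limit
    uint64_t block_size;  // size of the trailing zero block

    // Returns the resulting file size after termination.
    uint64_t terminate() {
        // The zero block is only needed as an end marker when stale data from
        // a recycled file could follow the last chunk. A full segment already
        // ends here, and an extra block would only grow the file past max_size.
        if (position < max_size) {
            position += block_size;  // write the trailing zero block
        }
        return position;
    }
};
```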
Konstantin Osipov	b4b08be0e1	test: add a test case for rare replication configurations Introduce a test which checks how different CQL features (DML, LWT, MV) work when no replicas are available (e.g. because they are all in an unavailable data center). Specifically the test checks that when we SELECT with IN clause and there are no available replicas, there is no crash (#5935). Message-Id: <20200306192521.73486-3-kostja@scylladb.com>	2020-03-08 15:11:08 +02:00
Konstantin Osipov	9827efe554	storage_proxy: do not touch all_replicas.front() if it's empty. The list of all endpoints for a query can be empty if we have replication_factor 0 or there are no live endpoints for this token. Do not access all_replicas.front() in this case. Fixes #5935. Message-Id: <20200306192521.73486-2-kostja@scylladb.com>	2020-03-08 15:11:02 +02:00
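The guard can be sketched as below. `first_replica()` is a hypothetical helper, and returning `std::nullopt` stands in for however the caller reports unavailability. Calling `front()` on an empty `std::vector` is undefined behavior, which is why the emptiness check must come first.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: the replica list can legitimately be empty (RF=0 or
// no live endpoints), so check before touching front().
std::optional<std::string> first_replica(const std::vector<std::string>& all_replicas) {
    if (all_replicas.empty()) {
        return std::nullopt;  // caller reports unavailability instead of crashing
    }
    return all_replicas.front();
}
```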
Nadav Har'El	6febd4199e	merge: cdc: on row delete, show the whole row as preimage Merged pull request https://github.com/scylladb/scylla/pull/5980 by Piotr Jastrzębski, based on https://github.com/scylladb/scylla/pull/5976 by Juliusz Stasiewicz: "If base mutation has at least one row tombstone, its preimage log entry displays all the base columns." Fixes #5709 Tests: unit(dev)	2020-03-08 14:54:59 +02:00
Juliusz Stasiewicz	49f1a24472	tests/cdc: test preimage on row delete Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-08 13:27:49 +01:00
Juliusz Stasiewicz	68071d35ce	cdc: on row delete display the entire row as preimage If base mutation has at least one row tombstone, its preimage log entry is constructed from all the base columns. Fixes #5709 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-08 12:11:07 +01:00
Piotr Dulikowski	0e413efb48	cdc: correct static row preimage for case with no clustering row In case a static and a clustering row is written at the same time, but a clustering row with given key was not present, the preimage query was incorrectly configured and no rows were returned. This resulted in an empty preimage, while a preimage for static row should be present. This patch fixes this and now the static row is correctly written to cdc log in the case above. Tests: unit(dev)	2020-03-08 09:25:45 +01:00
Piotr Sarna	395c7eeb98	Merge ' cdc: disallow creating nested cdc logs' from Piotr This change disallows creating CDC log tables for already existing CDC log tables. CDC logs nested in that way are not really useful and do not work at the moment, therefore disallowing their creation prevents confusion. Fixes #5967 Tests: unit(dev) * piodul/5967-disallow-nested-cdc-logs: cdc: disallow creating nested CDC logs cql_repl: register schema extensions	2020-03-08 09:22:59 +01:00
Juliusz Stasiewicz	e2b76fd559	cdc: move the extractor of `pirow` columns into separate method Because it will be used more than once.	2020-03-06 17:54:42 +01:00
Piotr Sarna	be293523bd	Merge 'Replace dht::global_partitioner() calls with... ... schema::get_partitioner and make schema::get_partitioner return const&' from Piotr Partitioners returned from get_partitioner are shared and not supposed to be changed so let's use the type system to enforce that. dht::global_partitioner() is deprecated and will be removed as soon as custom partitioners are implemented so it's best to replace it with schema::get_partitioner. Tests: unit(dev) * hawk/global_partitioner_cleanup: schema: get_partitioner return const& compaction_manager: stop calling dht::global_partitioner() sstable_datafile_test: stop calling dht::global_partitioner()	2020-03-06 14:36:03 +01:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	22fac03184	compaction_manager: stop calling dht::global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	08ebf1f69d	sstable_datafile_test: stop calling dht::global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	968177da04	cdc: store tokens in cdc description as longs Previously the tokens were stored as strings because a token could have been represented in multiple ways. Now the token representation is always int64_t, so we can store them as ints in the cdc description as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 11:59:59 +01:00
Piotr Dulikowski	f317283578	cdc: disallow creating nested CDC logs This change disallows creating CDC log tables for already existing CDC log tables. CDC logs nested in that way are not really useful and do not work at the moment, therefore disallowing their creation prevents confusion.	2020-03-06 10:47:13 +01:00
Piotr Dulikowski	75284eb2a5	cql_repl: register schema extensions Alternator and CDC, apart from enabling their experimental features, need to have their schema extensions registered. This patch adds missing registration of schema extensions to cql_repl, so that cql tests written with Alternator or CDC in mind will properly work.	2020-03-06 10:31:07 +01:00
Piotr Sarna	d1db198211	Merge ' Allow repeated LIKE on same column' from Dejan Fixes #5902 by making the LIKE restriction keep a vector of matchers and apply them all to the column value. Tests: unit (dev) * dekimir/multiple-likes: cql3: Allow repeated LIKE on same column cql3: Forbid calling LIKE::values() cql3: Move LIKE::_last_pattern to matcher	2020-03-06 09:55:54 +01:00
Piotr Sarna	22798f7b7b	locator: fix validating replication factor In order to properly validate not only network topology strategy, but also other strategies, the checks are moved straight to validate_replication_factor(). Also, the test case is extended with a too long integer and a check for SimpleStrategy replication factor. Fixes #3801 Tests: unit(dev) Message-Id: <e0c3c3c36c589e1d440c9708a6dce820c111b8da.1583483602.git.sarna@scylladb.com>	2020-03-06 10:39:34 +02:00
Konstantin Osipov	848195125c	test.py: check test xml output Check that XML output of a test is valid and warn otherwise. The following tests currently produce a warning: boost/multishard_mutation_query_test Message-Id: <20200305213501.52279-2-kostja@scylladb.com>	2020-03-06 10:05:28 +02:00
Piotr Sarna	6df132436f	cql3: disallow range deletions for specific columns Range deletions of specific columns are not well-defined (range tombstones cover entire rows) and are forbidden in Cassandra, so we follow suit. This commit comes with a simple test. Fixes #5728 Tests: unit(dev) Message-Id: <896264f5f5790b9f96fcc18655ac3248a6abf37a.1583424131.git.sarna@scylladb.com>	2020-03-06 10:04:05 +02:00
Piotr Sarna	5b7a35e02b	network_topology_strategy: validate integers In order to prevent users from creating a network topology strategy instance with invalid inputs, it's not enough to use std::stol() on the input: a string "3abc" still returns the number '3', but will later confuse cqlsh and other drivers, when they ask for topology strategy details. The error message is now more human readable, since for incorrect numeric inputs it used to return a rather cryptic message: ServerError: stol() This commit fixes the issue and comes with a simple test. Fixes #3801 Tests: unit(dev) Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>	2020-03-06 09:50:33 +02:00
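The integer check described in the commit above can be sketched as follows. This is a minimal, hypothetical helper; the name `parse_replication_factor` and the error text are assumptions and are not Scylla's code. It passes a position pointer to `std::stol` and rejects the input unless every character was consumed. That is the check a bare `std::stol` call misses for "3abc". The helper also turns the exceptions `std::stol` throws into a readable error, replacing the cryptic "stol()" message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Scylla's actual function): strictly parse a
// replication factor string, rejecting inputs that std::stol alone would accept.
long parse_replication_factor(const std::string& s) {
    const std::string msg = "Replication factor must be a non-negative integer; found " + s;
    std::size_t pos = 0;
    long value = 0;
    try {
        // std::stol("3abc", &pos) returns 3 and sets pos to 1.
        value = std::stol(s, &pos);
    } catch (const std::logic_error&) {
        // std::invalid_argument (no digits) or std::out_of_range (too long);
        // both derive from std::logic_error.
        throw std::invalid_argument(msg);
    }
    if (pos != s.size() || value < 0) {
        // Trailing characters were left unparsed, or the value is negative.
        throw std::invalid_argument(msg);
    }
    return value;
}
```

For example, "3" yields 3, while "3abc" and an over-long integer are both rejected with a human-readable message.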
Piotr Sarna	30d2826358	Merge 'cdc: use `cdc` schema extension for storing... ... and reading cdc metadata' from Piotr Currently, information on what cdc options are enabled in a table (cdc metadata, in short) is stored in two places: in the cdc column of system_schema.scylla_tables, and in a cdc schema extension. The former is used as a source of truth, i.e. a node reads cdc metadata from that column, while the latter is used for cosmetic purposes (e.g. cqlsh displays info on cdc based on this extension) and is only written, but never read by the node. Introducing the cdc column to scylla_tables made the logic of schema agreement more complicated. As a first step of removing this column, this PR makes the cdc schema extension the "source of truth": a node will from now on read cdc metadata from that extension. The cdc column will be deprecated and removed in subsequent releases, but it is left for now and will still be written to in order not to break the logic of schema agreement. Acked-by: Nadav Har-El <nyh@scylladb.com> Refs: #5737 Tests: unit(dev), 2-node cluster upgrade under write load to a cdc-enabled table * piodul/5737-cdc-schema-extension: schema: get cdc options from schema extensions alter_table_statement: fix indentation cf_prop_defs: initialize schema extensions externally cf_prop_defs: move checking of cdc support to ::validate cf_prop_defs: pass database& to ::validate, not db::extensions& unit tests: register cdc extension before tests cdc: construct cdc_options directly inside cdc_extension db::extensions: add shorthands for add_schema_extension	2020-03-05 16:31:40 +01:00
Piotr Dulikowski	861c7b5626	schema: get cdc options from schema extensions Removes logic responsible for setting cdc_options from dedicated column in scylla_tables, and uses the "cdc" schema extension instead.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	e98766dd81	alter_table_statement: fix indentation	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	828077be5e	cf_prop_defs: initialize schema extensions externally Moves initialization of schema extensions outside of cf_prop_defs. This allows constructing these extensions once and using them several times in cf_prop_defs' methods, without caching or recalculating them.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	0bdc22e33b	cf_prop_defs: move checking of cdc support to ::validate Validation of CDC options fits better into the `validate` method rather than `apply_to_builder`.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	260c47d758	cf_prop_defs: pass database& to ::validate, not db::extensions& Changes cf_prop_defs::validate function to take database& as an argument instead of db::extensions&. This change will allow us to move the check which asserts that the cluster supports CDC from `apply_to_builder` to `validate` method.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	38b7f1ad45	unit tests: register cdc extension before tests In the following commits, using cdc in tests will require registering cdc extension explicitly in db config.	2020-03-05 16:11:20 +01:00
Piotr Dulikowski	0f4f48ef76	cdc: construct cdc_options directly inside cdc_extension Instead of storing a raw map of options inside `cdc_extension`, the extension now converts them into `cdc_options` directly on construction. This removes the need to construct `cdc_options` object multiple times.	2020-03-05 16:09:44 +01:00
Piotr Dulikowski	6895b0e395	db::extensions: add shorthands for add_schema_extension This abstracts away a pattern used everywhere when adding a schema extension.	2020-03-05 16:09:44 +01:00
Piotr Sarna	c35160457b	Merge 'Clean up stream_id representation' from Piotr With #5950 we changed the representation of stream_id in CDC Log from two int columns to a single blob column. This PR cleans up stream_id representation internally. Now stream_id is stored as blob both in-memory and in internal CDC tables. Tests: unit(dev) * hawk/stream_id_representation: cdc: store stream_ids as blobs in internal tables cdc: improve do_update_streams_description cdc: Fix generate_topology_description cdc: add stream_id::operator< cdc: change stream_id representation	2020-03-05 14:14:29 +01:00
Tomasz Grabiec	d5557023f6	Merge "Stop using BOOST_TEST_MESSAGE() in unit tests" from Kostja Stop using BOOST_TEST_MESSAGE() in unit tests, it bloats test XML output. Use Scylla logger instead. Test: unit (debug, dev, release)	2020-03-05 13:27:30 +01:00
Calle Wilund	b48255a4cd	db::commitlog: Only zero disk blocks not already allocated in segment Fixes #5891 Refs #5899 When creating segments with the o_dsync option active, we write max_size zeros to disk, to ensure actual disk blocks are allocated. However, if we recycle a segment, we should, when not actually creating a new file, check the existing size on disk, and only zero any blocks not already allocated (i.e. if the recycled file was smaller than max_size, due to segment truncation on close). test: unit Message-Id: <20200226121601.15347-1-calle@scylladb.com>	2020-03-05 13:27:08 +01:00
Piotr Sarna	875d230298	Merge "CDC: use a single `cdc$time` value for a batch of changes" from Kamil. If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the cdc$time column, distinguished by different cdc$batch_seq_no values. Fixes #5953. Tests: unit(dev) * haaawk/splitbatch: cdc: use a single timeuuid value for a batch of changes cdc: replace `split` with `for_each_change`	2020-03-05 13:17:34 +01:00
Pavel Emelyanov	7bc34c17eb	range-streamer: Tune the progress message Now it will show the full info about range being streamed, like range_streamer - Rebuild with 127.0.0.2 for keyspace=ks2, streaming [72, 96) out of 248 ranges The [x, y) range is semi-open one, the full streaming progress then can be logged like ... streaming [0, 16) out of 36 ranges <- first send ... streaming [16, 24) out of 36 ranges ... streaming [24, 36) out of 36 ranges <- last send Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200304101505.5506-1-xemul@scylladb.com>	2020-03-05 12:56:29 +01:00
Kamil Braun	3200d415da	cdc: use a single timeuuid value for a batch of changes If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the `time` column, distinguished by different `batch_seq_no` values. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:32:57 +01:00
Konstantin Osipov	94ee511f6a	lwt: implement cas_failed_read_round_optimization metric Presently, lightweight transactions piggyback the old row value on the prepare round response. If one of the participants did not provide the old value or the values from peers don't match, we perform a full read round which will repair the Paxos table and the base table, if necessary, at all participants. Capture the fact that read optimization has failed in a metric. Message-Id: <20200304192955.84208-2-kostja@scylladb.com>	2020-03-05 12:20:45 +01:00
Kamil Braun	292eba9da0	cdc: replace `split` with `for_each_change` `for_each_change` is like `split` but it doesn't return a vector of mutations representing each change; instead, it takes as a parameter a function which gets called on each mutation. This reduces memory usage and allows preserving common context when handling each change (will be useful in next commits). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:05:08 +01:00
Pekka Enberg	0beb45faf3	build: Use reloc dynamic linker unconditionally The relocatable package requires a magic dynamic linker path for "patchelf" to work correctly. Therefore, use the "get-dynamic-linker.sh" script to unconditionally define a magic dynamic linker path to ensure that building the relocatable package with ninja build ("ninja-build build/<mode>/scylla-package.tar.gz") is always correct. Although the path looks odd with a lot of leading slashes, it works outside relocatable package too. Message-Id: <20200305091919.6315-2-penberg@scylladb.com>	2020-03-05 12:53:28 +02:00
Pekka Enberg	8a810cc41a	reloc: Move dynamic linker magic to get-dynamic-linker.sh In preparation for moving dynamic linker flags to ninja build, move the magic dynamic linker path generation to "reloc/get-dynamic-linker.sh" script that configure.py can call. Message-Id: <20200305084331.5339-1-penberg@scylladb.com>	2020-03-05 12:53:22 +02:00
Konstantin Osipov	ac0717fb64	test: consistently use a global testlog object in all tests Use test/lib/log.hh in all tests now that we have it.	2020-03-05 13:34:24 +03:00
Piotr Jastrzebski	57cfe6d0e1	cdc: store stream_ids as blobs in internal tables In new CDC Log format stream_id is represented by a single blob column so it makes sense to store it in the same form everywhere - including internal CDC tables. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	b2acdc9307	cdc: improve do_update_streams_description Use std::set::insert that takes range instead of looping through elements and adding them one by one. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	446722d6ed	cdc: Fix generate_topology_description In new CDC Log format we store only a single stream_id column. This means generate_topology_description has to use appropriate schema for generating tokens for stream_ids. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	9a212dcaef	cdc: add stream_id::operator< Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:21 +01:00
Piotr Jastrzebski	f317a659d9	cdc: change stream_id representation New CDC Log format stores stream ids as blobs. It makes sense to keep them internally in the same form. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:30:10 +01:00
Piotr Sarna	f21bd57058	Merge "cdc: log static rows correctly" from Piotr Currently, writes to a static row in a base table are not reflected at all in the corresponding cdc log. This patch causes such writes to be properly logged. Fixes: #5744 Tests: unit(dev) * piodul/5744-handle-static-row-correctly-in-cdc: cdc_test: add tests for handling static row cdc: fix indentation in transformer::transform cdc: handle static rows separately in transformer::transform cdc: move process_cells higher (and fix captured variables) cdc: reduce dependencies on captured variables in process_cells cdc: fix preimage query for static rows	2020-03-05 10:42:15 +01:00
Nadav Har'El	96ca5ac2c8	alternator: use separate smp_service_group for bouncing requests Until this patch, we used the default_smp_service_group() when bouncing Alternator requests between shards (which is needed for LWT). This patch creates a new smp_service_group for this purpose, which is limited to 5000 concurrent requests (the same limit used for CQL's bounce_request_smp_service_group). The purpose of this limit is to avoid many shards admitting a huge number of requests and bouncing all of them to the same shard, which now can't "unadmit" these requests. Fixes #5664. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200304170825.27226-1-nyh@scylladb.com>	2020-03-05 10:17:51 +01:00
Konstantin Osipov	ff3f9cb7cf	test: stop using BOOST_TEST_MESSAGE() for logging We use boost test logging primarily to generate nice XML xunit files used in Jenkins. These XML files can be bloated with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes of build archives, on every build. Let's use seastar logger for test logging instead, reserving the use of boost log facilities for boost test markup information.	2020-03-05 11:38:11 +03:00
Juliusz Stasiewicz	c8527f20b0	CDC+LWT: fix missing CDC entries for successful LWTs Now, if CDC is enabled, `paxos_response_handler::learn_decision()` augments the base table mutation. The differences in logic between: (1) `mutate_internal<std::vector<mutation>>()` and (2) `mutate_internal<std::vector<std::tuple<paxos::proposal, schema_ptr, ...>>>()` make it necessary to separate "CDC mutations" from "base mutation" and send them, respectively, to (1) and (2). Gleb explained in #5869 why it became necessary to add CDC code to LWT writes specifically, instead of doing it somewhere central that affects all writes: "All paths that do write goes through mutate_internally() eventually so it would have been best to do augmentations there, but cdc chose to log only certain writes and not others (unlike MV that does not care how write happened) and mutate_internal have no idea which is which so I do not have other choice but code duplication. ... paxos_response_handler::learn_decision is probably the place to add cdc augmentation." Fixes #5869	2020-03-05 09:49:19 +02:00
Piotr Dulikowski	204e204586	cdc: do not attempt to log empty mutations It is possible to produce an empty mutation using CQL. For example, the following query: DELETE FROM ks.tbl WHERE pk = 0 AND ck < 1 AND ck > 2; will attempt to delete from an empty range of rows. This is translated to the following mutation: {ks.tbl {key: pk{000400000000}, token:-3485513579396041028} {mutation_partition: static: cont=1 {row: }, clustered: {}}} Such mutation does not contain any timestamp, therefore it is difficult to determine what timestamp was used while making the query. This is problematic for CDC, because an entry in CDC log should be written with the same timestamp as a part of the mutation. Because an empty mutation does not modify the table in any way, we can safely skip logging such mutations in CDC and still preserve the ability to reconstruct the current state of the base table from full CDC log. Tests: unit(dev)	2020-03-05 08:32:54 +01:00
Piotr Dulikowski	e6751fad62	cdc_test: add tests for handling static row	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	39519ce923	cdc: fix indentation in transformer::transform	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	0d05b17881	cdc: handle static rows separately in transformer::transform Before this patch, `transform` did not generate any log rows about static row change. This commit fixes that - now, a log row is created if a static row is changed, and this row is separate from the rows that describe changes to the clustering rows.	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	6a0b0b5786	cdc: move process_cells higher (and fix captured variables) The `process_cells` lambda is moved outside the loop, because it will be used by other code in subsequent commits.	2020-03-05 00:15:57 +01:00
Piotr Dulikowski	f136f6e02c	cdc: reduce dependencies on captured variables in process_cells This is a preparation for moving the lambda outside the for loop. - `log_ck`, `pikey`, `pirow` are now passed as arguments, - `value` is now a variable local to the lambda, - `ttl` is now a variable local to the lambda that is returned.	2020-03-05 00:14:05 +01:00
Piotr Dulikowski	a7f51449c3	cdc: fix preimage query for static rows For static rows, we need to fetch at least one row from its partition in order to compute its preimage.	2020-03-04 18:43:55 +01:00
Botond Dénes	8b908a9aba	test: lib/mutation_source_test: log the name of the test-method Most test-methods log a message with their names upon entering them. This helps identify, in the logs, the test-method in which a failure happened. Two methods were missing this log line, so add it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200304155235.46170-1-bdenes@scylladb.com>	2020-03-04 18:16:21 +02:00
Pekka Enberg	7fde2e28da	dist/redhat: Specify files once in scylla.spec file Silences the following warnings when building an RPM: warning: File listed twice: /opt/scylladb/scripts/libexec/hex2list.py warning: File listed twice: /opt/scylladb/scripts/libexec/node_exporter_install warning: File listed twice: /opt/scylladb/scripts/libexec/perftune.py warning: File listed twice: /opt/scylladb/scripts/libexec/scylla-blocktune warning: File listed twice: /opt/scylladb/scripts/libexec/scylla-housekeeping warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_bootparam_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_config_get.py warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_coredump_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_cpuscaling_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_cpuset_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_dev_mode_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_ec2_check warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_fstrim warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_fstrim_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_io_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_kernel_check warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_ntp_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_prepare warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_raid_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_selinux_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_stop warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_sysconfig_setup warning: File listed twice: /opt/scylladb/scripts/libexec/seastar-addr2line warning: File listed twice: 
/opt/scylladb/share/doc/scylla/licenses warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/LICENSE-crc32-vpmsum.TXT warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/README.md warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/apache-license-2.0.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/boost-license-1.0.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/date-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/git-archive-all-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/libdeflate-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/xxhash-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/zstd-license.txt I verified that the files are in the generated RPMs after the change: [penberg@nero scylla]$ rpm -ql build/dist/dev/redhat/RPMS/x86_64/scylla-server-666.development-0.20200304.2bc700b008.x86_64.rpm \| grep scripts.*libexec /opt/scylladb/scripts/libexec /opt/scylladb/scripts/libexec/hex2list.py /opt/scylladb/scripts/libexec/node_exporter_install /opt/scylladb/scripts/libexec/perftune.py /opt/scylladb/scripts/libexec/scylla-blocktune /opt/scylladb/scripts/libexec/scylla-housekeeping /opt/scylladb/scripts/libexec/scylla_bootparam_setup /opt/scylladb/scripts/libexec/scylla_config_get.py /opt/scylladb/scripts/libexec/scylla_coredump_setup /opt/scylladb/scripts/libexec/scylla_cpuscaling_setup /opt/scylladb/scripts/libexec/scylla_cpuset_setup /opt/scylladb/scripts/libexec/scylla_dev_mode_setup /opt/scylladb/scripts/libexec/scylla_ec2_check /opt/scylladb/scripts/libexec/scylla_fstrim /opt/scylladb/scripts/libexec/scylla_fstrim_setup /opt/scylladb/scripts/libexec/scylla_io_setup /opt/scylladb/scripts/libexec/scylla_kernel_check /opt/scylladb/scripts/libexec/scylla_ntp_setup /opt/scylladb/scripts/libexec/scylla_prepare 
/opt/scylladb/scripts/libexec/scylla_raid_setup /opt/scylladb/scripts/libexec/scylla_selinux_setup /opt/scylladb/scripts/libexec/scylla_setup /opt/scylladb/scripts/libexec/scylla_stop /opt/scylladb/scripts/libexec/scylla_sysconfig_setup /opt/scylladb/scripts/libexec/seastar-addr2line [penberg@nero scylla]$ rpm -ql build/dist/dev/redhat/RPMS/x86_64/scylla-server-666.development-0.20200304.2bc700b008.x86_64.rpm \| grep license /opt/scylladb/share/doc/scylla/licenses /opt/scylladb/share/doc/scylla/licenses/LICENSE-crc32-vpmsum.TXT /opt/scylladb/share/doc/scylla/licenses/README.md /opt/scylladb/share/doc/scylla/licenses/apache-license-2.0.txt /opt/scylladb/share/doc/scylla/licenses/boost-license-1.0.txt /opt/scylladb/share/doc/scylla/licenses/date-license.txt /opt/scylladb/share/doc/scylla/licenses/git-archive-all-license.txt /opt/scylladb/share/doc/scylla/licenses/libdeflate-license.txt /opt/scylladb/share/doc/scylla/licenses/xxhash-license.txt /opt/scylladb/share/doc/scylla/licenses/zstd-license.txt Message-Id: <20200304150057.2621-1-penberg@scylladb.com>	2020-03-04 17:25:53 +02:00
Tomasz Grabiec	da4bd3d2e6	Merge "Clean cql3 usage of storage_proxy and _service" from Pavel E. This set removes _all_ mentions of storage_service and _all_ calls for global storage_proxy instances from cql3/ code. Tests: unit(dev)	2020-03-04 15:20:24 +01:00
Raphael S. Carvalho	3ba3ee2a7b	distributed_loader: trigger regular compaction on resharding completion Regular compaction relies on the compaction manager to run compaction jobs until the compaction strategy is satisfied. Resharding, on the other hand, is a one-off operation which runs only once in the compaction manager, and leaves the sstable set in such a way that the strategy is very likely unsatisfied. We need to trigger regular compaction whenever a resharding job replaces a shared sstable by an unshared sstable, so that compaction will not fall way behind due to lots of new sstables created by the resharding process. Fixes #5262. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200217144946.20338-1-raphaelsc@scylladb.com>	2020-03-04 16:08:13 +02:00
Nadav Har'El	f67a402c48	merge: Remove treewide dependency on boost/multiprecision Merged patch series from Avi Kivity: boost/multiprecision is a heavyweight library, pulling in 20,000 lines of code into each header that depends on it. It is used by converting_mutation_partition_applier and types.hh. While the former is easy to put out-of-line, the latter is not. All we really need is to forward-declare boost::multiprecision::cpp_int, but that is not easy - it is a template taking several parameters, among which are non-type template parameters also defined in that header. So it's quite difficult to disentangle, and fragile wrt boost changes. This patchset introduces a wrapper type utils::multiprecision_int which _can_ be forward declared, and together with a few other small fixes, manages to uninclude boost/multiprecision from most of the source files. The total reduction in number of lines compiled over a full build is 324 * 23,227 or around 7.5 million. Tests: unit (dev) Ref #1 https://github.com/avikivity/scylla uninclude-boost-multiprecision/v1 Avi Kivity (5): converting_mutation_partition_applier: move to .cc file utils: introduce multiprecision_int tests: cdc_test: explicitly convert from cdc::operation to uint8_t treewide: use utils::multiprecision_int for varint implementation types: forward-declare multiprecision_int configure.py \| 2 + concrete_types.hh \| 2 +- converting_mutation_partition_applier.hh \| 163 ++------------- types.hh \| 12 +- utils/big_decimal.hh \| 3 +- utils/multiprecision_int.hh \| 256 +++++++++++++++++++++++ converting_mutation_partition_applier.cc \| 188 +++++++++++++++++ cql3/functions/aggregate_fcts.cc \| 10 +- cql3/functions/castas_fcts.cc \| 28 +-- cql3/type_json.cc \| 2 +- lua.cc \| 38 ++-- mutation_partition_view.cc \| 2 + test/boost/cdc_test.cc \| 6 +- test/boost/cql_query_test.cc \| 16 +- test/boost/json_cql_query_test.cc \| 12 +- test/boost/types_test.cc \| 58 ++--- test/boost/user_function_test.cc \| 2 +- 
test/lib/random_schema.cc \| 14 +- types.cc \| 20 +- utils/big_decimal.cc \| 4 +- utils/multiprecision_int.cc \| 37 ++++ 21 files changed, 627 insertions(+), 248 deletions(-) create mode 100644 utils/multiprecision_int.hh create mode 100644 converting_mutation_partition_applier.cc create mode 100644 utils/multiprecision_int.cc	2020-03-04 15:13:42 +02:00
Avi Kivity	5dee627f73	types: forward-declare multiprecision_int This reduces the number of translation units that depend on boost/multiprecision from 354 to 30, and reduces the size of database.i (as an example) from 406160 to 382933 (smaller files will benefit more, relatively). Ref #1	2020-03-04 13:28:16 +02:00
Avi Kivity	3c772757c0	treewide: use utils::multiprecision_int for varint implementation The goal is to forward-declare utils::multiprecision_int, something beyond my capabilities for boost::multiprecision::cpp_int, to reduce compile time bloat. The patch is mostly search-and-replace, with a few casts added to disambiguate conversions the compiler had trouble with.	2020-03-04 13:28:16 +02:00
Avi Kivity	874f65c58c	tests: cdc_test: explicitly convert from cdc::operation to uint8_t After the varint data type starts using the new multiprecision_int type, this code fails to compile. I expect that somehow the conversion from enum class to cpp_int was allowed to succeed, and we ended up with a data_value of type varint. The tests succeeded because the serialized representation happened to be the same.	2020-03-04 13:28:16 +02:00
Piotr Jastrzebski	354e3c34c8	cdc log: merge stream_id columns into a single column Previously we had stream_id_1 and stream_id_2 columns of type long each. They were forming a partition key. In a new format we want a single stream_id column that forms a partition key. To be able to still store two longs, the new column will have type blob and its value will be concatenated bytes of two longs that partition key is composed of. We still want partition key to logically be two longs because those two values will be used by a custom partitioner later once we implement it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-04 13:27:48 +02:00
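The following short sketch illustrates the blob layout described in the commit above. It assumes each long is written in big-endian byte order, which is how CQL serializes bigint values. The commit itself only states that the blob is the concatenation of the two longs' bytes. The helpers `make_stream_id_blob` and `split_stream_id_blob` are hypothetical names for illustration, not Scylla's API. Keeping the two halves recoverable is what lets the partition key "logically" remain two longs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: pack two longs into one 16-byte blob.
// Assumption: each long is stored big-endian (most significant byte first).
std::array<std::uint8_t, 16> make_stream_id_blob(std::int64_t first, std::int64_t second) {
    std::array<std::uint8_t, 16> blob{};
    auto put = [&blob](std::size_t offset, std::int64_t v) {
        auto u = static_cast<std::uint64_t>(v);
        // Fill bytes 7..0 of this half from least significant upward,
        // so the most significant byte ends up first.
        for (std::size_t i = 8; i-- > 0;) {
            blob[offset + i] = static_cast<std::uint8_t>(u & 0xff);
            u >>= 8;
        }
    };
    put(0, first);
    put(8, second);
    return blob;
}

// Hypothetical helper: recover the two longs from the blob.
std::pair<std::int64_t, std::int64_t> split_stream_id_blob(const std::array<std::uint8_t, 16>& blob) {
    auto get = [&blob](std::size_t offset) {
        std::uint64_t u = 0;
        for (std::size_t i = 0; i < 8; ++i) {
            u = (u << 8) | blob[offset + i];
        }
        return static_cast<std::int64_t>(u);
    };
    return {get(0), get(8)};
}
```

A round trip such as `split_stream_id_blob(make_stream_id_blob(1, -1))` returns the original pair.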
Avi Kivity	7434c81a29	utils: introduce multiprecision_int multiprecision_int is a wrapper around boost::multiprecision::cpp_int that adds no functionality. The intent is to allow forward declaration; cpp_int is so complicated that just finding out what its true type is is a difficult exercise, as it depends on many internal declarations. Because cpp_int uses expression templates, the implementation has to explicitly cast to the desired type in many places, otherwise the C++ compiler is presented with too many choices, especially in conjunction with data_value (which can convert from many different types too).	2020-03-04 12:42:57 +02:00
Avi Kivity	414ec8c68e	converting_mutation_partition_applier: move to .cc file converting_mutation_partition_applier is a heavyweight class that is not used in the hot path, so it can be safely out-of-lined. This moves some includes to boost/multiprecision out of header files, where they can infect a lot of code. mutation_partition_view.cc's includes were adjusted to recover missing dependencies.	2020-03-04 12:42:57 +02:00
Pavel Emelyanov	35b0e6dd7f	repair_writer: Use db from repair_meta (2nd try) The previous version erroneously used a local db reference which was propagated into another shard. This time carry the sharded instance and use .local() as before. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303221729.31261-1-xemul@scylladb.com>	2020-03-04 11:31:52 +01:00
Tomasz Grabiec	477dadc062	Merge "cql_test_env: Drop a few shared_ptr<sharded<...>>" from Rafael I found that a few variables in cql_test_env were wrapping sharded in shared_ptr for no apparent reason. These patches convert them to plain sharded<...>.	2020-03-04 11:31:52 +01:00
Yaron Kaikov	de19496ff7	dist/docker: Add VERSION argument to Dockerfile (#5845) Currently, the Dockerfile installs the latest version of Scylla. Let's add a VERSION argument to Dockerfile, which explicitly specifies the version to ensure scripts, for example, always build the expected version. If no VERSION is specified for "docker build", use the default value of "666.development", which is the version number for latest nightly.	2020-03-04 12:20:24 +02:00
Pekka Enberg	e76b5bdf7b	Merge 'Cleanup test.py output' from Kostja "These two patches were suspected of failing the next promotion and were excluded from the original series." * 'test.py.log' of https://github.com/kostja/scylla: test.py: remove log output on success unless -s is specified test.py: do not store entire log output in junit report.	2020-03-04 11:58:46 +02:00
Eliran Sinvani	99cedf737c	docker: rsyslog configuration fixes The introduction of rsyslog had two errors in it. Both errors are non-fatal and the docker image still works; however, the system is left in a wrong state in which supervisord marks the rsyslogd service as failed (after several failed retry attempts). Another bug in the configuration causes rsyslog to output an error. 1) An inclusion command from a newer version was used in rsyslog's main configuration file. This caused rsyslog to complain during startup, but it didn't do much damage since rsyslog converts every unrecognised command to a message command. 2) In the supervisord definition of the service, rsyslogd is run without the -n option, which means it defaults to automatically switching to the background. Supervisord interprets this as an unexpected process termination and retries starting the process (unsuccessfully, because rsyslog prevents multiple instances of itself from running), and eventually marks it as down although it is fully up and running. This commit fixes both configuration problems. Tests: Build and run docker and validate the errors are gone. Fixes #5937	2020-03-04 11:56:30 +02:00
Pekka Enberg	325c3e13eb	build: Switch to SHA1 build IDs Currently, you have to build the relocatable package tarball with ./reloc/build_reloc.sh to be able to build an RPM out of it. You need to do this because RPMs require SHA1 build-ids, but the build system does not enforce that. To prepare for adding an RPM target to the ninja build, let's switch to SHA1 build IDs unconditionally, because the performance difference between xxhash and SHA1 is negligible. Rafael Avila de Espindola writes: [...] the sha1 implementation in current lld is pretty fast. Linking release scylla, the times I get are: lld in fedora: fast 2.83739, sha1 3.51990; current lld: fast 2.6936, sha1 2.90250. And the sha1 implementation might get even faster: https://bugs.llvm.org/show_bug.cgi?id=44138. Message-Id: <20200303131806.22422-1-penberg@scylladb.com>	2020-03-04 11:00:43 +02:00
Tomasz Grabiec	82b76163e3	utils/small_vector: Add missing include Needed for std::uninitialized_move() et al Message-Id: <20200303191148.11716-1-tgrabiec@scylladb.com>	2020-03-03 21:23:40 +02:00
Tomasz Grabiec	5dfefc0a85	Revert "repair_writer: Use db from repair_meta" This reverts commit `c6ddd21c50`. It uses a database& instance across shards, which causes the repair writer to use the table object from the wrong shard. Fixes #5907	2020-03-03 19:50:53 +01:00
Avi Kivity	906784639d	Merge "Clean sstables from using global objects" from Pavel E " This set stops sstable_writer_config and the surrounding sstables code from using the global storage_service, feature_service and database by moving the configuration logic onto sstables_manager (which was supposed to do it since `eebc3701a5`). Most of the complexity is hidden around sstable_writer_config creation; this set makes the sstables_manager create this object with an explicit call. All the rest are consequences of this change. Tests: unit(debug), manual start-stop " * 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla: sstables: Move get_highest_supported_format sstables: Remove global get_config() helper sstables: Use manager's config() in .new_sstable_component_file() sstable_writer_config: Extend with more db::config stuff sstables_manager: Don't use global helper to generate writer config sstable_writer_config: Sanitize out some features fields initialization sstable_writer_config: Factor out some field initialization sstables: Generate writer config via manager only sstables: Keep reference on manager test: Re-use existing global sstables_manager table: Pass sstable_writer_config into write_memtable_to_sstable	2020-03-03 18:33:01 +02:00
Nadav Har'El	750fe9585a	alternator: change rjson::get() to take std::string_view Change rjson::get() to take std::string_view, instead of RapidJson's version of that type, "StringRef". We already did the same change for rjson::find() in a previous patch. Not only is std::string_view more convenient for potential callers in Scylla, this change also avoids a bug in FindMember() on StringRef where the length is ignored (and instead, null-termination of the string is assumed). This patch doesn't require any changes to callers, because we actually had just a handful of remaining callers (most call sites switched to rjson::find()), and all of them used string constants which could be implicitly converted to StringRef or std::string_view just the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303161019.1456-1-nyh@scylladb.com>	2020-03-03 17:13:40 +01:00
Nadav Har'El	91d9632909	alternator: add rjson::remove_member() convenience function This patch adds an rjson::remove_member() wrapper to the RemoveMember method, which takes a std::string_view. But beyond the convenience, this actually works around a subtle bug in RemoveMember which, if given a StringRef parameter, ignores its length (see upstream issue https://github.com/Tencent/rapidjson/issues/1649). In the one place we used RemoveMember, it forced us to copy the string because it wasn't null-terminated. The solution proposed here involves wrapping the string view in a GenericValue, which no longer needs to copy the string but still works around the bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303143524.28300-1-nyh@scylladb.com>	2020-03-03 16:35:41 +01:00
Nadav Har'El	0fcb226412	alternator: switch rjson::find() to use std::string_view Our rjson::find() convenience function used RapidJson's "StringRef" type, which is almost exactly like std::string_view. If we switch to use string_view as we do in this patch, a lot of call sites become much simpler. Moreover, there was an even more important motivation for this patch: the RapidJson FindMember() function we used in rjson::find() has a bug when given a StringRef - although a StringRef contains a length, the FindMember() code ignores it and expects the string to be null-terminated (see: https://github.com/Tencent/rapidjson/issues/1649). In this patch, we wrap the pointer and length of a std::string_view in an rjson::value, a code path which bypasses the FindMember bug, and yet does not require copying the string. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303141814.26929-1-nyh@scylladb.com>	2020-03-03 16:35:41 +01:00
Nadav Har'El	2ea0b9d226	Merge branch 'split-mutations' of github.com:haaawk/scylla into next Merged pull request https://github.com/scylladb/scylla/pull/5940 from Kamil Braun: Add a bunch of new structs describing a change made to a table, and an extract_changes function which takes a mutation and returns the set of changes contained in this mutation, separated by timestamp and ttl. Add a split function which uses extract_changes to split a mutation into separate mutations, each describing a single change. Static rows are put into separate changes now. The pre_image_select function was fixed to always select pre_image data when there is a static row/clustered row change, even if there were e.g. additional range tombstones. Fixes: #5719. Tests: unit(dev)	2020-03-03 17:27:21 +02:00
Botond Dénes	103bf50e18	storage_proxy: add timeouts to smp calls on the write path When a node is overloaded, requests usually start to queue up. Timeouts are supposed to prevent queues from exploding and causing an OOM. One prominent queue that tends to explode is the smp queue: it didn't support timeouts, so requests would sit in the queue until the target shard processed them. If the target shard is heavily overloaded, requests might accumulate faster than they are processed, surely leading to an OOM. To prevent this, use the recently introduced timeout support in `seastar::smp::submit_to()` and derived APIs to time out write requests sitting in the smp queue. We simply use the request's own timeout for this purpose. Fixes: #5055 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200303131658.741720-1-bdenes@scylladb.com>	2020-03-03 15:39:58 +02:00
Kamil Braun	5de9b5b566	cdc: add change splitting test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-03 13:31:19 +01:00
Kamil Braun	5c4a237c12	cdc: split the mutation before passing it into `transform` If the mutation contains separate logical changes (e.g. with different timestamps and/or ttls), it will be split into multiple mutations, each passed into transform.	2020-03-03 13:17:51 +01:00
Kamil Braun	9924e3aa34	cdc: reduce code duplication in augment_mutation_call Now there's only one call to `transform`.	2020-03-03 13:17:51 +01:00
Kamil Braun	24a32a13b5	cdc: retrieve preimage anytime there are static/clustered row updates Previously we wouldn't retrieve the preimage if the mutation contained something other than static/clustered row updates, e.g. if it contained a partition deletion. However, there are mutations created from batch statements which can contain both a partition deletion and a set of row updates with a later timestamp. We want to retrieve the preimage in this case too.	2020-03-03 13:17:51 +01:00
Kamil Braun	529d30ef66	cdc: add `split` function This function takes a mutation and returns a set of mutations, each representing a separate change with a single timestamp and ttl.	2020-03-03 13:17:51 +01:00
Kamil Braun	132ea89c32	cdc: add `extract_changes` function This commit introduces a bunch of new structs describing a change made to a table, and an `extract_changes` function which takes a mutation and returns the set of changes contained in this mutation, separated by timestamp and ttl.	2020-03-03 13:17:51 +01:00
Kamil Braun	b5c944370e	cdc: add `should_split` function The function checks if there are multiple timestamps and/or ttls inside a mutation, which means separate changes should be created for this mutation in CDC.	2020-03-03 13:17:50 +01:00
Konstantin Osipov	48f09b95d0	test.py: remove log output on success unless -s is specified Log output is saved by the build system and can take a lot of space. Remove it unless -s is specified.	2020-03-03 13:59:14 +03:00
Konstantin Osipov	ae2820a1c7	test.py: do not store entire log output in junit report. This makes the report very heavy and is suspected of corrupting the XML output.	2020-03-03 13:59:14 +03:00
Nadav Har'El	359b32fb63	merge: CDC: implement new column format and naming Merged pull request https://github.com/scylladb/scylla/pull/5910 by Calle Wilund: Rename metadata and data columns according to the new spec. Also use transformation methods for names in all code + tests to make switching again easier. Break up data column tuple: the data column is now the pure frozen original type. If a column is deleted (set to null), a metadata column cdc$deleted_<name> is set to true, to distinguish it from a null data column, which means the column is not involved in the row operation. For non-atomic collections, a cdc$deleted_elements_<name> column is added, and when elements are removed from a collection, this is where they are shown. For non-atomic assign, "cdc$deleted_<name>" is true, and <name> is set to the new value. column_op removed.	2020-03-03 12:36:16 +02:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header is included in many other headers, but there's a handy schema_fwd.hh that's tiny and contains the declarations needed by other headers. So replace schema.hh with schema_fwd.hh in most of the headers (and remove it completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Piotr Jastrzebski	f105f43008	commitlog: remove FIXME In segment_manager::on_timer() there's a FIXME to stop discarding the future returned from sync(), but sync() does not return any future, so it's safe to remove the FIXME and stop casting to (void). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6d6d819cb2972e47e5f3fbe7b896499c64b09e53.1583230579.git.piotr@scylladb.com>	2020-03-03 12:21:56 +02:00
Calle Wilund	ed0d1c5fe2	cdc: Break up data column tuple According to the "new" spec: the data column is now the pure frozen original type. If a column is deleted (set to null), a metadata column cdc$deleted_<name> is set to true, to distinguish it from a null data column, which means the column is not involved in the row operation. For non-atomic collections, a cdc$deleted_elements_<name> column is added, and when elements are removed from a collection, this is where they are shown. For non-atomic assign, "cdc$deleted_<name>" is true, and <name> is set to the new value. column_op removed.	2020-03-03 08:52:20 +00:00
Rafael Ávila de Espíndola	28e59566a8	cql_test_env: Don't use a shared_ptr for token_metadata Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:52:23 -08:00
Rafael Ávila de Espíndola	47f8a63279	cql_test_env: Don't use a shared_ptr for migration_notifier Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:51:45 -08:00
Rafael Ávila de Espíndola	ed0c4d2801	cql_test_env: Don't use a shared_ptr for view_update_generator Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:51:25 -08:00
Rafael Ávila de Espíndola	ff2edd15d4	cql_test_env: Don't use a shared_ptr for view_builder Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:48 -08:00
Rafael Ávila de Espíndola	9375478803	cql_test_env: Don't use a shared_ptr for feature_service Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:25 -08:00
Rafael Ávila de Espíndola	5e87562f33	cql_test_env: Don't use a shared_ptr for database Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:08 -08:00
Rafael Ávila de Espíndola	a4b7de4d5d	cql_test_env: Don't use a shared_ptr for auth::service Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:49:46 -08:00
Botond Dénes	8a1c8ce8a6	mutation_partition: make query_result_builder safely movable `query_result_builder` is movable, but if you actually try to move it after it has consumed some fragments, it will blow up in your face when you try to use it again. This is because its `mutation_querier` member received a reference to its `query::result::partition_writer`. Of course, the reference to the latter was invalidated on move, so the former accessed invalid memory. Since `query::result::partition_writer` wasn't actually used for anything else, just move it into the `mutation_querier`, making `query_result_builder` actually safe to move. Fixes: #3158 Message-Id: <20190830142601.51488-1-bdenes@scylladb.com>	2020-03-02 18:46:59 +01:00
Botond Dénes	4da0a1d397	docs/debugging.md: mention another method of helping gdb find sources Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225124701.80706-1-bdenes@scylladb.com>	2020-03-02 18:26:29 +01:00
Pavel Emelyanov	86ca4b83d0	Revert "Revert "features: Stop on shutdown"" This reverts commit `165913598b`.	2020-03-02 19:56:18 +03:00
Pavel Emelyanov	0a10e9787e	features: Remove future-based when_enabled() This API is considered to be error-prone, all users of it are reworked, so let's drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-02 19:55:52 +03:00
Pavel Emelyanov	e63f5187b2	system_keyspace: Rework migrate_truncation_records feature subscription The function in question uses future-based .when_enabled() subscription on cluster_supports_truncation_table feature. This method is considered to be unsafe, so here's the patch that changes it onto feature::listener. The completion of the migration is only awaited by a single test, so this waiting mechanism is also slightly simplified. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-02 19:55:28 +03:00
Tomasz Grabiec	e17db536fd	Merge "lwt: support LIKE operator in conditional expressions" from Alejo Support LIKE operator condition on column expressions. NOTE: following the existing code, the LIKE pattern value is converted to raw bytes and passed straight as bytes_view to like_matcher without type checking; it should be checked/sanitized by the caller. Refs: #5777 Branch URL: https://github.com/alecco/scylla/tree/as_like_condition_2 Tests: unit ({dev}), unit ({debug}) NOTE: fails for unrelated test test_null_value_tuple_floating_types_and_uuids	2020-03-02 17:36:57 +01:00
Botond Dénes	6218153543	scylla-gdb.py: introduce collection_element() Extracting a certain element from a collection is a common task I have to do while debugging cores. For certain collections (c-array, std::array) this is trivial, for others it is easy enough (std::vector), but for some (std::list) this is a tiresome work-intensive process. This convenience function allows getting a reference to any element of the supported container types, returning them for further use in the interactive session. Currently only `std::list` and `std::vector` are supported.	2020-03-02 16:28:49 +01:00
Botond Dénes	94352b3426	scylla-gdb.py: generalize dereference_lw_shared_ptr() To be a generic convenience function for dereferencing all sorts of smart pointers. For now `std::unique_ptr`, `seastar::lw_shared_ptr` and `seastar::foreign_ptr` are supported.	2020-03-02 16:28:04 +01:00
Botond Dénes	b6f8a6fbd3	test/boost: sstable_datafile_test: sstable_scrub_test: stop table `table` is not registered with the database, and hence will not be waited on during shutdown. Stop it explicitly to prevent any asynchronous operation on it racing with shutdown. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200302142845.569638-1-bdenes@scylladb.com>	2020-03-02 16:20:00 +01:00
Calle Wilund	b6443e44b9	set: Make set_type_impl::serialize_partially_deserialized_form static Conform with map; in addition, the function does not require any instance info.	2020-03-02 14:43:34 +00:00
Pavel Solodovnikov	64451e5f51	cql3: minor cleanups regarding cql3::attributes::raw class * Mark cql3::attributes::raw class as final * Change every occurrence of ::shared_ptr<attributes::raw> to std::unique_ptr<...> * Mark all methods in cql3::attributes::raw as const * Remove redundant "_attrs" ptr copy in insert_json_statement, use one from raw::modification_statement * Fix odd indentation in cql3/statements/update_statement.cc Tests: unit-tests (dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200301223708.99883-1-pa.solodovnikov@scylladb.com>	2020-03-02 13:26:01 +01:00
Tomasz Grabiec	51cfd13f8c	gdb: Fix get_local_tasks() chunked_vector holds task* directly after seastar commit bcb5cf3a8dca19be0e577ee4e3bcd246f949dce6. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200227171722.7189-1-tgrabiec@scylladb.com>	2020-03-02 12:02:19 +02:00
Tomasz Grabiec	57a3f3e36b	gdb: Fix std_variant::get() when index is > 0 _get_next() was recursively calling itself with index - 1 if index was > 0. When we reached the desired element we always tried to use member_types[0] as the type, which is incorrect since member_types contains all types and doesn't change in get(). Fix by replacing recursion with iteration so that we keep the original index. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582900804-18681-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:59:19 +02:00
Tomasz Grabiec	4942c4c22b	gdb: Drop class keyword when constructing type name in seastar_lw_shared_ptr I encountered a case where a template type name is not resolved when "class " is present. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582900998-19267-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:58:44 +02:00
Calle Wilund	1085860c62	cdc: Rename metadata and data columns according to new spec Also use transformation methods for names in all code + tests to make switching again easier	2020-03-02 09:34:51 +00:00
Piotr Sarna	c62863cf69	alternator: restore verbose parsing error messages When wrapping rapidjson routines with safer, yieldable code, parsing information was lost, because the JSON reader was not checked for parsing errors before further processing. That resulted in nearly all parsing errors being reduced to "Assertion failed: StackSize() != 1". After this patch, all the various errors (missing quotation marks, colons, object names, etc.) are properly returned to the user. Message-Id: <968ce2f7539bf33d3eb829f0ab431b788d291602.1583134221.git.sarna@scylladb.com>	2020-03-02 11:29:09 +02:00
Tomasz Grabiec	4c0ddf3a2d	gdb: Introduce 'scylla features' command Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582901194-19903-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:28:13 +02:00
Nadav Har'El	ba536dbc95	alternator-test: don't warn about not verifying SSL certificates When running the Alternator tests, we don't care about verifying the pedigree of the SSL certificates; we actually know the ones we use in our test setups are fake, and not signed by any respectable certificate authority. We already use "verify=False" in many requests to avoid the certificate checking, but then we start getting scary-looking warning messages about an "Unverified HTTPS request is being made.". There's a way to disable these warnings, but we only did so in some cases, and there were still some tests that showed these warnings. Let's do it once, in a way that affects all tests. Message-Id: <20200301175607.8841-1-nyh@scylladb.com>	2020-03-01 22:59:20 +01:00
Juliusz Stasiewicz	cf24ae86f3	cdc: distinguishing update from insert When an incoming mutation contains a live row marker, the `operation` is described as "insert", not as "update". Also, I extended the test case "test_row_delete" with one insert, which is expected to log a different value of `operation` than update or delete. Renamed the test case accordingly. Test cases that relied on "update" being the same as "insert" are updated accordingly (`test_pre_image_logging`, `test_cdc_across_shards`, `test_add_columns`). Fixes #5723	2020-03-01 17:50:08 +02:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modification to the code to drop timeouts and introduce a source of unbounded request queues into the system. This series removes the last such default timeouts from the code. No problems were found; only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logallog: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Alejo Sanchez	c3b157a80b	lwt: support LIKE operator in conditional expressions Adds support of LIKE operator in conditional column expressions. Refs: #5777 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-01 14:22:10 +01:00
Piotr Sarna	2137017bc3	alternator: revert to ValidationException for JSON errors Both the rapidjson library and DynamoDB have enough corner cases for incorrect JSON that the simplest way out is to revert to ValidationException in all cases. This commit comes with an updated test, which is now aware of 3 possible outcomes for an incorrect JSON: a ValidationException, a SerializationException and HTTP 404. Message-Id: <5e39d2dc077f4ea5ce360035a4adcddaf3a342a0.1582876734.git.sarna@scylladb.com>	2020-03-01 14:35:20 +02:00
Avi Kivity	1ed06cdb7c	Revert "dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test" This reverts commit `65aadad9a6`. It causes crashes (due to the coredump test) during package install, since scylla_coredump_setup is called from rpm postinstall. The test should be done only from scylla_setup (and the user should be warned). Fixes #5916.	2020-03-01 14:32:31 +02:00
Avi Kivity	db544db5e2	Merge "Convert a few APIs to std::string_view" from Rafael " As part of avoiding static initialization order problems I want to switch a few global sstring to constexpr std::string_view. The advantage being that a constexpr variable doesn't need runtime initialization and therefore cannot be part of a static initialization order problem. In order to do the conversion I needed to convert a few APIs to use std::string_view instead of sstring and const sstring&. These patches are the simple cases that are also an improvement in their own right. " * 'espindola/string_view' of https://github.com/espindola/scylla: (22 commits) test: Pass a string_view to create_table's callback Pass string_view to the schema constructor cql3: Pass string_view to the column_specification constructor Pass string_view to keyspace_metadata::new_keyspace Pass string_view to the keyspace_metadata constructor utils: Use std::string as keys in nonstatic_class_registry utils: Pass a string_view to class_registry::to_qualified_class_name auth: Return a string_view from authorizer::qualified_java_name auth: Return a string_view from authenticator::qualified_java_name utils: Pass string_view to is_class_name_qualified test: Pass a string_view to create_keyspace Pass string_view to no_such_column_family's constructor perf_simple_query: Pass a string_view to make_counter_schema Pass string_view to the schema_builder constructor types: Add more data_value constructors transport: Pass a string_view to cql_server::connection::make_autheticate transport: Pass a string_view to cql_server::response::write_string cql3: Pass std::string_view to query_processor::compute_id cql3: Remove unused variable cql3: Pass a string_view to cf_statement::prepare_keyspace ...	2020-03-01 14:22:28 +02:00
Rafael Ávila de Espíndola	b3d396ea1f	utils: Use on_internal_error from seastar With this change, abort_on_internal_error is enabled on every SEASTAR_TEST_CASE. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200227164823.21021-1-espindola@scylladb.com>	2020-02-29 19:28:57 +02:00
Pavel Emelyanov	3ab43eba01	validation: Cleanup validate_keyspace helpers One of them uses the global storage_proxy instance, but since it is not used, remove it so as not to encourage anybody to start calling it. Another one uses db.find_keyspace to check if a keyspace exists, while there's a nicer db.has_keyspace helper (which doesn't throw exceptions), so use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200228123644.13931-1-xemul@scylladb.com>	2020-02-29 19:28:57 +02:00
Rafael Ávila de Espíndola	80bfe91a20	test: Pass a string_view to create_table's callback This gives more flexibility to the create_table implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	151f5e723f	Pass string_view to the schema constructor This moves string copies from the callers of the constructor to the implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	fba071163e	cql3: Pass string_view to the column_specification constructor This moves sstring copies from the callers to the constructor implementation. While at it, move the implementation out-of-line. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	ba453d832b	Pass string_view to keyspace_metadata::new_keyspace This avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	94d07fba07	Pass string_view to the keyspace_metadata constructor This avoids a few sstring copies when constructing keyspace_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	01fe766f1f	utils: Use std::string as keys in nonstatic_class_registry The sstring::compare function was never updated to work with std::string_view. We could fix that, but it seems better to just switch to std::string. With a working compare function, we can avoid copying the argument passed to to_qualified_class_name when an entry is found in the map. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:08 -08:00
Rafael Ávila de Espíndola	31985d3c28	utils: Pass a string_view to class_registry::to_qualified_class_name This just moves a string copy from the caller to the implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 13:30:00 -08:00
Rafael Ávila de Espíndola	df4f1a3bc3	auth: Return a string_view from authorizer::qualified_java_name This gives more flexibility to the implementations as they now don't need to construct a sstring. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 11:45:22 -08:00
Rafael Ávila de Espíndola	c29f8caafc	auth: Return a string_view from authenticator::qualified_java_name This gives more flexibility to the implementations as they now don't need to construct a sstring. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 11:32:36 -08:00
Rafael Ávila de Espíndola	fae05e9268	utils: Pass string_view to is_class_name_qualified With this we don't need to construct a sstring just to call is_class_name_qualified. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	0b57bddb3e	test: Pass a string_view to create_keyspace With this we don't need to construct a sstring just to call create_keyspace. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2b96abcece	Pass string_view to no_such_column_family's constructor With this we don't have to construct a sstring to construct a no_such_column_family. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2679c0cc87	perf_simple_query: Pass a string_view to make_counter_schema With this we don't need to construct a sstring just to call make_counter_schema. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	9ab2346e7f	Pass string_view to the schema_builder constructor With this we don't need to construct a sstring just to construct a schema_builder. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	93de9597bf	types: Add more data_value constructors With this we can construct a data_value from any string type. This also avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	c51d81341b	transport: Pass a string_view to cql_server::connection::make_autheticate With this we don't need to construct a sstring just to call make_autheticate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	c2c44f4778	transport: Pass a string_view to cql_server::response::write_string With this we don't need to construct a sstring just to call write_string. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	4adefd9a76	cql3: Pass std::string_view to query_processor::compute_id With this we don't need to construct a sstring just to call compute_id. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	f44a5255da	cql3: Remove unused variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	9e00f1e23b	cql3: Pass a string_view to cf_statement::prepare_keyspace This avoids a copy in the callers. While at it, also make this function non-virtual since it is never overridden. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2fd3ec8d6f	cql3: Pass a string_view to keyspace_element_name::set_keyspace With this we don't need to construct a sstring just to call set_keyspace. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	35089447cd	cql3: Pass a string_view to keyspace_element_name::to_internal_name This moves the string copy from the callers to the implementation of to_internal_name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
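The string_view commits above all follow the same pattern: take a non-owning `std::string_view` parameter so callers never build a temporary owning string, and make at most one copy inside the callee. A minimal sketch of that pattern, assuming `std::string` as a stand-in for Seastar's `sstring`; `element_name` is a hypothetical class, not Scylla's `keyspace_element_name`:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a class like keyspace_element_name.
class element_name {
public:
    // Non-owning parameter: a literal, a std::string, or a slice of a larger
    // buffer can be passed without constructing a temporary owning string.
    void set_keyspace(std::string_view ks) {
        // The single copy happens here, inside the callee, not at every call site.
        _keyspace.assign(ks.data(), ks.size());
    }
    const std::string& keyspace() const { return _keyspace; }

private:
    std::string _keyspace;
};
```

For example, `n.set_keyspace(buf.substr(0, 3))` with `buf` a `std::string_view` passes a sub-range without any intermediate allocation.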
Botond Dénes	5b0cfbb51f	test/boost/mutation_reader_test: test_multishard_streaming_reader: use caller's priority class Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200228073239.475778-1-bdenes@scylladb.com>	2020-02-28 16:39:30 +01:00
Avi Kivity	134d5a5f75	Merge "flat_mutation_reader: abort reverse reads when size of mutation exceeds limit" from Botond " Reverse queries work by reading an entire partition into memory, then emitting its rows in reverse order. It is easy to see how this can lead to disaster when combined with large partitions. In fact, a handful of such reverse queries on large partitions is enough to bring a node down. To prevent this, abort reverse queries when we find out that the size of the partition is larger than a limit. This might be annoying to users, but I'm sure it is not as annoying as their nodes going down. The limit is configurable via the `max_memory_for_unlimited_query` configuration option, which is 1MB by default. This limit is propagated to each table, with system tables having no limit. This limit is planned to be used by other queries capable of consuming an unlimited amount of memory, like unpaged queries. Not in this series. The proper solution would be to read the data in reverse (#1413), but that is a major effort. In the meantime, make sure the unsuspecting user won't bring their nodes down with an innocent-looking ordering directive. Note that for calculating the memory footprint of the partition in question, only the clustering rows are used. This should be fine: the 1MB limit is conservative enough that a possible overshoot caused by the omitted range tombstones and the static row would not make a big difference. Fixes: #5804 " * 'limit-reverse-query-memory-consumption/v3' of https://github.com/denesb/scylla: flat_mutation_reader: make_reversing_reader(): add memory limit db/config: add config memory limit of otherwise unlimited queries utils::updateable_value: add operator=(T) flat_mutation_reader: expose reverse reader as a standalone reader	2020-02-28 07:57:13 +02:00
Rafael Ávila de Espíndola	e670dfc0cd	auth: Fix static initialization order problem A static constructor was used to initialize update_row_query. That constructor would call meta::roles_table::qualified_name() which would access AUTH_KS which is also initialized by a static constructor in another file, so the construction order is not guaranteed. This change turns update_row_query into a function with a static local variable in it. The static local is initialized at first use, fixing the problem. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200227163916.19761-1-espindola@scylladb.com>	2020-02-28 07:57:13 +02:00
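The fix above replaces a namespace-scope static with a function holding a static local, so the value is built on first call rather than during static initialization. A minimal sketch of that idiom; `auth_ks`, `roles_qualified_name()` and the query text are illustrative stand-ins, not Scylla's actual definitions:

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of AUTH_KS: a namespace-scope object with a dynamic
// initializer. Reading it from another translation unit's static initializer
// would depend on the unspecified cross-TU initialization order.
const std::string auth_ks = "system_auth";

// Hypothetical analogue of meta::roles_table::qualified_name().
std::string roles_qualified_name() {
    return auth_ks + ".roles";
}

// Instead of a namespace-scope `static const std::string update_row_query = ...;`,
// the query lives in a function. The function-local static is initialized on the
// first call (thread-safely since C++11), not during static initialization.
const std::string& update_row_query() {
    // Illustrative query text only.
    static const std::string query =
        "UPDATE " + roles_qualified_name() + " SET salted_hash = ? WHERE role = ?";
    return query;
}
```

Every call returns a reference to the same object, so the string is built once.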
Nadav Har'El	7953f7c65f	merge "alternator: Make parsing yieldable" Merged patch series by Piotr Sarna: This series makes JSON parsing yieldable in order to prevent reactor stalls. It's done by: 1. Extracting the parsing stage out of alternator executor 2. Moving the parsing stage to a separate service, which uses a static seastar thread (parallelism: 1) 3. Wrapping rjson parsing routines with a yieldable parser, which takes advantage of running in a seastar thread and occasionally performs maybe_yield() Step 2 above is only used for JSONs big enough to potentially create stalls; small requests are parsed immediately, without being redirected to a static thread. Handling a PutItem operation with large JSONs on my machine takes approximately: 1MB doc: ~30ms, 3MB doc: ~90ms, 12MB doc: ~350ms, out of which parsing itself is around: 1MB doc: ~7ms, 3MB doc: ~20ms, 12MB doc: ~80ms (bonus: 400KiB doc: ~2ms). The document was a single object full of small items, which triggers many allocations during parsing. The above numbers were roughly the same before and after the series, but the 12MB document did not cause reactor stalls after the patch. Note: writing the JSON can still be a source of stalls, especially for large documents. Note2: DynamoDB limits a single value to 400KiB, but a batch request can total up to 16MiB. Note3: If parallelism ever proves to be an issue, it can easily be increased by spawning more static threads. 
Refs: #5742 Tests: alternator(local) manual Piotr Sarna (12): alternator: break lines in server callbacks alternator: allow moving the request from rmw operation alternator: move parsing in front of executor alternator: convert parse to std::string_view alternator: implement json parser inside the server alternator: remove rjson::parse_raw alternator: make rjson yieldable in thread context alternator: fix returning raw JSON errors alternator: change json errors class to SerializationException alternator-test: rename large requests test to 'manual requests' alternator-test: extract getting signed request helper alternator-test: add tests for incorrect JSON documents ...ge_requests.py => test_manual_requests.py} | 53 +++-- alternator/executor.cc | 203 ++++++++---------- alternator/executor.hh | 33 +-- alternator/rjson.cc | 47 +++- alternator/rjson.hh | 7 +- alternator/rmw_operation.hh | 1 + alternator/serialization.cc | 9 +- alternator/server.cc | 111 ++++++++-- alternator/server.hh | 20 +- 9 files changed, 310 insertions(+), 174 deletions(-) rename alternator-test/{test_large_requests.py => test_manual_requests.py} (70%)	2020-02-28 07:57:13 +02:00
Benny Halevy	b31867eafa	types: tri_compare: turn marshal_exception to on_internal_error We see this exception on gemini testing with a large number of pk, ck, columns, for example: 2020-02-19T17:52:54+00:00 gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-3 !ERR | scylla: [shard 0] storage_proxy - Exception when communicating with 10.0.207.169: std::runtime_error (marshaling error: read_simple_exactly - size mismatch (expected 4, got 1) Backtrace: 0x2c4f08d#012 0x9fcd3e#012 0x444b28#012 0x4d8fe5#012 0xa78e8b#012 0xeab269#012 0xc27a67#012 0xc28239#012 0xc600e3#012 0xadebf3#012 0xae14c1#012 0x29ff291#012 0x29ff49f#012 0x2a3fc65#012 0x29a5d6f#012 0x29a6e9e#012 0x72a4e3#012 /opt/scylladb/libreloc/libc.so.6+0x271a2#012 0x77548d#012) Decoded backtrace: seastar::current_backtrace() at crtstuff.c:? seastar::internal::backtraced<marshal_exception>::backtraced<seastar::basic_sstring<char, unsigned int, 15u, true> >(seastar::basic_sstring<char, unsigned int, 15u, true>&&) at crtstuff.c:? void seastar::throw_with_backtrace<marshal_exception, seastar::basic_sstring<char, unsigned int, 15u, true> >(seastar::basic_sstring<char, unsigned int, 15u, true>&&) at crtstuff.c:? abstract_type::compare(std::basic_string_view<signed char, std::char_traits<signed char> >, std::basic_string_view<signed char, std::char_traits<signed char> >) const [clone .cold] at types.cc:? bound_view::tri_compare::operator()(clustering_key_prefix const&, int, clustering_key_prefix const&, int) const at crtstuff.c:? sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? mutation_reader_merger::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? 
combined_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? cache::cache_flat_mutation_reader::do_fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? This patch should help us get a core dump if this happens again. Ref #5856 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200227131939.388770-1-bhalevy@scylladb.com>	2020-02-28 07:57:13 +02:00
Piotr Sarna	b461750ae3	alternator-test: add tests for incorrect JSON documents The test case sends incorrectly formed JSON documents to alternator, expecting a serialization exception as a response.	2020-02-28 07:57:12 +02:00
Raphael S. Carvalho	40e75fb109	streaming/stream_transfer_task: avoid pointless iterations in has_relevant_range_on_this_shard() Once has_relevant_range_on_this_shard() finds a relevant range, it unnecessarily keeps iterating to the end. Verified manually that this could be thousands of pointless iterations when streaming data to a newly added node. The relevant code could be simplified by de-futurizing it, but I think it should remain futurized so that the task scheduler can preempt it if necessary. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200220224048.28804-2-raphaelsc@scylladb.com>	2020-02-28 07:57:12 +02:00
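The fix is an early exit: once one relevant range is found, the answer cannot change. A synchronous sketch of that loop shape, assuming a hypothetical `token_range` struct and predicate (the real function stays futurized); the `visited` counter only exists to make the early exit observable:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified range type; only the loop shape matters here.
struct token_range {
    int start;
    int end;
};

// Returns true as soon as one range satisfies `is_relevant`.
// `visited` counts predicate calls so the early exit can be checked.
template <typename Pred>
bool has_relevant_range(const std::vector<token_range>& ranges, Pred is_relevant, int& visited) {
    visited = 0;
    for (const auto& r : ranges) {
        ++visited;
        if (is_relevant(r)) {
            return true;  // stop here: the remaining ranges cannot change the answer
        }
    }
    return false;
}
```

With 1000 ranges and a match at the third one, only three predicate calls are made instead of 1000.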
Piotr Sarna	79b04aeba9	alternator-test: extract getting signed request helper A helper function for getting custom requests is extracted to top-level, in order to be used later by other test cases.	2020-02-28 07:57:12 +02:00
Raphael S. Carvalho	8a986bc23b	streaming/stream_transfer_task: avoid unnecessary copies of ranges Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200220224048.28804-1-raphaelsc@scylladb.com>	2020-02-28 07:57:12 +02:00
Piotr Sarna	ad48328407	alternator-test: rename large requests test to 'manual requests' This test suite can then become the parent of tests which use custom, potentially unvalidated input in order to test alternator against data that is hard to send via boto3 or Python because of their implementation details.	2020-02-28 07:57:12 +02:00
Piotr Sarna	ccdf519829	alternator: make alternator server sharded Previously, alternator server was not directly sharded - and instead kept a helper http server control class, which stored sharded http server inside. That design is confusing and makes it hard to expand alternator server with new sharded attributes, so from now on the alternator server is itself sharded<>. Tests: alternator-test(local, smp==1&smp==4) Fixes #5913 Message-Id: <b50e0e29610c0dfea61f3a1571f8ca3640356782.1582788575.git.sarna@scylladb.com>	2020-02-28 07:57:12 +02:00
Piotr Sarna	c370586189	alternator: change json errors class to SerializationException In order to be consistent with DynamoDB, a parsing error on incorrect JSON input is reported as SerializationException instead of ValidationException.	2020-02-28 07:57:12 +02:00
Piotr Sarna	6f8c70d54b	alternator: fix returning raw JSON errors A couple of places in executor code leaked raw JSON errors to the user instead of formulating a proper ValidationException message. These places are now fixed, and the next patch in this series will act as a regression checker, since all JSON errors will be returned as SerializationException, not ValidationException instances.	2020-02-28 07:57:12 +02:00
Piotr Sarna	1be1cfc5d8	alternator: make rjson yieldable in thread context In order to fight reactor stalls, rjson parsing and writing routines can now yield if they run in seastar thread context. In order to run a yieldable version of the parser which needs to be run in seastar thread context, use parse_yieldable() instead of parse().	2020-02-28 07:57:12 +02:00
Piotr Sarna	0af8516675	alternator: remove rjson::parse_raw With parse() being based on std::string_view, there's not much sense in keeping a separate parse_raw function, so it's deleted.	2020-02-28 07:57:12 +02:00
Piotr Sarna	aad6c01b98	alternator: implement json parser inside the server The JSON parser runs in a static thread which accepts and parses documents. Documents smaller than a parsing threshold (currently: 16KiB) are parsed in place without yielding. The assumption is that most alternator requests are small and there's no need to parse them in a yieldable way, which also induces overhead. For reference, parsing a 128KiB document made of many small objects with rapidjson takes around 0.5 milliseconds, and a 16KiB document is parsed in around 0.06ms, which is small enough not to noticeably disturb Seastar's current task quota of 0.5ms.	2020-02-28 07:57:12 +02:00
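The dispatch rule above (parse small documents in place, hand large ones to a parser that yields periodically) can be sketched as below. `choose_parse_path()` and `parse_events()` are hypothetical helpers, not alternator's API; in Seastar the yield hook would be something like `seastar::thread::maybe_yield()`, while here it is any callable so the sketch runs standalone:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Threshold quoted in the commit message: documents under 16KiB are parsed in place.
constexpr std::size_t parsing_threshold = 16 * 1024;

enum class parse_path { in_place, yieldable_thread };

// Hypothetical dispatch helper: small bodies are parsed immediately without
// yielding; larger ones go to the dedicated thread whose parser may yield.
parse_path choose_parse_path(std::string_view body) {
    return body.size() < parsing_threshold ? parse_path::in_place
                                           : parse_path::yieldable_thread;
}

// Hypothetical model of the yieldable parser loop: handle `n_events`
// SAX-style events and offer to yield every `yield_every` events.
template <typename Yield>
void parse_events(std::size_t n_events, std::size_t yield_every, Yield maybe_yield) {
    for (std::size_t i = 1; i <= n_events; ++i) {
        // ... handle one parser event here ...
        if (i % yield_every == 0) {
            maybe_yield();
        }
    }
}
```

The threshold keeps the common small-request path free of the thread hand-off and yield checks, which the commit notes would only add overhead.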
Piotr Sarna	ffdbbc0ad0	alternator: convert parse to std::string_view The original implementation used const std::string&, which is less versatile.	2020-02-28 07:57:12 +02:00
Piotr Sarna	2402955d45	alternator: move parsing in front of executor Parsing a request string into JSON is the first thing every request does, so it can be performed before calling any executor callbacks. More importantly, making parsing a separate stage allows certain optimizations, e.g. running all parsing in a single seastar thread, which allows adding yields to rjson parsing later.	2020-02-28 07:57:12 +02:00
Piotr Sarna	c20432bcac	alternator: allow moving the request from rmw operation In order to elide copying the JSON value when rerouting the operation to another shard, a way to move the parsed request out of the operation is added.	2020-02-28 07:57:12 +02:00
Piotr Sarna	c7a8549270	alternator: break lines in server callbacks The lines are about to get longer, so they are broken as a first step, to make the next commits more clear.	2020-02-28 07:57:12 +02:00
Botond Dénes	1073094f04	database: database::query(), database::apply(): remove default timeouts	2020-02-27 19:14:12 +02:00
Botond Dénes	2c1ee7b9cd	database: table::query(): remove default timeout	2020-02-27 19:14:09 +02:00
Botond Dénes	8da88e6cb9	mutation_query: data_query(): remove default timeout	2020-02-27 19:02:40 +02:00
Botond Dénes	fdb45d16de	mutation_query: mutation_query(): remove default timeout	2020-02-27 18:56:30 +02:00
Botond Dénes	72509911d9	multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout	2020-02-27 18:45:15 +02:00
Botond Dénes	f6013a39ec	reader_concurrency_semaphore: wait_admission(): remove default timeout	2020-02-27 18:43:12 +02:00
Botond Dénes	93039a085d	utils/logalloc: run_when_memory_available(): remove default timeout	2020-02-27 18:36:32 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
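The memory limit works by charging buffered rows against a budget while the whole partition is read, and aborting the read as soon as the budget is exceeded. A minimal sketch, assuming a hypothetical `clustering_row` that only carries a payload string; as in the merge description, only clustering rows are counted (range tombstones and the static row are ignored). This is not Scylla's `make_reversing_reader()`:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for a clustering row; only its payload size is accounted.
struct clustering_row {
    std::string payload;
};

// Buffer the whole partition, charging each row against `max_memory`, and
// abort (throw) as soon as the buffered size exceeds the limit, before any
// row is emitted. Otherwise return the rows in reverse order.
std::vector<clustering_row> reverse_partition(const std::vector<clustering_row>& rows,
                                              std::size_t max_memory) {
    std::vector<clustering_row> buffered;
    std::size_t used = 0;
    for (const auto& row : rows) {
        used += row.payload.size();
        if (used > max_memory) {
            throw std::runtime_error("reverse read exceeds memory limit of " +
                                     std::to_string(max_memory) + " bytes");
        }
        buffered.push_back(row);
    }
    return std::vector<clustering_row>(buffered.rbegin(), buffered.rend());
}
```

Checking the limit while buffering, rather than after, is what keeps a single large partition from exhausting memory before the read is aborted.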
Botond Dénes	75efa707ce	db/config: add config memory limit of otherwise unlimited queries We have a few kinds of queries whose memory consumption is not limited at all. One of these is reverse queries, which read entire partitions into memory before reversing them. These partitions can be larger than memory, so such a query can single-handedly cause an OOM. This patch introduces a configuration option for a memory limit for such queries. This will serve as a hard limit, and queries which attempt to use more memory than this will be aborted. The limit is propagated to table objects, with the intention of keeping system tables unlimited. These tables are usually small, and initiators of system queries are not prepared for failures.	2020-02-27 18:11:54 +02:00
Botond Dénes	d1194da98d	utils::updateable_value: add operator=(T) Allow assigning a const value.	2020-02-27 18:11:54 +02:00
Botond Dénes	091d80e8c3	flat_mutation_reader: expose reverse reader as a standalone reader Currently reverse reads just pass a flag to `flat_mutation_reader::consume()` to make the read happen in reverse. This is deceptively simple and streamlined, while in fact behind the scenes a reversing reader is created to wrap the reader in question and reverse partitions one by one. This patch makes this explicit by exposing the reversing reader via `make_reversing_reader()`. Besides making it clearer how reversing works, this also allows more configuration to be passed to the reversing reader (in the next patches). This change is forward compatible: we plan to eventually add reversing support to the sstable layer, at which point the reversing reader will go away.	2020-02-27 18:11:54 +02:00
Dejan Mircevski	0d7457946f	cql3: Allow repeated LIKE on same column No reason to disallow this. We still forbid mixing LIKE and non-LIKE relations on the same column. Fixes #5902. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-27 09:34:51 -05:00
Pekka Enberg	109bb1baa6	cql3: Switch from distributed<> to seastar::sharded<> Convert the last instance of "distributed<>" in cql3 to seastar::sharded<>. Message-Id: <20200227092804.27374-1-penberg@scylladb.com>	2020-02-27 12:09:59 +02:00
Pekka Enberg	123b50cdb9	configure.py: Disable package registry when building Seastar The CMake build system in seastar.git exports the package to CMake package registry. However, we don't use it when building from scylla.git (we link to seastar directly) and get the following warning when building with "dbuild" (that does not bind mount $HOME/.cmake): CMake Warning at CMakeLists.txt:1180 (export): Cannot create package registry file: /home/penberg/.cmake/packages/Seastar/3b6ede62290636bbf1ab4f0e4e6a9e0b No such file or directory Let's just disable the package registry for our builds by setting the CMAKE_EXPORT_NO_PACKAGE_REGISTRY CMake option as discussed here to make the warning go away: https://cmake.org/cmake/help/v3.4/variable/CMAKE_EXPORT_NO_PACKAGE_REGISTRY.html Message-Id: <20200227092743.27320-1-penberg@scylladb.com>	2020-02-27 12:09:59 +02:00
Takuya ASADA	01a03c4d69	install.sh: run post-install script just like .rpm/.deb package To install scylla easily using install.sh, we need to run the following steps: - add scylla user/group - configure scylla.yaml - run scylla_post_install.sh Since we don't want to run them when building the .rpm/.deb packages, we also add a --packaging option to skip them. Fixes #5830	2020-02-27 11:17:24 +02:00
Dejan Mircevski	acccab31f7	cql3: Forbid calling LIKE::values() We were incorrectly returning the LIKE pattern as if it were a column value. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-26 14:07:46 -05:00
Dejan Mircevski	fd583196ce	cql3: Move LIKE::_last_pattern to matcher Instead of keeping the LIKE pattern in a restriction object (as we currently do), keep it in like_matcher. Also move the pattern-idempotence check from the restriction to the matcher. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-26 14:00:04 -05:00
Avi Kivity	956b092012	Merge "Repair based node operation" from Asias " Here is a simple introduction to the node operations scylla supports and some of their issues. - Replace operation It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas, which might not be the latest copy. - Rebuild operation It is used to get all the data this node owns from other nodes. It pulls data from only one of the replicas, which might not be the latest copy. - Bootstrap operation It is used to add a new node into the cluster. The token ring changes. It does not suffer from the “not the latest replica” issue: the new node pulls data from existing nodes that are losing the token range. It suffers from failed streaming: we split the ranges into 10 groups and stream one group at a time, restreaming a group if it fails, which causes unnecessary data transmission on the wire. Bootstrap is not resumable: if it fails after 99.99% of the data is streamed and we restart the node, we need to stream all the data again even though the node already has 99.99% of it. - Decommission operation It is used to remove a live node from the cluster. The token ring changes. It does not suffer from the “not the latest replica” issue: the leaving node pushes data to existing nodes. It suffers from the same resumability issue as the bootstrap operation. - Removenode operation It is used to remove a dead node from the cluster. Existing nodes pull data from other existing nodes for the new ranges they own. It pulls from one of the replicas, which might not be the latest copy. To solve all the issues above, we can use repair based node operations. The idea behind repair based node operations is simple: use repair to sync data between replicas instead of streaming. 
The benefits: - Latest copy is guaranteed - Resumable in nature - No extra data is streamed on the wire, e.g. rebuilding twice will not stream the same data twice - Unified code path for all the node operations - Free repair operation during bootstrap, replace operation and so on. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test " * 'repair_for_node_ops' of https://github.com/asias/scylla: docs: Add doc for repair_based_node_ops storage_service: Enable node repair based ops for bootstrap storage_service: Enable node repair based ops for decommission storage_service: Enable node repair based ops for replace storage_service: Enable node repair based ops for removenode storage_service: Enable node repair based ops for rebuild storage_service: Use the same tokens as previous bootstrap storage_service: Add is_repair_based_node_ops_enabled helper config: Add enable_repair_based_node_ops repair: Add replace_with_repair repair: Add rebuild_with_repair repair: Add do_rebuild_replace_with_repair repair: Add removenode_with_repair repair: Add decommission_with_repair repair: Add do_decommission_removenode_with_repair repair: Add bootstrap_with_repair repair: Introduce sync_data_using_repair repair: Propagate exception in tracker::run	2020-02-26 20:37:25 +02:00
Avi Kivity	35e5772b94	Update seastar submodule * seastar 7a3b4b4e4e...affc3a5107 (6): > Merge "Add the possibility to remove rules from routes" from Pavel > stall_detector: expose correct clock type to use > queue: add has_blocked_consumer() function > Merge "core: reduce memory use for idle connections" from Avi > testing: Enable abort_on_internal_error on tests > core: Add a on_internal_error helper	2020-02-26 19:21:24 +02:00
Rafael Ávila de Espíndola	17f12a8197	perf_simple_query: Call set_abort_on_internal_error(true) We should never ignore an internal error in a perf test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-2-espindola@scylladb.com>	2020-02-26 18:22:05 +02:00
Rafael Ávila de Espíndola	c6897dcbea	perf_simple_query: Simplify with seastar::thread There is no reason not to use a seastar::thread in setup code. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-1-espindola@scylladb.com>	2020-02-26 18:22:04 +02:00
Nadav Har'El	3e44356c9f	alternator-test: fix tests failing with HTTPS When we test Alternator on its HTTPS port (i.e., pytest --https), we don't want requests to verify the pedigree of the SSL certificate. Our "dynamodb" fixture (conftest.py) takes care of this for most of the tests, but a few tests create their own requests and need to pass the "verify=False" option on their own. In some tests, we forgot to do this, and this patch fixes three tests which failed with "pytest --https". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200226142330.27846-1-nyh@scylladb.com>	2020-02-26 15:29:24 +01:00
Nadav Har'El	cf8354f703	merge "cdc: Fix `operation` value for row deletes" Merged pull request https://github.com/scylladb/scylla/pull/5897 from Juliusz Stasiewicz: Column operation now contains operation::row_delete (== 2) after queries like delete from tbl where pk=x and ck=y;. Before this patch row deletes were treated as updates, which was incorrect because updates do not contain row tombstones (and row deletes do). Refs #5709	2020-02-26 16:26:34 +02:00
Juliusz Stasiewicz	f425f7d217	tests/cdc: added test for row delete <-> update differentiation	2020-02-26 12:32:16 +01:00
Juliusz Stasiewicz	836183b847	cdc: fix `operation` value for row deletes Column `operation` now contains `operation::row_delete` (== 2) after queries like `delete from tbl where pk=x AND ck=y;`. Before this patch row deletes were treated as updates, which was incorrect because updates do not contain row tombstones (and row deletes do). Refs #5709	2020-02-26 11:58:50 +01:00
Nadav Har'El	6da4d65f12	merge: Fix alternator decommission/shutdown Merged patch series from Piotr Sarna: Alternator shutdown routines were only registered in main.cc, but that's not enough: other operations, like decommission, also rely on shutting down client servers. To remedy the situation, a notion of client shutdown listeners is introduced to storage service. A shutdown listener implements a callback used by the storage service when client servers need to shut down, and at the same time it does not force storage service to keep a reference to the client service itself. NOTE: the interface can also be used later to provide proper shutdown routines for redis and any other future APIs. Fixes #5886 Tests: alternator-test(local, including a shutdown during the run) Piotr Sarna (4): storage_service: make shutdown_client_servers() thread-only storage_service: add client shutdown hook main: make alternator shutdown hook-based main: reduce scope of alternator services main.cc | 18 +++++++++--------- service/storage_service.cc | 22 +++++++++++++++++----- service/storage_service.hh | 15 ++++++++++++++- 3 files changed, 40 insertions(+), 15 deletions(-)	2020-02-26 12:45:30 +02:00
Botond Dénes	a83cca93ff	scylla-gdb.py: introduce std_deque A python read-only container wrapper for std::deque. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225184951.125129-1-bdenes@scylladb.com>	2020-02-26 11:20:50 +01:00
Takuya ASADA	65aadad9a6	dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test In some environments systemd-coredump does not work with a symlinked directory, so we use a bind mount instead. Also, it's better to verify that systemd-coredump is working by generating a coredump. Fixes #5753	2020-02-26 11:21:48 +02:00
Takuya ASADA	8e901636fc	scylla_setup: fix --nic option on non-interactive mode scylla_setup should not show the NIC selection prompt in non-interactive mode. Fixes #5725 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-02-26 11:13:53 +02:00
Piotr Sarna	148456a741	main: reduce scope of alternator services With the new shutdown routines in place, alternator executor and server do not need to be declared outside of the `if` clause which conditionally sets up alternator.	2020-02-26 08:45:07 +01:00
Piotr Sarna	33ce8379ba	main: make alternator shutdown hook-based In order to properly handle not only shutdown, but also decommission, drain and similar operations, alternator shutdown is now registered as a client shutdown hook, which allows storage service to trigger its shutdown routines. Fixes #5886	2020-02-26 08:44:56 +01:00
Piotr Sarna	8d499603aa	storage_service: add client shutdown hook The shutdown hook interface can be used later by additional client interfaces (e.g. alternator, redis) to register shutdown routines for various operations: Scylla shutdown, node decommission, drain, etc. It also decouples the services themselves from being part of the storage service, since it's huge enough as it is.	2020-02-26 08:44:35 +01:00
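The hook decouples the storage service from concrete client services: each client registers a callback, and every operation that must stop client servers (shutdown, decommission, drain) invokes all registered callbacks. A minimal sketch of that shape; `client_shutdown_hook`, `storage_service_model`, `register_client_shutdown_hook()` and `fake_client_server` are hypothetical names, not Scylla's declarations, and the real code runs in a Seastar thread:

```cpp
#include <cassert>
#include <vector>

// Hypothetical hook interface: a client service (e.g. alternator, redis)
// implements this so the storage service can stop it without knowing its type.
class client_shutdown_hook {
public:
    virtual ~client_shutdown_hook() = default;
    virtual void shutdown() = 0;
};

// Hypothetical model of the storage-service side.
class storage_service_model {
public:
    void register_client_shutdown_hook(client_shutdown_hook& hook) {
        _hooks.push_back(&hook);
    }
    // Invoked for shutdown, decommission, drain, etc.
    void shutdown_client_servers() {
        for (client_shutdown_hook* hook : _hooks) {
            hook->shutdown();
        }
    }

private:
    // Non-owning pointers: the storage service keeps only the hooks,
    // not references to the client services themselves.
    std::vector<client_shutdown_hook*> _hooks;
};

// Example client standing in for the alternator server.
class fake_client_server : public client_shutdown_hook {
public:
    bool stopped = false;
    void shutdown() override { stopped = true; }
};
```

Because the storage service depends only on the abstract hook, adding another client API requires no change to it beyond a registration call.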
Piotr Sarna	171bc9a3df	storage_service: make shutdown_client_servers() thread-only The function is only ever called in thread context, so it's moved from being future<>-based in order to ease future changes.	2020-02-26 08:18:42 +01:00
Nadav Har'El	0ab6c7fcef	alternator: stricter checks for user-supplied attribute values Until now, PutItem or UpdateItem could be used to insert almost any JSON as an attribute's value - even those that do not match DynamoDB's typed value specification. Among other things, the new validation allows us to reject empty sets, strings or byte arrays - which are (somewhat artificially) forbidden in DynamoDB. Also added tests for the empty sets, strings and byte arrays that should be rejected. Fixes #5896 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225150525.4926-1-nyh@scylladb.com>	2020-02-26 08:12:26 +01:00
Nadav Har'El	6339f419ac	alternator: removing all elements from a set should delete it DynamoDB does not support empty sets. Operations which remove elements from a set attribute should remove the attribute when the last item is removed - not leave an empty set as it incorrectly does now. Incidentally, the same patch fixes another bug - deleting elements from a non-existent set attribute should be allowed (and do nothing), not fail as it does now. This patch also includes tests for both bugs. Fixes #5895 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225125343.31629-1-nyh@scylladb.com>	2020-02-26 08:12:19 +01:00
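The two behaviors fixed above are: deleting from a missing set attribute is a no-op, and removing the last element deletes the attribute instead of leaving an empty set. A minimal sketch, assuming a hypothetical `item` type mapping attribute names to string sets (alternator's real item representation is JSON-based):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical item model: attribute name -> string set.
using item = std::map<std::string, std::set<std::string>>;

// Remove `to_remove` from the set stored under `attr`.
void delete_from_set(item& it, const std::string& attr, const std::set<std::string>& to_remove) {
    auto found = it.find(attr);
    if (found == it.end()) {
        return;  // deleting from a non-existent set attribute is allowed and does nothing
    }
    for (const auto& v : to_remove) {
        found->second.erase(v);
    }
    if (found->second.empty()) {
        it.erase(found);  // DynamoDB has no empty sets: drop the attribute entirely
    }
}
```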
Nadav Har'El	acb7f45ca7	alternator-test: add tests for UpdateItem's AttributeUpdates DELETE and ADD We have not yet implemented the DELETE-with-value and ADD operations in UpdateItem's old-style "AttributeUpdates" parameter (see issue #5864 and issue #5893, respectively). This patch includes comprehensive tests for both features. The new tests pass on DynamoDB, but currently xfail on Alternator until these features are implemented. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225105546.25651-1-nyh@scylladb.com>	2020-02-26 08:12:10 +01:00
Botond Dénes	ea08d7a0df	scylla-gdb.py: make get_text_range() more reliable Currently `get_text_range()` uses heuristics about which ELF section actually contains the text for the main executable. It appears that this fails from time to time and we have to adjust the heuristics. We don't really have to guess, however: a much better method of determining the section hosting text is to find a vtable pointer and locate the section it resides in. For this, we use the `reactor::_backend` as a canary. When this is not available, we fall back to the pre-existing heuristics. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225164719.114500-1-bdenes@scylladb.com>	2020-02-25 19:02:26 +01:00
Calle Wilund	a3a764fd10	cdc: Handle non-atomic columns Fixes #5669 This implements non-atomic collection and UDT handling for both cdc preimage + delta. To be able to express deltas in a meaningful way (and reconstruct using it), non-atomic values are represented somewhat differently from regular values: * maps - stored as is (frozen) * sets - stored as is (frozen) * lists - stored as map<timeuuid, value> (frozen) this allows reconstructing the list, as otherwise things like list[0] = value cannot be represented in a meaningful way * udt - stored as tuple<tuple<field0>, tuple<field1>...> (frozen) UDTs are normally just tuples + metadata, but we need to distinguish the case of outer tuple element == null, meaning "no info/does not partake in mutation" from tuple element being a tuple(null) (i.e. empty tuple), meaning "set field to null"	2020-02-25 19:34:54 +02:00
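Storing a list as `map<timeuuid, value>` lets a delta name the element it changes by key rather than by position, which is what makes an update like `list[0] = value` expressible. A simplified sketch, assuming a plain `uint64_t` as a stand-in for timeuuid; `list_delta`, `apply_delta()` and `reconstruct()` are hypothetical helpers, and element deletion is not modeled:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for timeuuid: any totally ordered key is enough for the illustration.
using fake_timeuuid = std::uint64_t;

// A list cell set keyed by each element's time-based id.
using list_delta = std::map<fake_timeuuid, std::string>;

// Applying a delta overwrites (or adds) the cells with the given keys.
// `list[0] = 'x'` becomes "overwrite the cell whose key belongs to element 0".
void apply_delta(list_delta& base, const list_delta& delta) {
    for (const auto& [key, value] : delta) {
        base[key] = value;
    }
}

// Rebuild the user-visible list: elements ordered by their keys.
std::vector<std::string> reconstruct(const list_delta& cells) {
    std::vector<std::string> out;
    for (const auto& [key, value] : cells) {
        out.push_back(value);
    }
    return out;
}
```

With a plain positional representation, the delta for `list[0] = 'x'` could not be told apart from an append or a different element's update; the key makes it unambiguous.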
Avi Kivity	d17ebde46b	Update seastar submodule * seastar 8b6bc659c7...7a3b4b4e4e (3): > Merge "Add custom stack size to seastar threads" from Piotr Ref #5742. > expiring_fifo: Optimize memory usage for single-element lists Ref #4235. > Close connection, when reach to max retransmits	2020-02-25 18:02:25 +02:00
Pavel Emelyanov	7363d56946	sstables: Move get_highest_supported_format The global get_highest_supported_format helper and its declaration are scattered all over the code, so clean this up and prepare the ground for moving _sstables_format from the storage_service onto the sstables_manager (not this set). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:45 +03:00
Pavel Emelyanov	792cec39df	sstables: Remove global get_config() helper Finally, the thing is not used by anyone and can be removed. This greatly relaxes the sstables -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Applauded-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-25 14:31:45 +03:00
Pavel Emelyanov	1af065296e	sstables: Use manager's config() in .new_sstable_component_file() This is the last place left that calls the global get_config(); switch it to _sst_manager.config(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:43 +03:00
Pavel Emelyanov	5dea657991	sstable_writer_config: Extend with more db::config stuff The enable_sstable_key_validation and summary_bytes_cost are used in sstables writing code, keeping them on sstable_writer_config removes more calls to global get_config(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:34 +03:00
Pavel Emelyanov	85d9326d70	sstables_manager: Don't use global helper to generate writer config The main goal of this patch is to stop using the global get_config() when creating the sstable_writer_config instance. Besides being global, the existing get_config() is also confusing, as it effectively generates 3 (three) sorts of configs: one for scylla, when db config and features are ready; another one for tests, when no storage service is at hand; and a third one, also for tests, when the storage service is created by the test env (likely intentionally, but maybe by coincidence, the resulting config is the same as in the no-storage-service case). With this patch it's now 100% clear which one is used when. This also does half the work of removing the get_config() helper. The db::config and feature_service used to initialize the managers are referenced by the database, which creates and keeps the managers, so the references are safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	3a603729d4	sstable_writer_config: Sanitize out some features fields initialization Similar to previous patch -- initialize config fields from features in configurator, not in default initializers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	34302a3e1c	sstable_writer_config: Factor out some field initialization The promoted_index_block_size is taken from db config in two places. Factor this out and, at the same time, stop keeping it as std::optional. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	5adce3390c	sstables: Generate writer config via manager only The sstable_writer_config creation looks simple (just declare the struct instance), but behind the scenes it references the storage and feature services, messes with the database config, etc. This patch teaches the sstables_manager to generate the writer config and makes the rest of the code use it. For future safety, creating the sstable_writer_config by hand is prohibited. The manager is referenced through table-s and sstable-s; since both existing sstables_managers live on the database object, and table-s and sstable-s both live shorter than the database, this reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	f289da1e3b	sstables: Keep reference on manager This is needed for further patching. The sstables_manager outlives all sstables objects, so it's safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:03 +03:00
Pavel Emelyanov	e73e923e95	test: Re-use existing global sstables_manager The sstables_manager in the scylla binary outlives the sstables objects created by it, which makes it possible to add an sstable->manager reference and use it. In unit tests there are cases when the sstables::test_env that keeps the manager in its _mgr field is destroyed right after sstable creation (e.g. in the boost/sstable_mutation_test.cc ka_sst() helper). Fix this by changing _mgr to be a reference to the manager and initializing it with the already existing global manager. The few exceptions to this rule that need to set their own large data handler will create their own sstables_manager. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 13:54:41 +03:00
Pavel Emelyanov	961f1642c7	table: Pass sstable_writer_config into write_memtable_to_sstable The latter creates the config by hand, but the plan is to create it via sstables_manager. Callers of this helper are the final frontiers where the manager will be safely accessible. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 13:54:40 +03:00
Asias He	aaa1f3ce7b	docs: Add doc for repair_based_node_ops This patch adds a doc for the repair based node operations.	2020-02-25 08:54:35 +08:00
Asias He	ac90c1c184	storage_service: Enable node repair based ops for bootstrap - Bootstrap operation It is used to add a new node into the cluster. The token ring changes. It does not suffer from the "not the latest replica" issue. The new node pulls data from existing nodes that are losing the token range. It suffers from failed streaming: we split the ranges into 10 groups and stream one group at a time, and a failed group is restreamed, causing unnecessary data transmission on the wire. Bootstrap is not resumable: if it fails after 99.99% of the data is streamed and we restart the node, we need to stream all the data again, even though the node already has 99.99% of it. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:54:33 +08:00
Asias He	62f056c022	storage_service: Enable node repair based ops for decommission - Decommission operation It is used to remove a live node from the cluster. The token ring changes. It does not suffer from the "not the latest replica" issue. The leaving node pushes data to existing nodes. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:53:37 +08:00
Asias He	a38916121c	storage_service: Enable node repair based ops for replace - Replace operation It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas which might not be the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:53:36 +08:00
Glauber Costa	628dd16519	compaction: deprecate DTCS. Step 1. This patch adds a warning of deprecation to DTCS. In a follow up step, we will start requiring a flag for it to be enabled to make sure users notice. For now we'll just be nice and add a warning for the log watchers. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200224164405.9656-1-glauber@scylladb.com>	2020-02-24 20:26:24 +02:00
Takuya ASADA	5a7beef6a0	dist/common/scripts/scylla_coredump_setup: don't create /etc/sysctl.d/99-scylla-coredump.conf on CentOS8 We don't need to create 99-scylla-coredump.conf on CentOS8, the file is only needed for CentOS7. Fixes #5818	2020-02-24 17:38:47 +02:00
Takuya ASADA	fa423e25d4	scylla_setup: show usage when --nic is not specified & eth0 is not available Since we set 'eth0' as the default NIC name, we get the following error when running scylla_setup in non-interactive mode without the --nic parameter: $ sudo scylla_setup --setup-nic-and-disks --no-raid-setup --no-verify-package --no-io-setup NIC eth0 doesn't exist. This looks strange since the user did not actually specify 'eth0'; they might have forgotten to specify --nic. I think we should show the usage when eth0 is not available on the system. Fixes #5828	2020-02-24 17:35:40 +02:00
Piotr Dulikowski	41d82e39ea	storage proxy: rename mutate_hint_from_scratch Renames the storage_proxy::mutate_hint_from_scratch function to send_hint_to_all_replicas, a name whose meaning is clearer. Tests: unit(dev)	2020-02-24 17:30:22 +02:00
Takuya ASADA	29285b28e2	dist/debian: fix "unable to open node-exporter.service.dpkg-new" error It seems the .service file conflicts at install time because it is installed twice, via both debian/.service and debian/scylla-server.install. We don't need to use *.install, so we can just drop the line. Fixes #5640	2020-02-24 17:28:14 +02:00
Juliusz Stasiewicz	127e258ade	cql3: Fix missing aggregate functions for counters Aggregate functions on counters do not exist. Until now counters could, at best, fall back to blob->blob overloads, e.g.: ``` cqlsh> select max(cnt) from ks.tbl; system.max(cnt) ---------------------- 0x000000000000000a (1 rows) cqlsh> select sum(entities) from ks.tbl; InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid call to function sum, none of its type signatures match [...] ``` Meanwhile, counters are compatible with bigints (aka. `long_type'), so bigint overloads can be used on them (e.g. sum(bigint)->bigint). This is achieved here by a special rule in overload resolution, which makes `selector' perceive counters as an `EXACT_MATCH' to counter's underlying type (`long_type', aka. bigint).	2020-02-24 17:14:44 +02:00
Juliusz Stasiewicz	0ea17216fe	atomic_cell: special rule for printing counter cells Until now, attempts to print a counter update cell would end up calling abort() because `atomic_cell_view::value()` has no specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`, i.e. the counter update IMR type. Such a visitor is not easy to write if we want to intercept counters only (and not all int64_t values). Anyway, the linearized byte representation of a counter cell would not be helpful without knowing whether it consists of counter shards or a counter update (delta), and this must be known upon `deserialize`. This commit introduces a simple approach: it determines the cell type at a high level (from `atomic_cell_view`) and prints counter contents via `counter_cell_view` or `atomic_cell_view::counter_update_value()`. Fixes #5616	2020-02-24 17:11:34 +02:00
Benny Halevy	25a763a187	dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with the binary's build-id when stripping its debug info, as it is passed the `--build-id-seed <version>.<release>` option. To prevent that, we need to set the following macros: unset `_unique_build_ids`; set `_no_recompute_build_ids` to 1. Fixes #5881 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-24 11:50:20 +02:00
Nadav Har'El	4b7577e429	alternator-test: correct typo "existant" The official documentation language of Scylla is English, not French. So correct the word "existant", which appeared several times throughout Alternator's tests, to "existent". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-6-nyh@scylladb.com>	2020-02-24 10:40:53 +01:00
Nadav Har'El	e075eff915	alternator: complete implementation of ReturnValues parameter This patch completes the support for the ReturnValues parameter for the UpdateItem operation. This parameter has five settings - NONE, ALL_OLD, ALL_NEW, UPDATED_OLD and UPDATED_NEW. Before this patch we already supported NONE and ALL_OLD - and this patch completes the support for the three remaining modes: ALL_NEW, UPDATED_OLD and UPDATED_NEW. The patch also continues to improve test_returnvalues.py with additional corner cases discovered during the development. After this patch, only one xfailing test remains - testing updates to nested document paths, which we do not yet support (even without the ReturnValues parameter). After this patch, the support of ReturnValues is complete - for all operations (UpdateItem, PutItem and DeleteItem) and all of its possible settings. Fixes #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-5-nyh@scylladb.com>	2020-02-24 10:40:53 +01:00
Nadav Har'El	1e500a2a34	alternator: rjson: another variant of set_with_string_name() utility The rjson::set_with_string_name() utility function copies the given string into the JSON key. The existing implementation required that this input string be an std::string&, but a std::string_view would be fine too, and I want to use it in new code to avoid yet another unnecessary copy. Adding the overloads also exposes a few places where things were implicitly converted to std::string and now cause an ambiguity - and clearing up this ambiguity also allowed me to find places where this conversion was unnecessary. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-4-nyh@scylladb.com>	2020-02-24 10:38:54 +01:00
Nadav Har'El	fa5c2a4f58	alternator: UpdateItem only deleting attribute shouldn't create item UpdateItem operations usually need to add a row marker: * An empty UpdateItem is supposed to create a new empty item (row). Such an empty item needs to have a row marker. * An UpdateItem to add an attribute x and then later an UpdateItem to remove this attribute x should leave an empty item behind. This means the first UpdateItem needed to add a row marker, so it will be left behind after the second UpdateItem. So the existing code always added a row marker in UpdateItem. However, there is one case where we should NOT create the row marker: When the UpdateItem operation only has attribute deletions, and nothing else, and it is applied to a key with no pre-existing item, DynamoDB does not create this item. So neither should we. This patch includes a new test for this, test_update_item_non_existent, which passes on DynamoDB, failed on Alternator before this patch, and passes after the patch. Fixes #5862. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-3-nyh@scylladb.com>	2020-02-24 10:38:10 +01:00
Nadav Har'El	3cde949980	alternator-test: test for BatchWriteItem same key in two tables In issue #5698 I raised a theory that we might have a bug when BatchWriteItem is given two writes to the same key but in two different tables. The test added here verifies that this theory was wrong, and this case already works correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-2-nyh@scylladb.com>	2020-02-24 10:37:23 +01:00
Piotr Sarna	5e07c00eeb	Merge 'Delete table snapshot' from Amnon This series adds an option to the API that supports deleting a specific table from a snapshot. The implementation works in a similar way to the option to specify specific keyspaces when deleting a snapshot. The motivation is to allow reducing disk-space when using the snapshot for backup. A dtest PR is sent to the dtest repository. Fixes #5658 Original PR #5805 Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf) * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot	2020-02-24 09:38:57 +01:00
Pekka Enberg	263261fa15	README: Remove out-of-date package build instructions The package build instructions in README.md are out-of-date so let's remove them. Message-Id: <20200224064632.3285-1-penberg@scylladb.com>	2020-02-24 10:25:07 +02:00
Pekka Enberg	684e4602dc	redis: Fix DB index error message The error message (silently) changed to "DB index is out of range" in the following commit: `c7a4e694ad` The new error message is part of Redis 4.0, released in 2017, so let's switch Scylla to use the new one. Message-Id: <20200211133946.746-1-penberg@scylladb.com>	2020-02-24 10:22:27 +02:00
Pavel Emelyanov	60bdf0685c	cql3: Clean remaining storage_service mentions from cql3/ These are several #include-s and a no longer valid comment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	d639d4ed5f	cql3: Parse cf name in drop_index_statement::validate The patch `759752947b` explains why the .column_family method of this statement implementation must be tuned to calculate the column_family in some cases. However, doing this requires the global storage_proxy. The proposal is to calculate the column_family in the .validate method, as is done e.g. for function_statement-s, which have a storage_proxy reference at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	a0a0d40267	cql3: Use proxy arg in batch_statement::verify_batch_size Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	bf7004326e	cql3: Use proxy arg in drop_index_statement::lookup_indexed_table Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	9bb67b5771	cql3: Don't get global storage_proxy Get rid of numerous calls to get_local_storage_proxy().get_db() and use the storage proxy argument that's already available in most of them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	6892dbdde7	cql3: Add storage_proxy argument to .check_access method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:19 +03:00
Asias He	f4b4192c91	storage_service: Enable node repair based ops for removenode - Removenode operation It is used to remove a dead node from the cluster. Existing nodes pull data from other existing nodes for the new ranges they own. They pull from one of the replicas, which might not be the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-24 11:11:41 +08:00
Asias He	cf0601735e	storage_service: Enable node repair based ops for rebuild - Rebuild operation It is used to get all the data this node owns from other nodes. It pulls data from only one of the replicas, which might not be the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-24 11:11:41 +08:00
Asias He	3b64b4bb17	storage_service: Use the same tokens as previous bootstrap With repair based node operations, we can resume a previously failed bootstrap. In order to do that, the bootstrapping node needs to use the same tokens as the previous bootstrap. Currently, we always use new tokens when we bootstrap: because we need to stream all the ranges anyway, it does not matter whether we use the same tokens or not.	2020-02-24 11:11:41 +08:00
Asias He	a4c614914a	storage_service: Add is_repair_based_node_ops_enabled helper It is used to check if repair based node operations are enabled or not.	2020-02-24 11:11:40 +08:00
Asias He	cb4045e11d	config: Add enable_repair_based_node_ops An option to enable the repair based node operations.	2020-02-24 11:11:40 +08:00
Asias He	1672f64add	repair: Add replace_with_repair It is used to replace a dead node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	960ce7ab54	repair: Add rebuild_with_repair It is used to rebuild a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	b488ab7d11	repair: Add do_rebuild_replace_with_repair The rebuild and replace operations are similar because the token ring does not change for both of them. Add a common helper to do rebuild and replace with repair. It will be used by rebuild and replace operation shortly.	2020-02-24 11:11:40 +08:00
Asias He	b18e078ca2	repair: Add removenode_with_repair It is used to remove a dead node from a cluster using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	e9a9fde1f7	repair: Add decommission_with_repair It is used to decommission a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	569c126a84	repair: Add do_decommission_removenode_with_repair It will be used by decommission and removenode operation shortly.	2020-02-24 11:11:40 +08:00
Asias He	9c67389cc8	repair: Add bootstrap_with_repair It is used to bootstrap a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	198cad6179	repair: Introduce sync_data_using_repair It is used to sync data for node operations like bootstrap, decommission and so on. Unlike plain repair operation, the user of sync_data_with_repair() can pass repair_neighbors object to specify the pre-calculated neighbors for a range. If a mandatory neighbor is not available, the repair will fail so that the upper layer can fail the node operation.	2020-02-24 11:11:40 +08:00
Asias He	1038e375af	repair: Propagate exception in tracker::run In sync_data_with_repair, we depend on the future returned by tracker::run to tell whether the repair was successful or not.	2020-02-24 11:11:40 +08:00
Piotr Sarna	14dfa3c0c3	alternator: change keyspace prefix to alternator_ The original idea of prefixing alternator keyspace names with 'a#' leveraged the fact that '#' is not a legal CQL character for keyspace names. The idea is flawed though, since '#' proved to confuse existing Scylla tools (e.g. nodetool). Thus, the prefix is changed to the more orthodox 'alternator_'. It is possible to create such keyspaces with CQL as well, but then the alternator CreateTable request would simply fail because the keyspace already exists, which is graceful enough. Hiding alternator keyspaces and tables from CQL is another issue, but there are other ways to distinguish them than a non-standard prefix, e.g. tags. Fixes #5883	2020-02-23 23:32:29 +02:00
Pavel Emelyanov	049b549fdc	api: Register /v2/config stuff after database is started The set_config registers lambdas that need db.local(), so these routes must be registered after database is started. Fixes: #5849 Tests: unit(dev), manual wget on API Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200219130654.24259-1-xemul@scylladb.com>	2020-02-23 17:09:03 +02:00
Takuya ASADA	3d1154272f	dist/debian: remove unused dependencies Since we moved to the relocatable package, almost all dependencies are no longer needed.	2020-02-23 15:36:13 +02:00
Takuya ASADA	98c182ec67	dist/redhat: align dependencies with debian On Debian, we don't add xfsprogs/mdadm as package dependencies; instead, the scylla_raid_setup script installs them. Since xfsprogs/mdadm are only needed for constructing RAID, we can move these dependencies to scylla_raid_setup here too.	2020-02-23 15:34:35 +02:00
Piotr Sarna	4ad577b40c	alternator: add content length limit to alternator servers This patch adds a 16MB content length limit to alternator HTTP(S) servers. It also comes with a test, which verifies that larger requests are refused. Fixes #5832 Tests: alternator-test(local,remote) Message-Id: <29d5708f4bf9f41883d33d21b9cca72b05170e6c.1582285070.git.sarna@scylladb.com>	2020-02-23 14:34:20 +02:00
Piotr Sarna	085cd857ab	alternator-test: limit the number of retries to 3 In order to decrease the developer's time spent on waiting for boto3 to retry the request many times, the retry count is configured to be 3. Two major benefits: - vastly decrease wait time when debugging a failing test - for requests which are expected to fail, but return results not compatible with boto3, execution time is decreased Tests: alternator-test(local,remote) Message-Id: <46a3a9344d9427df7ea55c855f32b8f0e39c9b79.1582285070.git.sarna@scylladb.com>	2020-02-23 14:19:38 +02:00
Pavel Emelyanov	f4e789a9c2	range_streamer: Fix off-by-size in stream progress log The nr_ranges_streamed denotes the number of ranges streamed so far, but by the time the sending lambda is called this counter is already incremented by the number of ranges to be streamed in this call. And the variable is not used for anything else but logging. Fix this by swapping logging with incrementing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101601.18779-1-xemul@scylladb.com>	2020-02-23 11:20:17 +02:00
Tomasz Grabiec	3e83d30daf	gdb: scylla sstables: Fix for older versions of GDB Some GDB versions complain about subscript being a gdb.Value Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582308177-24893-1-git-send-email-tgrabiec@scylladb.com>	2020-02-23 11:17:20 +02:00
Tomasz Grabiec	e7dece7f1e	gdb: scylla sstables: Allow locating sstables attached to tables This patch adds an alternative way to locate sstables by looking at sstable sets in table objects: scylla sstables -t This may be useful for several things. One is to identify sstables which are not attached to tables. Another use case is to be able to use the command on older versions of scylla which don't have sstable tracking. Message-Id: <1582308099-24563-1-git-send-email-tgrabiec@scylladb.com>	2020-02-23 11:16:20 +02:00
Piotr Sarna	e1ecd0d637	doc: refer to dev build mode instead of release The paragraph about adding a `Tests:` footer implies that it's preferred to run tests in release mode, while dev is equally good and compiles faster. Message-Id: <9e1ad1a4e1529d30abb3adb1923b007c52ccf955.1582282066.git.sarna@scylladb.com>	2020-02-23 11:11:44 +02:00
Rafael Ávila de Espíndola	fc018a73bb	build: Add the --enable-stack-guards and --disable-stack-guards options If neither is used, we get the default behavior: only release is built without stack guards. With --disable-stack-guards all modes are built without stack guards. With --enable-stack-guards all modes are built with stack guards. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200222012732.992380-1-espindola@scylladb.com>	2020-02-23 11:05:13 +02:00
Avi Kivity	197adf4c0d	Update seastar submodule * seastar cdda3051e3...8b6bc659c7 (2): > core/file-types.hh: Fix missing header > cmake: Add a Seastar_STACK_GUARDS cmake option	2020-02-23 11:03:59 +02:00
Tomasz Grabiec	3a4597f8f3	Merge remote-tracking branch 'xemul/br-repair-remove-storage-service' into next	2020-02-23 10:29:34 +02:00
Pavel Emelyanov	897bbeabea	storage_service: Relax _is_bootstrap_mode The variable in question was used to check that the bootstrap mode finishes correctly, but it was removed because this check was for self-evident code and thus useless (`dbca327b`). Later, the patch was reverted to keep track of the bootstrap mode for the API is_cleanup_allowed call (`a39c8d0e`). This patch is a reworked combination of both: the variable is kept for the API's sake, but in a much simpler manner. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101813.18945-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	a364190700	storage_service: Remove if-0-ed-out Java code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101704.18868-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	38143a76c7	main: Register stop_gossiping earlier The _scheduled_gossip_task timer needs token_metadata and thus should be stopped before token_metadata is freed. However, this is not always the case. The timer is armed in start_gossiping, which is called by storage_service init_server_without_the_messaging_service_part, and is canceled inside stop_gossiping, which in turn is called by drain_on_shutdown, which in turn is registered too late. If something fails between the internals of init_server_... and the deferred registration of drain_on_shutdown (for lots of reasons), the timer is not stopped and may run, thus accessing the freed token_metadata. Band-aid this by scheduling stop_gossiping right after the gossiper instances are created. This can be too early (before storage_service starts gossiping) or too late (after drain_on_shutdown stops it), but this function is re-entrant. Fixes #5844 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221085226.16494-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	72a6d38e6c	storage_service: Merge identical branches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200210185011.25244-1-xemul@scylladb.com>	2020-02-23 10:26:49 +02:00
Piotr Sarna	dae86849a2	Update seastar submodule * seastar 2b510220...cdda3051 (10): > core: discard unused variable / function > pollable_fd: use boost::intrusive_ptr rather than std::unique_ptr for lifecycle management > build: check for pthread_setname_np() > build: link against Threads::Threads > future: Avoid recursion in do_for_each > future: Expand description of parallel_for_each > merge: Add content length limit to httpd > tests/scheduling_group_test: verify current scheduling group is inherited as expected > net: return future<> instead of subscription<> > cmake: be more verbose when looking for libraries	2020-02-23 10:26:49 +02:00
guy9	a7586c6f7d	added training section to readme file	2020-02-21 11:36:18 +01:00
Nadav Har'El	e8cbbba653	alternator: partial implementation of ReturnValues parameter Before this patch, we only supported the ReturnValues=NONE setting of the PutItem, UpdateItem and DeleteItem operations. This patch also adds full support for the ReturnValues=ALL_OLD option in all three operations. This option directs Alternator to return the full old (i.e., pre-modification) contents of the item. We implement this as a RMW (read-modify-write) operation just as we do other RMW operations - i.e., by default we use LWT, to ensure that we really return the value of the item directly before the modification, the same value that would have been used in a conditional expression if there was one. NOTE: This implementation means one cannot use ReturnValues=ALL_OLD in forbid_rmw write isolation mode. One may theorize that if we only need the read-before-write for ReturnValues and not for a conditional expression, it should have been enough to use a separate read (as we do in unsafe_rmw isolation mode) before the write. But we don't have this "optimization" yet and I'm not sure it's a valid optimization at all - see discussion in a new issue #5851. This patch completes the ReturnValues support for the PutItem and DeleteItem operations. However, the third operation, UpdateItem, supports three more ReturnValues modes: UPDATED_OLD, ALL_NEW and UPDATED_NEW. We do not yet support those in this patch. If a user tries to use one of these three modes, an informative error message will be returned. The three tests for these three unimplemented settings continue to xfail, but the rest of the tests in test_returnvalues.py (except one test of nested attribute paths) now pass so their xfail flag is dropped. Refs #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219135658.7158-1-nyh@scylladb.com>	2020-02-21 08:32:47 +01:00
Tomasz Grabiec	d0b6be0820	Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data is still stored in the node's row cache, because cleanup didn't invalidate the cache." Fixes #4446. tests: - unit tests (dev mode) - dtests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test cleanup_test.py	2020-02-20 18:20:56 +01:00
Pavel Solodovnikov	8efb02146f	cql3: const cleanups and API de-pointerization * Pass raw::select_statement::parameters as lw_shared_ptr * Some more const cleanups here and there * lists,maps,sets::equals now accept const-ref to _type_impl instead of shared_ptr Remove unused `get_column_for_condition` from modification_statement.hh * More methods now accept const-refs instead of shared_ptr Every call site where a shared_ptr was required as an argument has been inspected to be sure that no dangling references are possible. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>	2020-02-20 18:14:49 +02:00
Gleb Natapov	df2f67626b	commitlog: fix size of a write used to zero a segment Due to a bug, the entire segment is written in one huge 32MB write. The idea was to split it into 128KB writes, so fix it. Fixes #5857 Message-Id: <20200220102939.30769-1-gleb@scylladb.com>	2020-02-20 17:22:21 +02:00
Gleb Natapov	6a78cc9e31	commitlog: use commitlog IO scheduling class for segment zeroing There may be other commitlog writes waiting for zeroing to complete, so not using proper scheduling class causes priority inversion. Fixes #5858. Message-Id: <20200220102939.30769-2-gleb@scylladb.com>	2020-02-20 17:15:13 +02:00
Raphael S. Carvalho	f93912f344	Revert "Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"" With #4446 fixed, this commit can be reverted. This reverts commit `454e7e0109`.	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	fb81f2aa7c	table: Fix stale data being returned due to lack of cache invalidation Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data are still stored in the node's row cache, because cleanup didn't invalidate the cache. To prevent data that belongs to the node from being purged from the row cache, cleanup will only invalidate the cache with a set of token ranges that will not overlap with any of the ranges owned by the node. update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test now passes. Fixes #4446. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	e81076b01c	compaction: Implement ranges for cache invalidation on behalf of cleanup This procedure will calculate ranges for cache invalidation by subtracting all owned ranges from the sstables' partition ranges. That's done so as to reduce the size of invalidated ranges. Refs #4446. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:55:49 -03:00
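The subtraction step above can be illustrated with a simplified model. It assumes tokens are plain `int64_t` values and ranges are closed intervals. Scylla's real code works on `dht::token` ranges, and `token_range` and `subtract_owned` here are hypothetical names. The result is the set of pieces of an sstable's range that the node does not own, which are the only pieces safe to invalidate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model: a closed token interval [start, end].
struct token_range {
    int64_t start;
    int64_t end;
};

// Returns the parts of `r` not covered by any range in `owned`.
std::vector<token_range> subtract_owned(token_range r, std::vector<token_range> owned) {
    std::sort(owned.begin(), owned.end(),
              [](const token_range& a, const token_range& b) { return a.start < b.start; });
    std::vector<token_range> result;
    int64_t cursor = r.start;  // first token of `r` not yet accounted for
    bool covered_to_end = false;
    for (const auto& o : owned) {
        if (o.end < cursor || o.start > r.end) {
            continue;  // no overlap with the remaining part of `r`
        }
        if (o.start > cursor) {
            result.push_back({cursor, o.start - 1});  // gap before this owned range
        }
        if (o.end >= r.end) {
            covered_to_end = true;  // the rest of `r` is owned
            break;
        }
        cursor = o.end + 1;
    }
    if (!covered_to_end && cursor <= r.end) {
        result.push_back({cursor, r.end});
    }
    return result;
}
```

For example, subtracting owned ranges [10, 20] and [50, 60] from [0, 100] leaves [0, 9], [21, 49] and [61, 100].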
Raphael S. Carvalho	56f66cff9f	dht: Extract to_partition_ranges() from streaming to allow reuse Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:53:01 -03:00
Piotr Sarna	cbe6f260ef	alternator: add guarding stack height for JSON parsing In order to avoid stack overflow issues represented by the attached test case, rapidjson's parser now has a limit on nesting depth. Previous iterations of this patch used iterative parsing provided by rapidjson, but that solution has two main flaws: 1. While parsing can be done iteratively, printing the document is based on a recursive algorithm, which makes the iteratively parsed JSON still prone to stack overflow on reads. Documents with depth 35k were already prone to that. 2. Even if reading the document were performed iteratively, its destruction is stack-based as well: it runs as a chain of nested C++ destructor calls. This error is sneaky, because it only shows up at depths around 100k with my local configuration, but it's just as dangerous. Long story short, capping the depth of the object to an arguably large value (39) was introduced to prevent stack overflows. Real-life objects are expected to rarely reach a depth of 10, so 39 sounds like a safe value both for the clients and for the stack. DynamoDB has a nesting limit of 32. Fixes #5842 Tests: alternator-test(local,remote) Message-Id: <b083bacf9df091cc97e4a9569aad415cf6560daa.1582194420.git.sarna@scylladb.com>	2020-02-20 13:05:58 +02:00
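To illustrate what a depth cap of 39 means, here is a hypothetical pre-scan that rejects documents nested deeper than the limit. It is not the rapidjson-based mechanism used by the patch. It only tracks `{`/`[` nesting, ignores brackets inside string literals, and does not otherwise validate JSON.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical check: true if object/array nesting in `json` never exceeds
// `max_depth`. Brackets inside string literals (including escaped quotes)
// are ignored; the rest of the JSON grammar is not validated.
bool within_nesting_limit(std::string_view json, std::size_t max_depth = 39) {
    std::size_t depth = 0;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (in_string) {
            if (c == '\\') {
                ++i;  // skip the escaped character
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            if (++depth > max_depth) {
                return false;  // would nest too deeply
            }
        } else if ((c == '}' || c == ']') && depth > 0) {
            --depth;
        }
    }
    return true;
}
```

A depth check like this runs in constant stack space, so it cannot itself overflow on the inputs it is meant to reject.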
Piotr Dulikowski	82a2bdf39f	cdc: distinguish open and closed ranges for range delete This patch causes inclusive and exclusive range deletes to be distinguished in the cdc log. Previously, operations `range_delete_start` and `range_delete_end` were used for both inclusive and exclusive bounds in range deletes. Now, the old operations are renamed to `range_delete_*_inclusive`, and for exclusive deletes, new operations `range_delete_*_exclusive` are used. Tests: unit(dev)	2020-02-20 11:39:06 +01:00
Asias He	62774ff882	gossiper: Always use the new generation number A user reported an issue where, after a node restart, the restarted node is marked as DOWN by other nodes in the cluster while the node is up and running normally. Consider the following: - n1, n2, n3 in the cluster - n3 shuts itself down - n3 sends the shutdown verb to n1 and n2 - n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to INT_MAX - n3 restarts - n3 sends gossip shadow rounds to n1 and n2, in storage_service::prepare_to_join, - n3 receives a response from n1 in gossiper::handle_ack_msg; since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 1; fiber 1 finishes faster than fiber 2 and sets _in_shadow_round = false - n3 receives a response from n2 in gossiper::handle_ack_msg; since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 2; fiber 2 yields - n3 finishes the shadow round and continues - n3 resets the gossip endpoint_state_map with gossiper.reset_endpoint_state_map() - n3 resumes fiber 2, which applies application state about n3 into endpoint_state_map; at this point endpoint_state_map contains information about n3 itself, learned from n2. - n3 calls gossiper.start_gossiping(generation_number, app_states, ...) with a new generation number generated correctly in storage_service::prepare_to_join, but maybe_initialize_local_state(generation_nbr) will not set the new generation and heartbeat if the endpoint_state_map already contains n3 itself - n3 continues with the old generation and heartbeat learned in fiber 2 - n3 continues the gossip loop; in gossiper::run, hbs.update_heart_beat() sets the heartbeat to a number starting from 0. - n1 and n2 will not get updates from n3 because they use the same generation number but n1 and n2 have a larger heartbeat version - n1 and n2 will mark n3 as down even though n3 is alive. To fix, always use the new generation number. Fixes: #5800 Backports: 3.0 3.1 3.2	2020-02-20 11:20:20 +01:00
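The failure hinges on how peers compare heartbeat states. Below is a simplified model. It assumes a remote state is treated as newer only if its generation is higher or, when generations are equal, its heartbeat version is higher. `heart_beat_state` and `is_newer` here are illustrative and are not the gossiper's exact code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified heartbeat state: generation first, then heartbeat version.
struct heart_beat_state {
    int32_t generation;
    int32_t version;
};

// Assumed comparison rule: a higher generation always wins; with equal
// generations, the higher heartbeat version wins.
bool is_newer(const heart_beat_state& remote, const heart_beat_state& local) {
    if (remote.generation != local.generation) {
        return remote.generation > local.generation;
    }
    return remote.version > local.version;
}
```

Under this rule, n3 reusing its old generation with a version counting up from 0 can never beat the INT_MAX version that n1 and n2 recorded at shutdown. A fresh generation wins regardless of the version.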
Dejan Mircevski	8393ee2e54	cql3: Permit views sync when a table is modified Previously we required MODIFY permissions on all materialized views in order to modify a table. This is wrong, because the views should be synced to the table unconditionally. For the same reason, users shouldn't be granted MODIFY on views, to prevent them manually changing (and breaking) a view. This patch removes an explicit permissions check in modification_statement introduced by `65535b3`. It also tests that a user can indeed modify a table they are allowed to modify, regardless of lacking permissions on the table's views and indices. Fixes #5205. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-20 10:43:41 +01:00
Avi Kivity	4cc7f7e2af	Merge "Log CQL queries under "trace" level" from Kostja " This series ensures the server more often than not initializes raw_cql_statement, a variable responsible for holding the original CQL query, and adds logging events to all places executing CQL, and logs CQL text in them. A prepared statement object is the third incarnation of parser output in Scylla: - first, we create a parsed_statement descendant. This has ~20 call sites inside Cql.g - then, we create a cql_statement descendant, at ~another 20 call sites - finally, in ~5 call sites we create a prepared statement object, wrapping cql_statement. Sometimes we use cql_statement object without a prepared statement object (e.g. BATCHes). Ideally we'd want to capture the CQL text right in the parser, but due to complicated transformations above that would require patching dozens of call sites. This series moves raw_cql_statement from class prepared_statement to its nested object, cql_statement, batches, and initializes this variable in all major call sites. View prepared statements and some internal DDL statements still skip setting it. " * 'query_processor_trace_cql_v2' of https://github.com/kostja/scylla: query_processor: add CQL logging to all major execute call sites. query_procesor: move raw_cql_statement to cql_statement query_processor: set raw_cql_statement consistently	2020-02-20 11:07:52 +02:00
Nadav Har'El	7d545078ca	docs/alternator: remove incorrect comment on BatchWriteItem In the state of Alternator in docs/alternator/alternator.md, we said that BatchWriteItem doesn't check for duplicate entries. That is not true - we do - and we even have tests (test_batch_write_duplicate*) to verify that. So drop that comment. Refs #5698. (there is still a small bug in the duplicate checking, so still leaving that issue open). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219164107.14716-1-nyh@scylladb.com>	2020-02-20 08:11:31 +01:00
Nadav Har'El	b8aed18a24	alternator: unzero "scylla_alternator_total_operations" metric In commit `388b492040`, which was only supposed to move around code, we accidentally lost the line which does _executor.local()._stats.total_operations++; So after this commit this counter was always zero... This patch returns the line incrementing this counter. Arguably, this counter is not very important - a user can also calculate this number by summing up all the counters in the scylla_alternator_operation array (these are counters for individual types of operations). Nevertheless, as long as we do export a "scylla_alternator_total_operations" metric, we need to correctly calculate it and can't leave it zero :-) Fixes #5836 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219162820.14205-1-nyh@scylladb.com>	2020-02-20 08:11:15 +01:00
Raphael S. Carvalho	db4c3230f7	compaction: Add ranges for cache invalidation to compaction_completion_desc It will store the ranges to be invalidated in row cache on compaction completion. Intended to be used by cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:35 -03:00
Raphael S. Carvalho	51532b84f8	compaction: Make it possible for a compaction type to customize compaction_completion_desc compaction_completion_desc will eventually store more information that can be customized by the compaction type. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:35 -03:00
Raphael S. Carvalho	fa16845353	database: Fix on_compaction_completion doc Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:34 -03:00
Raphael S. Carvalho	65b4fc8bcd	sstables/compaction: Introduce compaction_completion_desc This descriptor contains all information needed for the table to be properly updated on compaction completion. A new member will be added to it soon, which will store ranges to be invalidated in row cache on behalf of cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:29:32 -03:00
Piotr Sarna	4e95b67501	Merge 'cql3: do_execute_base_query: fix null deref ... ... when clustering key is unavailable' from Benny This series fixes a null pointer dereference seen in #5794 `efd7efe` cql3: generate_base_key_from_index_pk; support optional index_ck `7af1f9e` cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable `7fe1a9e` cql3: do_execute_base_query: fixup indentation Fixes #5794 Branches: 3.3 Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug) * bhalevy/fix-5794-generate_base_key_from_index_pk: cql3: do_execute_base_query: fixup indentation cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable cql3: generate_base_key_from_index_pk; support optional index_ck	2020-02-19 13:30:30 +01:00
Tomasz Grabiec	884d5e2bcb	Merge "Fix use-after-frees in migration_manager and feature_service" from Pavel Several problems with stopping the migration manager and features have been discussed recently. The first issue is with the migration manager's schema pull sleeping and potentially using freed migration manager instances. The other two are with freeing the database and migration manager before the features they wait for are enabled.	2020-02-19 13:02:35 +01:00
Piotr Sarna	3315220aea	alternator: fix server when no authorization header is found A typo caused the code to check for the wrong header and assume that the Authorization header exists, even if it did not. The fix comes with a regression test. Message-Id: <58070abddae6359212aa399688e3e2704d52f419.1582108625.git.sarna@scylladb.com>	2020-02-19 13:39:50 +02:00
Benny Halevy	7fe1a9ec4a	cql3: do_execute_base_query: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 13:31:18 +02:00
Benny Halevy	7af1f9e26a	cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable 1. Only call base_ck = generate_base_key_from_index_pk<... if the base schema has a clustering key. 2. Only call command->slice.set_range(*_schema, base_pk, ... if the base schema has a clustering key, otherwise just create an open ended range. Proposed-by: Piotr Sarna <sarna@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 13:30:37 +02:00
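The two rules above can be modeled in a few lines. This is a simplified sketch that assumes a clustering key is represented as a plain string and a slice as an optional start bound. `ck_slice` and `make_base_slice` are hypothetical names, not the cql3 code. The point is that a start bound is derived only when the base table has a clustering key and the paging state actually carries one. Otherwise the slice stays open-ended instead of dereferencing an empty optional.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified slice: an empty `start` means the slice is open-ended
// (it covers the partition from its beginning).
struct ck_slice {
    std::optional<std::string> start;
};

ck_slice make_base_slice(bool base_has_clustering_key,
                         const std::optional<std::string>& paging_ck) {
    if (base_has_clustering_key && paging_ck) {
        return ck_slice{*paging_ck};  // safe: paging_ck is engaged here
    }
    return ck_slice{std::nullopt};  // open-ended range
}
```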
Piotr Sarna	5f0d77b9a4	Merge 'mv: drop materialized views before its table' from Eliran When dropping a table, the table and its views are dropped in parallel. This is not a problem in itself, but we have a mechanism to snapshot a deleted table before the actual delete. When a secondary index is removed, the snapshot process looks up its schema to create the schema part of the snapshot, but if the main table is already gone it will not find it. This commit serializes views and main table removals and removes the views prior to the tables. See discussion on #5713 Tests: Unit tests (dev) dtest - A test that failed on "can't find schema" error Fixes #5614 * eliran/serialize_table_views_deletion: Materialized Views: serialize tables and views creation Materialized Views: drop materialized views before tables	2020-02-19 12:20:20 +01:00
Pavel Emelyanov	8435e93549	db: Move unbounded_range_tombstones listening from storage_service Now the database keeps a reference to the feature service, so we can listen on the feature in it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Pavel Emelyanov	7aa7e4f550	migration_manager: Abort and wait cluster upgrade waiters The maybe_schedule_schema_pull waits for schema_tables_v3 to become available. This is unsafe in case migration manager goes away before the feature is enabled. Fix this by subscribing on feature with feature::listener and waiting for condition variable in maybe_schedule_schema_pull. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Nadav Har'El	405115fa5f	alternator: cleanup of get_string_attribute() function The get_string_attribute() function used attribute_value->GetString() to return an std::string. But this function does not actually return a std::string - it returns a char*, which gets implicitly converted to an std::string by looking for the first null character. This lookup is unnecessary, because rjson already knows the length of the string, and we can use it. This patch is just a cleanup and a very small performance improvement - I do not expect it fixes any bugs or changes anything functional, because JSON strings anyway cannot contain verbatim nulls. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219101159.26717-1-nyh@scylladb.com>	2020-02-19 11:59:54 +01:00
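The difference described above is between two `std::string` constructors. Constructing from a bare `const char*` scans for the first null character, while passing a pointer and a length that is already known copies exactly that many bytes with no scan. The helper names below are hypothetical. The raw buffer with an embedded null is used only to make the difference visible, since, as the commit notes, JSON strings cannot contain verbatim nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scans for the first '\0' to find the length.
std::string from_c_str(const char* p) {
    return std::string(p);
}

// Uses a length that is already known; no scan, embedded nulls are kept.
std::string from_ptr_and_len(const char* p, std::size_t len) {
    return std::string(p, len);
}
```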
Benny Halevy	efd7efe41e	cql3: generate_base_key_from_index_pk; support optional index_ck When called from indexed_table_select_statement::do_execute_base_query, old_paging_state->get_clustering_key() may return un-engaged optional<clustering_key>. Dereferencing it unconditionally crashes scylla as seen in https://github.com/scylladb/scylla/issues/5794 Fixes #5794 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 12:13:08 +02:00
Pavel Emelyanov	08363e5034	migration_manager: Abort and wait delayed schema pulls The sleep is interrupted with the abort source, the "wait" part is done with the existing _background_tasks gate. Also we need to make sure the gate stays alive till the end of the function, so make use of the async_sharded_service (migration manager is already such). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 11:55:27 +03:00
Eliran Sinvani	95724e1a66	Materialized Views: serialize tables and views creation This change serializes tables and views creation. Its purpose is to avoid possible future races caused by a view searching for its base table information while the latter hasn't been created yet.	2020-02-19 10:51:49 +02:00
Eliran Sinvani	923a46030b	Materialized Views: drop materialized views before tables When dropping a table, the table and its views are dropped in parallel. This is not a problem in itself, but we have a mechanism to snapshot a deleted table before the actual delete. When a secondary index is removed, the snapshot process looks up its schema to create the schema part of the snapshot, but if the main table is already gone it will not find it. This commit serializes views and main table removals and removes the views prior to the tables. See discussion on https://github.com/scylladb/scylla/pull/5713 Tests: Unit tests (dev) dtest - A test that failed on "can't find schema" error Fixes #5614	2020-02-19 10:48:11 +02:00
Pavel Solodovnikov	a46f235092	cql3: prefer passing schema as const ref instead of shared_ptr De-pointerize cql3 code APIs further: change some call sites to pass `schema` as const-ref instead of `shared_ptr`. The affected functions are known to always expect a non-null pointer to schema and don't store or pass the pointer anywhere else, so it is safe to give them just a reference. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200218142338.69824-1-pa.solodovnikov@scylladb.com>	2020-02-18 20:13:10 +02:00
Piotr Dulikowski	4343471954	hh: handle counter update hints correctly This patch fixes a bug that appears because of an incorrect interaction between counters and hinted handoff. When a counter is updated on the leader, it sends mutations to other replicas that contain all counter shards from the leader. If consistency level is achieved but some replicas are unavailable, a hint with a mutation containing counter shards is stored. When a hint's destination node is no longer its replica, an attempt is made to send it to all of its current replicas. Previously, if the cluster did not have the feature HINTED_HANDOFF_SEPARATE_CONNECTION enabled, the storage_proxy::mutate function would be used for the purpose of sending the hint. This was incorrect because that function treats mutations for counter tables as mutations containing only a delta (by how much to increase/decrease the counter). These two types of mutations have different serialization formats, so in this case a "shards" mutation is reinterpreted as a "delta" mutation, which can cause data corruption to occur. This patch fixes the case when HINTED_HANDOFF_SEPARATE_CONNECTION is disabled, and uses storage_proxy::mutate_internal, which treats "shards" mutations as regular mutations - which is the correct behavior. Refs #5833. Tests: unit(dev)	2020-02-18 20:13:10 +02:00
Avi Kivity	454e7e0109	Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations" This reverts commit `5e9925b9f0`. It causes data resurrection in simple_decommission_node_2_test. Fixes #5838.	2020-02-18 20:13:10 +02:00
Calle Wilund	d7a9fc3611	db::config: Adjust truncation timeout to match value in yaml example Refs #817 Truncation is potentially long. It has its own timeout in storage proxy/rpc. This value should probably also be higher than default timeout. Message-Id: <20200218135926.26522-1-calle@scylladb.com>	2020-02-18 20:13:10 +02:00
Amnon Heiman	30a7587963	test/boost/database_test: adopt new clear_snapshot signature The clear_snapshot method signature was modified to accept a table name parameter. This patch adds an empty table name to the clear_snapshot test so it would compile and pass. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:50:58 +02:00
Amnon Heiman	6b020e67ce	api/storage_service: Support specifying a table when deleting a snapshot This patch adds an optional parameter to DELETE /storage_service/snapshots. After this patch the following will be supported, assuming a keyspace called keyspace1 and a table called standard1 exist: curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1' curl -X DELETE --header 'Accept: application/json' 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1&cf=standard1' Fixes #5658 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Amnon Heiman	c3260bad25	storage_service: Add optional table name to clear snapshot There are cases when it is useful to delete a specific table from a snapshot. An example is when a snapshot is used for backup. Backup can take a long time; during that time, each table can be deleted from the snapshot once it has been backed up, without waiting for the entire backup process to complete. This patch adds such an option to the database and to the storage_service wrapping method that calls it. If a table is specified, a filter function is created that selects only the column family with that given name. This is similar to the filtering at the keyspace level. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
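The table filter can be pictured as a predicate over table names. This is a minimal sketch that assumes an empty table name means "no filtering", keeping the previous whole-keyspace behavior. That assumption is suggested by the test passing an empty table name, but it is not confirmed here. `make_table_filter` is a hypothetical name, not the database's actual function.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical filter: an empty name keeps every table in the keyspace;
// a non-empty name keeps only the table with that exact name.
std::function<bool(const std::string&)> make_table_filter(const std::string& table) {
    if (table.empty()) {
        return [](const std::string&) { return true; };
    }
    return [table](const std::string& name) { return name == table; };
}
```

The lambda captures `table` by value, so the returned predicate stays valid after the caller's string goes away.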
Nadav Har'El	e50e8a8432	alternator-test: improve ReturnValues tests This patch adds additional tests for the ReturnValues feature to make the test even more comprehensive. As this feature is not yet implemented in Alternator (see issue #5053), all tests XFAIL on Alternator - except two tests for the trivial "NONE" mode which is already supported. As usual all tests pass on DynamoDB. This patch also splits the tests for the ReturnValues parameter in the UpdateItem operation into multiple tests, each testing one of the different modes which DynamoDB supports - NONE, ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW. The separate tests will be useful if we implement this feature incrementally - so the separate modes can be tested separately. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200218085618.5584-1-nyh@scylladb.com>	2020-02-18 16:16:20 +02:00
Alejo Sanchez	45a6cc5d53	cql3: single metric for range scan and full scan Combining both range and full table scans in a single metric as "partition range scans are used to implement full scans in scylla deployments." Requested by @bdenes and @avi Refs: #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200211101221.690031-2-alejo.sanchez@scylladb.com>	2020-02-18 16:16:20 +02:00
Nadav Har'El	c8348bccc9	docs: new document about protocols and ports in Scylla This patch adds a new document, docs/protocols.md, about all the different protocols which Scylla supports - and the different ports which they use. This includes Scylla's internal protocol, user-facing protocols (CQL, Thrift, DynamoDB, Redis, JMX) and things in between (REST API, Prometheus). I wrote this document after being frustrated that when I see a port number (e.g., "7000") or a port option name (e.g., "storage_port") it's hard to figure out what they actually are - or why they are given such strange names. The intention is that this file can easily be searched for option names, for more familiar names (e.g., "CQL"), and a reader can get the whole story - including some pointers to relevant parts of the code (this part of the document can be improved further - in this version this only exists for the internal protocol). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200217172049.25510-1-nyh@scylladb.com>	2020-02-18 16:16:20 +02:00
Avi Kivity	fe71ed5f82	Update seastar submodule * seastar c7c249f67d...2b51022073 (8): > dns_test: Test with seastar.io instead of www.google.com > sharded: fix move constructor for peering_sharded_service Fixes #5814. > tests: Delete Seastar.dist > reactor: distinguish structs from classes when befriending > util/tuple_utils.hh: avoid redundant move > io_request: do not include fmt/format.h > reactor: cleanup write_some leftover > posix: change the signature of accept/try_accept	2020-02-18 16:16:19 +02:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Avi Kivity	06c16108df	Merge "cql3: minor cleanups (de-pointerize APIs)" from Pavel " This change set is comprised of several unrelated patches regarding some cleanups in cql3 layer code. Most of the changes are aimed at eliminating superfluous `shared_ptr` usages. In places where it can be safely assumed that objects passed to the function are considered non-null and constant, these places were adjusted to use passing as const ref instead. Other changes include eliminating unused arguments in some functions and replacing usages of `shared_ptr<service::pager::paging_state>` to use `lw_shared_ptr` instead, since `pager::paging_state` is final. Tests: unit(dev, debug) " * 'feature/cql_cleanups_4' of https://github.com/ManManson/scylla: cql3: minor sweeps through the cql layer code to reduce shared_ptrs count cql3: change some function signatures to accept const references cql3: change signatures of several functions to return crefs instead of pointers cql3: remove unused argument at functions::castas_functions::get paging_state: switch from shared_ptr to lw_shared_ptr	2020-02-17 17:50:30 +02:00
Piotr Dulikowski	01084a79b8	hh: send orphaned hints on HINT_MUTATION verb When replaying a hint with a destination node that is no longer in the cluster, it will be sent with cl=ALL to all its new replicas. Before this patch, the MUTATION verb was used, which causes such hints to be handled on the same connection and with the same priority as regular writes. This can cause problems when a large number of hints is orphaned and they are scheduled to be sent at once. Such situation may happen when replacing a dead node - all nodes that accumulated hints for the dead node will now send them with cl=ALL to their new replicas. This patch changes the verb used to send such hints to HINT_MUTATION. This verb is handled on a separate connection and with streaming scheduling group, which gives them similar priority to non-orphaned hints. Refs: #4712 Tests: unit(dev)	2020-02-17 14:45:22 +01:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially since we already have a mutation stream validator that we can use to validate the data. However currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate down the flag. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Piotr Jastrzebski	f0f6e220ea	cdc: stop using partitioners CDC can get all it needs from a config and does not need partitioner. For base table specific operations CDC is using partitioner from that table (obtained with schema::get_partitioner). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	c0873f9b10	partitioner_test: stop calling set_global_partitioner All the places that use partitioner have been switched to not use global partitioner any more and we can stop setting it in this test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	499e330ff9	storage_service: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	81cfc63ba6	mutation_writer_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	406f42e012	schema: reduce number of global_partitioner() calls Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	8a9dc8b394	test_services: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	510245f3c3	sstable_utils: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	65f8fc5a06	sstable_resharding_test: stop depending on global partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	a65f3d1f7b	sstable_mutation_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	aae6240273	sstable_data_file_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	a18c791f6f	random_schema: stop taking partitioner in constructor random_schema already has a _schema field which in turn has a get_partitioner() function. Storing the partitioner in random_schema is redundant. At the moment all uses of random_schema are based on the default partitioner, so it is not necessary to set it explicitly. If in the future we need random_schema to work with other partitioners we will add the constructor back and fix the creation of _schema to contain it. It's not needed now though. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	aeb9ea87df	mutation_reader_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	4df60c7998	multishard_mutation_query_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ef9acd9ee5	row_level repair: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	9494da2102	distribute_reader_and_consume_on_shards: don't take partitioner This function already takes schema so it can get partitioner using schema::get_partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	7c6f415647	thrift: reduce global_partitioner() calls Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	56e3cb8c3a	binary_search: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	1db437ee91	index_entry: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	1f866d7001	mc writer: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	6fe0dcbac4	sstable: stop calling global_partitioner() parse functions now take const schema& which allows them to reach a partitioner. It's safe to take schema by const& because the only caller takes the schema from an sstable object. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	0677bafd16	multishard_mutation_query: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	76d154dbac	view: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	6e424a3645	select_statement: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ca4a89d239	dht: add dht::decorate_key and replace all dht::global_partitioner().decorate_key with dht::decorate_key It is an improvement because dht::decorate_key takes schema and uses it to obtain partitioner instead of using global partitioner as it was before. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:06 +01:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Piotr Jastrzebski	5234350df2	split_range_to_single_shard: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	24b721c21b	ring_position_exponential_sharder: stop calling global_partitioner() ring_position_exponential_sharder calls global_partitioner in one constructor. Luckily the constructor is never used so we can remove that constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	db19a76b1f	selective_token_range_sharder: stop calling global_partitioner() This requires a change in repair, which uses selective_token_range_sharder. Repair performs operations on a set of tables, so we have to make sure that all of those tables use the same partitioner. This is achieved by adding a check to the repair_info constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	75785ef13e	i_partitioner: add operator<< Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	065885300d	i_partitioner: add == and != operators Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	57e4b7f215	ring_position_range_sharder: stop calling global_partitioner Remove ring_position_range_sharder(nonwrapping_range<ring_position>) which calls another constructor with partitioner obtained with dht::global_partitioner(). Fix all the places the removed constructor was used and obtain partitioner from schema instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	dd1120454b	dht: move sharders to a separate header i_partitioner.hh is widely included while sharders are used only in 6 places so there's no need to include them in the whole codebase. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:02 +01:00
Piotr Jastrzebski	a5b6374398	dht: remove unused ring_position_exponential_vector_sharder The next patch is moving sharders to a separate header. ring_position_exponential_vector_sharder is not used anywhere so instead of just silently removing it with the move, this commit is separated to make it clear the class is removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:04:41 +01:00
Piotr Jastrzebski	9b95153136	schema: add get_partitioner() The plan is to remove dht::global_partitioner() and use schema::get_partitioner() instead. This will allow a usage of per schema/table partitioner instead of a single global partitioner everywhere. Initially schema::get_partitioner will call dht::global_partitioner. After all the calls to dht::global_partitioner are switched to schema::get_partitioner, the ability to set per schema partitioner will be implemented. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:04:41 +01:00
Takuya ASADA	9a84164c95	dist: drop old distribution code Since we dropped support for Ubuntu 14.04 and Debian 8, we can remove the code for these distributions.	2020-02-17 10:18:35 +02:00
Avi Kivity	6728b96df7	clustering_interval_set: split to own header file clustering_interval_set is a rarely used class, but one that requires boost/icl, which is quite heavyweight. To speed up compilation, move it to its own header and sprinkle #includes where needed. Tests: unit (dev) Message-Id: <20200214190507.1137532-1-avi@scylladb.com>	2020-02-16 17:40:47 +02:00
Nadav Har'El	51f3e7eaff	merge: token_metadata: pimplify Merged patch series from Avi Kivity: token_metadata is a heavyweight class with heavyweight includes (boost/icl) it is a good candidate for the pimpl pattern, which this series implements. Tests: unit (dev) https://github.com/avikivity/scylla token_metadata-pimplification/v1 Avi Kivity (6): locator: token_metadata: use non-deduced return type for ring_range() locator: token_metadata: pimplify locator: token_metadata: make token_metadata_impl::tokens_iterator a non-nested class locator: token_metadata: pimplify tokens_iterator locator: token_metadata: move implementation classes to .cc locator: token_metadata: remove unused include "query-request.hh" locator/token_metadata.hh \| 783 +--------------- locator/token_metadata.cc \| 1338 ++++++++++++++++++++++++++- test/boost/sstable_datafile_test.cc \| 1 + 3 files changed, 1332 insertions(+), 790 deletions(-) Message-Id: <20200214184954.1130194-1-avi@scylladb.com>	2020-02-16 17:15:26 +02:00
Piotr Sarna	70c9889ef7	storage_proxy: remove dead metrics code This patch removes an implementation of register_split_metrics_for, which is not used anywhere in the codebase. Message-Id: <e83f3e9d109113fe0553919032f005d4ab3a3023.1581851904.git.sarna@scylladb.com>	2020-02-16 17:00:45 +02:00
Nadav Har'El	e18a302c54	merge: Implement stopping alternator server Merged patch series from Piotr Sarna: This miniseries implements graceful shutdown for alternator by introducing two mechanisms: - refusing to accept new requests during shutdown by stopping the HTTP/HTTPS server(s) - guarding pending requests with a gate, so that when alternator server is stopped, no in-flight alternator requests are being processed Fixes #5781 Tests: manual(stopping Scylla in the middle of alternator-test multiple times, used to crash every time with local_is_initialized() assertion) Piotr Sarna (3): alternator: implement stopping alternator server alternator: guard pending alternator requests with a gate alternator: guard alternator-specific handlers with a gate alternator/server.cc \| 64 +++++++++++++++++++++++++++++++++++--------- alternator/server.hh \| 4 +++ main.cc \| 11 ++++++-- 3 files changed, 64 insertions(+), 15 deletions(-)	2020-02-16 16:35:14 +02:00
Pavel Solodovnikov	abb3a7e218	cql3: minor sweeps through the cql layer code to reduce shared_ptrs count Convert some more helper functions to accept const reference to column_specification and column_identifier instead of shared_ptr. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:24:26 +03:00
Pavel Solodovnikov	5b6e2d7178	cql3: change some function signatures to accept const references This patch continues the effort of reducing shared_ptr's count in the different APIs throughout the cql3 code tree. These functions now pass cref to column_specification instead of shared_ptr: * multiple variants of `validate_assignable_to` * sets::value_spec_of * lists::value_spec_of * lists::index_spec_of * lists::uuid_index_spec_of * tuples::component_spec_of * user_types::field_spec_of These functions don't pass the shared_ptr down the call hierarchy, and they assume that the column_specification passed is always non-null. So it's safe to assume that they don't take ownership of the pointer or knowingly prolong the lifetime of the object it points to. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:24:14 +03:00
Pavel Solodovnikov	49bf936403	cql3: change signatures of several functions to return crefs instead of pointers The following functions now accept const reference to column_specification instead of shared_ptr: * lists::index_spec_of * lists::value_spec_of * lists::uuid_index_spec_of * sets::value_spec_of Changed maps::value_spec_of and maps::key_spec_of signatures to accept const ref instead of non-const ref to column_specification. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:56 +03:00
Pavel Solodovnikov	7c05100c87	cql3: remove unused argument at functions::castas_functions::get Remove unused `schema_ptr` argument at `functions::castas_functions::get` function. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:46 +03:00
Pavel Solodovnikov	d64fd52ae5	paging_state: switch from shared_ptr to lw_shared_ptr Change the way `service::pager::paging_state` is passed around from `shared_ptr` to `lw_shared_ptr`. It's safe since `paging_state` is final. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:36 +03:00
Piotr Sarna	626ec730c4	storage_proxy: make register_metrics_for function reentrant Helper function for registering metrics for an endpoint, register_metrics_for(ep) depends on an external state to be updated. It checks if given metrics are added to a map, and if not, the metrics are registered, but the mentioned map is expected to be updated by the caller (e.g. get_ep_stat). This behaviour is error-prone, because calling this function twice will result in an exception, since registering metrics twice is not allowed. Refs #5697 Message-Id: <5a9ddccf52861749dbda4204b5d098cc77bc51eb.1581855769.git.sarna@scylladb.com>	2020-02-16 15:43:07 +02:00
Piotr Sarna	bd888a2695	alternator: guard alternator-specific handlers with a gate Alternator serves more requests than just its database operations, e.g. a health check and returning the list of its nodes. For safety, these operations are now also guarded by the pending requests gate.	2020-02-16 14:15:29 +01:00
Piotr Sarna	acfed880cc	alternator: guard pending alternator requests with a gate In order to make sure that pending alternator requests are processed during shutdown, a gate for each shard is introduced. On shutdown, each gate will be closed and all in-progress operations will be waited upon. Fixes #5781	2020-02-16 13:48:45 +01:00
Piotr Sarna	c8ab9b3ae4	alternator: implement stopping alternator server Stopping Scylla with alternator enabled is not clean, because the server does not stop accepting requests on shutdown, which leads to use-after-free events. The first step towards a cleaner solution is to implement alternator_server::stop(), which stops the HTTP/HTTPS servers. Refs #5781	2020-02-16 13:34:21 +01:00
Nadav Har'El	70d914ad5b	alternator: update docker instructions in docs/alternator/getting-started.md The instructions in docs/alternator/getting-started.md on how to run Alternator with docker are outdated and confusing, so this patch updates them. First, the instructions recommended the scylladb/scylla-nightly:alternator tag, but we only ever created this tag once, and never updated it. Since then, Alternator has been constantly improving, and we've caught up on a lot of features, and people who want to test or evaluate Alternator will most likely want to run the latest nightly build, with all the latest Alternator features. So we update the instructions to request the latest nightly build - and mention the need to explicitly do "docker pull" (without this step, you can find yourself running an antique nightly build, which you downloaded months ago!). This instruction can be revisited once Alternator is GAed and not improving quickly and we can then recommend to run the latest stable Scylla - but I think we're not there yet. Second, in recent builds, Alternator requires that the LWT feature is enabled, and since LWT is still experimental, this means that one needs to add "--experimental 1" to the "docker run" command. Without it, the command line in getting-started.md will refuse to boot, complaining that Alternator was enabled but LWT wasn't. So this patch adds the "--experimental 1" in the relevant places in the text. Again, this instruction can and should be revisited once LWT goes out of experimental mode. Fixes #5813 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200216113601.9535-1-nyh@scylladb.com>	2020-02-16 12:42:37 +01:00
Nadav Har'El	b01b11c1f3	alternator: implement KeyConditionExpression This patch adds to Alternator's Query operation full support for the KeyConditionExpression parameter - a newer syntax for specifying which partition and which sort-key range are to be queried. The older syntax for the same thing, "KeyConditions", was already supported by Alternator. The patch also includes additional test cases for more corner cases discovered during the development. After this patch, all 47 test cases in test_key_condition_expression.py pass on Alternator (and, of course, also on DynamoDB). One interesting thing to note about this patch is that it does not include a new parser for the KeyConditionExpression syntax. It turns out that we need - to be fully compatible with DynamoDB - to use the already existing parser for ConditionExpression syntax, and then forbid certain things not allowed in KeyConditionExpression (you can see a lot of examples in code comments and in the tests included in this patch). Most importantly, allowing the full ConditionExpression syntax also means we allow completely useless parentheses on key conditions, e.g., '((p=:p) AND (c=:c))'. While the KeyConditionExpression documentation doesn't mention allowing these parentheses, DynamoDB does support them - and it turns out that boto3 uses them when you use its condition builders, as we do in one test case (test_query_key_condition_expression). Fixes #5037. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-4-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	15515b2cc1	alternator: more useful get_key_from_typed_value() utility function We had a get_key_from_typed_value() utility function to decode a JSON-encoded value with a known type (the JSON encoding is a map whose key is the type, the value always a string because all possible key types - string, bytes and number, are encoded as strings). However, the function was less useful than it could have been - it was missing one check for a malformed object (a check which only appeared in one of its callers), it unnecessarily received the column's expected type (all the callers passed it the given key column's type). The cleaned up function will be more useful for the following patch to support KeyConditionExpression, which wants to reuse it. While at it, this patch also uses rjson::to_string_view(it->value) instead of the less correct it->value.GetString() (the latter relies on null-termination, which is actually true for JSON strings, but there is no reason to rely on it). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-3-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	1fd44a0049	alternator: extract useful function to_string_view() conditions.cc contains a useful utility function for extracting (without copying) a string_view from a rjson::value which is known to contain a string. This function will be useful in more Alternator code, so let's extract it to rjson.hh, with the name rjson::to_string_view() Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-2-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Asias He	5e9925b9f0	streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations The table::flush_streaming_mutations function dates from the days when streaming data went to memtables. After switching to the new streaming, data goes directly to sstables during streaming, so the set of sstables generated in table::flush_streaming_mutations will be empty. It is unnecessary to invalidate the cache if no sstables are added. To avoid unnecessary cache invalidation, which pokes holes in the cache, skip calling _cache.invalidate() if the set of sstables is empty. The steps are: - STREAM_MUTATION_DONE verb is sent when streaming is done with old or new streaming - table::flush_streaming_mutations is called in the verb handler - cache is invalidated for the streaming ranges In summary, this patch avoids a lot of cache invalidation for streaming. Backports: 3.0 3.1 3.2 Fixes: #5769	2020-02-16 11:22:30 +02:00
Avi Kivity	82df5dfb76	Update seastar submodule * seastar 6d2ed8cdc...c7c249f67 (3): > reactor: fix issue with hrtimer completions being lost > Merge "refactor network and storage I/O handling in backend code" from Glauber > reactor: don't call set_heap_profiling_enable() if not needed	2020-02-16 11:22:30 +02:00
Piotr Sarna	84be1eb6f2	test,cdc: skip across-shard test when run with one shard Running cdc_test binary fails with a segmentation fault when run with --smp 1, because test_cdc_across_shards assumes shard count to be >=2. This patch skips the test case when run with a single shard and produces a log warning. Message-Id: <9b00537db9419d8b7c545ce0c3b05b8285351e7d.1581600854.git.sarna@scylladb.com>	2020-02-16 11:22:30 +02:00
Gleb Natapov	ed3e423922	lwt: add counter for a case where timeout is sent prematurely There is a case in the current PAXOS implementation where a timeout is returned because the code cannot guarantee whether the value is accepted or not in case of contention. The counter will help to correlate this condition with failed requests. Message-Id: <20200211160653.30317-2-gleb@scylladb.com>	2020-02-16 11:22:30 +02:00
Gleb Natapov	7694f164c4	lwt: add more tracing to paxos stages Message-Id: <20200211160653.30317-1-gleb@scylladb.com>	2020-02-16 11:22:30 +02:00
Pavel Solodovnikov	bf95bd0916	cql3: more functions marked as const The following functions are now "const": * `term::collect_marker_specification` * `relation::to_term` * `multi_item_terminal::get_elements` * `raw_update::is_compatible_with` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200213142445.35312-1-pa.solodovnikov@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	65d0a776c2	merge: alternator: Add keyspace per table This series implements keyspace-per-table approach for Alternator. The changes are as follows: - when a table is created, its keyspace is created first - after table deletion, its keyspace is deleted as well; works with views too, since these must be deleted before the base table is dropped - instead of SimpleStrategy, network topology is used Keyspaces are created with a prefix not legal from CQL - 'a#'. I validated that even though not reachable via CQL, keyspaces created with # character work well and produce correct directories, restarts work flawlessly too. Fixes #5611 Refs #5596 Tests: alternator(local, remote) Piotr Sarna (3): alternator: switch to keyspace-per-table approach alternator: move to NetworkTopologyStrategy alternator-test: add test for recreating a table	2020-02-16 11:22:30 +02:00
Piotr Sarna	e620181832	Merge 'cdc: TTLs on CDC log cells' from Juliusz Cells in CDC logs used to be created while completely neglecting TTLs (the TTLs from cdc = {...'ttl':600}). This patch adds TTLs to all cells; there are no row markers, so we need not set TTL there. Fixes #5688 * jul-stas/5688-set-ttl-in-cdc-log-table: tests/cdc: added test for TTL on log table cells cdc: set TTLs on CDC log cells	2020-02-16 11:22:30 +02:00
Nadav Har'El	cb8315ace8	merge: alternator: Make write isolation config less terse Merged patch series from Piotr Sarna: This series addresses and fixes #5758 by providing less terse configuration for write isolation. Before the patch, the suggested values for alternator write isolation policies were one of 'f', 'a', 'o', 'u', which are not really descriptive. The code actually checks only the first character from the tag value, but now the input is validated to allow only specific, expressive values: * 'a', 'always', 'always_use_lwt' - always use LWT * 'o', 'only_rmw_uses_lwt' - use LWT only for requests that require read-before-write * 'f', 'forbid', 'forbid_rmw' - forbid statements that need read-before- write. Using such statements (e.g. UpdateItem with ConditionExpression) will result in an error * 'u', 'unsafe', 'unsafe_rmw' - (unsafe) perform read-modify-write without any consistency guarantees Using other values will result in an error. This series comes with tests and docs updates. Fixes #5758 Tests: alternator-test(local,remote) Piotr Sarna (5): alternator: move rmw_operation to a header alternator: add validating write_isolation tag alternator-test: add test for write isolation tag alternator-test: mark write isolation tests scylla_only docs: update write isolation documentation alternator-test/test_condition_expression.py \| 10 +- alternator-test/test_tag.py \| 9 + alternator/executor.cc \| 163 +++++++------------ alternator/rmw_operation.hh \| 99 +++++++++++ docs/alternator/alternator.md \| 8 +- 5 files changed, 173 insertions(+), 116 deletions(-)	2020-02-16 11:22:30 +02:00
Pavel Solodovnikov	76a0652deb	types: fix serialization and validation of empty values Empty values (zero-sized string in serialized form) were not handled properly in serialize routines for floating types and uuids, which led to runtime exceptions and failing tests as described in https://github.com/scylladb/scylla/issues/5782. Also fix validation visitor to handle empty values properly. There was already code in place that took zero-sized values into consideration, but it tried to read some bytes regardless (e.g. for timeuuid values), even when there were none to read. Tests: unit(dev, debug) Fixes: #5782 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200213130021.31598-1-pa.solodovnikov@scylladb.com>	2020-02-16 11:22:30 +02:00
Pavel Emelyanov	b11cf6e950	cql3/query_processor.hh: Debloat from other headers This gives ~30% less (251 jobs -> 181 jobs) recompile when touching it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200212225828.3374-1-xemul@scylladb.com>	2020-02-16 11:22:30 +02:00
Alejo Sanchez	a5516767d5	tests: enforce SERIAL consistency on all prepared statements Add SERIAL consistency level query option to boost tests. This is required for LWT testing. Refs: #5777 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200212102921.27139-2-alejo.sanchez@scylladb.com>	2020-02-16 11:22:29 +02:00
Konstantin Osipov	7b7462b49f	test.py: fix a bug with an incorrect glob pattern On start, test.py cleans up testlog directory. The cleanup file search pattern was shell style, not python glob style, which led to .log files being left around between runs. Message-Id: <20200212204047.22398-9-kostja@scylladb.com>	2020-02-16 11:22:29 +02:00
Konstantin Osipov	70fcbd8e32	test.py: print test invocation failure to test log Capture test invocation failure in the test log. Remove dead code lingering from introduction of log output. Message-Id: <20200212204047.22398-6-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	851b2d652e	test.py: start run_test() by opening test log file Always open the log file first, this will be necessary to append output to it in case the test timed out or didn't start. Message-Id: <20200212204047.22398-5-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	22a050250e	test.py: if a test fails, print it on its own line, even in compact mode To be able to easily see what tests have failed as they run, print failed tests on their own line even if --verbose switch is off. Message-Id: <20200212204047.22398-4-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	8eb127279e	test.py: convert cookie to TabularConsoleOutput class test.py used a functional programming cookie pattern to carry tabular console output state, convert this cookie to an object. In order to make console output more pretty we'll need to add more state to it, and keeping this state in a tuple would be too messy. Message-Id: <20200212204047.22398-3-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Avi Kivity	91c4409376	locator: token_metadata: remove unused include "query-request.hh" sstable_datafile_test.cc lost access to interval_map (via position_in_partition.hh), so it now includes that directly.	2020-02-14 20:46:25 +02:00
Avi Kivity	bee1cc42fe	locator: token_metadata: move implementation classes to .cc With pimplification complete, move the implementation classes to .cc and remove boost/icl includes.	2020-02-14 20:34:44 +02:00
Avi Kivity	ef41b45142	locator: token_metadata: pimplify tokens_iterator Because tokens_iterator refers to token_metadata_impl, the latter cannot be moved out-of-line. So this patch pimplifies tokens_iterator as well.	2020-02-14 20:29:14 +02:00
Avi Kivity	9425e9c13d	locator: token_metadata: make token_metadata_impl::tokens_iterator a non-nested class In order to pimplify token_metadata_impl::tokens_iterator, we must make it a non-nested class, since eventually token_metadata_impl will be an incomplete class for users and nested classes cannot be forward declared. So this patch makes it a non-nested class. Two inline functions that referred to it were moved out of class scope so they can see the definition. No functional changes.	2020-02-14 20:29:13 +02:00
Avi Kivity	6d53f240d1	locator: token_metadata: pimplify token_metadata is a heavyweight class, with heavyweight include dependencies (icl, which has tens of thousands of lines in headers) and heavyweight methods, but it is rarely used. So it is a classic candidate for pimplification. This patch splits off the implementation into token_metadata_impl and leaves token_metadata as a forwarding class. Actual movement of the code is left to a later patch to ease review. Notes: - some constructors were made public due to limitations of std::make_unique - a few token_metadata methods pass *this along to external functions, so we now pass the holder object as "unpimplified_this" to support this.	2020-02-14 20:29:12 +02:00
Avi Kivity	90a3670952	locator: token_metadata: use non-deduced return type for ring_range() Deduced return types are user hostile as the user has to look at the implementation in order to understand what the return type is.	2020-02-14 15:44:46 +02:00
Konstantin Osipov	8b2ce03ce4	query_processor: add CQL logging to all major execute call sites. Add missing CQL query logging to statement prepare, internal execute, batch execute. The logging is done under log level "trace".	2020-02-13 21:53:58 +03:00
Botond Dénes	78624b5069	test: sstable_datafile_test: add scrub unit test	2020-02-13 15:02:37 +02:00
Botond Dénes	26d4c8be95	compaction_manager: scrub: don't piggy-back on upgrade_sstables() Now that we have the necessary infrastructure to do actual scrubbing, don't rely on `upgrade_sstables()` anymore behind the scenes, instead do an actual scrub. Also, use the skip-corrupted flag.	2020-02-13 15:02:37 +02:00
Botond Dénes	33c126e8c0	compaction: introduce scrub_compaction A specialized compaction subclass for executing a scrub compaction. `scrub_compaction` supplies a specialized reader which will validate its input and stop on the first error. If it is configured with `skip_corrupted`, it will instead skip bad data, logging it.	2020-02-13 15:02:37 +02:00
Botond Dénes	1b7725af4b	mutation_fragment_stream_validator: split into low-level and high-level API The low-level validator allows fine-grained validation of different aspects of monotonicity of a fragment stream. It doesn't do any error handling. Since different aspects can be validated with different functions, this allows callers to understand what exactly is invalid. The high-level API is the previous fragment filter one. This is now built on the low-level API. This division allows for advanced use cases where the user of the validator wants to do all error handling and wants to decide exactly what monotonicity to validate. The motivating use-case is scrubbing compaction, added in the next patches.	2020-02-13 15:02:32 +02:00
Juliusz Stasiewicz	c13e935eae	tests/cdc: added test for TTL on log table cells	2020-02-13 14:00:53 +01:00
Piotr Sarna	f4d03d6063	docs: update write isolation documentation The documentation now mentions all acceptable variants of write isolation configuration values.	2020-02-13 13:51:31 +01:00
Piotr Sarna	8795323678	alternator-test: mark write isolation tests scylla_only With scylla_only fixture already available, manual checks for dynamodb no longer need to be performed.	2020-02-13 13:51:31 +01:00
Piotr Sarna	fba756858e	alternator-test: add test for write isolation tag Write isolation tags now accept only a small set of valid values. The test case ensures that all valid values are accepted and that invalid values return an error.	2020-02-13 13:51:31 +01:00
Piotr Sarna	fa4ddd2947	alternator: add validating write_isolation tag In order to prevent users from using incorrect write isolation configuration, a set of allowed values is introduced. When tagging a resource (which is considered rare), a tag will only be allowed if it belongs to the allowed set.	2020-02-13 13:51:31 +01:00
Piotr Sarna	7e6c9cad9a	alternator: move rmw_operation to a header rmw_operation is a class with a public interface, including a write_isolation enum and a fixed tag name for its configuration. For convenience, it's moved to a header file, so that code from executor.cc can use the definitions regardless of their position in the source file - it prevents reordering functions just to make sure that rmw_operation is defined before a function that uses its attributes.	2020-02-13 13:51:31 +01:00
Konstantin Osipov	ced778ba0b	query_processor: move raw_cql_statement to cql_statement We'd like to log CQL statements inside batches, and they don't have a prepared_statement object created for them.	2020-02-13 13:35:37 +03:00
Piotr Sarna	f4a05e1d23	alternator-test: add test for recreating a table The first iteration of keyspace-per-table approach for alternator revealed an issue with recreating a table after deleting it. This test case was used as a regression check.	2020-02-13 09:54:12 +01:00
Piotr Sarna	dca6c2c81d	alternator: move to NetworkTopologyStrategy Instead of SimpleStrategy, NetworkTopologyStrategy is now used for setting up the replication configuration for alternator tables. Replication factor 3 is used along with a local datacenter, unless alternator discovers that it's running on a test cluster with fewer than 3 nodes - then RF is reduced accordingly and a warning is emitted, which was also the case for SimpleStrategy.	2020-02-13 09:46:46 +01:00
Piotr Sarna	3eb6da224b	alternator: switch to keyspace-per-table approach Instead of a monolith alternator keyspace, each table creates its own keyspace, named in the following pattern: `a#TABLE_NAME`. The `a#` prefix contains an illegal CQL character in order to ensure that these keyspaces are never created via CQL.	2020-02-13 09:46:19 +01:00
Konstantin Osipov	b531a6fe82	query_processor: set raw_cql_statement consistently raw_cql_statement is a member of prepared_statement that is not set in its constructor, because the prepared_statement constructor has too many call sites inside the cql_statement hierarchy. cql_statement and prepared_statement form a dependency cycle, which should be fixed in the long term. As a quick fix for query processor tracing, consistently assign raw_cql_statement at all prepared_statement usage sites.	2020-02-13 11:18:32 +03:00
Piotr Sarna	dcf54331ea	alternator: allow custom names for keyspaces The maybe_create_keyspace utility now accepts a parameter - the desired name for a newly created keyspace.	2020-02-13 09:16:37 +01:00
Piotr Sarna	e93c54e837	db,view: fix generating view updates for partition tombstones The update generation path must track and apply all tombstones, both from the existing base row (if read-before-write was needed) and for the new row. One such path contained an error, because it assumed that if the existing row is empty, the update can simply be generated from the new row. However, a missing existing row can also be the result of a partition/range tombstone. If that's the case, the tombstone needs to be applied, because it's entirely possible that this partition tombstone also hides the new row. Without taking the partition tombstone into account, creating a future tombstone and inserting an out-of-order write before it in the base table can result in ghost rows in the view table. This patch comes with a test which was proven to fail before the changes. Branches: 3.1, 3.2, 3.3 Fixes #5793 Tests: unit(dev) Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>	2020-02-12 23:16:30 +02:00
Tomasz Grabiec	3252068588	Merge "Multiple cleanups in cql3" from Kostja This series was born while debugging (missing) query processor trace-level logging and trying to identify all entry points into parsed_statement::prepare(). Unfortunately I was unable to easily merge prepared_statement and cql_statement objects. Rationale for individual patches is given in commit comments.	2020-02-12 17:33:39 +01:00
Nadav Har'El	b93204d6bf	Alternator: allow CreateTable with streams explicitly turned off While Alternator doesn't yet support creating a table with streams (i.e., CDC) turned on, we should only fail the creation if streams were really turned on. If the StreamSpecification option exists, but does not ask to turn on streams, we should not fail the creation - and this patch fixes this. This patch also adds two tests - one where StreamSpecification is passed but does not ask to turn on streams (so table creation should succeed), and another test which explicitly requests to turn on streams. The second test still xfails on Alternator, and should continue to do so until we implement streams (we do not want to silently ignore a request to turn on streams). Fixes #5796 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200212100546.16337-1-nyh@scylladb.com>	2020-02-12 17:29:02 +01:00
Avi Kivity	48b694df55	cql3: like_matcher: pimplify to reduce inclusions of boost/regex boost/regex has huge header dependencies amounting to tens of thousands of lines. These are now replicated in 167 translation units. This patch converts like_matcher to use the pointer-to-implementation idiom, which reduces the number of translation units including boost/regex to 28. Since regular expressions are relatively expensive, and like_matcher is relatively rare, the extra memory usage and run time will be negligible. Message-Id: <20200211170152.809554-1-avi@scylladb.com>	2020-02-12 17:04:12 +02:00
Konstantin Osipov	d4866c1a28	cql3: remove prepared alias for prepared_statement cql3 has cql_statement, parsed_statement and prepared_statement classes, which, largely, stand for the same thing. prepared was an alias for prepared_statement which only required an extra tag jump in the IDE and carried no meaning.	2020-02-12 16:44:43 +03:00
Konstantin Osipov	cfdef844d8	cql3: remove unused include from parsed_statement.hh	2020-02-12 16:44:43 +03:00
Konstantin Osipov	bcb094c87a	query_processor: move parsed_statement definition to raw/ The parsed_statement declaration resides there, so put the definition next to the declaration, as is conventional for the rest of the classes.	2020-02-12 16:44:43 +03:00
Konstantin Osipov	93db4d748c	query_processor: fold one execute_internal() into another. All internal execution always uses query text as a key in the cache of internal prepared statements. There is no need to publish an API for executing an internal prepared statement object. The folded execute_internal() calls an internal prepare() and then internal execute(). execute_internal(cache=true) does exactly that.	2020-02-12 16:44:12 +03:00
Konstantin Osipov	2e07c76153	query_processor: rename process_statement_prepared Rename process_statement_prepared to execute_prepared for consistency with the rest of query_processor API.	2020-02-12 16:37:08 +03:00
Konstantin Osipov	1a53458239	query_processor: rename one overload of process() Rename an overloaded function process() to execute_direct(). Execute direct is a common term for executing a statement that was not previously prepared. See, for example, SQLExecuteDirect in the ODBC/SQL CLI specification, mysql_stmt_execute_direct() in the MySQL C API or EXECUTE DIRECT in Postgres XC.	2020-02-12 16:36:56 +03:00
Konstantin Osipov	170d41acf4	query_processor: fold process_statement_unprepared into process() process_statement_unprepared() is used in ::process() only and can be inlined. This will simplify understanding CQL log output.	2020-02-12 16:22:15 +03:00
Piotr Sarna	f4e51a96ca	alternator: replace overloaded with overloaded_functor Turns out we already have a utility header for a visitor with overloaded lambdas. This patch purges the explicit reimplementation of the same trick and uses the existing class instead. Message-Id: <60c0b9a978f8208b188ef6ddc0564cb133bed707.1581496049.git.sarna@scylladb.com>	2020-02-12 14:21:42 +02:00
Amnon Heiman	8581617e78	api/storage_service: protect the objects during function call The list_snapshot API uses an HTTP stream to stream the result to the caller. It needs to keep all objects and the stream alive until the stream is closed. This patch adds do_with to hold these objects during the lifetime of the function. Fixes #5752	2020-02-12 13:08:34 +02:00
Calle Wilund	5e46079e89	exceptions: Set correct error code in truncate_exception Refs #4924 truncate_exception should, like its origin counterpart, set error code to TRUNCATE_ERROR, not PROTOCOL_ERROR. tests: unit + partial dtest Message-Id: <20200212100920.14478-1-calle@scylladb.com>	2020-02-12 11:17:16 +01:00
Avi Kivity	da00530464	Update seastar submodule * seastar 1c7bccc500...6d2ed8cdc6 (11): > connect_test: keep socket alive until the end. > Merge "Add timeout to smp::submit_to() and friends" from Botond > reactor: use reference to addrlen in accept > tests: stall_detector_test: use same clock as in test as in the detector > reactor: fallback to epoll backend when fs.aio-max-nr is too small > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as() > core/resources: fix cpuset error > distributed_tests: increase sleep time further > core: thread: Fix compilation error in comment > reactor: specialize the pollable_fd_state > build: Use with -fstack-clash-protection when using guard pages	2020-02-12 12:07:00 +02:00
Avi Kivity	a8a4e584ec	Merge "Move token_metadata from storage_service" from Pavel " Lots of code needs storage_service just to get token_metadata from. This creates unwanted dependency loops and increases the use of global storage_service instance. This set keeps the sharded<locator::token_metadata> on main's stack and carries the references where needed. This removes the dependency on storage_service from: - storage_proxy - gossiper - redis - batchlog manager and makes the database only need it for sstables_format (will fix in one of the next sets). Also, this set is the prerequisite for controlling the copying of token_metadata instances (spotted two occurrences in bootstrap code). Tests: unit(dev), manual start-stop " * 'br-token-metadata-standalone-2' of https://github.com/xemul/scylla: api: Keep and use reference on token_metadata redis: Use proxy token_metadata gossiper: Keep needed for failure_detection values on board database: Use own token_metadata batchlog: Use token_metadata from proxy proxy: Use own token_metadata gossiper: Use own token_metadata tokens: Switch into standalone sharded instance batchlog: Use in-config ring-delay database: Have it in size_estimate_virtual_reader storage_proxy: Pass token_metadata in some static helpers storage_service: Move get_local_tokens wrapper size_estimates_virtual_reader: Make get_local_ranges static migration_manager: Refactor validation of new/updating ksm storage_service: Tiny cleanup of excessive self-reference	2020-02-11 19:15:22 +02:00
Botond Dénes	7d3bce403d	sstables: compaction_stop_exception: add retry flag Allow the thrower to communicate that it doesn't want the compaction to be retried later. I know, using exceptions for control flow is very bad, but this is the existing mechanism to stop a compaction and I don't want to invent a new one for this. Also massage the error messages a bit to take the value of this flag into consideration.	2020-02-11 18:38:35 +02:00
Avi Kivity	ba30a4074d	Merge "stop passing tracing state pointer in client_state" from Gleb " client_state is used simultaneously by many requests running in parallel while tracing state pointer is per request. Both those facts do not sit well together and as a result sometimes tracing state is being overwritten while still being used by an active request, which may cause an incorrect trace or even a crash. " Fixes #5700. * 'gleb/tracing_fix_v1' of github.com:scylladb/seastar-dev: client_state: drop the pointer to a tracing state from client_state transport: pass tracing state explicitly instead of relying on it been in the client_state alternator: pass tracing state explicitly instead of relying on it been in the client_state	2020-02-11 17:59:20 +02:00
Botond Dénes	8014c7124d	compaction_manager: collect all cleanup related logic in perform_cleanup() Currently the call chain for a cleanup compaction looks like this: compaction_manager::perform_cleanup() compaction_manager::rewrite_sstables() table::cleanup_sstables() ... `perform_cleanup()` is essentially empty, immediately deferring to `rewrite_sstables()`. Cleanup-related logic is scattered between the latter two methods on the call chain. These methods, however, recently started serving as generic methods for compactions that want to rewrite each sstable one-by-one, collecting cleanup-related ifs in various places. The reason is historic: we first had cleanup, then bolted others on top, trying to share the underlying code as much as possible. It is time this is cleaned up (pun intended). Make `perform_cleanup()` the place where all cleanup-related logic is, with the rest of the stack made truly generic.	2020-02-11 17:47:44 +02:00
Botond Dénes	b2dc5d4895	compaction: compaction_descriptor: use compaction options instead of cleanup flag Instead of the restrictive `cleanup` boolean flag, which allows for choosing between only two compaction types, use `compaction_options`, which in addition to allowing any number of compaction types to be selected, also allows seamlessly passing specific options to them.	2020-02-11 17:47:44 +02:00
Botond Dénes	8579bef076	compaction: introduce compaction_options Currently the compaction API is quite restrictive. It offers generic `compact_sstables()` and `reshard_sstables()` methods. The former is the one used by all but resharding, however it only really supports two modes: regular and cleanup. The latter is selected by a semi-hidden `cleanup` flag in `compaction_descriptor`. Actually there are two more compaction types already which are piggy-backed on cleanup: upgrade and scrub. The upper layers distinguish between actual cleanup and "fake" cleanup by an `is_actual_cleanup` flag. The latter two "fake" cleanup compactions cannot be distinguished even by the upper layers. This is terribly confusing and hard to follow, in addition to being restrictive. This worked so far, because upgrade is served quite well by the cleanup compaction type, turning off certain preparations by the above mentioned `is_actual_cleanup` flag. Scrub is barely implemented and just an upgrade behind the scenes. This situation is however preventing really specializing each compaction. Enter `compaction_options`. This variant in disguise is designed to allow passing specific options to each compaction type, and doubles as an enum allowing more than two low-level compaction types. This patch only adds the option class itself, propagating and handling it will be done by the next patches.	2020-02-11 17:47:44 +02:00
Botond Dénes	6bc3b41c20	compaction: compaction_type: add Upgrade Although we currently do support upgrade compaction, it is piggy-backed on top of cleanup compaction. This is soon going to change, so in preparation for that, add an `Upgrade` member to the `compaction_type` enum.	2020-02-11 17:47:44 +02:00
Botond Dénes	0b53ccaecd	table: cleanup_sstables(): only short-circuit on actual cleanup Currently the cleanup call is short-circuited if it is determined that cleanup is not needed for the sstable to be cleaned up. This is undesired because cleanup is not the only user of this routine: sstable upgrade and sstable scrub also use it to rewrite sstables, and they need to proceed with the sstables even if all data in them belongs to the current node. Fix: #5699	2020-02-11 17:47:44 +02:00
Nadav Har'El	9fad494572	merge: Reduce #include bloat around cql3 internals from non-cql3 users Merged pull request https://github.com/scylladb/scylla/pull/5755 from Avi Kivity: This series removes some #include dependencies around cql3. It results in a 30k-line (6.6%) reduction in the preprocessed size of database.i, mainly due to elimination of boost::regex (which was brought in in turn by like_matcher). This should result in fewer and faster recompiles. commits: tracing: remove #include of modification_statement.hh from table_helper cql3: selection: remove now-unneeded include of statement_restrictions.hh cql3: deinline result_set_builder::restrictions_filter constructor view_info: remove include of select_statement.hh cql3: selection: remove unnecessary include of selector_factories cql3: query_processor: reduce #includes	2020-02-11 15:58:29 +02:00
Juliusz Stasiewicz	67b92c584f	cdc: set TTLs on CDC log cells Cells in CDC logs used to be created while completely neglecting TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs to all cells; there are no row markers, so we need not set TTL there. Fixes #5688	2020-02-11 12:56:41 +01:00
Eliran Sinvani	9eb6ac7162	docker: add rsyslog for syslog support One of the logging options for Scylla is syslog; until now, this method wasn't supported in the docker images that are created with the Dockerfile in the repo. This commit adds rsyslog installation, configuration and setup for Docker. Tests: built and ran the docker image and validated the existence of the /dev/log socket. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200210112448.210169-1-eliransin@scylladb.com>	2020-02-11 13:30:59 +02:00
Tomasz Grabiec	165913598b	Revert "features: Stop on shutdown" This reverts commit `ca55c6c15f`. Triggers a broken promise exception on aborted stop. If the feature service is stopped without enabling some features, the latter may end up with a "broken promise" exception on futures attached to the _pr promise.	2020-02-11 11:57:22 +01:00
Botond Dénes	3164456108	row: append(): downgrade assert to on_internal_error() This assert, added by `060e3f8`, is supposed to make sure the invariant of append() is respected, in order to prevent building an invalid row. The assert, however, proved to be too harsh, as it converts any bug causing out-of-order clustering rows into cluster unavailability. Downgrade it to on_internal_error(). This will still prevent corrupt data from spreading in the cluster, without the unavailability caused by the assert. Fixes: #5786 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>	2020-02-11 11:07:42 +02:00
Piotr Sarna	b977aa034b	Merge 'cdc: disallow negative TTL values in CDC options' from Juliusz Setting TTL = -1 in cdc_options prevents any writes to the CDC log. But enabling CDC while having an unwritable log table makes no sense. Notably, normal writes USING TTL -1 are forbidden. This patch does the same to TTLs in CDC options. Fixes #5747 * jul-stas/5747-cdc-disallow-negative-ttl: tests/cdc: added test for exception when TTL < 0 cdc: disallow negative TTL values in CDC	2020-02-11 09:23:56 +01:00
Pavel Emelyanov	ac998e9576	repair: Do not explicitly switch sched group When registering callbacks for row-level repair verbs, the sched group is assigned automatically with the help of messaging_service::scheduling_group_for_verb. Thus the lambda will be called in the needed sched group, so there is no need for a manual switch. This removes the last occurrence of global storage_service usage from row-level repair. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:15:44 +03:00
Pavel Emelyanov	ccc102affa	repair: Use db from callee The do_repair_start() emulates db.invoke_on_all and can re-use the db.local() inside without the need to call for global storage_service instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:13:03 +03:00
Pavel Emelyanov	c6ddd21c50	repair_writer: Use db from repair_meta The caller of repair_writer.create_writer already has the needed reference to the database, so there is no need to get it from the global storage_service instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:10:42 +03:00
Juliusz Stasiewicz	c0edc2bf53	tests/cdc: added test for exception when TTL < 0	2020-02-10 19:13:59 +01:00
Pavel Emelyanov	5434e412e4	api: Keep and use reference on token_metadata	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	4b2307c8b6	redis: Use proxy token_metadata This removes dependency between redis and storage_service	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	eb827c9f5d	gossiper: Keep needed for failure_detection values on board And drop the gossiper -> storage_service link Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	1a3f78a57d	database: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	7cdfd94207	batchlog: Use token_metadata from proxy This kills the second global reference on storage_service from batchlog code and breaks the dependency loop between these two. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	fecea1de7e	proxy: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	2f3490dc8d	gossiper: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	c5997b573c	tokens: Switch into standalone sharded instance Way too many places in the code need storage_service just for token_metadata. These references increase the number of get(_local)?_storage_service() calls and create loops in component dependencies. Keep the token_metadata separately from storage_service and pass references to the instances where needed (for now, only into the storage_service itself). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	b4e66ddf1d	batchlog: Use in-config ring-delay This kills the first (out of two) global reference on storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	9257346c18	database: Have it in size_estimate_virtual_reader This is to remove the last global reference on storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	bf5be0e971	storage_proxy: Pass token_metadata in some static helpers Soon there will be token_metadata on storage_proxy, so prepare for that in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	6050c559a3	storage_service: Move get_local_tokens wrapper This wrapper just makes sure that system_keyspace::get_saved_tokens reports a non-empty result. Move them close together. As a side effect, get rid of the penultimate global storage_service reference from size_estimates_virtual_reader (the last one will be removed soon). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:31 +03:00
Piotr Sarna	bfd7d74b0f	Merge 'Protect CDC-related tables from being modified by the user' from Piotr This patch introduces the following modifications: Disallows enabling cdc for table X when X_scylla_cdc_log already exists, Restricts DROP permissions for X_scylla_cdc_log tables, Restricts ALTER and DROP permissions for cdc_description and cdc_topology_description, Disallows cdc option when creating materialized views. Refs #4991. Tests: unit(dev). * piodul/4991-permissions-for-cdc-tables: cdc: disallow CDC options for materialized views cdc: restrict permissions on cdc_(topology_)description cdc: restrict permissions on _scylla_cdc_log tables cdc: refuse to enable cdc when table _scylla_cdc_log exists	2020-02-10 18:02:43 +01:00
Raphael S. Carvalho	140520ff87	sstables/compaction_manager: add metric for pending compaction tasks We have the compaction_manager.compactions metric for the number of active tasks, but it doesn't account for tasks blocked waiting for an opportunity to run, and those are the problematic ones. Fixes #5254. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200210131929.30981-1-raphaelsc@scylladb.com>	2020-02-10 17:55:02 +01:00
Pavel Emelyanov	17db6df15c	size_estimates_virtual_reader: Make get_local_ranges static There's the call of the same name in storage_service, so make this one explicitly static for better readability. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:39 +03:00
Pavel Emelyanov	de1dc59548	migration_manager: Refactor validation of new/updating ksm The goal is to have a token_metadata reference inside the keyspace_metadata.validate method. This can be achieved by doing the validation through the database reference, which is "at hand" in migration_manager. While at it, merge the validation with the exists/not-exists checks done in the same places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Pavel Emelyanov	01a28867d6	storage_service: Tiny cleanup of excessive self-reference Do not use get_local_storage_service inside storage_service method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Piotr Dulikowski	949642b866	cdc: disallow CDC options for materialized views While it didn't have any effect, it was possible to supply cdc options for a materialized view. This change disallows it.	2020-02-10 15:51:11 +01:00
Piotr Dulikowski	81fa59e178	cdc: restrict permissions on cdc_(topology_)description The following permissions are disallowed on cdc_description and cdc_topology_description: ALTER, DROP.	2020-02-10 15:40:48 +01:00
Piotr Dulikowski	6fe4f9ded8	cdc: restrict permissions on _scylla_cdc_log tables Disallows DROP permission on CDC log tables.	2020-02-10 15:40:48 +01:00
Piotr Dulikowski	0c18742997	cdc: refuse to enable cdc when table _scylla_cdc_log exists	2020-02-10 15:40:48 +01:00
Gleb Natapov	31cf2434d6	client_state: drop the pointer to a tracing state from client_state client_state is shared between requests and tracing state is per request. It is not safe to use the former as a container for the latter, since the state can be overwritten prematurely by subsequent requests.	2020-02-10 14:59:22 +02:00
Takuya ASADA	43097854a5	dist/debian: keep /etc/systemd .conf files on 'remove' Since dpkg does not re-install conffiles once they have been removed by the user, we are currently missing dependencies.conf and sysconfdir.conf on rollback. To prevent this, we need to stop running 'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'. Fixes #5734	2020-02-10 14:54:25 +02:00
Gleb Natapov	9f1f60fc38	transport: pass tracing state explicitly instead of relying on it been in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state, which is per request. Currently, the next request may overwrite the tracing state of the previous one, causing, in the best case, a wrong trace to be taken, or a crash if the overwritten pointer is freed prematurely. Fixes #5700	2020-02-10 14:54:15 +02:00
Gleb Natapov	38fcab3db4	alternator: pass tracing state explicitly instead of relying on it been in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state, which is per request. This is not yet an issue for alternator, since it creates a new client_state object for each request, but, first, it should not do that, and second, the trace state will be dropped from client_state by a later patch.	2020-02-10 14:50:55 +02:00
Juliusz Stasiewicz	133156ddcf	cdc: disallow negative TTL values in CDC	2020-02-10 13:50:00 +01:00
Kamil Braun	6c4f2b9717	storage_service: check for CDC flag in start_gossiping This is a bug: we tried to retrieve the CDC streams timestamp even if CDC flag was not enabled in storage_service::start_gossiping.	2020-02-10 14:30:35 +02:00
Takuya ASADA	b6988112b4	scylla_post_install.sh: fix operator precedence issue with multiple statements In bash, 'A || B && C' is a problem: && and || have equal precedence and are evaluated left to right, so the expression parses as '(A || B) && C', and when A is true, C is still evaluated. To avoid the issue, B && C needs to be grouped into a single statement. Fixes #5764	2020-02-10 14:29:40 +02:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager and database <-> storage_service. Also, features sit on storage_service but register themselves on the feature_service, and the former subscribes back to them, which also looks strange. I propose to keep all the features on feature_service; this keeps the latter independent from other components, makes it possible to break one of the mentioned circular dependencies and heavily relax the other. Also the set helps us fight the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00
Calle Wilund	af963e76c7	keyspace/distributed_loader: Add wait for (user) keyspace population to finish Allows caller to check/wait for a given user keyspace to finish populating on boot. Can be called at any time, though if called before population starts, it will wait until it either starts and we can determine that the keyspace does not need populating, or population finishes. tests: unit Message-Id: <20200203151712.10003-1-calle@scylladb.com>	2020-02-09 18:56:22 +02:00
Pavel Emelyanov	d1775dd701	utils: Move disk-error-handler into it The disk-error-handler is a purely auxiliary thing that helps propagate IO errors to the rest of the code. It should not sit in the root namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112443.18475-1-xemul@scylladb.com>	2020-02-09 17:26:52 +02:00
Pavel Solodovnikov	bcc4647552	lwt: fix handling of nulls in parameter markers for LWT queries This patch affects the LWT queries with IF conditions of the following form: `IF col in :value`, i.e. if the parameter marker is used. When executing a prepared query with a bound value of `(None,)` (tuple with null, example for Python driver), it is serialized not as NULL but as "empty" value (serialization format differs in each case). Therefore, Scylla deserializes the parameters in the request as empty `data_value` instances, which are, in turn, translated to non-empty `bytes_opt` with empty byte-string value later. Account for this case too in the CAS condition evaluation code. Example of a problem this patch aims to fix: Suppose we have a table `tbl` with a boolean field `test` and INSERT a row with NULL value for the `test` column. Then the following update query fails to apply due to the error in IF condition evaluation code (assume `v=(null)`): `UPDATE tbl SET test=false WHERE key=0 IF test IN :v` returns false in `[applied]` column, but is expected to succeed. Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286) Fixes: #5710 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>	2020-02-09 16:50:42 +02:00
Avi Kivity	b26ded8ec5	tracing: remove #include of modification_statement.hh from table_helper Replace with a forward declaration to reduce #include bloat and dependencies.	2020-02-09 13:04:13 +02:00
Avi Kivity	f8e85e5c2a	cql3: selection: remove now-unneeded include of statement_restrictions.hh Actual users gain #includes of statement_restrictions and query_options that they previously got through selection.hh.	2020-02-09 13:01:32 +02:00
Avi Kivity	710e4ec99d	cql3: deinline result_set_builder::restrictions_filter constructor It stands in the way of #include removal, so it must go. It should have no performance impact as it is too large to be inlined.	2020-02-09 13:00:17 +02:00
Avi Kivity	c6118d96d2	view_info: remove include of select_statement.hh It is not needed by users of view_info.	2020-02-09 12:43:33 +02:00
Avi Kivity	7474db4075	cql3: selection: remove unnecessary include of selector_factories It is only mentioned in the header file, so a forward declaration can be used and the include moved to the real users.	2020-02-09 12:37:36 +02:00
Avi Kivity	dcab666d52	cql3: query_processor: reduce #includes query_processor is a central class, so reducing its includes can reduce dependencies tree-wide. This patch removes includes for parsed_statement, cf_statement, and untyped_result_set and fixes up the rest of the tree to include what it lacks as a result of these removals.	2020-02-09 12:24:24 +02:00
Nadav Har'El	576f80be74	alternator-test: add comprehensive tests for KeyConditionExpression This patch adds comprehensive tests for KeyConditionExpression, the newer DynamoDB API syntax for specifying the item range which is requested from a Query (this syntax replaced the older KeyConditions syntax, which Alternator already supports). Before this patch, we had only a small test for KeyConditionExpression in test_query.py. This patch replaces it by a large number of small tests, testing the many sub-features of KeyConditionExpression - its different operators, sort-key types, different failure modes, etc. As usual, because we haven't yet implemented this feature in Alternator (see issue #5037), all these tests pass on AWS, but xfail on Alternator. Despite the new test file containing about 40 small tests, it finishes very quickly because we use pytest's fixture feature to allow small read-only tests to perform a query to a partition that is only written once for many tests. So these small tests become extremely fast, and there is no downside to having many small tests instead of lumping them into fewer large tests checking many things. Refs #5037. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200207134159.3283-1-nyh@scylladb.com>	2020-02-08 11:10:09 +02:00
Piotr Dulikowski	534e9ba27d	cdc: store information on ttl in "ttl" column, not in tuples This patch changes the way TTL is stored in the CDC log table. Instead of including TTL of cell `X` in the third element of the tuple in column `_X`, TTL is written to the previously unused column `ttl`. This is done for cosmetic purposes. This implementation works under assumption that there will be only one TTL included in a mutation coming from a CQL write. This might not be the case when writing a batch that modifies the same row twice, e.g.: ``` BATCH INSERT INTO ks.t (pk, ck, v1) VALUES (1,2,3) USING TTL 10; INSERT INTO ks.t (pk, ck, v2) VALUES (1,2,3) USING TTL 20; END BATCH ``` In this case, this implementation will choose only one TTL value to be written in the CDC log: ``` ... \| batch_seq_no \| _ck \| _pk \| _v1 \| _v2 \| operation \| ttl ...-+--------------+-----+-----+--------+--------+-----------+----- ... \| 0 \| 2 \| 1 \| (0, 3) \| (0, 3) \| 1 \| 20 ``` This behavior might be changed as a part of issue #5719, which considers splitting a batch write mutation when it contains multiple writes to the same row. Refs #5689 Tests: unit(dev)	2020-02-08 11:10:09 +02:00
Pavel Emelyanov	e2ec5eecf6	view_update: Do not need storage_proxy The view_update_generator accepts (and keeps) database and storage_proxy; the latter is only needed to initialize the view_updating_consumer which, in turn, only needs it to get the database from (to find the column family). This can be relaxed by providing the database from _generator to _consumer directly, without using the storage_proxy in between. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112427.18419-1-xemul@scylladb.com>	2020-02-07 13:30:01 +02:00
Pavel Emelyanov	00746d6a16	dht: Use const reference for token_metadata arg Two places in dht code have token_metadata _value_ arguments, but only read tokens from them. Optimize it a bit by turning values into const references. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112408.18352-1-xemul@scylladb.com>	2020-02-07 13:30:00 +02:00
Avi Kivity	5950a9e37f	.dockerignore: add testlog testlog files are not used when preparing the frozen toolchain, and can be very large, so ignore them in order to speed up the docker build.	2020-02-07 08:59:39 +01:00
Gleb Natapov	ff88ff880b	lwt: use cached truncation record instead of querying the database Message-Id: <20200206163838.5220-3-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
Gleb Natapov	20bf3800f3	database: cache truncation time in table objects Truncation time is used on each LWT request now, so reading it from the table is too heavy an operation to be on a fast path. It also requires jumping to the shard that contains the corresponding data. This patch caches the data on the table object of each shard for easy access. The cache is initialized during boot from the system.truncated table and updated on each truncation operation. Message-Id: <20200206163838.5220-2-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
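A minimal sketch of the per-shard caching idea described above. `table_sketch`, `clock_type` and the method names are hypothetical stand-ins, not Scylla's real `table` API; the point is only that each shard holds its own copy, set at boot and on each truncation, so the LWT fast path reads a local field instead of querying `system.truncated` on another shard.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical clock alias; Scylla has its own db_clock.
using clock_type = std::chrono::system_clock;

// Hypothetical per-shard table object. Each shard owns its own instance, so
// reading the cached value needs neither a query nor a cross-shard hop.
class table_sketch {
    std::optional<clock_type::time_point> _truncated_at;
public:
    // Called at boot with the value loaded from system.truncated, and again
    // on every truncation of this table.
    void set_truncation_time(clock_type::time_point tp) noexcept {
        _truncated_at = tp;
    }
    // Fast-path read used by each LWT request; empty if never truncated.
    std::optional<clock_type::time_point> get_truncation_time() const noexcept {
        return _truncated_at;
    }
};
```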
Takuya ASADA	5d82fcf944	dist/ami: use prebuilt rpms on --localrpm We made the --localrpm option automatically build rpms from source code, but we actually use the option to produce an AMI using prebuilt rpms on our CI. To simplify the script, and to prevent accidentally starting an rpm build in the script, drop the rpm build part.	2020-02-06 18:41:52 +02:00
Amnon Heiman	687e554737	api/storage_service: use stream in get_snapshots get_snapshot should use an http stream to reduce memory allocations and stalls. This patch changes the implementation so that it streams each snapshot object instead of creating a single response and returning it. Fixes #5468 Depends on scylladb/seastar#723	2020-02-06 18:40:37 +02:00
Takuya ASADA	c44f347886	SCYLLA-VERSION-GEN: skip updating version files when git hash unchanged On our build system we try to build the relocatable package multiple times on the same revision of the repository, and it executes ./SCYLLA-VERSION-GEN each time. When the build job was invoked before midnight and did not finish until after 12:00AM, the first build and the last build had different SCYLLA-RELEASE-FILE contents, since the file contains the current date. To prevent this, skip updating SCYLLA-*-FILE when the git hash is unchanged. Fixes scylladb/scylla-pkg#826	2020-02-06 18:36:46 +02:00
Botond Dénes	05116ba963	reader_concurrency_semaphore: make signal() noexcept Currently reader_concurrency_semaphore::signal() can fail. This is dangerous in two ways: * It is called from constructors, so the exception can bring down the node. This will convert an `std::bad_alloc` to a crash. * Reads in the queue will be blocked until they either time-out, or another `signal()` succeeds. To solve this, wrap the `reader_permit` constructor, the only code that can throw, with try-catch and forward the exception to the reader admission promise. In practice this will result in the flushing of the reader queue, when we fail to admit a read. Fixes #5741 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206154238.707031-1-bdenes@scylladb.com>	2020-02-06 17:51:03 +02:00
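The fix above can be sketched as follows: the only throwing step (constructing the permit) is wrapped in try/catch and the exception is stored in the waiting read's result, so `signal()` can be `noexcept` and the queue still drains. `permit`, `waiter` and the `can_allocate` flag are hypothetical simplifications; in this sketch every queued waiter is resolved, whereas the real semaphore admits reads only while resources allow.

```cpp
#include <cassert>
#include <deque>
#include <exception>
#include <new>
#include <optional>

// Hypothetical stand-in for reader_permit: its constructor is the only
// operation in the admission path that may throw.
struct permit {
    explicit permit(bool can_allocate) {
        if (!can_allocate) {
            throw std::bad_alloc();
        }
    }
};

// Hypothetical stand-in for a queued read's admission promise.
struct waiter {
    std::optional<permit> admitted;
    std::exception_ptr error;
};

// Sketch of a noexcept signal(): each queued waiter is resolved either with a
// permit or with the exception thrown while constructing it, so a failure
// never escapes and never leaves reads stuck in the queue.
void signal(std::deque<waiter*>& queue, bool can_allocate) noexcept {
    while (!queue.empty()) {
        waiter* w = queue.front();
        queue.pop_front();
        try {
            w->admitted.emplace(can_allocate);
        } catch (...) {
            w->error = std::current_exception();
        }
    }
}
```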
Botond Dénes	434d32befe	reader_permit: tidy up reader_permit::memory_units This patch is a bag of fixes/cleanups that were omitted from the reader memory tracking series due to contributor error. It contains the following changes: * Get rid of unused `increase()` and `decrease()` methods. * Make all constructors and assignment operators `noexcept`. * Make move assignment operator safe w.r.t. self assignment. * `reset()`: consume the new amount before releasing the old amount, to prevent a transient window where new readers might be admitted. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206143007.633069-1-bdenes@scylladb.com>	2020-02-06 16:35:07 +02:00
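A rough sketch of an RAII holder that follows the listed changes: `noexcept` constructors and assignment, a self-assignment-safe move assignment, and a `reset()` that consumes the new amount before releasing the old one. `memory_pool` and `memory_units` here are hypothetical simplifications, not the real `reader_permit::memory_units`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the semaphore's memory accounting.
struct memory_pool {
    std::ptrdiff_t available;
    void consume(std::ptrdiff_t n) noexcept { available -= n; }
    void release(std::ptrdiff_t n) noexcept { available += n; }
};

class memory_units {
    memory_pool* _pool = nullptr;
    std::ptrdiff_t _amount = 0;
public:
    memory_units() noexcept = default;
    memory_units(memory_pool& pool, std::ptrdiff_t amount) noexcept
        : _pool(&pool), _amount(amount) {
        _pool->consume(_amount);
    }
    memory_units(memory_units&& o) noexcept
        : _pool(std::exchange(o._pool, nullptr)), _amount(std::exchange(o._amount, 0)) {}
    memory_units& operator=(memory_units&& o) noexcept {
        if (this != &o) {  // a self-move must not release our own units
            reset();
            _pool = std::exchange(o._pool, nullptr);
            _amount = std::exchange(o._amount, 0);
        }
        return *this;
    }
    ~memory_units() { reset(); }

    // Consume the new amount *before* releasing the old one, so the pool never
    // transiently looks emptier than it is. No-op on an empty holder.
    void reset(std::ptrdiff_t new_amount = 0) noexcept {
        if (!_pool) {
            return;
        }
        _pool->consume(new_amount);
        _pool->release(std::exchange(_amount, new_amount));
    }
};
```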
Piotr Sarna	757c1cf91e	Merge ' Remove unnecessary schema copies' from Piotr Most of the time schema does not have to be copied and sometimes it's not even used. tests: unit(dev) Closes #5739 * hawk/remove_schema_copies: multishard_mutation_query_test: stop capturing unused schema index_reader: avoid copying schema to lambda	2020-02-06 15:20:24 +01:00
Piotr Jastrzebski	d1fe75edbc	multishard_mutation_query_test: stop capturing unused schema Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 14:18:50 +01:00
Piotr Jastrzebski	8813a6ca2a	index_reader: avoid copying schema to lambda Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 14:10:58 +01:00
Nadav Har'El	abdbb70ad9	Allow configuring alternator write isolation Merged patch series from Piotr Sarna: This series adds a way to configure alternator write isolation policy per-table with the use of tags. Instead of the hardcoded LWT_ALWAYS policy, it can now be set by tagging a table with a tag of the following form: { 'Key': 'system:write_isolation', 'Value': X }, where X is one of the following implemented levels: * 'f' - forbid RMW * 'a' - always enforce RMW * 'o' - only RMW writes will go through LWT * 'u' - unsafe RMW (to be deprecated/eradicated) By default, if no tag is found, alternator falls back to always applying LWT to writes. This series also contains fixes to the tagging interface - some minor issues came up while implementing write isolation config on top of tags. test: alternator-test(local,remote) Piotr Sarna (6): alternator: return tags for a table via const reference alternator: fix overwriting tags alternator: make _write_isolation a protected attribute alternator: add configuring write isolation policy via tags alternator-test: add testing different write isolation policies docs: update alternator on write isolation alternator-test/test_condition_expression.py \| 63 ++++++++++++++ alternator-test/test_tag.py \| 25 ++++++ alternator/executor.cc \| 89 +++++++++++++------- docs/alternator/alternator.md \| 21 +++-- 4 files changed, 162 insertions(+), 36 deletions(-)	2020-02-06 12:37:19 +02:00
Nadav Har'El	8b6925790f	Reduce usage of global_partitioner() Merged pull request https://github.com/scylladb/scylla/pull/5733 from Piotr Jastrzębski: In many places we use global_partitioner() to obtain parameters that are available in config. This PR replaces a number of global_partitioner() calls with equivalent non-global alternatives. tests: unit(dev) * 'reduce_global_usage' of github.com:haaawk/scylla: storage_service: reduce number of global_partitioner calls cdc: remove partitioner from db_context gossiper: stop calling global_partitioner() system_keyspace: stop calling global_partitioner() transport/server: stop calling global_partitioner() thrift: stop calling global_partitioner() partitioner: move cpu_sharding_algorithm_name to token-sharding.hh	2020-02-06 12:10:38 +02:00
Piotr Sarna	9ac35b9367	docs: update alternator on write isolation Docs are appended with information on write isolation - which levels are implemented in alternator and how to configure them properly.	2020-02-06 10:26:26 +01:00
Piotr Sarna	4d3b8e3b5a	alternator-test: add testing different write isolation policies Additional testing is done via: 1. Checking that permissive isolation levels ('a', 'o', 'u') allow conditional writes 2. Checking that 'f' isolation level (forbid rmw) works as expected: - read-modify-write requests are forbidden - non-rmw writes are allowed	2020-02-06 10:26:26 +01:00
Piotr Sarna	4a9536b7c1	alternator: add configuring write isolation policy via tags Until now, write isolation policy was hardcoded to always enforcing LWT. From now on, setting a tag via UpdateTags request or during table creation will associate a policy with given table. The tag key is 'system:write_isolation' and its value can be one of: * 'f' - forbid RMW * 'a' - always enforce RMW * 'o' - only RMW writes will go through LWT * 'u' - unsafe RMW (to be deprecated/eradicated)	2020-02-06 10:26:26 +01:00
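The tag lookup described above can be sketched as a small mapping from the tag value to a policy, with "always use LWT" as the fallback when no tag is set. The enum and function names are hypothetical, and treating an empty or unrecognized value as the default is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical enum mirroring the four documented levels.
enum class write_isolation {
    forbid_rmw,         // 'f'
    lwt_always,         // 'a'
    only_rmw_uses_lwt,  // 'o'
    unsafe_rmw,         // 'u'
};

// Look up the 'system:write_isolation' tag and map its first character to a
// policy. Missing tag -> always use LWT (the documented default).
// Empty/unknown values also fall back to LWT here (an assumption).
write_isolation get_write_isolation(const std::map<std::string, std::string>& tags) {
    auto it = tags.find("system:write_isolation");
    if (it == tags.end() || it->second.empty()) {
        return write_isolation::lwt_always;
    }
    switch (it->second[0]) {
    case 'f': return write_isolation::forbid_rmw;
    case 'a': return write_isolation::lwt_always;
    case 'o': return write_isolation::only_rmw_uses_lwt;
    case 'u': return write_isolation::unsafe_rmw;
    default:  return write_isolation::lwt_always;
    }
}
```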
Piotr Sarna	0479a1bf67	alternator: make _write_isolation a protected attribute No useful semantic changes yet, but it will help produce better diffs for future patches.	2020-02-06 10:04:34 +01:00
Piotr Sarna	51c14cb1ce	alternator: fix overwriting tags Tagging a resource with a tag key that already exists should result in overwriting the old value. It wasn't the case, so it's now fixed and an appropriate test is added.	2020-02-06 10:04:34 +01:00
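The commit does not say how the old code went wrong, but one common way to get this bug in C++ is to use `insert()` or `emplace()` on a `std::map`, which keeps the existing value for a key that is already present. A hypothetical sketch of the intended overwrite semantics using `insert_or_assign()`:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper applying a TagResource-style request to a table's tags.
// insert_or_assign() replaces the value of an existing key, which is the
// required "tagging an existing key overwrites it" behavior.
void apply_tags(std::map<std::string, std::string>& tags,
                const std::vector<std::pair<std::string, std::string>>& new_tags) {
    for (const auto& [key, value] : new_tags) {
        tags.insert_or_assign(key, value);
    }
}
```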
Piotr Sarna	ed940f000d	alternator: return tags for a table via const reference The signature of the helper function is changed, so that it's possible to acquire a const reference of the tags, instead of being forced to get a copy of the whole map (potentially large).	2020-02-06 10:04:34 +01:00
Piotr Sarna	f4b6f0956b	Merge "Pending Alternator patches" from Nadav Here is a rebase of some of my already-reviewed Alternator patches - the final piece of the fix to LWT timestamps (in BatchWriteItems), The "/localnodes" request, and a couple of patches reducing the number of times that the global storage_proxy is needed. Also available in a github branch, git@github.com:nyh/scylla.git series1 * nyh/series1: redis: remove redundant code storage_proxy: make it into a peering sharded service alternator: use simpler API for registering Alternator's HTTP URLs alternator-test: test "/localnodes" feature alternator: add public API for list of nodes in current DC alternator: use LWT timestamp - in BatchWriteItems too	2020-02-06 09:48:10 +01:00
Juliusz Stasiewicz	20f7b1b0ad	tests: add test for CDC schema extension Test for functionality added in #5720. Refs #5589	2020-02-06 09:26:13 +01:00
Piotr Jastrzebski	9bfd3dc311	storage_service: reduce number of global_partitioner calls Replace global_partitioner().sharding_ignore_msb() call with config::murmur3_partitioner_ignore_msb_bits() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 08:00:34 +01:00
Piotr Jastrzebski	97262bec82	cdc: remove partitioner from db_context partitioner from cdc::db_context is no longer used so it can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 08:00:01 +01:00
Piotr Jastrzebski	61d8308848	gossiper: stop calling global_partitioner() Obtain name of the default partitioner from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:59:07 +01:00
Piotr Jastrzebski	8b4ec5b1d2	system_keyspace: stop calling global_partitioner() Obtain name of default partitioner from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:58:07 +01:00
Piotr Jastrzebski	d3d6547889	transport/server: stop calling global_partitioner() Obtain SCYLLA_SHARDING_IGNORE_MSB and SCYLLA_PARTITIONER from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:57:06 +01:00
Piotr Jastrzebski	dde8c7df00	thrift: stop calling global_partitioner() Replace global_partitioner().name() call with config::partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:55:54 +01:00
Piotr Jastrzebski	8817a62499	partitioner: move cpu_sharding_algorithm_name to token-sharding.hh Sharding logic has been moved to token-sharding.hh some time ago. This logic does not depend on partitioner any more so cpu_sharding_algorithm_name can be safely moved to the header where rest of sharding logic lives. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:53:45 +01:00
Nadav Har'El	3f27b070e7	redis: remove redundant code In one place, we already had a "proxy" object, but still asked for it again. Remove the redundant line. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	9fd9ec14c2	storage_proxy: make it into a peering sharded service We consider globals like service::get_storage_proxy() a bad idea, and would like to reduce their use as much as possible - and eventually, eliminate them completely. One easy case to fix is when we already have a shard-local proxy, but need the sharded object to invoke_on() something on it. In this patch, we turn storage_proxy into a peering_sharded_service. This means that if you already have a storage_proxy, you can call its container() function to get the sharded<storage_proxy>, without needing to call the global service::get_storage_proxy(). We found a few such cases in storage_proxy itself, and in Alternator, and fixed them to use container() instead of the global function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	b262eb5031	alternator: use simpler API for registering Alternator's HTTP URLs We used the Seastar HTTP server's add() method to register URLs to serve (so-called "routes"), but as suggested by Amnon, when we have fixed URLs without parameters being path components, it's simpler to use the put() method to do the same thing - and also results in slightly less work at run-time to look up these routes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	9de26b73a4	alternator-test: test "/localnodes" feature This is a partial test for the "/localnodes" request, which is supposed to return the list of live nodes in this DC. Because of the limitations of our current alternator-test framework (which should work on any pre-existing cluster), we don't know what to expect as a reply, so we just verify the minimum: the request is understood and returns a JSON list which contains at least one item. As "/localnodes" is a Scylla-only feature, this test is skipped when running with "--aws". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	3fecf6f641	alternator: add public API for list of nodes in current DC If we want to balance the Alternator request load among the different nodes (Refs #5030), the load balancer - whether it uses HTTP load balancing or DNS - needs to be able to get an up-to-date list of live nodes to which it can direct Alternator traffic. This list should include only the live nodes in the same data center (geographical region) - it is expected that a separate load balancer will be installed in each data center, and clients from within this data center will reach this data center's load balancer. There are multiple APIs in current Scylla to do something similar to what we need, but as far as I know, none of them is exactly what we need or convenient for Alternator installations: We don't want the load balancer to use CQL, and the REST API http://localhost:10000/gossiper/endpoint/live/ doesn't do what we need (it doesn't restrict the list to one data center) plus it's not open to connections outside the machine. So in this patch, we implement a new HTTP request on the Alternator port - "/localnodes", returning a JSON-formatted list of all live nodes in the contacted node's data center: $ curl http://localhost:8000/localnodes ["127.0.0.2","127.0.0.1","127.0.0.3"] Like the existing health check HTTP request, this operation is public and unauthenticated. We consider the security risk low - it allows an attacker to enquire the list of Scylla nodes in this DC, but an attacker can achieve the same thing by just scanning the addresses in this subnet using the health check request (or even with ordinary DynamoDB API requests). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
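A sketch of how the "/localnodes" response body could be assembled: keep only live endpoints whose data center matches the contacted node's, and format them as a JSON array of strings, like the `curl` output shown above. `endpoint_info` and `local_nodes_json` are hypothetical; the real handler takes its data from the gossiper and topology.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of what is known about one endpoint.
struct endpoint_info {
    std::string address;
    std::string datacenter;
    bool alive;
};

// Keep live endpoints in local_dc and format them as a JSON array.
// IP address strings need no JSON escaping, so none is done here.
std::string local_nodes_json(const std::vector<endpoint_info>& endpoints,
                             const std::string& local_dc) {
    std::string out = "[";
    bool first = true;
    for (const auto& ep : endpoints) {
        if (!ep.alive || ep.datacenter != local_dc) {
            continue;
        }
        if (!first) {
            out += ',';
        }
        first = false;
        out += '"';
        out += ep.address;
        out += '"';
    }
    out += ']';
    return out;
}
```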
Nadav Har'El	95351016fd	alternator: use LWT timestamp - in BatchWriteItems too A previous patch fixed Alternator's writes to use the timestamp provided by LWT instead of the current timestamp. That patch fixed the PutItem, DeleteItem and UpdateItem operations - and this patch fixes the remaining write operation: BatchWriteItems. So, Fixes #5653. Unfortunately, the requirements of both BatchWriteItems and LWT make the resulting code - and this patch - somewhat inelegant. BatchWriteItems requires that we prepare all the operations first - failing if any of them has an error. Before this patch, the result of this preparation was an array of mutations, which in a second step we wrote to the database. But we can no longer use mutations for the result of the first step, because creating a mutation requires knowing the timestamp, which we don't know during the preparation phase - we will only know it during the later LWT operation. So now we need to invent a new intermediate format between the request and the mutation. This intermediate format is further complicated by the need to send it between shards (for LWT's shard forwarding) so it cannot, for example, contain a reference to a schema. The fact that different sub-operations need to be sent to different shards, and that different sub-operations may write to different tables, further complicates the book-keeping and gives us a bunch of funky-typed maps. But eventually it all fits together. After this patch, as before this patch, the same code (now called put_or_delete_item), is used to implement both the PutItem and DeleteItem stand-alone operations, and the BatchWriteItems operation which includes a whole list of these PutItem and DeleteItem operations. This patch also includes two more tests in test_batch.py, which test two more corner cases we haven't tested before: One tests the capability of BatchWriteItems to write to more than one table. 
The other tests that BatchWriteItems can write an empty item (it is not surprising that it does, but we do have special code for this case, so we should test it). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Avi Kivity	27b36beb4a	Update seastar submodule * seastar 30185fd901...1c7bccc500 (8): > reactor: rename kernel_completion::set_value to complete_with > net: Remove unused member variables > net: Fix global buffer overflow around rss_key_type > reactor: remove kernel_completion::set_promise > Merge "generalize the io_desc (now kernel_completion)" from Glauber > everywhere: Disable -Wmisleading-indentation around ragel generated code > core: Make when_all_state_component final > io_tester: Remove unused lambda capture	2020-02-05 20:20:43 +02:00
Gleb Natapov	ff696682ed	add missing include to timestamp.hh The file uses std::string but does not include the <string> header. My compiler complains. Message-Id: <20200205085739.GN26048@scylladb.com>	2020-02-05 19:42:18 +02:00
Avi Kivity	e719ea1bba	Merge "Fix assert on initialization error" (in large_data_handler) from Rafael " This series fixes an assertion when initialization fails after creating a database. I don't know of a case where that currently happens, but it is easy to cause that when writing a patch and the produced assert is just confusing. " * 'espindola/dont-assert-on-init-error' of https://github.com/espindola/scylla: db: Replace large_data_handler::_stopped with _running db: Move nop_large_data_handler constructor out-of-line db: Move large_data_handler::stop out-of-line	2020-02-05 18:49:11 +02:00
Juliusz Stasiewicz	5127568cc4	cdc: cdc per-table options put into schema extensions With this patch, client tools (in particular cqlsh) get the access to cdc options and will be able to print them with `DESC TABLE`. Fixes #5589	2020-02-05 13:44:39 +01:00
Piotr Sarna	ee244a6d22	Merge 'Make it clear that memory_footprint_test has to be run with -c1' from Piotr This test fails when run on more than 1 core. Tests: unit(dev) * hawk/fix_memory_footprint: memory_footprint_test: Make it clear it has to run with -c1 tests: move memory_footprint_test to perf/	2020-02-05 12:09:50 +01:00
Avi Kivity	31593e1451	Merge "Change token representation to int64_t" from Piotr " After deprecating partitioners other than Murmur3 we can change the representation of tokens to int64_t. This will allow setting custom partitioner on each table. With this change partitioners become just converters from partition keys to tokens (int64_t). Following operations are no longer dependant on partitioner implementation: - Tokens comparison - Tokens serialization/deserialization to strings - Tokens serialization/deserialization to bytes - Sharding logic - Random token generation This change will be followed by a PR that enables per table partitioner and then another PR that introduces a special partitioner for CDC tables. Tests: unit(dev) Results of memory footprint test: Differences: in cache: 992 vs 984 in memtable: 750 vs 742 sizeof(cache_entry) = 112 vs 104 -- sizeof(decorated_key) = 36 vs 32 MASTER: mutation footprint: in cache: 992 in memtable: 750 in sstable: 351 frozen: 540 canonical: 827 query result: 342 sizeof(cache_entry) = 112 -- sizeof(decorated_key) = 36 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 THIS PATCHSET: mutation footprint: in cache: 984 in memtable: 742 in sstable: 351 frozen: 540 canonical: 827 query result: 342 sizeof(cache_entry) = 104 -- sizeof(decorated_key) = 32 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 " * 'fixed_token_representation' of https://github.com/haaawk/scylla: (21 commits) token: cast to int64_t not long in long_token murmur3: move sharding 
logic to token and i_partitioner partitioner: move shard_of_minimum_token to token partitioner: remove token_to_bytes partitioner: move get_token_validator to token partitioner: merge tri_compare into dht::tri_compare partitioner: move describe_ownership to token partitioner: move from_bytes to token partitioner: move from_string to token partitioner: move to_sstring to token partitioner: move get_random_token to token partitioner: move midpoint function to token token: remove token_view sstables: use copy constructor for tokens token: change _data to int64_t partitioner: remove hash_large_token token: change data to array<uint8_t, 8> partitioner: Extract token to separate .hh and .cc files partitioner: remove unused functions Revert "dht/murmur3_partitioner: take private methods out of the class" ...	2020-02-05 12:21:02 +02:00
Piotr Jastrzebski	edd7398a0c	memory_footprint_test: Make it clear it has to run with -c1 The test fails when run on number of cores different than 1. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:22:32 +01:00
Piotr Jastrzebski	1a8fe4befd	tests: move memory_footprint_test to perf/ Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:18:28 +01:00
Piotr Jastrzebski	6d24f26ff7	token: cast to int64_t not long in long_token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	50cfe81331	murmur3: move sharding logic to token and i_partitioner Since the token representation is fixed now, all the partitioners will share the sharding logic. It now makes sense to keep the logic in a common superclass and in a separate header that's included only in i_partitioner.cc. shard_of and token_for_next_shard are now implemented in i_partitioner. They would be non-virtual but we have to keep them virtual because one test is overriding them to enforce some specific sharding. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
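Once every token is a plain `int64_t`, shard selection no longer depends on the partitioner. The sketch below shows one scheme we believe matches the shared logic, stated here as an assumption: bias the signed token so the minimum maps to 0, shift out `ignore_msb_bits` high bits, treat the result as a fraction in [0, 1) and scale it by the shard count. It uses the GCC/Clang `unsigned __int128` extension and requires `ignore_msb_bits < 64`.

```cpp
#include <cassert>
#include <cstdint>

// Map the signed token range onto [0, 2^64) by flipping the sign bit,
// so INT64_MIN -> 0 and INT64_MAX -> 2^64 - 1.
inline uint64_t zero_based(int64_t token) {
    return static_cast<uint64_t>(token) ^ (uint64_t(1) << 63);
}

// Sketch of shard selection: drop the ignored high bits, then compute
// floor(fraction * shard_count) with a 128-bit multiply.
// Precondition: ignore_msb_bits < 64.
inline unsigned shard_of(int64_t token, unsigned shard_count, unsigned ignore_msb_bits) {
    uint64_t z = zero_based(token) << ignore_msb_bits;
    return static_cast<unsigned>((static_cast<unsigned __int128>(z) * shard_count) >> 64);
}
```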
Piotr Jastrzebski	7eab3024bd	partitioner: move shard_of_minimum_token to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	9c55e5be13	partitioner: remove token_to_bytes i_partitioner::token_to_bytes is just a call to token::data and does not depend on partitioner at all. It is possible to convert token to bytes without having access to partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	d4d55160f0	partitioner: move get_token_validator to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	2c630c5820	partitioner: merge tri_compare into dht::tri_compare Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	d0d8bfaf8c	partitioner: move describe_ownership to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	f845220445	partitioner: move from_bytes to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	8107d99e3d	partitioner: move from_string to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	03bdce2d68	partitioner: move to_sstring to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	9c202b52da	partitioner: move get_random_token to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	f42b1ee819	partitioner: move midpoint function to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	1d1ac476c3	token: remove token_view Now that both token and token_view contain int64_t it makes no sense to keep the view. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	06dfd16aad	sstables: use copy constructor for tokens instead of manually creating a new token from another token's internals. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	05e0451b27	token: change _data to int64_t Previously _data was stored as array of 8 bytes in network byte order. After this change it stores the same value in int64_t in host byte order. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
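To illustrate the representation change: the old form stored the token as 8 bytes in network (big-endian) order, while the new one stores the same value as a host-order `int64_t`. A hypothetical pair of conversion helpers (`decode_be`/`encode_be` are illustrative names, not Scylla functions) shows that the two forms carry the same information. The unsigned-to-signed cast relies on two's complement, which C++20 guarantees and mainstream compilers provide anyway.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Decode the old representation (8 bytes, network order) into an int64_t.
int64_t decode_be(const std::array<uint8_t, 8>& be) {
    uint64_t v = 0;
    for (uint8_t b : be) {
        v = (v << 8) | b;  // most significant byte first
    }
    return static_cast<int64_t>(v);
}

// Encode a host-order int64_t back into network byte order.
std::array<uint8_t, 8> encode_be(int64_t t) {
    std::array<uint8_t, 8> out{};
    uint64_t v = static_cast<uint64_t>(t);
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<uint8_t>(v & 0xff);  // least significant byte last
        v >>= 8;
    }
    return out;
}
```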
Piotr Jastrzebski	fea0187f55	partitioner: remove hash_large_token Now that token representation is always array<uint8_t, 8>, hash<dht::token> will always pick read_le<size_t>(reinterpret_cast<const char*>(b.data())) and never call hash_large_token because the check is always true b.size() == sizeof(size_t). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	b569d127a0	token: change data to array<uint8_t, 8> It is safe to make such a change because we support only Murmur3Partitioner, which uses only tokens that are 8 bytes long. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:30:46 +01:00
Piotr Jastrzebski	0da21c28ab	partitioner: Extract token to separate .hh and .cc files Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	8bd9d3a69e	partitioner: remove unused functions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	d86548c06e	Revert "dht/murmur3_partitioner: take private methods out of the class" This patch conflicts with the following patches. The final effect is equivalent and it's easier to revert this patch and cleanly apply already reviewed patches. This reverts commit `f4f8593bac`.	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	08036fc511	murmur3_partitioner: get rid of static shard_of This will enable revert of a commit that creates conflicts with following patches. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Rafael Ávila de Espíndola	5d4671526c	db: Replace large_data_handler::_stopped with _running This is not just a direct flip to a variable with the negated Boolean value. When created, a large_data_handler is not considered to be running; the user has to call start() before it can be used. The advantage of doing this is that if initialization fails and a database is destroyed before the large_data_handler is started, the assert in database::stop(), assert(!_large_data_handler->running());, is not triggered. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:15:44 -08:00
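A minimal sketch of why the flag change matters. With a `_running` flag that starts out `false` and is set only by `start()`, a handler that was never started passes the `!running()` check, whereas a `_stopped` flag that starts out `false` would make a never-started handler look live. `large_data_handler_sketch` and `database_may_stop` are hypothetical stand-ins for the real class and the assert in `database::stop()`.

```cpp
#include <cassert>

// Hypothetical simplification: not running until start() is called.
class large_data_handler_sketch {
    bool _running = false;
public:
    void start() { _running = true; }
    void stop() { _running = false; }
    bool running() const { return _running; }
};

// Stand-in for the condition asserted in database::stop().
inline bool database_may_stop(const large_data_handler_sketch& h) {
    return !h.running();
}
```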
Rafael Ávila de Espíndola	33dfe34f78	db: Move nop_large_data_handler constructor out-of-line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:12:01 -08:00
Rafael Ávila de Espíndola	e99a225f25	db: Move large_data_handler::stop out-of-line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:11:49 -08:00
Rafael Ávila de Espíndola	9eae0b57a3	test: Enable all experimental features in the cql_repl The cql repl will hopefully be used to write most new tests, so it should have all experimental features enabled. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200204173448.95892-1-espindola@scylladb.com>	2020-02-04 19:36:37 +02:00
Avi Kivity	7d70bfe20c	Merge "Lua: Fix handling of list<varint> and list<decimal>" from Rafael " This patch series fixes #5711, enables UDF support in CQL tests and includes a few extra cleanups. " * 'espindola/lua-fixes' of https://github.com/espindola/scylla: lua: Use a negative index for consistency lua: Fix returning list<decimal> lua: Fix returning list<varint> lua: Use a lua_slice_state instead of a from_lua_visitor test: Enable UDF in the cql repl	2020-02-04 18:51:54 +02:00
Nadav Har'El	acafcbfdf4	alternator: use LWT timestamp, not current timestamp The DynamoDB API doesn't have the notion of client-supplied timestamps, so the server is supposed to use its own current timestamp for write operations. However, for LWT writes, we should not use this node's current time: Different nodes may slightly differ in their clocks, and LWT needs a monotonically-increasing notion of time for the consistent operations. LWT provides to the operation's apply() method the specific timestamp that it should use in its returned mutation - and we should use this timestamp, not the current timestamp. In the optional write modes where LWT is not used, we continue to use the current timestamp (api::new_timestamp()) as before. This patch fixes the PutItem, UpdateItem and DeleteItem operations. The BatchWriteItem operation is not yet fixed by this patch - fixing it will require more elaborate code changes so will be done in a separate patch. Refs #5653. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200130122853.7658-1-nyh@scylladb.com>	2020-02-04 10:18:49 +01:00
Nadav Har'El	0a23471eae	alternator: switch BatchWriteItems to use LWT too Today, we use LWT for all PutItem, UpdateItem and DeleteItem operations. We do this even for pure writes (writes which do not involve a read before the write). But BatchWriteItem also does pure writes, and it doesn't use LWT yet, so this patch changes it so it does. As before, we keep in the code (not yet configurable by a user) the option to do these unconditional writes without LWT. A BatchWriteItem may change multiple partitions (but a fairly low number: DynamoDB allows each BatchWriteItem to do only 25 updates) and we start the different LWT operations in parallel. This patch collects multiple mutations to the same partition together to be done with a single LWT operation, so we also add a test for this case, where we have a batch of writes involving several items in each of several partitions. Fixes #5637 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200128160538.11775-1-nyh@scylladb.com>	2020-02-04 10:08:18 +01:00
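The per-partition grouping described in the BatchWriteItem commit can be sketched as follows. This is a minimal illustration, assuming a hypothetical `write_request` type that carries a partition key, a clustering key and a value; the real Alternator code groups schema-aware mutations, not strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified write request (not Alternator's real type).
struct write_request {
    std::string partition_key;
    std::string clustering_key;
    std::string value;
};

// Group the writes of one batch by partition key, so that each partition
// can be applied with a single LWT operation instead of one LWT per item.
std::map<std::string, std::vector<write_request>>
group_by_partition(const std::vector<write_request>& batch) {
    std::map<std::string, std::vector<write_request>> groups;
    for (const auto& w : batch) {
        // Items keep their original relative order within each partition.
        groups[w.partition_key].push_back(w);
    }
    return groups;
}
```

With a batch of five items spread over two partitions this yields two groups, so two LWT operations (started in parallel) rather than five.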
Rafael Ávila de Espíndola	6764316576	cql3: Simplify maybe_quote This produces code that is just as fast as the previous implementation and is quite a bit easier to read IMHO. I benchmarked it by temporarily adding: BOOST_AUTO_TEST_CASE(bench_maybe_quote) { std::string val(1 << 20, 'x'); using clk = std::chrono::steady_clock; cql3::util::maybe_quote(val); auto start = clk::now(); for (int i = 0; i < 1000; ++i) { cql3::util::maybe_quote(val); } auto end = clk::now(); std::chrono::duration<double> duration = end - start; std::cout << "delta = " << duration.count() << '\n'; } Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200203225140.180262-1-espindola@scylladb.com>	2020-02-04 10:52:04 +02:00
Avi Kivity	cdecb21b78	Update seastar submodule * seastar 65980a9b30...30185fd901 (12): > sstring: resize: NulTerminate when downsizing > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > json/json_elements: Use double quotes around element name > Revert "reactor: make open_flags::dsync respect --unsafe-bypass-fsync" > Merge "smp: reduce allocations in work_item::process" from Avi > task: optimize destruction by making destructor non-virtual > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > Revert "sstring: resize: NulTerminate when downsizing" > sstring: resize: NulTerminate when downsizing > tests: Rename unix domain socket test for consistency > resource: downgrade cgroupsv2 message. > Merge "Simplify the stream/subscription implementation" from Rafael	2020-02-04 10:20:29 +02:00
Nadav Har'El	3de09042bb	CDC topology change support Merged pull request https://github.com/scylladb/scylla/pull/5485 by Kamil Braun: This series introduces the notion of CDC generations: sets of CDC streams used by the cluster to choose partition keys for CDC log writes. Each CDC generation begins operating at a specific time point, called the generation's timestamp (cdc_streams_timestamp in the code). It continues being used by all nodes in the cluster to generate log writes until superseded by a new generation. Generations are chosen so that CDC log writes are colocated with their corresponding base table writes, i.e. their partition keys (which are CDC stream identifiers picked from the generation operating at time of making the write) fall into the same vnode and shard as the corresponding base table write partition keys. Currently this is probabilistic and not 100% of log writes will be colocated - this will change in future commits, after per-table partitioners are implemented. CDC generations are a global property of the cluster -- they don't depend on any particular table's configuration. Therefore the old "CDC stream description tables", which were specific to each CDC-enabled table, were removed and replaced by a new, global description table inside the system_distributed keyspace. A new generation is introduced and supersedes the previous one whenever we insert new tokens into the token ring, which breaks the colocation property of the previous generation. The new generation is chosen to account for the new tokens and restore colocation. This happens when a new node joins the cluster. The joining node is responsible for creating and informing other nodes about the new CDC generation. It does that by serializing it and inserting into an internal distributed table ("CDC topology description table"). If it fails the insert, it fails the joining process. 
It then announces the generation to other nodes through gossip using the generation's timestamp, which is the partition key of the inserted distributed table entry. Nodes that learn about the new generation through gossip attempt to retrieve it from the distributed table. This might fail, for example if the node is partitioned away from all replicas that hold this generation's table entry. In that case the node might stop accepting writes, since it knows that it should send log entries to a new generation of streams, but it doesn't know what the generation is. The node will keep trying to retrieve the data in the background until it succeeds or sees that it is no longer necessary (e.g., because yet another generation superseded this one). So we give up some availability to achieve safety. However, this solution is not completely safe (it might break consistency properties): if a node learns about a new generation too late (if gossip doesn't reach this node in time), the node might send writes to the wrong (old) generation. In the future we will introduce a transaction-based approach where we will always make sure that all nodes receive the new generation before any of them starts using it (and if that is impossible, e.g. due to a network partition, we will fail the bootstrap attempt). In practice, if the admin makes sure that the cluster works correctly before bootstrapping a new node, and a network partition doesn't start in the few-second window where a new generation is announced, everything will work as it should. After the learning node retrieves the generation, it inserts it into an in-memory data structure called "CDC metadata". This structure is then used when performing writes to the CDC log -- given the timestamp of the written mutation, the data structure will return the CDC generation operating at this time point. 
CDC metadata might reject the query for two reasons: if the timestamp belongs to an earlier generation, which most probably doesn't have the colocation property anymore, or if it is picked too far into the future, where we don't know whether the current generation will be superseded by a different one (so we don't yet know the set of streams that this log write should be sent to). If the client uses server-generated timestamps, the query will never be rejected. Clients can also use client-generated timestamps, but they must make sure that their clocks are not too desynchronized from the database -- otherwise some or all of their writes to CDC-enabled tables will be rejected. In the case of a rolling upgrade, where we restart nodes that were previously running without CDC, we act a bit differently: there is no naturally selected joining node which must propose a new generation, so we have to select such a node by other means. For this we use a bully approach: every node compares its host id with the host ids of other nodes and if it finds that it has the greatest host id, it becomes responsible for creating the first generation. This change also fixes the way of choosing values of the "time" column of CDC log writes: the timeuuid is chosen in a way which preserves the ordering of corresponding base table mutations (the timestamp of this timeuuid is equal to the base table mutation timestamp). Warning: if you were running a previous CDC version (without topology change support), make sure to disable CDC on all tables before performing the upgrade. This will drop the log data -- back it up if needed. TODO in a future patchset: expire CDC generations. Currently, each inserted CDC generation will stay in the distributed tables forever (until manually removed by the administrator). When a generation is superseded, it should become "expired", and 24 hours after expiration, it should be removed. 
The distributed tables (cdc_topology_description and cdc_description) both have an "expired" column which can be used for this purpose. Unit tests: dev, debug, release dtests (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/907/	2020-02-04 10:20:29 +02:00
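The "CDC metadata" lookup described in the CDC topology change merge can be sketched as an ordered map from generation start timestamp to generation. This is a hypothetical simplification: a generation is reduced to a name, `max_future_skew` is an assumed tolerance window, and "belongs to an earlier generation" is modeled as "a successor generation has already started by `now`". The real structure and its rejection rules may differ.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of CDC metadata (not Scylla's real class).
class cdc_metadata_sketch {
    std::map<int64_t, std::string> _generations; // start timestamp -> generation
    int64_t _max_future_skew;                      // assumed tolerance window

public:
    explicit cdc_metadata_sketch(int64_t max_future_skew)
        : _max_future_skew(max_future_skew) {}

    void insert(int64_t start_ts, std::string generation) {
        _generations.emplace(start_ts, std::move(generation));
    }

    // Returns the generation operating at write_ts, or nullopt if the
    // write must be rejected.
    std::optional<std::string> get_generation(int64_t write_ts, int64_t now) const {
        // Too far in the future: a newer generation might still be announced.
        if (write_ts > now + _max_future_skew) {
            return std::nullopt;
        }
        // First generation starting strictly after write_ts.
        auto successor = _generations.upper_bound(write_ts);
        if (successor == _generations.begin()) {
            return std::nullopt; // no generation known for this time point
        }
        auto current = std::prev(successor);
        // The generation for write_ts has already been superseded.
        if (successor != _generations.end() && successor->first <= now) {
            return std::nullopt;
        }
        return current->second;
    }
};
```

A write stamped with the server's current time always maps to the generation in effect now, which matches the commit's statement that server-generated timestamps are never rejected.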
Gleb Natapov	2876482373	lwt: account for cases where LWT requests were moved to another shard in statistics Now that we bounce lwt requests to the correct shard before calling into storage_proxy, the cross-shard op accounting does not account for bounced lwt statements. Fix that by incrementing the corresponding counter when returning a "bounce" reply. Message-Id: <20200203122011.GH26048@scylladb.com>	2020-02-04 10:20:28 +02:00
Nadav Har'El	37f2f6112e	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling Merged patch series from Benny Halevy: The function was reimplemented to solve the following issues. The custom implementation also improved its performance by close to 19%. Using regex_match("[a-z][a-z0-9_]") may cause stack overflow on long input strings as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in-place so no doubling of quotes was actually done. Add a unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with stack overflow. Fixes #5671 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling * cql3::util::maybe_quote: further optimize quote doubling	2020-02-04 10:20:28 +02:00
Nadav Har'El	6e91f159fe	LWT: handle bounce_to_shard result for batch statements Merged patch series from Gleb Natapov: Batch statements can also execute LWT and hence need to handle the bounce_to_shard result. * transport: handle bounce_to_shard for batch statement * transport: consolidate bounce_to_shard handling between all three verbs that handle it	2020-02-04 10:20:28 +02:00
Takuya ASADA	1446fe930b	dist/redhat: install specified version of scylla-conf on meta package (#5599) To install a specified version of the scylla-conf package, we need to add it to Requires. Fixes #5639	2020-02-04 10:20:28 +02:00
Benny Halevy	f45fabab73	gossiper: do_stop_gossiping: copy live endpoints vector It can be resized asynchronously by mark_dead. Fixes #5701 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>	2020-02-04 10:20:28 +02:00
Avi Kivity	501b24cad3	test.py: use command line option in preference to environment variable when calling a test Command line options are printed out, so if a user cuts-and-pastes a command line they will get a run that is more similar to the one that the test executed. Message-Id: <20200202133209.209608-1-avi@scylladb.com>	2020-02-04 10:20:28 +02:00
Rafael Ávila de Espíndola	1294770970	lua: Use a negative index for consistency In this case we know the size of the stack and both indexes refer to the same position. Using a negative index is just more consistent with the rest of the file and hopefully a bit less brittle to future changes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:23:09 -08:00
Rafael Ávila de Espíndola	a4d668e8ed	lua: Fix returning list<decimal> We were accessing the wrong stack location if a decimal was not at the top of the stack. Fixes: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:10:04 -08:00
Rafael Ávila de Espíndola	39e637f6bf	lua: Fix returning list<varint> We were accessing the wrong stack location if a varint was not at the top of the stack. Refs: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:09:59 -08:00
Rafael Ávila de Espíndola	530779efb6	lua: Use a lua_slice_state instead of a from_lua_visitor A few places were using a from_lua_visitor only to access the lua_slice_state member variable. This is just a simplification. No functionality changed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:04:36 -08:00
Rafael Ávila de Espíndola	35023c831c	test: Enable UDF in the cql repl A followup commit will use this to write cql tests for UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 17:58:27 -08:00
Gleb Natapov	9c75a25e9f	transport: consolidate bounce_to_shard handling between all three verbs that handle it All three verbs that need to handle bounce_to_shard have almost identical process_() and process__on_shard() functions. Consolidate them into one to reuse the code.	2020-02-03 14:27:50 +02:00
Gleb Natapov	dd793098fa	transport: handle bounce_to_shard for batch statement Batch statements can also execute LWT and hence need to handle the bounce_to_shard result. Fixes: #5644	2020-02-03 14:27:30 +02:00
Pavel Emelyanov	8a7f13420f	gossiper: Avoid string merge-split for nothing The caller of check_knows_remote_features merges a set of features into a string, but the method in question ... splits them back into the set. Avoid this unneeded step and clean the respective storage service helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	ca55c6c15f	features: Stop on shutdown The service in question doesn't depend on anything, so it's started first and stopped last. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	f6f76ef8c1	storage_service: Remove helpers The storage_service no longer works as a features provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	0e62d615ae	storage_service: Prepare to switch from on-board feature helpers There are some places that get the global storage_service instance for individual features. In the next patch all these helpers will be removed, so here's the preparation for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	0abddc4557	cql3: Check feature in .validate There's no local variable to get features from in the create_view_statement constructor, but since the .validate is always called after it, it looks safe to check for needed feature in it (we have storage_proxy there). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	abe588888d	database: Use feature service Keep a local feature_service reference in database. This relaxes the circular storage_service <-> database reference, but does not remove it completely. This needs some argument shuffling in apply_to_builder, but it's rather straightforward, so it comes in the same patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	12c1378be0	storage_proxy: Use feature service Keep a reference to the local feature service in storage_proxy and use it in places that have the (local) storage_proxy at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	4f5b70dcb1	migration_manager: Use feature service This unties migration_manager from storage_service thus breaking the circular dependency between these two. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	74fd3466b5	start: Pass needed feature as argument into migrate_truncation_records As a nice side-effect this stops using global storage service instance by this function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	aa6b1efc35	features: Unfriend storage_service The storage service no longer needs to mess with feature config. It only needs two features to register itself in, but this can be solved by the respective cluster_supports_foo helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	9b67226715	features: Simplify feature registration Now features are registered into a map of vectors, but it looks like the vector is always 1-item long and is used to keep a pointer to the feature, instead of the feature itself. Switch it to a map of reference_wrapper-s. Before this patch we could register more than one feature under the same name; now we can't. But this seems to be OK, as we don't actually do this. To catch violations of this restriction there's an assert() in feature_service::register_feature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	da6af8bde7	features: Introduce known_feature_set There are two masks: supported and known. They differ in the unbounded_range_tombstones feature, which is set depending on the sstables format in use. Since the feature_service doesn't know anything about the sstables format, the logic is inverted: the feature service reports back the known mask (all features) and storage_service clears unbounded_range_tombstones if the sst format is too old. The resulting behavior is (hopefully) left intact. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
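The registration change (one feature per name, stored by reference, with an assert() against duplicates) can be sketched like this. The `feature` and `feature_registry` types here are hypothetical stand-ins, not `gms::feature` or `feature_service`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical minimal feature type; the real one carries more state.
struct feature {
    std::string name;
    bool enabled = false;
};

// One feature per name, kept by reference instead of in a 1-item vector.
class feature_registry {
    std::unordered_map<std::string, std::reference_wrapper<feature>> _features;

public:
    void register_feature(feature& f) {
        const bool inserted = _features.emplace(f.name, std::ref(f)).second;
        // Registering two features under the same name is a bug.
        assert(inserted && "feature registered twice under the same name");
        (void)inserted; // silence unused-variable warnings under NDEBUG
    }

    void enable(const std::string& name) {
        auto it = _features.find(name);
        if (it != _features.end()) {
            it->second.get().enabled = true; // mutates the caller's object
        }
    }

    std::size_t size() const { return _features.size(); }
};
```

Because the map stores references, enabling a feature flips the flag on the object owned by the registering subsystem; the registered objects must therefore outlive the registry.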
Pavel Emelyanov	4a01f468dd	features: Move disabled features set from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	a5b1998247	features: Move schema_features helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	b0638606e5	features: Move all features from storage_service to feature_service And leave some temporary storage_service->feature links. The plan is to make every subsystem that needs storage_service for features stop doing so and switch to the feature_service. The feature_service is a service without any dependencies, so it will be freed last, making the service dependency graph a tree rather than a graph with loops. While at it, drop the _FEATURE suffix from all const-s (currently both forms exist) and const-qualify cluster_supports_lwt(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	49de3b4ad8	storage_service: Use feature_config from _feature_service This makes the testing/prod config logic much simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	052259f8ef	features: Add feature_config Some features take db::config to find out whether to be enabled or disabled. This creates an unwanted dependency between database and features, so split the features configuration explicitly. Also this will make the "this is for testing env only" logic cleaner and simpler to understand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	d38f8ca52a	storage_service: Kill set_disabled_features The _disabled_features is configured by tests via storage_service constructor, so the helper in question is effectively useless. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	76a7fd4186	gms: Move features stuff into own .cc file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:21 +03:00
Kamil Braun	4b3754ff94	docs: add documentation about CDC generations	2020-02-03 10:57:31 +01:00
Kamil Braun	b130b76274	test: disable CDC flag by default When CDC flag is on, the node startup procedure takes a few seconds longer (we have to generate CDC streams). This is not necessary in non-CDC tests.	2020-02-03 10:57:31 +01:00
Kamil Braun	0d41e2c1fe	test: add cdc::generate_timeuuid tests	2020-02-03 10:57:31 +01:00
Kamil Braun	5fb5925fb4	test: add cdc::find_timestamp tests	2020-02-03 10:57:31 +01:00
Kamil Braun	7cb6ac33f5	storage_service: check if we know other nodes' tokens when joining ring If we are a seed node (but not the only one) or we set auto_bootstrap=off, it might happen due to misconfiguration or a network partition that we don't know other nodes' tokens at the end of the join_token_ring function, when we go into the NORMAL status, finishing the joining process. CDC however requires that we know other nodes' tokens at this point: we need them to correctly create a new CDC generation. This commit adds a check which prevents the node from starting if that's not the case. If the check fails, the node first waits a bit until it either learns about the tokens or times out.	2020-02-03 10:57:28 +01:00
Pavel Emelyanov	7a2123c8dc	migration_manager: Move some fns into class These methods will need to have this-> in one of the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 12:29:54 +03:00
Avi Kivity	2816404f57	test.py: documented exit code value Document our chosen exit failure code value and its relationship to git bisect. Message-Id: <20200202134223.210578-1-avi@scylladb.com>	2020-02-03 00:58:58 +02:00
Avi Kivity	541893e69a	Merge "Fix conversion of lua nil to cql null" from Rafael " The fix itself is fairly simple, but looking at the code I found that our code base was not cleanly distinguishing null and empty values and was treating null and missing values differently, but that distinction was dead since a null is represented as a dead cell. " * 'espindola/lua-fix-null-v6' of https://github.com/espindola/scylla: lua: Handle nil returns correctly types: Return bytes_opt from data_value::serialize query-result-set: Assert that we don't have null values types: Fix comparison of empty and null data_values Revert "tests: Handle null and not present values differently" query-result-set: Avoid a copy during construction types: Move operator== for data_value out-of-line	2020-02-02 15:43:24 +02:00
Avi Kivity	c8890eb124	Merge "Simplify usage of stream subscriptions" from Rafael " In a few places, the only use we had for a subscription was calling done(). With this series we now call done() early and store the future<> instead. " * 'espindola/stream-cleanup' of https://github.com/espindola/scylla: sstable_test: Store a future<> instead of a subscription commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper lister: Store a future<> instead of a subscription	2020-02-02 14:49:00 +02:00
Rafael Ávila de Espíndola	5dfb658e77	build: Add two missing dependencies With this change we always rebuild seastar/libseastar_testing.a for the same reason we always rebuild seastar/libseastar.a: we have no idea what its dependencies are; we have to recurse into seastar to find out. The other missing dependency is that we have to rebuild build.ninja when seastar/CMakeLists.txt changes. A change in seastar/CMakeLists.txt can cause seastar.pc to change, which can change the command lines used. That is incomplete, as changes to other seastar files can have the same impact, but it is better than nothing. It is not sufficient to put a dependency on the seastar.pc file, as that file will be modified when cmake is run and the scylla ninja process doesn't see the CMakeLists.txt to seastar.pc edge. Fixes: #5687 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200201001126.458992-1-espindola@scylladb.com>	2020-02-01 21:08:26 +02:00
Pavel Emelyanov	4839ca8491	storage_service: Unregister from gossiper notifications ... at all This unregistration doesn't happen currently, but doesn't seem to cause any problems in general, as on stop the gossiper is stopped and nothing from it hits the storage_service. However (!) if an exception pops up between the moment storage_service subscribes to the gossiper and the moment the drain_on_shutdown defer action is set up, we _may_ get into the following situation: - main's stuff gets unwound - gossiper is not stopped (drain_on_shutdown defer is not set up) - migration manager is stopped (with deferred action in main) - a notification comes from gossiper -> storage_service::on_change might want to pull schema with the help of local migration manager -> assert(local_is_initialized) strikes Fix this by registering storage_service to gossiper a bit earlier (both are already initialized by that time) and setting up the unregister defer right afterwards. Test: unit(dev), manual start-stop Bug: #5628 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200130190343.25656-1-xemul@scylladb.com>	2020-01-31 14:02:18 +01:00
Avi Kivity	ec5b721db7	test: make eventually() more patient We use eventually() in tests to wait for eventually consistent data to become consistent. However, we see spurious failures indicating that we wait too little. Increasing the timeout has a negative side effect in that tests that fail will now take longer to do so. However, this negative side effect is negligible compared to false-positive failures, since they throw away large test efforts and sometimes require a person to investigate the problem, only to conclude it is a false positive. This patch therefore makes eventually() more patient, by a factor of 32. Fixes #4707. Message-Id: <20200130162745.45569-1-avi@scylladb.com>	2020-01-31 14:02:18 +01:00
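The fix pattern (subscribe, then arm the unsubscribe immediately afterwards) is the classic scope-guard idiom. The sketch below is self-contained and hypothetical: `notifier` stands in for the gossiper's subscriber list and `deferred_action` for a defer helper; neither is Scylla's or Seastar's actual API. In main() such a guard lives until shutdown; here it lives until the function returns.

```cpp
#include <functional>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the gossiper's subscriber list.
class notifier {
    std::set<int> _subscribers;

public:
    void subscribe(int id) { _subscribers.insert(id); }
    void unsubscribe(int id) { _subscribers.erase(id); }
    bool has(int id) const { return _subscribers.count(id) != 0; }
};

// Minimal scope guard: runs the action when it goes out of scope,
// including during stack unwinding caused by an exception.
class deferred_action {
    std::function<void()> _action;

public:
    explicit deferred_action(std::function<void()> a) : _action(std::move(a)) {}
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    ~deferred_action() { _action(); }
};

// Subscribe and arm the unsubscribe right away, so a failure in any later
// startup step cannot leave a dangling subscription behind.
void start_service(notifier& n, int id, bool fail_later) {
    n.subscribe(id);
    deferred_action unsubscribe_on_exit([&n, id] { n.unsubscribe(id); });
    if (fail_later) {
        throw std::runtime_error("later startup step failed");
    }
    // ... normal operation would continue here ...
}
```

The key point is that no fallible step sits between `subscribe()` and the construction of the guard, which is exactly the window the commit closes.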
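The log does not say how eventually() is implemented. One common shape is a retry loop with doubling delays, sketched below as a hypothetical `eventually_sketch` (not the real test helper). With n attempts and an initial delay d, the total sleep is d * (2^(n-1) - 1), so adding five attempts multiplies the worst-case wait by roughly 2^5 = 32, which is one way a "factor of 32" could arise.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical retry helper: call f until it stops throwing, sleeping
// initial_delay, 2x, 4x, ... between attempts; rethrow after max_attempts.
void eventually_sketch(const std::function<void()>& f,
                       int max_attempts,
                       std::chrono::microseconds initial_delay) {
    auto delay = initial_delay;
    for (int attempt = 1; ; ++attempt) {
        try {
            f();
            return; // the condition finally holds
        } catch (...) {
            if (attempt >= max_attempts) {
                throw; // give up: propagate the last failure
            }
        }
        std::this_thread::sleep_for(delay);
        delay *= 2;
    }
}
```

The trade-off named in the commit shows up directly: a test that never passes now sleeps through every doubling before failing.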
Dejan Mircevski	6661ed7de4	cql3: Drop restrictions::values() method No-one seems to invoke this method. Instead, clients invoke restriction::values (note singular "restriction"). Most subclasses of restrictions also inherit from restriction, so values() still exists in their public interface. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-31 13:05:51 +01:00
Avi Kivity	985e00efa6	Merge "Fix the serialization of negative varint values" from Rafael " Benny pointed out that we could avoid a branch inside a loop in the old serialization code. That got me looking at the logic and I found that it would also produce an unnecessary 0xff prefix for some negative numbers. This patch series fixes the serialization and optimizes it. It now does no extra copies for positive numbers and only one extra copy for negative numbers, which I think is optimal since cpp_int uses sign magnitude and we want the two's complement representation. " * 'espindola/serialize_varint-improvements-v2' of https://github.com/espindola/scylla: types: Use a fancy iterator to avoid a temporary buffer types: Use export_bits to serialize cpp_int types: Avoid a branch in a loop types: Fix encoding of negative varint types: Replace "num.sign() < 0" with "num < 0"	2020-01-30 20:35:54 +02:00
Rafael Ávila de Espíndola	cc81ba3432	types: Use a fancy iterator to avoid a temporary buffer By using a fancy iterator we can avoid calling export_bits with a temporary buffer before copying the result to the output. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	7e67ce0bdb	types: Use export_bits to serialize cpp_int This avoids a copy when serializing positive numbers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	27a67f1a2c	types: Avoid a branch in a loop Thanks to Benny for the suggestion. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	c89c90d07f	types: Fix encoding of negative varint We would sometimes produce an unnecessary extra 0xff prefix byte. The new encoding matches what cassandra does. This was both an efficiency and a correctness issue, as using varint in a key could produce different tokens. Fixes #5656 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:25:09 -08:00
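The target encoding is the shortest big-endian two's complement byte string, the same shape Java's BigInteger.toByteArray() produces. The sketch below is simplified to int64_t (the real code serializes arbitrary-precision cpp_int), and `encode_varint` is a hypothetical name. Minimality is what makes the bytes, and therefore the token, canonical: an encoder that emitted, say, ff 80 instead of 80 for -128 would describe the same number with different key bytes.

```cpp
#include <cstdint>
#include <vector>

// Shortest big-endian two's complement encoding of v (simplified sketch).
std::vector<uint8_t> encode_varint(int64_t v) {
    std::vector<uint8_t> out; // built little-endian, reversed at the end
    for (;;) {
        const uint8_t byte = static_cast<uint8_t>(v & 0xff);
        out.push_back(byte);
        v >>= 8; // arithmetic shift: negative values stay negative
        const bool sign_bit = (byte & 0x80) != 0;
        // Stop once the rest is pure sign extension of the byte just
        // written: 0 with the sign bit clear, or -1 with it set.
        if ((v == 0 && !sign_bit) || (v == -1 && sign_bit)) {
            break;
        }
    }
    return std::vector<uint8_t>(out.rbegin(), out.rend());
}
```

For example, 128 needs a leading 00 to keep its sign bit clear (00 80), while -128 fits in the single byte 80 and -129 needs ff 7f.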
Rafael Ávila de Espíndola	ed747122aa	types: Replace "num.sign() < 0" with "num < 0" Surprisingly, this produces better code with cpp_int. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:24:03 -08:00
Rafael Ávila de Espíndola	cc9495d4d3	sstable_test: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	da984f1f33	commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	b88f6edee0	lister: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Gleb Natapov	b08679e1d3	db/system_keyspace: use user memory limits for local.paxos table Treat writes to local.paxos as user memory, as the number of writes is dependent on the amount of user data written with LWT. Fixes #5682 Message-Id: <20200130150048.GW26048@scylladb.com>	2020-01-30 17:07:27 +02:00
Piotr Sarna	b783d40aaf	Merge 'Add per scheduling groups statistics' from Eliran This set implements support for per scheduling group statistics in storage proxy and tables view statistics (although tables view per scheduling group stats are not actively applied in this series). Having those statistics per scheduling group can help in finding operations that are performed outside their context; another advantage is that it lays the groundwork for supporting per service level statistics for the workload prioritization enterprise feature. At some point there was a thought to add those stats per role, but it is not feasible at the moment: 1. The number of roles/users is unbounded, so it is dangerous to hold stats (in memory) for all of them. 2. We will need a proper design of how to deal with the hierarchical nature of roles in the stats. Besides these reasons, it is beneficial to look at resource-related stats per scheduling group: looking at resources per user or role will not necessarily give insights, since resources are divided per sg and not per role, so it can lead to false conclusions if more than one role is attached to the same service level. Tests: unit tests (Dev, Debug) validating the stats with monitor * es/per_sg_stats/v6: storage proxy: migrate to per scheduling group statistics internalize storage proxy statistics metric registration	2020-01-30 15:02:33 +01:00
Eliran Sinvani	971711a546	storage proxy: migrate to per scheduling group statistics This commit builds on top of the introduced per scheduling group statistics template and employs it to achieve per scheduling group statistics in storage_proxy. Some of the statistics also had meaning as global (per shard) ones: those used to determine whether to throttle a write request. This was handled by creating a global stats struct that holds those stats and by changing the stat update to also update the global one. One complication is an already existing aggregation over the per shard stats, which have now become per scheduling group per shard stats, turning the aggregation into a two-dimensional one. One thing this commit doesn't handle is validating that an individual statistic didn't "cross a scheduling group boundary"; such validation is possible and can easily be added in the future. There is a subtlety to doing so: if the operation did cross into another scheduling group, two connected statistics, for example written bytes and completed write transactions, can lose balance. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:44 +01:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups; instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats, as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
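The two-dimensional aggregation mentioned in the storage proxy commit can be illustrated with a hypothetical `stats[shard][sg]` layout: what used to be a sum over shards is now, per scheduling group, a sum over the shard dimension. The `write_stats` struct and `aggregate_per_sg` function are assumptions for illustration, not storage_proxy's real types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-(shard, scheduling group) counters.
struct write_stats {
    uint64_t writes = 0;
    uint64_t bytes_written = 0;
};

// stats[shard][sg] -> totals[sg]: sum each scheduling group across shards.
std::vector<write_stats>
aggregate_per_sg(const std::vector<std::vector<write_stats>>& stats, std::size_t num_sgs) {
    std::vector<write_stats> totals(num_sgs);
    for (const auto& per_shard : stats) {
        for (std::size_t sg = 0; sg < num_sgs && sg < per_shard.size(); ++sg) {
            totals[sg].writes += per_shard[sg].writes;
            totals[sg].bytes_written += per_shard[sg].bytes_written;
        }
    }
    return totals;
}
```

Summing `totals` over all scheduling groups recovers the old per-node total, which is a cheap consistency check for such a migration.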
Gleb Natapov	c138dfd33e	lwt: introduce LWT gossiper feature Do not allow LWT operations unless LWT is enabled by the entire cluster. Message-Id: <20200130120912.GV26048@scylladb.com>	2020-01-30 15:12:56 +02:00
Benny Halevy	606db0d412	cql3::util::maybe_quote: further optimize quote doubling Avoid string copies when doubling quotes in the string by counting them while scanning the input string and reserving the required space when creating the result std::string. This showed a performance improvement of ~1.8% when running the maybe_quote unit test in a tight loop (with the shorter strings only). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-30 14:55:51 +02:00
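The count-then-reserve approach can be sketched as follows. `double_quotes` is a hypothetical helper name; the real maybe_quote also decides whether quoting is needed at all and adds the surrounding quotes, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Double every '"' in s with a single allocation: count first, then reserve.
std::string double_quotes(std::string_view s) {
    std::size_t quotes = 0;
    for (char c : s) {
        if (c == '"') {
            ++quotes;
        }
    }
    std::string result;
    result.reserve(s.size() + quotes);  // exact final size, so no regrowth
    for (char c : s) {
        result.push_back(c);
        if (c == '"') {
            result.push_back('"');      // emit the quote a second time
        }
    }
    assert(result.size() == s.size() + quotes);
    return result;
}
```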
Rafael Ávila de Espíndola	a16cb00719	configure: Don't use -Wno-error when building seastar This depends on the recent patches to avoid warnings in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200127210833.200410-1-espindola@scylladb.com>	2020-01-30 14:10:18 +02:00
Avi Kivity	09e2556541	Update seastar submodule * seastar 44cf127ee9...65980a9b30 (2): > io_tester: fix the fix for lack of file closing > cmake: Disable broken gcc warning -Warray-bounds	2020-01-30 14:10:18 +02:00
Avi Kivity	b01f0cab60	utils: add missing include for ssize_t gcc 10 tightened its C++ includes to no longer provide ssize_t, so we must get it from a C header instead. Message-Id: <20200129205912.21139-1-avi@scylladb.com>	2020-01-30 14:10:18 +02:00
Avi Kivity	adb64dc72f	treewide: tighten concepts syntax gcc 10 requires a semicolon after every compound requirement, as per the standard. Add missing semicolons where necessary. Message-Id: <20200129205805.20928-1-avi@scylladb.com>	2020-01-30 14:10:18 +02:00
Rafael Ávila de Espíndola	4b4efcf302	types: Remove collection_type_impl::serialize The rest of the serialize api has been devirtualized some time ago, but this auxiliary function stayed virtual. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129203916.20460-1-espindola@scylladb.com>	2020-01-30 14:10:18 +02:00
Kamil Braun	bd42b10df1	cdc: rename cdc/cdc.{hh,cc} to cdc/log.{hh,cc} This increases modularity, making it easier to find what is where and to maintain the code. The 'log' module (cdc/log.{hh,cc}) is responsible for updating CDC log tables when base table writes are performed. The 'generation' module (cdc/generation.{hh,cc}) handles stream generation changes in response to topology change events. cdc/metadata.{hh,cc} contains a helper class which holds the currently used generation of streams. It is used by both aforementioned modules: 'log' queries it, while 'generation' updates it.	2020-01-30 11:10:39 +01:00
Kamil Braun	1a56310687	locator: remove get_shard_count and get_ignore_msb_bits from snitch Snitch forms a class hierarchy which get_shard_count and get_ignore_msb_bits ignore (their returned values only depend on the gossiper's state). Besides, these functions just don't belong there. Snitch has nothing to do with shard_count or ignore_msb_bits.	2020-01-30 11:10:08 +01:00
Kamil Braun	e91af78cf5	cdc: update streams description table Inform CDC users about newly generated streams.	2020-01-30 11:10:08 +01:00
Kamil Braun	cbe510d1b8	cdc: use stream generations Change the CDC code to use the global CDC stream generations. The per-base-table CDC description table was removed. The code instead uses cdc::metadata which is updated on gossip events. The per-table description tables were replaced by a global description table to be used by clients when searching for streams.	2020-01-30 11:10:08 +01:00
Kamil Braun	8f4a2ba0b9	storage_service: learn about CDC stream generations. When a node learns that another node joins the cluster (or begins the joining process, i.e. bootstrap), it will read the CDC generation timestamp proposed by that node, use it to retrieve the generation from the distributed generations table, and save it in its local generation queue to be used for writing to the CDC log when its local clock crosses the generation's timestamp. The CDC generation is saved in the queue before tokens are saved in token_metadata. This is important so that when the node becomes a coordinator of a write, it will already have all the necessary information required to generate a corresponding CDC log mutation. After joining, nodes should keep gossiping their proposed stream generation timestamp indefinitely; if they learn about a newer timestamp, they start gossiping the new timestamp instead. There is one case where a node won't gossip any generation timestamp: if it's upgrading from a non-CDC version. In this situation we make one of the nodes begin the first generation.	2020-01-30 11:10:08 +01:00
Kamil Braun	834c2ca997	cdc: add cdc::metadata class The class stores a queue of CDC generations to be used for choosing streams when writing to the CDC log. This data structure will be updated on some gossip events (when a new node joins the cluster and proposes a new generation of CDC streams).	2020-01-30 11:10:08 +01:00
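A minimal sketch of a timestamp-ordered generation queue, assuming generations are reduced to string labels and timestamps to plain integers (the real cdc::metadata stores stream sets and uses Scylla's clock types; `generation_queue` is a hypothetical name): the generation used for a write is the one with the greatest start timestamp that is not later than the write's timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

class generation_queue {
    // Generations keyed by the timestamp from which they take effect.
    std::map<int64_t, std::string> _gens;
public:
    void insert(int64_t start_ts, std::string gen) {
        [[maybe_unused]] bool inserted = _gens.emplace(start_ts, std::move(gen)).second;
        assert(inserted && "two generations with the same start timestamp");
    }

    // Generation in effect at `ts`, or nullopt if none has started yet.
    std::optional<std::string> get_for(int64_t ts) const {
        auto it = _gens.upper_bound(ts);  // first generation starting after ts
        if (it == _gens.begin()) {
            return std::nullopt;
        }
        return std::prev(it)->second;
    }
};
```

An ordered map makes both operations logarithmic, and a gossip event that announces a new generation only needs an `insert`.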
Kamil Braun	86af2a63ec	clocks: add printing functions For debugging and logging.	2020-01-30 11:10:08 +01:00
Kamil Braun	34e4ce275d	storage_service: restore CDC streams timestamp when replacing a node When a node is replacing another node it will keep gossiping its CDC streams generation timestamp.	2020-01-30 11:10:08 +01:00
Kamil Braun	a6e62dba95	cdc: add get_streams_timestamp_for(endpoint) method In future commits this will be used by nodes learning about other nodes entering NORMAL status. The joining node proposes a new generation of streams, whose timestamp is gossiped by the node.	2020-01-30 11:10:08 +01:00
Kamil Braun	37ae37db38	storage_service: move get_application_state_value method to gossiper	2020-01-30 11:10:08 +01:00
Kamil Braun	b44c63a127	storage_service: small refactors in prepare_replacement_info	2020-01-30 11:10:08 +01:00
Kamil Braun	32f4489a18	storage_service: generate CDC streams generation and gossip its timestamp. Generate a new generation of streams during bootstrap, insert it into an internal distributed table for other nodes to read and save its timestamp in the system.local table. When restarting, read the generation timestamp from the system.local table. Gossip the generation timestamp.	2020-01-30 11:10:08 +01:00
Kamil Braun	19f23c6de1	cdc: add cdc-related node startup functions	2020-01-30 11:10:08 +01:00
Kamil Braun	96e5d6c924	token_metadata: add count_normal_token_owners method	2020-01-30 11:10:08 +01:00
Kamil Braun	52d71832f8	gossiper: make some methods const	2020-01-30 11:10:08 +01:00
Kamil Braun	3ae7b6cbc4	versioned_value: add cdc_streams_timestamp This will be used to inform other nodes that a new CDC streams generation has been created.	2020-01-30 11:10:08 +01:00
Kamil Braun	7fa30f6f34	db: add a system.cdc_local table with CDC generation timestamp This will be used to persist CDC streams generation timestamp proposed by a joining node in case the node crashes or restarts, similarly to the way tokens are persisted. The get_saved_cdc_streams_timestamp method retrieves the generation timestamp from the system table. It will be used by a restarting node. The update_cdc_streams_timestamp method saves CDC stream generation timestamp of the calling node in the system table. A joining node will persist the timestamp before it proposes it to other nodes.	2020-01-30 11:10:08 +01:00
Piotr Jastrzebski	04fe18de0f	system_distributed_keyspace: add cdc-related tables The cdc_topology_description table will be used internally by nodes to send new CDC stream generations to other nodes. The cdc_description table is a user-facing table, used to inform users about new sets of CDC streams. Regenerate sstables and digests for schema_change_test. We don't need to protect this change by a schema feature: when a node creates these tables, it announces them to all other nodes. If schema agreement happens before this migration, all nodes will use a digest calculated without these tables. If it happens after, then all nodes will eventually know about these tables and use a digest calculated with these tables.	2020-01-30 11:10:08 +01:00
Piotr Jastrzebski	9fa18c03c1	cdc: add generate_topology_description cdc::topology_description describes a mapping of tokens to CDC streams. The cdc::generate_topology_description function is given: 1. a set of tokens which split the token ring into token ranges (vnodes), and 2. information on how each token range is distributed among its owning node's shards. It tries to generate a set of CDC stream identifiers such that for each (shard, vnode) pair there exists a stream whose token falls into that vnode and is owned by that shard. It then builds a cdc::topology_description which maps tokens to these stream identifiers, such that if token T is owned by shard S in vnode V, it gets mapped to the stream identifier generated for (S, V).	2020-01-30 11:10:07 +01:00
Piotr Jastrzebski	a3748f942e	cdc: add topology_description class This is a class that will be used for storing information required to perform CDC operations, i.e. assignment of token ranges to CDC streams. It is serializable to bytes and will be stored in such a form in a distributed table accessible by all nodes.	2020-01-30 11:10:07 +01:00
Kamil Braun	36ee36618a	dht: add i_partitioner::shard_of(token, shard_count, ignore_msb) method Allows calculating the shard of the given token using custom values of shard_count and sharding_ignore_msb (instead of the ones used by the particular partitioner instance).	2020-01-30 11:10:07 +01:00
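A sketch of computing a shard from a murmur3 token with explicit shard_count and ignore_msb arguments. It follows the general approach of biasing the signed token to an unsigned value, shifting out the ignored most significant bits, and scaling into [0, shard_count) with a 128-bit multiplication; treat it as an illustration, not the exact dht code. `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Map a signed 64-bit token to a shard in [0, shard_count).
unsigned shard_of(int64_t token, unsigned shard_count, unsigned ignore_msb) {
    assert(shard_count > 0 && ignore_msb < 64);
    // Flipping the sign bit maps INT64_MIN..INT64_MAX onto 0..2^64-1
    // while preserving order.
    uint64_t z = static_cast<uint64_t>(token) ^ (uint64_t(1) << 63);
    // Drop the top `ignore_msb` bits: the ring is split into 2^ignore_msb
    // stripes, and each stripe is divided among all shards.
    z <<= ignore_msb;
    // Treat z as a fraction of 2^64 and scale it to the shard count.
    return static_cast<unsigned>((static_cast<unsigned __int128>(z) * shard_count) >> 64);
}
```

Passing shard_count and ignore_msb explicitly, instead of reading them from the partitioner instance, lets a node compute shard ownership for other nodes whose shard configuration differs from its own.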
Kamil Braun	f4f8593bac	dht/murmur3_partitioner: take private methods out of the class The methods were made static functions of the murmur3_partitioner module.	2020-01-30 11:09:48 +01:00
Benny Halevy	0329fe1fd1	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling The function was reimplemented to solve the issues below; the custom implementation also improves its performance by close to 19%. Using regex_match("[a-z][a-z0-9_]*") may cause a stack overflow on long input strings, as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in place, so no doubling of quotes was actually done. Add a unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with a stack overflow. Fixes #5671 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-30 12:00:30 +02:00
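The regex check can be replaced by a plain loop. This is a sketch, not the committed code: `is_unquoted_identifier` is a hypothetical name for a test equivalent to regex_match against "[a-z][a-z0-9_]*". Because it iterates instead of relying on the regex engine, its stack usage does not grow with the input length.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Equivalent to std::regex_match(s, std::regex("[a-z][a-z0-9_]*")),
// using constant stack space regardless of input length.
bool is_unquoted_identifier(std::string_view s) {
    if (s.empty() || !(s[0] >= 'a' && s[0] <= 'z')) {
        return false;
    }
    for (char c : s.substr(1)) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return false;
        }
    }
    return true;
}
```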
Rafael Ávila de Espíndola	e4b8f52237	commitlog: Simplify the return of read_log_file This function really just wants to signal it is done, so return a future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200128172847.31513-1-espindola@scylladb.com>	2020-01-30 12:00:29 +02:00
Gleb Natapov	67deab0661	test: fix cql_repl to be able to run lwt tests on smp Handle bounce_to_shard result properly in cql_repl. Message-Id: <20200129122547.GO26048@scylladb.com>	2020-01-30 11:37:27 +02:00
Konstantin Osipov	4d3423b983	test.py: add a help file Message-Id: <20200128210426.24509-2-kostja@scylladb.com>	2020-01-30 11:05:02 +02:00
Avi Kivity	5842833d62	test.py: change test failure exit code to be more friendly to git bisect test.py returns -1 on failure; exit() translates that to 255, which git bisect interprets as a special exit code requiring manual intervention. Change to return the more traditional 1 on failure, which git bisect can interpret as a normal failure condition. Message-Id: <20200130084950.4186598-1-avi@scylladb.com>	2020-01-30 11:02:22 +02:00
Rafael Ávila de Espíndola	090164791c	logalloc: Store unused ids in a std::vector There doesn't seem to be any requirement for how unused ids are reused, so we may as well use the simpler type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129211154.47907-1-espindola@scylladb.com>	2020-01-30 10:31:16 +02:00
Rafael Ávila de Espíndola	bd7593eab3	lua: Handle nil returns correctly With this patch lua nil values are mapped to CQL null values instead of producing an error. Fixes #5667 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 14:05:01 -08:00
Rafael Ávila de Espíndola	bd93a0af52	types: Return bytes_opt from data_value::serialize Since a data_value can contain a null value, returning bytes from serialize() was losing information as it was mapping null to empty. This also introduces a serialize_nonnull that still returns bytes, but results in an internal error if called with a null value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 14:04:59 -08:00
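A minimal sketch of the API split, assuming `bytes` is modeled as std::string and a data_value as an optional 32-bit int (the real type is far more general): serialize() keeps null distinguishable from an empty value, and serialize_nonnull() is for callers that know the value is present. on_internal_error is modeled here as throwing std::logic_error.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

using bytes = std::string;               // stand-in for Scylla's bytes type
using bytes_opt = std::optional<bytes>;

// Hypothetical, heavily simplified data_value holding an optional int32.
struct data_value {
    std::optional<int32_t> v;

    // Null maps to nullopt, so it can no longer be confused with "empty".
    bytes_opt serialize() const {
        if (!v) {
            return std::nullopt;
        }
        bytes out(4, '\0');
        uint32_t u = static_cast<uint32_t>(*v);
        for (int i = 0; i < 4; ++i) {    // big-endian encoding
            out[i] = static_cast<char>((u >> (24 - 8 * i)) & 0xff);
        }
        return out;
    }

    // For callers that know the value cannot be null.
    bytes serialize_nonnull() const {
        bytes_opt b = serialize();
        if (!b) {
            throw std::logic_error("serialize_nonnull() called on a null data_value");
        }
        return *b;
    }
};
```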
Avi Kivity	5137b596f8	build_id: add missing include for assert() build_id.cc uses assert() but doesn't include the header. Reviewed-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129205515.20406-1-avi@scylladb.com>	2020-01-29 23:44:50 +02:00
Rafael Ávila de Espíndola	2b45edd97e	query-result-set: Assert that we don't have null values Null values are represented with dead cells and never included in a result_set. To enforce that, this adds a non_null_data_value that wraps a data_value and whose constructor calls on_internal_error if a null data_value is passed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	3abac35d9f	types: Fix comparison of empty and null data_values Before this patch a null data_value would compare equal to any data_value that serialized to an empty byte sequence. With this patch null only compares equal to null. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
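The difference can be shown with a hypothetical `value` type that is either null or a (possibly empty) byte string. `buggy_equal` mimics the old behavior described above, where null and an empty serialization collapse together; `fixed_equal` treats null as equal only to null.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct value {
    std::optional<std::string> bytes;  // nullopt represents null
};

// Old behavior: null is compared as if it were empty bytes.
bool buggy_equal(const value& a, const value& b) {
    return a.bytes.value_or("") == b.bytes.value_or("");
}

// Fixed behavior: null equals only null; otherwise compare contents.
bool fixed_equal(const value& a, const value& b) {
    if (!a.bytes || !b.bytes) {
        return !a.bytes && !b.bytes;
    }
    return *a.bytes == *b.bytes;
}
```

std::optional's own operator== already has the fixed semantics, so in this model `fixed_equal(a, b)` is equivalent to `a.bytes == b.bytes`.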
Rafael Ávila de Espíndola	9031294ea9	Revert "tests: Handle null and not present values differently" This reverts commit `2ebd1463b2`. The test introduced by that commit was wrong, and in fact depended on a bug in operator== for data_value. A followup patch fixes operator==, so this reverts the broken commit first. The reason it was broken was that it created a live cell with a null data_value. In reality, null values are represented with dead cells. For example, the sstable produced by CREATE TABLE my_table (key int PRIMARY KEY, v1 int, v2 int) with compression = {'sstable_compression': ''}; INSERT INTO my_table (key, v1, v2) VALUES (1, 42, null); Is 00 04 key_length 00 00 00 01 key 7f ff ff ff local_deletion_time 80 00 00 00 00 00 00 00 marked_for_delete_at 24 HAS_ALL_COLUMNS \| HAS_TIMESTAMP 09 row_body_size 12 prev_unfiltered_size 00 delta_timestamp 08 USE_ROW_TIMESTAMP_MASK 00 00 00 2a value 0d USE_ROW_TIMESTAMP_MASK \| HAS_EMPTY_VALUE_MASK \| IS_DELETED_MASK 00 deletion time 01 END_OF_PARTITION Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	66290c3bb9	query-result-set: Avoid a copy during construction No functionality change. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	02e8e8d6b3	types: Move operator== for data_value out-of-line Most of the work is done by decompose and compare which are out-of-line anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Piotr Sarna	d13492485f	alternator: restore Python2 compatibility for test_tag ... by explicitly declaring utf-8 encoding. Message-Id: <e99789876176cf722ccfc297621338dc93843588.1580301449.git.sarna@scylladb.com>	2020-01-29 18:11:47 +02:00
Nadav Har'El	ce0c9c1044	merge: add tagging to alternator Merged patch series from Piotr Sarna: This series adds the following to alternator: - TagResource request - UntagResource request - ListTagsOfResource request - Honoring "Tags" parameter in CreateTable It also provides more tests for above features and extended docs. Tagging is backed by a schema extension, which is in turn backed by entries in system_schema.tables.extensions map. Tags are considered part of the schema, and in particular they are updated via an equivalent of: ALTER TABLE table WITH scylla_tags = {'key1':'v1', 'key2':'v2'} Each tag change is therefore a schema change, which also means that editing tags for the same table on different nodes may be subject to races, until the schema agreement issues are resolved in Scylla. Fixes #5066 Tests: alternator-test(local, remote) Piotr Sarna (6): alternator,main: add tags schema extension alternator: add creating values from string views alternator: implement tagging alternator: allow tagging on table creation docs: add entries for alternator tags and arn alternator-test: make test tables case sensitive alternator-test/test_tag.py \| 63 ++++++++++- alternator-test/util.py \| 2 +- alternator/executor.cc \| 191 ++++++++++++++++++++++++++++++++-- alternator/executor.hh \| 3 + alternator/rjson.cc \| 4 + alternator/rjson.hh \| 1 + alternator/server.cc \| 3 + alternator/tags_extension.hh \| 52 +++++++++ docs/alternator/alternator.md \| 14 ++- main.cc \| 5 + 10 files changed, 325 insertions(+), 13 deletions(-) create mode 100644 alternator/tags_extension.hh	2020-01-29 18:11:47 +02:00
Botond Dénes	69f606baa0	database: check timeout before applying writes Attempting to apply timed-out writes is wasted effort. The coordinator has already given up on the write and reported it as failed to the client, so any cycles spent on this write at this point are wasted. We currently only check the timeout if the write is blocked on memory; otherwise, if the system is not under pressure, we will happily apply timed-out writes. If the system is under pressure, we make it worse by wasting cycles on processing a timed-out write. Prevent this by checking the timeout as early as possible in `database::apply()` and `database::apply_counter_update()`. This patch doesn't solve all our problems related to timed-out writes. They can still sit and accumulate in various queues without expiring, a prominent example being the smp queues. It is, however, a good first step towards reducing wasted effort spent on them. Refs: #5055 Ref #5251 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200129093007.550250-1-bdenes@scylladb.com>	2020-01-29 13:08:43 +02:00
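A synchronous sketch of the early check, assuming a hypothetical `database` stand-in that counts applied writes, and an exception type in place of Seastar's exceptional futures: the deadline is tested before any work is done, so an expired write costs only a clock read.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <string>

using clock_type = std::chrono::steady_clock;

struct timed_out_error : std::runtime_error {
    timed_out_error() : std::runtime_error("write timed out before being applied") {}
};

struct database {
    int applied = 0;  // counts writes that actually reached the write path

    void apply(const std::string& mutation, clock_type::time_point timeout) {
        // Check the deadline first: the coordinator has already reported an
        // expired write as failed, so any further work on it is wasted.
        if (clock_type::now() >= timeout) {
            throw timed_out_error();
        }
        (void)mutation;  // real code would write to the commitlog and memtable here
        ++applied;
    }
};
```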
Gleb Natapov	c654ffe34b	commitlog: fix flushing an entry marked as "sync" in periodic mode After `546556b71b` we can have mixed writes into the commitlog: some flush immediately, some do not. If a non-flushing write races with a flushing one and becomes responsible for writing its buffer back into the file, the flush will be skipped, which triggers the assert in batch_cycle() since the flush position will not be advanced. Fix this by checking whether the flush was skipped and, in that case, explicitly flushing up to our file position. Fixes #5670 Message-Id: <20200128145103.GI26048@scylladb.com>	2020-01-29 12:58:25 +02:00
Piotr Sarna	93d8612a49	alternator-test: make test tables case sensitive In order to test case sensitivity, test table names now contain a capital letter.	2020-01-29 10:21:35 +01:00
Piotr Sarna	f8c1c82149	docs: add entries for alternator tags and arn Support for tagging and ARNs was already added, so the documentation is extended accordingly.	2020-01-29 10:20:05 +01:00
Piotr Sarna	668e15643d	alternator: allow tagging on table creation During table creation, it's now possible to provide a 'Tags' parameter, which will add tags to a newly created table. Note that creating a table and tagging it is not atomic, so in case of failure it's possible to end up with a created table, but without appropriate tags. This commit comes with a test. Message-Id: <00c2e202e9075d2c61e4ee5ba322ff4d5dbe718c.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	4c9f2f3c0a	alternator: implement tagging The following requests are implemented: - TagResource - UntagResource - ListTagsOfResource Also, more tests are added for validating inputs: ARNs, tag values and tag keys. Message-Id: <a7ce9534ca580736fea445813fafef75a6139e29.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	ea04b7fb04	alternator: add creating values from string views An additional override for rjson::from_string() is added for a std::string_view type. Message-Id: <3552ac3347b6a79dd22ca1215c831808450b1ef8.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	16688efad7	alternator,main: add tags schema extension A schema extension is introduced for alternator - tags. This schema extension can be used to store arbitrary tags for a table, in the form of a map<text, text>. Updating tags for a table is equivalent to the following CQL query: ALTER TABLE table WITH scylla_tags = {'key1':'v1', 'key2':'v2'} The extension, as all other extensions, is backed by the entry in the system_schema.tables table.	2020-01-29 10:20:05 +01:00
Pavel Solodovnikov	f2feeb4b10	cql3: Propagate "const" to some virtual methods in cql hierarchy Add "const" attributes to `assignment_testable::test_assignment` and `term::raw::prepare` methods. These should have been marked as "const" even before the change but for some reason were missing these qualifiers. Mark other supplementary methods with "const" attributes as necessary. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200127213215.494000-1-pa.solodovnikov@scylladb.com>	2020-01-29 00:23:40 +02:00
Avi Kivity	3343baf159	Merge "cql3: time_uuid_fcts: validate time UUID" from Benny " Throw an error in case we hit an invalid time UUID rather than hitting an assert. Fixes #5552 (Ref #5588 that was dequeued and fixed here) Test: UUID_test, cql_query_test(debug) " * 'validate-time-uuid' of https://github.com/bhalevy/scylla: cql3: abstract_function_selector: provide assignment_testable_source_context test: cql_query_test: add time uuid validation tests cql3: time_uuid_fcts: validate timestamp arg cql3: make_max_timeuuid_fct: delete outdated FIXME comment cql3: time_uuid_fcts: validate time UUID test: UUID_test: add tests for time uuid utils: UUID: create_time assert nanos_since validity utils/UUID_gen: make_nanos_since utils: UUID: assert UUID.is_timestamp	2020-01-29 00:11:17 +02:00
Avi Kivity	ec1687e4fe	Merge "Remove deprecated partitioners #5636 " from Piotr " This PR makes named_value respect allowed_values and then use it to transition away from old deprecated RandomPartitioner and ByteOrderedPartitioner. Then it removes the code that's no longer used. We want to remove deprecated partitioners because, on one hand, they lead to performance problems and hot nodes. Moreover, we're planning to unify the token representation which would allow per table partitioner support. That, in turn, is a feature helpful in multiple efforts like CDC, materialized views, secondary indexes and multi-tenancy. tests: unit(dev) " * 'remove_deprecated_partitioners' of https://github.com/haaawk/scylla: partitioners: remove random_partitioner partitioners: Make it impossible to use RandomPartitioner partitioners: remove byte_ordered_partitioner partitioners: Make it impossible to use ByteOrderedPartitioner partitioners: Remove leftovers of OrderPreservingPartitioner i_partitioner.cc: stop including byte_ordered_partitioner.hh i_partitioner.cc: stop including random_partitioner.hh config: use allowed_values to verify named_value input config: add operator<< for seed_provider_type	2020-01-29 00:11:17 +02:00
Avi Kivity	652d8a9b84	install-dependencies.sh: add lld Since we now default to lld if present, and since lld is a faster linker than either ld or gold, it makes sense to install it as a dependency and to make it available as part of the frozen toolchain.	2020-01-29 00:11:17 +02:00
Avi Kivity	17eaf552f0	Merge "Improve the accuracy of reader memory tracking" from Botond " Grab the lowest hanging fruits. This patch-set makes three important changes: * Consume the memory for I/O operations on tracked files before they are forwarded to the underlying file. * Track memory consumed by buffers created for parsing in `continuous_data_consumer`. As this is the basis for the data, index and promoted index parsers, all three are now covered in this regard. * Track the index file. The remaining, not-so-low hanging fruits, in order of gain/cost (performance) ratio: * Track in-memory index lists. * Track in-memory promoted index blocks. * Track reader buffer memory. Note that this ordering might change based on the workload and other environmental factors. Also included in this series is an infrastructure refactoring that makes tracking memory easier and involves including lighter headers, as well as a manual test designed to allow testing and experimenting with the effects of changes to the accuracy of the tracking of reader memory consumption. Refs: #4176 Refs: #2778 Tests: unit(dev), manual(sstable_scan_footprint_test) The latter was run as: build/dev/test/manual/sstable_scan_footprint_test -c1 -m2G --reads=4000 --read-concurrency=1 --logger-log-level test=trace --collect-stats --stats-period-ms=20 This will trickle reads until the semaphore blocks, then wait until the wait queue drains before sending new reads. This way we are not testing the effectiveness of the pre-admission estimation (which is terribly optimistic) and instead check that, with a slowly ramping read load, the semaphore blocks on memory, preventing OOM. This now runs to completion without a single `std::bad_alloc`. The read concurrency semaphore allows between 15 and 30 reads, and is always blocked on memory. 
" * 'more-accurate-reader-resource-tracking/v1' of ssh://github.com/denesb/scylla: test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate tests/manual: Add test measuring memory demand of concurrent sstable reads index_reader: make the index file tracked sstables/continuous_data_consumer: track buffers used for parsing reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively reader_concurrency_semaphore: bye reader_resource_tracker treewide: replace reader_resource_tracer with reader_permit reader_permit: expose make_tracked_temporary_buffer() reader_permit: introduce make_tracked_file() reader_permit: introduce memory_units reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh reader_concurrency_semaphore: reader_permit: make it a value type reader_concurrency_semaphore: s/resources/reader_resources/ reader_concurrency_semaphore::reader_permit: move methods out-of-line	2020-01-29 00:11:17 +02:00
Gleb Natapov	8dc37277df	commitlog: remove unused variable Message-Id: <20200128132118.GH26048@scylladb.com>	2020-01-29 00:11:17 +02:00
Eliran Sinvani	57f90e34ea	alternator: run alternator processing loop in the statement scheduling group In Scylla all query processing activity should run under the "statement" scheduling group. The scheduling group is important for maintaining the balance between background and foreground tasks in Scylla. Testing: to test the correctness of the patch, the following assert was first inserted before every call to one of the executor functions in the http route: assert(current_scheduling_group().name() == "statement"); Then all alternator tests were run and passed. In the second stage, the name was changed so that the assert would fail: assert(current_scheduling_group().name() == "no-statement"); and the tests were run again, validating that Scylla coredumps. The asserts were then removed. Fixes #5008 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200127154341.10020-1-eliransin@scylladb.com>	2020-01-29 00:11:17 +02:00
Avi Kivity	e09ed81c23	Merge "Fix two corner cases in snapshots API" from Pavel " There seem to be two problems with handling snapshot API -- one on start and the other one on stop. Here's the set that addresses both. The fix moved snapshot API registration later in time that required Amnon's ACK. Now we have it :) so -- the rebase and resend. Tests: unit(dev), start-stop " * 'br-snapshot-bugs-2' of https://github.com/xemul/scylla: snapshot: Pass requests through gate api: Register snapshot API later api: Unwrap wrap_ks_cf	2020-01-29 00:11:17 +02:00
Avi Kivity	c0f412617e	Merge "Make the scylla build deterministic" from Rafael " With these changes and a binutils compiled with --enable-deterministic-archives, the only differences I get in the build directory if I build scylla twice from scratch are: * The various CMakeError.log because they have temporary file names. * The various CMakeOutput.log for the same reason. * .ninja_log and .ninja_deps. I am not sure what the contents are. " * 'espindola/fix-determinism' of https://github.com/espindola/scylla: build: remove timestamps from the antlr output build: Make the output of idl-compiler deterministic	2020-01-28 18:16:06 +02:00
Rafael Ávila de Espíndola	0e8bee0774	configure: Use lld if available This depends on the patch "mk: avoid combining -r and -export-dynamic linker options" being added to dpdk. I benchmarked this on top of my patches to get a reproducible build. I first compiled with ccache, deleted the build directory and recompiled so that all the "gcc -c" invocations were served by ccache. The times of the second "ninja release" invocations were: lld: ninja release 155.68s user 71.89s system 2077% cpu 10.953 total; gold: ninja release 953.79s user 254.71s system 2533% cpu 47.699 total. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200127171516.26268-1-espindola@scylladb.com>	2020-01-28 18:15:50 +02:00
Avi Kivity	7440125cb1	Update seastar submodule > memory: add scoped_heap_profiling > build: add switch to enable heap profiling support > io_tester: do not abort on end of test > resource: clean up cgroups version determination. > prometheus: Silence a bogus gcc warning in http server > Update dpdk submodule > resource: Support cgroups v2 > net: Don't use variable length arrays > core/memory.hh: document set_heap_profiling_enabled() > Revert "net: Don't use variable length arrays" > cmake: fix pkgconfig boost deps > thread: Avoid confusing comment by switching value > net: posix-stack: fix allocator in ap listening sockets > net: posix-stack: fix passing allocator to new sockets > stall_detector: Add a counter for stall detector report > Merge "Don't use variable length arrays" from Rafael > treewide: fix minor issues reported by clang > thread: Call mprotect in make_stack > thread: Always allocate stack with aligned_alloc > build: Make SEASTAR_THREAD_STACK_GUARDS private > thread: Move code out of a header	2020-01-28 18:15:18 +02:00
Nadav Har'El	b06b34478e	merge: lwt: add lightweight transaction unit tests Merged patch series from Konstantin Osipov: This series sets cql_repl core count to 1 and adds LWT unit tests. test.py: invoke cql_repl with smp=1 lwt: add lightweight transactions unit tests	2020-01-28 12:39:23 +02:00
Nadav Har'El	30283f2544	merge: Alternator: return api_error instead of throwing Merged patch series from Piotr Sarna: In order to minimize the usage of throws and catches in code paths that are potentially hot, these paths instead return appropriate errors directly. The server layer is still able to catch and translate errors, but the preferred way is to return api_error directly in places that may be performance-sensitive. Tests: alternator-test(local) Fixes #5472 Piotr Sarna (3): alternator: change request return type to variant<value, error> alternator: elide throwing in condition checks alternator: replace top-level throws with returns in executor alternator/executor.hh \| 28 ++++---- alternator/server.hh \| 4 +- alternator/executor.cc \| 141 +++++++++++++++++++++-------------------- alternator/server.cc \| 44 ++++++++----- 4 files changed, 117 insertions(+), 100 deletions(-)	2020-01-28 12:39:23 +02:00
Konstantin Osipov	98c34ae750	test.py: always build cql_repl, do not strip Exclude cql_repl from the list of tests, since it's not a test. Build it as a separate app. Do not strip, so that any CQL test failure is easy to debug without a rebuild. All test-related targets are converted from lists to sets to avoid quadratic lookup cost in the check inside the loop which creates the ninja file.	2020-01-28 12:39:23 +02:00
Piotr Sarna	a81640d402	alternator: replace top-level throws with returns in executor In order to elide unnecessary throwing, all errors previously thrown from top-level executor methods (the ones that handle user requests) are now returned directly. Message-Id: <73e05d1057ee842576fae11be9d77265ffb2e96f.1579515640.git.sarna@scylladb.com>	2020-01-28 12:39:23 +02:00
Takuya ASADA	f21123b3ae	scylla_io_setup: Improve error message for unsupported EC2 instance types (#5561) Currently --ami does not check instance types and creates an invalid io_properties.yaml on unsupported instance types. This does not happen on AMI startup, since scylla_ami_setup only invokes scylla_io_setup --ami when the instance type is supported, but it does happen when scylla_io_setup is run manually. It's better to check the instance type in scylla_io_setup, too. Refs #5438	2020-01-28 12:39:23 +02:00
Piotr Sarna	854adf5b70	alternator: elide throwing in condition checks Conditional updates inform the user that the condition is not met by returning an error. An initial implementation was based on rethrowing these errors, but returning them directly is considered better for performance.	2020-01-28 12:39:23 +02:00
Gleb Natapov	0d0c05a569	lwt: allow only one paxos instance to run for each key simultaneously This will prevent contention in case of parallel updates of the same row by the same coordinator. The patch does it by introducing a new per-key lock map and taking the lock before running the PAXOS protocol (either for write or for read). Message-Id: <20200117101228.GA14816@scylladb.com>	2020-01-28 12:39:23 +02:00
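A minimal sketch of the per-key lock map idea, assuming plain threads and `std::mutex` instead of Scylla's future-based locking; the names `key_lock_map` and `guard` are hypothetical. Each key gets its own lock, created on demand and dropped once no operation holds or waits on it, so operations on the same key are serialized while different keys proceed in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical sketch: one mutex per key, created on demand and erased
// when the last holder/waiter is gone. Not Scylla's actual implementation.
class key_lock_map {
    struct entry {
        std::mutex m;
        int users = 0;  // holders plus waiters; guarded by _map_mutex
    };
    std::mutex _map_mutex;  // protects _locks and entry::users
    std::map<std::string, std::shared_ptr<entry>> _locks;
public:
    // RAII guard: holds the per-key lock for its lifetime.
    class guard {
        key_lock_map& _map;
        std::string _key;
        std::shared_ptr<entry> _e;
    public:
        guard(key_lock_map& map, std::string key) : _map(map), _key(std::move(key)) {
            {
                std::lock_guard<std::mutex> g(_map._map_mutex);
                auto& slot = _map._locks[_key];
                if (!slot) slot = std::make_shared<entry>();
                slot->users++;
                _e = slot;
            }
            _e->m.lock();  // serialize all operations on this key
        }
        ~guard() {
            _e->m.unlock();
            std::lock_guard<std::mutex> g(_map._map_mutex);
            if (--_e->users == 0) _map._locks.erase(_key);  // drop idle entry
        }
        guard(const guard&) = delete;
        guard& operator=(const guard&) = delete;
    };

    std::size_t size() {
        std::lock_guard<std::mutex> g(_map_mutex);
        return _locks.size();
    }
};
```

Erasing idle entries keeps the map proportional to the number of keys currently under contention rather than to every key ever touched.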
Piotr Sarna	a6a65abc3c	alternator: change request return type to variant<value, error> In order to minimize the use of exceptions during normal operations, each request handler is now able to return either a proper JSON value, or an instance of api_error, which indicates that something went wrong, but without having to throw, catch and rethrow C++ exceptions. This is especially important for conditional updates, since it's expected to be common to return ConditionalCheckFailedException. Message-Id: <d8996a0a270eb0d9db8fdcfb7046930b96781e69.1579515640.git.sarna@scylladb.com>	2020-01-28 12:39:23 +02:00
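The "value or error" return type can be sketched with `std::variant`. The types `json_value` and `api_error`, their fields, the alias `request_return_type` and the handler `put_item` below are hypothetical stand-ins, not Alternator's actual definitions; the point is that the expected failure travels as a value and the caller inspects it without any throw/catch.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for a JSON response and an API error.
struct json_value { std::string body; };
struct api_error {
    std::string type;
    std::string message;
};

// A handler result is either a value or an error; nothing is thrown.
using request_return_type = std::variant<json_value, api_error>;

request_return_type put_item(bool condition_holds) {
    if (!condition_holds) {
        // The common, expected failure is returned rather than thrown.
        return api_error{"ConditionalCheckFailedException", "The conditional request failed"};
    }
    return json_value{"{}"};
}

// The server layer checks which alternative it got instead of catching.
int http_status(const request_return_type& r) {
    return std::holds_alternative<api_error>(r) ? 400 : 200;
}
```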
Avi Kivity	897320f6ab	tools: toolchain: dbuild: relax process limit in container Docker restricts the number of processes in a container to some limit it calculates. This limit turns out to be too low on large machines, since we run multiple links in parallel, and each link runs many threads. Remove the limit by specifying --pids-limit -1. Since dbuild is meant to provide a build environment, not a security barrier, this is okay (the container is still restricted by host limits). I checked that --pids-limit is supported by old versions of docker and by podman. Fixes #5651. Message-Id: <20200127090807.3528561-1-avi@scylladb.com>	2020-01-28 12:39:23 +02:00
Avi Kivity	c7e0be75a5	Merge "Metrics for full scan" from Alejo " Final set of changes for full scan metrics. - allow filtering - full scan (Note: non-system tables only) - full scan without BYPASS CACHE option - tests for all metrics (bypass cache, allow filtering, full scan) - works with prepared statements (tested, too) " * 'as_full_scan_metrics' of https://github.com/alecco/scylla: Range scan query counter Counter of queries doing full scan. ALLOW FILTERING query counter	2020-01-28 12:39:23 +02:00
Botond Dénes	e4616f92fe	test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics This test is all about tracking measured memory consumption vs. real memory consumption. To make this easier add additional diagnostics: * enable seastar heap profiler for the duration of the reads (seastar has to be compiled with `-DSEASTAR_HEAPPROF`). * Add a stats collector, which periodically collects stats such as non-LSA free/used memory, LSA free/used memory and memory tracked by the reader concurrency semaphore. These stats are written to a `.csv` file, allowing them to be imported into a spreadsheet and processed.	2020-01-28 10:15:55 +02:00
Botond Dénes	9e9c59d125	tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate Currently the test fires the configured number of reads at once. This somewhat restricts the number of testable scenarios. For example, it doesn't allow one to see if the semaphore correctly tracks the memory consumption of existing reads, by firing new reads after a while. Replace this algorithm by one which fires reads with a configured concurrency, then waits for the semaphore's queue (if any) to drain, before firing new reads. The test can now be configured with the total number of reads to fire, and with the read-concurrency, i.e. the number of reads to fire at once in each iteration. This allows for much greater flexibility in the different test scenarios. The previous behaviour can still be achieved by configuring a concurrency of 100. This patch also adds better error handling. Reads are aborted on the first error, and errors are caught and logged instead of being allowed to bubble up past the test's main function. Extensive logging is also added to be able to monitor the system while the test is running.	2020-01-28 10:15:53 +02:00
Tomasz Grabiec	2eb88024c0	tests/manual: Add test measuring memory demand of concurrent sstable reads Allow manual experimentation with how accurately the resource consumption of readers is tracked, and hence with the system's ability to prevent overload and the dreaded `std::bad_alloc`. This patch was originally developed by Tomasz Grabiec <tgrabiec@scylladb.com>, I only adapted it to compile and link on current master.	2020-01-28 08:13:16 +02:00
Botond Dénes	dfc66194c8	index_reader: make the index file tracked Track I/O going to the index file, similarly to how we already track I/O going to the data file.	2020-01-28 08:13:16 +02:00
Botond Dénes	936619a8d3	sstables/continuous_data_consumer: track buffers used for parsing Based on heap profiling, buffers used for storing half-parsed fields are a major contributor to the overall memory consumption of reads. This memory was completely "under the radar" before. Track it by using tracked `temporary_buffer` instances everywhere in `continuous_data_consumer`. As `continuous_data_consumer` is the basis for parsing all index and data files, adding the tracking here automatically covers all data, index and promoted index parsing. I'm almost convinced that there is a better place to store the `permit` than the three places it is stored in now, but so far I was unable to completely decipher our data/index file parsing class hierarchy.	2020-01-28 08:13:16 +02:00
Botond Dénes	92fffe51d5	reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively Consume the memory before even submitting the I/O to the underlying `file` object. This is in line with the underlying `file` object allocating the buffer before it forwards the I/O request to the kernel. This extends the "visibility" over the memory consumed by I/O greatly, as it turns out buffers spend most time alive waiting for the I/O to complete and are parsed shortly afterwards.	2020-01-28 08:13:16 +02:00
Botond Dénes	4bb3c7b1f0	reader_concurrency_semaphore: bye reader_resource_tracker Replaced by `reader_permit`, of which it was a mere wrapper in the first place.	2020-01-28 08:13:16 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it the last and optional parameter that is easy to ignore, make it a first-class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This (mostly mechanical) patch essentially refactors mutation_source to ask for the reader_permit instead of the reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Botond Dénes	dea24ca859	reader_permit: expose make_tracked_temporary_buffer() Previously `tracking_file_impl::make_tracked_buf()`. In the next patches we plan on using this outside `tracking_file_impl`, so make it public and templatize on the char type.	2020-01-28 08:13:16 +02:00
Botond Dénes	16cea36a94	reader_permit: introduce make_tracked_file() Free function equivalent of `reader_resource_tracker::track_file()`, using a `reader_permit` directly.	2020-01-28 08:13:16 +02:00
Botond Dénes	1859a03629	reader_permit: introduce memory_units Similar to `seastar::semaphore_units`, this allows consuming and releasing memory via an RAII object. In addition to that, it also allows tracking changing values. This feature was designed to be used for tracking the ever changing memory consumption of the buffers of `flat_mutation_reader`:s. This is now the only supported way of consuming memory from a permit.	2020-01-28 08:13:16 +02:00
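A minimal sketch of an RAII memory-units object along the lines described above, assuming a plain counter (`memory_account`, hypothetical) in place of a real permit; the class name `memory_units` is reused for illustration only and its interface is not Scylla's. Memory is consumed on construction, adjusted with `reset()` as a tracked buffer grows or shrinks, and released on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a permit's memory accounting.
struct memory_account {
    std::size_t consumed = 0;
};

// Consumes on construction, adjusts on reset(), releases on destruction.
class memory_units {
    memory_account* _acc = nullptr;
    std::size_t _amount = 0;

    void release() {
        if (_acc) {
            _acc->consumed -= _amount;
            _acc = nullptr;
            _amount = 0;
        }
    }
public:
    memory_units(memory_account& acc, std::size_t amount) : _acc(&acc), _amount(amount) {
        _acc->consumed += _amount;
    }
    // Move-only: ownership of the consumed amount is transferred.
    memory_units(memory_units&& o) noexcept
        : _acc(std::exchange(o._acc, nullptr)), _amount(std::exchange(o._amount, 0)) {}
    memory_units& operator=(memory_units&& o) noexcept {
        if (this != &o) {
            release();
            _acc = std::exchange(o._acc, nullptr);
            _amount = std::exchange(o._amount, 0);
        }
        return *this;
    }
    memory_units(const memory_units&) = delete;
    memory_units& operator=(const memory_units&) = delete;
    ~memory_units() { release(); }

    // Track a changing value, e.g. a reader buffer whose size varies.
    void reset(std::size_t new_amount) {
        assert(_acc);
        _acc->consumed = _acc->consumed - _amount + new_amount;
        _amount = new_amount;
    }
};
```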
Botond Dénes	c0f96db2d9	reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh In the next patches we will replace `reader_resource_tracker` and have code use the `reader_permit` directly. In subsequent patches, the `reader_permit` will get even more usages as we attempt to make the tracking of reader resources more accurate by tracking more parts of it. So the grand plan is that the current `reader_concurrency_semaphore.hh` is split into two headers: * `reader_concurrency_semaphore.hh` - containing the semaphore proper. * `reader_permit.hh` - a very lightweight header, to be used by components which only want to track various parts of the resource consumption of reads.	2020-01-28 08:13:16 +02:00
Botond Dénes	2005495857	reader_concurrency_semaphore: reader_permit: make it a value type Currently `reader_permit` is passed around as `lw_shared_ptr<reader_permit>`, which is clunky to write and use and is also an unnecessary leak of details on how permit ownership is managed. Make `reader_permit` a simple value type, making it a little bit easier and safer to use. In the next patches we will get rid of `reader_resource_tracker` and instead have code use the permit instance directly, so this small improvement in usability will go a long way towards preventing eyesores.	2020-01-28 08:13:16 +02:00
Botond Dénes	932bc02730	reader_concurrency_semaphore: s/resources/reader_resources/ In preparation for making it a top-level class and moving it to another file.	2020-01-28 08:13:16 +02:00
Botond Dénes	89c5fd0c25	reader_concurrency_semaphore::reader_permit: move methods out-of-line In preparation for making the reader_permit a top-level class, and moving it to another file. It is also good practice to define non-performance critical methods out-of-line to reduce header bloat.	2020-01-28 08:13:16 +02:00
Konstantin Osipov	511ae023f0	lwt: add lightweight transactions unit tests These unit tests cover all CQL aspects of lightweight transactions, such as grammar, null semantics, batch semantics, result set format, and so on. For now, comment out unicode tests: test output depends on libjsoncpp version in use.	2020-01-27 23:09:57 +03:00
Konstantin Osipov	fef50b66a2	test.py: invoke cql_repl with smp=1 Since bounce_to_shard is not handled by cql_repl, invoke it with smp=1 until it is fixed.	2020-01-27 22:57:10 +03:00
Pavel Emelyanov	976463f620	snapshot: Pass requests through gate When the scylla process is stopped, no code waits for current snapshot operations to finish. The API server is not stopped either, so new snapshot requests can creep in. Seastar has a useful abstraction that addresses both: the gate. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
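A gate solves both problems: each request enters the gate while in flight, and closing the gate rejects new entries and waits for in-flight ones to leave. Below is a simplified, thread-based analogue; the class `gate` and helper `with_gate` here are hypothetical and only mirror the general shape of seastar's future-based gate, they do not reproduce its API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Thread-based analogue of a gate (seastar's real gate is future-based).
class gate {
    std::mutex _m;
    std::condition_variable _cv;
    unsigned _count = 0;   // requests currently in flight
    bool _closed = false;
public:
    // Called when a request starts; rejected once the gate is closed.
    void enter() {
        std::lock_guard<std::mutex> g(_m);
        if (_closed) throw std::runtime_error("gate closed");
        ++_count;
    }
    void leave() {
        std::lock_guard<std::mutex> g(_m);
        if (--_count == 0) _cv.notify_all();
    }
    // Called on shutdown: stop admitting requests, then wait for the rest.
    void close() {
        std::unique_lock<std::mutex> lk(_m);
        _closed = true;
        _cv.wait(lk, [this] { return _count == 0; });
    }
};

// Runs f inside the gate; leave() runs even if f throws.
template <typename Func>
auto with_gate(gate& g, Func&& f) {
    g.enter();
    struct leaver {
        gate& g;
        ~leaver() { g.leave(); }
    } l{g};
    return f();
}
```

In this sketch, each snapshot request handler would be wrapped in `with_gate`, and shutdown would call `close()` before tearing down the snapshot machinery.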
Pavel Emelyanov	fd6b5efe75	api: Register snapshot API later In storage_service's snapshot code there are checks for _operation_mode being _not_ JOINING to proceed. The intention is apparently to allow snapshots only after the cluster join. However, here's what the start-up code looks like: 1. _operation_mode = STARTING in storage_service::constructor 2. snapshot API registered in api::set_server_storage_service 3. _operation_mode = JOINING in storage_service::join_token_ring So between steps 2 and 3 snapshots can be taken. Although there's a quick and simple fix for that (check that _operation_mode is not STARTING either), I think it's better to register the snapshot API later instead. This will help greatly to de-bloat the storage_service, in particular to encapsulate the _operation_mode properly. Note that, although the check for _operation_mode is made only for taking a snapshot, I move the registration of all snapshot ops to the later phase. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Pavel Emelyanov	4886c1db74	api: Unwrap wrap_ks_cf This is preparation for the next patch -- the lambda in question (and the used type) will be needed in two functions, so make the lambda a "real" function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Benny Halevy	10c912d3db	cql3: abstract_function_selector: provide assignment_testable_source_context Return function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	35e9538d49	test: cql_query_test: add time uuid validation tests Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	1078c86af9	cql3: time_uuid_fcts: validate timestamp arg Make sure that the timestamp argument does not overflow 60 bits when converted to units of 100 nanoseconds since epoch, as can happen with writetime(), which returns microseconds since epoch, in contrast to other time functions like unixtimestampof, which return milliseconds since epoch. Fixes #5552 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
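The 60-bit limit can be checked as below. This sketch assumes the input is milliseconds since the Unix epoch and uses the RFC 4122 offset of 12219292800000 ms between the UUID epoch (1582-10-15) and the Unix epoch; `validated_uuid_ticks` is a hypothetical name, not Scylla's function. The bounds are checked before any arithmetic, so neither the signed addition nor the multiplication can overflow. A microsecond timestamp such as writetime() output, when interpreted as milliseconds, lands far beyond the limit and is rejected instead of silently wrapping.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Offset between the UUID epoch (1582-10-15) and the Unix epoch, in ms.
constexpr int64_t uuid_epoch_offset_ms = 12219292800000LL;
// A version-1 UUID timestamp holds 60 bits of 100 ns intervals.
constexpr uint64_t max_uuid_ticks = (uint64_t(1) << 60) - 1;
// Largest Unix-epoch millisecond value whose tick count still fits.
constexpr int64_t max_ms_since_unix_epoch =
    int64_t(max_uuid_ticks / 10000) - uuid_epoch_offset_ms;

// Converts ms since the Unix epoch to 100 ns ticks since the UUID epoch,
// throwing instead of overflowing.
uint64_t validated_uuid_ticks(int64_t ms) {
    if (ms < -uuid_epoch_offset_ms) {
        throw std::out_of_range("timestamp precedes the UUID epoch");
    }
    if (ms > max_ms_since_unix_epoch) {
        throw std::out_of_range("timestamp does not fit in 60 bits");
    }
    // Safe: both operands are now known to be in range.
    return uint64_t(ms + uuid_epoch_offset_ms) * 10000;
}
```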
Benny Halevy	fa0fa53bd3	cql3: make_max_timeuuid_fct: delete outdated FIXME comment Done in `86c09046fd` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	72e2ea47c1	cql3: time_uuid_fcts: validate time UUID Throw an error in case we hit an invalid time UUID rather than hitting an assert. Ref #5552 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	00bd1d32d3	test: UUID_test: add tests for time uuid Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	f8b079b599	utils: UUID: create_time assert nanos_since validity Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	cd3460cc88	utils/UUID_gen: make_nanos_since Safely convert millis to "nanos_since" (number of 100-nanosecond intervals since START_EPOCH), casting to uint64_t to avoid possible integer overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:08:16 +02:00
Benny Halevy	22bac26023	utils: UUID: assert UUID.is_timestamp Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-26 18:54:36 +02:00
Avi Kivity	cc0222ec2d	Merge "Futurize get_changed_ranges_for_leaving" from Asias " Futurize get_changed_ranges_for_leaving to fix stalls like: 2019-12-17T15:18:33+00:00 ip-10-0-116-62 !INFO | scylla: Reactor stalled for 4609 ms on shard 0. 0x0000000002accbd2 0x0000000002a4579b 0x0000000002a45cc2 0x0000000002a45ff7 0x00007ff0a609be7f 0x0000000001b0b500 0x0000000001b03185 0x0000000001af0d41 0x0000000001af027a 0x0000000001f7e89a 0x0000000001f9f55a 0x0000000001fc9c09 0x0000000001fcac08 0x00000000007dfee3 /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1041 (inlined by) seastar::reactor::block_notifier(int) at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1164 ?? ??:0 __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > > std::__lower_bound<__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token, __gnu_cxx::__ops::_Iter_less_val>(__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token const&, __gnu_cxx::__ops::_Iter_less_val) at crtstuff.c:? locator::token_metadata::first_token_index(dht::token const&) const at crtstuff.c:? locator::token_metadata::ring_range(dht::token const&, bool) const at crtstuff.c:? locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at crtstuff.c:? service::storage_service::get_changed_ranges_for_leaving(seastar::basic_sstring<char, unsigned int, 15u, true>, gms::inet_address) at crtstuff.c:? service::storage_service::unbootstrap() at crtstuff.c:? service::storage_service::decommission()::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const::{lambda()#1}::operator()() const [clone .isra.0] at storage_service.cc:? 
Refs: #5495 " * 'futurize_get_changed_ranges_for_leaving' of https://github.com/asias/scylla: storage_service: Yield in get_changed_ranges_for_leaving storage_service: Make get_changed_ranges_for_leaving run inside thread	2020-01-26 13:25:53 +02:00
Takuya ASADA	dd81fd3454	dist/debian: Use tilde for release candidate builds We need to add '~' to handle rcX version correctly on Debian variants (merged at `ae33e9f`), but when we moved to relocated package we mistakenly dropped the code, so add the code again. Fixes #5641	2020-01-26 13:25:53 +02:00
Ivan Prisyazhnyy	4c001553eb	dep/arch: better messages Tested on Arch 5.4.2-arch1-1 and docker archlinux. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200125122836.460811-1-ivan@scylladb.com>	2020-01-26 12:02:32 +02:00
Ivan Prisyazhnyy	98a8c36c60	cmake: fix seastar and gen include dirs lookup Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200125145926.545859-1-ivan@scylladb.com>	2020-01-26 12:02:32 +02:00
Dejan Mircevski	90b54c8c42	view_info: Drop partition_ranges() The method view_info::partition_ranges() is unused. Also drop the now-dead _partition_ranges data member. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-26 12:02:32 +02:00
Piotr Sarna	9fa88e26a9	Merge 'Alternator - LWT and ConditionExpression' from Nadav This is the fourth iteration of the patch series adding LWT usage (instead of the old naive, and wrong, read-before-write) to Alternator, as well as full support for the ConditionExpression syntax for conditional updates. Changes in v4: * Rebased to most recent master * Replaced 3 booleans, which had 2^3 = 8 theoretical combinations, by just 4 options in enum write_isolation: FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW The four options are described in detailed comments. * Fix reversed assertion in FORBID_RMW case. * Two new metrics: write_using_lwt and shard_bounce_for_lwt. * Fail boot if alternator is enabled, but LWT isn't. * Add information about enabling LWT in docs/alternator/alternator.md * nyh/v4-lwt: alternator: add support for ConditionExpression alternator: reimplement read-modify-write operations using LWT alternator: make "executor" a peering_sharded_service	2020-01-26 12:02:32 +02:00
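The four `write_isolation` options can be read as a per-write decision between a plain write, an unprotected read-then-write, or an LWT. The enum values below are the ones named in the merge message; the `write_path` enum, the `choose_write_path` function and the exact semantics assigned to each option are an interpretation for illustration, not Alternator's code.

```cpp
#include <cassert>
#include <stdexcept>

// Option names from the merge message; the decision logic is a sketch.
enum class write_isolation { FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW };

// Hypothetical: how a single write request would be executed.
enum class write_path { plain_write, read_then_write, lwt };

write_path choose_write_path(write_isolation mode, bool needs_read_before_write) {
    switch (mode) {
    case write_isolation::LWT_ALWAYS:
        // Every write goes through LWT, conditional or not.
        return write_path::lwt;
    case write_isolation::LWT_RMW_ONLY:
        // Pay the LWT cost only when the write depends on a prior read.
        return needs_read_before_write ? write_path::lwt : write_path::plain_write;
    case write_isolation::UNSAFE_RMW:
        // Read and write separately, without isolation between them.
        return needs_read_before_write ? write_path::read_then_write : write_path::plain_write;
    case write_isolation::FORBID_RMW:
        // Requests that need a read before the write are rejected.
        if (needs_read_before_write) {
            throw std::invalid_argument("read-modify-write is forbidden");
        }
        return write_path::plain_write;
    }
    throw std::logic_error("unknown write_isolation value");
}
```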
Alejo Sanchez	936cae6069	Range scan query counter Fixes #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 15:02:58 +01:00
Alejo Sanchez	f57513a809	Counter of queries doing full scan. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 14:25:19 +01:00
Alejo Sanchez	dbe8a54768	ALLOW FILTERING query counter Implements a counter of executions of SELECT queries with ALLOW FILTERING option. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 13:38:30 +01:00
Piotr Jastrzebski	682dfdafe1	partitioners: remove random_partitioner Previous patch makes it impossible to configure Scylla with RandomPartitioner so this code is effectively dead now. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	d80ac4c2d0	partitioners: Make it impossible to use RandomPartitioner RandomPartitioner has been deprecated for 2.5 years. Now we drop support for it. There are two reasons for this. First, this partitioner can lead to uneven distribution of partitions among the nodes in the cluster, which leads to hot nodes. Second, we're planning to unify the representation of tokens and fix it as int64_t. RandomPartitioner does not comply with this. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	7a86e2ff46	partitioners: remove byte_ordered_partitioner Previous patch makes it impossible to configure Scylla with ByteOrderedPartitioner so this code is effectively dead now. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	130eb91636	partitioners: Make it impossible to use ByteOrderedPartitioner ByteOrderedPartitioner has been deprecated for 2.5 years. Now we drop support for it. There are two reasons for this. First, this partitioner can lead to uneven distribution of partitions among the nodes in the cluster, which leads to hot nodes. Second, we're planning to unify the representation of tokens and fix it as int64_t. ByteOrderedPartitioner does not comply with this. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	4088be2056	partitioners: Remove leftovers of OrderPreservingPartitioner OrderPreservingPartitioner seems to be long gone and not supported so remove all the places it's still mentioned. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	1d345091f6	i_partitioner.cc: stop including byte_ordered_partitioner.hh Nothing from that header is used in i_partitioner.cc. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	44c9a71686	i_partitioner.cc: stop including random_partitioner.hh Nothing from that header is used in i_partitioner.cc. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	6a2cd64b5c	config: use allowed_values to verify named_value input Even though we configure the set of accepted values for some config flags, named_value ignores them. This patch implements checks that verify that a flag is not set to a value that is not on the list. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:08:59 +01:00
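An allow-list check of this kind can be sketched as a small wrapper; `checked_value` is a hypothetical, simplified stand-in for `named_value`, not its real interface. Note that formatting the rejected value into the error message requires `operator<<` for the value type, which is consistent with the later commit adding `operator<<` for `seed_provider_type`.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical config value that enforces an optional allow-list.
template <typename T>
class checked_value {
    std::string _name;
    T _value;
    std::vector<T> _allowed;  // empty means any value is accepted
public:
    checked_value(std::string name, T initial, std::vector<T> allowed = {})
        : _name(std::move(name)), _value(initial), _allowed(std::move(allowed)) {
        set(std::move(initial));  // the initial value is validated too
    }
    void set(T v) {
        if (!_allowed.empty() &&
            std::find(_allowed.begin(), _allowed.end(), v) == _allowed.end()) {
            std::ostringstream msg;
            msg << "Invalid value " << v << " for " << _name;  // needs operator<< for T
            throw std::invalid_argument(msg.str());
        }
        _value = std::move(v);
    }
    const T& get() const { return _value; }
};
```

Usage, with an illustrative string setting: a `checked_value<std::string>` constructed with a single allowed value throws `std::invalid_argument` on any other `set()` and keeps its previous value.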
Nadav Har'El	b50274e8a7	alternator: add support for ConditionExpression This patch adds support for the ConditionExpression parameter of the item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem. We already supported conditional updates/put/delete using the "Expected" parameter. The ConditionExpression parameter implemented here provides a very similar feature, using a different (and also newer and more powerful) syntax. The implementation here reuses much of our existing expression-parsing infrastructure (unsurprisingly, ConditionExpression's syntax has much in common with UpdateExpression, which we already support) and also many of the comparison functions already implemented for "Expected". However, it's still quite a bit of new code, because of the many different comparisons, functions, and syntax variations we need to support. This patch also expands alternator-test/test_condition_expression.py with a few additional corner cases discovered during the development of this patch. Almost all of the tests for this feature (35 out of 39) now pass. Two tests still fail because we don't yet support nested attributes (this is a missing feature across Alternator), and two tests fail because of minor idiosyncrasies in DynamoDB's error path that we chose not to duplicate yet (but still remember the difference in the form of an xfailing test). Fixes #5035 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:33 +02:00
Nadav Har'El	370b963ce5	alternator: reimplement read-modify-write operations using LWT In this patch, we re-implement the three read-modify-write operations: PutItem, UpdateItem, DeleteItem. All three operations may need to read the item before writing it to support conditional updates (the "Expected" parameter), and UpdateItem may also need the previous item's value for its update expression (e.g., a user may ask to "set a=a+1" or "set a=b"). Before this patch, the implementation of RMW operations simply did a read, and then a write, without any attempt to protect concurrent operations. In this patch, Scylla's LWT mechanism (storage_proxy::cas()) is used instead, to ensure that concurrent update operations are correctly isolated even if they are conditional. This means that Alternator now requires the experimental LWT feature to be enabled (and refuses to boot if it isn't). The version presented here is configured to always use LWT for every write, regardless of whether it has a condition or not. So it will significantly slow down write-only workloads like YCSB. But the code in this patch actually includes three other modes, which can be chosen by setting an enum constant in the code. In the future we will want to let the user configure this mode, globally, per table or per attribute. Note that read requests are NOT modified, and work exactly as they did before: i.e., strongly-consistent reads are done using a normal CL=LOCAL_QUORUM read, not via LWT. I believe this is good enough given Dynamo's guarantees, and critical for our read performance. Also note that this patch doesn't yet fix the BatchWriteItem operation. Although BatchWriteItem does not support any RMW operations (just pure writes), we may still need to do those pure writes using LWT. This should be fixed in a follow-up patch. Unfortunately, this patch involves a large amount of code movement and reorganization, because: 1. 
The cas operation requires each operation to be made into an object, with a separate apply() function, forcing a lot of code to move. 2. Moreover, we need to do this for three different operations (PutItem, UpdateItem, DeleteItem), so to avoid massive code duplication, I had to move some common code. 3. The cas operation also forced us to change some of the utility functions' APIs. The end result is that this patch focuses more on a compact and understandable end result than on an easy-to-understand patch, so reviewers, sorry about that. All alternator-test/ tests pass with this patch (and also with all of the different optional modes enabled). However, other than that, I did not yet do any real isolation tests (are concurrent operations really isolated correctly? or is LWT just faking it? :-) ), performance tests or stress tests, and I'll definitely need to do those as well. Fixes #5054 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:28 +02:00
Nadav Har'El	7dfd081e0d	alternator: make "executor" a peering_sharded_service Alternator uses a sharded<executor> for handling execution of Alternator requests on different shards. In this patch we make executor a subclass of peering_sharded_service, to allow one of these executors to run an executor method on a different shard: any one of the shard-local executor instances can call container() to get the full sharded<executor>. We will need this capability later, when we need to bounce requests between shards because of requirements of the storage_proxy::cas (LWT) code. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:23 +02:00
Benny Halevy	5b0ea4c114	storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service Match subscription done in main() and avoid cross shard access to _lifecycle_subscribers vector. Fixes #5385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>	2020-01-23 11:38:23 +02:00
Piotr Jastrzebski	df1b7d2805	config: add operator<< for seed_provider_type Following patch will start checking allowed_values in named_value and print errors for wrong values. This will require all the types used with named_value to have operator<< implemented. seed_provider_type is one such type. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-23 10:28:58 +01:00
Rafael Ávila de Espíndola	6058fe8007	build: remove timestamps from the antlr output The output of antlr always has the timestamp of when it was created. This expands the existing sed hack to remove that too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 16:29:54 -08:00
Rafael Ávila de Espíndola	72e900291b	build: Make the output of idl-compiler deterministic If at any point during the topological sort we had more than one node with zero dependencies, the order they were printed was not deterministic. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 16:28:00 -08:00
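One way to make a topological sort deterministic is Kahn's algorithm with the zero-dependency nodes kept in an ordered set, so that ties are always broken the same way (here, by name). The sketch below is written in C++ for consistency with the rest of this document; `deterministic_topo_sort` is a hypothetical function and not the idl-compiler's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Kahn's algorithm; ready nodes live in a std::set, so ties break by name.
std::vector<std::string> deterministic_topo_sort(
        const std::map<std::string, std::set<std::string>>& deps) {
    std::map<std::string, std::size_t> pending;             // node -> unmet dependencies
    std::map<std::string, std::vector<std::string>> users;  // dependency -> dependents
    for (const auto& [node, ds] : deps) {
        pending[node] += ds.size();
        for (const auto& d : ds) {
            pending.try_emplace(d, 0);  // dependencies may not be listed as keys
            users[d].push_back(node);
        }
    }
    std::set<std::string> ready;
    for (const auto& [node, n] : pending) {
        if (n == 0) ready.insert(node);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string next = *ready.begin();  // smallest name first
        ready.erase(ready.begin());
        order.push_back(next);
        for (const auto& u : users[next]) {
            if (--pending[u] == 0) ready.insert(u);
        }
    }
    if (order.size() != pending.size()) {
        throw std::invalid_argument("dependency cycle");
    }
    return order;
}
```

With an unordered container for the ready nodes, "b" and "c" in the test below could come out in either order depending on hashing or insertion details; the ordered set pins the result.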
Avi Kivity	46951f8b1a	Merge "Refactor migration_notifier listeners and gossip subscribers" from Rafael " This series refactors the code used by migration_notifier and gossiper into an atomic_vector type. " * 'espindola/gossiper_atomic_vector' of https://github.com/espindola/scylla: gossiper: Store subscribers in an atomic_vector load_broadcaster: Unregister from load_broadcaster::stop_broadcasting repair: add row_level::stop() locator: Return future from i_endpoint_snitch::reload_gossiper_state service: Refactor code into an atomic_vector class migration_manager: Fix typo load_meter: Use a shared_ptr to store a load_broadcaster	2020-01-22 18:58:15 +02:00
Rafael Ávila de Espíndola	845116dfaf	gossiper: Store subscribers in an atomic_vector The new guarantees are a bit better IMHO: Once a subscriber is removed, it is never notified. This was not true in the old code since it would iterate over a copy that would still have that subscriber. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
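The "once removed, never notified" guarantee comes from iterating the live container instead of a snapshot copy. The sketch below shows one single-threaded way to get it, using tombstones while an iteration is running; `notify_list` is hypothetical and is not Scylla's `utils/atomic_vector`, which has its own locking scheme. It assumes callbacks do not throw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subscriber list: a removed item is never notified again,
// even if it is removed while a notification pass is in progress.
template <typename T>
class notify_list {
    std::vector<T*> _items;
    int _iterating = 0;  // nesting depth of for_each calls
public:
    void add(T* item) { _items.push_back(item); }

    void remove(T* item) {
        auto it = std::find(_items.begin(), _items.end(), item);
        if (it == _items.end()) return;
        if (_iterating) {
            *it = nullptr;  // tombstone: keeps indices stable for the running loop
        } else {
            _items.erase(it);
        }
    }

    // Assumes f does not throw.
    template <typename Func>
    void for_each(Func f) {
        ++_iterating;
        // Index into the live vector (not a copy), so removals take effect.
        for (std::size_t i = 0; i < _items.size(); ++i) {
            if (T* item = _items[i]) f(*item);
        }
        if (--_iterating == 0) {
            // Compact tombstones once no iteration is running.
            _items.erase(std::remove(_items.begin(), _items.end(), nullptr), _items.end());
        }
    }
};
```

In contrast, iterating over a copy taken before the loop would still call a subscriber that an earlier callback had just removed.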
Rafael Ávila de Espíndola	c62a33965d	load_broadcaster: Unregister from load_broadcaster::stop_broadcasting This is in preparation for unregistration returning a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	7390485e20	repair: add row_level::stop() Now unregister_ is called from stop(). This reduces the noise in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	085544f054	locator: Return future from i_endpoint_snitch::reload_gossiper_state This just reduces the noise of a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	d9a71a7cff	service: Refactor code into an atomic_vector class This templates the code for listener_vector, renames it to atomic_vector and moves it to the utils directory. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	baeb6744f6	migration_manager: Fix typo Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	9d4cf25c84	load_meter: Use a shared_ptr to store a load_broadcaster load_broadcaster::stop_broadcasting uses shared_from_this(). Since that is the only reference that the produced shared_ptr knows of, it is deleted immediately. Fix that by also using a shared_ptr in load_meter. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Pekka Enberg	0abb4e1742	Update seastar submodule * seastar afc46681...147d50b1 (6): > perftune.py: Use safe_load() to fix arbitrary code execution Fixes #5630 > clang: current_exception_as_future must be namespaced > tests: add an expected failures version of thread fixture > Enable stack guards in Dev builds > net: posix: Introduce load_balancing_algorithm::fixed > stream: Move _next from subscription to stream	2020-01-22 17:54:14 +02:00
Pavel Solodovnikov	e1b22b6a4c	cql3: get rid of lw_shared_ptr for `variable_specifications` `parsed_statement::get_bound_variables` is assumed to always return a nonnull pointer to `variable_specifications` instance. In this case using a pointer is superfluous and can be safely replaced by a plain reference. Also add a default ctor and a utility method `set_bound_variables` to the `variable_specifications` class to actually reset the contents of the class instance. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>	2020-01-22 12:51:02 +02:00
Avi Kivity	5d78d511ad	Merge "cql: Simplify sum overflow" from Benny " As a followup to `0bde590` This series implements suggestions from @avikivity and @espindola It simplifies the template definitions for accumulator_for, adds some debug logging for the overflow values, and adds unit tests for float and double sum overflow. Test: unit(dev), paging_test:TestPagingWithIndexingAndAggregation.test_filter_{indexed,non_indexed,pk}_column(dev) " * 'simplify-sum-overflow' of https://github.com/bhalevy/scylla: test: cql_query_test: test float/double sum overflow cql3: aggregate_fcts: simplify accumulator_for template definitions	2020-01-22 11:30:25 +02:00
Asias He	be9d7c3b28	storage_service: Yield in get_changed_ranges_for_leaving It is always called inside a seastar thread. Call yield to prevent stalls. This patch fixes stalls like: 2019-12-17T15:18:33+00:00 ip-10-0-116-62 !INFO | scylla: Reactor stalled for 4609 ms on shard 0. 0x0000000002accbd2 0x0000000002a4579b 0x0000000002a45cc2 0x0000000002a45ff7 0x00007ff0a609be7f 0x0000000001b0b500 0x0000000001b03185 0x0000000001af0d41 0x0000000001af027a 0x0000000001f7e89a 0x0000000001f9f55a 0x0000000001fc9c09 0x0000000001fcac08 0x00000000007dfee3 /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1041 (inlined by) seastar::reactor::block_notifier(int) at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1164 ?? ??:0 __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > > std::__lower_bound<__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token, __gnu_cxx::__ops::_Iter_less_val>(__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token const&, __gnu_cxx::__ops::_Iter_less_val) at crtstuff.c:? locator::token_metadata::first_token_index(dht::token const&) const at crtstuff.c:? locator::token_metadata::ring_range(dht::token const&, bool) const at crtstuff.c:? locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at crtstuff.c:? service::storage_service::get_changed_ranges_for_leaving(seastar::basic_sstring<char, unsigned int, 15u, true>, gms::inet_address) at crtstuff.c:? service::storage_service::unbootstrap() at crtstuff.c:? service::storage_service::decommission()::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const::{lambda()#1}::operator()() const [clone .isra.0] at storage_service.cc:? Refs: #5495	2020-01-22 12:36:15 +08:00
Asias He	74b787c91a	storage_service: Make get_changed_ranges_for_leaving run inside thread It is the only place where get_changed_ranges_for_leaving is not running inside a thread. Preparing patch to futurize get_changed_ranges_for_leaving. Refs: #5495	2020-01-22 12:36:13 +08:00
Piotr Sarna	9b379e3d63	db,view: fix checking for secondary index special columns A mistake in handling legacy checks for special 'idx_token' column resulted in not recognizing materialized views backing secondary indexes properly. The mistake is really a typo, but with bad consequences - instead of checking the view schema for being an index, we asked for the base schema, which is definitely not an index of itself. Branches 3.1,3.2 (asap) Fixes #5621 Fixes #4744	2020-01-21 22:32:04 +02:00
Rafael Ávila de Espíndola	27bd3fe203	service: Add a lock around migration_notifier::_listeners Before this patch the iterations over migration_notifier::_listeners could race with listeners being added and removed. The addition side is not modified, since it is common to add a listener during construction and it would require a fairly big refactoring. Instead, the iteration is modified to use indexes instead of iterators so that it is still valid if another listener is added concurrently. For removal we use a rw lock, since removing an element invalidates indexes too. There are only a few places that needed refactoring to handle unregister_listener returning a future<>, so this is probably OK. Fixes #5541. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200120192819.136305-1-espindola@scylladb.com>	2020-01-20 22:14:02 +02:00
Avi Kivity	c317b952a3	Merge "cql_query_test: Fix abandoned failed futures" from Rafael " This series fixes all abandoned failed futures in cql_query_test and starts running it with --fail-on-abandoned-failed-futures to avoid regressions. " * 'espindola/fix-abandoned-failed-futures' of https://github.com/espindola/scylla: cql_query_test: Avoid new abandoned failed futures cql_query_test: Explicitly ignore a failed future cql_query_test: Remove duplicated do_with_cql_env_thread cql_query_test: Fix cql and values in test_int_sum_with_cast	2020-01-20 20:40:56 +02:00
Rafael Ávila de Espíndola	4ce7cb9aa6	cql_query_test: Avoid new abandoned failed futures Now that cql_query_test has no abandoned failed futures, run it with --fail-on-abandoned-failed-futures to avoid regressions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:23:22 -08:00
Rafael Ávila de Espíndola	ef5cd107ea	cql_query_test: Explicitly ignore a failed future This avoids an abandoned future warning. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:20:46 -08:00
Rafael Ávila de Espíndola	b547659c07	cql_query_test: Remove duplicated do_with_cql_env_thread With this change, test_int_sum_with_cast now runs and passes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:19:08 -08:00
Rafael Ávila de Espíndola	9334514c7c	cql_query_test: Fix cql and values in test_int_sum_with_cast This test is not running because of the double do_with_cql_env_thread. Fix it before we remove the extra do_with_cql_env_thread. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:17:35 -08:00
Avi Kivity	7d64b0f478	Update seastar submodule * seastar 3f3e117de3...afc46681e5 (7): > json: add move assignment to json_return_type > net: do not check if an unsigned variabe is less than 0 > stack: add virtual destructor definition for class w/ virtual functions > future,json: add ":" at end of concept definition > Fixing a bug in the handling of abort_accept() > install-dependencies.sh: improve arch detect > metrics: Avoid a copy during unregistration	2020-01-20 18:52:36 +02:00
Botond Dénes	e8a948ece6	configure.py: enable alloc failure injection for dev and debug modes We have numerous tests that rely on the seastar alloc failure injection infrastructure to test the exception safety of different components. These tests are essentially useless when the said infrastructure is not enabled, which is currently the case for all build modes, allowing bugs to sneak in undetected. Enable the allocation failure injection infrastructure for the dev and debug modes. Sanitize is excluded as it produces some (suspected false positive) failures and is not run in gating either currently. Tests: unit(dev, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117104747.748866-1-bdenes@scylladb.com>	2020-01-20 18:07:33 +02:00
Kamil Braun	957fa8da11	dht: make i_partitioner::get_token method(s) const	2020-01-20 14:55:12 +02:00
Nadav Har'El	bd419ae723	merge: alternator: Add prerequisites for tagging Merged patch series from Piotr Sarna: This miniseries adds two simple prerequisites for implementing tagging: 1. A table is able to generate its Arn identifier 2. Simple tests for TagResource, UntagResource, ListTagsOfResource In general, tags should be stored in table metadata - either by expanding the schema of an existing schema table, e.g. scylla_tables, or by providing another meta-table - e.g. system_schema.alternator_tables, which stores alternator-specific metadata, like tags. Refs #5066 Tests: alternator-test(local, remote) Piotr Sarna (2): alternator: add Arn support for tables alternator-test: add basic tests for tags alternator-test/test_describe_table.py | 1 - alternator-test/test_tag.py | 88 ++++++++++++++++++++++++++ alternator/executor.cc | 5 ++ 3 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 alternator-test/test_tag.py	2020-01-20 14:42:40 +02:00
Piotr Jastrzebski	9279a679da	keys.hh: make it independent from schema.hh This cuts build dependency keys.hh -> schema.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-20 14:25:17 +02:00
Piotr Sarna	b8277e43e5	alternator-test: add basic tests for tags TagResource, UntagResource and ListTagsOfResource validation tests are added. Refs #5066	2020-01-20 12:24:51 +01:00
Piotr Sarna	8c17b5aec4	alternator: add Arn support for tables Several API-s, e.g. TagResource, UntagResource and ListTagsOfResource rely on identifying tables by their "Arn". According to the docs, an Arn should uniquely identify a resource, so it's implemented as: arn:KEYSPACE_NAME:TABLE_NAME which is a minimal set of information that uniquely identifies a table in Scylla. The `arn:` prefix is needed for compatibility purposes. This commit adds a simple function for generating the Arn string, and also includes it in DescribeTable result under the TableArn attribute. Refs #5066	2020-01-20 12:24:51 +01:00
Botond Dénes	a74a82d4d2	flat_mutation_reader: mutation_fragment_stream_validator: add name Add a name parameter to the validator, so that the validator can be identified in log messages. Schema identity information is added to the name automatically. This should help pinpoint the problematic place where validation failed. Although at the moment we have a single validator, it still benefits from having a name, as we can now include in it the name of the sstable being written and hence trace the source of the bad data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117150616.895878-1-bdenes@scylladb.com>	2020-01-20 11:06:30 +01:00
Takuya ASADA	893dfbce59	dist/ami: update packer to 1.5.1 Update Packer to 1.5.1. Needed to rename clean_ami_name -> clean_resource_name on scylla.json, since the variable name had been changed. Also fixed checksum verification code, trimmed unwanted extra strings from sha256sum output.	2020-01-20 11:24:57 +02:00
Takuya ASADA	46386beba2	install.sh: convert relocate_python_scripts.py to a bash function Since we need to run relocate_python_scripts.py at install time, and a Python script may not be able to run in all target environments, convert the script to a bash function and merge it into install.sh.	2020-01-20 11:15:34 +02:00
Takuya ASADA	5627888b7c	scylla_post_install.sh: fix 'integer expression expected' error awk returns a float value on Debian, which makes the postinst script fail because we compare the value as an integer. Replaced with sed + bash. Fixes #5569	2020-01-20 11:13:55 +02:00
Asias He	343986a70b	gossiper: Introduce gossip STATUS_UNKNOWN When a node does not have the gossip STATUS application_state, we currently use an empty string to represent that state in get_gossip_status. It is better to use an explicit "UNKNOWN" to represent it, which makes the log easier to understand when the status is unknown. Before: 'gossip - InetAddress n2 is now UP, status =' After: 'gossip - InetAddress n2 is now UP, status = UNKNOWN' This patch is safe because STATUS_UNKNOWN is never sent over the cluster, so the representation is only internal to the node. Fixes #5520	2020-01-20 10:59:14 +02:00
Benny Halevy	2b383b404a	test: cql_query_test: test float/double sum overflow Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-20 10:42:03 +02:00
Ivan Prisyazhnyy	8fde8e3600	dep: support arch linux Support arch linux dependencies. Tested on Arch 5.4.2-arch1-1 and docker archlinux. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200118162110.824317-1-ivan@scylladb.com>	2020-01-19 14:30:03 +02:00
Benny Halevy	476a102de0	cql3: aggregate_fcts: simplify accumulator_for template definitions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-19 08:26:40 +02:00
Avi Kivity	12bc965f71	atomic_cell: consistently use comma as separator in pretty-printers The atomic_cell pretty printers use a mix of commas and semicolons. This change makes them use commas everywhere, for consistency. Message-Id: <20200116133327.2610280-1-avi@scylladb.com>	2020-01-16 17:26:33 +01:00
Nadav Har'El	1ed21d70dc	merge: CDC: do mutation augmentation from storage proxy Merged pull request https://github.com/scylladb/scylla/pull/5567 from Calle Wilund: Fixes #5314 Instead of tying CDC handling into cql statement objects, this patch set moves it to storage proxy, i.e. shared code for mutating stuff. This means we automatically handle cdc for code paths outside cql (i.e. alternator). It also adds api handling (though initially inefficient) for batch statements. CDC is tied into storage proxy by giving the former a ref to the latter (per shard). Initially this is not a constructor parameter, because right now we have chicken and egg issues here. Hopefully, Pavel's refactoring of migration manager and notifications will untie these and this relationship can become nicer. The actual augmentation can (as stated above) be made much more efficient. Hopefully, the stream management refactoring will deal with expensive stream lookup, and eventually, we can maybe coalesce pre-image selects for batches. However, that is left as an exercise for when deemed needed. The augmentation API has an optional return value for a "post-image handler" to be used iff returned after mutation call is finished (and successful). It is not yet actually invoked from storage_proxy, but it is at least in the call chain.	2020-01-16 17:12:56 +02:00
Avi Kivity	e677f56094	Merge "Enable general centos RPM (not only centos7)" from Hagit	2020-01-16 14:13:24 +02:00
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Hagit Segev	d0405003bd	building-packages doc: Update no specific el7 on path	2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola	c42a2c6f28	configure: Add -O1 when compiling generated parsers Enabling asan enables a few cleanup optimizations in gcc. The net result is that using -fsanitize=address -fno-sanitize-address-use-after-scope produces code that uses a lot less stack than if the file is compiled with just -O0. This patch adds -O1 in addition to -fno-sanitize-address-use-after-scope to protect the unfortunate developer who decides to build in dev mode with --cflags='-O0 -g'. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-2-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	317e0228a8	configure: Put user flags after the mode flags It is sometimes convenient to build with flags that don't match any existing mode. Recently I was tracking a bug that would not reproduce with debug, but reproduced with dev, so I tried debugging the result of ./configure.py --cflags="-O0 -g" While the binary had debug info, it still had optimizations because configure.py put the mode flags after the user flags (-O0 -O1). This patch flips the order (-O1 -O0) so that the flags passed in the command line win. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Gleb Natapov	51281bc8ad	lwt: fix write timeout exception reporting CQL transport code relies on an exception's C++ type to create correct reply, but in lwt we converted some mutation_timeout exceptions to more generic request_timeout while forwarding them which broke the protocol. Do not drop type information. Fixes #5598. Message-Id: <20200115180313.GQ9084@scylladb.com>	2020-01-16 12:05:50 +02:00
Piotr Jastrzębski	0c8c1ec014	config: fix description of enable_deprecated_partitioners Murmur3 is the default partitioner. ByteOrder and Random are the deprecated ones and should be mentioned in the description. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-16 12:05:50 +02:00
Nadav Har'El	9953a33354	merge "Adding a schema file when creating a snapshot" Merged pull request https://github.com/scylladb/scylla/pull/5294 from Amnon Heiman: To use a snapshot we need a schema file that is similar to the result of running the cql DESCRIBE command. DESCRIBE is implemented in the cql driver, so the functionality needs to be re-implemented inside scylla. This series adds a describe method to the schema file and uses it when taking a snapshot. There are different approaches to handling materialized views and secondary indexes. This implementation creates each schema.cql file in its own relevant directory, so the schema for a materialized view, for example, will be placed in the snapshot directory of the table of that view. Fixes #4192	2020-01-16 12:05:50 +02:00
Piotr Dulikowski	c383652061	gossip: allow for aborting on sleep This commit makes most sleeps in gossip.cc abortable. It is now possible to quickly shut down a node during startup, most notably during the phase while it waits for gossip to settle.	2020-01-16 12:05:50 +02:00
Avi Kivity	e5e0642f2a	tools: toolchain: add dependencies for building debian and rpm packages This reduces network traffic and eliminates time for installation when building packages from the frozen toolchain, as well as isolating the build from updates to those dependencies which may cause breakage.	2020-01-16 12:05:50 +02:00
Pekka Enberg	da9dae3dbe	Merge 'test.py: add support for CQL tests' from Kostja This patch set adds support for CQL tests to test.py, as well as many other improvements: * --name is now a positional argument * test output is preserved in testlog/${mode} * concise output format * better color support * arbitrary number of test suites * per-suite yaml-based configuration * options --jenkins and --xunit are removed and xml files are generated for all runs A simple driver is written in C++ to read CQL from standard input, execute it in embedded mode and produce output. The patch is checked with BYO. Reviewed-by: Dejan Mircevski <dejan@scylladb.com> * 'test.py' of github.com:/scylladb/scylla-dev: (39 commits) test.py: introduce BoostTest and virtualize custom boost arguments test.py: sort tests within a suite, and sort suites test.py: add a basic CQL test test.py: add CQL .reject files to gitignore test.py: print a colored unidiff in case of test failure test.py: add CqlTestSuite to run CQL tests test.py: initial import of CQL test driver, cql_repl test.py: remove custom colors and define a color palette test.py: split test output per test mode test.py: remove tests_to_run test.py: virtualize Test.run(), to introduce CqlTest.Run next test.py: virtualize test search pattern per TestSuite test.py: virtualize write_xunit_report() test.py: ensure print_summary() is agnostic of test type test.py: tidy up print_summary() test.py: introduce base class Test for CQL and Unit tests test.py: move the default arguments handling to UnitTestSuite test.py: move custom unit test command line arguments to suite.yaml test.py: move command line argument processing to UnitTestSuite test.py: introduce add_test(), which is suite-specific ...	2020-01-16 12:05:50 +02:00
Pekka Enberg	e8b659ec5d	dist/docker: Remove Ubuntu-based Docker image The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated since 2017. Let's remove it as unmaintained. Message-Id: <20200115102405.23567-1-penberg@scylladb.com>	2020-01-16 12:05:50 +02:00
Avi Kivity	546556b71b	Merge "allow commitlog to wait for specific entries to be flushed on disk" from Gleb " Currently commitlog supports two modes of operation. First is 'periodic' mode where all commitlog writes are ready the moment they are stored in a memory buffer and the memory buffer is flushed to a storage periodically. Second is a 'batch' mode where each write is flushed as soon as possible (after previous flush completed) and writes are only ready after they are flushed. The first option is not very durable, the second is not very efficient. This series adds an option to mark some writes as "more durable" in periodic mode meaning that they will be flushed immediately and reported complete only after the flush is complete (flushing a durable write also flushes all writes that came before it). It also changes paxos to use those durable writes to store paxos state. Note that strictly speaking the last patch is not needed since after writing to an actual table the code updates paxos table and the latter uses durable writes that make sure all previous writes are flushed. Given that both writes are supposed to run on the same shard this should be enough. But it feels right to make base table writes durable as well. " * 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev: paxos: immediately sync commitlog entries for writes made by paxos learn stage paxos: mark paxos table schema as "always sync" schema: allow schema to be marked as 'always sync to commitlog' commitlog: add test for per entry sync mode database: pass sync flag from db::apply function to the commitlog commitlog: add sync method to entry_writer	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	2ebd1463b2	tests: Handle null and not present values differently Before this patch result_set_assertions was handling both null values and missing values in the same way. This patch changes the handling of missing values so that now checking for a null value is not the same as checking for a value not being present. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114184116.75546-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Botond Dénes	0c52c2ba50	data: make cell::make_collection(): more consistent and safer `3ec889816` changed cell::make_collection() to take different code paths depending whether its `data` argument is nothrow copyable/movable or not. In case it is not, it is wrapped in a view to make it so (see the above mentioned commit for a full explanation), relying on the method's pre-existing requirement for callers to keep `data` alive while the created writer is in use. On closer look however it turns out that this requirement is neither respected, nor enforced, at least not on the code level. The real requirement is that the underlying data represented by `data` is kept alive. If `data` is a view, it is not expected to be kept alive and callers don't, it is instead copied into `make_collection()`. Non-views however are expected to be kept alive. This makes the API error prone. To avoid any future errors due to this ambiguity, require all `data` arguments to be nothrow copyable and movable. Callers are now required to pass views of nonconforming objects. This patch is a usability improvement and is not fixing a bug. The current code works as-is because it happens to conform to the underlying requirements. Refs: #5575 Refs: #5341 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>	2020-01-16 12:05:50 +02:00
Amnon Heiman	ac8aac2b53	tests/cql_query_test: Add schema describe tests This patch adds tests for the describe method. test_describe_simple_schema tests regular tables. test_describe_view_schema tests views and indexes. Each test creates a table, finds the schema, calls the describe method and compares the result to the string that was used to create the table. The view tests also verify that adding an index or view does not change the base table. When comparing results, leading and trailing whitespace is ignored and all combinations of whitespace and newlines are treated equally. Additional tests may be added at a future phase if required. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:07:57 +02:00
Amnon Heiman	028525daeb	database: add schema.cql file when creating a snapshot When creating a snapshot we need to add a schema.cql file in the snapshot directory that describes the table in that snapshot. This patch adds the file using the schema describe method. get_snapshot_details and manifest_json_filter were modified to ignore the schema.cql file. Fixes #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	82367b325a	schema: Add a describe method This patch adds a describe method to a table schema. It acts similarly to the DESCRIBE cql command that is implemented in a CQL driver. The method supports tables, secondary indexes, local indexes and materialized views. relates to: #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	6f58d51c83	secondary_index_manager: add the index_name_from_table_name function index_name_from_table_name is the reverse of index_table_name: it takes a table name that was generated for an index and returns the name of the index that generated that table. Relates to #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Pavel Emelyanov	555856b1cd	migration_manager: Use in-place value factory The factory is purely a state-less thing, there is no difference what instance of it to use, so we may omit referencing the storage_service in passive_announce This is 2nd simple migration_manager -> storage_service link to cut (more to come later). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	f129d8380f	migration_manager: Get database through storage_proxy There are several places where migration_manager needs a storage_service reference to get the database from, thus forming a mutual dependency between them. This is the simplest case where the migration_manager link to the storage_service can be cut -- the database reference can be obtained from storage_proxy instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone: only the migration_manager needs the database, not vice versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	3f84256853	migration_manager: Remove register/unregister helpers In the 2nd patch the migration_manager kept those for simpler patching, but now we can drop them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9e4b41c32a	tests: Switch on migration notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9d31bc166b	cdc: Use migration_notifier to (un)register for events If no one provided -- get it from storage_service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:19 +03:00
Pavel Emelyanov	ecab51f8cc	storage_service: Use migration_notifier (and stop worrying) The storage_service needs migration_manager for notifications and carefully handles the manager's stop process so as not to demolish the listeners list from under itself. From now on this dependency is no longer valid (the storage_service still seems to need the migration_manager, but that is a different story). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7814ed3c12	cql_server: Use migration_notifier in events_notifier This patch removes an implicit cql_server -> migration_manager dependency, as the former's event notifier uses the latter for notifications. This dependency also breaks a loop: storage_service -> cql_server -> migration_manager -> storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	d9edcb3f15	query_processor: Use migration_notifier This patch breaks one (probably harmless but still) dependency loop: query_processor -> migration_manager -> storage_proxy -> tracing -> query_processor. The first link is not needed, as the query_processor needs the migration_manager purely to (un)subscribe to notifications. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	2735024a53	auth: Use migration_notifier The same as with view builder. The constructor still needs both, but the life-time reference is now for notifier only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	28f1250b8b	view_builder: Use migration notifier The migration manager itself is still needed on start to wait for schema agreement, but there's no longer the need for the life-time reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f45b23f088	storage_service: Keep migration_notifier The storage service will need this guy to initialize sub-services with. Also it registers itself with notifiers. That said, it's convenient to have the migration notifier on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on the migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a complete separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Pavel Emelyanov	074cc0c8ac	migration_manager: Helpers for on_before_ notifications Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Pavel Emelyanov	1992755c72	storage_service: Kill initialization helper from init.cc The helper just makes further patching more complex, so drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Konstantin Osipov	a665fab306	test.py: introduce BoostTest and virtualize custom boost arguments	2020-01-15 13:37:25 +03:00
Gleb Natapov	51672e5990	paxos: immediately sync commitlog entries for writes made by paxos learn stage	2020-01-15 12:15:42 +02:00
Gleb Natapov	0fc48515d8	paxos: mark paxos table schema as "always sync" We want all writes to the paxos table to be persisted to storage before they are declared completed.	2020-01-15 12:15:42 +02:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that use this schema will be immediately persisted to storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	0ce70c7a04	commitlog: add test for per entry sync mode	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request that a mutation be persisted to disk before the future is made ready, regardless of which mode the commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true, the commitlog should sync to file immediately after writing the entry and wait for the flush to complete before returning.	2020-01-15 12:15:42 +02:00
Piotr Sarna	9aab75db60	alternator: clean up single value rjson comparator The comparator is refreshed to ensure the following: - null compares less than all other types; - null, true and false are comparable against each other, while other types are only comparable against themselves and null. Comparing mixed types is not currently reachable from the alternator API, because it's only used for sets, which can only use strings, binary blobs and numbers - thus, no new pytest cases are added. Fixes #5454	2020-01-15 10:57:49 +02:00
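The ordering rules in the comparator commit above can be illustrated with a small C++ sketch. It is not alternator's rjson comparator: the `value` variant, `three_way()` and `compare()` are hypothetical stand-ins, and returning `std::nullopt` to signal a pair of types that cannot be compared is an assumption of this sketch.

```cpp
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical value model (not rjson): null, boolean, number, string.
using value = std::variant<std::monostate, bool, double, std::string>;

// Returns -1, 0 or 1 for two values of the same type.
template <typename T>
int three_way(const T& a, const T& b) {
    return a < b ? -1 : (b < a ? 1 : 0);
}

// Returns -1, 0, 1, or std::nullopt when the two values are not comparable.
std::optional<int> compare(const value& a, const value& b) {
    bool a_null = std::holds_alternative<std::monostate>(a);
    bool b_null = std::holds_alternative<std::monostate>(b);
    if (a_null || b_null) {
        // null equals null and is less than every other value
        // (so null, false and true are always mutually comparable).
        return a_null && b_null ? 0 : (a_null ? -1 : 1);
    }
    if (a.index() != b.index()) {
        return std::nullopt; // mixed non-null types are not comparable
    }
    // Same type on both sides: compare the contained values (false < true).
    return std::visit([&](const auto& x) -> std::optional<int> {
        using T = std::decay_t<decltype(x)>;
        return three_way(x, std::get<T>(b));
    }, a);
}
```

The sketch constructs string values with `std::string("...")` explicitly, since a bare string literal would select the `bool` alternative of this variant under C++17 conversion rules.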
Juliusz Stasiewicz	d87d01b501	storage_proxy: intercept rpc::closed_error if counter leader is down (#5579) When a counter mutation is about to be sent, a leader is elected, but if the leader fails after the election, we get `rpc::closed_error`. The exception propagates high up, causing all connections to be dropped. This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters` and translates it to `mutation_write_failure_exception`. References #2859	2020-01-15 09:56:45 +01:00
Konstantin Osipov	a351ea57d5	test.py: sort tests within a suite, and sort suites This makes it easier to navigate the test artefacts. No need to sort suites since they are already stored in a dict.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	ba87e73f8e	test.py: add a basic CQL test	2020-01-15 11:41:19 +03:00
Konstantin Osipov	44d31db1fc	test.py: add CQL .reject files to gitignore To avoid accidental commit, add .reject files to .gitignore	2020-01-15 11:41:19 +03:00
Konstantin Osipov	4f64f0c652	test.py: print a colored unidiff in case of test failure Print a colored unidiff between result and reject files in case of test failure.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	d3f9e64028	test.py: add CqlTestSuite to run CQL tests Run the test and compare results. Manage temporary and .reject files. Now that there are CQL tests, improve logging. run_test success no longer means test success.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	b114bfe0bd	test.py: initial import of CQL test driver, cql_repl cql_repl is a simple program which reads CQL from stdin, executes it, and writes results to stdout. It supports --input, --output and --log options. --log is directed to cql_test.log by default, --input is stdin by default, and --output is stdout by default. The result set output is printed with a basic JSON visitor.	2020-01-15 11:41:16 +03:00
Konstantin Osipov	0ec27267ab	test.py: remove custom colors and define a color palette Using a standard Python module improves readability, and allows using colors easily in other output.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	0165413405	test.py: split test output per test mode Store test temporary files and logs in ${testdir}/${mode}. Remove --jenkins and --xunit, and always write XML files at a predefined location: ${testdir}/${mode}/xml/. Use the .xunit.xml extension for tests whose XML output is in xunit format, and junit.xml for an accumulated output of all non-boost tests in junit format.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4095ab08c8	test.py: remove tests_to_run Avoid storing each test twice, use per-tests list to construct a global iterable.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	169128f80b	test.py: virtualize Test.run(), to introduce CqlTest.Run next	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d05f6c3cc7	test.py: virtualize test search pattern per TestSuite CQL tests have .cql extension, while unit tests have .cc.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	abcc182ab3	test.py: virtualize write_xunit_report() Make sure any non-boost test can participate in the report.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	18aafacfad	test.py: ensure print_summary() is agnostic of test type Introduce a virtual Test.print_summary() to print a failed test summary.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	21fbe5fa81	test.py: tidy up print_summary() Now that we have tabular output, make print_summary() more concise.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c171882b51	test.py: introduce base class Test for CQL and Unit tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	fd6897d53e	test.py: move the default arguments handling to UnitTestSuite Move UnitTest default seastar argument handling to UnitTestSuite (cleanup).	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d3126f08ed	test.py: move custom unit test command line arguments to suite.yaml Load the command line arguments, if any, from suite.yaml, rather than keep them hard-coded in test.py. This allows the operations team to have easier access to these. Note I had to sacrifice dynamic smp count for mutation_reader_test (the new smp count is fixed at 3) since this is part of test configuration now.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ef6cebcbd2	test.py: move command line argument processing to UnitTestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4a20617be3	test.py: introduce add_test(), which is suite-specific	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7e10bebcda	test.py: move long test list to suite.yaml Use suite.yaml for long tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	32ffde91ba	test.py: move test id assignment to TestSuite Going forward finding and creating tests will be a responsibility of TestSuite, so the id generator needs to be shared.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b5b4944111	test.py: move repeat handling to TestSuite This way we can avoid iterating over all tests to handle --repeat. Besides, going forward the tests will be stored in two places: in the global list of all tests, for the runner, and per suite, for suite-based reporting, so it's easier if TestSuite is fully responsible for finding and adding tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	34a1b49fc3	test.py: move add_test_list() to TestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	44e1c4267c	test.py: introduce test suites - UnitTestSuite - for test/unit tests - BoostTestSuite - a tweak on UnitTestSuite, with options to log xml test output to a dedicated file	2020-01-15 10:53:24 +03:00
Konstantin Osipov	eed3201ca6	test.py: use path, rather than test kind, for search pattern Going forward there may be multiple suites of the same kind.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f95c97667f	test.py: support arbitrary number of test suites Scan entire test/ for folders that contain suite.yaml, and load tests from these folders. Skip the rest. Each folder with a suite.yaml is expected to have a valid suite configuration in the yaml file. A suite is a folder with tests of the same type. E.g. it can be a folder with unit tests, boost tests, or CQL tests. The harness will use suite.yaml to create an appropriate suite test driver, to execute tests in different formats.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c1f8169cd4	test.py: add suite.yaml to boost and unit tests The plan is to move suite-specific settings to the configuration file.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ec9ad04c8a	test.py: move 'success' to TestUnit class There will be other success attributes: program return status 0 doesn't mean the test is successful for all tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b4aa4d35c3	test.py: save test output in tmpdir It is handy to have it so that a reference of a failed test is available without re-running it.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f4efe03ade	test.py: always produce xml output, derive output paths from tmpdir It reduces the number of configurations to re-test when test.py is modified, and simplifies usage of test.py in build tools, since you no longer need to bother with extra arguments.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d2b546d464	test.py: output job count in the log	2020-01-15 10:53:24 +03:00
Konstantin Osipov	233f921f9d	test.py: make test output brief&tabular New format: % ./test.py --verbose --mode=release ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [1/111] boost/UUID_test release [ PASS ] [2/111] boost/enum_set_test release [ PASS ] [3/111] boost/like_matcher_test release [ PASS ] [4/111] boost/observable_test release [ PASS ] [5/111] boost/allocation_strategy_test release [ PASS ] ^C % ./test.py foo ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [3/3] unit/memory_footprint_test debug [ PASS ] ------------------------------------------------------------------------------	2020-01-15 10:53:24 +03:00
Konstantin Osipov	879bea20ab	test.py: add a log file Going forward I'd like to make terminal output brief&tabular, but some test details are necessary to preserve so that a failure is easy to debug. This information now goes to the log file. - open and truncate the log file on each harness start - log options of each invoked test in the log, so that a failure is easy to reproduce - log test result in the log Since tests are run concurrently, having an exact trace of concurrent execution also helps debugging flaky tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	cbee76fb95	test.py: gitignore the default ./test.py tmpdir, ./testlog	2020-01-15 10:53:24 +03:00
Konstantin Osipov	1de69228f1	test.py: add --tmpdir It will be used for test log files.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	caf742f956	test.py: flake8 style fix	2020-01-15 10:53:24 +03:00
Konstantin Osipov	dab364c87d	test.py: sort imports	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7ec4b98200	test.py: make name a positional argument. Accept multiple test names, treat test name as a substring, and if the same name is given multiple times, run the test multiple times.	2020-01-15 10:53:24 +03:00
Dejan Mircevski	bb2e04cc8b	alternator: Improve comments on comparators Some comparator methods in conditions.cc use unexpected operators; explain why. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-14 22:25:55 +02:00
Tomasz Grabiec	c8a5a27bd9	Merge "storage_service: Move load_broadcaster away" from Pavel E. The storage_service struct is a collection of diverse things, most of them needed only on start and on stop and/or running on shard 0 (but it is nonetheless sharded). As a part of cleaning up this structure and the inter-component dependencies it generates, here's the sanitation of load_broadcaster.	2020-01-14 19:26:06 +01:00
Calle Wilund	313ed91ab0	cdc: Listen for migration callbacks on all shards Fixes #5582 ... but only populate log on shard 0. Migration manager callbacks are slightly asymmetric. Notifications for pre-create/update mutations are sent only on the initiating shard (necessary, because we consider the mutations mutable). But "created" callbacks are sent on all shards (immutable). We must subscribe on all shards, but still populate the cdc table only once, otherwise we can either miss table creation or populate more than once. v2: - Add test case Message-Id: <20200113140524.14890-1-calle@scylladb.com>	2020-01-14 16:35:41 +01:00
Avi Kivity	2138657d3a	Update seastar submodule * seastar 36cf5c5ff0...3f3e117de3 (16): > memcached: don't use C++17-only std::optional > reactor: Comment why _backend is assigned in constructor body > log: restore --log-to-stdout for backward compatibility > used_size.hh: Include missing headers > core: Move some code from reactor.cc to future.cc > future-util: move parallel_for_each to future-util.cc > task: stop wrapping tasks with unique_ptr > Merge "Setup timer signal handler in backend constructor" from Pavel Fixes #5524 > future: avoid a branch in future's move constructor if type is trivial > utils: Expose used_size > stream: Call get_future early > future-util: Move parallel_for_each_state code to a .cc > memcached: log exceptions > stream: Delete dead code > core: Turn pollable_fd into a simple proxy over pollable_fd_state. > Merge "log to std::cerr" from Benny	2020-01-14 16:56:25 +02:00
Pavel Emelyanov	e1ed8f3f7e	storage_service: Remove _shadow_token_metadata This is part of de-bloating storage_service. The field in question is used to temporarily keep the _token_metadata value during shard-wide replication. There's no need to have it as a class member; any "local" copy is enough. Also, as token_metadata is huge, and invoke_on_all() copies the function for each shard, keep one local copy of the metadata using do_with() and pass it into invoke_on_all() by reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Asias He <asias@scylladb.com> Message-Id: <20200113171657.10246-1-xemul@scylladb.com>	2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola	054f5761a7	types: Refactor code into a serialize_varint helper This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy while serializing a decimal. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200110221422.35807-1-espindola@scylladb.com>	2020-01-14 16:28:27 +02:00
Avi Kivity	6c84dd0045	cql3: update_statement: do not set query option always_return_static_content for list read-before-write The query option always_return_static_content was added for lightweight transactions in commits `e0b31dd273` (infrastructure) and `65b86d155e` (actual use). However, the flag was added unconditionally to update_parameters::options. This caused it to be set for list read-modify-write operations, not just for lightweight transactions. This is a little wasteful, and worse, it breaks compatibility as old nodes do not understand the always_return_static_content flag and complain when they see it. To fix, remove the always_return_static_content from update_parameters::options and only set it from compare-and-swap operations that are used to implement lightweight transactions. Fixes #5593. Reviewed-by: Gleb Natapov <gleb@scylladb.com> Message-Id: <20200114135133.2338238-1-avi@scylladb.com>	2020-01-14 16:15:20 +02:00
Hagit Segev	ef88e1e822	CentOS RPMs: Remove target to enable general centos.	2020-01-14 14:31:03 +02:00
Alejo Sanchez	6909d4db42	cql3: BYPASS CACHE query counter This patch is the first part of requested full scan metrics. It implements a counter of SELECT queries with BYPASS CACHE option. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>	2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola	dca1bc480f	everywhere: Use serialized(foo) instead of data_value(foo).serialize() This is just a simple cleanup that reduces the size of another patch I am working on and is an independent improvement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114051739.370127-1-espindola@scylladb.com>	2020-01-14 12:17:12 +02:00
Pavel Emelyanov	b9f28e9335	storage_service: Remove dead drain branch The drain_in_progress variable here is the future that's set by the drain() operation itself. Its promise is set when the drain() finishes. The check for this future in the beginning of drain() is pointless. No two drain()-s can run in parallel because of run_with_api_lock() protection. Doing the 2nd drain after a successful 1st one is also impossible due to the _operation_mode check. The 2nd drain after an _exceptioned_ (and thus incomplete) 1st one would deadlock; after this patch it will try to drain a 2nd time, but that should be ok. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200114094724.23876-1-xemul@scylladb.com>	2020-01-14 12:07:29 +02:00
Piotr Sarna	36ec43a262	Merge "add table with connected cql clients" from Juliusz This change introduces the system.clients table, which provides information about connected CQL clients. PK is the client's IP address, CK consists of outgoing port number and client_type (which will be extended in future to thrift/alternator/redis). The table also supplies shard_id and username. Other columns, like connection_stage, driver_name, driver_version..., are currently empty but exist for C* compatibility and future use. This is an ordinary table (i.e. non-virtual) and it's updated upon accepting connections. This is also why C*'s column request_count was not introduced. In case of abrupt DB stop, the table should not persist, so it's being truncated on startup. Resolves #4820	2020-01-14 10:01:07 +02:00
Avi Kivity	1f46133273	Merge "data: make cell::make_collection() exception safe" from Botond " Most of the code in `cell` and the `imr` infrastructure it is built on is `noexcept`. This means that extra care must be taken to avoid rogue exceptions as they will bring down the node. The changes introduced by 0a453e5d3a did just that - introduced a rogue `std::bad_alloc` into this code path by violating an undocumented and unvalidated assumption -- that fragment ranges passed to `cell::make_collection()` are nothrow copyable and movable. This series refactors `cell::make_collection()` such that it does not have this assumption anymore and is safe to use with any range. Note that the unit test included in this series, which was used to find all the possible exception sources, will currently not be run in any of our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not being set. I plan to address this in a followup because setting this flag fails other tests using the failure injection mechanism. This is because these tests are normally run with the failure injection disabled, so failures managed to lurk in without anyone noticing. Fixes: #5575 Refs: #5341 Tests: unit(dev, debug) " * 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla: test: mutation_test: add exception safety test for large collection serialization data/cell.hh: avoid accidental copies of non-nothrow copiable ranges utils/fragment_range.hh: introduce fragment_range_view	2020-01-14 10:01:06 +02:00
Nadav Har'El	5b08ec3d2c	alternator: error on unsupported ScanIndexForward=false We do not yet support the ScanIndexForward=false option for reversing the sort order of a Query operation, as reported in issue #5153. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter and giving the user wrong results. This is what this patch does. Before this patch, the reverse-order query in the xfailing test test_query.py::test_query_reverse seems to succeed - yet gives results in the wrong order. With this patch, the query itself fails - stating that the ScanIndexForward=false argument is not supported. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200105113719.26326-1-nyh@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	c4bf532d37	storage_service: Fix race in removenode/force_removenode/other Here's another theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and some other operation. Let's walk through them First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> At this point some other operation may start. Say, drain: run_with_api_lock _operation_in_progress = "drain" storage_service::drain ... go to sleep somewhere Now let's go back to the 1st op that wakes up from its sleep. The code it executes is while (!ss._operation_in_progress.empty()) { sleep_abortable() } and while the drain is running it will never exit. However (! and this is the core of the race) should the drain operation happen _before_ the force_removenode, another check for _operation_in_progress would have made the latter exit with the "Operation drain is in progress, try again" message. Fix this inconsistency by checking the current operation on every wake-up from sleep_abortable. Fixes #5591 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	cc92683894	storage_service: Fix race and deadlock in removenode/force_removenode Here's a theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and removenode (again) operations. Let's walk through them First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true _force_remove_completion = false throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> ! at this point we have _force_remove_completion = false and _operation_in_progress = <empty>, which opens the following opportunity for the 3rd removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now here's what we have in 2nd and 3rd ops: 1. _operation_in_progress = "removenode" (set by 3rd) prevents the force_removenode from exiting its loop 2. _force_remove_completion = false (set by 1st on exit) prevents the removenode from waiting on replicating_nodes list One can start the 4th call with force_removenode, it will proceed and wake up the 3rd op, but after it we'll have two force_removenode-s running in parallel and killing each other. I propose not to set _force_remove_completion to false in removenode, but just exit and let the owner of this flag unset it once it gets the control back. Fixes #5590 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
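A minimal single-threaded C++ model of the fixed wait loop described in the race fix above follows. The function name, the `sleep_once` callback (standing in for `sleep_abortable()`) and the exact error text are assumptions for illustration; the real code lives in storage_service and runs on futures.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical model of force_removenode's wait loop. `operation_in_progress`
// models storage_service's _operation_in_progress; `sleep_once` models one
// sleep_abortable() call, during which other operations may change it.
void wait_for_removenode_to_finish(const std::string& operation_in_progress,
                                   const std::function<void()>& sleep_once) {
    while (!operation_in_progress.empty()) {
        // The fix: re-check on every wake-up. If a different operation
        // (e.g. drain) took over, bail out instead of waiting for it forever.
        if (operation_in_progress != "removenode") {
            throw std::runtime_error("Operation " + operation_in_progress +
                                     " is in progress, try again");
        }
        sleep_once();
    }
}
```

With the check inside the loop, the outcome no longer depends on whether the other operation started before or after force_removenode: both orderings end with the same "try again" error.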
Benny Halevy	ff55b5dca3	cql3: functions: limit sum overflow detection to integral types Other types do not have a wider accumulator at the moment. And static_cast<accumulator_type>(ret) != _sum evaluates as false for NaN/Inf floating point values. Fixes #5586 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>	2020-01-14 10:01:06 +02:00
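The wider-accumulator overflow check that the commit above keeps for integral types can be sketched as follows. `checked_sum()` is a hypothetical helper, not the cql3 `sum` aggregate: it accumulates `int32_t` values in an `int64_t` and reports overflow when the narrowed result does not convert back to the same value. Only integral types are covered, matching the commit's scope.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: sum int32_t values using a wider int64_t accumulator.
// Returns std::nullopt if the final sum does not fit in int32_t.
// (The int64_t accumulator itself could overflow for extremely long inputs;
// that case is ignored in this sketch.)
std::optional<int32_t> checked_sum(const std::vector<int32_t>& values) {
    int64_t sum = 0;
    for (int32_t v : values) {
        sum += v;
    }
    // Narrowing an out-of-range value wraps modulo 2^32 (guaranteed since
    // C++20, implementation-defined before), so a lossy narrowing fails
    // the round-trip comparison below.
    auto ret = static_cast<int32_t>(sum);
    if (static_cast<int64_t>(ret) != sum) {
        return std::nullopt;
    }
    return ret;
}
```

Because only the final sum has to fit, an intermediate excursion past `INT32_MAX` (for example `{INT32_MAX, 1, -1}`) is not reported as an overflow.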
Avi Kivity	e3310201dd	atomic_cell_or_collection: type-aware print atomic_cell or collection components Now that atomic_cell_view and collection_mutation_view have type-aware printers, we can use them in the type-aware atomic_cell_or_collection printer. Message-Id: <20191231142832.594960-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Avi Kivity	931b196d20	mutation_partition: row: resolve column name when in schema-aware printer Instead of printing the column id, print the full column name. Message-Id: <20191231142944.595272-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Takuya ASADA	5241deda2d	dist: nonroot: fix CLI tool path for nonroot (#5584) The CLI tool path is hardcoded; we need to specify the correct path on nonroot.	2020-01-14 10:01:06 +02:00
Nadav Har'El	1511b945f8	merge: Handle multiple regular base columns in view pk Merged patch series from Piotr Sarna: "Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This series is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)" Piotr Sarna (3): db,view: fix checking if partition key is empty view: handle multiple regular base columns in view pk test: add a case for multiple base regular columns in view key alternator-test/test_gsi.py \| 1 - view_info.hh \| 5 +- cql3/statements/alter_table_statement.cc \| 2 +- db/view/view.cc \| 77 ++++++++++++++---------- mutation_partition.cc \| 2 +- test/boost/cql_query_test.cc \| 58 ++++++++++++++++++ 6 files changed, 109 insertions(+), 36 deletions(-)	2020-01-14 10:01:00 +02:00
Nadav Har'El	f16e3b0491	merge: bouncing lwt request to an owning shard Merged patch series from Gleb Natapov: "LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move a request to the correct shard before running lwt. It works by returning an error from lwt code if the shard is the incorrect one, specifying the shard the request should be moved to. The error is processed by the transport code, which jumps to the correct shard and re-processes the incoming message there. The nicer way to achieve the same would be to jump to the right shard inside storage_proxy::cas(), but unfortunately, with the current implementation of the modification statements, they are unusable by a shard different from the one where they were created, so the jump should happen before a modification statement for a cas() is created. When we fix our cql code to be more cross-shard friendly this can be reworked to do the jump in the storage_proxy." 
Gleb Natapov (4): transport: change make_result to takes a reference to cql result instead of shared_ptr storage_service: move start_native_transport into a thread lwt: Process lwt request on a owning shard lwt: drop invoke_on in paxos_state prepare and accept auth/service.hh \| 5 +- message/messaging_service.hh \| 2 +- service/client_state.hh \| 30 +++- service/paxos/paxos_state.hh \| 10 +- service/query_state.hh \| 6 + service/storage_proxy.hh \| 2 + transport/messages/result_message.hh \| 20 +++ transport/messages/result_message_base.hh \| 4 + transport/request.hh \| 4 + transport/server.hh \| 25 ++- cql3/statements/batch_statement.cc \| 6 + cql3/statements/modification_statement.cc \| 6 + cql3/statements/select_statement.cc \| 8 + message/messaging_service.cc \| 2 +- service/paxos/paxos_state.cc \| 48 ++--- service/storage_proxy.cc \| 47 ++++- service/storage_service.cc \| 120 +++++++------ test/boost/cql_query_test.cc \| 1 + thrift/handler.cc \| 3 + transport/messages/result_message.cc \| 5 + transport/server.cc \| 203 ++++++++++++++++------ 21 files changed, 377 insertions(+), 180 deletions(-)	2020-01-14 09:59:59 +02:00
Botond Dénes	300728120f	test: mutation_test: add exception safety test for large collection serialization Use `seastar::memory::local_failure_injector()` to inject all possible `std::bad_alloc`:s into the collection serialization code path. The test just checks that there are no `std::abort()`:s caused by any of the exceptions. The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is not defined.	2020-01-13 16:53:35 +02:00
Botond Dénes	3ec889816a	data/cell.hh: avoid accidental copies of non-nothrow copiable ranges `cell::make_collection()` assumes that all ranges passed to it are nothrow copyable and movable views. This is not guaranteed, is not expressed in the interface and is not mentioned in the comments either. The changes introduced by 0a453e5d3a to collection serialization, making it use fragmented buffers, fell into this trap, as it passes `bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy constructor allocates and hence can throw, triggering an `std::terminate()` inside `cell::make_collection()` as the latter is noexcept. To solve this issue, non-nothrow copyable and movable ranges are now wrapped in a `fragment_range_view` to make them so. `cell::make_collection()` already requires callers to keep alive the range for the duration of the call, so this does not introduce any new requirements to the callers. Additionally, to avoid any future accidents, do not accept temporaries for the `data` parameter. We don't ever want to move this param anyway, we will either have a trivially copyable view, or a potentially heavy-weight range that we will create a trivially copyable view of.	2020-01-13 16:53:35 +02:00
Botond Dénes	b52b4d36a2	utils/fragment_range.hh: introduce fragment_range_view A lightweight, trivially copyable and movable view for fragment ranges. Allows for uniform treatment of all kinds of ranges, i.e. treating all of them as a view. Currently `fragment_range.hh` provides lightweight, view-like adaptors for empty and single-fragment ranges (`bytes_view`). To allow code to treat owning multi-fragment ranges the same way as the former two, we need a view for the latter as well -- this is `fragment_range_view`.	2020-01-13 16:52:59 +02:00
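The idea behind the two fragment-range commits above can be sketched in C++: a trivially copyable, non-owning view over a multi-fragment range, plus a consumer that wraps ranges whose copy may throw and refuses temporaries. `fragment_range_view` here is a simplified stand-in, and `as_view()` and `total_size()` are hypothetical names; neither reproduces the real `utils/fragment_range.hh` or `cell::make_collection()` interfaces.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Simplified stand-in: a non-owning view holding only a pointer, so copying
// it is trivial and can never throw. The viewed range must outlive the view.
template <typename Range>
class fragment_range_view {
    const Range* _range;
public:
    explicit fragment_range_view(const Range& r) noexcept : _range(&r) {}
    auto begin() const noexcept { return _range->begin(); }
    auto end() const noexcept { return _range->end(); }
};

// Pass cheap, nothrow-copyable ranges through unchanged; wrap the others
// (e.g. owning containers whose copy allocates) in a view.
template <typename Range>
auto as_view(const Range& r) noexcept {
    if constexpr (std::is_nothrow_copy_constructible_v<Range>) {
        return r;
    } else {
        return fragment_range_view<Range>(r);
    }
}

// Hypothetical noexcept consumer in the spirit of cell::make_collection():
// it only ever copies a view, so no copy inside it can throw.
template <typename Range>
std::size_t total_size(const Range& data) noexcept {
    auto view = as_view(data);
    std::size_t n = 0;
    for (std::string_view fragment : view) {
        n += fragment.size();
    }
    return n;
}

// Reject temporaries: the caller must keep the range alive for the call.
template <typename Range>
std::size_t total_size(const Range&&) = delete;
```

Deleting the `const Range&&` overload is the same technique `std::ref` uses: passing a temporary range fails to compile instead of producing a view of an object that is about to be destroyed.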
Calle Wilund	75f2b2876b	cdc: Remove free function for mutation augmentation	2020-01-13 13:18:55 +00:00
Calle Wilund	3eda3122af	cdc: Move mutation augment from cql3::modification_statement to storage proxy Using the attached service object	2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz	27dfda0b9e	main/transport: using the infrastructure of system.clients Resolves #4820. Execution path in main.cc now cleans up system.clients table if it exists (this is done on startup). Also, server.cc now calls functions that notify about cql clients connecting/disconnecting.	2020-01-13 14:07:04 +01:00
Pavel Emelyanov	148da64a7e	storage_servce: Move load_broadcaster away This simplifies the storage_service API and fixes the complain about shared_ptr usage instead of unique_ptr. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:55:09 +03:00
Pavel Emelyanov	b6e1e6df64	misc_services: Introduce load_meter There's a lonely get_load_map() call on storage_service that needs only load broadcaster, always runs on shard 0 and that's it. Next patch will move this whole stuff into its own helper no-shard container and this is preparation for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:53:08 +03:00
Gleb Natapov	5753ab7195	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests now run on the owning shard, there is no longer a need for a cross-shard call at the paxos_state level. RPC calls may still arrive at the wrong shard, so a cross-shard call is still needed there.	2020-01-13 10:26:02 +02:00
Gleb Natapov	d28dd4957b	lwt: Process lwt request on an owning shard LWT is much more efficient if a request is processed on the shard that owns the request's token, because otherwise processing bounces to the owning shard multiple times. The patch proposes a way to move a request to the correct shard before running lwt. If the request is on the wrong shard, lwt code returns an error specifying the shard the request should be moved to. The transport code handles that error by jumping to the correct shard and re-processing the incoming message there.	2020-01-13 10:26:02 +02:00
Piotr Sarna	3853594108	alternator-test: turn off TLS self-signed verification Two test cases did not ignore TLS warnings about self-signed certificates, which are used locally for testing HTTPS. Fixes #5557 Tests: test_health, test_authorization Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>	2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola	5313828ab8	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-2-espindola@scylladb.com>	2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola	4da6dc1a7f	cql3: Change a lambda capture order to match another Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-1-espindola@scylladb.com>	2020-01-09 10:42:49 +02:00
Avi Kivity	6d454d13ac	db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete Those gratuitous lambdas make life harder for IDE users by hiding the actual types from the IDEs. Message-Id: <20200107154746.1918648-1-avi@scylladb.com>	2020-01-08 17:43:18 +01:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account the case where there was a failed memtable flush (Refs flush) but the memtable is not flushable because it's not the latest in the memtable list. If that happens, no other memtable is flushable either, because otherwise it would have been picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Gleb Natapov	feed544c5d	paxos: fix truncation time checking during learn stage The comparison is done in milliseconds, not microseconds. Fixes #5566 Message-Id: <20200108094927.GN9084@scylladb.com>	2020-01-08 14:37:07 +01:00
Gleb Natapov	2832f1d9eb	storage_service: move start_native_transport into a thread The code runs only once and is simpler if it runs in a seastar thread.	2020-01-08 14:57:57 +02:00
Gleb Natapov	7fb2e8eb9f	transport: change make_result to take a reference to cql result instead of shared_ptr	2020-01-08 14:57:57 +02:00
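A minimal sketch of the unit mismatch, assuming (as the message suggests) that the truncation time is kept in milliseconds while the value it is compared against is a microsecond timestamp. The name `learned_after_truncation` and the `>=` direction are hypothetical. Keeping both sides as `std::chrono` durations lets the library convert them to a common unit before comparing, so raw counts in different units are never compared directly.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using std::chrono::microseconds;
using std::chrono::milliseconds;

// Hypothetical helper: is a paxos value's timestamp (microseconds since the
// epoch) at or after the table's truncation time (milliseconds since the epoch)?
bool learned_after_truncation(int64_t ts_micros, milliseconds truncated_at) {
    // Comparing raw integers (ts_micros >= truncated_at.count()) would mix
    // units. std::chrono converts both operands to their common type
    // (microseconds here) before comparing.
    return microseconds(ts_micros) >= truncated_at;
}
```

With raw integers, a 0.5 s timestamp (500000 µs) would wrongly compare as later than a 1 s truncation time (1000 ms); the typed comparison gets it right.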
Avi Kivity	0bde5906b3	Merge "cql3: detect and handle int overflow in aggregate functions #5537 " from Benny " Fix overflow handling in sum() and avg(). sum: - aggregated into __int128 - detect overflow when computing result and log a warning if found avg: - fix division function to divide the accumulator type _sum (__int128 for integers) by _count Add unit tests for both cases Test: - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it. - unit(dev), cql_query_test(debug) Fixes #5536 " * 'cql3-sum-overflow' of https://github.com/bhalevy/scylla: test: cql_query_test: test avg overflow cql3: functions: protect against int overflow in avg test: cql_query_test: test sum overflow cql3: functions: detect and handle int overflow in sum exceptions: sort exception_code definitions exceptions: define additional cassandra CQL exceptions codes	2020-01-08 10:39:38 +02:00
Avi Kivity	d649371baa	Merge "Fix crash on SELECT SUM(udf(...))" from Rafael " We were failing to start a thread when the UDF call was nested in an aggregate function call like SUM. " * 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla: cql3: Fix indentation cql3: Add missing with_thread_if_needed call cql3: Implement abstract_function_selector::requires_thread remove make_ready_future call	2020-01-08 10:25:42 +02:00
Benny Halevy	dafbd88349	query: initialize read_command timestamp to now This was initialized to api::missing_timestamp but should be set to either a client-provided timestamp or the server's. Unlike for write operations, this timestamp need not be unique like the one generated by client_state::get_timestamp. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>	2020-01-08 10:19:07 +02:00
Benny Halevy	39325cf297	storage_proxy: fix int overflow in service::abstract_read_executor::execute exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp, causing: service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int' Aborting on shard 1. Do not optimize cross-dc repair if read_timestamp is missing (or just negative) We're interested in reads that happen within write_timeout of a write. Fixes #5556 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>	2020-01-08 10:18:59 +02:00
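The guard can be sketched as follows. This is not the storage_proxy code: the function name and parameters are hypothetical, values are assumed to be microseconds, `now_us` is assumed non-negative, and a missing timestamp is modeled as `INT64_MIN`, mirroring the `-9223372036854775808` in the error above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: did the read happen within write_timeout of a write?
// All values are microseconds; a missing timestamp is modeled as INT64_MIN.
bool read_within_write_timeout(int64_t now_us, int64_t read_ts_us, int64_t write_timeout_us) {
    if (read_ts_us < 0) {
        // Missing (or otherwise negative) timestamp: skip the optimization.
        // Computing now_us - INT64_MIN would be signed overflow (undefined behavior).
        return false;
    }
    // With now_us >= 0 and read_ts_us >= 0, the difference cannot overflow.
    return now_us - read_ts_us < write_timeout_us;
}
```

Checking the sign first, rather than testing only for the exact sentinel, also covers any other negative value, matching the "(or just negative)" wording.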
Raphael S. Carvalho	390c8b9b37	sstables: Move STCS implementation to source file A header-only implementation can create problems with duplicate symbols. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>	2020-01-08 09:55:35 +02:00
Benny Halevy	20a0b1a0b6	test: cql_query_test: test avg overflow Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:50:50 +02:00
Benny Halevy	1c81422c1b	cql3: functions: protect against int overflow in avg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	9053ef90c7	test: cql_query_test: test sum overflow Add unit tests for summing ints and bigints, including handling of possible overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	e97a111f64	cql3: functions: detect and handle int overflow in sum Detect integer overflow in cql sum functions and throw an error. Note that Cassandra quietly truncates the sum if it doesn't fit in the input type, but we would rather break compatibility in this case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400 Fixes #5536 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
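A sketch of the detection, following the approach the merge message for this series describes (aggregate into `__int128`, then detect overflow when computing the result). `checked_sum` is a hypothetical name, `__int128` is a GCC/Clang extension, and `std::overflow_error` stands in for whatever error the real code raises.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: sum in a wider accumulator, then check that the
// result fits the input type before narrowing.
template <typename T>
T checked_sum(const std::vector<T>& values) {
    __int128 acc = 0;
    for (T v : values) {
        acc += v;  // intermediate totals may exceed T without harm
    }
    if (acc < std::numeric_limits<T>::min() || acc > std::numeric_limits<T>::max()) {
        throw std::overflow_error("sum overflows the input type");
    }
    return static_cast<T>(acc);
}
```

Because only the final total is checked, an intermediate excursion past the type's range (e.g. `INT32_MAX + 1 - 1`) still yields a correct result, whereas the Cassandra behavior noted above would silently truncate.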
Benny Halevy	98260254df	exceptions: sort exception_code definitions Be compatible with Cassandra source. It's easier to maintain this way. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:21 +02:00
Benny Halevy	30d0f1df75	exceptions: define additional cassandra CQL exceptions codes As of `e9da85723a` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola	282228b303	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	4316bc2e18	cql3: Add missing with_thread_if_needed call This fixes an assert when doing sum(udf(...)). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	d301d31de0	cql3: Implement abstract_function_selector::requires_thread Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola	dc9b3b8ff2	remove make_ready_future call Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:10:27 -08:00
Calle Wilund	9f6b22d882	cdc: Assign self to storage proxy object	2020-01-07 12:01:58 +00:00
Calle Wilund	fc5904372b	storage_proxy: Add (optional) cdc service object pointer member The cdc service is assigned from outside, post construction, mainly because of chicken-and-egg problems in main startup. It would be nice to have it unconditionally, but this is workable.	2020-01-07 12:01:58 +00:00
Calle Wilund	d6003253dd	storage_proxy: Move mutate_counters to private section It is (and shall only be) called from inside storage proxy, and we would like this to be reflected in the interface so that our eventual move of cdc logic into the mutate call chains becomes easier to verify and comprehend.	2020-01-07 12:01:58 +00:00
Calle Wilund	b6c788fccf	cdc: Add augmentation call to cdc service To eventually replace the free function. The main difference is that this is built both to handle batches correctly and to eventually allow hanging the cdc object on storage proxy, and caches on the cdc object.	2020-01-07 12:01:58 +00:00
Piotr Sarna	04dc8faec9	test: add a case for multiple base regular columns in view key The test case checks that having two base regular columns in the materialized view key (not obtainable via CQL) still works fine when values are inserted or deleted. If TTL were involved and these columns had different expiration rules, the case would be more complicated, but it's not possible for a user to reach that case - neither with CQL, nor with alternator.	2020-01-07 12:19:06 +01:00
Piotr Sarna	155a47cc55	view: handle multiple regular base columns in view pk Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This patch is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo) Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>	2020-01-07 12:18:39 +01:00
Avi Kivity	6e0a073b2e	mutation_partition: use type-aware printing of the clustering row Now that position_in_partition_view has type-aware printing, use it to provide a human readable version of clustering keys. Message-Id: <20191231151315.602559-2-avi@scylladb.com>	2020-01-07 12:17:11 +01:00
Avi Kivity	488c42408a	position_in_partition_view: add type-aware printer If the position_in_partition_view represents a clustering key, we can now see it with the clustering key decoded according to the schema. Message-Id: <20191231151315.602559-1-avi@scylladb.com>	2020-01-07 12:15:09 +01:00
Piotr Sarna	54315f89cd	db,view: fix checking if partition key is empty Previous implementation did not take into account that a column in a partition key might exist in a mutation, but in a DEAD state - if it's deleted. There are no regressions for CQL, while for alternator and its capability of having two regular base columns in a view key, this additional check must be performed.	2020-01-07 12:05:36 +01:00
Avi Kivity	3a3c20d337	schema_tables: de-templatize diff_table_or_view() This reduces code bloat and makes the code friendlier for IDEs, as the IDE now understands the type of create_schema. Message-Id: <20191231134803.591190-1-avi@scylladb.com>	2020-01-07 11:56:54 +01:00
Avi Kivity	e5e42672f5	sstables: reduce bloat from sstables::write_simple() sstables::write_simple() has quite a lot of boilerplate which gets replicated into each template instance. Move all of that into a non-template do_write_simple(), leaving only things that truly depend on the component being written in the template, and encapsulating them with a noncopyable_function. An explicit template instantiation was added, since this is used in a header file. Before, it likely worked by accident and stopped working when the template became small enough to inline. Tests: unit (dev) Message-Id: <20200106135453.1634311-1-avi@scylladb.com>	2020-01-07 11:56:11 +01:00
Avi Kivity	8f7f56d6a0	schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete The generic lambda made IDE searches for create_table_from_table_row() fail. Message-Id: <20191231135210.591972-1-avi@scylladb.com>	2020-01-07 11:49:10 +01:00
Avi Kivity	92fd83d3af	schema_tables: make gratuitous generic lambda in create_table_from_name() concrete The lambda made IDE searches for read_table_mutations fail. Message-Id: <20191231135103.591741-1-avi@scylladb.com>	2020-01-07 11:48:56 +01:00
Avi Kivity	dd6dd97df9	schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete The generic lambda made IDE searches for create_table_from_mutations fail. Message-Id: <20191231135059.591681-1-avi@scylladb.com>	2020-01-07 11:48:39 +01:00
Avi Kivity	c63cf02745	canonical_mutation: add pretty printing Add type-aware printing of canonical_mutation objects.	2020-01-07 12:06:31 +02:00
Avi Kivity	e093121687	mutation_partition_view: add virtual visitor mutation_partition_view now supports a compile-time resolved visitor. This is performant but results in bloat when the performance is not needed. Furthermore, the template function that applies the object to the visitor is private and out-of-line, to reduce compile time. To allow visitation on mutation_partition_view objects, add a virtual visitor type and a non-template accept function. Note: mutation_partition_visitor is very similar to the new type, but different enough to break the template visitor which is used to implement the new visitor. The new visitor will be used to implement pretty printing for canonical_mutation.	2020-01-07 12:06:31 +02:00
Avi Kivity	75d9909b27	collection_mutation_view: add type-aware pretty printer Add a way for the user to associate a type with a collection_mutation_view and get a nice printout.	2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola	b80852c447	main: Explicitly allow scylla core dumps I have not looked into the security reason for disabling it when a program has file capabilities. Fixes #5560 [avi: remove extraneous semicolon] Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200106231836.99052-1-espindola@scylladb.com>	2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola	07f1cb53ea	tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1' These are the same options we use in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200107001513.122238-1-espindola@scylladb.com>	2020-01-07 11:11:49 +02:00
Takuya ASADA	238a25a0f4	docker: fix typo of scylla-jmx script path (#5551) The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx. Fixes #5542	2020-01-07 10:54:16 +02:00
Asias He	401854dbaf	repair: Avoid duplicated partition_end write Consider this: 1) Write partition_start of p1 2) Write clustering_row of p1 3) Write partition_end of p1 4) Repair is stopped due to error before writing partition_start of p2 5) Repair calls repair_row_level_stop() to tear down which calls wait_for_writer_done(). A duplicate partition_end is written. To fix, track the partition_start and partition_end written, avoid unpaired writes. Backports: 3.1 and 3.2 Fixes: #5527	2020-01-06 14:06:02 +02:00
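The pairing rule can be sketched with a small state flag. The class and its string log are hypothetical stand-ins for the repair writer, not Scylla's API; the point is that `partition_end` is emitted only while a partition is open, so the teardown path (step 5 above) cannot add a second one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the repair row writer.
class paired_fragment_writer {
    bool _partition_open = false;
public:
    std::vector<std::string> written;  // records what would be written

    void partition_start(const std::string& key) {
        written.push_back("partition_start " + key);
        _partition_open = true;
    }
    void clustering_row(const std::string& row) {
        written.push_back("clustering_row " + row);
    }
    void partition_end() {
        if (!_partition_open) {
            return;  // nothing open: skip the unpaired partition_end
        }
        written.push_back("partition_end");
        _partition_open = false;
    }
    // Teardown path (cf. wait_for_writer_done()): closes only an open partition.
    void stop() { partition_end(); }
};
```

If repair stops while a partition is still open, `stop()` writes exactly one closing `partition_end`; if the partition was already closed, it writes nothing.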
Eliran Sinvani	e64445d7e5	debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg commit `21dec3881c` introduced a bug that will cause scylla debian build to fail. This is because the commit relied on the environment PRODUCT variable to be exported (and as a result, to propagate to the rename command that is executed by find in a subshell) This commit fixes it by explicitly passing the PRODUCT variable into the rename command. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>	2020-01-06 12:31:58 +02:00
Asias He	38d4015619	gossiper: Remove HIBERNATE status from dead state In scylla, the replacing node is set as HIBERNATE status. It is the only place we use HIBERNATE status. The replacing node is supposed to be alive and updating its heartbeat, so it is not supposed to be in dead state. This patch fixes the following problem in replacing. 1) start n1, n2 2) n2 is down 3) start n3 to replace n2, but kill n3 in the middle of the replace 4) start n4 to replace n2 After steps 3 and 4, the old n3 will stay in gossip forever until a full cluster shutdown. Note that n3 will only stay in gossip, not in the system.peers table. Users will see endless log messages like the following on all nodes: rpc - client $ip_of_n3:7000: fail to connect: Connection refused Fixes: #5449 Tests: replace_address_test.py + manual test	2020-01-06 11:47:31 +02:00
Amos Kong	c5ec1e3ddc	scylla_ntp_setup: check redhat variant version by parse_version (#5434) VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so scylla_ntp_setup fails on OEL 7.7 with a ValueError: - ValueError: invalid literal for int() with base 10: '7.7' This patch changes redhat_version() to return the version string and compares it using parse_version(). Fixes #5433 Signed-off-by: Amos Kong <amos@scylladb.com>	2020-01-06 11:43:14 +02:00
Asias He	145fd0313a	streaming: Fix map access in stream_manager::get_progress When the progress is queried, e.g. by nodetool netstats, the progress info might not be updated yet. Fix it by checking before accessing the map, to avoid errors like: std::out_of_range (_Map_base::at) Fixes: #5437 Tests: nodetool_additional_test.py:TestNodetool.netstats_test	2020-01-06 10:31:15 +02:00
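The check-before-access pattern can be sketched as below. `progress_info` and `find_progress` are hypothetical names, not the stream_manager API; the map is an `std::unordered_map`, whose `at()` is what reports `_Map_base::at` in libstdc++.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical progress record; the real stream_manager types differ.
struct progress_info {
    long bytes_sent = 0;
    long bytes_total = 0;
};

// Look up progress for a peer without assuming the entry already exists.
// unordered_map::at() would throw std::out_of_range for a missing key.
std::optional<progress_info> find_progress(
        const std::unordered_map<std::string, progress_info>& progress,
        const std::string& peer) {
    auto it = progress.find(peer);
    if (it == progress.end()) {
        return std::nullopt;  // not populated yet: report nothing instead of throwing
    }
    return it->second;
}
```

A single `find()` both tests for the key and returns the entry, avoiding a second lookup that a `count()` plus `at()` combination would need.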
Rafael Ávila de Espíndola	98cd8eddeb	tests: Run with halt_on_error=1:abort_on_error=1 This depends on the just emailed fixes to undefined behavior in tests. With this change we should quickly notice if a change introduces undefined behavior. Fixes #4054 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222646.89628-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola	dc5ecc9630	enum_option_test: Add explicit underlying types to enums We expect to be able to create variables with out of range values, so these enums need explicit underlying types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200102173422.68704-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Nadav Har'El	f0d8dd4094	merge: CDC rolling upgrade Merged pull request https://github.com/scylladb/scylla/pull/5538 from Avi Kivity and Piotr Jastrzębski. This series prepares CDC for rolling upgrade. This consists of reducing the footprint of cdc, when disabled, on the schema, adding a cluster feature, and redacting the cdc column when transferring it to other nodes. The latter is needed because we'll want to backport this to 3.2, which doesn't have canonical_mutations yet.	2020-01-05 17:13:12 +02:00
Gleb Natapov	720c0aa285	commitlog: update last sync timestamp when cycling a buffer If the in-memory buffer does not have enough space for an incoming mutation, it is written into a file, but the code missed updating the timestamp of the last sync, so we may sync too often. Message-Id: <20200102155049.21291-9-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	14746e4218	commitlog: drop segment gate The code that enters the gate never defers before leaving, so the gate behaves like a flag. Let's use the existing flag to prohibit adding data to a closed segment. Message-Id: <20200102155049.21291-8-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	f8c8a5bd1f	test: fix error reporting in commitlog_test Message-Id: <20200102155049.21291-7-gleb@scylladb.com>	2020-01-05 16:13:58 +02:00
Gleb Natapov	680330ae70	commitlog: introduce segment::close() function. Currently segment closing code is spread over several functions and activated based on the _closed flag. Make segment closing explicit by moving all the code into close() function and call it where _closed flag is set. Message-Id: <20200102155049.21291-6-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	a1ae08bb63	commitlog: remove unused segment::flush() parameter Message-Id: <20200102155049.21291-5-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	1e15e1ef44	commitlog: cleanup segment sync() Call cycle() only once. Message-Id: <20200102155049.21291-4-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	3d3d2c572e	commitlog: move segment shutdown code from sync() Currently sync() does two completely different things based on the shutdown parameter. Separate the code into two different functions. Message-Id: <20200102155049.21291-3-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	89afb92b28	commitlog: drop superfluous this Message-Id: <20200102155049.21291-2-gleb@scylladb.com>	2020-01-05 16:13:53 +02:00
Piotr Jastrzebski	95feeece0b	scylla_tables: treat empty cdc props as disabled Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	396e35bf20	cdc: add schema_change test for cdc_options The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after CDC was enabled and a table with CDC enabled is created, in order to make sure that the digest computed including CDC column does not change spuriously as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	c08e6985cd	cdc: allow cluster rolling upgrade Addition of cdc column in scylla_tables changes how schema digests are calculated, and affect the ABI of schema update messages (adding a column changes other columns' indexes in frozen_mutation). To fix this, extend the schema_tables mechanism with support for the cdc column, and adjust schemas and mutations to remove that column when sending schemas during upgrade. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	caa0a4e154	tests: disable CDC in schema_change_tests Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	129af99b94	cdc: Return reference from cluster_supports_cdc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	4639989964	cdc: Add CDC_OPTIONS schema_feature Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Avi Kivity	c150f2e5d7	schema_tables, cdc: don't store empty cdc columns in scylla_tables An empty cdc column in scylla_tables is hashed differently from a missing column. This causes schema mismatch when a schema is propagated to another node, because the other node will redact the schema column completely if the cluster feature isn't enabled, and an empty value is hashed differently from a missing value. Store a tombstone instead. Tombstones are removed before digesting, so they don't affect the outcome. This change also undoes the changes in `386221da84` ("schema_tables: handle 'cdc' options") to schema_change_test test_merging_does_not_alter_tables_which_didnt_change. That change enshrined the breakage into the test, instead of fixing the root cause, which was that we added an extra mutation to the schema (for cdc options, which were disabled).	2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola	3d641d4062	lua: Use existing cpp_int cast logic Different versions of boost have different rules for what conversions from cpp_int to smaller integers are allowed. We already had a function that worked with all supported versions, but it was not being used by lua. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200104041028.215153-1-espindola@scylladb.com>	2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola	88b5aadb05	tests: cql_test_env: wait for two futures starting internal services I noticed this while looking at the crashes next is currently experiencing. While I have no idea if this fixes the issue, it does avoid broken future warnings (for no_sharded_instance_exception) in a debug build. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200103201540.65324-1-espindola@scylladb.com>	2020-01-05 12:09:59 +02:00
Avi Kivity	4b8e2f5003	Update seastar submodule * seastar 0525bbb08...36cf5c5ff (6): > memcached: Fix use after free in shutdown > Revert "task: stop wrapping tasks with unique_ptr" > task: stop wrapping tasks with unique_ptr > http: Change exception formating to the generic seastar one > Merge "Avoid a few calls to ~exception_ptr" from Rafael > tests: fix core generation with asan	2020-01-03 15:48:53 +02:00
Nadav Har'El	44c2a44b54	alternator-test: test for ConditionExpression feature This patch adds a very comprehensive test for the ConditionExpression feature, i.e., the newer syntax of conditional writes replacing the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem operations. I wrote these tests while closely following the DynamoDB ConditionExpression documentation, and attempted to cover all conceivable features, subfeatures and subcases of the ConditionExpression syntax - to serve as a test for a future support for this feature in Alternator (see issue #5053). As usual, all these tests pass on AWS DynamoDB, but because we haven't yet implemented this feature in Alternator, all but one xfail on Alternator. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191229143556.24002-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	aad5eeab51	alternator: better error messages when Alternator port is taken If Alternator is requested to be enabled on a specific port but the port is already taken, the boot fails as expected - but the error log is confusing; it currently looks something like this: WARN 2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... (many more messages about the server shutting down) INFO 2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) There are two problems here. First, the "WARN" should really be an "ERROR", because it causes the server to be shut down and the user must see this error. Second, the final line in the log, something the user is likely to see first, contains only the ultimate cause of the exception (an address already in use) but not what this address was needed for. This patch solves both issues, and the log now looks like: ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... INFO 2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191224124127.7093-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	1f64a3bbc9	alternator: error on unsupported ReturnValues option We don't support yet the ReturnValues option on PutItem, UpdateItem or DeleteItem operations (see issue #5053), but if a user tries to use such an option anyway, we silently ignore this option. It's better to fail, reporting the unsupported option. In this patch we check the ReturnValues option and if it is anything but the supported default ("NONE"), we report an error. Also added a test to confirm this fix. The test verifies that "NONE" is allowed, and something which is unsupported (e.g., "DOG") is not ignored but rather causes an error. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191216193310.20060-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	dc93228b66	reloc: Turn the default flags into common flags These are flags we always want to enable. In particular, we want them to be used by the bots, but the bots run this script with --configure-flags, so they were being discarded. We put the user option later so that they can override the common options. Fixes #5505 Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Takuya ASADA <syuu@scylladb.com> Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	d4dfb6ff84	build-id: Handle the binary having multiple PT_NOTE headers There is no requirement that all notes be placed in a single PT_NOTE. It looks like recent lld's actually put each section in its own PT_NOTE. This change looks for build-id in all PT_NOTE headers. Fixes #5525 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191227000311.421843-1-espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
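A sketch of scanning every PT_NOTE segment instead of stopping at the first. It works on already-extracted segment bytes, assumes the file's byte order matches the host's and that notes use the 4-byte alignment GNU build-id notes normally have, and defines its own `note_header` with the same three 32-bit fields as `Elf64_Nhdr`. The function names are hypothetical, not the ones in Scylla's build-id code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Same fields as Elf64_Nhdr from <elf.h>.
struct note_header {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};
constexpr uint32_t nt_gnu_build_id = 3;  // NT_GNU_BUILD_ID

constexpr std::size_t align4(std::size_t n) { return (n + 3) & ~std::size_t(3); }

using bytes = std::vector<uint8_t>;

// Look for a GNU build-id note inside one PT_NOTE segment.
std::optional<bytes> find_build_id_in_segment(const bytes& seg) {
    std::size_t pos = 0;
    while (pos + sizeof(note_header) <= seg.size()) {
        note_header h;
        std::memcpy(&h, seg.data() + pos, sizeof h);
        std::size_t name_off = pos + sizeof h;
        std::size_t desc_off = name_off + align4(h.namesz);
        if (desc_off + h.descsz > seg.size()) {
            break;  // truncated or malformed note: stop scanning this segment
        }
        if (h.type == nt_gnu_build_id && h.namesz == 4 &&
            std::memcmp(seg.data() + name_off, "GNU", 4) == 0) {
            return bytes(seg.begin() + desc_off, seg.begin() + desc_off + h.descsz);
        }
        pos = desc_off + align4(h.descsz);  // name and desc are each padded
    }
    return std::nullopt;
}

// Check every PT_NOTE segment, since a linker may put each note in its own.
std::optional<bytes> find_build_id(const std::vector<bytes>& note_segments) {
    for (const auto& seg : note_segments) {
        if (auto id = find_build_id_in_segment(seg)) {
            return id;
        }
    }
    return std::nullopt;
}
```

With lld placing each note section in its own PT_NOTE, a scan limited to the first segment may only ever see, for example, an ABI-tag note and miss the build-id entirely.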
Avi Kivity	1e9237d814	dist: redhat: use parallel compression for rpm payload rpm compression uses xz, which is painfully slow. Adjust the compression settings to run on all threads. The xz utility documentation suggests that 0 threads is equivalent to all CPUs, but apparently the library interface (which rpmbuild uses) doesn't think the same way. Message-Id: <20200101141544.1054176-1-avi@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	de1171181c	user defined types: fix support for case-sensitive type names In the current code, support for case-sensitive (quoted) user-defined type names is broken. For example, a test doing: CREATE TYPE "PHone" (country_code int, number text) CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk)) Fails - the first line creates the type with the case-sensitive name PHone, but the second line wrongly ends up looking for the lowercased name phone, and fails with an exception "Unknown type ks.phone". The problem is in cql3_type_name_impl. This class is used to convert a type object into its proper CQL syntax - for example frozen<list<int>>. The problem is that for a user-defined type, we forgot to quote its name if not lowercase, and the result is wrong CQL; For example, a list of PHone will be written as list<PHone> - but this is wrong because the CQL parser, when it sees this expression, lowercases the unquoted type name PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>. The solution is for cql3_type_name_impl to use for a user-defined type its get_name_as_cql_string() method instead of get_name_as_string(). get_name_as_cql_string() is a new method which prints the name of the user type as it should be in a CQL expression, i.e., quoted if necessary. The bug in the above test was apparently caused when our code serialized the type name to disk as the string PHone (without any quoting), and then later deserialized it using the CQL type parser, which converted it into a lowercase phone. With this patch, the type's name is serialized as "PHone", with the quotes, and deserialized properly as the type PHone. While the extra quotes may seem excessive, they are necessary for the correct CQL type expression - remember that the type expression may be significantly more complex, e.g., frozen<list<"PHone">> and all of this, including the quotes, is necessary for our parser to be able to translate this string back into a type object. 
This patch may break existing databases which used case-sensitive user-defined types, but I argue that these use cases were already broken (as demonstrated by this test) so we won't break anything that actually worked before. Fixes #5544 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200101160805.15847-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
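The quoting rule above can be illustrated with a short Python sketch. `cql_quote_if_needed` and `list_type_expr` are hypothetical helpers, not Scylla's `get_name_as_cql_string()`; the sketch assumes the usual CQL rules that unquoted identifiers are folded to lowercase and that a `"` inside a quoted identifier is escaped by doubling it, and it ignores reserved keywords.

```python
import re

# Unquoted CQL identifiers are folded to lowercase by the parser, so a name
# survives a round trip unquoted only if it already is a lowercase identifier.
# (Reserved-keyword handling is deliberately left out of this sketch.)
_UNQUOTED_OK = re.compile(r"[a-z][a-z0-9_]*")


def cql_quote_if_needed(name):
    """Return `name` as it must appear inside a CQL type expression."""
    if _UNQUOTED_OK.fullmatch(name):
        return name
    # Inside a quoted identifier a literal double quote is written twice.
    return '"' + name.replace('"', '""') + '"'


def list_type_expr(element_type_name, frozen=False):
    """Build e.g. list<"PHone"> or frozen<list<"PHone">>."""
    expr = f"list<{cql_quote_if_needed(element_type_name)}>"
    return f"frozen<{expr}>" if frozen else expr
```

With this rule, a type named PHone is written as list<"PHone">, which the parser maps back to PHone rather than phone.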
Pavel Emelyanov	34f8762c4d	storage_service: Drop _update_jobs This field is write-only. Leftover from `83ffae1` (storage_service: Drop block_until_update_pending_ranges_finished) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226091210.20966-1-xemul@scylladb.com>	2020-01-03 15:48:20 +02:00
Pavel Emelyanov	f2b20e7083	cache_hitrate_calculator: Do not reinvent the peering_sharded_service The class in question runs its own instances on different shards, and for this it keeps a reference to its sharded self to call invoke_on() on. There's a handy peering_sharded_service<> in seastar for the same purpose; using it makes the code nicer and shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226112401.23960-1-xemul@scylladb.com>	2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola	bbed9cac35	cql3: move function creation to a .cc file We had a lot of code in a .hh file that, while using templates, was only used for creating functions during startup. This moves it to a new .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200101002158.246736-1-espindola@scylladb.com>	2020-01-03 15:48:19 +02:00
Benny Halevy	c0883407fe	scripts: Add cpp-name-format: pretty printer Pretty-print cpp-names, useful for deciphering complex backtraces. For example, the following line: service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Is formatted as: service::storage_proxy::init_messaging_service()::{ lambda( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> )#1 }::operator()( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>	2020-01-01 12:08:12 +02:00
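A simplified sketch of the idea behind the pretty printer, in Python. This is not the actual cpp-name-format script, whose code is not shown here: the hypothetical `pretty_cpp_name` breaks after every non-empty bracket and every comma, so unlike the output above it does not keep short groups such as std::allocator<frozen_mutation> on one line, and it would mis-split names like operator<<.

```python
OPENERS = "(<{"
CLOSERS = ")>}"


def pretty_cpp_name(name, indent="  "):
    """Put each bracketed argument on its own line, indented by nesting depth."""
    lines = []
    depth = 0
    current = ""

    def flush():
        # Emit the pending text (if any) at the current nesting depth.
        nonlocal current
        if current.strip():
            lines.append(indent * depth + current.strip())
        current = ""

    i = 0
    while i < len(name):
        c = name[i]
        if c in OPENERS and i + 1 < len(name) and name[i + 1] in CLOSERS:
            current += name[i:i + 2]  # keep empty pairs such as "()" inline
            i += 2
            continue
        if c in OPENERS:
            current += c
            flush()
            depth += 1
        elif c in CLOSERS:
            flush()
            depth = max(depth - 1, 0)
            current = c  # the closer starts the next line at the outer depth
        elif c == ",":
            current += c
            flush()
        else:
            current += c
        i += 1
    flush()
    return "\n".join(lines)
```

For example, pretty_cpp_name("f(a, b<c>)") puts f(, a, b<, c, > and ) on separate lines, each indented by its nesting depth.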
Rafael Ávila de Espíndola	75817d1fe7	sstable: Add checks to help track problems with large_data_handler use after free I can't quite figure out how we were trying to write a sstable with the large data handler already stopped, but the backtrace suggests a good place to add extra checks. This patch adds two checks: one at the start and one at the end of sstable::write_components. The first one should give us better backtraces if the large_data_handler is already stopped. The second one should help catch a race condition. Refs: #5470 Message-Id: <20191231173237.19040-1-espindola@scylladb.com>	2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola	3c34e2f585	types: Avoid an unaligned load in json integer serialization The patch also adds a test that makes the fixed issue easier to reproduce. Fixes #5413 Message-Id: <20191231171406.15980-1-espindola@scylladb.com>	2019-12-31 19:23:42 +02:00
Gleb Natapov	bae5cb9f37	commitlog: remove unused argument during segment creation Since `99a5a77234` all segments are created equal and the "active" argument is never true, so drop it. Message-Id: <20191231150639.GR9084@scylladb.com>	2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola	aa535a385d	enum_option_test: Add an explicit underlying type to an enum We expect to be able to create a variable with an out of range value, so the enum needs an explicit underlying type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222029.88942-1-espindola@scylladb.com>	2019-12-31 16:59:00 +02:00
Nadav Har'El	48a914c291	Fix uninitialized members Merged pull request https://github.com/scylladb/scylla/pull/5532 from Benny Halevy: Initialize bool members in row_level_repair and storage_service that caused ubsan errors. Fixes #5531	2019-12-31 10:32:54 +02:00
Takuya ASADA	aa87169670	dist/debian: add procps on Depends We require the procps package to use sysctl in the postinst script for scylla-kernel-conf. Fixes #5494 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191218234100.37844-1-syuu@scylladb.com>	2019-12-30 19:30:35 +02:00
Avi Kivity	972127e3a8	atomic_cell: add type-aware pretty printing The standard printer for atomic_cell prints the value as hex, because atomic_cell does not include the type. Add a type-aware printer that allows the user to provide the type.	2019-12-30 18:27:04 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Eliran Sinvani	21dec3881c	debian-reloc: rename build product to the name specified in SCYLLA-VERSION-GEN When the product name is other than "scylla", the debian packaging scripts go over all files that start with "scylla-" and change the prefix to be the actual product name. However, if there are no such files in the directory, the script will fail since the renaming command will get the wildcard string instead of an actual file name. This patch replaces the command with an equivalent one that only operates on files if there are any. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>	2019-12-30 17:45:50 +02:00
Takuya ASADA	263385cb4b	dist: stop replacing /usr/lib/scylla with symlink (#5530) Since we merged /usr/lib/scylla with /opt/scylladb, we removed /usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb. However, RPM does not support replacing a directory with a symlink; we were using a dirty hack in an RPM scriptlet, but it causes multiple issues on upgrade/downgrade. (See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/) To minimize Scylla upgrade/downgrade issues on the user side, it's better to keep the /usr/lib/scylla directory. Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb, we can create symlinks for each setup script, like /usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>. Fixes #5522 Fixes #4585 Fixes #4611	2019-12-30 13:52:24 +02:00
Hagit Segev	9d454b7dc6	reloc/build_rpm.sh: Fix '--builddir' option handling (#5519) The '--builddir' option value is assigned to the "builddir" variable, which is wrong. The correct variable is "BUILDDIR", so use that instead to fix the '--builddir' option. Also, add logging to the script when executing the "dist/redhat_build.rpm.sh" script to simplify debugging.	2019-12-30 13:25:22 +02:00
Benny Halevy	8aa5d84dd8	storage_service: initialize _is_bootstrap_mode Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode: service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool' The use site is: service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519 While at it, initialize `_initialized` to false as well, just in case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Benny Halevy	474ffb6e54	repair: initialize row_level_repair: _zero_rows Avoid following UBSAN error: repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool' Fixes #5531 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Fabiano Lucchese	d7795b1efa	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499) A Linux machine typically has multiple clocksources with different performance characteristics. Setting a high-performance clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing an optimized Linux clocksource in Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing the clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration file. This parameter is read by perftune.py which, if it is set to "yes", proceeds to set the clocksource (non-persistently). On x86, the TSC clocksource is used. Fixes #4474 Fixes #5474 Fixes #5480	2019-12-30 10:54:14 +02:00
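A minimal sketch of a non-persistent clocksource switch, assuming the standard Linux sysfs interface under /sys/devices/system/clocksource/clocksource0/. The function `set_clocksource` is hypothetical and is not perftune.py's code; writing the file requires root, and the setting is lost on reboot.

```python
from pathlib import Path

SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def set_clocksource(preferred="tsc", sysfs=SYSFS_CLOCKSOURCE):
    """Switch to `preferred` if the kernel offers it; return the active source."""
    available = (sysfs / "available_clocksource").read_text().split()
    current = (sysfs / "current_clocksource").read_text().strip()
    if current == preferred or preferred not in available:
        return current
    # Non-persistent: the kernel picks its default again after a reboot.
    (sysfs / "current_clocksource").write_text(preferred)
    return preferred
```

The `sysfs` parameter exists only so the function can be pointed at a scratch directory for testing.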
Avi Kivity	e223154268	cdc: options: return an empty options map when cdc is disabled This is compatible with 3.1 and below, which didn't have that schema field at all.	2019-12-29 16:34:37 +02:00
Benny Halevy	27e0aee358	docs/debugging.md: fix anchor links Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Pavel Solodovnikov	aba9a11ff0	cql: pass variable_specifications via lw_shared_ptr Instances of `variable_specifications` are passed around as shared_ptr's, which are redundant in this case since the class is marked as `final`. Use `lw_shared_ptr` instead since we know for sure it's not a polymorphic pointer. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>	2019-12-29 16:26:26 +02:00
Benny Halevy	4c884908bb	directories: Keep a unique set of directories to initialize If any two directories of data/commitlog/hints/view_hints are the same, we still end up running verify_owner_and_mode and disk_sanity(check_direct_io_support) in parallel on the same directories and hit #5510. This change uses std::set rather than std::vector to collect a unique set of directories that need initialization. Fixes #5510 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
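The deduplication can be sketched in Python, with a set playing the role of std::set. `directories_to_initialize` is a hypothetical helper; the os.path.normpath() call is an extra assumption (it folds spellings like "/a/b/" and "/a/b" but does not resolve symlinks) and is not claimed to match the C++ change.

```python
import os


def directories_to_initialize(data_dirs, commitlog_dir, hints_dir, view_hints_dir):
    """Return each directory exactly once, in sorted order (like std::set)."""
    unique = set()
    for d in [*data_dirs, commitlog_dir, hints_dir, view_hints_dir]:
        # normpath() folds "/a/b/" and "/a/b" together; it does not resolve
        # symlinks (os.path.realpath() would).
        unique.add(os.path.normpath(d))
    return sorted(unique)
```

Each returned path can then be checked and locked once, so no two checks run concurrently on the same directory.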
Gleb Natapov	60a851d3a5	commitlog: always flush segments atomically with writing db::commitlog::segment::batch_cycle() assumes that after a write for a certain position completes (as reported by _pending_ops.wait_for_pending()) it will also be flushed, but this is true only if writing and flushing are atomic wrt the _pending_ops lock. It usually is, unless flush_after is set to false when cycle() is called; in this case only writing is done under the lock. This is exactly what happens when a segment is closed: the flush is skipped because a zero header is added after the last entry and then flushed, but this optimization breaks the batch_cycle() assumption. Fix it by flushing after the write atomically even if a segment is being closed. Fixes #5496 Message-Id: <20191224115814.GA6398@scylladb.com>	2019-12-24 14:52:23 +02:00
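A toy asyncio model of the invariant batch_cycle() relies on. `Segment`, `cycle` and `batch_cycle` are hypothetical Python stand-ins for the C++/Seastar code, with an asyncio.Lock playing the role of _pending_ops: because the flush happens inside the same critical section as the write, a waiter that acquires the lock afterwards can assume everything written is also flushed.

```python
import asyncio


class Segment:
    """Toy model of a commitlog segment (hypothetical, not Scylla's API)."""

    def __init__(self):
        self._lock = asyncio.Lock()  # stands in for _pending_ops
        self.written = 0             # bytes written to the file
        self.flushed = 0             # bytes known to be durable

    async def _write(self, nbytes):
        await asyncio.sleep(0)       # simulate I/O
        self.written += nbytes

    async def _flush(self):
        await asyncio.sleep(0)       # simulate fdatasync
        self.flushed = self.written

    async def cycle(self, nbytes):
        # Always flush inside the same critical section as the write, even
        # when the segment is being closed.
        async with self._lock:
            await self._write(nbytes)
            await self._flush()

    async def batch_cycle(self, nbytes):
        await self.cycle(nbytes)
        # Wait for all in-flight cycles (like wait_for_pending()); while we
        # hold the lock, nothing is half-written, so written data is flushed.
        async with self._lock:
            return self.flushed == self.written
```

If cycle() released the lock between the write and the flush, a waiter could observe written > flushed, which is the broken assumption described above.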
Pavel Emelyanov	a5cdfea799	directories: Do not mess with per-shard base dir The hints and view_hints directories have per-shard sub-dirs, and the directories code tries to create, check and lock all of them, including the base one. These manipulations are excessive -- it's enough to check and lock either the base dir or all the per-shard ones, but not everything. Let's take the latter approach for its simplicity. Fixes #5510 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Looks-good-to: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223142429.28448-1-xemul@scylladb.com>	2019-12-24 14:49:28 +02:00
Benny Halevy	f8f5db42ca	dbuild: try to pull image if not present locally Pekka Enberg <penberg@scylladb.com> wrote: > Image might not be present, but the subsequent "docker run" command will automatically pull it. Just letting "docker run" fail produces a somewhat confusing error message referring to docker help, but we want to provide the user with our own help, so still fail early, but first try to pull the image if "docker image inspect" failed, indicating it's not present locally. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>	2019-12-24 11:13:23 +02:00
Benny Halevy	ee2f97680a	dbuild: just die when no image-id is provided Suggested-by: Pekka Enberg <penberg@scylladb.com> > This will print all the available Docker images, > many (most?) of them completely unrelated. > Why not just print an error saying that no image was specified, > and then perhaps print usage. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>	2019-12-24 11:13:22 +02:00
Benny Halevy	87b2f189f7	dbuild: s/usage/die/ Suggested-by: Dejan Mircevski <dejan@scylladb.com> > The use pattern of this function strongly suggests a name like `die`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>	2019-12-24 11:13:21 +02:00
Benny Halevy	718e9eb341	table: move_sstables_from_staging: fix use after free of shared_sstable Introduced in `4b3243f5b9` Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test ==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0 READ of size 8 at 0x60200023de18 thread T1 (reactor-1) #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289 #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530 #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556 #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618 #4 0x1069203 in operator() 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626 #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... 
0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20) freed by thread T1 (reactor-1) here: #0 0x7f8a153b796f in operator delete(void) (/lib64/libasan.so.5+0x11096f) #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128 #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470 #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351 #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332 #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680 #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477 #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496 #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573 #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645 #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488 #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... Fixes #5511 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>	2019-12-23 15:20:41 +02:00
Konstantin Osipov	476fbc60be	test.py: prepare to remove custom colors Add dbuild dependency on python3-colorama, which will be used in test.py instead of a hand-made palette. [avi: update tools/toolchain/image] Message-Id: <20191223125251.92064-2-kostja@scylladb.com>	2019-12-23 15:13:22 +02:00
Pavel Emelyanov	d361894b9d	batchlog_manager: Speed up token_metadata endpoints counting a bit In this place we only need to know the number of endpoints, while current code additionally shuffles them before counting. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:45 +02:00
Pavel Emelyanov	6e06c88b4c	token_metadata: Remove unused helper There are two _identical_ methods in the token_metadata class: get_all_endpoints_count() and number_of_endpoints(). The former is used (called) and the latter is not, so let's remove it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:43 +02:00
Pavel Emelyanov	2662d9c596	migration_manager: Remove run_may_throw() first argument It's unused in this function. Also this helps getting rid of global instances of components. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:42 +02:00
Pavel Emelyanov	703b16516a	storage_service: Remove unused helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:41 +02:00
Takuya ASADA	e0071b1756	reloc: don't archive dist/ami/files/.rpm on relocatable package We should skip archiving dist/ami/files/.rpm in the relocatable package, since it isn't used. The same goes for packer and variables.json. Fixes #5508 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191223121044.163861-1-syuu@scylladb.com>	2019-12-23 14:19:51 +02:00
Tomasz Grabiec	28dec80342	db/schema_tables: Add trace-level logging of schema digesting This greatly helps to narrow down the source of schema digest mismatch between nodes. Intended use is to enable this logger on disagreeing nodes, trigger schema digest recalculation, observe which mutations differ in digest, and then examine their content. Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>	2019-12-23 12:28:22 +02:00
Konstantin Osipov	1116700bc9	test.py: do not return 0 if there are failed tests Fix a return value regression introduced when switching to asyncio. Message-Id: <20191222134706.16616-2-kostja@scylladb.com>	2019-12-22 16:14:32 +02:00
Asias He	7322b749e0	repair: Do not return working_row_buf_nr in get combined row hash verb In commit `b463d7039c` (repair: Introduce get_combined_row_hash_response), working_row_buf_nr is returned in REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was scheduled to be part of the 3.1 release, but it was not backported to 3.1 by accident. To keep repair compatible between 3.1 and 3.2, we need to drop working_row_buf_nr in the 3.2 release. Fixes: #5490 Backports: 3.2 Tests: Run repair in a mixed 3.1 and 3.2 cluster	2019-12-21 20:13:15 +02:00
Takuya ASADA	8eaecc5ed6	dist/common/scripts/scylla_setup: add swap existence check Show warnings when no swap is configured on the node. Closes #2511 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220080222.46607-1-syuu@scylladb.com>	2019-12-21 20:03:58 +02:00
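One way to detect a missing swap area, sketched in Python under the assumption that the check reads /proc/swaps (whose first line is a column header). The actual scylla_setup implementation may use a different source, and `swap_configured` / `warn_if_no_swap` are hypothetical names.

```python
def swap_configured(proc_swaps_text):
    """Return True if /proc/swaps lists at least one active swap area."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    # The first line is the header: "Filename Type Size Used Priority".
    return len(lines) > 1


def warn_if_no_swap(path="/proc/swaps"):
    with open(path) as f:
        if not swap_configured(f.read()):
            print("WARNING: no swap is configured on this node")
```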
Pavel Solodovnikov	5a15bed569	cql3: return `result_set` by cref in `cql3::result::result_set` Changes summary: * make `cql3::result_set` movable-only * change signature of `cql3::result::result_set` to return by cref * adjust available call sites to the aforementioned method to accept cref The motivation behind this change is to eliminate a dangerous API that can easily set a trap for developers who don't expect that result_set would be returned by value. There is no point in copying the `result_set` around, so make `cql3::result::result_set` cache `result_set` internally in a `unique_ptr` member variable and return a const reference, so as to minimize unnecessary copies here and there. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>	2019-12-21 16:56:42 +02:00
Takuya ASADA	3a6cb0ed8c	install.sh: drop limits.d from nonroot mode The file is only required for root mode. Fixes #5507 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220101940.52596-1-syuu@scylladb.com>	2019-12-21 15:26:08 +02:00
Botond Dénes	08bb0bd6aa	mutation_fragment_stream_validator: wrap exceptions into own exception type So a higher level component using the validator to validate a stream can catch only validation errors, and let any other incidental exception through. This allows building data correctors on top of the `mutation_fragment_stream_validator`, by filtering a fragment stream through a validator, catching invalid fragment stream exceptions and dropping the respective fragments from the stream. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>	2019-12-20 12:05:00 +01:00
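The pattern described above, a validator that reports problems through its own exception type and a corrector that catches only that type, can be sketched in Python. All names are hypothetical, and the validation rule (strictly increasing keys) is a simplification of the real mutation_fragment_stream_validator checks.

```python
class InvalidFragmentStream(Exception):
    """Dedicated type for validation failures (hypothetical name)."""


class FragmentStreamValidator:
    """Toy validator: fragment keys must be strictly increasing."""

    def __init__(self):
        self._last = None

    def __call__(self, key):
        if self._last is not None and not key > self._last:
            raise InvalidFragmentStream(f"{key!r} does not follow {self._last!r}")
        self._last = key


def drop_invalid(fragments):
    """Corrector built on the validator: skip the fragments it rejects.

    Any other exception (e.g. an I/O error raised while producing
    `fragments`) is not caught and propagates to the caller unchanged.
    """
    validate = FragmentStreamValidator()
    for key in fragments:
        try:
            validate(key)
        except InvalidFragmentStream:
            continue  # drop the offending fragment
        yield key
```

For example, list(drop_invalid([1, 2, 2, 5, 3, 7])) keeps 1, 2, 5 and 7 and drops the out-of-order 2 and 3.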
Rafael Ávila de Espíndola	91c7f5bf44	Print build-id on startup Fixes #5426 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191218031556.120089-1-espindola@scylladb.com>	2019-12-19 15:43:04 +02:00
Avi Kivity	440ad6abcc	Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers" This reverts commit `237ba74743`. While it works for the scylla executable, it fails for iotune, which is built by seastar. It should be reinstated after we pass the correct link parameters to the seastar build system.	2019-12-19 11:20:34 +02:00
Pekka Enberg	c0aea19419	Merge "Add a timeout for housekeeping for offline installs" from Amnon " This series solves an issue with scylla_setup and prevents it from waiting forever if housekeeping cannot look for the new Scylla version. Fixes #5302 It should be backported to versions that support offline installations. " * 'scylla_setup_timeout' of git://github.com/amnonh/scylla: scylla_setup: do not wait forever if no reply is return housekeeping scylla_util.py: Add optional timeout to out function	2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola	8d777b3ad5	relocatable: Use a super long path for the dynamic linker Having a long path allows patchelf to change the interpreter without changing the PT_LOAD headers and therefore without moving the build-id out of the first page. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-1-espindola@scylladb.com>	2019-12-18 19:10:59 +02:00
Pavel Solodovnikov	c451f6d82a	LWT: Fix required participants calculation for LOCAL_SERIAL CL Suppose we have a multi-dc setup (e.g. 9 nodes distributed across 3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]). When a query that uses LWT is executed with LOCAL_SERIAL consistency level, the `storage_proxy::get_paxos_participants` function incorrectly calculates the number of required participants to serve the query. In the example above it's calculated to be 5 (i.e. the number of nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL, which is equivalent to LOCAL_QUORUM cl in this case). This behavior results in an exception being thrown when executing the following query with LOCAL_SERIAL cl: INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'} Tests: unit(dev), dtest(consistency_test.py) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>	2019-12-18 16:58:32 +01:00
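The arithmetic in the example above can be checked with a small Python sketch, assuming a replication factor of 3 in each of the three datacenters. `required_paxos_participants` is a hypothetical helper, not storage_proxy::get_paxos_participants: SERIAL needs a quorum of all replicas (9 // 2 + 1 = 5), while LOCAL_SERIAL needs a quorum of the local datacenter only (3 // 2 + 1 = 2).

```python
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1


def required_paxos_participants(cl, rf_per_dc, local_dc):
    """Number of replicas a Paxos round needs for SERIAL / LOCAL_SERIAL."""
    if cl == "LOCAL_SERIAL":
        # Only replicas in the coordinator's datacenter count (like LOCAL_QUORUM).
        return quorum(rf_per_dc[local_dc])
    if cl == "SERIAL":
        # Replicas in all datacenters count (like QUORUM).
        return quorum(sum(rf_per_dc.values()))
    raise ValueError(f"not a serial consistency level: {cl}")
```

With only the 3 local replicas alive, LOCAL_SERIAL can proceed (2 required), whereas the buggy calculation demanded 5.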
Botond Dénes	cd6bf3cb28	scylla-gdb.py: static_vector: update for changed storage The actual buffer is now in a member called 'data'. Leave the old `dummy.dummy` and `dummy` as fall-back. This seems to change every Fedora release. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>	2019-12-18 17:39:56 +02:00
Tomasz Grabiec	5865d08d6c	migration_manager: Recalculate schema only on shard 0 Schema is node-global, update_schema_version_and_announce() updates all shards. We don't need to recalculate it from every shard, so install the listeners only on shard 0. Reduces noise in the logs. Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>	2019-12-18 16:43:26 +02:00
Pavel Emelyanov	998f51579a	storage_service: Rip join_ring config option The option in question apparently does not work: several sharded objects are start()-ed (and thus instantiated) in join_token_ring, while the instances of these objects are used during the initialization of other stuff. This leads to a broken seastar local_is_initialized assertion on sys_dist_ks, and reading the code shows more examples, e.g. the auth_service is started on join but is used for thrift and cql server initialization. The suggestion is to remove the option instead of fixing it. The is_joined logic is kept since on-start joining can still take some time and it's safer to report the real status from the API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191203140717.14521-1-xemul@scylladb.com>	2019-12-18 12:45:13 +02:00
Nadav Har'El	8157f530f5	merge: CDC: handle schema changes Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund: Moves schema creation/alter/drop awareness to use new "before" callbacks from migration manager, and adds/modifies log and streams table as part of the base table modification. Makes schema changes semi-atomic per node. While this does not deal with updates coming in before a schema change has propagated across the cluster, it now falls into the same pit as when this happens without CDC. An added side effect is that schemas are now transparent across all subsystems, not just cql. Patches: cdc_test: Add small test for altering base schema (add column) cdc: Handle schema changes via migration manager callbacks migration_manager: Invoke "before" callbacks for table operations migration_listener: Add empty base class and "before" callbacks for tables cql_test_env: Include cdc service in cql tests cdc: Add sharded service that does nothing. cdc: Move "options" to separate header to avoid too much header inclusion cdc: Remove some code from header	2019-12-17 23:04:36 +02:00
Avi Kivity	1157ee16a5	Update seastar submodule * seastar 00da4c8760...0525bbb08f (7): > future: Simplify future_state_base::any move constructor > future: don't create temporary tuple on future::get(). > future: don't instantiate new future on future::then_wrapped(). > future: clean-up the Result handling in then_wrapped(). > Merge "Fix core dumps when asan is enabled" from Rafael > future: Move ignore to the base class > future: Don't delete in ignore	2019-12-17 19:47:50 +02:00
Botond Dénes	638623b56b	configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and hence changes done to it will not be picked up by ninja. To trigger a rebuild and hence make version changes appear in the `scylla` target binary, one has to do `touch configure.py`. This is counter-intuitive and frustrating to people who don't know about it and wonder why their changed version is not appearing as the output of `scylla --version`. This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`, making the `build.ninja` target out-of-date whenever `SCYLLA-VERSION-GEN` is changed. This triggers a rerun of `configure.py` when the next target is built, allowing a build of e.g. `scylla` to pick up any changes done to the version automatically. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>	2019-12-17 17:40:04 +02:00
Avi Kivity	7152ba0c70	Merge "tests: automatically search for unit tests" from Kostja " This patch set rearranges the test files so that it is now possible to search for tests automatically, and adds this functionality to test.py " * 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev: cmake: update CMakeLists.txt to scan test/ rather than tests/ test.py: automatically lookup all unit and boost tests tests: move all test source files to their new locations tests: move a few remaining headers tests: move another set of headers to the new test layout tests: move .hh files and resources to new locations tests: remove executable property from data_listeners_test.cc	2019-12-17 17:32:18 +02:00
Amnon Heiman	dd42f83013	scylla_setup: do not wait forever if no reply is returned from housekeeping When scylla is installed without network connectivity, the check for whether a newer version is available can cause scylla_setup to wait forever. This patch adds a limit to the time scylla_setup will wait for a reply. When there is no reply, the relevant error is shown, stating that it was unable to check for a newer version, but this will not block the setup script. Fixes #5302 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-17 14:56:47 +02:00
Nadav Har'El	aa1de5a171	merge: Synchronize snapshot and staging sstable deletion using sem Merged pull request https://github.com/scylladb/scylla/pull/5343 from Benny Halevy. Fixes #5340 Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to serialize access to the staging directory. It now synchronizes snapshot, compaction deletion of sstables, and view_update_generator moving of sstables from staging. Tests: unit (dev) [except test_user_function_timestamp_return, which fails for me locally, but also on master] snapshot_test.py (dev)	2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz	7fdc8563bf	system_keyspace: Added infrastructure for table `system.clients' I used the following as a reference: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java At this moment there is only info about the IP, the client's outgoing port, the client 'type' (i.e. CQL/thrift/alternator), shard ID and username. Column `request_count' is NOT present and the CK consists of (`port', `client_type'), contrary to Cassandra's, which has (`port'). Code that notifies `system.clients` about new connections goes to top-level files `connection_notifier.`. Currently only CQL clients are observed, but enum `client_type` can be used in the future to notify about connections with other protocols.	2019-12-17 11:31:28 +01:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so as not to move them out from under table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0446ce712a	view_update_generator::start: use variable binding Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	5d7c80c148	view_update_generator::start: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	02784f46b9	view_update_generator: handle errors when processing sstable The consumer may throw; in this case, break from the loop and retry. move_sstable_from_staging_in_thread may theoretically throw too; ignore the error in this case since the sstable was already processed. Individual move failures are already ignored, and moving from staging will be retried upon restart. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	abda12107f	sstables: move_to_new_dir: add do_sync_dirs param To be used for "batch" move of several sstables from staging to the base directory, allowing the caller to sync the directories once when all are moved rather than for each one of them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	6efef84185	sstable: return future from move_to_new_dir distributed_loader::probe_file needlessly creates a seastar thread for it and the next patch will use it as part of a parallel_for_each loop to move a list of sstables (and sync the directories once at the end). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0d2a7111b2	view_update_generator: sstable_with_table: std::move constructor args Just a small optimization. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:19:55 +02:00
Nadav Har'El	fc85c49491	alternator: error on unsupported parallel scan We do not yet support the parallel Scan options (TotalSegments, Segment), as reported in issue #5059. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter. This is what this patch does. The patch also adds a full test, test_scan.py::test_scan_parallel, for the parallel scan feature. The test passes on DynamoDB, and still xfails on Alternator after this patch - but now the Scan request fails immediately reporting the unsupported option - instead of what the pre-patch code did: returning the wrong results and the test failing just when the results do not match the expectations. Refs #5059. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191217084917.26191-1-nyh@scylladb.com>	2019-12-17 11:27:56 +02:00
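The behavior described above, failing fast on parameters that are not implemented rather than silently ignoring them, can be sketched as a request validator. This is a minimal Python sketch; `validate_scan_request` and `UnsupportedParameterError` are hypothetical names, not Alternator's actual code, which is C++ and returns a DynamoDB-style error.

```python
# Hypothetical sketch: reject DynamoDB parallel-Scan parameters instead of
# silently ignoring them (which would return wrong, incomplete results).
UNSUPPORTED_SCAN_PARAMETERS = ("TotalSegments", "Segment")


class UnsupportedParameterError(ValueError):
    """Stands in for the validation error a real server would return."""


def validate_scan_request(request: dict) -> None:
    # Fail immediately if the client asked for a feature we do not support.
    for name in UNSUPPORTED_SCAN_PARAMETERS:
        if name in request:
            raise UnsupportedParameterError(
                f"Scan parameter {name} is not yet supported")
```

A plain `Scan` passes through unchanged, while one carrying `Segment`/`TotalSegments` is rejected up front.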
Avi Kivity	f7d69b0428	Revert "Merge "bouncing lwt request to an owning shard" from Gleb" This reverts commit `64cade15cc`, reversing changes made to `9f62a3538c`. This commit is suspected of corrupting the response stream. Fixes #5479.	2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola	237ba74743	relocatable: Check that patchelf didn't mangle the PT_LOAD headers Should avoid issue #4983 showing up again. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-2-espindola@scylladb.com>	2019-12-16 20:18:32 +02:00
Avi Kivity	3b7aca3406	Merge "db: Don't create a reference to nullptr" from Rafael " Only the first patch is needed to fix the undefined behavior, but the followup ones simplify the memory management around user types. " * 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla: db: Don't use lw_shared_ptr for user_types_metadata user_types_metadata: don't implement enable_lw_shared_from_this cql3: pass a const user_types_metadata& to prepare_internal db: drop special case for top level UDTs db: simplify db::cql_type_parser::parse db: Don't create a reference to nullptr Add test for loading a schema with a non native type	2019-12-16 17:10:58 +02:00
Konstantin Osipov	d6bc7cae67	cmake: update CMakeLists.txt to scan test/ rather than tests/ A follow up on directory rename.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	e079a04f2a	test.py: automatically lookup all unit and boost tests	2019-12-16 17:47:42 +03:00
Konstantin Osipov	1c8736f998	tests: move all test source files to their new locations 1. Move tests to test (using singular seems to be a convention in the rest of the code base) 2. Move boost tests to test/boost, other (non-boost) unit tests to test/unit, tests which are expected to be run manually to test/manual. Update configure.py and test.py with new paths to tests.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	2fca24e267	tests: move a few remaining headers Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to test/lib or test/boost and update dependent .cc files. Move tests/perf_sstable.hh to test/perf/perf_sstable.hh	2019-12-16 17:47:42 +03:00
Konstantin Osipov	b9bf1fbede	tests: move another set of headers to the new test layout Move another small subset of headers to test/ with the same goals: - preserve bisectability - make the revision history traceable after a move Update dependent files.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	8047d24c48	tests: move .hh files and resources to new locations The plan is to move the unstructured content of tests/ directory into the following directories of test/: test/lib - shared header and source files for unit tests test/boost - boost unit tests test/unit - non-boost unit tests test/manual - tests intended to be run manually test/resource - binary test resources and configuration files In order to not break git bisect and preserve the file history, first move most of the header files and resources. Update paths to these files in .cc files, which are not moved.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	644595e15f	tests: remove executable property from data_listeners_test.cc The executable flag must have been committed to git by mistake.	2019-12-16 17:47:41 +03:00
Benny Halevy	d2e00abe13	tests: commitlog_test: test_allocation_failure: improve error reporting We're seeing the following error from the test from time to time: fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record This is not reproducible and the error string does not contain enough information to figure out what happened exactly, therefore this patch adds an exception if the call succeeded unexpectedly and also prints the unexpected exception if one was caught. Refs #4714 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>	2019-12-16 15:38:48 +01:00
Asias He	6b7344f6e5	streaming: Fix typo in stream_result_future::maybe_complete s/progess/progress/ Refs: #5437	2019-12-16 11:12:03 +02:00
Dejan Mircevski	f3883cd935	dbuild: Fix podman invocation (#5481) The is_podman check depended on `docker -v` printing "podman" in the output, but that doesn't actually work, since podman prints $0. Use `docker --help` instead, which will output "podman". Also return podman's return status, which was previously being dropped. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-16 11:11:48 +02:00
Avi Kivity	00ae4af94c	Merge "Sanitize and speed-up (a bit) directories set up" from Pavel " On start there are two things that scylla does on data/commitlog/etc. dirs: it locks them and verifies their permissions. Right now these two actions are managed by different approaches; it's convenient to merge them. Also, the directories class introduced in this set lays the groundwork for better --workdir option handling. In particular, right now the db::config entries are modified after option parsing to update directories with the workdir prefix. With the directories class at hand, we will be able to stop doing this. " * 'br-directories-cleanup' of https://github.com/xemul/scylla: directories: Make internals work on fs::path directories: Cleanup adding dirs to the vector to work on directories: Drop seastar::async usage directories: Do touch_and_lock and verify sequentially directories: Do touch_and_lock in parallel directories: Move the whole stuff into own .cc file directories: Move all the dirs code into .init method file_lock: Work with fs::path, not sstring	2019-12-15 16:02:46 +02:00
Takuya ASADA	5e502ccea9	install.sh: setup workdir correctly on nonroot mode Specify the correct workdir in nonroot mode, to set the correct paths of the data / commitlog / hints directories at once. Fixes #5475 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191213012755.194145-1-syuu@scylladb.com>	2019-12-15 16:00:57 +02:00
Avi Kivity	c25d51a4ea	Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)" This reverts commit `4333b37f9e`. It breaks upgrades, and the user question is not informative enough for the user to make a correct decision. Fixes #5478. Fixes #5480.	2019-12-15 14:37:40 +02:00
Pavel Emelyanov	23a8d32920	directories: Make internals work on fs::path Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	373fcfdb3e	directories: Cleanup adding dirs to the vector to work on The unordered_set is turned into vector since for fs::path there's no hash() method that's needed for set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	14437da769	directories: Drop seastar::async usage Now the only future-returning operation that remains is the call to parallel_for_each(); all the rest is non-blocking preparation, so we can drop the seastar::async and just return the future from parallel_for_each. The indentation is now correct, as the previous patch prepared it for exactly this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	06f4f3e6d8	directories: Do touch_and_lock and verify sequentially The goal is to drop the seastar::async() usage. Currently we have two places that return futures -- calls to parallel_for_each-s. We can either chain them together or, since both are working on the same set of directories, chain actions inside them. For code simplicity I propose to chain actions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	8d0c820aa1	directories: Do touch_and_lock in parallel The list of paths that should be touch-and-locked is already at hand; this shortens the code and makes it slightly faster (in theory). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	71a528d404	directories: Move the whole stuff into own .cc file To avoid polluting the root dir, place the code in the utils/ directory, "utils" namespace. While doing this, move the touch_and_lock out of the class declaration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Benny Halevy	9ec98324ed	messaging_service: unregister_handler: return rpc unregister_handler future Now that seastar returns it. Fixes https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>	2019-12-12 16:38:36 +02:00
Pavel Emelyanov	f2b3c17e66	directories: Move all the dirs code into .init method The seastar::async usage is temporary, added for bisect-safety; soon it will go away. For this reason the indentation in the .init method is not "canonical", but is prepared for a one-patch drop of the seastar::async. The hinted_handoff_enabled arg is there because it is not just a config parameter; it is parsed in main.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:33:11 +03:00
Pavel Emelyanov	82ef2a7730	file_lock: Work with fs::path, not sstring The main.cc code that converts sstring to fs::path will be patched soon; file_desc::open belongs to seastar and works on sstrings. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:32:10 +03:00
Konstantin Osipov	bc482ee666	test.py: remove an unused option Message-Id: <20191204142622.89920-2-kostja@scylladb.com>	2019-12-12 15:53:35 +02:00
Avi Kivity	64cade15cc	Merge "bouncing lwt request to an owning shard" from Gleb " LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move a request to the correct shard before running lwt. It works by returning an error from lwt code if the shard is the incorrect one, specifying the shard the request should be moved to. The error is processed by the transport code, which jumps to the correct shard and re-processes the incoming message there. " * 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev: lwt: take raw lock for entire cas duration lwt: drop invoke_on in paxos_state prepare and accept lwt: Process lwt request on a owning shard storage_service: move start_native_transport into a thread transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-12 15:50:22 +02:00
Nadav Har'El	9f62a3538c	alternator: fix BEGINS_WITH operator for blobs The implementation of Expected's BEGINS_WITH operator on blobs was incorrect, naively comparing the base64-encoded strings, which doesn't work. This patch fixes the code to compare the decoded strings. The reason why the BEGINS_WITH test missed this bug was that we forgot to check the blob case and only tested the string case; so this patch also adds the missing test, which reproduces this bug and verifies its fix. Fixes #5457 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191211115526.29862-1-nyh@scylladb.com>	2019-12-12 14:02:56 +01:00
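Why comparing base64 text fails can be shown with a small Python sketch (not Alternator's C++ code; the function names are hypothetical). A byte prefix does not map to a text prefix of the encoding because the trailing group of a short input gets `=` padding: `b"ab"` encodes as `YWI=` while `b"abc"` encodes as `YWJj`.

```python
import base64


def naive_begins_with(attr_b64: str, prefix_b64: str) -> bool:
    # Buggy approach: compares the encoded text, not the underlying bytes.
    return attr_b64.startswith(prefix_b64)


def begins_with_blob(attr_b64: str, prefix_b64: str) -> bool:
    # Fixed approach: decode both operands, then compare raw bytes.
    return base64.b64decode(attr_b64).startswith(base64.b64decode(prefix_b64))


attr = base64.b64encode(b"abc").decode()    # "YWJj"
prefix = base64.b64encode(b"ab").decode()   # "YWI="
```

Here `naive_begins_with(attr, prefix)` is `False` even though the blob `abc` clearly begins with `ab`; `begins_with_blob` returns `True`.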
Dejan Mircevski	27b8b6fe9d	cql3: Fix needs_filtering() for clustering columns The LIKE operator requires filtering, so needs_filtering() must check is_LIKE(). This already happens for partition columns, but it was overlooked for clustering columns in the initial implementation of LIKE. Fixes #5400. Tests: unit(dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-12 01:19:13 +02:00
Benny Halevy	d1bcb39e7f	hinted handoff: log message after removing hints directory (#5372) To be used by dtest as an indicator that endpoint's hints were drained and hints directory is removed. Refs #5354 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	f7c2c60b07	cql3: pass a const user_types_metadata& to prepare_internal We never modify the user_types_metadata via prepare_internal, so we can pass it a const reference. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	99cb8965be	db: drop special case for top level UDTs This was originally done in `7f64a6ec4b`, but that commit was reverted in `8517eecc28`. The revert was done because the original change would call parse_raw for non UDT types. Unlike the old patch, this one doesn't change the behavior of non UDT types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	7ae9955c5f	db: simplify db::cql_type_parser::parse The variant of db::cql_type_parser::parse that has a user_types_metadata argument was only used from the variant that didn't. This inlines one in the other. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	2092e1ef6f	db: Don't create a reference to nullptr The user_types variable can be null during db startup since we have to create types before reading the system table defining user types. This avoids undefined behavior, but it is unlikely that it was causing more serious problems since the variable is only used when creating user types and we don't create any until after all system tables are read, in which case the user_types variable is not null. Fixes #5193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	6143941535	Add test for loading a schema with a non native type This would have found the error with the previous version of the patch series. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:43:34 -08:00
Gleb Natapov	64cfb9b1f6	lwt: take raw lock for entire cas duration It will prevent parallel update by the same coordinator and should reduce contention.	2019-12-11 14:41:31 +02:00
Gleb Natapov	898d2330a2	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests are now running on an owning shard there is no longer a need to invoke cross shard call.	2019-12-11 14:41:31 +02:00
Gleb Natapov	964c532c4f	lwt: Process lwt request on a owning shard LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move a request to the correct shard before running lwt. It works by returning an error from lwt code if the shard is the incorrect one, specifying the shard the request should be moved to. The error is processed by the transport code, which jumps to the correct shard and re-processes the incoming message there.	2019-12-11 14:41:31 +02:00
Gleb Natapov	54be057af3	storage_service: move start_native_transport into a thread The code runs only once and it is simpler if it runs in a seastar thread.	2019-12-11 14:41:31 +02:00
Gleb Natapov	007ba3e38e	transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-11 14:41:31 +02:00
Nadav Har'El	9e5c6995a3	alternator-test: add tests for ReturnValues parameter This patch adds comprehensive tests for the ReturnValues parameter of the write operations (PutItem, UpdateItem, DeleteItem), which can return pre-write or post-write values of the modified item. The tests are in a new test file, alternator-test/test_returnvalues.py. This feature is not yet implemented in Alternator, so all the new tests xfail on Alternator (and all pass on AWS). Refs #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127163735.19499-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Nadav Har'El	ab69bfc111	alternator-test: add xfailing tests for ScanIndexForward This patch adds tests for Query's "ScanIndexForward" parameter, which can be used to return items in reversed sort order. We test that a Limit works and returns the given number of last items in the sort order, and also that such reverse queries can be resumed, i.e., paging works in the reverse order. These tests pass against AWS DynamoDB, but fail against Alternator (which doesn't support ScanIndexForward yet), so they are marked xfail. Refs #5153. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127114657.14953-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Pekka Enberg	6bc18ba713	storage_proxy: Remove reference to MBean interface The JMX interface is implemented by the scylla-jmx project, not scylla. Therefore, let's remove this historical reference to MBeans from storage_proxy. Message-Id: <20191211121652.22461-1-penberg@scylladb.com>	2019-12-11 14:24:28 +02:00
Avi Kivity	63474a3380	Merge "Add `experimental_features` option" from Dejan " Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser. Fixes #5338 " * 'vecexper' of https://github.com/dekimir/scylla: config: Add `experimental_features` option utils: Add enum_option	2019-12-11 14:23:08 +02:00
Avi Kivity	56b9bdc90f	Update seastar submodule * seastar e440e831c8...00da4c8760 (7): > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi Fixes #5443. > install-dependencies.sh: fix arch dependencies > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny > Merge "testing: improve the observability of abandoned failed futures" from Botond > rework the fair_queue tester > directory_test: Update to use run instead of run_deprecated > log: support fmt 6.0 branch with chrono.h for log	2019-12-11 14:17:49 +02:00
Benny Halevy	105c8ef5a9	messaging_service: wait on unregister_handler Prepare for returning future<> from seastar rpc unregister_handler. Refs https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>	2019-12-11 14:17:41 +02:00
Nadav Har'El	06c3802a1a	storage_proxy: avoid overflow in view-backlog delay calculation In the calculate_delay() code for view-backlog flow control, we calculate a delay and cap it at a "budget" - the remaining timeout. This timeout is measured in milliseconds, but the capping calculation converted it into microseconds, which overflowed if the timeout is very large. This causes some tests which enable the UB sanitizer to fail. We fix this problem by comparing the delay to the budget in millisecond resolution, not in microsecond resolution. Then, if the calculated delay is short enough, we return it using its full microsecond resolution. Fixes #5412 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191205131130.16793-1-nyh@scylladb.com>	2019-12-11 14:10:54 +02:00
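The overflow and its fix can be illustrated with a Python sketch. Python integers never overflow, so `to_int64` (a hypothetical helper) emulates C++ signed 64-bit wraparound; the two delay functions are illustrative stand-ins for the capping logic in `calculate_delay()`, not the actual storage_proxy code.

```python
def to_int64(x: int) -> int:
    """Emulate C++ signed 64-bit wraparound (Python ints never overflow)."""
    return (x + 2**63) % 2**64 - 2**63


def capped_delay_us_buggy(delay_us: int, budget_ms: int) -> int:
    # Pre-fix idea: convert the budget to microseconds first, which can
    # overflow int64 when the timeout (budget) is very large.
    return min(delay_us, to_int64(budget_ms * 1000))


def capped_delay_us(delay_us: int, budget_ms: int) -> int:
    # Post-fix idea: compare at millisecond resolution, so the budget is
    # never multiplied up before the comparison.
    if delay_us // 1000 < budget_ms:
        return delay_us          # short enough: keep full microsecond resolution
    return budget_ms * 1000      # safe: budget_ms * 1000 <= delay_us here
```

With a budget of `2**60` ms, the buggy version's conversion wraps to `-2**63`, so the "capped" delay becomes hugely negative, whereas the fixed version returns the original 5000 µs delay.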
Nadav Har'El	2824d8f6aa	Merge: alternator: Fix EQ operator for sets Merged pull request https://github.com/scylladb/scylla/pull/5453 from Piotr Sarna: Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g.: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 13:20:25 +02:00
Piotr Sarna	421db1dc9d	alternator-test: remove XFAIL from set EQ test With this series merged, test_update_expected_1_eq_set from test_expected.py suite starts passing.	2019-12-11 12:07:39 +01:00
Piotr Sarna	a8e45683cb	alternator: add EQ comparison for sets Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g.: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 12:07:39 +01:00
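The order-insensitive set comparison described above can be sketched in Python over DynamoDB-style typed JSON values such as `{"NS": ["1", "3", "2"]}`. `attributes_equal` is a hypothetical name; note one simplifying assumption: this sketch compares `NS` members as strings, so `"1"` and `"1.0"` would not be treated as equal numbers.

```python
SET_TYPES = {"SS", "NS", "BS"}


def attributes_equal(a: dict, b: dict) -> bool:
    # Each attribute value is a single-key dict: {type_tag: value}.
    (type_a, value_a), = a.items()
    (type_b, value_b), = b.items()
    if type_a != type_b:
        return False
    if type_a in SET_TYPES:
        # Sets may be stored in any order, so compare them as sets.
        return set(value_a) == set(value_b)
    # Everything else falls back to plain JSON equality.
    return value_a == value_b
```

For example, `{"NS": ["1", "3", "2"]}` and `{"NS": ["1", "2", "3"]}` compare equal, while a string set and a plain string with the same content do not.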
Piotr Sarna	fb37394995	schema_tables: notify table deletions before creations If a set of mutations contains both an entry that deletes a table and an entry that adds a table with the same name, it's expected to be a replacement operation (delete old + create new), rather than a useless "try to create a table even though it exists already and then immediately delete the original one" operation. As such, notifications about the deletions should be performed before notifications about the creations. The place that originally suffered from this wrong order is view building, which in this case created an incorrect duplicated entry in the view building bookkeeping and then immediately deleted it, resulting in old, deprecated entries with stale UUIDs lying in the build queue and never proceeding, because the underlying table is long gone. The issue is fixed by ensuring the order of notifications: - drops are announced first, view drops are announced before table drops; - creations follow, table creations are announced before views; - finally, changes to tables and views are announced; Fixes #4382 Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test	2019-12-11 12:48:29 +02:00
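The notification order listed above maps naturally onto a sort key. This Python sketch uses a hypothetical `SchemaChange` record rather than the real schema_tables types; because `sorted()` is stable, alters (for which the message specifies no internal order) keep their original relative order.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    kind: str       # "drop", "create" or "alter"
    is_view: bool
    name: str


def notification_rank(change: SchemaChange) -> tuple:
    if change.kind == "drop":
        return (0, 0 if change.is_view else 1)   # view drops before table drops
    if change.kind == "create":
        return (1, 1 if change.is_view else 0)   # table creations before views
    return (2, 0)                                # alters are announced last


def order_notifications(changes):
    # sorted() is stable, so ties keep their original relative order.
    return sorted(changes, key=notification_rank)
```

A batch that drops and re-creates table `t` with view `v` is thus announced as drop `v`, drop `t`, create `t`, create `v`, so the view builder never sees a creation for a name that still exists.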
Benny Halevy	d544df6c3c	dist/ami/build_ami.sh: support incremental build of rpms (#5191) Iterate over an array holding all rpm names to see if any of them is missing from `dist/ami/files`. If they are missing, look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh was run manually before dist/ami/build_ami.sh we can just collect the built rpms from its output dir. If we're still missing any rpms, then run reloc/build_rpm.sh and copy the required rpms from build/redhat/RPMS/x86_64. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2019-12-11 12:48:29 +02:00
Amnon Heiman	f43285f39a	api: replace swagger definition to use long instead of int (#5380) In swagger 1.2, int is defined as int32. We originally used int following the jmx definition; in practice we internally use uint and int64 in many places. While the API formats the types correctly, an external system that uses a swagger-based code generator can run into type mismatches. This patch replaces all uses of int in a return type with long, which is defined as int64. Changing the return type has no impact on the system, but it does help external systems that use a code generator from swagger. Fixes #5347 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-11 12:48:29 +02:00
Nadav Har'El	2abac32f2e	Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected Merged pull request https://github.com/scylladb/scylla/pull/5447 by Dejan Mircevski. Adds the last missing operators in the "Expected" parameter and re-enables their tests. Fixes #5034.	2019-12-11 12:48:29 +02:00
Cem Sancak	86b8036502	Fix DPDK mode in prepare script Fixes #5455.	2019-12-11 12:48:29 +02:00
Calle Wilund	35089da983	conf/config: Add better descriptive text on server/client encryption Provide some explanation on prio strings + direction to gnutls manual. Document client auth option. Remove confusing/misleading statement on "custom options" Message-Id: <20191210123714.12278-1-calle@scylladb.com>	2019-12-11 12:48:28 +02:00
Dejan Mircevski	32af150f1d	alternator: Implement NOT_CONTAINS operator in Expected Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of recognized operators, implement check_NOT_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	bd2bd3c7c8	alternator: Implement CONTAINS operator in Expected Enable existing CONTAINS test, implement check_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	5a56fd384c	config: Add `experimental_features` option When the user wants to turn on only some experimental features, they can use this new option. The existing `experimental` option is preserved for backwards compatibility. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 11:47:03 -05:00
Piotr Sarna	9504bbf5a4	alternator: move unwrap_set to serialization header The utility function for unwrapping a set is going to be useful across source files, so it's moved to serialization.hh/serialization.cc.	2019-12-10 15:08:47 +01:00
Piotr Sarna	4660e58088	alternator: move rjson value comparison to rjson.hh The comparison struct is going to be useful across source files, so it's moved into rjson header, where it conceptually belongs anyway.	2019-12-10 15:08:47 +01:00
Botond Dénes	db0e2d8f90	scylla-gdb.py: document and add safety net to seastar::thread related commands Almost all commands provided by `scylla-gdb.py` are safe to use. The worst that could happen if they fail is that you won't get the desired information. There is one notable exception: `scylla thread`. If anything goes wrong while this command is executed (gdb crashes, a bug in the command, etc.) there is a good chance the process under examination will crash. Sometimes this is fine, but other times, e.g. when live debugging a production node, this is unacceptable. To avoid any accidents, add documentation to all commands working with `seastar::thread`. And since most people don't read documentation, especially when debugging under pressure, add a safety net to the `scylla thread` command. When run, this command will now warn of the dangers and will ask for explicit acknowledgment of the risk of crash, by means of passing an `--iamsure` flag. When this flag is missing, it will refuse to run. I am sure this will be very annoying but I am also sure that the avoided crashes are worth it. As part of making `scylla thread` safe, its argument parsing code is migrated to `argparse`. This changes the usage but this should be fine because it is well documented. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>	2019-12-10 11:51:57 +02:00
Eliran Sinvani	765db5d14f	build_ami: Trim ami description attribute to the allowed size The ami description attribute is only allowed to be 255 characters long. When build_ami.sh generates an ami, it generates an ami description which is a concatenation of all of the components' version strings. It can happen that the description string is too long, which eventually causes the ami build to fail. This patch trims the description string to 255 characters. This is OK since the individual versions of the components are also saved in tags attached to the image. Tests: 1. Reproduced with a long description and validated that it doesn't fail after the fix. Fixes #5435 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>	2019-12-10 11:51:57 +02:00
Fabiano Lucchese	4333b37f9e	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379) A Linux machine typically has multiple clocksources with distinct performance characteristics. Setting a high-performance clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing an optimized Linux clocksource in Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing the clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration file. This parameter is read by perftune.py which, if it is set to "yes", proceeds to (non-persistently) set the clocksource. On x86, the TSC clocksource is used. Fixes #4474	2019-12-10 11:51:57 +02:00
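The `--iamsure` safety net can be sketched with the standard `argparse` module. This is an illustration of the pattern only, run outside gdb; the function name, the optional positional `thread` argument, and the message wording are assumptions, not the actual `scylla-gdb.py` implementation.

```python
import argparse


def scylla_thread(argv):
    # Hypothetical sketch of a "refuse unless acknowledged" command.
    parser = argparse.ArgumentParser(prog="scylla thread")
    parser.add_argument("--iamsure", action="store_true",
                        help="acknowledge that this command may crash the process")
    parser.add_argument("thread", nargs="?",
                        help="hypothetical thread selector, for illustration")
    args = parser.parse_args(argv)
    if not args.iamsure:
        # Without explicit acknowledgment, warn and do nothing.
        return ("WARNING: this command may crash the process under examination; "
                "pass --iamsure to run it anyway")
    return f"would switch to thread {args.thread}"
```

Running it with no flag only prints the warning, while `--iamsure 0x1234` proceeds.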
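The trimming step amounts to a hard cap on string length. The actual fix lives in `build_ami.sh`; this Python sketch, with a hypothetical `ami_description` helper and an assumed `name version` join format, only shows the idea.

```python
MAX_AMI_DESCRIPTION_LEN = 255  # AWS limit on the AMI description attribute


def ami_description(component_versions: dict) -> str:
    # Concatenate every component's version string (format assumed here).
    full = ", ".join(f"{name} {version}"
                     for name, version in component_versions.items())
    # Trimming is safe: the full versions are also stored as image tags.
    return full[:MAX_AMI_DESCRIPTION_LEN]
```

A short description passes through unchanged; an over-long one is cut to exactly 255 characters.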
Pavel Emelyanov	3a21419fdb	features: Remove _FEATURE suffix from hinted_handoff feature name All the other features are named w/o one. The internal const-s are all different, but I'm fixing it separately. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191209154310.21649-1-xemul@scylladb.com>	2019-12-10 11:51:57 +02:00
Dejan Mircevski	a26bd9b847	utils: Add enum_option This allows us to accept command-line options with a predefined set of valid arguments. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-09 09:45:59 -05:00
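`utils::enum_option` is a C++ utility, and its API is not shown here. The same idea, accepting an option only from a predefined set of valid values, can be sketched in Python with `argparse` `choices` plus an `Enum`. The `CompactionStrategy` enum and the `--strategy` option are hypothetical examples, not Scylla options.

```python
import argparse
import enum


class CompactionStrategy(enum.Enum):  # hypothetical option values
    SIZE_TIERED = "size_tiered"
    LEVELED = "leveled"


def parse(argv):
    parser = argparse.ArgumentParser()
    # argparse rejects anything outside the listed values and exits with an error.
    parser.add_argument("--strategy",
                        choices=[s.value for s in CompactionStrategy],
                        default=CompactionStrategy.SIZE_TIERED.value)
    args = parser.parse_args(argv)
    # Convert the validated string into the enum member.
    return CompactionStrategy(args.strategy)
```

Validating at parse time means the rest of the program only ever sees well-typed enum members.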
Calle Wilund	7c5e4c527d	cdc_test: Add small test for altering base schema (add column)	2019-12-09 14:35:04 +00:00
Calle Wilund	cb0117eb44	cdc: Handle schema changes via migration manager callbacks This allows us to create/alter/drop log and desc tables "atomically" with the base, by including these mutations in the original mutation set, i.e. batch create/alter tables. Note that population does not happen until types are actually already put into database (duh), thus there _is_ still a gap between creating cdc and it being truly usable. This may or may not need handling later.	2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola	761b19cee5	build: Split the build and host linker flags A general build system knows about 3 machines: * build: where the building is running * host: where the built software will run * target: the machine the software will produce code for The target machine is only relevant for compilers, so we can ignore it. Until now we could ignore the build and host distinction too. This patch adds the first difference: don't use host ld_flags when linking build tools (gen_crc_combine_table). The reason for this change is to make it possible to build with -Wl,--dynamic-linker pointing to a path that will exist on the host machine, but may not exist on the build machine. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191207030408.987508-1-espindola@scylladb.com>	2019-12-09 15:54:57 +02:00
Calle Wilund	27183f648d	migration_manager: Invoke "before" callbacks for table operations Potentially allowing (cdc) augmentation of mutations. Note: only does the listener part in seastar::thread, to avoid changing call behaviour.	2019-12-09 12:12:09 +00:00
Calle Wilund	f78a3bf656	migration_listener: Add empty base class and "before" callbacks for tables Empty base type makes for less boilerplate in implementations. The "before" callbacks are for listeners who need to potentially react/augment type creation/alteration _before_ actually committing type to schema tables (and holding the semaphore for this). I.e. it is for cdc to add/modify log/desc tables "atomically" with base.	2019-12-09 12:12:09 +00:00
Calle Wilund	4e406105b1	cql_test_env: Include cdc service in cql tests	2019-12-09 12:12:09 +00:00
Calle Wilund	a21e140169	cdc: Add sharded service that does nothing. But can be used to hang functionality into eventually.	2019-12-09 12:12:09 +00:00
Calle Wilund	2787b0c4f8	cdc: Move "options" to separate header to avoid too much header inclusion cdc should not contaminate the whole universe.	2019-12-09 12:12:09 +00:00
fastio	8f326b28f4	Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc} Fixes: #5394 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-08 13:54:33 +02:00
Avi Kivity	9c63cd8da5	sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417) The vm.swappiness sysctl controls the kernel's preference for swapping anonymous memory vs page cache. Since Scylla uses very large amounts of anonymous memory, and tiny amounts of page cache, the correct setting is to prefer swapping page cache. If the kernel swaps anonymous memory the reactor will stall until the page fault is satisfied. On the other hand, page cache pages usually belong to other applications, usually backup processes that read Scylla files. This setting has been used in production in Scylla Cloud for a while with good results. Users can opt out by not installing the scylla-kernel-conf package (same as with the other kernel tunables).	2019-12-08 13:04:25 +02:00
Avi Kivity	0e319e0359	Update seastar submodule * seastar 166061da3...e440e831c (8): > Fail tests on ubsan errors > future: make a couple of asserts more strict > future: Move make_ready out of line > config: Do not allow zero rates Fixes #5360 > future: add new state to avoid temporaries in get_available_state(). > future: avoid temporary future_state on get_available_state(). > future: inline future::abandoned > noncopyable_function: Avoid uninitialized warning on empty types	2019-12-06 18:33:23 +02:00
Piotr Sarna	0718ff5133	Merge 'min/max on collections returns human-readable result' from Juliusz Previously, scylla used the min/max(blob)->blob overload for collections, tuples and UDTs, effectively causing the results to be printed as blobs. This PR adds "dynamically"-typed min()/max() functions for compound types. These types can be complicated, like map<int,set<tuple<..., and created at runtime, so functions for them are created on-demand, similarly to tojson(). The comparison remains unchanged - underneath this is still byte-by-byte weak lex ordering. Fixes #5139 * jul-stas/5139-minmax-bad-printing-collections: cql_query_tests: Added tests for min/max/count on collections cql3: min()/max() for collections/tuples/UDTs do not cast to blobs	2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz	75955beb0b	cql_query_tests: Added tests for min/max/count on collections This tests new min/max function for collections and tuples. CFs in test suite were named according to types being tested, e.g. `cf_map<int,text>', which is not a valid CF name. Therefore, these names required "escaping" of invalid characters, here: simply replacing with '_'.	2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz	9efad36fb8	cql3: min()/max() for collections/tuples/UDTs do not cast to blobs Before: cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]); cqlsh> select max(val) from ks.list_types; system.max(val) ------------------------------------------------------------ 0x00000003000000040000000300000004000000040000000400000005 After: cqlsh> select max(val) from ks.list_types; system.max(val) -------------------- [3, 4, 5] This is accomplished similarly to `tojson()`/`fromjson()`: functions are generated on demand from within `cql3::functions::get()`. Because collections can have a variety of types, including UDTs and tuples, it would be impossible to statically define max(T t)->T for every T. Until now, max(blob)->blob overload was used. Because `impl_max/min_function_for` is templated with the input/output type, which can be defined at runtime, we need type-erased ("dynamic") versions of these functors. They work identically, i.e. they compare byte representations of lhs and rhs with `bytes::operator<`. Resolves #5139	2019-12-06 12:14:51 +01:00
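The blob in the "Before" output above encodes `[3, 4, 5]` as a 4-byte element count followed, for each element, by a 4-byte length and a 4-byte big-endian value. Here is a minimal sketch of a type-erased max() that compares only these raw bytes and decodes the winner for display. The helper names are hypothetical, and the sketch handles only `list<int>`. Python compares `bytes` as unsigned values, and this sketch does not claim to match Scylla's `bytes::operator<` exactly.

```python
import struct


def serialize_int_list(values):
    """Re-create the list<int> layout visible in the blob above (hypothetical helper)."""
    out = struct.pack(">i", len(values))          # element count
    for v in values:
        out += struct.pack(">ii", 4, v)           # element length, then element value
    return out


def deserialize_int_list(blob):
    (count,) = struct.unpack_from(">i", blob, 0)
    values, offset = [], 4
    for _ in range(count):
        length, value = struct.unpack_from(">ii", blob, offset)
        if length != 4:
            raise ValueError("unexpected element length")
        values.append(value)
        offset += 8
    return values


def dynamic_max(serialized_values):
    # Type-erased: only the byte representations are compared, never the decoded values.
    return max(serialized_values)


best = dynamic_max([serialize_int_list(r) for r in ([1, 2], [3, 4, 5], [3, 4])])
print(best.hex())                  # old behaviour: printed as a blob
print(deserialize_int_list(best))  # new behaviour: printed as [3, 4, 5]
```

Because the element count comes first, a byte-wise comparison ranks a longer list above a shorter one regardless of its elements. This is the "weak lex ordering" that the merge notes is unchanged.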
Avi Kivity	a18a921308	docs: maintainer.md: use command line to merge multi-commit pull requests If you merge a pull request that contains multiple patches via the github interface, it will document itself as the committer. Work around this brain damage by using the command line.	2019-12-06 10:59:46 +01:00
Botond Dénes	7b37a700e1	configure.py: make tests explicitly depend on libseastar_testing.a So that changes to libseastar_testing.a make all test targets out of date. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>	2019-12-05 19:30:34 +02:00
Piotr Sarna	3a46b1bb2b	Merge "handle hints on separate connection and scheduling group" from Piotr Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one. Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION. Fixes #4974. Tests: unit(release)	2019-12-05 17:25:26 +01:00
Calle Wilund	c11874d851	gms::inet_address: Use special ostream formatting to match Java To make gms::inet_address::to_string() similar in output to origin. The sole purpose being quick and easy fix of API/JMX ipv6 formatting of endpoints etc, where strings are used as lexical comparisons instead of textual representation. A better, but more work, solution is to fix the scylla-jmx bridge to do explicit parse + re-format of addresses, but there are many such callpoints. An even better solution would be to fix nodetool to not make this mistake of doing lexical comparisons, but then we risk breaking merge compatibility. But could be an option for a separate nodeprobe impl. Message-Id: <20191204135319.1142-1-calle@scylladb.com>	2019-12-05 17:01:26 +02:00
Gleb Natapov	4893bc9139	tracing: split adding prepared query parameters from stopping of a trace Currently the query_options object is passed to the trace stopping function, which makes it mandatory to keep it alive until the end of the query. The reason for that is to add prepared statement parameters to the trace. All other query options that we want to put in the trace are copied into trace_state::params_values, so let's copy prepared statement parameters there too. The trace-enabled case will become a little bit more expensive, but on the other hand we can drop a continuation that keeps the query_options object alive from the fast path. It is safe to drop the call to stop_foreground_prepared() here since the tracing will be stopped in process_request_one(). Message-Id: <20191205102026.GJ9084@scylladb.com>	2019-12-05 17:00:47 +02:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Nadav Har'El	5b2f35a21a	Merge "Redis: fix the options related to Redis API, fix the DEL and GET command" Merged pull request https://github.com/scylladb/scylla/pull/5381 by Peng Jian, fixing multiple small issues with Redis: * Rename the options related to Redis API, and describe them clearly. * Rename redis_transport_port to redis_port * Rename redis_transport_port_ssl to redis_ssl_port * Rename redis_default_database_count to redis_database_count * Remove unnecessary option enable_redis_protocol * Modify the default value of options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM * Fix the DEL command: support deleting multiple keys in one command. * Fix the GET command: return the empty string when the requested key does not exist. * Fix the redis-test/test_del_non_existent_key: mark xfail.	2019-12-05 11:58:34 +02:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Piotr Sarna	79c3a508f4	table: Reduce read amplification in view update generation This commit makes sure that single-partition readers for read-before-write do not have fast-forwarding enabled, as it may lead to huge read amplification. The observed case was: 1. Creating an index. CREATE INDEX index1 ON myks2.standard1 ("C1"); 2. Running cassandra-stress in order to generate view updates. cassandra-stress write no-warmup n=1000000 cl=ONE -schema \ 'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \ keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors skip-read-validation -node 127.0.0.1; Without disabling fast-forwarding, single-partition readers were turned into scanning readers in cache, which resulted in reading 36GB (sic!) on a workload which generates less than 1GB of view updates. After applying the fix, the number dropped down to less than 1GB, as expected. Refs #5409 Fixes #4615 Fixes #5418	2019-12-05 11:58:34 +02:00
Konstantin Osipov	6a5e7c0e22	tests: reduce the number of iterations of dynamic_bitset_test This test's execution time dominates, by a serious margin, the total test execution time in dev/release mode: reducing it improves the test.py turnaround by over 70%. Message-Id: <20191204135315.86374-2-kostja@scylladb.com>	2019-12-05 11:58:34 +02:00
Avi Kivity	07427c89a2	gdb: change 'scylla thread' command to access fs_base register directly Currently, 'scylla thread' uses arch_prctl() to extract the value of fsbase, used to reference thread local variables. gdb 8 added support for directly accessing the value as $fs_base, so use that instead. This works from core dumps as well as live processes, as you don't need to execute inferior functions. The patch is required for debugging threads in core dumps, but not sufficient, as we still need to set $rip and $rsp, and gdb still[1] doesn't allow this. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370	2019-12-05 11:58:34 +02:00
Piotr Dulikowski	adfa7d7b8d	messaging_service: don't move `unsigned` values in handlers Performing std::move on integral types is pointless. This commit gets rid of moves of values of `unsigned` type in rpc handlers.	2019-12-05 00:58:31 +01:00
Piotr Dulikowski	77d2ceaeba	storage_proxy: handle hints through separate rpc verb	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	2609065090	storage_proxy: move register_mutation handler to local lambda This refactor makes it possible to reuse the lambda in following commits.	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	6198ee2735	hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature The feature introduced by this commit declares that hints can be sent using the new dedicated RPC verb. Before using the new verb, nodes need to know if other nodes in the cluster will be able to handle the new RPC verb.	2019-12-05 00:51:52 +01:00
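The feature gate described above lets a node switch to the new verb only once every node in the cluster advertises support for it. The following is a minimal sketch of that decision. The `pick_hint_verb()` helper, the per-node feature sets and the fallback name `MUTATION` for the pre-existing verb are assumptions for illustration, not Scylla's gossip API.

```python
FEATURE = "HINTED_HANDOFF_SEPARATE_CONNECTION"


def pick_hint_verb(cluster_features):
    """cluster_features maps a node id to the set of feature names it advertises (hypothetical)."""
    if all(FEATURE in features for features in cluster_features.values()):
        return "HINT_MUTATION"  # every node can handle the dedicated verb
    return "MUTATION"           # assumed fallback: keep using the old verb during a rolling upgrade
```

Gating on the whole cluster avoids sending the new verb to a not-yet-upgraded node that would not recognize it.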
Piotr Dulikowski	2e802ca650	hh: add HINT_MUTATION verb Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.	2019-12-05 00:51:49 +01:00
Avi Kivity	fd951a36e3	Merge "Let compaction wait on background deletions" from Benny " In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done. However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted. This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish. Fixes #4909 Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction "	2019-12-04 11:18:41 +02:00
Takuya ASADA	c9d8606786	dist/common/scripts/scylla_ntp_setup: relax RHEL version check We may be able to use the chrony setup script on future versions of RHEL/CentOS, so it is better to run the chrony setup when the RHEL version is >= 8, not only 8. Note that Fedora still provides the ntp/ntpdate packages, so we run the ntp setup on it for now (same on Debian variants). Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191203192812.5861-1-syuu@scylladb.com>	2019-12-04 10:59:14 +02:00
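The relaxed check above amounts to comparing the major version with `>=` instead of `==`. Here is a minimal sketch, assuming the distribution ID and version come from the `ID` and `VERSION_ID` fields of /etc/os-release. The `ntp_setup_tool()` helper is hypothetical and is not the actual scylla_ntp_setup code.

```python
def ntp_setup_tool(distro_id, version_id):
    """Pick chrony on RHEL/CentOS 8 and newer, ntp everywhere else (hypothetical helper)."""
    major = int(version_id.split(".")[0])  # "8.1" -> 8
    if distro_id in ("rhel", "centos") and major >= 8:
        return "chrony"
    return "ntp"  # e.g. Fedora and Debian variants still ship ntp/ntpdate
```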
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Avi Kivity	91d3f2afce	docs: maintainers.md: fix typo in git push --force-with-lease Just one lease, not many. Reported by Piotr Sarna.	2019-12-03 18:17:46 +01:00
Calle Wilund	56a5e0a251	commitlog_replayer: Ensure applied frozen_mutation is safe during apply Fixes #5211 In `79935df959` replay apply-call was changed from one with no continuation to one with. But the frozen mutation arg was still just lambda local. Change to use do_with for this case as well. Message-Id: <20191203162606.1664-1-calle@scylladb.com>	2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Amos Kong	e26b396f16	scylla-docker: fix default data_directories in scyllasetup.py (#5399) Use default data_file_directories if it's not assigned in scylla.yaml Fixes #5398 Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola	1cd17887fa	build: strip debug when configured with --debuginfo 0 In a build configured with --debuginfo 0 the scylla binary still ends up with some debug info from the libraries that are statically linked in. We should avoid compiling subprojects (including seastar) with debug info when none is needed, but this at least avoids it showing up in the binary. The main motivation for this is that it is confusing to get a binary with some debug info in it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191127215843.44992-1-espindola@scylladb.com>	2019-12-03 12:41:04 +02:00
Tomasz Grabiec	0a453e5d30	Merge "Use fragmented buffers for collection de/serialization" from Botond This series refactors the collection de/serialization code to use fragmented buffers, avoiding the large allocations and the associated pains when working with large collections. Currently all operations that involve collections require deserializing them, executing the operation, then serializing them again to their internal storage format. The de/serialization operations happen in linearized buffers, which means that we have to allocate a buffer large enough to hold the entire collection. This can cause immense pressure on the memory allocator, which, in the face of memory fragmentation, might be unable to serve the allocation at all. We've seen this causing all sorts of nasty problems, including but not limited to: failing compactions, failing memtable flushes, OOM crashes, etc. Users are strongly discouraged from using large collections, yet they are still a fact of life and have been haunting us since forever. The proper solution for these problems would be to come up with an in-memory format for collections; however, that is a major effort, with a lot of unknowns. This is something we plan on doing at some point but until it happens we should make life less painful for those with large collections. The goal of this series is to avoid the need to allocate these large buffers. Serialization now happens into a `bytes_ostream` which automatically fragments the values internally. Deserialization happens with `utils::linearizing_input_stream` (introduced by this series), which linearizes only the individual collection cells, but not the entire collection. An important goal of this series was to introduce the least amount of risk, and hence the least amount of code. This series does not try to make a revolution and completely revamp and optimize the de/serialization codepaths. These codepaths have their days numbered so investing a lot of effort into them is in vain. 
We can apply incremental optimizations where we deem it necessary. Fixes: #5341	2019-12-03 10:31:34 +01:00
fastio	01599ffbae	Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the return value of the GET command. Support deleting multiple keys in one DEL command. Returning the number of keys actually deleted is still not supported. Return an empty string to the client for the GET command when the requested key does not exist. Fixes: #5334 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:27:40 +08:00
fastio	039b83ad3b	Redis API: Rename options related to Redis API, describe them clearly, and remove unnecessary one. Rename option redis_transport_port to redis_port, the port the redis transport listens on for clients. Rename option redis_transport_port_ssl to redis_ssl_port, the port the redis TLS transport listens on for clients. Rename option redis_default_database_count to redis_database_count, which sets the redis database count. Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options, which sets the replication strategy for the redis keyspace. Remove option enable_redis_protocol, which is unnecessary. Fixes: #5335 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:13:35 +08:00
Nadav Har'El	7b93360c8d	Merge: redis: skip processing request of EOF Merged pull request https://github.com/scylladb/scylla/pull/5393/ by Amos Kong: When I test the redis cmd by echo and nc, there is a redundant error at the end. I checked with strace: currently, if the client reads nothing from stdin, it shuts down the socket, and the redis server reads nothing (0 bytes) from the socket. But it tries to process the empty command and returns an error. $ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379 | ... | read(0, "*1\r\n$4\r\nping\r\n", 8192) = 14 | select(5, [4], [4], [], NULL) = 1 (out [4]) |>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 | select(5, [0 4], [], [], NULL) = 1 (in [0]) | recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) | read(0, "", 8192) = 0 |>>> shutdown(4, SHUT_WR) = 0 | select(5, [4], [], [], NULL) = 1 (in [4]) | recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 | write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG | -ERR unknown command '' | ) = 32 | select(5, [4], [], [], NULL) = 1 (in [4]) | recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 | close(1) = 0 | close(4) = 0 Current result: $ echo -n -e '' |nc localhost 6379 -ERR unknown command '' $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' |nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379 +PONG	2019-12-03 10:40:20 +02:00
Avi Kivity	83feb9ea77	tools: toolchain: update frozen image Commit `96009881d8` added diffutils to the dependencies via Seastar's install-dependencies.sh, after it was inadvertently dropped in `1164ff5329` (update to Fedora 31; diffutils is no longer brought in as a side effect of something else). Regenerate the image to include diffutils. Ref #5401.	2019-12-03 10:36:55 +02:00
Amos Kong	fb9af2a86b	redis-test: add test_raw_cmd.py This patch adds subtests for EOF processing; they read and write the socket directly using protocol commands. We can add more tests in the future, since tests using the Redis client module would hide some protocol errors. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
Amos Kong	4fa862adf4	redis: skip processing request of EOF When I test the redis cmd by echo and nc, there is a redundant error at the end. I checked with strace: currently, if the client reads nothing from stdin, it shuts down the socket, and the redis server reads nothing (0 bytes) from the socket. But it tries to process the empty command and returns an error. $ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379 | ... | read(0, "*1\r\n$4\r\nping\r\n", 8192) = 14 | select(5, [4], [4], [], NULL) = 1 (out [4]) |>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 | select(5, [0 4], [], [], NULL) = 1 (in [0]) | recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) | read(0, "", 8192) = 0 |>>> shutdown(4, SHUT_WR) = 0 | select(5, [4], [], [], NULL) = 1 (in [4]) | recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 | write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG | -ERR unknown command '' | ) = 32 | select(5, [4], [], [], NULL) = 1 (in [4]) | recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 | close(1) = 0 | close(4) = 0 Current result: $ echo -n -e '' |nc localhost 6379 -ERR unknown command '' $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' |nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379 +PONG Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
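The fix above treats a 0-byte read (EOF after the client's `shutdown(SHUT_WR)`) as "no request" instead of as an empty command. Below is a minimal sketch of that behaviour using the `*1\r\n$4\r\nping\r\n` request from the trace. The parser is deliberately naive: it assumes one complete request per buffer and payloads without embedded `\r\n`. The `handle_request()` and `parse_resp_array()` helpers are hypothetical, not Scylla's redis server code.

```python
def parse_resp_array(data):
    """Parse a RESP array of bulk strings such as b"*1\r\n$4\r\nping\r\n" (naive sketch)."""
    lines = data.split(b"\r\n")
    if not lines[0].startswith(b"*"):
        raise ValueError("expected a RESP array")
    count = int(lines[0][1:])
    args = []
    for i in range(count):
        # Each bulk string is a "$<len>" line followed by the payload line.
        length = int(lines[1 + 2 * i][1:])
        payload = lines[2 + 2 * i]
        if len(payload) != length:
            raise ValueError("bulk string length mismatch")
        args.append(payload.decode())
    return args


def handle_request(data):
    if not data:
        # EOF: the client closed its write side, so there is nothing to answer.
        return b""
    args = parse_resp_array(data)
    if args and args[0].lower() == "ping":
        return b"+PONG\r\n"
    name = args[0].encode() if args else b""
    return b"-ERR unknown command '" + name + b"'\r\n"
```

With the EOF check in place, the `echo -n -e '' | nc localhost 6379` case from the commit message gets no reply instead of `-ERR unknown command ''`.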
Rafael Ávila de Espíndola	bb114de023	dbuild: Fix confusion about relabeling podman needs to relabel directories in exactly the same cases docker does. The difference is that podman cannot relabel /tmp. The reason it was working before is that in practice anyone using dbuild has already relabeled any directories that need relabeling, with the exception of /tmp, since it is recreated on every boot. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-2-espindola@scylladb.com>	2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola	867cdbda28	dbuild: Use a temporary directory for /tmp With this we don't have to use --security-opt label=disable. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-1-espindola@scylladb.com>	2019-12-02 18:38:14 +02:00
Botond Dénes	1d1f8b0d82	tests: mutation_test: add large collection allocation test Checking that there are no large allocations when a large collection is de/serialized.	2019-12-02 17:13:53 +02:00
Avi Kivity	28355af134	docs: add maintainer's handbook (#5396) This is a list of recipes used by maintainers to maintain scylla.git.	2019-12-02 15:01:54 +02:00
Calle Wilund	8c6d6254cf	cdc: Remove some code from header	2019-12-02 13:00:19 +00:00
Botond Dénes	4c59487502	collection_mutation: don't linearize the buffer on deserialization Use `utils::linearizing_input_stream` for the deserialization of the collection. Allows for avoiding the linearization of the entire cell value, instead only linearizing individual values as they are deserialized from the buffer.	2019-12-02 10:10:31 +02:00
Botond Dénes	690e9d2b44	utils: introduce linearizing_input_stream `linearizing_input_stream` allows transparently reading linearized values from a fragmented buffer. This is done by linearizing on-the-fly only those read values that happen to be split across multiple fragments. This reduces the size of the largest allocation from the size of the entire buffer (when the entire buffer is linearized) to the size of the largest read value. This is a huge gain when the buffer contains loads of small objects, and a modest gain when the buffer contains few large objects. But even in the worst case, the size of the largest allocation will be less than or equal to that of the case where the entire buffer is linearized. This stream is planned to be used as glue code between the fragmented cell value and the collection deserialization code which expects to be reading linearized values.	2019-12-02 10:10:31 +02:00
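Here is a minimal sketch of the technique described above: reading values from a list of byte fragments, returning a zero-copy view when a value lies inside a single fragment and copying (linearizing) only values that straddle a fragment boundary. The `LinearizingInputStream` class and the `read_values()` helper are hypothetical Python analogues, not the C++ `utils::linearizing_input_stream` API. The `largest_copy` counter shows that the biggest copy is bounded by the largest value read, not by the whole buffer.

```python
class LinearizingInputStream:
    """Read values from byte fragments, copying only values split across fragments."""

    def __init__(self, fragments):
        self._fragments = [memoryview(f) for f in fragments if len(f)]
        self._index = 0        # current fragment
        self._offset = 0       # position inside the current fragment
        self.largest_copy = 0  # size of the biggest linearized value so far

    def read(self, n):
        if self._index < len(self._fragments):
            frag = self._fragments[self._index]
            if self._offset + n <= len(frag):
                # Fast path: value lies in one fragment, return a view without copying.
                view = frag[self._offset:self._offset + n]
                self._advance(n)
                return view
        # Slow path: linearize just this value across fragment boundaries.
        buf = bytearray()
        while len(buf) < n:
            if self._index >= len(self._fragments):
                raise EOFError("not enough data in the fragmented buffer")
            frag = self._fragments[self._index]
            take = min(n - len(buf), len(frag) - self._offset)
            buf += frag[self._offset:self._offset + take]
            self._advance(take)
        self.largest_copy = max(self.largest_copy, n)
        return memoryview(bytes(buf))

    def _advance(self, n):
        self._offset += n
        if self._offset == len(self._fragments[self._index]):
            # Move to the next fragment only once the current one is exhausted.
            self._index += 1
            self._offset = 0


def read_values(stream, count):
    """Read `count` values, each prefixed by a 4-byte big-endian length (hypothetical format)."""
    values = []
    for _ in range(count):
        length = int.from_bytes(bytes(stream.read(4)), "big")
        values.append(bytes(stream.read(length)))
    return values
```

For example, with the fragments `[b"\x00\x00\x00\x03abc\x00\x00", b"\x00\x02xy"]` only the second length prefix is split across fragments, so it is the only value that gets copied (4 bytes).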
Botond Dénes	065d8d37eb	tests: random-utils: get_string(): add overload that takes engine parameter	2019-12-02 10:10:31 +02:00
Botond Dénes	2f9307c973	collection_mutation: use a fragmented buffer for serialization For the serialization `bytes_ostream` is used.	2019-12-02 10:10:31 +02:00
Botond Dénes	fc5b096f73	imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly Currently the loop which writes the data from the fragmented origin to the destination, moves to the next chunk eagerly after writing the value of the current chunk, if the current chunk is exhausted. This presents a problem when we are writing the last piece of data from the last chunk, as the chunk will be exhausted and we eagerly attempt to move to the next chunk, which doesn't exist and dereferencing it will fail. The solution is to not be eager about moving to the next chunk and only attempt it if we actually have more data to write and hence expect more chunks.	2019-12-02 10:10:31 +02:00
Botond Dénes	875314fc4b	bytes_ostream: make it a FragmentRange The presence of `const_iterator` seems to be a requirement as well although it is not part of the concept. But perhaps it is just an assumption made by code using it.	2019-12-02 10:10:31 +02:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespaces this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Botond Dénes	07007edab9	bytes_ostream: add output_iterator To allow it being used for serialization code, which works in terms of output iterators.	2019-12-02 10:10:31 +02:00
Takuya ASADA	c5a95210fe	dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup Currently the interactive RAID setup prompt does not list virtio-blk devices for the following reasons: - We fail to match the '-p' option in the 'lsblk --help' output due to misuse of a regex function, so list_block_devices() always skips using the lsblk output. - We don't check for the existence of /dev/vd* when we skip using lsblk. - We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output using the '-e' option, but we actually needed them. To fix the problem we need to use re.search() instead of re.match() to match the '-p' option in 'lsblk --help', add '/dev/vd*' to the block device list, and stop passing the '-e 252' option to lsblk, which excludes virtio-blk. Additionally, it is better to parse the 'TYPE' field of the lsblk output; we should skip 'loop' and 'rom' devices since these are not disk devices. Fixes #4066 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191201160143.219456-1-syuu@scylladb.com>	2019-12-01 18:36:48 +02:00
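The first bug above comes from `re.match()` anchoring at the start of the string, while `re.search()` scans the whole help text. The sketch below shows that difference and the TYPE-based filtering. The helper names, the `-p, --paths` pattern and the assumption that the output comes from `lsblk -pnr -o NAME,TYPE` are illustrative. The actual scylla_setup code may invoke lsblk differently.

```python
import re


def supports_p_option(lsblk_help):
    # re.match() would only look at the very beginning of the help text and miss "-p";
    # re.search() finds it anywhere.
    return re.search(r"-p, --paths", lsblk_help) is not None


def disk_devices(lsblk_raw_output):
    """Parse raw "NAME TYPE" lines (assumed `lsblk -pnr -o NAME,TYPE`), skipping loop and rom devices."""
    devices = []
    for line in lsblk_raw_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, dev_type = fields[0], fields[1]
        if dev_type in ("loop", "rom"):
            continue  # not disk devices
        devices.append(name)
    return devices
```

Filtering on TYPE instead of excluding a major number (`-e 252`) keeps virtio-blk disks such as `/dev/vda` in the list.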
Takuya ASADA	124da83103	dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8 We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped ntpd/ntpdate. Fixes #4571 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191101174032.29171-1-syuu@scylladb.com>	2019-12-01 18:35:03 +02:00
Nadav Har'El	b82417ba27	Merge "alternator: Implement Expected operators LE, GE, and BETWEEN" Merged pull request https://github.com/scylladb/scylla/pull/5392 from Dejan Mircevski. Refs #5034 The patches: alternator: Implement LE operator in Expected alternator: Implement GE operator in Expected alternator: Make cmp diagnostic a value, not funct utils: Add operator<< for big_decimal alternator: Implement BETWEEN operator in Expected	2019-12-01 16:11:11 +02:00
Nadav Har'El	8614c30bcf	Merge "implement echo command" Merged pull request https://github.com/scylladb/scylla/pull/5387 from Amos Kong: This patch implements the echo command, which returns the string back to the client. Reference: https://redis.io/commands/echo	2019-12-01 10:29:57 +02:00
Amos Kong	49fee4120e	redis-test: add test_echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:32:00 +08:00
Amos Kong	3e2034f07b	redis: implement echo command This patch implements the echo command, which returns the string back to the client. Reference: - https://redis.io/commands/echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:30:35 +08:00
Dejan Mircevski	dcb1b360ba	alternator: Implement BETWEEN operator in Expected Enable existing BETWEEN test, and add some more coverage to it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 16:47:21 -05:00
Dejan Mircevski	c43b286f35	utils: Add operator<< for big_decimal ... and remove an existing duplicate from lua.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:32:09 -05:00
Dejan Mircevski	e0d77739cc	alternator: Make cmp diagnostic a value, not funct All check_compare diagnostics are static strings, so there's no need to call functions to get them. Instead of a function, make diagnostic a simple value. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:09:05 -05:00
Dejan Mircevski	65cb84150a	alternator: Implement GE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 12:29:08 -05:00
Dejan Mircevski	f201f0eaee	alternator: Implement LE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 11:59:52 -05:00
Avi Kivity	96009881d8	Update seastar submodule * seastar 8eb6a67a4...166061da3 (3): > install-dependencies.sh: add diffutils > reactor: replace std::optional (in _network_stack_ready) with compat::optional > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base Ref #5386.	2019-11-29 12:50:48 +02:00
Tomasz Grabiec	6562c60c86	Merge "test.py: terminate children upon signal" from Kostja Allows a signal to terminate the outstanding test tasks, to avoid dangling children.	2019-11-29 12:05:03 +02:00
Pekka Enberg	bb227cf2b4	Merge "Fix default directories in Scylla setup scripts" from Amos "Fix two problems in scylla_io_setup: - Problem 1: paths of default directories are invalid, introduced by commit `5ec1915` ("scylla_io_setup: assume default directories under /var/lib/scylla"). - Problem 2: wrong path join, introduced by commit `31ddb21` ("dist/common/scripts: support nonroot mode on setup scripts"). Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py: - Fixed default scylla directories when they aren't assigned in scylla.yaml" Fixes #5370 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> * 'scylla_io_setup' of git://github.com/amoskong/scylla: use parse_scylla_dirs_with_default to get scylla directories scylla_io_setup: fix data_file_directories check scylla_util: introduce helper to process the default scylla directories scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml scylla_io_setup: fix path join of default scylla directories	2019-11-29 12:05:03 +02:00
Ultrabug	61f1e6e99c	test.py: fix undefined variable 'options' in write_xunit_report()	2019-11-28 19:06:22 +03:00
Ultrabug	5bdc0386c4	test.py: comparison to False should be 'if cond is False:'	2019-11-28 19:06:22 +03:00
Ultrabug	737b1cff5e	test.py: use isinstance() for type comparison	2019-11-28 19:06:22 +03:00
Konstantin Osipov	c611325381	test.py: terminate children upon signal Use asyncio as a more modern way to work with concurrency. Process signals in an event loop, and terminate all outstanding tests before exiting. Breaking change: this commit requires Python 3.7 or newer to run this script. The patch adds a version check and a message to enforce it.	2019-11-28 19:06:22 +03:00
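The pattern described in this commit (an asyncio event loop that owns the child processes and terminates them when a signal arrives) can be sketched as below. This is a minimal illustration assuming a POSIX host; `run_all()` is a hypothetical name, not a function from test.py, and signal handlers can only be installed from the main thread.

```python
import asyncio
import signal
import sys

# asyncio.get_running_loop() and asyncio.run() need Python 3.7+.
if sys.version_info < (3, 7):
    sys.exit("Python 3.7 or newer is required")


async def run_all(cmds):
    """Start every command and wait for all of them. If SIGINT or SIGTERM
    arrives, terminate the children that are still running. Returns exit codes."""
    loop = asyncio.get_running_loop()
    procs = []

    def terminate_children():
        for proc in procs:
            if proc.returncode is None:  # not reaped yet
                try:
                    proc.terminate()
                except ProcessLookupError:
                    pass  # exited in the meantime

    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, terminate_children)
    try:
        for cmd in cmds:
            procs.append(await asyncio.create_subprocess_exec(*cmd))
        return [await proc.wait() for proc in procs]
    finally:
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.remove_signal_handler(sig)


print(asyncio.run(run_all([["true"], ["sh", "-c", "exit 3"]])))  # [0, 3]
```

Children killed by the handler report a negative return code (the signal number), so the caller can still tell interrupted tests from failed ones.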
Botond Dénes	cf24f4fe30	imr: move documentation to docs/ Where all the other documentation is, and hence where people would be looking for it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>	2019-11-28 16:47:52 +02:00
Avi Kivity	36dd0140a8	Update seastar submodule * seastar 5c25de907a...8eb6a67a4b (1): > util/backtrace.hh: add missing print.hh include	2019-11-28 16:47:16 +02:00
Benny Halevy	7aef39e400	tracing: one_session_records: keep local tracing ptr Similar to trace_state, keep a shared_ptr<tracing> _local_tracing_ptr in one_session_records when it is constructed, so it can be used during shutdown. Fixes #5243 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-28 15:24:10 +01:00
Gleb Natapov	75499896ab	client_state: store _user as optional instead of shared_ptr _user cannot outlive client_state class instance, so there is no point in holding it in shared_ptr. Tested: debug test.py and dtest auth_test.py Message-Id: <20191128131217.26294-5-gleb@scylladb.com>	2019-11-28 15:48:59 +02:00
Gleb Natapov	1538cea043	cql: modification_statement: store _restrictions as optional instead of shared_ptr _restrictions can be optional since its lifetime is managed by modification_statement class explicitly. Message-Id: <20191128131217.26294-4-gleb@scylladb.com>	2019-11-28 15:48:54 +02:00
Gleb Natapov	ce5d6d5eee	storage_service: store thrift server as an optional instead of shared_ptr Only do_stop_rpc_server uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-3-gleb@scylladb.com>	2019-11-28 15:48:51 +02:00
Gleb Natapov	b9b99431a8	storage_service: store cql server as an optional instead of shared_ptr Only do_stop_native_transport() uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-2-gleb@scylladb.com>	2019-11-28 15:48:47 +02:00
Avi Kivity	2b7e97514a	Update seastar submodule * seastar 6f0ef32514...5c25de907a (7): > shared_future: Fix crash when all returned futures time out Fixes #5322. > future: don't create temporaries on get_value(). > reactor: lower the default stall threshold to 200ms > reactor: Simplify network initialization > reactor: Replace most std::function with noncopyable_function > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode > inet_address: Make inet_address == operator ignore scope (again)	2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Nadav Har'El	fde336a882	Merged "5139 minmax bad printing" Merged pull request https://github.com/scylladb/scylla/pull/5311 from Juliusz Stasiewicz: This is a partial solution to #5139 (only for two types) because of the above and because collections are much harder to do. They are coming in a separate PR.	2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz	3b9ebca269	tests/cql_query_test: add test for aggregates on inet+time_type This is a test for the max(), min() and count() system functions on arguments of types `net::inet_address` and `time_native_type`.	2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz	9c23d89531	cql3/functions: add missing min/max/count for inet and time type References #5139. Aggregate functions, like max(), when invoked on `inet_address' and `time_native_type' used to choose the max(blob)->blob overload, casting the argument and result to bytes. This is because the appropriate calls to `aggregate_fcts::make_XXX_function()' were missing. This commit adds them. Behavior remains the same, but now clients see user-friendly representations of the aggregate result instead of binary. Comparing inet addresses without inet::operator< is done with a trick: ADL is bypassed by wrapping the names of std::min/max and providing an overload of the wrapper for the inet type.	2019-11-28 11:18:31 +01:00
Pavel Emelyanov	8532093c61	cql: The cql_server does not need proxy reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191127153842.4098-1-xemul@scylladb.com>	2019-11-28 10:58:46 +01:00
Amos Kong	e2eb754d03	use parse_scylla_dirs_with_default to get scylla directories Use default data_file_directories/commitlog_directory if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:48:14 +08:00
Amos Kong	bd265bda4f	scylla_io_setup: fix data_file_directories check Use default data_file_directories if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:47:56 +08:00
Amos Kong	123c791366	scylla_util: introduce helper to process the default scylla directories Currently we support assigning workdir in scylla.yaml, yet the setup scripts contain many hardcoded '/var/lib/scylla' paths. Some setup scripts get scylla directories by parsing scylla.yaml; this introduces parse_scylla_dirs_with_default(), which adds default values if scylla directories aren't assigned in scylla.yaml. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:54:32 +08:00
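A hypothetical sketch of such a defaults helper follows. It operates on a dict as a YAML parser would return it, so settings that are commented out in scylla.yaml are simply absent or `None`. The function name `fill_default_dirs()` and the default subdirectories `data` and `commitlog` under `/var/lib/scylla` are assumptions for illustration, not the actual scylla_util.py code.

```python
import os.path

DEFAULT_WORKDIR = '/var/lib/scylla'  # assumed default workdir


def fill_default_dirs(cfg):
    """Return a copy of a parsed scylla.yaml dict in which missing directory
    settings are replaced by defaults derived from the workdir."""
    cfg = dict(cfg or {})  # accept None (empty YAML file); don't mutate input
    workdir = cfg.get('workdir') or DEFAULT_WORKDIR
    cfg['workdir'] = workdir
    if not cfg.get('data_file_directories'):
        cfg['data_file_directories'] = [os.path.join(workdir, 'data')]
    if not cfg.get('commitlog_directory'):
        cfg['commitlog_directory'] = os.path.join(workdir, 'commitlog')
    return cfg


print(fill_default_dirs({'workdir': '/mnt/scylla'})['commitlog_directory'])
# /mnt/scylla/commitlog
```

Deriving the defaults from `workdir` rather than hardcoding `/var/lib/scylla` for each directory keeps the scripts consistent when the user relocates the workdir.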
Amos Kong	b75061b4bc	scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:38:01 +08:00
Amos Kong	ada0e92b85	scylla_io_setup: fix path join of default scylla directories Currently we check invalid paths for some default scylla directories; since those paths don't exist, tuning is always skipped. This is caused by two problems. Problem 1: paths of default directories are invalid Introduced by commit `5ec191536e`, we try to tune some scylla default directories if they exist, but the directory paths we try are wrong. For example: - What we check: /var/lib/scylla/commitlog_directory - Correct one: /var/lib/scylla/commitlog Problem 2: wrong path join Introduced by commit `31ddb2145a`, default_path might be changed from '/var/lib/scylla/' to '/var/lib/scylla'. Our code then checks an invalid, wrongly joined path, e.g.: '/var/lib/scyllacommitlog' Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:37:58 +08:00
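Both problems from this commit can be reproduced in a few lines. The sketch below uses hypothetical helper names, and assumes the subdirectory names `commitlog` and `data`. It contrasts string concatenation with `os.path.join()`, which inserts the separator only when it is missing.

```python
import os.path

# Config key -> default subdirectory name. Problem 1: the subdirectory is not
# named after the config key ('commitlog', not 'commitlog_directory').
DEFAULT_SUBDIRS = {
    'data_file_directories': 'data',
    'commitlog_directory': 'commitlog',
}


def join_broken(default_path, key):
    # Problem 2: plain concatenation only works if default_path ends in '/'.
    return default_path + DEFAULT_SUBDIRS[key]


def join_fixed(default_path, key):
    # os.path.join() adds the separator only when it is missing.
    return os.path.join(default_path, DEFAULT_SUBDIRS[key])


for base in ('/var/lib/scylla/', '/var/lib/scylla'):
    print(join_broken(base, 'commitlog_directory'),
          join_fixed(base, 'commitlog_directory'))
# /var/lib/scylla/commitlog /var/lib/scylla/commitlog
# /var/lib/scyllacommitlog /var/lib/scylla/commitlog
```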
Amos Kong	d4a26f2ad0	scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358) The default values of data_file_directories and commitlog_directory were commented out by commit `e0f40ed16a`. This causes scylla_util.py:get_scylla_dirs() to fail when checking the values. This patch changes get_scylla_dirs() to return the default data/commitlog directories if they aren't set. Fixes #5358 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-27 13:52:05 +02:00
Nadav Har'El	cb1ed5eab2	alternator-test: test Query's Limit parameter Add a test, test_query.py::test_query_limit, to verify that the Limit parameter correctly limits the number of rows returned by the Query. This was supposed to already work correctly - but we never had a test for it. As we hoped, the test passes (on both Alternator and DynamoDB). Another test, test_query.py::test_query_limit_paging, verifies that paging can be done with any setting of Limit. We already had tests for paging of the Scan operation, but not for the Query operation. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:27:26 +01:00
Nadav Har'El	c01ca661a0	alternator-test: Select parameter of Query and Scan This is a comprehensive test for the "Select" parameter of Query and Scan operations, but only for the base-table case, not index, so another future patch should add similar tests in test_gsi.py and test_lsi.py as well. The main use of the Select parameter is to allow returning just the count of items, instead of their content, but it also has other esoteric options, all of which we test here. The test currently succeeds on AWS DynamoDB, demonstrating that the test is correct, but fails on Alternator because the "Select" parameter is not yet supported. So the test is marked xfail. Refs #5058 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:22:33 +01:00
Botond Dénes	9d09f57ba5	scylla-gdb.py: scylla_smp_queues: use lazy initialization Currently the command tries to read all seastar smp queues in its initialization code in the constructor. This constructor is run each time `scylla-gdb.py` is sourced in `gdb`, which leads to slowdowns and sometimes also annoying errors, because the sourcing happens in the wrong context and seastar symbols are not available. Avoid this by running this initialization code lazily, on the first invocation. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>	2019-11-27 12:04:57 +01:00
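The lazy-initialization idea can be sketched without gdb itself. In scylla-gdb.py the command would subclass `gdb.Command` and implement `invoke(self, arg, from_tty)`. Here, `LazySmpQueues` is a hypothetical stand-in with a simplified `invoke()`, so that it runs outside gdb, and the expensive read is passed in as a callable.

```python
class LazySmpQueues:
    """Hypothetical stand-in for a gdb command: the expensive read happens on
    the first invocation, not when the script is sourced."""

    def __init__(self, read_queues):
        self._read_queues = read_queues  # callable doing the expensive read
        self._queues = None              # nothing is read at construction time

    def invoke(self):
        if self._queues is None:         # first invocation: read and cache
            self._queues = self._read_queues()
        return self._queues


calls = []


def read_queues():
    calls.append(1)
    return ['shard 0 -> 1', 'shard 1 -> 0']


cmd = LazySmpQueues(read_queues)   # like sourcing the script: no read yet
print(len(calls))                  # 0
cmd.invoke()
cmd.invoke()
print(len(calls))                  # 1
```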
Tomasz Grabiec	87b72dad3e	Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov This patchset adds missing "const" function qualifiers throughout the Scylla code base, which would make code less error-prone. The changeset incorporates Kostja's work regarding const qualifiers in the cql code hierarchy along with a follow-up patch addressing the review comment of the corresponding patch set (the patch subject is "cql: propagate const property through prepared statement tree.").	2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola	91b43f1f06	dbuild: fix podman with selinux enabled With this change I am able to run tests using docker-podman. The option also exists in docker. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194101.25221-1-espindola@scylladb.com>	2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola	480055d3b5	dbuild: Fix missing docker options With the recent changes docker was missing a few options. In particular, it was missing -u. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194347.25699-1-espindola@scylladb.com>	2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola	c0a2cd70ff	lua: fix test with boost 1.66 The boost 1.67 release notes say "Changed maximum supported year from 10000 to 9999 to resolve various issues", so change the test to use a larger number so that we get an exception with both boost 1.66 and boost 1.67. Fixes #5344 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126180327.93545-1-espindola@scylladb.com>	2019-11-26 21:17:15 +02:00
Pavel Solodovnikov	55a1d46133	cql: some more missing const qualifiers There are several virtual functions in public interfaces named "is_*" that clearly should be marked as "const", so fix that.	2019-11-26 17:57:51 +03:00
Pavel Solodovnikov	412f1f946a	cql: remove "mutable" on _opts in select_statement _opts initialization can be safely done in the constructor, hence no need to make it mutable.	2019-11-26 17:55:10 +03:00
Piotr Sarna	d90dbd6ab0	Merge "support podman as a replacement to docker" from Avi Docker on Fedora 31 is flaky, and is not supported at all on RHEL 8. Podman is a drop-in replacement for docker; this series adds support for using podman in dbuild. Apart from actually working on Fedora 31 hosts, podman is nicer in being more secure and not requiring a daemon. Fixes #5332	2019-11-26 15:17:49 +01:00
Tomasz Grabiec	5c9fe83615	Merge "Sanitize sub-modules shutting down" from Pavel As suggested in issue #4586, here is the helper that prints the "shutting down foo" message, then shuts foo down, then prints the "[it] was successful" one. In between, it catches the exception (if any) and logs a warning. By "then" I mean literally then, not seastar's then() :) Fixes: #4586	2019-11-26 15:14:22 +02:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
Avi Kivity	6fbb724140	conf: remove unsupported options from scylla.yaml (#5299) These unsupported options do nothing except confuse users who try to tune them. Options removed: hinted_handoff_throttle_in_kb max_hints_delivery_threads batchlog_replay_throttle_in_kb key_cache_size_in_mb key_cache_save_period key_cache_keys_to_save row_cache_size_in_mb row_cache_save_period row_cache_keys_to_save counter_cache_size_in_mb counter_cache_save_period counter_cache_keys_to_save memory_allocator saved_caches_directory concurrent_reads concurrent_writes concurrent_counter_writes file_cache_size_in_mb index_summary_capacity_in_mb index_summary_resize_interval_in_minutes trickle_fsync trickle_fsync_interval_in_kb internode_authenticator native_transport_max_threads native_transport_max_concurrent_connections native_transport_max_concurrent_connections_per_ip rpc_server_type rpc_min_threads rpc_max_threads rpc_send_buff_size_in_bytes rpc_recv_buff_size_in_bytes internode_send_buff_size_in_bytes internode_recv_buff_size_in_bytes thrift_framed_transport_size_in_mb concurrent_compactors compaction_throughput_mb_per_sec sstable_preemptive_open_interval_in_mb inter_dc_stream_throughput_outbound_megabits_per_sec cross_node_timeout streaming_socket_timeout_in_ms dynamic_snitch_update_interval_in_ms dynamic_snitch_reset_interval_in_ms dynamic_snitch_badness_threshold request_scheduler request_scheduler_options throttle_limit default_weight weights request_scheduler_id	2019-11-26 15:14:21 +02:00
Amos Kong	817f34d1a9	ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330) Currently scylla_io_setup is skipped in scylla_setup, because we didn't support those new instance types. I manually executed scylla_io_setup, and scylla-server started and worked well. Let's apply this patch first, then check if there is some new problem in ami-test. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-26 15:14:21 +02:00
Konstantin Osipov	90346236ac	cql: propagate const property through prepared statement tree. cql_statement is a class representing a prepared statement in Scylla. It is used concurrently during execution, so it is important that its state is not changed by execution. Add a const qualifier to the execution method family throughout the cql hierarchy. Mark a few places which do mutate prepared statement state during execution as mutable. While these are not affecting production today, as code ages they may become a source of latent bugs, and they should eventually be moved out of the prepared state or evaluated at prepare time: cf_property_defs::_compaction_strategy_class list_permissions_statement::_resource permission_altering_statement::_resource property_definitions::_properties select_statement::_opts	2019-11-26 14:18:17 +03:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Pavel Emelyanov	50a1ededde	main: Remove now unused defer-with-log helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	a0f92d40ee	main: Shut down sighup handler with verbose helper And (!) fix the misprinted variable name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	0719369d83	repair: Remove extra logging on shutdown The shutdown start/finish messages are already printed in verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	2d64fc3a3e	main: Shut down database with verbose_shutdown helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	636c300db5	main: Shut down prometheus with verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> --- v2: - Register the stop earlier, so that an exception in start/listen does not prevent prometheus.stop from being called	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	804b152527	main: Sanitize shutting down callbacks As suggested in issue #4586, here is the helper that prints the "shutting down foo" message, then shuts foo down, then prints "shutting down foo was successful". In between, it catches the exception (if any) and logs a warning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:45:49 +03:00
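The helper described in this commit is C++ code in main.cc. The following Python sketch only mirrors the log / stop / log-outcome pattern. The function name, the logger name `init`, and the exact message wording are illustrative assumptions.

```python
import logging

log = logging.getLogger('init')


def verbose_shutdown(what, stop):
    """Log, run `stop`, then log the outcome. An exception is logged as a
    warning instead of propagating. Returns True on success."""
    log.info('Shutting down %s', what)
    try:
        stop()
    except Exception as exc:
        log.warning('Shutting down %s failed: %s', what, exc)
        return False
    log.info('Shutting down %s was successful', what)
    return True


logging.basicConfig(level=logging.INFO, format='%(levelname)s %(name)s - %(message)s')
verbose_shutdown('prometheus API server', lambda: None)


def failing_stop():
    raise RuntimeError('boom')


verbose_shutdown('database', failing_stop)  # logs a warning, does not raise
```

Swallowing the exception keeps one failing component from skipping the shutdown of the components after it, while the warning still records what went wrong.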
Nadav Har'El	4160b3630d	Merge "Return preimage from CDC only when it's enabled" Merged pull request https://github.com/scylladb/scylla/pull/5218 from Piotr Jastrzębski: Users should be able to decide whether they need preimage or not. There is already an option for that but it's not respected by the implementation. This PR adds support for this functionality. Tests: unit(dev). Individual patches: cdc: Don't take storage_proxy as transformer::pre_image_select param cdc::append_log_mutations: use do_with instead of shared_ptr cdc::append_log_mutations: fix undefined behavior cdc: enable preimage in test_pre_image_logging test cdc: Return preimage only when it's requested cdc: test both enabled and disabled preimage in test_pre_image_logging	2019-11-25 14:32:17 +02:00
Pavel Emelyanov	f6ac969f1e	mm: Stop migration manager Before stopping the db itself, stop the migration service. It must be stopped before RPC, and RPC itself has not been stopped yet at this point, so we should be safe here. Here's the tail of the resulting logs: INFO 2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager INFO 2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished INFO 2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server INFO 2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete. Also, stop the mm on drain before the commitlog is stopped. [Tomasz: mm needs the cl because pulling schema changes from other nodes involves applying them into the database. So cl/db needs to be stopped after mm is stopped.] The drain logs would look like ... INFO 2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED: and then on stop ... INFO 2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished INFO 2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server INFO 2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete. Fixes #5300 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191125080605.7661-1-xemul@scylladb.com>	2019-11-25 12:59:01 +01:00
Asias He	6ec602ff2c	repair: Fix rx_hashes_nr metrics (#5213) In get_full_row_hashes_with_rpc_stream and repair_get_row_diff_with_rpc_stream_process_op, which were introduced in the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not updated correctly. In the test, we have 3 nodes, run repair on node3, and make sure the following metrics are correct. assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'], node3_metrics['scylla_repair_rx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'], node3_metrics['scylla_repair_tx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'], node3_metrics['scylla_repair_rx_row_nr']) assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'], node3_metrics['scylla_repair_tx_row_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'], node3_metrics['scylla_repair_rx_row_bytes']) assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'], node3_metrics['scylla_repair_tx_row_bytes']) Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test Fixes: #5339 Backports: 3.2	2019-11-25 13:57:37 +02:00
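The six assertions listed in this commit share one invariant: what the peers sent must equal what the repairing node received, and vice versa. A table-driven Python sketch of that check follows. The helper names are hypothetical, and the metric values in the usage example are made up for illustration.

```python
# (peer metric, repairing-node metric) pairs that must balance.
PAIRS = [
    ('tx_hashes_nr', 'rx_hashes_nr'),
    ('rx_hashes_nr', 'tx_hashes_nr'),
    ('tx_row_nr', 'rx_row_nr'),
    ('rx_row_nr', 'tx_row_nr'),
    ('tx_row_bytes', 'rx_row_bytes'),
    ('rx_row_bytes', 'tx_row_bytes'),
]


def mismatched_repair_metrics(peers, repair_node, prefix='scylla_repair_'):
    """Return (peer_metric, node_metric, peer_sum, node_value) tuples that do
    not balance; an empty list means all counters agree."""
    bad = []
    for peer_key, node_key in PAIRS:
        peer_sum = sum(m[prefix + peer_key] for m in peers)
        node_value = repair_node[prefix + node_key]
        if peer_sum != node_value:
            bad.append((peer_key, node_key, peer_sum, node_value))
    return bad


def metrics(**values):
    return {'scylla_repair_' + k: v for k, v in values.items()}


# Made-up numbers for a 3-node cluster where repair runs on node3.
node1 = metrics(tx_hashes_nr=10, rx_hashes_nr=4, tx_row_nr=3, rx_row_nr=1,
                tx_row_bytes=300, rx_row_bytes=100)
node2 = metrics(tx_hashes_nr=5, rx_hashes_nr=6, tx_row_nr=2, rx_row_nr=2,
                tx_row_bytes=200, rx_row_bytes=200)
node3 = metrics(tx_hashes_nr=10, rx_hashes_nr=15, tx_row_nr=3, rx_row_nr=5,
                tx_row_bytes=300, rx_row_bytes=500)
print(mismatched_repair_metrics([node1, node2], node3))  # []
```

A node whose rx_hashes_nr was never incremented, as in the bug fixed here, shows up as a single mismatched tuple rather than a generic assertion failure.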
Piotr Jastrzebski	2999cb5576	cdc: test both enabled and disabled preimage in test_pre_image_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	222b94c707	cdc: Return preimage only when it's requested Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	c94a5947b7	cdc: enable preimage in test_pre_image_logging test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	595c9f9d32	cdc::append_log_mutations: fix undefined behavior The code was iterating over a collection that was modified at the same time. Iterators were used for that and collection modification can invalidate all iterators. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
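In C++, modifying a container while iterating over it can invalidate the iterators and cause undefined behavior. Python dicts detect the analogous mistake at runtime, which makes the pattern easy to demonstrate. The Python analogue below uses hypothetical function names and is not the cdc code. It contrasts iterating over the live collection with iterating over a snapshot, which is one common fix but not necessarily the one used in this commit.

```python
def add_log_entries_unsafe(mutations):
    for key in mutations:                        # iterates the live dict...
        mutations['log:' + key] = mutations[key]  # ...while growing it
    return mutations


def add_log_entries(mutations):
    for key in list(mutations):                  # iterate over a snapshot of keys
        mutations['log:' + key] = mutations[key]
    return mutations


try:
    add_log_entries_unsafe({'a': 1})
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration

print(add_log_entries({'a': 1, 'b': 2}))
# {'a': 1, 'b': 2, 'log:a': 1, 'log:b': 2}
```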
Piotr Jastrzebski	f0f44f9c51	cdc::append_log_mutations: use do_with instead of shared_ptr This will not only save some allocations but also improve code readability. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	b8d9158c21	cdc: Don't take storage_proxy as transformer::pre_image_select param transformer has access to storage_proxy through its _ctx field. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Nadav Har'El	3eab6cd549	Merged "toolchain: update to Fedora 31" Merged pull request https://github.com/scylladb/scylla/pull/5310 from Avi Kivity: This is a minor update, as gcc and boost versions did not change. A notable update is patchelf 0.10, which adds support for large binaries. A few minor issues exposed by the update are fixed in preparatory patches. Patches: dist: rpm: correct systemd post-uninstall scriptlet build: force xz compression on rpm binary payload tools: toolchain: update to Fedora 31	2019-11-24 13:38:45 +02:00
Tomasz Grabiec	e3d025d014	row_cache: Fix abort on bad_alloc during cache update Since `90d6c0b`, cache will abort when trying to detach partition entries while they're updated. This should never happen. It can happen though, when the update fails on bad_alloc, because the cleanup guard invalidates the cache before it releases partition snapshots (held by "update" coroutine). Fix by destroying the coroutine first. Fixes #5327. Tests: - row_cache_test (dev) Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>	2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola	8599f8205b	rpmbuild: don't use dwz By default rpm uses dwz to merge the debug info from various binaries. Unfortunately, it looks like addr2line has not been updated to handle this: // This works $ addr2line -e build/release/scylla 0x1234567 $ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug // now this fails $ addr2line -e build/release/scylla 0x1234567 I think the issue is https://sourceware.org/bugzilla/show_bug.cgi?id=23652 Fixes #5289 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123015734.89331-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	25d5d39b3c	reloc: Force using sha1 for build-ids The default build-id used by lld is xxhash, which is 8 bytes long. rpm requires build-ids to be at least 16 bytes long (https://github.com/rpm-software-management/rpm/issues/950). We force using sha1 for now. That has no impact in gold and bfd since that is their default. We set it in here instead of configure.py to not slow down regular builds. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123020801.89750-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	b5667b9c31	build: don't compress debug info in executables By default we were compressing debug info only in release executables. The idea, if I understand it correctly, is that those are the ones we ship, so we want a more compact binary. I don't think that was doing anything useful. The compression is just gzip, so when we ship a .tar.xz, having the debug info compressed inside the scylla binary probably reduces the overall compression a bit. When building an rpm, the situation is amusing: as part of the rpm build process, the debug info is decompressed and extracted to an external file. Given that most of the link time goes to compressing debug info, it is probably a good idea to just skip that. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123022825.102837-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	d84859475e	Merge "Refactor test.py and cleanup resources" from Kostja Structure the code to be able to introduce futures. Apply trivial cleanups. Switch to asyncio and use it to work with processes and handle signals. Cleanup all processes upon signal.	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	e166fdfa26	Merge "Optimize LWT query phase" from Vladimir Davydov This patch implements a simple optimization for LWT: it makes PAXOS prepare phase query locally and return the current value of the modified key so that a separate query is not necessary. For more details see patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial preparatory refactoring.	2019-11-24 11:35:29 +02:00
Pavel Solodovnikov	4879db70a6	system_keyspace: support timeouts in queries to `system.paxos` table. Also introduce supplementary `execute_cql_with_timeout` function. Remove redundant comment for `execute_cql`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	bf5f864d80	paxos: piggyback result query on prepare response Current LWT implementation uses at least three network round trips: - first, execute PAXOS prepare phase - second, query the current value of the updated key - third, propose the change to participating replicas (there's also a learn phase, but we don't wait for it to complete). The idea behind the optimization implemented by this patch is simple: piggyback the current value of the updated key on the prepare response to eliminate one round trip. To generate less network traffic, only the replica closest to the coordinator sends data, while the other participating replicas send digests, which are used to check data consistency. Note, this patch changes the API of some RPC calls used by PAXOS, but this should be okay as long as the feature is in an early development stage and marked experimental. To assess the impact of this optimization on LWT performance, I ran a simple benchmark that starts a number of concurrent clients, each of which updates its own key (uncontended case) stored in a cluster of three AWS i3.2xlarge nodes located in the same region (us-west-1), and measures the aggregate bandwidth and latency. The test uses the shard-aware gocql driver. Here are the results, per number of clients (latency 99% in ms, bandwidth in rq/s, timeouts in rq/s, each as before/after): 1: 2/2, 626/637, 0/0; 5: 4/3, 2616/2843, 0/0; 10: 3/3, 4493/4767, 0/0; 50: 7/7, 10567/10833, 0/0; 100: 15/15, 12265/12934, 0/0; 200: 48/30, 13593/14317, 0/0; 400: 185/60, 14796/15549, 0/0; 600: 290/94, 14416/15669, 0/0; 800: 568/118, 14077/15820, 2/0; 1000: 710/118, 13088/15830, 9/0; 2000: 1388/232, 13342/15658, 85/0; 3000: 1110/363, 13282/15422, 233/0; 4000: 1735/454, 13387/15385, 329/0. That is, this optimization improves max LWT bandwidth by about 15% and allows running 3-4x more clients while maintaining the same level of system responsiveness.	2019-11-24 11:35:29 +02:00
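The data-plus-digests scheme this commit describes can be modeled in a few lines of Python. This is a sketch under stated assumptions: SHA-256 stands in for whatever digest algorithm Scylla actually uses, replicas are plain dict entries, and the function names are hypothetical. How a real coordinator reacts to a mismatch is not described in the commit, so this sketch only reports it.

```python
import hashlib


def digest(value):
    # Stand-in digest (assumption); not Scylla's digest algorithm.
    return hashlib.sha256(value).digest()


def prepare_responses(replica_values, closest):
    """Simulate prepare replies: the replica closest to the coordinator sends
    its current value, every other replica sends only a digest of it."""
    return {
        replica: ('data', value) if replica == closest else ('digest', digest(value))
        for replica, value in replica_values.items()
    }


def resolve(responses):
    """Return (value, consistent): consistent is True when every digest
    matches the digest of the value returned by the data replica."""
    data = [payload for kind, payload in responses.values() if kind == 'data']
    if len(data) != 1:
        raise ValueError('expected exactly one data response')
    expected = digest(data[0])
    consistent = all(payload == expected
                     for kind, payload in responses.values() if kind == 'digest')
    return data[0], consistent


replicas = {'n1': b'v1', 'n2': b'v1', 'n3': b'v1'}
print(resolve(prepare_responses(replicas, closest='n1')))  # (b'v1', True)
replicas['n3'] = b'v0'
print(resolve(prepare_responses(replicas, closest='n1')))  # (b'v1', False)
```

Only one replica ships the full value. The others send fixed-size digests, which keeps the extra prepare payload small regardless of the row size.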
Rafael Ávila de Espíndola	6160b9017d	commitlog: make sure a file is closed If allocate or truncate throws, we have to close the file. Fixes #4877 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191114174810.49004-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that captured objects won't be destroyed until the future returned by the invoked function is resolved, so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	cfb079b2c9	types: Refactor duplicated value_cast implementation The two implementations of value_cast were almost identical. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-3-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	ef2e96c47c	storage_proxy: factor out helper to sort endpoints by proximity We need it for PAXOS.	2019-11-24 11:35:29 +02:00
Nadav Har'El	854e6c8d7b	alternator-test: test_health_only_works_for_root_path: remove wrong check The test_health_only_works_for_root_path test checks that while Alternator's HTTP server responds to a "GET /" request with success ("health check"), it should respond to different URLs with failures (page not found). One of the URLs it tested was "/..", but unfortunately some versions of Python's HTTP client canonicalize this request to just a "/", causing the request to unexpectedly succeed - and the test to fail. So this patch just drops the "/.." check. A few other nonsense URLs are attempted by the test - e.g., "/abc". Fixes #5321 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	63d4590336	storage_proxy: move digest_algorithm upper We need it for PAXOS. Mark it as static inline while we are at it.	2019-11-24 11:35:29 +02:00
Nadav Har'El	43d3e8adaf	alternator: make DescribeTable return table schema One of the fields still missing in DescribeTable's response (Refs #5026) was the table's schema - KeySchema and AttributeDefinitions. This patch adds this missing feature, and enables the previously-xfailing test test_describe_table_schema. A complication of this patch is that in a table with secondary indexes, we need to return not just the base table's schema, but also the indexes' schema. The existing tests did not cover that feature, so we add here two more tests in test_gsi.py for that. One of these secondary-index schema tests, test_gsi_2_describe_table_schema, still fails, because it outputs a range-key which Scylla added to a view because of its own implementation needs, but wasn't in the user's definition of the GSI. I opened a separate issue #5320 for that. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	f5c2a23118	serializer: add reference_wrapper handling Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats reference_wrapper<T> wrapped in std::optional<>, std::variant<>, or std::tuple<> as T. We need it to avoid copying query::result while serializing paxos::promise.	2019-11-24 11:35:29 +02:00
Botond Dénes	89f9b89a89	scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0 Currently even if `-a` or `-s 0` is provided, `scylla task_histogram` will scan a limited amount of pages due to a bug in the scan loop's stop condition, which triggers a stop once the default sample limit is reached. Fix the loop by skipping this check when the user wants to scan all tasks. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	1452653fbc	query_context: fix use after free of timeout_config in execute_cql_with_timeout timeout_config is used by reference by cql3::query_processor::process(), see cql3::query_options, so the caller must make sure it doesn't go away.	2019-11-24 11:35:29 +02:00
Avi Kivity	ff7e78330c	tools: toolchain: dbuild: work around "podman logs --follow" hang At least some versions of 'podman logs --follow' hang when the container eventually exits (also happens with docker on recent versions). Fortunately, we don't need to use 'podman logs --follow' and can use the more natural non-detached 'podman run', because podman does not proxy SIGTERM and instead shuts down the container when it receives it. So, to work around the problem, use the same code path in interactive and non-interactive runs, when podman is in use instead of docker.	2019-11-22 13:59:05 +02:00
Avi Kivity	702834d0e4	tools: dbuild: avoid uid/gid/selinux hacks when using podman With docker, we went to considerable lengths to ensure that access to mounted volumes was done using the calling user, including supplementary groups. This avoids root-owned files being left around after a build, and ensures that access to group-shared files (like /var/cache/ccache) works as expected. All of this is unnecessary and broken when using podman. Podman uses a proxy to access files on behalf of the container, so naturally all access is done using the calling user's identity. Since it remaps user and group IDs, assigning the host uid/gid is meaningless. Using --userns host also breaks, because sudo no longer works. Fix this by making all the uid/gid/selinux games specific to docker and ignore them when using podman. To preserve the functionality of tools that depend on $HOME, set that according to the host setting.	2019-11-22 13:58:29 +02:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable either, because otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00
Tomasz Grabiec	fb28543116	lsa: Introduce operator bool() to occupancy_stats	2019-11-22 12:08:28 +01:00
Tomasz Grabiec	a69fda819c	lsa: Expose region_impl::evictable_occupancy in the region class	2019-11-22 12:08:10 +01:00
Avi Kivity	1c181c1b85	tools: dbuild: don't mount duplicate volumes podman refuses to start with duplicate volumes, which routinely happen if the toplevel directory is the working directory. Detect this and avoid the duplicate.	2019-11-22 10:13:30 +02:00
Konstantin Osipov	b8b5834cf1	test.py: simplify message output in run_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	90a8f79d7e	test.py: use UnitTest class where possible	2019-11-21 23:16:22 +03:00
Konstantin Osipov	8cd8cfc307	test.py: rename harness command line arguments to 'options' The UnitTest class juggles the name 'args' quite a bit to construct the command line for a unit test, so let's set the harness command line arguments apart from the unit test command line arguments by consistently calling the harness command line arguments 'options' and the unit test command line arguments 'args'. Rename usage() to parse_cmd_line().	2019-11-21 23:16:22 +03:00
Konstantin Osipov	e5d624d055	test.py: consolidate argument handling in UnitTest constructor Create unique UnitTest objects in find_tests() for each found match, including repeat, to ensure each test has its own unique id. This will also be used to store execution state in the test.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	dd60673cef	test.py: move --collectd to standard args	2019-11-21 23:16:22 +03:00
Konstantin Osipov	fe12f73d7f	test.py: introduce class UnitTest	2019-11-21 23:16:22 +03:00
Konstantin Osipov	bbcdee37f7	test.py: add add_test_list() to find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	4723afa09c	test.py: add long tests with add_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	13f1e2abc6	test.py: store the non-default seastar arguments along with definition	2019-11-21 23:16:22 +03:00
Konstantin Osipov	72ef11eb79	test.py: introduce add_test() to find_tests() To avoid code duplication, and to build upon later.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b50b24a8a7	test.py: avoid an unnecessary loop in find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a5103d0092	test.py: move args.repeat processing to find_tests() It somewhat stands in the way of using asyncio. This patch also implements a more comprehensive fix for #5303, since we not only have --repeat, but also run some tests in different configurations, in which case the XML output is also overwritten.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0f0a49b811	test.py: introduce print_summary() and write_xunit_report() (One more moving of the code around).	2019-11-21 23:16:22 +03:00
Konstantin Osipov	22166771ef	test.py: rename test_to_run tests_to_run	2019-11-21 23:16:22 +03:00
Konstantin Osipov	1d94d9827e	test.py: introduce run_all_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	29087e1349	test.py: move out run_test() routine (Trivial code refactoring.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	79506fc5ab	test.py: introduce find_tests() Trivial code refactoring.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a44a1c4124	test.py: remove print_status_succint (Trivial code cleanup.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b9605c1d37	test.py: move mode list evaluation to usage()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0c4df5a548	test.py: add usage()	2019-11-21 23:16:22 +03:00
Pavel Emelyanov	e0f40ed16a	cli: Add the --workdir|-W option When starting the scylla daemon as non-root, the initialization fails because the standard /var/lib/scylla is not accessible by regular users. Making the default dir accessible for the user is not very convenient either, as it will cause conflicts if two or more instances of scylla are in use. This problem can be resolved by specifying --commitlog-directory, --data-file-directories, etc. on start, but it's too much typing. I propose to revive Nadav's --home option that allows moving all the directories under the same prefix in one go. Unlike Nadav's approach, the --workdir option doesn't do any tricky manipulations with existing directories. Instead, as Pekka suggested, the individual directories are placed under the workdir if and only if the respective option is NOT provided. Otherwise the directory configuration is taken as is, regardless of whether it's an absolute or relative path. The value substitution is done early on start. Avi suggested that this is unsafe with respect to HUP config re-read and that proper paths must be resolved on the fly, but this patch doesn't address that yet, here's why. First of all, the respective options are MustRestart now and the substitution is done before the HUP handler is installed. Next, the commitlog and data_file values are copied on start, so marking the options as LiveUpdate won't have any effect. Finally, the existing named_value::operator() returns a reference, so returning a calculated (and thus temporary) value is not possible (from my current understanding, correct me if I'm wrong). Thus if we want the _directory() to return a calculated value, all of its callers must be patched to call something different (e.g. _directory.get() ?), which will lead to more confusion and errors. 
Changes v3: - the option is --workdir back again - the existing *directory are only affected if unset - default config doesn't have any of these set - added the short -W alias Changes v2: - the option is --home now - all other paths are changed to be relative Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191119130059.18066-1-xemul@scylladb.com>	2019-11-21 15:07:39 +02:00
Rafael Ávila de Espíndola	5417c5356b	types: Move get_castas_fctn to cql3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-9-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	f06d6df4df	types: Simplify casts to string These now just use the to_string member functions, which makes it possible to move the code to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-8-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	786b1ec364	types: Move json code to its own file Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-7-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	af8e207491	types: Avoid using deserialize_value in json code This makes it independent of internal functions and makes it possible to move it to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-6-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	ed65e2c848	types: Move cql3_kind to the cql3 directory Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-5-espindola@scylladb.com>	2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola	bd560e5520	types: Fix dynamic types of some data_value objects I found these mismatched types while converting some member functions to standalone functions, since they have to use the public API that has more type checks. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-4-espindola@scylladb.com>	2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola	0d953d8a35	types: Add a test for value_cast We had no tests on when value_cast throws or when it moves the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-2-espindola@scylladb.com>	2019-11-21 12:08:45 +02:00
Konstantin Osipov	002ff51053	lua: make sure the latest master builds on Debian/Ubuntu Use pkg-config to search for Lua dependencies rather than hard-code include and link paths. Avoid using boost internals, not present in earlier versions of boost. Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com> Message-Id: <20191120170005.49649-1-kostja@scylladb.com>	2019-11-21 07:57:12 +02:00
Pavel Solodovnikov	d910899d61	configure.py: support multi-threaded linking via `gold` Use `-Wl,--threads` flag to enable multi-threaded linking when using `ld.gold` linker. Additional compilation test is required because it depends on whether or not the `gold` linker has been compiled with `--enable-threads` option. This patch introduces a substantial improvement to the link times of `scylla` binary in release and debug modes (around 30 percent). Local setup reports the following numbers with release build for linking only build/release/scylla: Single-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30 Multi-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>	2019-11-20 19:28:00 +02:00
Nadav Har'El	89d6d668cb	Merge "Redis API in Scylla" Merged patch series from Peng Jian, adding optionally-enabled Redis API support to Scylla. This feature is experimental, and partial - the extent of this support is detailed in docs/redis/redis.md. Patches: Document: add docs/redis/redis.md redis: Redis API in Scylla Redis API: graft redis module to Scylla redis-test: add test cases for Redis API	2019-11-20 16:59:13 +02:00
Piotr Sarna	086e744f8f	scripts/find-maintainer: refresh maintainers list This commit attempts to make the maintainers list up-to-date to the best of my knowledge, because it got really stale over time. Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>	2019-11-20 16:56:31 +02:00
Glauber Costa	73aff1fc95	api: export system uptime via REST This will be useful for tools like nodetool that want to query the uptime of the system. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190619110850.14206-1-glauber@scylladb.com>	2019-11-20 16:44:11 +02:00
Tomasz Grabiec	9a686ac551	Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny Fixes #5277	2019-11-19 23:49:39 +01:00
Avi Kivity	1164ff5329	tools: toolchain: update to Fedora 31 This is a minor update as gcc and boost versions do not change. glibc-langpack-en no longer gets pulled in by default. As it is required by some locale use somewhere, it is added to the explicit dependencies.	2019-11-20 00:08:30 +02:00
Avi Kivity	301c835cbf	build: force xz compression on rpm binary payload Fedora 31 switched the default compression to zstd, which isn't readable by some older rpm distributions (CentOS 7 in particular). Tell it to use the older xz compression instead, so packages produced on Fedora 31 can be installed on older distributions.	2019-11-20 00:08:24 +02:00
Avi Kivity	3ebd68ef8a	dist: rpm: correct systemd post-uninstall scriptlet The post-uninstall scriptlet requires a parameter, but older versions of rpm survived without it. Fedora 31's rpm is more strict, so supply this parameter.	2019-11-20 00:03:49 +02:00
Peng Jian	e6adddd8ef	redis-test: add test cases for Redis API Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:56:16 +08:00
Peng Jian	f2801feb66	Redis API: graft redis module to Scylla In this document, the detailed design and implementation of Redis API in Scylla is provided. v2: build: work around ragel 7 generated code bug (suggested by Avi) Ragel 7 incorrectly emits some unused variables that don't compile. As a workaround, sed them away. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:55:58 +08:00
Peng Jian	0737d9e84d	redis: Redis API in Scylla Scylla has many advantages and features. If Redis is built on top of Scylla, it gets these features automatically. Scylla has achieved great progress in cluster master management, data persistence, failover and replication. The benefits to users are ease of use and development in their production environment, while taking advantage of Scylla. Using Ragel to parse the Redis request, the server obtains the command name and the parameters from the request, invokes Scylla's internal API to read and write the data, then replies to the client. Signed-off-by: Peng Jian, <pengjian.uestc@gmail.com>	2019-11-20 04:55:56 +08:00
Peng Jian	708a42c284	Document: add docs/redis/redis.md In this document, the detailed design and implementation of Redis API in Scylla is provided. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-11-20 04:46:33 +08:00
Nadav Har'El	9b9609c65b	merge: row_marker: correct row expiry condition Merged patch set by Piotr Dulikowski: This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when expiry_timestamp <= now, while a row became expired when expiry_timestamp < now (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes #4263, Fixes #5290. Tests: unit(dev) python test described in issue #5290	2019-11-19 18:14:15 +02:00
Amnon Heiman	9df10e2d4b	scylla_util.py: Add optional timeout to out function It is useful to have an option to limit the execution time of a shell script. This patch adds an optional timeout parameter; if it is provided, the command returns a failure once that duration has elapsed. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-11-19 17:30:28 +02:00
Nadav Har'El	b38c3f1288	Merge "Add separate counters for accesses to system tables" Merged patch series from Juliusz Stasiewicz: Welcome to my first PR to Scylla! The task was intended as a warm-up ("noob") exercise; its description is here: #4182 Sorry, I also couldn't help it and did some scouting: edited descriptions of some metrics and shortened a few annoyingly long lines of code.	2019-11-19 15:21:56 +02:00
Piotr Dulikowski	9be842d3d8	row_marker: tests for row expiration	2019-11-19 13:45:30 +01:00
Tomasz Grabiec	5e4abd75cc	main: Abort on EBADF and ENOTSOCK by default Those are typically symptoms of use-after-free or memory corruption in the program. It's better to catch such errors sooner rather than later. That situation is also dangerous: if the invalid access happened to hit a valid descriptor other than the one intended for the operation, the operation could be performed on the wrong file and result in corruption. Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>	2019-11-19 13:07:33 +02:00
Piotr Dulikowski	589313a110	row_marker: correct expiration condition This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when `expiry_timestamp <= now`, while a row became expired when `expiry_timestamp < now` (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes: #4263, #5290. Tests: - unit(dev) - python test described in issue #5290	2019-11-19 11:46:59 +01:00
Pekka Enberg	505f2c1008	test.py: Append test repeat cycle to output XML filename Currently, we overwrite the same XML output file for each test repeat cycle. This can cause invalid XML to be generated if the XML contents don't match exactly for every iteration. Fix the problem by appending the test repeat cycle in the XML filename as follows: $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test $ ls -1 *.xml jenkins_test.release.vint_serialization_test.0.boost.xml jenkins_test.release.vint_serialization_test.1.boost.xml jenkins_test.release.vint_serialization_test.2.boost.xml Fixes #5303. Message-Id: <20191119092048.16419-1-penberg@scylladb.com>	2019-11-19 11:30:47 +02:00
Rafael Ávila de Espíndola	750adee6e3	lua: fix build with boost 1.67 and older vs fmt It is not completely clear why the fmt-based code fails with boost 1.67, but it is easy to avoid. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191118210540.129603-1-espindola@scylladb.com>	2019-11-19 11:14:00 +02:00
Tomasz Grabiec	ff567649fa	Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ACK messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, an unlimited number of pending ACK messages can consume an unbounded amount of memory, which eventually causes OOM. To fix this, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from a peer has the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node.	2019-11-18 10:52:38 +01:00
Benny Halevy	f9e93bba38	sstables: compaction: move cleanup parameter to compaction_descriptor Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>	2019-11-18 10:52:20 +01:00
Avi Kivity	1fe062aed4	Merge "Add basic UDF support" from Rafael " This patch series adds only UDF support, UDA will be in the next patch series. With this all CQL types are mapped to Lua. Right now we setup a new lua state and copy the values for each argument and return. This will be optimized once profiled. We require --experimental to enable UDF in case there is some change to the table format. " * 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits) Lua: Document the conversions between Lua and CQL Lua: Implement decimal subtraction Lua: Implement decimal addition Lua: Implement support for returning decimal Lua: Implement decimal to string conversion Lua: Implement decimal to floating point conversion Lua: Implement support for decimal arguments Lua: Implement support for returning varint Lua: Implement support for returning duration Lua: Implement support for duration arguments Lua: Implement support for returning inet Lua: Implement support for inet arguments Lua: Implement support for returning time Lua: Implement support for time arguments Lua: Implement support for returning timeuuid Lua: Implement support for returning uuid Lua: Implement support for uuid and timeuuid arguments Lua: Implement support for returning date Lua: Implement support for date arguments Lua: Implement support for returning timestamp ...	2019-11-17 16:38:19 +02:00
Konstantin Osipov	48f3ca0fcb	test.py: use the configured build modes from ninja mode_list Add mode_list rule to ninja build and use it by default when searching for tests in test.py. Now it is no longer necessary to explicitly specify the test mode when invoking test.py. (cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)	2019-11-17 13:42:10 +01:00
Nadav Har'El	2fb2eb27a2	sstables: allow non-traditional characters in table name The goal of this patch is to fix issue #5280, a rather serious Alternator bug, where Scylla fails to restart when an Alternator table has secondary indexes (LSI or GSI). Traditionally, Cassandra allows table names to contain only alphanumeric characters and underscores. However, most of our internal implementation doesn't actually have this restriction. So Alternator uses the characters ':' and '!' in the table names to mark global and local secondary indexes, respectively. And this actually works. Or almost... This patch fixes a problem of listing, during boot, the sstables stored for tables with such non-traditional names. The sstable listing code needlessly assumes that the directory name, i.e., the CF names, matches the "\w+" regular expression. When an sstable is found in a directory not matching such regular expression, the boot fails. But there is no real reason to require such a strict regular expression. So this patch relaxes this requirement, and allows Scylla to boot with Alternator's GSI and LSI tables and their names which include the ":" and "!" characters, and in fact any other name allowed as a directory name. Fixes #5280. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191114153811.17386-1-nyh@scylladb.com>	2019-11-17 14:27:47 +02:00
Shlomi Livne	3e873812a4	Document backport queue and procedure (#5282) This document adds information about how fixes are tracked to be backported into releases and what procedure is followed to backport those fixes. Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2019-11-17 01:45:24 -08:00
Benny Halevy	c215ad79a9	scylla-gdb: resolve: add startswith parameter Allow filtering the resolved addresses by a startswith string. The common use case is for resolving vtable ptrs, when resolving the output of `find_vptrs` that may be too long for the host (running gdb) memory size. In this case the number of vtable ptrs is considerably smaller than the total number of objects returned by find_ptrs (e.g. 462 vs. 69625 in an OOM core I examined from scylla --smp=2 --memory=1024M) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:40:54 +02:00
Benny Halevy	2f688dcf08	scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader provide template arguments for k_l and m readers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:02:05 +02:00
Kamil Braun	a67e887dea	sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285) CQL tracing would only report file I/O involving one sstable, even if multiple sstables were read from during the query. Steps to reproduce: create a table with NullCompactionStrategy insert row, flush memtables insert row, flush memtables restart Scylla tracing on select * from table The trace would only report DMA reads from one of the two sstables. Kudos to @denesb for catching this. Related issue: #4908	2019-11-17 00:38:37 -08:00
Tomasz Grabiec	a384d0af76	Merge "A set of cleanups over main() code" from Pavel E. There are ... signs of massive start/stop code rework in the main() function. While fixing the sub-modules interdependencies during start/stop I've polished these signs too, so here's the simplest ones.	2019-11-15 15:25:18 +01:00
Pavel Emelyanov	1dc490c81c	tracing: Move register_tracing_keyspace_backend forward decl into proper header Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	7e81df71ba	main: Shorten developer_mode() evaluation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	1bd68d87fc	main: Do not carry pctx all over the code v2: - do not use struct initialization extension Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	655b6d0d1e	main: Hide start_thrift Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	26f2b2ce5e	main,db: Kill some unused .hh includes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	f5b345604f	main: Factor out get_conf_sub Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	924d52573d	main: Remove unused return_value variable (and capture) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	2195edb819	gitignore: Add tags file This file is generated by ctags utility for navigation, so it is not to be tracked by git. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191031221339.19030-1-xemul@scylladb.com>	2019-11-14 16:50:11 +01:00
Gleb Natapov	e0668f806a	lwt: change format of partition key serialization for system.paxos table Serialize provided partition_key in such a way that the serialized value will hash to the same token as the original key. This way when system.paxos table is updated the update is shard local. Message-Id: <20191114135449.GU10922@scylladb.com>	2019-11-14 15:07:16 +01:00
Avi Kivity	19b665ea6b	Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil " When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example: cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>); cqlsh:b> insert into t JSON '{"k": "insert_json"}'; cqlsh:b> select * from t; k | l | m | s | u -------------------+------+------+------+------ insert_json | [] | {} | {} | This PR fixes this. Resolves #5246 and closes #5270. " * 'frozen-json' of https://github.com/kbr-/scylla: tests: add null/unset frozen collection/UDT INSERT JSON test cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON cql3: decouple execute from term binding in user_type::setter	2019-11-14 15:29:30 +02:00
Avi Kivity	4544aa0b34	Update seastar submodule * seastar 75e189c6ba...6f0ef32514 (6): > Merge "Add named semaphores" from Piotr > parallel_for_each_state: pass rvalue reference to add_future > future: Pass rvalue to uninitialized_wrapper::uninitialized_set. > dependencies: Add libfmt-dev to debian > log: Fix logger behavior when logging both to stdout and syslog. > README.md: list Scylla among the projects using Seastar	2019-11-14 15:01:18 +02:00
Juliusz Stasiewicz	1cfa458409	metrics: separate counters for `system' KS accesses Resolves #4182. Metrics per system tables are accumulated separately, depending on the origin of query (DB internals vs clients).	2019-11-14 13:14:39 +01:00
Vladimir Davydov	ab42b72c6d	cql: fix SERIAL consistency check for batch statements If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must fail according to Cassandra's documentation. However, batched writes bypass this check. Fix this.	2019-11-14 12:15:39 +01:00
Vladimir Davydov	25aeefd6f3	cql: fix CAS consistency level validation This patch resurrects Cassandra's code validating a consistency level for CAS requests. Basically, it makes CAS requests use a special function instead of validate_for_write to make error messages more coherent. Note, we don't need to resurrect requireNetworkTopologyStrategy as EACH_QUORUM should work just fine for both CAS and non-CAS writes. Looks like it is just an artefact of a rebase in the Cassandra repository.	2019-11-14 12:15:39 +01:00
Juliusz Stasiewicz	b1e4d222ed	cql3: cosmetics - improved description of metrics	2019-11-14 10:35:42 +01:00
Avi Kivity	cd075e9132	reloc: do not install dependencies when building the relocatable package The dependencies are provided by the frozen toolchain. If a dependency is missing, we must update the toolchain rather than rely on build-time installation, which is not reproducible (as different package versions are available at different times). Luckily, "dnf install" does not update an already-installed package. Had that been the case, none of our builds would have been reproducible, since packages would be updated to the latest version as of the build time rather than the version selected by the frozen toolchain. So, to prevent missing packages in the frozen toolchain translating to an unreproducible build, remove the support for installing dependencies from reloc/build_reloc.sh. We still parse the --nodeps option in case some script uses it. Fixes #5222. Tests: reloc/build_reloc.sh.	2019-11-14 09:37:14 +02:00
Gleb Natapov	552c56633e	storage_proxy: do not release mutation if not all replies were received MV backpressure code frees mutation for delayed client replies earlier to save memory. The commit `2d7c026d6e` that introduced the logic claimed to do it only when all replies are received, but this is not the case. Fix the code to free only when all replies are received for real. Fixes #5242 Message-Id: <20191113142117.GA14484@scylladb.com>	2019-11-13 16:23:19 +02:00
Raphael S. Carvalho	3e70523111	distributed_loader: Release disk space of SSTables deleted by resharding Resharding is responsible for scheduling the deletion of the resharded sstables, but it was not refreshing the cache of the shards those sstables belong to, which means the cache incorrectly held references to them even after they were deleted. As a consequence, sstables deleted by resharding did not have their disk space freed until the cache was refreshed by a subsequent procedure that triggers it. Fixes #5261. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>	2019-11-13 16:03:27 +02:00
Avi Kivity	6aed3b7471	Merge "cql: trivial cleanup" from Vova * 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev: cql: rename modification_statement::_sets_a_collection to _selects_a_collection cql: rename _column_conditions to _regular_conditions cql: remove unnecessary optional around prefetch_data	2019-11-13 15:12:10 +02:00
Avi Kivity	1cb9f9bdfe	Merge "Use a fixed-size bitset for column set" from Kostja " Use a fixed-size, rather than a dynamically growing bitset for column mask. This avoids unnecessary memory reallocation in the most common case. " * 'column_set' of ssh://github.com/scylladb/scylla-dev: schema: pre-allocate the bitset of column_set schema: introduce schema::all_columns_count() schema: rename column_mask to column_set	2019-11-13 15:08:13 +02:00
Tomasz Grabiec	f68e17eb52	Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D. Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-13 13:11:51 +01:00
Juliusz Stasiewicz	8318a6720a	cql3: error msg w/ arg counts for prepared stmts with wrong arg cnt Fixes #3748. Very small change: added argument count (expectation vs. reality) to error msg within `invalid_request_exception'.	2019-11-13 13:43:37 +02:00
Nadav Har'El	ccb9038c69	alternator: Implement Expected operators LT and GT Merged patch series from Dejan Mircevski. Implements the "LT" and "GT" operators of the Expected update option (i.e., conditional updates), and enables the pre-existing tests for them.	2019-11-13 12:07:44 +02:00
Konstantin Osipov	6159c012db	schema: pre-allocate the bitset of column_set The number of columns is usually small, and avoiding a resize speeds up bit manipulation functions.	2019-11-13 11:41:51 +03:00
Konstantin Osipov	e95d675567	schema: introduce schema::all_columns_count() schema::all_columns_count() will be used to reserve memory of the column_set bitmask.	2019-11-13 11:41:42 +03:00
Konstantin Osipov	191acec7ab	schema: rename column_mask to column_set Since it contains a precise set of columns, it's more accurate to call it a set, not a mask. Besides, the name column_mask is already used for column options on storage level.	2019-11-13 11:41:30 +03:00
Kamil Braun	d6446e352e	tests: add null/unset frozen collection/UDT INSERT JSON test When using INSERT JSON with null/unspecified frozen collection/UDT columns, the columns should be set to null. See #5270.	2019-11-12 18:24:47 +01:00
Vladimir Davydov	8110178e5d	cql: rename modification_statement::_sets_a_collection to _selects_a_collection This is merely to avoid confusion: we use _sets prefix to indicate that there are operations over static/regular columns (_sets_static_columns, _sets_regular_columns), but _sets_a_collection is set for both operations and conditions. So let's rename it to _selects_a_collection and add some comments.	2019-11-12 20:15:42 +03:00
Vladimir Davydov	a19192950e	cql: rename _column_conditions to _regular_conditions It's weird that modification_statement has _static_conditions for conditions on static columns and _column_conditions for conditions on regular columns, as if conditions on static columns are not column conditions. Let's rename _column_conditions to _regular_conditions to avoid confusion.	2019-11-12 20:15:35 +03:00
Konstantin Osipov	0ad0369684	cql: remove unnecessary optional around prefetch_data	2019-11-12 20:15:24 +03:00
Kamil Braun	6c04c5bed5	cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON Before this commit, an empty non-null value was created for frozen collection/UDT columns when an INSERT JSON statement was executed with the value left unspecified or set to null. This was incompatible with Cassandra which inserted a null (dead cell). Fixes #5270.	2019-11-12 18:05:01 +01:00
Kamil Braun	0ad7d71f31	cql3: decouple execute from term binding in user_type::setter This commit makes it possible to pass a bound value terminal directly to the setter. Continuation of commit `bfe3c20035`.	2019-11-12 18:02:21 +01:00
Takuya ASADA	614ec6fc35	install.sh: drop --pkg option, use .install file on .deb package The --pkg option of install.sh was introduced for .deb packaging, since it requires a different install directory for each subpackage. But we are actually able to use "debian/tmp" as a shared install directory, and then specify which package owns each file using .install files. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191030203142.31743-1-syuu@scylladb.com>	2019-11-12 16:50:37 +02:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Piotr Dulikowski	48f7b2e4fb	table: move out table::stats to table_stats This change was done in order to be able to forward-declare the table::stats structure.	2019-11-12 13:35:41 +01:00
Avi Kivity	cf7291462d	Merge "cql3/functions: add missing min/max/count functions for ascii type" from Piotr " Adds missing overloads of functions count, min, max for type ascii. Now they work: cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE ks; cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl'); cqlsh:ks> SELECT * FROM test_ascii; id \| value ----+------- 1 \| efgh 0 \| abcd 2 \| ijkl (3 rows) cqlsh:ks> SELECT count(value) FROM test_ascii; system.count(value) --------------------- 3 (1 rows) cqlsh:ks> SELECT min(value) FROM test_ascii; system.min(value) ------------------- abcd (1 rows) cqlsh:ks> SELECT max(value) FROM test_ascii; system.max(value) ------------------- ijkl (1 rows) Tests: unit(release) cql_group_functions_tests.py (with added check for ascii type) Fixes #5147. " * '5147-fix-min-max-count-for-ascii' of https://github.com/piodul/scylla: tests/cql_query_test: add aggregate functions test cql3/functions: add missing min/max/count for ascii	2019-11-12 14:15:14 +02:00
Piotr Dulikowski	41cb16a526	tests/cql_query_test: add aggregate functions test Adds a test for min, max and avg functions for those primitive types for which those functions are working at the moment.	2019-11-12 13:01:34 +01:00
Piotr Dulikowski	6d78d7cc69	cql3/functions: add missing min/max/count for ascii Adds missing overloads of functions `count`, `min`, `max` for type `ascii`. Now they work: cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE ks; cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl'); cqlsh:ks> SELECT * FROM test_ascii; id \| value ----+------- 1 \| efgh 0 \| abcd 2 \| ijkl (3 rows) cqlsh:ks> SELECT count(value) FROM test_ascii; system.count(value) --------------------- 3 (1 rows) cqlsh:ks> SELECT min(value) FROM test_ascii; system.min(value) ------------------- abcd (1 rows) cqlsh:ks> SELECT max(value) FROM test_ascii; system.max(value) ------------------- ijkl (1 rows) Tests: - unit(release) - cql_group_functions_tests.py (with added check for `ascii` type) Fixes #5147.	2019-11-12 13:01:34 +01:00
Rafael Ávila de Espíndola	10bcbaf348	Lua: Document the conversions between Lua and CQL Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6ffddeae5e	Lua: Implement decimal subtraction Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	aba8e531d1	Lua: Implement decimal addition Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bb84eabbb3	Lua: Implement support for returning decimal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc17312a86	Lua: Implement decimal to string conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	e83d5bf375	Lua: Implement decimal to floating point conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b568bf4f54	Lua: Implement support for decimal arguments This is just the minimum to pass a value to Lua. Right now you can't actually do anything with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6c3f050eb4	Lua: Implement support for returning varint Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dc377abd68	Lua: Implement support for returning duration Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	c3f021d2e4	Lua: Implement support for duration arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9208b2f498	Lua: Implement support for returning inet Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	64be94ab01	Lua: Implement support for inet arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	faf029d472	Lua: Implement support for returning time Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	772f2a4982	Lua: Implement support for time arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	484f498534	Lua: Implement support for returning timeuuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9c2daf6554	Lua: Implement support for returning uuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ae1a1a4085	Lua: Implement support for uuid and timeuuid arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8aeed5beb	Lua: Implement support for returning date Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	384effa54b	Lua: Implement support for date arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	63bc960152	Lua: Implement support for returning timestamp Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee95756f62	Lua: Implement support for timestamp arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	1c6d5507b4	Lua: Implement support for returning counter Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0d9d53b5da	Lua: Implement support for counter arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	74c4e58b6b	Lua: Add a test for nested types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b226511ce8	Lua: Implement support for returning maps Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5c8d1a797f	Lua: Implement support for map arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b5b15ce4e6	Lua: Implement support for returning set Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf7ba441e4	Lua: Implement support for set arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	02f076be43	Lua: Implement support for returning udt Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	92c8e94d9a	Lua: Implement support for udt arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	a7c3f6f297	Lua: Implement support for returning list Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	688736f5ff	Lua: Implement support for returning tuple Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ab5708a711	Lua: Implement support for list and tuple arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	534f29172c	Lua: Implement support for returning boolean Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b03c580493	Lua: Implement support for boolean arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dcfe397eb6	Lua: Implement support for returning floating point Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf4b7ab39a	Lua: Implement support for returning blob Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3d22433cd4	Lua: Implement support for blob arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dd754fcf01	Lua: Implement support for returning ascii Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	affb1f8efd	Lua: Implement support for returning text Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8ed347ee7	Lua: Implement support for string arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0e4f047113	Lua: Implement a visitor for return values This adds support for all integer types. Followup commits will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	34b770e2fb	Lua: Push varint as decimal This makes it substantially simpler to support both varint and decimal, which will be implemented in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9b3cab8865	Lua: Implement support for varint to integer conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5a40264d97	Lua: Implement support for varint arguments Right now it is not possible to do anything with the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3230b8bd86	Lua: Implement support for floating point arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9ad2cc2850	Lua: Implement a visitor for arguments With this we support all simple integer types. Followup patches will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee1d87a600	Lua: Plug in the interpreter This adds a wrapper around the lua interpreter so that function executions are interruptible and return futures. With this patch it is possible to write and use simple UDFs that take and return integer values. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc3bba1064	Lua: Add lua.cc and lua.hh skeleton files Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	7015e219ca	Lua: Link with liblua Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	61200ebb04	Lua: Add config options This patch just adds the config options that we will expose for the lua runtime. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	d9337152f3	Use threads when executing user functions This adds a requires_thread predicate to functions and propagates that up until we get to code that already returns futures. We can then use the predicate to decide if we need to use seastar::async. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	52b48b415c	Test that schema digests with UDFs don't change This refactors test_schema_digest_does_not_change to also test a schema with user defined functions and user defined aggregates. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	fc72a64c67	Add schema propagation and storage for UDF With this it is possible to create user defined functions and aggregates and they are saved to disk and the schema change is propagated. It is just not possible to call them yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ce6304d920	UDF: Add a feature and config option to track if udf is enabled It can only be enabled with --experimental. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola	dd17dfcbef	Reject "OR REPLACE ... IF NOT EXISTS" in the grammar The parser now rejects having both OR REPLACE and IF NOT EXISTS in the same statement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	e7e3dab4aa	Convert UDF parsing code to c++ For now this just constructs the corresponding c++ classes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5c45f3b573	Update UDF syntax This updates UDF syntax to the current specification. In particular, this removes DETERMINISTIC and adds "CALLED ON NULL INPUT" and "RETURNS NULL ON NULL INPUT". Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	c75cd5989c	transport: Add support for FUNCTION and AGGREGATE to schema_change While at it, modernize the code a bit and add a test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	dac3cf5059	Clear functions between cql_test_env runs At some point we should make the function list non static, but this allows us to write tests for now. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	de1a970b93	cql: convert functions to add, remove and replace functions Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	33f9d196f9	Add iterator version of functions::find This avoids allocating a std::vector and is more flexible since the iterator can be passed to erase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	7f9dadee5c	Implement functions::type_equals. Since the types are uniqued we can just use ==. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5cef5a1b38	types: Add a friend visitor over data_value This is a simple wrapper that allows code that is not in the types hierarchy to visit a data_value. Will be used by UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	9bf9a84e4d	types: Move the data_value visitor to a header It will be used by the UDF implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Yaron Kaikov	4a9b2a8d96	dist/docker: Add SCYLLA_REPO_URL argument to Dockerfile (#5264) This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines the RPM repository used to install Scylla from. When building a new Docker image, users can specify the argument by passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build command. If the argument is not specified, the same RPM repository is used as before, retaining the old default behavior. We intend to use this in release engineering infrastructure to specify RPM repositories for nightly builds of release branches (for example, 3.1.x), which are currently only using the stable RPMs.	2019-11-07 09:21:05 +02:00
Pavel Emelyanov	486e3f94d0	deps: Add libunistring-dev to debian With this, the previous patch to seastar, and (unexpectedly) the xenial repo for scylla-libthrift010-dev and scylla-antlr35-c++-dev, the build on debian buster finally passes. Signed-off-by: Pavel Emelyanov <xemul@scyladb.com> Message-Id: <CAHTybb-QFyJ7YQW0b6pjhY_xUr-_b1w_O3K1=1FOwrNM55BkLQ@mail.gmail.com>	2019-11-01 09:03:39 +02:00
Dejan Mircevski	859883b31d	alternator: Implement GT operator in Expected Add cmp_gt and use it in check_compare() to handle the GT case. Also reactivate GT tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 17:18:22 -04:00
Dejan Mircevski	0f7d837757	alternator: Factor out check_compare() Code for check_LT(), check_GT(), etc. will be nearly identical, so factor it out into a single function that takes a comparator object. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 17:01:29 -04:00
Dejan Mircevski	a47b768959	alternator: Implement LT operator in Expected Add check_LT() function and reactivate LT tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 16:07:29 -04:00
Dejan Mircevski	ceae3c182f	alternator: Overload base64_decode on rjson::value In `1ca9dc5d47`, it was established that the correct way to base64-decode a JSON value is via string_view, rather than directly from GetString(). This patch adds a base64_decode(rjson::value) overload, which automatically uses the correct procedure. It saves typing, ensures correctness (fixing one incorrect call found), and will come in handy for future EXPECTED comparisons. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 15:56:03 -04:00
Dejan Mircevski	9955f0342f	alternator: Make unwrap_number() visible unwrap_number() is now a public function in serialization.hh instead of a static function visible only in executor.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 10:46:30 -04:00
Nadav Har'El	3f859adebd	Merge: Fix filtering static columns on empty partitions Merged patch series from Piotr Sarna: An otherwise empty partition can still have a valid static column. Filtering didn't take that fact into account and only filtered full-fledged rows, which may result in non-matching rows being returned to the client. Fixes #5248	2019-10-31 10:50:21 +02:00
Pavel Emelyanov	5fe4757725	docs: The scylla's dpdk config is boolean Docs say one can pass --disable-dpdk, but that is not so. It's seastar's configure.py that has a tristate -dpdk option; scylla's option can only be enabled. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <CAHTybb-rxP8DbH-wW4Zf-w89iuCirt6T6-PjZAUfVFj7C5yb=A@mail.gmail.com>	2019-10-31 10:12:17 +02:00
Vladimir Davydov	9ea8114f8c	cql: fix CAS metric label "type" label is already in use for the counter type ("derive", "gauge", etc). Using the same label for "cas" / "non-cas" overwrites it. Let's instead call the new label "conditional" and use "yes" / "no" for its value, as suggested by Kostja. Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>	2019-10-30 17:14:17 +01:00
Avi Kivity	398c482cd0	Merge "combined reader gallop mode" from Piotr " In case when a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode. In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations. In current implementation, when the end of partition is encountered while in gallop mode, the gallop mode is ended unconditionally. A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization. As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping. 
Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode) test name before after improvement one_row 49.070ns 48.287ns 1.60% single_active 61.574us 61.235us 0.55% many_overlapping 488.193us 514.977us -5.49% disjoint_interleaved 57.462us 57.111us 0.61% disjoint_ranges 56.545us 56.006us 0.95% overlapping_partitions_disjoint_rows 127.039us 80.849us 36.36% Same results, normalized per mutation fragment: test name before after improvement one_row 16.36ns 16.10ns 1.60% single_active 109.46ns 108.86ns 0.55% many_overlapping 216.97ns 228.88ns -5.49% disjoint_interleaved 102.15ns 101.53ns 0.61% disjoint_ranges 100.52ns 99.57ns 0.95% overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36% Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz. Tests: unit(release) Fixes #3593. " * '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla: mutation_reader: gallop mode microbenchmark mutation_reader: combined reader gallop tests mutation_reader: gallop mode for combined reader mutation_reader: refactor prepare_next	2019-10-30 17:34:47 +02:00
Piotr Sarna	dd00470a44	tests: add a test case for filtering on static columns The test case covers filtering with an empty partition. Refs #5248	2019-10-30 15:34:10 +01:00
Piotr Sarna	ca6fe598ec	cql3: fix filtering on a static column for empty partitions An otherwise empty partition can still have a valid static column. Filtering didn't take that fact into account and only filtered full-fledged rows, which may result in non-matching rows being returned to the client. Fixes #5248	2019-10-30 15:31:54 +01:00
Tomasz Grabiec	9da3aec115	Merge "Mutation diff improvements" from Benny - accept diff_command option - standard input support	2019-10-30 13:40:58 +01:00
Tomasz Grabiec	0d9367e08f	Merge "Scyllatop: one pass update of multiple metrics" from Benny Update previous results dictionary using the update_metrics method. It calls metric_source.query_list to get a list of results (similar to discover()) then for each line in the response it updates results dictionary. New results may be appended depending on the do_append parameter (True by default). Previously, with prometheus, each metric.update called query_list, resulting in O(n^2) behavior when all metrics were updated, like in the scylla_top dtest - causing test timeout when testing debug build. (E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)	2019-10-30 13:38:39 +01:00
Tomasz Grabiec	b7b0a53b50	Merge "Add metrics for light-weight transactions" from Vova This patch set adds metrics useful for analyzing light-weight transaction performance. The same metrics are available in Cassandra.	2019-10-30 12:09:03 +01:00
Vladimir Davydov	f0075ba845	cql: account cas requests separately This patch adds "type" label to the following CQL metrics: inserts updates deletes batches statements_in_batches The label is set to "cas" for conditional statements and "non-cas" for unconditional statements. Note, for a batch to be accounted as CAS, it is enough to have just one conditional statement. In this case all statements within the batch are accounted as CAS as well.	2019-10-30 13:44:35 +03:00
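The batch rule above can be sketched in a few lines of Python. The sketch assumes the metrics are simple counters keyed by (metric name, type label). The helper names cas_label and account_batch are hypothetical and are not Scylla's API.

```python
from collections import Counter


def cas_label(is_conditional):
    return "cas" if is_conditional else "non-cas"


def account_batch(metrics, statements):
    """Account one batch; statements is a list of booleans (True = conditional).

    A single conditional statement makes the whole batch, and every statement
    in it, count as CAS.
    """
    label = cas_label(any(statements))
    metrics[("batches", label)] += 1
    metrics[("statements_in_batches", label)] += len(statements)
```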
Piotr Dulikowski	81883a9f2e	mutation_reader: gallop mode microbenchmark This microbenchmark tests the performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from the others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization. The other benchmarks in the "combined" group give results close to the old ones; the only one that seems to have suffered slightly is combined.many_overlapping. Median times from a single run of perf_mutation_readers.combined (1s run duration, 5 runs per benchmark, release mode), given as test name: before / after / improvement: one_row: 49.070ns / 48.287ns / 1.60%; single_active: 61.574us / 61.235us / 0.55%; many_overlapping: 488.193us / 514.977us / -5.49%; disjoint_interleaved: 57.462us / 57.111us / 0.61%; disjoint_ranges: 56.545us / 56.006us / 0.95%; overlapping_partitions_disjoint_rows: 127.039us / 80.849us / 36.36%. Same results, normalized per mutation fragment: one_row: 16.36ns / 16.10ns / 1.60%; single_active: 109.46ns / 108.86ns / 0.55%; many_overlapping: 216.97ns / 228.88ns / -5.49%; disjoint_interleaved: 102.15ns / 101.53ns / 0.61%; disjoint_ranges: 100.52ns / 99.57ns / 0.95%; overlapping_partitions_disjoint_rows: 246.38ns / 156.80ns / 36.36%. Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	29d6842db9	mutation_reader: combined reader gallop tests	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	2b4ca0c562	mutation_reader: gallop mode for combined reader When a single reader contributes a stream of fragments and keeps winning over the other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over the other readers. Currently, a reader needs to contribute 3 fragments to enter that mode. In gallop mode, each fragment returned by the galloping reader is compared with the best fragment from _fragment_heap. If it wins, the fragment is returned directly. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations. In the current implementation, gallop mode is ended unconditionally when the end of a partition is encountered while galloping. Fixes #3593.	2019-10-30 09:51:18 +01:00
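A simplified Python sketch of a heap-based k-way merge with a gallop mode, modeled on the description above. In this sketch fragments are plain comparable values and readers are sorted sequences. Partition boundaries, which end gallop mode in the real merger, are not modeled. The names merge_with_gallop and GALLOP_THRESHOLD are hypothetical.

```python
import heapq

GALLOP_THRESHOLD = 3  # consecutive wins needed before galloping starts


def merge_with_gallop(readers):
    """Merge sorted sequences; returns (merged list, number of heap operations)."""
    iters = [iter(r) for r in readers]
    heap, heap_ops = [], 0
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, i))
            heap_ops += 1

    out = []
    last_winner, streak = None, 0
    while heap:
        value, i = heapq.heappop(heap)
        heap_ops += 1
        out.append(value)
        streak = streak + 1 if i == last_winner else 1
        last_winner = i
        nxt = next(iters[i], None)
        # Gallop mode: compare the winner's next item with the heap's best
        # entry only, and emit it directly for as long as it keeps winning.
        while (nxt is not None and streak >= GALLOP_THRESHOLD
               and (not heap or (nxt, i) < heap[0])):
            out.append(nxt)
            nxt = next(iters[i], None)
        if nxt is not None:  # lost, or not galloping yet: general case
            heapq.heappush(heap, (nxt, i))
            heap_ops += 1
    return out, heap_ops
```

For two disjoint readers of 32 items each, this performs 12 heap operations instead of the 128 a plain push/pop per item would need. That kind of disjoint input is the case the overlapping_partitions_disjoint_rows benchmark exercises.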
Piotr Dulikowski	2a46a09e7c	mutation_reader: refactor prepare_next Move the logic responsible for adding readers at a partition boundary into `maybe_add_readers_at_partition_boundary`, and the logic for advancing one reader into `prepare_one`. This will allow reusing this logic outside `prepare_next`.	2019-10-30 09:49:12 +01:00
Avi Kivity	623071020e	commitlog: change variadic stream in read_log_file to future<struct> Since seastar::streams are based on future/promise, variadic streams suffer the same fate as variadic futures - deprecation and eventual removal. This patch therefore replaces a variadic stream in commitlog::read_log_file() with a non-variadic stream, via a helper struct. Tests: unit (dev)	2019-10-29 19:25:12 +01:00
Botond Dénes	271ab750a6	scylla-gdb.py: add replica section to scylla memory Recently, scylla memory started to go beyond just providing raw stats about the occupancy of the various memory pools, to also provide an overview of the "usual suspects" that cause memory pressure. As part of this, `46341bd63f` recently added a section with the coordinator stats. This patch continues this trend and adds a replica section, with the "usual suspects": * read concurrency semaphores * execution stages * read/write operations Example: Replica: Read Concurrency Semaphores: user sstable reads: 0/100, remaining mem: 84347453 B, queued: 0 streaming sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0 system sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0 Execution Stages: data query stage: 03 "service_level_sg_0" 4967 Total 4967 mutation query stage: Total 0 apply stage: 03 "service_level_sg_0" 12608 06 "statement" 3509 Total 16117 Tables - Ongoing Operations: pending writes phaser (top 10): 2 ks.table1 2 Total (all) pending reads phaser (top 10): 3380 ks.table2 898 ks.table1 410 ks.table3 262 ks.table4 17 ks.table8 2 system_auth.roles 4969 Total (all) pending streams phaser (top 10): 0 Total (all) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>	2019-10-29 18:03:06 +01:00
Vladimir Davydov	e510288b6f	api: wire up column_family cas-related statistics	2019-10-29 19:26:18 +03:00
Vladimir Davydov	b75862610e	paxos_state: account paxos round latency This patch adds the following per table stats: cas_prepare_latency cas_propose_latency cas_commit_latency They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed by Cassandra.	2019-10-29 19:26:18 +03:00
Vladimir Davydov	21c3c98e5b	api: wire up storage_proxy cas-related statistics	2019-10-29 19:26:18 +03:00
Vladimir Davydov	c27ab87410	storage_proxy: add cas request accounting This patch implements accounting of Cassandra's metrics related to lightweight transactions, namely: cas_read_latency: transactional read latency (histogram); cas_write_latency: transactional write latency (histogram); cas_read_timeouts: number of transactional read timeouts; cas_write_timeouts: number of transactional write timeouts; cas_read_unavailable: number of transactional read unavailable errors; cas_write_unavailable: number of transactional write unavailable errors; cas_read_unfinished_commit: number of transaction commit attempts that occurred on read; cas_write_unfinished_commit: number of transaction commit attempts that occurred on write; cas_write_condition_not_met: number of transaction preconditions that did not match current values; cas_read_contention: how many contended reads were encountered (histogram); cas_write_contention: how many contended writes were encountered (histogram).	2019-10-29 19:25:47 +03:00
Vladimir Davydov	967a9e3967	storage_proxy: zap ballot_and_contention Pass contention by reference to begin_and_repair_paxos(), where it is incremented on every sleep. Rationale: we want to account the total number of times query() / cas() had to sleep, either directly or within begin_and_repair_paxos(), no matter if the function failed or succeeded.	2019-10-29 19:22:18 +03:00
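A minimal Python sketch of why the counter is passed by reference. Sleeps counted inside begin_and_repair_paxos() stay visible to the caller even when the round fails. A small mutable object stands in for a C++ reference parameter. The parameters sleeps_needed and succeeds are hypothetical knobs for illustration only.

```python
class Contention:
    """Mutable counter shared with the callee, like a C++ reference parameter."""

    def __init__(self):
        self.count = 0


def begin_and_repair_paxos(contention, sleeps_needed, succeeds):
    for _ in range(sleeps_needed):
        contention.count += 1  # a real implementation would back off here
    if not succeeds:
        raise TimeoutError("paxos round did not complete")
    return "ballot"


def cas(contention, sleeps_needed, succeeds):
    try:
        return begin_and_repair_paxos(contention, sleeps_needed, succeeds)
    except TimeoutError:
        return None  # the sleeps are still recorded in contention.count
```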
Botond Dénes	49aa8ab8a0	scylla-gdb.py: add compatibility with Scylla 3.0 Even though every Scylla version has its own scylla-gdb.py, because we don't backport any fixes or improvements, practically we end up always using master's version when debugging older versions of Scylla too. This is made harder by the fact that both Scylla's and its dependencies' (most notably that of libstdc++ and boost) code is constantly changing between releases, requiring edits to scylla-gdb.py to make it usable with past releases. This patch attempts to make it easier to use scylla-gdb.py with past releases, more specifically Scylla 3.0. This is achieved by wrapping problematic lines in a `try: except:` and putting the backward compatible version in the `except:` clause. These lines have comments with the version they provide support for, so they can be removed when said version is not supported anymore. I did not attempt to provide full coverage, I only fixed up problems that surfaced when using my favourite commands with 3.0. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>	2019-10-29 17:05:19 +01:00
Botond Dénes	e48f301e95	repair: repair_cf_range(): extract result of local checksum calculation only once The loop that collects the results of the checksum calculations also logs any errors. The error logging includes `checksums[0]`, which corresponds to the checksum calculation on the local node. This violates the assumption of the code following the loop, namely that the future of `checksums[0]` is still intact after the loop terminates. That is only true when the checksum calculation succeeds; when it fails, the loop extracts the error and logs it. When the code after the loop then checks again whether said calculation failed, it gets a false negative, goes ahead and attempts to extract the value, triggering an assert failure. Fix by making sure that, even when the checksum calculation fails, the result of `checksums[0]` is extracted only once. Fixes: #5238 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>	2019-10-29 17:00:37 +01:00
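A Python sketch of the "extract once" pattern. OneShotFuture is a hypothetical stand-in for a seastar future whose value or exception can be taken only once. collect_checksums is likewise hypothetical: it takes every result inside the loop and only consults the saved results afterwards, never the futures.

```python
class OneShotFuture:
    """Hypothetical stand-in for a seastar future: its result can be taken once."""

    def __init__(self, value=None, error=None):
        self._value, self._error, self._taken = value, error, False

    def failed(self):
        return self._error is not None

    def get(self):
        if self._taken:
            raise RuntimeError("result already extracted")
        self._taken = True
        if self._error is not None:
            raise self._error
        return self._value


def collect_checksums(futures):
    """Extract every result exactly once; index 0 is the local node."""
    results, errors = [], []
    for node, fut in enumerate(futures):
        try:
            results.append(fut.get())
        except Exception as exc:  # log and continue, as the repair loop does
            errors.append((node, str(exc)))
            results.append(None)
    # Code after the loop looks at results[0], never at futures[0] again.
    local_ok = results[0] is not None
    return local_ok, results, errors
```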
Avi Kivity	60ea29da90	Update seastar submodule * seastar 2963970f6b...75e189c6ba (7): > posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests > README.md: Add redpanda and smf to 'Projects using Seastar' > unix_domain_test: don't assume that at temporary_buffer is null terminated > socket_address: Use offsetof instead of null pointer > README: add projects using seastar section to readme > Adjustments for glibc 2.30 and hwloc 2.0 > Mark future::failed() as const	2019-10-29 14:34:10 +02:00
Gleb Natapov	0e9df4eaf8	lwt: mark lwt as experimental We may want to change paxos tables format and change internode protocol, so hide lwt behind experimental flag for now. Message-Id: <20191029102725.GM2866@scylladb.com>	2019-10-29 14:33:48 +02:00
Benny Halevy	79d5fed40b	mutation_fragment_stream_validator: validate end of stream in partition_key filter Currently end-of-stream validation is done in the destructor, but the validator may be destructed prematurely, e.g. on exception, as seen in https://github.com/scylladb/scylla/issues/5215 This patch adds an on_end_of_stream() method explicitly called by consume_pausable_in_thread. Also, the respective concepts for PartitionFilter, MutationFragmentFilter and a new one for the on_end_of_stream method were unified as FlattenedConsumerFilter. Refs #5215 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)	2019-10-29 12:35:33 +01:00
Benny Halevy	d5f53bc307	mutation_fragment_stream_validator: validate partition key monotonicity Fixes #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)	2019-10-29 12:35:33 +01:00
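The two validator commits above can be illustrated with a small Python sketch. The validator checks that partition keys strictly increase, and it is told explicitly when the stream ends instead of relying on destruction. It assumes that "end of stream is valid" means no partition is left open. The class and method names other than on_end_of_stream are hypothetical.

```python
class StreamValidator:
    """Sketch: partition keys must strictly increase; the stream must end cleanly."""

    def __init__(self):
        self._last_key = None
        self._in_partition = False

    def on_partition_start(self, key):
        if self._in_partition:
            raise ValueError("partition_start while previous partition is still open")
        if self._last_key is not None and key <= self._last_key:
            raise ValueError(f"non-monotonic partition key: {key!r} after {self._last_key!r}")
        self._last_key = key
        self._in_partition = True

    def on_partition_end(self):
        if not self._in_partition:
            raise ValueError("partition_end without partition_start")
        self._in_partition = False

    def on_end_of_stream(self):
        # Called explicitly by the consumer, so an early teardown (for
        # example during exception handling) cannot trigger a false report.
        if self._in_partition:
            raise ValueError("stream ended inside a partition")
```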
Gleb Natapov	e5e44bfda2	client_state: fix get_timestamp_for_paxos() to always advance a timestamp Message-Id: <20191029102336.GL2866@scylladb.com>	2019-10-29 13:07:33 +02:00
Tomasz Grabiec	c2a4c915f3	Merge "Fix a few issues with CAS requests" from Vladimir D. There are a few issues at the CQL layer, because of which the result of a CAS request execution may differ between Scylla and Cassandra. Mostly, it happens when static columns are involved. The goal of this patch set is to fix these issues, thus making Scylla's implementation of CAS yield the same results as Cassandra's.	2019-10-29 11:50:15 +01:00
Rafael Ávila de Espíndola	c74864447b	types: Simplify validate_visitor for strings We have different types for ascii and utf8, so there is no need for an extra if. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191024232911.22700-1-espindola@scylladb.com>	2019-10-29 11:02:55 +02:00
Nadav Har'El	d69ab1b588	CDC: (atomic) delta + (non-optional) pre-image data columns Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski: Adds delta and pre-image data column writes for the atomic columns in a cdc-enabled table. Note that in this patch set it is still unconditional. Adding option support comes in the next set. Uses code more or less derived from alternator to select the pre-image, using the raw query interface, so query generation should have fairly low overhead. Pre-image and delta mutations are mixed in with the actual modification mutations to generate the full cdc log (sans post-image).	2019-10-29 09:39:28 +02:00
Calle Wilund	7db393fe12	cdc_test: Add helper methods + preimage test Add filtering, sorting etc helpers + simple pre-image test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-29 07:49:05 +01:00
Vladimir Davydov	65b86d155e	cql: add static row to CAS failure result if there are static conditions Even if no rows match clustering key restrictions of a conditional statement with static columns conditions, we still must include the static column value into the CAS failure result set. For example, the following conditional DELETE statement create table t(k int, c int, s int static, v int, primary key(k, c)); insert into t(k, s) values(1, 1); delete v from t where k=1 and c=1 if v=1 and s=1; must return [applied=False, v=null, s=1] not just [applied=False, v=null, s=null] To fix that, set partition_slice::option::always_return_static_content for querying rows used for checking conditions so that we have the static row in update_parameters::prefetch_data even if no regular row matches clustering column restrictions. Plus modify cas_request:: applies_to() so that it sets is_in_cas_result_set flag for the static row in case there are static column conditions, but the result set happens to be empty. As pointed out by Tomek, there's another reason to set partition_slice:: option::always_return_static_content apart from building a correct result set on CAS failure. There could be a batch with two statements, one with clustering key restrictions which select no row, and another statement with only static column conditions. If we didn't enable this flag, we wouldn't get a static row even if it exists, and static column conditions would evaluate as if the static row didn't exist, for example, the following batch create table t(k int, c int, s int static, primary key(k, c)); insert into t(k, s) values(1, 1); begin batch insert into t(k, c) values(1, 1) if not exists update t set s = 2 where k = 1 if s = 1 apply batch; would fail although it clearly must succeed.	2019-10-28 22:30:37 +03:00
Vladimir Davydov	e0b31dd273	query: add flag to return static row on partition with no rows A SELECT statement that has clustering key restrictions isn't supposed to return static content if no regular rows match the restrictions, see #589. However, for the CAS statement we do need to return static content on failure, so this patch adds a flag that allows the caller to override this behavior.	2019-10-28 21:50:44 +03:00
Vladimir Davydov	57d284d254	cql: exclude statements not checked by cas from result set Apart from conditional statements, there may be other reading statements in a batch, e.g. manipulating lists. We must not include rows fetched for them into the CAS result set. For instance, the following CAS batch: create table t(p int, c int, i int, l list<int>, primary key(p, c)); insert into t(p, c, i) values(1, 1, 1); insert into t(p, c, i, l) values(1, 1, 1, [1, 2, 3]); begin batch update t set i=3 where p=1 and c=1 if i=2 update t set l=l-[2] where p=1 and c=2 apply batch; is supposed to return [applied] | p | c | i ----------+---+---+--- False | 1 | 1 | 1 not [applied] | p | c | i ----------+---+---+--- False | 1 | 1 | 1 False | 1 | 2 | 1 To filter out such collateral rows from the result set, let's mark rows checked by conditional statements with a special flag.	2019-10-28 21:50:43 +03:00
Vladimir Davydov	74b9e80e4c	cql: fix EXISTS check that applies only to static columns If a CQL statement only updates static columns, i.e. has no clustering key restrictions, we still fetch a regular row so that we can check it against the EXISTS condition. In this case we must be especially careful: we can't simply pass the row to modification_statement::applies_to, because it may turn out that the row has no static columns set, i.e. there is in fact no static row in the partition. So we filter out such rows without static columns right in cas_request::applies_to before passing them further to modification_statement::applies_to. Example: create table t(p int, c int, s int static, primary key(p, c)); insert into t(p, c) values(1, 1); insert into t(p, s) values(1, 1) if not exists; The conditional statement must succeed in this case.	2019-10-28 21:49:37 +03:00
Vladimir Davydov	8fbf344f03	cql: ignore clustering key if statement checks only static columns In case a CQL statement has only static columns conditions, we must ignore clustering key restrictions. Example: create table t(p int, c int, s int static, v int, primary key(p, c)); insert into t(p, s) values(1, 1); update t set v=1 where p=1 and c=1 if s=1; This conditional statement must successfully insert row (p=1, c=1, v=1) into the table even though there's no regular row with p=1 and c=1 in the table before it's executed, because the statement condition only applies to the static column s, which exists and matches.	2019-10-28 21:13:19 +03:00
Vladimir Davydov	54cf903bb2	cql: differentiate static from regular EXISTS conditions If a modification statement doesn't have a clustering column restriction while the table has static columns, then the EXISTS condition just needs to check whether there's a static row in the partition, i.e. it doesn't need to select any regular rows. Let's treat such an EXISTS condition like a static column condition so that we can ignore its clustering key range while checking CAS conditions.	2019-10-28 21:13:05 +03:00
Vladimir Davydov	934a87999f	cql: turn prefetch_data::row into struct This will allow us to add helper methods and store extra info in each row. For example, we can add a method for checking if a row has static columns. Also, to build CAS result set, we need to differentiate rows fetched to check conditions from those fetched for reading operations. Using struct as row container will allow us to store this information in each prefetched row.	2019-10-28 21:12:52 +03:00
Vladimir Davydov	bdd62b8bc3	cql: remove static column check from create_clustering_ranges The check is pointless, because we check exactly the same while preparing the statement, see process_where_clause() method of modification_statement.	2019-10-28 21:12:43 +03:00
Vladimir Davydov	a8ddbffa75	cql: fix applies_only_to_static_columns check Currently, we set _sets_regular_columns/_sets_static_columns flags when adding regular/static conditions to modification_statement. We use them in applies_only_to_static_columns() function that returns true iff _sets_static_columns is set and _sets_regular_columns is clear. We assume that if this function returns true then the statement only deals with static columns and so must not have clustering key restrictions. Usually, that's true, but there's one exception: DELETE FROM ... statement that deletes whole rows. Technically, this statement doesn't have any column operations, i.e. _sets_regular_columns flag is clear. So if such a statement happens to have a static condition, we will assume that it only applies to static columns and mistakenly raise an error. Example: create table t(k int, c int, s int static, v int, primary key(k, c)); delete from t where k=1 and c=1 if s=1; To fix this, let's not set the above mentioned flags when adding conditions and instead check if _column_conditions array is empty in applies_only_to_static_columns().	2019-10-28 21:12:36 +03:00
Vladimir Davydov	fbb11dac11	cql: set conditions before processing where clause modification_statement::process_where_clause() assumes that both operations and conditions have been added to the statement when it's called: it uses this information to raise an error in case the statement restrictions are incompatible with operations or conditions. Currently, operations are set before this function is called, but conditions are not. This results in "Invalid restrictions on clustering columns since the {} statement modifies only static columns" error while trying to execute the following statements: create table t(k int, c int, s int static, v int, primary key(k, c)); delete s from t where k=1 and c=1 if v=1; update t set s=1 where k=1 and c=1 if v=1; Fix this by always initializing conditions before processing the WHERE clause.	2019-10-28 21:12:22 +03:00
Botond Dénes	edc1750297	scylla-gdb.py: introduce scylla smp-queues Print a histogram of the number of async work items in the shard's outgoing smp queues. Example: (gdb) scylla smp-queues 10747 17 -> 3 ++++++++++++++++++++++++++++++++++++++++ 721 17 -> 19 ++ 247 17 -> 20 + 233 17 -> 10 + 210 17 -> 14 + 205 17 -> 4 + 204 17 -> 5 + 198 17 -> 16 + 197 17 -> 6 + 189 17 -> 11 + 181 17 -> 1 + 179 17 -> 13 + 176 17 -> 2 + 173 17 -> 0 + 163 17 -> 8 + 1 17 -> 9 + Useful for identifying the target shard, when `scylla task_histogram` indicates a high number of async work items. To produce the histogram the command goes over all virtual objects in memory and identifies the source and target queues of each `seastar::smp_message_queue::async_work_item` object. Practically the source queue will always be that of the current shard. As this scales with the number of virtual objects in memory, it can take some time to run. An alternative implementation would be to instead read the actual smp queues, but the code of that is scary so I went for the simpler and more reliable solution. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>	2019-10-28 15:42:55 +02:00
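A Python sketch of the histogram rendering. The real command finds the async work items by walking virtual objects in memory through gdb; this sketch takes the (source shard, target shard) pairs as plain input. The scaling, with the busiest queue drawn as 40 '+' characters, rounding down and a minimum of one '+', is inferred from the example output above and may not match scylla-gdb.py exactly.

```python
from collections import Counter


def smp_queue_histogram(work_items, width=40):
    """Render one line per (source, target) shard pair, most loaded first.

    work_items: iterable of (source_shard, target_shard) pairs, one per
    async work item found in memory.
    """
    counts = Counter(work_items)
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for (src, dst), n in counts.most_common():
        bar = "+" * max(1, width * n // top)  # every non-empty queue gets at least one '+'
        lines.append(f"{n:>8} {src} -> {dst} {bar}")
    return lines
```

With the counts from the example, 721 items against a maximum of 10747 give 2 '+' characters and 247 items give 1, matching the output above.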
Tomasz Grabiec	3b37027598	Merge "lwt: implement basic lightweight transactions support" from Kostja This patch set introduces light-weight transactions support to ScyllaDB. It is a subset of the full series, which adds basic LWT support and which has been reviewed thus far.	2019-10-28 11:45:28 +01:00
Tomasz Grabiec	f745819ed7	Merge "lwt: paxos protocol implementation" from Gleb This is the paxos implementation for LWT. LWT itself is not included in the patch, so the code is essentially not wired in yet (except for the read path).	2019-10-28 11:29:40 +01:00
Avi Kivity	f8ba96efcf	Merge "test_udt_mutations fixes" from Benny " mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...)) mutation_test: test_udt_mutations: fixup udt comment mutation_test: test_udt_mutations: fix end iterator in call to std::all_of mutation_test: test_udt_mutations: use int64_t constants for long_type Test: mutation_test(dev, debug) " * 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla: mutation_test: test_udt_mutations: use int64_t constants for long_type mutation_test: test_udt_mutations: fix end iterator in call to std::all_of mutation_test: test_udt_mutations: fixup udt comment	2019-10-28 10:43:52 +02:00
Calle Wilund	36328acf60	cql_assertions: Change signature to accept sstring	2019-10-28 06:16:12 +01:00
Calle Wilund	7d98f735ee	cdc: Add static columns to data/preimage mutations Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	19bba5608a	cdc: Create and perform a pre-image select for mutations Also generates pre-image rows in the resulting log mutation. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	d4ee1938c7	cdc: Add modification record for regular atomic values in mutations Fills in the data columns for regular columns iff they are atomic (not non-frozen collections).	2019-10-28 06:16:12 +01:00
Calle Wilund	3fdcbd9dff	cdc: Set row op in log Adds actual operation (part delete, range delete, update) to cdc log	2019-10-28 06:16:12 +01:00
Calle Wilund	8a6b72f47e	cdc: Add pre-image select generator method Based on a mutation, creates a pre-image select operation. Note, this uses raw proxy query to shortcut parsing etc, instead of trying to cache by generated query. Hypothesis is that this is essentially faster. The routine assumes all rows in a mutation touch same static/regular columns. If this is not always true it will need additional calculations. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	d74f32b07a	cql3::untyped_result_set: Add constructor from cql3::result_set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	3ed7a9dd69	cql3::untyped_result_set: Add view getter to make non-intrusive read cheaper Also use it in the actual data conversion.	2019-10-28 06:16:12 +01:00
Calle Wilund	451bb7447d	cdc: Add log / log data column operation types and make data cols tuples of these Makes static/regular data columns tuple<op, value, ttl> as per spec. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Konstantin Osipov	e555dc502e	lwt: implement basic lightweight transactions support Support single-statement conditional updates as well as batches. This patch almost fully rewrites column_condition.cc, implementing is_satisfied_by(). Most of the remaining complications in the column_condition implementation come from the need to properly handle frozen and multi-cell collections in predicates; up until now it was not possible to compare entire collection values with each other. This is further complicated since multi-cell lists and sets are returned as maps. We can no longer assume that the columns fetched by the prefetch operation are non-frozen collections. IF EXISTS/IF NOT EXISTS conditions fetch all columns; besides, a column may be needed to check another condition. When fetching the old row for LWT or to apply updates on list/columns, we now calculate precisely the list of columns to fetch. The primary key columns are also included in the CAS batch result set, and are thus also prefetched (the user needs them to figure out which statements failed to apply). The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502 but does deviate from the origin in the handling of conditions on static row cells. This is addressed in a future series.	2019-10-27 23:42:49 +03:00
Konstantin Osipov	67e68dabf0	lwt: ensure we don't crash when we get a LIKE	2019-10-27 23:42:49 +03:00
Konstantin Osipov	f8f36d066c	lwt: check for unsupported collection type in condition element access We don't support conditions with element access on non-frozen UDTs, check that only supported collection types are supplied.	2019-10-27 23:42:49 +03:00
Konstantin Osipov	c9f0adf616	lwt: rewrite cql3::raw::column_condition::prepare() Restructure the code to avoid quite a bit of code duplication.	2019-10-27 23:42:47 +03:00
Konstantin Osipov	c2217df4d8	lwt: reorganize column_condition declaration and add comments	2019-10-27 23:42:03 +03:00
Konstantin Osipov	22b0240fe7	lwt: remove useless code in column_condition.hh Each column_condition and raw::column_condition construction case had a static method wrapping its constructor, simply supplying some defaults. This neither improves clarity nor maintainability.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	3e25b83391	lwt: propagate if_exists condition from the parser to AST UPDATE ... IF EXISTS is legal, but the IF EXISTS condition was not propagated from the parser to the AST (raw::update_statement).	2019-10-27 23:42:03 +03:00
Konstantin Osipov	df28985295	lwt: introduce cql_statement_opt_metadata cql_statement_opt_metadata is an interim node in the cql (prepared) statement hierarchy, parenting modification_statement and batch_statement. If there is an IF condition in such statements, they return a result set, and thus have result set metadata. The metadata itself is filled in by a subsequent patch.	2019-10-27 23:42:03 +03:00
Vladimir Davydov	c8869e803e	lwt: remove commented out validateWhereClauseForConditions This logic was implemented in validate_where_clause_for_conditions() method of modification_statement class.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	eb5e82c6a1	lwt: add CAS where clause validation Add checks for conditional modification statement limitations: - WHERE clustering_key IN (list) IF condition is not supported, since a condition is evaluated for a single row/cell, so allowing multiple rows to match the WHERE clause would create ambiguity; - the same is true for conditional range deletions; - ensure all clustering restrictions are eq for conditional delete. We must not allow statements like create table t(p int, c int, v int, primary key (p, c)); delete from t where p=1 and c>0 if v=1; because there may be more than one row in a partition satisfying the WHERE clause, in which case it's unclear which of them should satisfy the IF condition: all or just one. Raising an error on such a statement is consistent with Cassandra's behavior.	2019-10-27 23:42:03 +03:00
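A Python sketch of the three restrictions listed above, expressed as one validation function. It assumes clustering restrictions are given as a mapping from column name to operator. It treats a conditional DELETE that leaves a clustering column unrestricted as a range deletion. Static-column-only statements, which later commits handle specially, are ignored. InvalidRequest and validate_conditional_where are hypothetical names, not Scylla's code.

```python
class InvalidRequest(Exception):
    """Hypothetical error type standing in for the CQL InvalidRequest error."""


def validate_conditional_where(kind, clustering_columns, restrictions):
    """Reject WHERE clauses that could match more than one row under an IF condition.

    kind: "UPDATE" or "DELETE"
    clustering_columns: clustering column names in key order
    restrictions: maps a restricted clustering column to its operator
    """
    for column, op in restrictions.items():
        if op == "IN":
            raise InvalidRequest(f"IN on clustering column {column} is not supported with IF")
    if kind == "DELETE":
        if any(op != "=" for op in restrictions.values()):
            raise InvalidRequest("conditional DELETE requires equality on clustering columns")
        if len(restrictions) < len(clustering_columns):
            raise InvalidRequest("conditional range deletions are not supported")
```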
Konstantin Osipov	203eb3eccc	lwt: sleep a random amount of time when retrying CAS Sleep a random interval between 0 and 100 ms before retrying CAS. Reuse sleep function, make the distribution object thread local.	2019-10-27 23:42:03 +03:00
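A Python sketch of retrying with a random 0 to 100 ms back-off. A module-level random.Random instance stands in for the thread-local distribution object. The max_attempts cap and the injectable sleep parameter are hypothetical additions that keep the sketch finite and testable.

```python
import random
import time

_rng = random.Random()  # one generator reused across retries


def cas_with_retry(attempt, max_attempts=5, max_sleep_ms=100, sleep=time.sleep):
    """Call attempt() until it returns True, sleeping a random 0..max_sleep_ms between tries.

    Returns (succeeded, number_of_retries).
    """
    for tries in range(max_attempts):
        if attempt():
            return True, tries
        if tries + 1 < max_attempts:
            sleep(_rng.uniform(0, max_sleep_ms) / 1000.0)
    return False, max_attempts - 1
```

The random delay makes it less likely that competing coordinators retry in lockstep and keep pre-empting each other's ballots.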
Konstantin Osipov	0674fab05c	lwt: implement storage_proxy::cas() Introduce service::cas_request abstract base class which can be used to parameterize Paxos logic. Implement storage_proxy::cas() - compare and swap - the storage proxy entry point for lightweight transactions.	2019-10-27 23:42:03 +03:00
Gleb Natapov	70adf65341	storage_proxy: make mutation holder responsible for mutation operation Currently the code that manipulates mutations during a write needs to check what kind of mutations they are and (sometimes) choose different code paths. This patch encapsulates the differences in virtual functions of the mutation_holder object, so that high-level code will not concern itself with the details. The functions that are added: apply_locally(), apply_remotely() and store_hint().	2019-10-27 23:21:51 +03:00
Gleb Natapov	b3e01a45d7	lwt: storage_proxy: implement paxos protocol This patch adds all functionality needed for Paxos protocol. The implementation does not strictly adhere to Paxos paper since the original paper allows setting a value only once, while for LWT we need to be able to make another Paxos round after "learn" phase completes, which requires things like repair to be introduced.	2019-10-27 23:21:51 +03:00
Gleb Natapov	8d6201a23b	lwt: Add RPC verbs needed for paxos implementation The Paxos protocol has three stages: prepare, accept, learn. This patch adds an RPC verb for each of those stages. To keep the terminology compatible with Cassandra, the patch calls those stages: prepare, propose, commit.	2019-10-27 23:21:51 +03:00
Gleb Natapov	d1774693bf	lwt: Define state needed by paxos and persist it The Paxos protocol relies on replicas having a state that persists over crashes/restarts. This patch defines such state and stores it in the database itself, in the paxos table, to make it persistent. The stored state is: in_progress_ballot: the promised ballot; proposal: the accepted value; proposal_ballot: the ballot of the accepted value; most_recent_commit: the most recently learned value; most_recent_commit_at: the ballot of the most recently learned value.	2019-10-27 23:21:51 +03:00
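An in-memory Python sketch of a replica's Paxos state and the three verbs (prepare, propose, commit), using the field names listed above. The real state is persisted in the paxos table and ballots are time-based UUIDs; here ballots are plain integers. The rule that a commit clears a superseded accepted proposal is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Ballot = int  # real ballots are time-based UUIDs; ints keep the sketch simple


@dataclass
class PaxosState:
    in_progress_ballot: Ballot = 0            # promised ballot
    proposal: Optional[str] = None            # accepted value
    proposal_ballot: Ballot = 0               # ballot of the accepted value
    most_recent_commit: Optional[str] = None  # most recently learned value
    most_recent_commit_at: Ballot = 0         # ballot of the learned value

    def prepare(self, ballot: Ballot) -> Tuple[bool, tuple, tuple]:
        """Promise not to accept older ballots; report accepted and learned values."""
        promised = ballot > self.in_progress_ballot
        if promised:
            self.in_progress_ballot = ballot
        return (promised,
                (self.proposal_ballot, self.proposal),
                (self.most_recent_commit_at, self.most_recent_commit))

    def propose(self, ballot: Ballot, value: str) -> bool:
        """Accept the value unless a newer ballot has been promised."""
        if ballot < self.in_progress_ballot:
            return False
        self.in_progress_ballot = ballot
        self.proposal, self.proposal_ballot = value, ballot
        return True

    def commit(self, ballot: Ballot, value: str) -> None:
        """Learn a value; drop an accepted proposal that this commit supersedes."""
        if ballot > self.most_recent_commit_at:
            self.most_recent_commit, self.most_recent_commit_at = value, ballot
        if self.proposal_ballot <= ballot:
            self.proposal, self.proposal_ballot = None, 0
```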
Gleb Natapov	15b935b95d	lwt: add data structures needed for paxos implementation This patch adds two data structures that will be used by paxos. The first one is "proposal", which contains a ballot and a mutation representing the value the paxos protocol is trying to set. The second one is "prepare_response", which is the value returned by the paxos prepare stage. It contains the currently accepted value (if any) and the most recently learned value (again, if any). The latter is used to "repair" replicas that missed a previous "learn" message.	2019-10-27 23:21:51 +03:00
Benny Halevy	1895fb276e	mutation_test: test_udt_mutations: use int64_t constants for long_type Otherwise they are decomposed and serialized as 4-byte int32. For example, on my machine cell[1] looked like this: {0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}} and it failed cells_equal against: {0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}} Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 20:51:29 +02:00
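A small, hypothetical helper shows why the constants mattered: if a serializer picks its width from the deduced C++ type of its argument, an `int` literal produces 4 bytes even when the column (`long_type`) expects 8. `serialize_be` is an illustration only, not Scylla's decompose or serialization API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper: serializes an integer big-endian, using the width of
// the *deduced* argument type rather than the width the column expects.
template <typename T>
std::string serialize_be(T v) {
    static_assert(std::is_integral_v<T>, "integral types only");
    using U = std::make_unsigned_t<T>;
    U u = static_cast<U>(v);
    std::string out(sizeof(T), '\0');
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out[sizeof(T) - 1 - i] = static_cast<char>(u & 0xff);  // lowest byte last
        u = static_cast<U>(u >> 8);
    }
    return out;
}
```

Calling `serialize_be(3)` deduces `int` and yields 4 bytes on common platforms, while `serialize_be(std::int64_t{3})` yields the 8 bytes a 64-bit column expects.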
Benny Halevy	fec772538c	mutation_test: test_udt_mutations: fix end iterator in call to std::all_of Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 19:49:25 +02:00
Benny Halevy	9c8cf9f51d	mutation_test: test_udt_mutations: fixup udt comment Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 19:47:43 +02:00
Benny Halevy	76581e7f14	docs/debugging.md: fix gdb command for retrieving shared libraries information The correct command is `info sharedlibrary`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>	2019-10-27 18:15:09 +02:00
Dejan Mircevski	2a136ba1bc	alternator: Fix race condition in set_routes() server::set_routes() was setting the value of server::_callbacks. This led to a race condition, as set_routes() is invoked on every shard simultaneously. It is also unnecessary, since _callbacks can be initialized in the constructor. Fixes #5220. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-27 12:31:24 +02:00
Avi Kivity	27ef73f4f1	Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil " Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future. Use this file to trace reads from SSTable data and index files. Fixes #4908. " * 'traced_file' of https://github.com/kbr-/scylla: sstables: report sstable index file I/O in CQL tracing sstables: report sstable data file I/O in CQL tracing tracing: add traced_file class	2019-10-26 22:53:37 +03:00
Avi Kivity	2b856a7317	Merge "Support non-frozen UDTs." from Kamil " This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted. I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code. Fixes #2201. " * 'udt' of https://github.com/kbr-/scylla: (64 commits) tests: too many UDT fields check test collection_mutation: add a FIXME. tests: add a non-frozen UDT materialized view test tests: add a UDT mutation test. tests: add a non-frozen UDT "JSON INSERT" test. tests: add a non-frozen UDT to for_each_schema_change. tests: more non-frozen UDT tests. tests: move some UDT tests from cql_query_test.cc to new file. types: handle trailing nulls in tuples/UDTs better. cql3: enable deleting single fields of non-frozen UDTs. cql3: enable setting single fields of a non-frozen UDT. cql3: enable non-frozen UDTs. cql3: introduce user_types::marker. cql3: generalize function_call::make_terminal to UDTs. cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs. cql3: use a dedicated setter operation for inserting user types. cql3: introduce user_types::value. types: introduce to_bytes_opt_vec function. cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>. cql3: make cql3_type::raw_ut::to_string distinguish frozenness. ...	2019-10-26 22:53:37 +03:00
Piotr Sarna	657e7ef5a5	alternator: add alternator health check The health check is performed simply by issuing a GET request to the alternator port - it returns the following status 200 response when the server is healthy: $ curl -i localhost:8000 HTTP/1.1 200 OK Content-Type: text/plain Content-Length: 23 Server: Seastar httpd Date: 21 Oct 2019 12:55:33 GMT healthy: localhost:8000 This commit comes with a test. Fixes #5050 Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>	2019-10-26 18:14:18 +03:00
Botond Dénes	01e913397a	tests: memtable_test: flush_reader_test: compare compacted mutations To filter out artificial differences due to different representation of an equivalent set of writes. Fixes: #5207 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>	2019-10-26 18:14:18 +03:00
Kamil Braun	432ef7c9af	sstables: report sstable index file I/O in CQL tracing Use tracing::make_traced_file when reading from the index file in index_reader.	2019-10-25 14:10:28 +02:00
Kamil Braun	394c36835a	sstables: report sstable data file I/O in CQL tracing Use tracing::make_traced_file when creating an sstable input_stream. To achieve that, trace_state needs to be plumbed down through some functions.	2019-10-25 14:10:28 +02:00
Kamil Braun	a8c9d1206a	tracing: add traced_file class This is a thin wrapper over the `seastar::file` class which adds CQL trace messages before and after I/O operations.	2019-10-25 14:10:24 +02:00
Kamil Braun	2889edea3e	tests: too many UDT fields check test	2019-10-25 12:05:10 +02:00
Kamil Braun	adfc04ebec	collection_mutation: add a FIXME. We could use iterators over cells instead of a vector of cells in collection_mutation(_view)_description. Then some use cases could provide iterators that construct the cells "on the fly".	2019-10-25 12:05:10 +02:00
Kamil Braun	45d2a96980	tests: add a non-frozen UDT materialized view test	2019-10-25 12:05:10 +02:00
Kamil Braun	e0c233ede1	tests: add a UDT mutation test.	2019-10-25 12:05:08 +02:00
Kamil Braun	a21d12faae	tests: add a non-frozen UDT "JSON INSERT" test.	2019-10-25 12:04:44 +02:00
Kamil Braun	ae3464da45	tests: add a non-frozen UDT to for_each_schema_change.	2019-10-25 12:04:44 +02:00
Kamil Braun	b87b700e66	tests: more non-frozen UDT tests.	2019-10-25 12:04:44 +02:00
Kamil Braun	474742ac5d	tests: move some UDT tests from cql_query_test.cc to new file.	2019-10-25 12:04:44 +02:00
Kamil Braun	612de1f4e3	types: handle trailing nulls in tuples/UDTs better. Comparing user types after adding new fields was buggy. In the following scenario: create type ut (a int); create table cf (a int primary key, b frozen<ut>); insert into cf (a, b) values (0, (0)); alter type ut add b int; select * from cf where b = {a:0,b:null}; the row with a = 0 should be returned, even though the value stored in the database is shorter (by one null) than the value given by the user. Until now, it was not.	2019-10-25 12:04:44 +02:00
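The fixed comparison rule can be sketched as follows, modeling a UDT value as a vector of nullable `int` fields (a simplification; real values are serialized bytes of mixed types). `equal_with_trailing_nulls` is a hypothetical name; the point is only that a missing trailing field compares equal to an explicit null.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a frozen UDT value: one nullable field per slot.
using udt_value = std::vector<std::optional<int>>;

// Treats fields missing from the shorter value as nulls, so a value written
// before `ALTER TYPE ... ADD` equals the same value with trailing nulls.
bool equal_with_trailing_nulls(const udt_value& a, const udt_value& b) {
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        std::optional<int> x;  // a missing field reads as null
        std::optional<int> y;
        if (i < a.size()) {
            x = a[i];
        }
        if (i < b.size()) {
            y = b[i];
        }
        if (x != y) {
            return false;
        }
    }
    return true;
}
```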
Kamil Braun	1a9034e38a	cql3: enable deleting single fields of non-frozen UDTs. This was already possible by setting the field to null, but now it supports the DELETE syntax.	2019-10-25 12:04:44 +02:00
Kamil Braun	4d271051dd	cql3: enable setting single fields of a non-frozen UDT. The commit introduces the necessary modifications to the grammar, a set_field raw operation, and a setter_by_field operation.	2019-10-25 12:04:44 +02:00
Kamil Braun	e74b5deb5d	cql3: enable non-frozen UDTs. Add a cluster feature for non-frozen UDTs. If the cluster supports non-frozen UDTs, do not return an error message when trying to create a table with a non-frozen user type.	2019-10-25 12:04:44 +02:00
Kamil Braun	7ac7a3994d	cql3: introduce user_types::marker. cql3::user_types::marker is a dedicated cql3::abstract_marker for user type placeholders in prepared CQL queries. When bound, it returns a user_types::value.	2019-10-25 12:04:44 +02:00
Kamil Braun	36999c94f4	cql3: generalize function_call::make_terminal to UDTs. Use the dedicated user_types::value. There is no way this code can be executed now, so I left a TODO.	2019-10-25 12:04:44 +02:00
Kamil Braun	49a7461345	cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs. For user types, use the dedicated setter and value.	2019-10-25 12:04:44 +02:00
Kamil Braun	40f9ce2781	cql3: use a dedicated setter operation for inserting user types. cql3::user_types::setter is a dedicated cql3::operation for inserting and updating user types. It handles the multi-cell (non-frozen) case.	2019-10-25 12:04:44 +02:00
Kamil Braun	51be1e3e9d	cql3: introduce user_types::value. This is a dedicated multi_item_terminal for user type values. Will be useful in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	abe6c2d3d2	types: introduce to_bytes_opt_vec function. It converts a vector<bytes_view_opt> to a vector<bytes_opt>. Used in a bunch of places.	2019-10-25 12:04:44 +02:00
Kamil Braun	8ff2aebd76	cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>. Previously it returned vector<cql3::raw_value>, even though we don't use unset values when setting a UDT value (fields that are not provided become nulls; that's how C* does it). This simplifies future implementation of user_types::{value, setter}.	2019-10-25 12:04:44 +02:00
Kamil Braun	f0a3af6adc	cql3: make cql3_type::raw_ut::to_string distinguish frozenness. This is used in error messages and may be useful.	2019-10-25 12:04:44 +02:00
Kamil Braun	c89de228e3	cql3: generalize some error messages to UDTs	2019-10-25 12:04:44 +02:00
Kamil Braun	fd3bc27418	cql3: disallow non-frozen UDTs when creating secondary indexes	2019-10-25 12:04:44 +02:00
Kamil Braun	ff0bd0bb7a	cql3: check for nested non-frozen UDTs in create_type_statement.	2019-10-25 12:04:44 +02:00
Kamil Braun	adf857e9ed	cql3: add cql3_type::is_user_type. This will be used in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	6ccb1ee19f	cql3: generalize create_table_statement::raw_statement::prepare to UDTs. Check for UDT with nested non-frozen collection. Check for UDT with COMPACT STORAGE. Check for UDT inside PRIMARY KEY.	2019-10-25 12:04:44 +02:00
Kamil Braun	a8c7670722	types: add multi_cell field to user_type_impl. is_value_compatible_with_internal and update_user_type were generalized to the non-frozen case. For now, all user_type_impls in the code are non-multi-cell (frozen). This will be changed in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	b904d04925	cql3: add a TODO to implement column_conditions for UDTs. This will become relevant after LWT is implemented.	2019-10-25 12:04:44 +02:00
Kamil Braun	44534a4a0a	sstables: generalize some comments to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	b38b8af0f2	schema: generalize compound_name to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	270cf2b289	query-result-set: generalize result_set_builder to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	2ada219f2c	view: generalize create_virtual_column and maybe_make_virtual to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	574e1cd514	tests: generalize timestamp_based_spliiting_writer and bucket_writer to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	6da89e40df	tests: generalize random_schema.cc:generate_collection to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	0fbfb67cbb	tests: generalize mutation_test.cc summaries to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	a3a2f65fbf	types: generalize serialize_for_cql to UDTs. Also introduces a helper "linearized" function, which implements a pattern occurring in all serialize_for_cql_aux functions.	2019-10-25 12:04:44 +02:00
Kamil Braun	05d4b2e1a4	tests: generalize data_model.cc:mutation_description::build to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	338fde672a	mp_row_consumer: generalize consume_cell (kl) and consume_column (mc) to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	5e447e3250	mutation_partition_view: generalize read_collection_cell to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	90927c075a	converting_mutation_partition_applier: generalize accept_cell to UDTs.	2019-10-25 12:04:42 +02:00
Kamil Braun	d9baff0e4b	collection_mutation: generalize collection_mutation.cc:difference to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	a344019b25	collection_mutation: generalize collection_mutation_view::last_update to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	691f00408d	collection_mutation: generalize merge to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	7f5cd8e8ce	collection_mutation: generalize collection_mutation_view_description::materialize to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	20b42b1155	collection_mutation: generalize collection_mutation_view::is_any_live to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	323370e4ba	collection_mutation: generalize deserialize_collection_mutation to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	393974df3b	cql3: make {lists,maps,sets}::value::from_serialized take const {}_type&. This will simplify the code a bit where from_serialized is used after switching to visitors. Also reduces the number of shared_ptr copies.	2019-10-25 10:49:19 +02:00
Kamil Braun	4327bba0db	types: introduce `(de)serialize_field_index` functions. These functions are used to translate field indices, which are used to identify fields inside UDTs, from/to a serialized representation to be stored inside sstables and mutations. They do it in a way that is compatible with C*.	2019-10-25 10:49:19 +02:00
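A minimal sketch of the translation, assuming the index is encoded as a 2-byte big-endian integer. That assumption is consistent with the `0002` cell key quoted in the test_udt_mutations commit and with our understanding of Cassandra's UDT cell paths, but the signatures below (bytes as `std::string`, `std::uint16_t` indices) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical signature: encodes a UDT field index as 2 big-endian bytes.
std::string serialize_field_index(std::uint16_t idx) {
    std::string out(2, '\0');
    out[0] = static_cast<char>(idx >> 8);    // high byte first
    out[1] = static_cast<char>(idx & 0xff);  // low byte second
    return out;
}

// Hypothetical inverse; rejects inputs that are not exactly 2 bytes long.
std::uint16_t deserialize_field_index(const std::string& b) {
    if (b.size() != 2) {
        throw std::invalid_argument("field index must be 2 bytes");
    }
    auto hi = static_cast<unsigned char>(b[0]);
    auto lo = static_cast<unsigned char>(b[1]);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```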
Kamil Braun	90d05eb627	cql3: reject too long user-defined types	2019-10-25 10:49:19 +02:00
Kamil Braun	0f8f950b74	cql3: optimize multi_item_terminal::get_elements(). Now it returns const std::vector<bytes_opt>& instead of std::vector<bytes_opt>.	2019-10-25 10:49:19 +02:00
Kamil Braun	4374982de0	types: collection_type_impl::to_value becomes serialize_for_cql. The purpose of collection_type_impl::to_value was to serialize a collection for sending over CQL. The corresponding function in origin is called serializeForNativeProtocol, but the name is a bit lengthy, so I settled for serialize_for_cql. The method is now a free-standing function, using the visit function to dispatch on the collection type instead of a virtual call. This also makes it easier to generalize it to UDTs in future commits. Remove the old serialize_for_native_protocol, whose body was only a "FIXME: implement"; it was already implemented (as to_value), just under a different name. Remove dead methods: enforce_limit and serialized_values. The corresponding methods in C* are auxiliary methods used inside serializeForNativeProtocol; in our case, the entire algorithm is written in serialize_for_cql.	2019-10-25 10:49:19 +02:00
Kamil Braun	e5c0a992ef	cql3: make cql3_type::raw::to_string private. It only needs to be used in operator<<, which is a friend of cql3_type::raw.	2019-10-25 10:42:58 +02:00
Kamil Braun	ff4d857a9d	cql3: remove a dynamic_pointer_cast to user_type_impl. There exists a method to check if something is a user type: is_user_type(); use it instead.	2019-10-25 10:42:58 +02:00
Kamil Braun	d8f8908d34	types: introduce user_type_impl::idx_of_field method. Each field of a user type has its index inside the type. This method allows to find it easily, which is needed in a bunch of places.	2019-10-25 10:42:58 +02:00
Kamil Braun	c77643a345	cql3: make cql3_type::_frozen protected. Add is_frozen() method. No one modifies _frozen from the outside. Moving the field to `protected` makes it harder to introduce bugs.	2019-10-25 10:42:58 +02:00
Kamil Braun	d83ebe1092	collection_mutation: move collection_type_impl::difference to collection_mutation.hh.	2019-10-25 10:42:58 +02:00
Kamil Braun	7e3bbe548c	collection_mutation: move collection_type_impl::merge to collection_mutation.hh.	2019-10-25 10:42:58 +02:00
Kamil Braun	a41277a7cd	collection_mutation: move collection_type_impl::last_update to collection_mutation_view	2019-10-25 10:42:58 +02:00
Kamil Braun	30802f5814	collection_mutation: move collection_type_impl::is_any_live to collection_mutation_view	2019-10-25 10:42:58 +02:00
Kamil Braun	e16ba76c2e	collection_mutation: move collection_type_impl::is_empty to collection_mutation_view.	2019-10-25 10:42:58 +02:00
Kamil Braun	bbdb438d89	collection_mutation: easier (de)serialization of collection_mutation(s). `collection_type_impl::serialize_mutation_form` became `collection_mutation(_view)_description::serialize`. Previously callers had to cast their data_type down to collection_type to use serialize_mutation_form. Now it's done inside `serialize`. In the future `serialize` will be generalized to handle UDTs. `collection_type_impl::deserialize_mutation_form` became a free-standing function `deserialize_collection_mutation` with similar benefits. Actually, no one needs to call this function manually because of the next paragraph. A common pattern consisting of linearizing data inside a `collection_mutation_view` followed by calling `deserialize_mutation_form` has been abstracted out as a `with_deserialized` method inside collection_mutation_view. serialize_mutation_form_only_live was removed, because it hadn't been used anywhere.	2019-10-25 10:42:58 +02:00
Kamil Braun	e4101679e4	collection_mutation: generalize constructor of collection_mutation to abstract_type. The constructor doesn't use anything specific to collection_type_impl. In the future it will also handle non-frozen user types.	2019-10-25 10:42:58 +02:00
Kamil Braun	b1d16c1601	types: move collection_type_impl::mutation(_view) out of collection_type_impl. collection_type_impl::mutation became collection_mutation_description. collection_type_impl::mutation_view became collection_mutation_view_description. These classes now reside inside collection_mutation.hh. Additional documentation has been written for these classes. Related function implementations were moved to collection_mutation.cc. This makes it easier to generalize these classes to non-frozen UDTs in future commits. The new names (together with documentation) better describe their purpose.	2019-10-25 10:19:45 +02:00
Kamil Braun	c0d3e6c773	atomic_cell: move collection_mutation(_view) to a new file. The classes 'collection_mutation' and 'collection_mutation_view' were moved to a separate header, collection_mutation.hh. Implementations of functions that operate on these classes, including some methods of collection_type_impl, were moved to a separate compilation unit, collection_mutation.cc. This makes it easier to modify these structures in future commits in order to generalize them for non-frozen User Defined Types. Some additional documentation has been written for collection_mutation.	2019-10-25 10:19:45 +02:00
Kamil Braun	c90ea1056b	Remove mutation_partition_applier. It had been replaced by partition_builder in commit `dc290f0af7`.	2019-10-25 10:19:45 +02:00
Asias He	f32ae00510	gossip: Limit number of pending gossip ACK2 messages Similar to "gossip: Limit number of pending gossip ACK messages", limit the number of pending gossip ACK2 messages in gossiper::handle_ack_msg. Fixes #5210	2019-10-25 12:44:28 +08:00
Asias He	15148182ab	gossip: Limit number of pending gossip ACK messages In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ACK messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, an unbounded number of pending ACK messages can consume an unbounded amount of memory, which eventually causes OOM. To fix this, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from a peer carries the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node. Fixes #5210	2019-10-25 12:44:28 +08:00
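The per-peer policy described above (at most one ACK in flight, at most one queued SYN, newest SYN wins) can be sketched as a small state machine. `syn_throttle` and `syn_msg` are hypothetical names, a SYN is reduced to a version number, and networking and futures are left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model: a SYN is represented by a version number only.
struct syn_msg {
    int version;
};

class syn_throttle {
    struct peer_state {
        bool ack_in_flight = false;
        std::optional<syn_msg> pending;  // at most one queued SYN per peer
    };
    std::map<std::string, peer_state> _peers;

public:
    // Returns the SYN to answer now, or nullopt if it was queued. A queued
    // SYN replaces any older one, since the newest carries the latest state.
    std::optional<syn_msg> on_syn(const std::string& peer, syn_msg m) {
        auto& st = _peers[peer];
        if (st.ack_in_flight) {
            st.pending = m;
            return std::nullopt;
        }
        st.ack_in_flight = true;
        return m;
    }

    // Called when sending an ACK completes. Returns the queued SYN to answer
    // next, if any, and keeps the ACK slot occupied for it.
    std::optional<syn_msg> on_ack_sent(const std::string& peer) {
        auto& st = _peers[peer];
        std::optional<syn_msg> next = st.pending;
        st.pending.reset();
        st.ack_in_flight = next.has_value();
        return next;
    }
};
```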
Nadav Har'El	8bffb800e1	alternator: Use system_auth.roles for alternator authorization Merged patch series from Piotr Sarna: This series couples system_auth.roles with authorization routines in alternator. The `salted_hash` field, which is every user's hashed password, is used as a secret key for the signature generation in alternator. This series also adds related expiration verifications for alternator signatures. It also comes with more test cases and docs updates. Tests: alternator(local, remote), manual Piotr Sarna (11): alternator: add extracting key from system_auth.roles alternator: futurize verify_signature function alternator: move the api handler to a separate function alternator: use keys from system_auth.roles for authorization alternator: add key cache to authorization alternator-test: add a wrong password test alternator: verify that the signature has not expired alternator: add additional datestamp verification alternator-test: add tests for expired signatures docs: update alternator entry for authorization alternator-test: add authorization to README alternator-test/conftest.py | 2 +- alternator-test/test_authorization.py | 44 ++++++++- alternator-test/test_describe_endpoints.py | 2 +- alternator/auth.hh | 15 ++- alternator/server.hh | 10 +- alternator/auth.cc | 62 +++++++++++- alternator/server.cc | 106 ++++++++++++--------- alternator-test/README.md | 28 ++++++ docs/alternator/alternator.md | 7 +- 9 files changed, 221 insertions(+), 55 deletions(-)	2019-10-23 20:51:08 +03:00
Tomasz Grabiec	e621db591e	Merge "Fix TTL serialization breakage" from Avi Commit `93270dd` changed gc_clock to be 64-bit, to fix the Y2038 problem. While 64-bit tombstone::deletion_time is serialized in a compatible way, TTLs (gc_clock::duration) were not. This patchset reverts TTL serialization to the 32-bit serialization format, and also allows opting in to the 64-bit format in case a cluster was installed with the broken code. Only Scylla 3.1.0 is vulnerable. Fixes #4855 Tests: unit (dev)	2019-10-23 18:23:26 +02:00
Tomasz Grabiec	71720be4f7	Merge "storage_service: Reject nodetool cleanup when there is pending ranges" from Asias From Shlomi: 4 node cluster Node A, B, C, D (Node A: seed) cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node> cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node> while read is progressing Node D: nodetool decommission Node A: nodetool status node - wait for UL Node A: nodetool cleanup (while decommission progresses) I get the error on c-s once decommission ends java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated The problem is that when a node gets new ranges (e.g., a bootstrapping node, or the existing nodes after a node is removed or decommissioned), nodetool cleanup will remove data within the new ranges that the node has just received from other nodes. To fix this, we reject nodetool cleanup when there are pending ranges on that node. Note that rejecting nodetool cleanup is not full protection, because new ranges can be assigned to the node while cleanup is still in progress. However, rejecting it is a good start until we have a full protection solution. Refs: #5045	2019-10-23 17:45:41 +02:00
Avi Kivity	2970578677	config: add configuration option for 3.1.0 heritage clusters Scylla 3.1.0 broke the serialization format for TTLs. Later versions corrected it, but if a cluster was originally installed as 3.1.0, it will use the broken serialization forever. This configuration option allows upgrades from 3.1.0 to succeed, by enabling the broken format even for later versions.	2019-10-23 18:36:35 +03:00
Avi Kivity	bf4c319399	gc_clock, serialization: define new serialization for gc_clock::duration (aka TTLs) Scylla 3.1.0 inadvertently changed the serialization format of TTLs (internally represented as gc_clock::duration) from 32-bit to 64-bit, as part of preparation for Y2038 (which comes earlier for TTLed cells). This breaks mutations transported in a mixed cluster. To fix this, we revert to the 32-bit format, unless we're in a 3.1.0-heritage cluster, in which case we use the 64-bit format. Overflow of a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer. An assertion is added to verify this. This patch only defines a variable to indicate we're in a 3.1.0 heritage cluster, but a way to set it is left to a later patch.	2019-10-23 18:36:33 +03:00
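A sketch of the format choice, assuming a boolean flag for 3.1.0-heritage clusters and a 20-year cap computed as 20 * 365 days. The function name, the flag, and the little-endian byte order are illustrative assumptions, not Scylla's wire format. The assertion mirrors the one the commit describes: any capped TTL fits in 32 bits.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>

// 20 years in seconds (ignoring leap days), the cap mentioned above.
constexpr std::int64_t max_ttl_seconds = 20LL * 365 * 24 * 60 * 60;

// Hypothetical encoder: 32-bit TTLs by default, 64-bit only for clusters
// originally installed as 3.1.0. Little-endian order is illustrative only.
std::string serialize_ttl(std::chrono::seconds ttl, bool heritage_3_1_0) {
    const std::int64_t v = ttl.count();
    assert(v >= 0 && v <= max_ttl_seconds);  // capped TTLs always fit in int32
    const std::size_t width = heritage_3_1_0 ? 8 : 4;
    std::string out(width, '\0');
    const auto u = static_cast<std::uint64_t>(v);
    for (std::size_t i = 0; i < width; ++i) {
        out[i] = static_cast<char>((u >> (8 * i)) & 0xff);
    }
    return out;
}
```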
Avi Kivity	771e028c1a	Update seastar submodule * seastar 6bcb17c964...2963970f6b (4): > Merge "IPv6 scope support and network interface impl" from Calle > noncopyable_function: do not copy uninitialized data > Merge "Move smp and smp queue out of reactor" from Asias > Consolidate posix socket implementations	2019-10-23 16:43:02 +03:00
Piotr Sarna	472e3cb4e1	alternator-test: add authorization to README The README paragraph informs about turning on authorization with: alternator-enforce-authorization: true and has a short note on how to set up the secret key for tests.	2019-10-23 15:05:39 +02:00
Piotr Sarna	280eb28324	docs: update alternator entry for authorization The document now mentions that secret keys are extracted from the system_auth.roles table.	2019-10-23 15:05:39 +02:00
Piotr Sarna	ebb0af3500	alternator-test: add tests for expired signatures The first test case ensures that expired signatures are not accepted, while the second one checks that signatures with dates that reach out too far into the future are also refused.	2019-10-23 15:05:39 +02:00
Piotr Sarna	a0a33ae4f3	alternator: add additional datestamp verification The authorization signature contains both a full obligatory date header and a shortened datestamp - an additional verification step ensures that the shortened stamp matches the full date.	2019-10-23 15:05:39 +02:00
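The day check can be sketched with the AWS Signature Version 4 formats, where the full `X-Amz-Date` value looks like `20191023T150539Z` and the credential scope carries the short datestamp `20191023`. `datestamp_matches` is a hypothetical helper, not alternator's code.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical check: the 8-character datestamp (YYYYMMDD) must equal the
// date part of the full timestamp (YYYYMMDDTHHMMSSZ).
bool datestamp_matches(std::string_view full_date, std::string_view datestamp) {
    return datestamp.size() == 8 && full_date.size() >= 8 &&
           full_date.substr(0, 8) == datestamp;
}
```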
Piotr Sarna	718cba10a1	alternator: verify that the signature has not expired AWS signatures have a 15-minute expiration policy. For compatibility, the same policy is applied to alternator requests. The policy also ensures that signatures extending more than 15 minutes into the future are treated as unsafe and thus not accepted.	2019-10-23 15:05:39 +02:00
Piotr Sarna	e90c4a8130	alternator-test: add a wrong password test The additional test case submits a request as a user that is expected to exist (in the local setup), but the provided password is incorrect. It also updates test_wrong_key_access so it uses an empty string for trying to authenticate as a nonexistent user, in order to cover more corner cases.	2019-10-23 15:05:39 +02:00
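A minimal sketch of the symmetric 15-minute window, with the current time passed in explicitly so the check is easy to test. `signature_time_ok` is a hypothetical name; how alternator parses the request date is not shown.

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;
using sys_clock = std::chrono::system_clock;

// Hypothetical check: accept a signature only if its timestamp lies within
// 15 minutes of the server's current time, in either direction.
bool signature_time_ok(sys_clock::time_point signed_at, sys_clock::time_point now) {
    constexpr auto window = 15min;
    return signed_at >= now - window && signed_at <= now + window;
}
```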
Piotr Sarna	524b03dea5	alternator: add key cache to authorization In order to avoid fetching keys from system_auth.roles system table on every request, a cache layer is introduced. And in order not to reinvent the wheel, the existing implementation of loading_cache with max size 1024 and a 1 minute timeout is used.	2019-10-23 15:05:39 +02:00
Piotr Sarna	6dee7737d7	alternator: use keys from system_auth.roles for authorization Instead of having a hardcoded secret key, the server now verifies an actual key extracted from system_auth.roles system table. This commit comes with a test update - instead of 'whatever':'whatever', the credentials used for a local run are 'alternator':'secret_pass', which matches the initial contents of system_auth.roles table, which acts as a key store. Fixes #5046	2019-10-23 15:05:39 +02:00
Piotr Sarna	388b492040	alternator: move the api handler to a separate function The lambda used for handling the api request has grown a little bit too large, so it's moved to a separate method. Along with it, the callbacks are now remembered inside the class itself.	2019-10-23 15:05:39 +02:00
Piotr Sarna	a93cf12668	alternator: futurize verify_signature function The verify_signature utility will later be coupled with Scylla authorization. In order to prepare for that, it is first transformed into a function that returns future<>, and it also becomes a member of class server. It becomes a member function because that will make it easier to implement a server-local key cache.	2019-10-23 15:05:39 +02:00
Piotr Sarna	dc310baa2d	alternator: add extracting key from system_auth.roles As a first step towards coupling alternator authorization with Scylla authorization, a helper function for extracting the key (salted_hash) belonging to the user is added.	2019-10-23 15:05:39 +02:00
Asias He	f876580740	storage_service: Reject nodetool cleanup when there is pending ranges From Shlomi: 4 node cluster Node A, B, C, D (Node A: seed) cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node> cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node> while read is progressing Node D: nodetool decommission Node A: nodetool status node - wait for UL Node A: nodetool cleanup (while decommission progresses) I get the error on c-s once decommission ends java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated The problem is that when a node gets new ranges (e.g., a bootstrapping node, or the existing nodes after a node is removed or decommissioned), nodetool cleanup will remove data within the new ranges that the node has just received from other nodes. To fix this, we reject nodetool cleanup when there are pending ranges on that node. Note that rejecting nodetool cleanup is not full protection, because new ranges can be assigned to the node while cleanup is still in progress. However, rejecting it is a good start until we have a full protection solution. Refs: #5045	2019-10-23 19:20:36 +08:00
Asias He	a39c8d0ed0	Revert "storage_service: remove storage_service::_is_bootstrap_mode." It will be needed by "storage_service: Reject nodetool cleanup when there is pending ranges" This reverts commit `dbca327b46`.	2019-10-23 19:20:36 +08:00
Raphael S. Carvalho	fc120a840d	compaction: dont rely on undefined behavior when making garbage collected writer Function argument evaluation order is unspecified, so it's not guaranteed that c->make_garbage_collected_sstable_writer() is called before compaction is moved to run(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>	2019-10-23 11:04:51 +03:00
Benny Halevy	3b3611b57a	mutation_diff: standard input support Also, now that the file name is properly quoted, it may contain space characters. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-23 08:29:58 +03:00
Benny Halevy	6feb4d5207	mutation_diff: accept diff_command option To support using diff tools other than colordiff. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-23 08:29:47 +03:00
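The hazard is general C++: function arguments are evaluated in an unspecified order, so a by-value `std::unique_ptr` parameter may be initialized (moving out of the pointer) before another argument dereferences it. The sketch below uses hypothetical stand-ins (`compaction`, `make_writer`, `run`) to show the usual fix, which is to sequence the call in a separate statement.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the compaction object and its writer factory.
struct compaction {
    std::string name;
    std::string make_writer() const { return "writer-for-" + name; }
};

std::string run(std::unique_ptr<compaction> c, std::string writer) {
    return c->name + ":" + writer;
}

std::string start(std::unique_ptr<compaction> c) {
    // Unsafe: the by-value parameter may be initialized (moving out of `c`)
    // before `c->make_writer()` runs, which would then dereference null:
    //   return run(std::move(c), c->make_writer());
    // Safe: finish the call in its own statement, then move.
    auto writer = c->make_writer();
    return run(std::move(c), std::move(writer));
}
```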
Tomasz Grabiec	dfac542466	Merge "extend multi-cell list & set type support" from Kostja Make it possible to compare multi-cell lists and sets serialized as maps with literal values and serialize them to network using a standard format (vector of values). This is a pre-requisite patch for column condition evaluation in light-weight transactions.	2019-10-23 07:39:57 +03:00
Nadav Har'El	774f8aa4b8	docs/debugging.md: add guide on how to debug cores Merged patch series from Botond Dénes: This series extends the existing docs/debugging.md with a detailed guide on how to debug Scylla coredumps. The intended target audience is developers who are debugging their first core, hence the level of details (hopefully enough). That said this should be just as useful for seasoned debuggers just quickly looking up some snippet they can't remember exactly. A Throubleshooting chapter is also added in this series for commonly-met problems. I decided to create this guide after myself having struggled for more than a day on just opening(!) a coredump that was produced on Ubuntu. As my main source, I used the How-to-debug-a-coredump page from the internal wiki which contains many useful information on debugging coredumps, however I found it to be missing some crucial information, as well being very terse, thus being primarily useful for experienced debuggers who can fill in the blanks. The reason I'm not extending said wiki page is that I think this information should not be hidden in some internal wiki page. Also, docs/debugging.md now seems to be a much better base for such a document. This document was started as a comprehensive debugging manual for beginners (but not just). You will notice that the information on how to debug cores from CentOS/Redhat are quite sparse. This is because I have no experience with such cores, so for now the respective chapters are just stubs. I intend to complete them in the future after having gained the necessary experience and knowledge, however those being in possession of said knowledge are more then welcome to send a patch. 
:) Botond Dénes (4): docs/debugging.md: demote 'Starting GDB' and 'Using GDB' docs/debugging.md: fix formatting issues docs/debugging.md: add 'Debugging coredumps' subchapter docs/debugging.md: add 'Troubleshooting' subchapter docs/debugging.md \| 240 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 228 insertions(+), 12 deletions(-)	2019-10-23 07:39:57 +03:00
Rafael Ávila de Espíndola	b3372be679	install-dependencies: Add Lua Add Lua as a dependency in preparation for UDF. This is the first patch of the series, since it has to go in before the others to allow for a frozen toolchain update. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> [avi: update frozen toolchain image] Message-Id: <20191018231442.11864-2-espindola@scylladb.com>	2019-10-23 07:39:57 +03:00
Konstantin Osipov	a30c08e04e	lwt: support for multi-cell set & list value serialization	2019-10-22 17:40:42 +03:00
Piotr Jastrzebski	eb8ae06ced	cdc: Return db_context::builder by reference from its with_* functions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-22 17:13:43 +03:00
Konstantin Osipov	605755e3f6	lwt: support for multi-cell map & list comparison with literal values Multi-cell lists and maps may be stored in different formats: as sorted vectors of pairs of values, when retrieved from storage, or as sorted vectors of values, when created from parser literals or supplied as parameter values. Implement a specialized compare for use when the receiver and parameter representations don't match. Add helpers.	2019-10-22 17:07:33 +03:00
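For the list case, the mismatch described above can be pictured as comparing a vector of (cell key, value) pairs against a plain vector of values. The sketch below is illustrative only: `stored_list`, `literal_list`, `cell_key` (an integer standing in for the real cell key) and `compare_stored_with_literal` are hypothetical names, not Scylla's API. It ignores the cell keys and compares the values lexicographically.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types: a stored multi-cell list is modeled as sorted
// (cell key, value) pairs; a literal list from the parser is just the values.
using cell_key = std::int64_t;
using stored_list = std::vector<std::pair<cell_key, std::string>>;
using literal_list = std::vector<std::string>;

// Three-way comparison (<0, 0, >0) that skips the cell keys of the stored
// representation and compares only the values, element by element.
int compare_stored_with_literal(const stored_list& stored, const literal_list& literal) {
    auto it1 = stored.begin();
    auto it2 = literal.begin();
    for (; it1 != stored.end() && it2 != literal.end(); ++it1, ++it2) {
        int c = it1->second.compare(*it2);
        if (c != 0) {
            return c < 0 ? -1 : 1;
        }
    }
    // All common elements are equal: the shorter list sorts first.
    if (it1 == stored.end() && it2 == literal.end()) {
        return 0;
    }
    return it1 == stored.end() ? -1 : 1;
}
```

Dropping the keys is what makes the two representations comparable: the key only records the cell's position in storage, while the literal carries the position implicitly.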
Raphael S. Carvalho	3b6583990d	sstables: Fix sluggish backlog controller with incremental compaction The problem is that the backlog tracker is not being updated properly after incremental compaction. When replacing sstables early, we tell the backlog tracker that we're done with the exhausted sstables[1], but we don't tell it about the new, sealed sstables that replace the exhausted ones. [1]: an exhausted sstable is one that can be replaced early by compaction. We need to notify the backlog tracker about every sstable replacement triggered by incremental compaction. Otherwise, the backlog for a table that enables incremental compaction will be lower than it should be. That's because new sstables tracked as partial decrease the backlog, whereas the exhausted ones increase it. The formula for a table's backlog is basically: backlog(sstable set + compacting(1) - partial(2)) (1) compacting includes all of the compaction's input sstables, but the exhausted ones are removed from it (correct behavior). (2) partial includes all of the compaction's output sstables, but the ones that replaced the exhausted sstables aren't removed from it (incorrect behavior). This problem is fixed by making the backlog tracker fully aware of the early replacement: not only of the exhausted sstables, but also of the new sstables that replaced them. The new sstables need to be moved inside the tracker from the partial state to the regular one. Fixes #5157. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>	2019-10-22 16:19:57 +03:00
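The bookkeeping behind backlog(sstable set + compacting - partial) can be sketched with a toy tracker. This is only an illustration: `toy_backlog_tracker` and `replace_early` are hypothetical names, and backlog() is modeled as a plain byte sum. Real trackers use strategy-specific formulas, so only the set membership logic is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model: sstable name -> size in bytes.
struct toy_backlog_tracker {
    std::map<std::string, std::int64_t> live;       // the table's sstable set
    std::map<std::string, std::int64_t> compacting; // compaction inputs not yet exhausted
    std::map<std::string, std::int64_t> partial;    // compaction outputs being written

    static std::int64_t sum(const std::map<std::string, std::int64_t>& m) {
        std::int64_t s = 0;
        for (const auto& entry : m) { s += entry.second; }
        return s;
    }

    // Simplified stand-in for backlog(sstable set + compacting - partial).
    std::int64_t backlog() const { return sum(live) + sum(compacting) - sum(partial); }

    // Early replacement of `exhausted` by the sealed output `sealed`.
    // fixed == false reports only the exhausted input (old behavior);
    // fixed == true also moves the sealed output out of the partial state.
    void replace_early(const std::string& exhausted, const std::string& sealed,
                       std::int64_t sealed_size, bool fixed) {
        live.erase(exhausted);
        compacting.erase(exhausted);
        live[sealed] = sealed_size;
        if (fixed) { partial.erase(sealed); }
    }
};
```

In this model, replacing a 100-byte input with a 100-byte output leaves the backlog unchanged only when the output leaves `partial`. The old behavior keeps subtracting it, which underestimates the backlog as the commit describes.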
Vladimir Davydov	6c6689f779	cql: refactor statement accounting Rather than passing a pointer to a cql_stats member corresponding to the statement type, pass a reference to a cql_stats object and use statement_type, which is already stored in modification_statement, for determining which counter to increment. This will allow us to account conditional statements, which will have a separate set of counters, right in modification_statement::execute() - all we'll need to do is add the new counters and bump them in case execute_with_condition is called. While we are at it, remove extra inclusions from statement_type.hh so as not to introduce any extra dependencies for cql_stats.hh users. Message-Id: <20191022092258.GC21588@esperanza>	2019-10-22 12:39:14 +03:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlining it into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commit message. Tests: unit (debug) Avi Kivity (4): managed_ref: add get() accessor managed_ref: add external_memory_usage() mutation_partition: introduce lazy_row mutation_partition: make static_row optional to reduce memory footprint cell_locking.hh \| 2 +- converting_mutation_partition_applier.hh \| 4 +- mutation_partition.hh \| 284 ++++++++++++++++++++++- partition_builder.hh \| 4 +- utils/managed_ref.hh \| 12 + flat_mutation_reader.cc \| 2 +- memtable.cc \| 2 +- mutation_partition.cc \| 45 +++- mutation_partition_serializer.cc \| 2 +- partition_version.cc \| 4 +- tests/multishard_mutation_query_test.cc \| 2 +- tests/mutation_source_test.cc \| 2 +- tests/mutation_test.cc \| 12 +- tests/sstable_mutation_test.cc \| 10 +- 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	bc03b0fd47	Merge "Some refactoring of node startup code" from Kamil " The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand. This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does. I did it to help me prepare for implementing generation and gossiping of CDC streams. " * 'bootstrap-refactors' of https://github.com/kbr-/scylla: storage_service: more comments in join_token_ring db: remove system_keyspace::update_local_tokens db: improve documentation for update_tokens and get_saved_tokens in system_keyspace storage_service: remove storage_service::_is_bootstrap_mode. storage_service: simplify storage_service::bootstrap method storage_service: fix typo in handle_state_moving storage_service: remove unnecessary use of stringstream storage_service: remove redundant call to update_tokens during join_token_ring storage_service: remove storage_service::set_tokens method. storage_service: remove is_survey_mode storage_service::handle_state_normal: tokens_to_update* -> owned_tokens storage_service::handle_state_normal: remove local_tokens_to_remove db::system_keyspace::update_tokens: take tokens by const ref db::system_keyspace::prepare_tokens: make static, take tokens by const ref token_metadata::update_normal_tokens: take tokens by const ref	2019-10-22 12:11:11 +03:00
Asias He	0a52ecb6df	gossip: Fix max generation drift measure Assume nodes n1 and n2 are in a cluster with generation numbers g1 and g2. The cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1 reboots with generation g1', which is time based, n2 will see g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update. To fix this, check the generation drift against the generation value this node would get if it were restarted. This is a backport of CASSANDRA-10969. Fixes #5164	2019-10-21 20:20:55 +02:00
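A simplified sketch of the two checks, assuming (as the message states) that generations are time based, i.e. seconds since the epoch at startup. Then "the generation this node would get if it were restarted" is just the current time. The function names are hypothetical; only the one-year constant mirrors MAX_GENERATION_DIFFERENCE.

```cpp
#include <cassert>
#include <cstdint>

// One year in seconds, standing in for MAX_GENERATION_DIFFERENCE.
constexpr std::int64_t max_generation_difference = 86400LL * 365;

// Old behavior (simplified): compare the incoming generation with a
// generation recorded when the cluster started. After a year of uptime this
// reference is stale, so a legitimately restarted peer looks like it drifted.
bool rejected_by_old_check(std::int64_t reference_generation, std::int64_t remote_generation) {
    return remote_generation > reference_generation + max_generation_difference;
}

// New behavior (simplified): compare with the generation this node would get
// if it restarted now, which for time-based generations is the current time.
bool rejected_by_new_check(std::int64_t now_seconds, std::int64_t remote_generation) {
    return remote_generation > now_seconds + max_generation_difference;
}
```

With a cluster that started 400 days ago, a peer restarting now gets a generation equal to the current time. The old check rejects it, while the new one accepts it and still rejects generations that lie far in the future.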
Kamil Braun	f1c26bf5c9	storage_service: more comments in join_token_ring Explain why a call to update_normal_tokens is needed.	2019-10-21 11:11:03 +02:00
Kamil Braun	fb1e35f032	db: remove system_keyspace::update_local_tokens That was dead code.	2019-10-21 11:11:03 +02:00
Kamil Braun	1b0c8e5d99	db: improve documentation for update_tokens and get_saved_tokens in system_keyspace	2019-10-21 11:11:03 +02:00
Kamil Braun	dbca327b46	storage_service: remove storage_service::_is_bootstrap_mode. The flag did nothing. It was used in one place to check if there's a bug, but it can easily be proven by reading the code that the check would never pass.	2019-10-21 11:11:03 +02:00
Kamil Braun	b757a19f84	storage_service: simplify storage_service::bootstrap method The storage_service::bootstrap method took a parameter: tokens to bootstrap with. However, this method is only called in one place (join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't make sense to call this method anywhere else with any other parameter. This commit also adds a comment explaining what the method does and moves it into the private section of storage_service.	2019-10-21 11:11:03 +02:00
Kamil Braun	84b41bd89b	storage_service: fix typo in handle_state_moving	2019-10-21 11:11:03 +02:00
Kamil Braun	2ff4f9b8f4	storage_service: remove unnecessary use of stringstream	2019-10-21 11:11:03 +02:00
Kamil Braun	06cc7d409d	storage_service: remove redundant call to update_tokens during join_token_ring When a non-seed node was bootstrapping, system_keyspace::update_tokens was called twice: first right after the tokens were generated (or received if we were replacing a different node) in the call to `bootstrap`, and then later in join_token_ring. The second call was redundant. The join_token_ring call was also redundant if we were not bootstrapping and had tokens saved previously (e.g. when restarting). In that case we would have read them from LOCAL and then saved the same tokens again. This commit removes the redundant call and inserts calls to update_tokens where they are necessary, when new tokens are generated. The aim is to make the code easier to understand. It also adds a comment which explains why the tokens don't need to be generated in one of the cases.	2019-10-21 11:11:03 +02:00
Kamil Braun	a223864f81	storage_service: remove storage_service::set_tokens method. After commit `36ccf72f3c`, this method was used only in one place. Its name did not make it obvious what it does and when it is safe to call it. This commit pulls out the code from set_tokens to the point where it was called (join_token_ring). The code is only possible to understand in context. This code was also saving the tokens to the LOCAL table before retrieving them from this table again. There is no point in doing that: 1. there are no races, since when join_token_ring is running, it is the only function which can call system_keyspace::update_tokens (which saves them to the LOCAL table). There cannot be multiple instances of join_token_ring. 2. Even if there were a race, this wouldn't fix anything. The tokens we retrieve from LOCAL by calling get_local_tokens().get0() could already be different in the LOCAL table when the get0() returns.	2019-10-21 11:09:59 +02:00
Kamil Braun	36ccf72f3c	storage_service: remove is_survey_mode That was dead, untested code, making it unnecessarily hard to implement new features.	2019-10-21 10:38:49 +02:00
Kamil Braun	602c7268cc	storage_service::handle_state_normal: tokens_to_update* -> owned_tokens Replace the two variables tokens_to_update_in_metadata and tokens_to_update_in_system_keyspace, which were exactly the same, with one variable, owned_tokens. The new name describes what the variable IS instead of what it's used for. Add a comment to clarify what "owned" means: those are the tokens the node chose, with any collision resolved in this node's favor. Move the variable definition further down in the code, where it's actually needed.	2019-10-21 10:38:49 +02:00
Kamil Braun	2db07c697f	storage_service::handle_state_normal: remove local_tokens_to_remove That was dead code. Removing tokens is handled inside remove_endpoint, using the endpoints_to_remove set.	2019-10-21 10:38:49 +02:00
Kamil Braun	8c8a17a0fe	db::system_keyspace::update_tokens: take tokens by const ref	2019-10-21 10:38:49 +02:00
Kamil Braun	00dcea3478	db::system_keyspace::prepare_tokens: make static, take tokens by const ref	2019-10-21 10:38:49 +02:00
Kamil Braun	e4ac4db1c5	token_metadata::update_normal_tokens: take tokens by const ref	2019-10-21 10:38:45 +02:00
Nadav Har'El	765dc86de4	Fix legacy token column handling for local indexes Merged patch series from Piotr Sarna: Calculating the select statement for given view_info structure used to work fine, but once local indexes were introduced, a subtle bug appeared: the legacy token column does not exist in local indexes and a valid clustering key column was omitted instead. That results in potentially incorrect partition slices being used later in read-before-write. There's a long term plan for removing select_statement from view info altogether, but nonetheless the bug needs to be fixed first. Branch: master, 3.1 Tests: unit(dev) + manual confirmation that a correct legacy column is picked	2019-10-20 16:04:40 +03:00
Nadav Har'El	631846a852	CDC: Implement minimal version that logs only primary key of each change Merge a patch series from Piotr Jastrzębski (haaawk): This PR introduces CDC in its minimal version. It is now possible to create a table with CDC enabled or to enable/disable CDC on an existing table. The CDC log and description are set up or removed when CDC is enabled or disabled for a table. For now, only the primary key of the changed data is logged. To be able to co-locate CDC streams with the related base table partitions, the number of shards per node had to be propagated to the other nodes. This was done through gossip. There is an assumption that all the nodes use the same value for sharding_ignore_msb_bits. If it does not hold, we would have to gossip sharding_ignore_msb_bits around together with the number of shards. Fixes #4986. Tests: unit(dev, release, debug)	2019-10-20 11:41:01 +03:00
Botond Dénes	4aa734f238	scylla-gdb.py: scylla generate_object_graph: use correct obj in edges Currently, the function that generates the graph edges (and vertices) with a breadth-first traversal of the object graph accidentally uses the object that is the starting point of the graph as the "to" part of each edge. As a result, every edge of the graph points to the starting point, as if all objects in it referenced said object directly. Fix by using the currently examined object instead. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>	2019-10-18 13:48:20 +02:00
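The fix can be illustrated with a breadth-first walk over referrers. This C++ sketch only mirrors the idea; the real command is Python in scylla-gdb.py. `referrer_map` and `generate_edges` are hypothetical names, and the referrer information is supplied as input here instead of being found by searching memory. Each edge goes from a referrer to the object currently being examined. The bug corresponds to emitting `(referrer, start)` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using address = std::uintptr_t;
// Hypothetical input: for each object, the objects holding a reference to it.
using referrer_map = std::map<address, std::vector<address>>;
using edge = std::pair<address, address>; // (referrer, referee)

// Breadth-first walk from `start` towards its referrers, up to `max_depth` levels.
std::set<edge> generate_edges(const referrer_map& referrers, address start, unsigned max_depth) {
    std::set<edge> edges;
    std::set<address> visited{start};
    std::queue<std::pair<address, unsigned>> todo;
    todo.push({start, 0});
    while (!todo.empty()) {
        auto [current, depth] = todo.front();
        todo.pop();
        if (depth == max_depth) {
            continue;
        }
        auto it = referrers.find(current);
        if (it == referrers.end()) {
            continue;
        }
        for (address referrer : it->second) {
            edges.insert({referrer, current}); // point at the examined object, not `start`
            if (visited.insert(referrer).second) {
                todo.push({referrer, depth + 1});
            }
        }
    }
    return edges;
}
```

For a chain where object 3 references 2 and 2 references 1, starting from 1 yields the edges 2→1 and 3→2. The buggy version would produce 2→1 and 3→1 instead.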
Botond Dénes	4dff50b7a4	docs/debugging.md: add 'Troubleshooting' subchapter To the 'Debugging Scylla with GDB' chapter.	2019-10-18 10:08:23 +03:00
Botond Dénes	77ea086975	docs/debugging.md: add 'Debugging coredumps' subchapter To the 'Debugging Scylla with GDB' chapter. The '### Debugging relocatable binaries built with the toolchain' subchapter is demoted to be just a section in this new subchapter. It is also renamed to 'Relocatable binaries'. This subchapter is intended to be a complete guide to debugging coredumps, from obtaining the correct version of all the binaries all the way to correctly opening the core with GDB.	2019-10-18 10:08:23 +03:00
Pekka Enberg	f01d0e011c	Update seastar submodule * seastar e888b1df...6bcb17c9 (4): > iotune: don't crash in sequential read test if hitting EOF > Remove FindBoost.cmake from install files > Merge "Move reactor backend out of reactor" from Asias > fair_queue: Add fair_queue.cc	2019-10-18 08:45:22 +03:00
Piotr Jastrzebski	2b26e3c904	test: change test_partition_key_logging to test_primary_key_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	997be35ef3	modification_statement: log in cdc clustering key of a change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	d8718a4ffc	test: add test_partition_key_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	96c800ed0b	modification_statement: log in cdc partition key of a change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	a1edb68b16	test: check that alter table with cdc manages log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	a45c894032	alter_table_statement: handle 'with cdc =' Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	629cdb5065	test: check that drop table with cdc removes log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	57c3377b1f	cql_test_env: add require_table_does_not_exist assertion Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	50d53cd43e	drop_table_statement: remove cdc log and desc if cdc is enabled Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	b9d6635fc5	test: check that create table with cdc sets up log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	81a34168a3	create_table_statement: handle 'with cdc =' Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:14 +02:00
Piotr Jastrzebski	6e29f5e826	create_table_statement: prepare announce_migration for cdc This patch wraps announce_migration logic into a lambda that will be used both when cdc is used and when it's not. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	a9e43f4e86	test: add test_with_cdc_parameter At the moment, this test only checks that table creation and alteration sets cdc_options property on a table correctly. Future patches will extend this test to cover more CDC aspects. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	8c6d860402	cql3: add cdc table property Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	386221da84	schema_tables: handle 'cdc' options cdc options will be stored in scylla_tables to preserve compatibility with Cassandra. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	8df942a320	schema_builder: handle schema::_cdc_options Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	ca9536a771	schema: add _cdc_options field Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	f079dce7b1	snitch: Provide getter for ignore_msb_bits of an endpoint Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	afe520ad77	gossip: Add application_state::IGNORE_MSB_BITS We would like to share with other nodes the value of ignore_msb_bits property used by the node. This is needed because CDC will operate on streams of changes. Each shard on each node will have its own stream that will be identified by a stream_id. Stream_id will be selected in such a way that using stream_id as partition key will locate partition identified by stream_id on a node and shard that the stream belongs to. To be able to generate such stream_id we need to know ignore_msb_bits property value for each node. IMPORTANT NOTE: At this point CDC does not support topology changes. It will work only on a stable cluster. Support for topology modifications will be added in later steps. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
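To see why both IGNORE_MSB_BITS and SHARD_COUNT are needed to pick a stream_id that lands on a given shard, consider a multiply-shift token-to-shard mapping. This is an assumption made for the sketch, not a quote of Scylla's partitioner. `shard_of` is a hypothetical name, the token is taken in its unsigned ("biased") 64-bit form, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scheme: drop the `ignore_msb_bits` most significant bits of the
// token, then scale the remaining value into [0, shard_count) by taking the
// high 64 bits of the 128-bit product. Both per-node parameters affect the
// result, so a node generating stream ids for another node must know both.
unsigned shard_of(std::uint64_t biased_token, unsigned shard_count, unsigned ignore_msb_bits) {
    std::uint64_t shifted = ignore_msb_bits < 64 ? biased_token << ignore_msb_bits : 0;
    unsigned __int128 product = static_cast<unsigned __int128>(shifted) * shard_count;
    return static_cast<unsigned>(product >> 64);
}
```

For the same token, changing either the shard count or `ignore_msb_bits` can change the resulting shard. For example, token `1 << 63` maps to shard 2 of 4 with no ignored bits, but to shard 0 of 4 when one bit is ignored.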
Piotr Jastrzebski	b9d5851830	snitch: Provide getter for shard_count of an endpoint Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	a66d7cfe57	gossip: Add application_state::SHARD_COUNT We would like to share with other nodes the number of shards available at the node. This is needed because CDC will operate on streams of changes. Each shard on each node will have its own stream that will be identified by a stream_id. Stream_id will be selected in such a way that using stream_id as partition key will locate partition identified by stream_id on a node and shard that the stream belongs to. To be able to generate such stream_id we need to know how many shards are on each node. IMPORTANT NOTE: At this point CDC does not support topology changes. It will work only on a stable cluster. Support for topology modifications will be added in later steps. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	f7ce8e4f2b	cdc: Add flag guarding its usage At first, CDC will only be enabled when the experimental flag is on. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Tomasz Grabiec	d7c3e48e8c	Merge "Prepare modification_statement for LWT" from Kostja Refactor modification_statement to enable lightweight transaction implementation. This patch set re-arranges logic of modification_statement::get_mutations() and uses a column mask to identify the columns to prefetch. It also pre-computes a few modification statement properties at prepare, assuming the prepared statement is invalidated if the underlying schema changes.	2019-10-17 10:51:00 +02:00
Konstantin Osipov	5d3bf03811	lwt: pre-compute modification_statement properties at prepare They are used more extensively with introduction of lightweight transactions, and pre-computing makes it easier to reason about complexity of the scenarios where they are involved.	2019-10-16 22:44:44 +03:00
Konstantin Osipov	6e0f76ea60	lwt: use column mask to build partition_slice Pre-compute column mask of columns to prefetch when preparing a modification statement and use it to build partition_slice object for read command. Fetch only the required columns. Lightweight transactions build on this by adding the columns used in conditions and in the CAS result set to the column mask of columns to read. Batch statements unite all column masks to build a single relation for all rows modified by conditional statements of a batch.	2019-10-16 22:44:37 +03:00
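A minimal sketch of the column-mask idea, assuming a `std::bitset` indexed by each column's ordinal id. `column_mask`, `max_columns`, `make_mask`, `unite` and `columns_to_read` are hypothetical names, and the 64-column cap exists only to keep the example fixed-size. Each statement contributes a mask, and a batch ORs them together so one read fetches every column that any of its statements needs.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cap on the number of columns, chosen only for this sketch.
constexpr std::size_t max_columns = 64;
using column_mask = std::bitset<max_columns>;

// Build a mask from the ordinal ids of the columns a statement needs.
column_mask make_mask(const std::vector<std::size_t>& ordinal_ids) {
    column_mask m;
    for (auto id : ordinal_ids) { m.set(id); }
    return m;
}

// A batch unites the masks of its conditional statements.
column_mask unite(const std::vector<column_mask>& masks) {
    column_mask all;
    for (const auto& m : masks) { all |= m; }
    return all;
}

// Turn the mask back into the list of column ids to request in the read.
std::vector<std::size_t> columns_to_read(const column_mask& m) {
    std::vector<std::size_t> ids;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m.test(i)) { ids.push_back(i); }
    }
    return ids;
}
```

Two statements needing columns {2, 5} and {5, 7} unite into a single read of columns 2, 5 and 7. The shared column is fetched only once.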
Konstantin Osipov	f32a7a0763	lwt: move option set for modification statement read command Move the option set for read command to update_parameters class, since this class encapsulates the logic of working with the read command result.	2019-10-16 22:41:00 +03:00
Konstantin Osipov	c0f0ab5edd	lwt: introduce column mask Introduce a bitset container which can be used to compute all columns used in a query. Add a partition_slice constructor which uses the bitset.	2019-10-16 22:40:55 +03:00
Konstantin Osipov	a00b9a92b3	lwt: refactor modification statement get_mutations() Refactor get_mutations() so that the read command and apply_updates() functions can be used in lightweight transactions. Move read_command creation to its own method, as well as apply_updates(). Rewrite get_mutations() using the new API. Avoid unnecessary shared pointers.	2019-10-16 22:32:51 +03:00
Tomasz Grabiec	7b7e4be049	Merge "lwt: introduce column_definition::ordinal_id" from Kostja Introduce a column definition ordinal_id and use it in boosted update_parameters::prefetch_data as a column index of a full row. Lightweight transactions prefetch data and return a result set. Make sure update_parameters::prefetch_data can serve as a single representation of prefetched list cells as well as condition cells and as a CAS result set. I have a lot of plans for column_definition::ordinal_id, it simplifies a lot of operations with columns and will also be used for building a bitset of columns used in a query or in multiple queries of a batch.	2019-10-16 15:11:10 +02:00
Konstantin Osipov	a2b629c3a1	lwt: boost update_parameters to serve as a CAS result set In modification_statement/batch_statement, we need to prefetch data to 1) apply list operations 2) evaluate CAS conditions 3) return CAS result set. Boost update_parameters::prefetch_data to serve as a single result set for all of the above. In case of a batch, store multiple rows for multiple clustering keys involved in the batch. Use an ordered set for columns and rows to make sure 3) CAS result set is returned to the client in an ordered manner. Deserialize the primary key and add it to result set rows since it is returned to the client as part of CAS result set. Index columns using ordinal_id - this allows having a single set for all columns and makes columns easy to look up. Remove an extra memcpy to build view objects when looking up a cell by primary key, use partition_key/clustering_key objects for lookup.	2019-10-16 15:56:50 +03:00
Konstantin Osipov	a450c25946	lwt: remove dead code in cql3/update_parameters.hh	2019-10-16 15:48:40 +03:00
Konstantin Osipov	a4ccbece5c	lwt: remove an unnecessary optional around prefetch_data Get rid of an unnecessary optional around update_parameters::prefetch_data. update_parameters won't own prefetch_data in the future anyway, since prefetch_data can be shared among multiple modification statements of a batch, each statement having its own options and hence its own update_parameters instance.	2019-10-16 15:48:25 +03:00
Konstantin Osipov	7a399ebe0d	lwt: move prefetch_data_builder to update_parameters.cc Move prefetch_data_builder class from modification_statement.cc to update_parameters.cc. We're going to share the same builder to build a result set for condition evaluation and to apply updates of batch statements, so we need to share it. No other changes.	2019-10-16 15:48:08 +03:00
Konstantin Osipov	fa73421198	lwt: introduce column_definition::ordinal_id Make sure every column in the schema, be it a column of partition key, clustering key, static or regular one, has a unique ordinal identifier. This makes it easy to compute the set of columns used in a query, as well as index row cells. Allow getting a column definition from the schema by ordinal id.	2019-10-16 15:46:25 +03:00
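The sketch below shows what a unique, dense ordinal per column across all kinds could look like. `column_def`, `assign_ordinals` and `column_at` are hypothetical, trimmed-down stand-ins. The assignment order (partition key, clustering key, static, regular) is an assumption made for the example. What matters is that each column gets a distinct index that can address a bitset or a row's cells.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

enum class column_kind { partition_key, clustering_key, static_column, regular_column };

// Hypothetical, trimmed-down column definition.
struct column_def {
    std::string name;
    column_kind kind;
    std::size_t ordinal_id = 0;
};

// Assign one ordinal per column across all kinds, kind by kind (assumed order).
std::vector<column_def> assign_ordinals(std::vector<column_def> cols) {
    std::size_t next = 0;
    for (auto kind : {column_kind::partition_key, column_kind::clustering_key,
                      column_kind::static_column, column_kind::regular_column}) {
        for (auto& c : cols) {
            if (c.kind == kind) { c.ordinal_id = next++; }
        }
    }
    return cols;
}

// Look a column up by ordinal id. A linear search keeps the sketch short;
// with dense ids a real implementation could index an array directly.
const column_def* column_at(const std::vector<column_def>& cols, std::size_t id) {
    for (const auto& c : cols) {
        if (c.ordinal_id == id) { return &c; }
    }
    return nullptr;
}
```

Because the ids are dense and cover every kind, a single bitset or vector of size "number of columns" can describe any subset of the schema's columns.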
Avi Kivity	543e6974b9	Merge "Fix Incremental Compaction Efficiency" from Raphael " Incremental compaction code to release exhausted sstables was inefficient because it was basically preventing any release from ever happening. So a new solution is implemented to make the incremental compaction approach actually efficient while being cautious about not introducing data resurrection. This solution consists of storing GC'able tombstones in a temporary sstable and keeping it till the end of compaction. Overhead is avoided by not enabling it for strategies that don't work with runs composed of multiple fragments. Fixes #4531. tests: unit, longevity 1TB for incremental compaction " * 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla: tests: Check that partition is not resurrected on compaction failure tests: Add sstable compaction test for gc-only mutation compactor consumer sstables: Fix Incremental Compaction Efficiency	2019-10-16 15:15:53 +03:00
Tomasz Grabiec	054b53ac06	Merge "Introduce scylla generate_object_graph and improve scylla find and scylla fiber" from Botond Introduce `scylla generate_object_graph`, a command which generates a visual object graph, where vertices are objects and edges are references. The graph starts from the object specified by the user. The graph allows visual inspection of the object graph and hopefully allows the user to identify the object in question. Add the `--resolve` flag to `scylla find`. When specified, `scylla find` will attempt to resolve the first pointer in the found objects as a vtable pointer. If successful the pointer as well as the resolved symbol will be added to the listing. In the listing of `scylla fiber` also print the starting task (as the first item).	2019-10-15 20:11:16 +02:00
Tomasz Grabiec	c76f905497	Merge "scylla-gdb.py: improve the toolbox for investigating OOMs (but not just)" from Botond This mini-series contains assorted improvements that I found very useful while debugging OOM crashes in the past weeks: * A wrapper for `std::list`. * A wrapper for `std::variant`. * Making `scylla find` usable from python code. * Improvements to `scylla sstables` and `scylla task_histogram` commands. * The `$downcast_vptr()` convenience function. * The `$dereference_lw_shared_ptr()` convenience function. Convenience functions in gdb are similar to commands, with some key differences: * They have a defined argument list. * They can return values. * They can be part of any gdb expression in which functions are allowed. This makes them very useful for doing operations on values and then returning them, so that the developer can use them in the gdb shell.	2019-10-15 19:54:09 +02:00
Avi Kivity	acc433b286	mutation_partition: make static_row optional to reduce memory footprint The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by using lazy_row instead of row. Some call sites treewide were adjusted to deal with the extra indirection. perf_simple_query appears to improve by 2%, from 163 krps to 165 krps, though it's hard to be sure due to noisy measurements. memory_footprint comparisons (before → after): mutation footprint: in cache: 1096 → 992; in memtable: 854 → 750; in sstable: 351 → 351; frozen: 540 → 540; canonical: 827 → 827; query result: 342 → 342. sizeof(cache_entry) = 112 → 112; -- sizeof(decorated_key) = 36 → 36; -- sizeof(cache_link_type) = 32 → 32; -- sizeof(mutation_partition) = 200 → 96; -- -- sizeof(_static_row) = 112 → 8; -- -- sizeof(_rows) = 24 → 24; -- -- sizeof(_row_tombstones) = 40 → 40; sizeof(rows_entry) = 232 → 232; sizeof(lru_link_type) = 16 → 16; sizeof(deletable_row) = 168 → 168; sizeof(row) = 112 → 112; sizeof(atomic_cell_or_collection) = 8 → 8. Tests: unit (dev)	2019-10-15 15:42:05 +03:00
Avi Kivity	88613e6882	mutation_partition: introduce lazy_row lazy_row adds indirection to the row class, in order to reduce storage requirements when the row is not present. The intent is to use it for the static row, which is not present in many schemas, and is often not present in writes even in schemas that have a static row. Indirection is done using managed_ref, which is lsa-compatible. lazy_row implements most of row's methods, and a few more: - get(), get_existing(), and maybe_create(): bypass the abstraction and the underlying row - some methods that accept a row parameter also have an overload with a lazy_row parameter	2019-10-15 15:42:05 +03:00
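The lazy_row idea can be sketched with `std::unique_ptr` standing in for the LSA-compatible managed_ref. This is a simplification, since unique_ptr is not LSA-aware. `row` is a hypothetical map of column id to value. The semantics given here to `get()`, `get_existing()` and `maybe_create()` are our reading of their names, not the actual implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a row: column id -> cell value.
using row = std::map<int, std::string>;

// An absent row costs a single pointer; storage is allocated on first write.
class lazy_row {
    std::unique_ptr<row> _row;
public:
    bool empty() const { return !_row || _row->empty(); }
    // Read access that works whether or not the row was allocated.
    const row& get() const {
        static const row empty_row;
        return _row ? *_row : empty_row;
    }
    // Caller guarantees the row has already been created.
    row& get_existing() { return *_row; }
    // Allocate the underlying row if needed and return it.
    row& maybe_create() {
        if (!_row) { _row = std::make_unique<row>(); }
        return *_row;
    }
    void apply(int column, std::string value) { maybe_create()[column] = std::move(value); }
};
```

The trade-off matches the numbers reported in the merge: an absent static row shrinks to the size of a pointer (`sizeof(_static_row)` went from 112 to 8 bytes), at the cost of an extra allocation and indirection when the row is present.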
Avi Kivity	efe8fa6105	managed_ref: add external_memory_usage() Like other managed containers, add external_memory_usage() so we can account for a partition's memory footprint in memtable/cache.	2019-10-15 15:41:42 +03:00
Botond Dénes	71923577a4	docs/debugging.md: fix formatting issues	2019-10-15 14:40:24 +03:00
Botond Dénes	4babd116d8	docs/debugging.md: demote 'Starting GDB' and 'Using GDB' They really belong to the 'Introduction' chapter, instead of being separate chapters of their own.	2019-10-15 14:40:20 +03:00
Pekka Enberg	0c1dad0838	Merge "Misc documentation cleanup" from Botond "Delete README-DPDK.md, move IDL.md to docs/ and fix docs/review-checklist.md to point to scylla's coding style document, instead of seastar's." * 'documentation-cleanup/v3' of https://github.com/denesb/scylla: docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's docs: mv coding-style.md docs/ rm README-DPDK.md docs: mv IDL.md docs/	2019-10-15 12:53:49 +02:00
Pekka Enberg	b466d7ee33	Merge "Misc documentation cleanup" from Botond "Delete README-DPDK.md, move IDL.md to docs/ and fix docs/review-checklist.md to point to scylla's coding style document, instead of seastar's." * 'documentation-cleanup/v3' of https://github.com/denesb/scylla: docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's docs: mv coding-style.md docs/ rm README-DPDK.md docs: mv IDL.md docs/	2019-10-15 08:53:22 +03:00
Benny Halevy	fef3342a34	test: random_schema::make_ckeys: fix infinite loop Allow returning fewer random clustering keys than requested since the schema may limit the total number we can generate, for example, if there is only one boolean clustering column. Fixes #5161 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-15 08:52:39 +03:00
Botond Dénes	544f38ea6d	docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's	2019-10-15 08:23:08 +03:00
Botond Dénes	56df6fbd58	docs: mv coding-style.md docs/ It is not discoverable in its current location (root directory) due to the sheer number of source files in there.	2019-10-15 08:23:08 +03:00
Botond Dénes	c0706e52ce	rm README-DPDK.md Probably a leftover from the era when seastar and scylla shared the same git repo.	2019-10-15 08:23:01 +03:00
Botond Dénes	061ac53332	docs: mv IDL.md docs/ Documentation should be in docs/.	2019-10-15 08:21:09 +03:00
Piotr Sarna	9e98b51aaa	view: fix view_info select statement for local indexes Calculating the select statement for given view_info structure used to work fine, but once local indexes were introduced, a subtle bug appeared: the legacy token column does not exist in local indexes and a valid clustering key column was omitted instead. That results in potentially incorrect partition slices being used later in read-before-write. There's a long term plan for removing select_statement from view info altogether, but nonetheless the bug needs to be fixed first.	2019-10-14 17:14:19 +02:00
Piotr Sarna	2ee8c6f595	index: add is_global_index() utility The helper function is useful for determining whether a given schema represents a global index.	2019-10-14 17:13:32 +02:00
Botond Dénes	b2e10a3f2f	scylla-gdb.py: introduce scylla generate_object_graph When investigating OOMs, a prominent pattern is a size class that has exploded, using up most of the available memory on its own. If one is lucky, the objects causing the OOM are instances of some virtual class, making their identification easy. Other times the objects are referenced by instances of some virtual class, allowing their identification with some work. However, there are cases where neither these objects nor their direct referrers are instances of virtual classes. This is the case `scylla generate_object_graph` intends to help with. `scylla generate_object_graph`, as its name suggests, generates the object graph of the requested object. The object graph is a directed graph, where vertices are objects and edges are references between them, going from referrers to the referee. The vertices contain information like the address of the object, its size, whether it is live or not and, if applicable, the address and symbol name of its vtable. The edges contain the list of offsets at which the referrer holds references. The generated graph is an image, which allows visual inspection of the object graph, letting the developer notice patterns and hopefully identify the problematic objects. The graph is generated with the help of `graphviz`. The command generates `.dot` files, which can be converted to images with the `dot` utility. The command does this itself if the output file has one of the supported image formats (e.g. `png`); otherwise only the `.dot` file is generated, leaving the actual image generation to the user.	2019-10-14 16:21:18 +03:00
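As a rough illustration of the `.dot` output described above, here is a minimal sketch that renders objects and referrer-to-referee edges as a graphviz digraph. The `Obj` class, the `edges` mapping and `to_dot()` are hypothetical stand-ins, not scylla-gdb.py's internals; it only assumes vertices carry address, size, liveness and an optional vtable symbol, and edges carry the referrer's offsets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Obj:
    # Hypothetical vertex: one allocated object found in memory.
    address: int
    size: int
    live: bool
    vtable_symbol: Optional[str] = None


def to_dot(objs: List[Obj], edges: Dict[Tuple[int, int], List[int]]) -> str:
    """Render objects and referrer->referee edges as a graphviz digraph.

    `edges` maps (referrer_address, referee_address) to the offsets inside
    the referrer at which a pointer to the referee was found.
    Note: quotes inside symbol names are not escaped in this sketch.
    """
    lines = ["digraph object_graph {"]
    for o in objs:
        # "\\n" puts a literal backslash-n in the label, which DOT renders as a line break.
        label = f"0x{o.address:x}\\nsize={o.size}\\n{'live' if o.live else 'free'}"
        if o.vtable_symbol:
            label += f"\\n{o.vtable_symbol}"
        lines.append(f'  "0x{o.address:x}" [label="{label}"];')
    for (src, dst), offsets in sorted(edges.items()):
        offs = ",".join(str(off) for off in offsets)
        lines.append(f'  "0x{src:x}" -> "0x{dst:x}" [label="{offs}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and turned into an image with graphviz, e.g. `dot -Tpng graph.dot -o graph.png`.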
Botond Dénes	f9e8e54603	scylla-gdb.py: boost scylla find Add a `--resolve` flag, which will make the command attempt to resolve the first pointer of the found objects as a vtable pointer. If this is successful, the vtable pointer as well as the symbol name will be added to the listing. This in particular makes backtracing continuation chains a breeze, as the continuation object the searched one depends on can be found at a glance in the resulting listing (instead of having to manually probe each item). The arguments of `scylla find` are now parsed via `argparse`. While at it, support was added for all the size classes supported by the underlying `find` command, in addition to `w` and `g`. However, the syntax for specifying the size class has changed: it now has to be specified with the `-s|--size` command line argument, instead of passing `-w` or `-g`.	2019-10-14 16:21:18 +03:00
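The new command line can be approximated with a small `argparse` parser. This is a sketch, not the actual scylla-gdb.py code: `make_find_parser()` is hypothetical, the size letters `b`, `h`, `w`, `g` are those accepted by gdb's own `find` command, and the default size class `g` is an assumption.

```python
import argparse


def make_find_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the options named in the commit message.
    parser = argparse.ArgumentParser(prog="scylla find")
    parser.add_argument("-s", "--size", choices=["b", "h", "w", "g"], default="g",
                        help="size class of the searched value, as in gdb's find "
                             "(default 'g' is an assumption)")
    parser.add_argument("--resolve", action="store_true",
                        help="try to resolve the first pointer of each found "
                             "object as a vtable pointer")
    # Accept decimal or 0x-prefixed hexadecimal values.
    parser.add_argument("value", type=lambda s: int(s, 0),
                        help="value (e.g. an address) to search for")
    return parser
```

For example, `scylla find --resolve -s w 0x60001a305910` would parse to `size == "w"`, `resolve == True`.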
Botond Dénes	0773104f32	scylla_fiber: also print the task that is the starting point of the fiber Or in other words, the task that is the argument of the search. Example: (gdb) scylla fiber 0x60001a305910 Starting task: (task) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16 #0 (task) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16 #1 (task) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16 #2 (task) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16	2019-10-14 13:36:25 +03:00
Botond Dénes	1a8846c04a	scylla-gdb.py: move the code finding text_start and text_end to get_text_range() This code is currently duplicated in `find_vptrs()` and `scylla_task_histogram`. Refactor it out into a function. The code is also improved in two ways: * Make the search stricter, ensuring (hopefully) that indeed the executable's text section is found, not that of the first object in the `gdb file` listing. * Throw an exception in the case when the search fails.	2019-10-14 13:25:28 +03:00
Raphael S. Carvalho	7f1a2156c7	table: Don't account for shared SSTables in compaction backlog tracker We don't want to add shared sstables to the table's backlog tracker because: 1) the table's backlog tracker only influences regular compaction; 2) shared sstables are never compacted by regular compaction; they're handled by resharding, which has its own backlog tracker. Such sstables belong to more than one shard, meaning that currently they're added to the backlog tracker of every shard that owns them. However, such an sstable ends up being resharded on a shard that may be effectively random, so increasing the backlog of all the shards it belongs to won't lead to faster resharding. Also, the table's backlog tracker is supposed to deal only with regular compaction. Accounting for shared sstables in the table's tracker may lead to an incorrect speed-up of regular compactions, because the controller is not aware that a relevant part of the backlog is due to pending resharding. The fix is to ignore sstables that will be resharded, letting the table's backlog tracker account only for sstables that can be worked on by regular compaction, and to rely on resharding controlling itself with its own tracker. NOTE: this doesn't completely fix the resharding control issue described in #4952. We'll still need to throttle regular compaction on behalf of resharding. Subsequent work may be about: - moving resharding to its own priority class, perhaps streaming. - making resharding's backlog tracker account for sstables in all of its pending jobs, not only the ongoing ones (currently limited to 1 per shard). - limiting compaction shares when resharding is in progress. This patch only fixes the issue that the controller for regular compaction shouldn't account for sstables that are handled exclusively by resharding. Fixes #5077. Refs #4952. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>	2019-10-13 10:14:13 +03:00
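The filtering rule can be sketched as follows, assuming a toy model where an sstable is "shared" when more than one shard owns it. `SSTable` and `regular_compaction_backlog_input()` are hypothetical names, and the backlog formula itself is not modeled; the sketch only shows which sstables would be fed to the table's tracker.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SSTable:
    # Hypothetical stand-in: on-disk size and the shards that own the sstable.
    name: str
    size: int
    owner_shards: List[int]

    def is_shared(self) -> bool:
        return len(self.owner_shards) > 1


def regular_compaction_backlog_input(sstables: Iterable[SSTable]) -> List[SSTable]:
    """Keep only sstables that regular compaction can work on.

    Shared sstables are left to resharding, which tracks its own backlog.
    """
    return [s for s in sstables if not s.is_shared()]
```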
Raphael S. Carvalho	88611d41d0	sstables: Fix major compaction's space amplification with incremental compaction Incremental compaction's efficiency depends on all references to the compacted sstables being released, because the file descriptors of sstable components are only closed once the sstable object is destructed. Incremental compaction is not working for major compaction because references to released sstables are kept in the compaction manager, which prevents their disk space from being released. So the space amplification is the same as with a non-incremental approach, i.e. it needs twice the amount of used disk space for the table(s). With this issue fixed, the database becomes very friendly to major compaction, with a very low space requirement: a constant which is roughly the number of fragments currently being compacted multiplied by the fragment size (1GB by default), for each table involved. Fixes #5140. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>	2019-10-13 09:55:11 +03:00
Raphael S. Carvalho	17c66224f7	tests: Check that partition is not resurrected on compaction failure Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-13 00:06:51 -03:00
Raphael S. Carvalho	6301a10fd7	tests: Add sstable compaction test for gc-only mutation compactor consumer Make sure a gc'able-tombstone-only sstable is properly generated with data that comes from regular compaction's input sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:38:53 -03:00
Raphael S. Carvalho	91260cf91b	sstables: Fix Incremental Compaction Efficiency Compaction prevents data resurrection by checking that there's no way data shadowed by a GC'able tombstone can survive alone, after a failure for example. Consider the following scenario: We have two runs A and B, each divided into 5 fragments, A1..A5 and B1..B5. They have the following token ranges: A: A1=[0, 3] A2=[4, 7] A3=[8, 11] A4=[12, 15] A5=[16,18] B is the same as A's ranges, offset by 1: B: B1=[1,4] B2=[5,8] B3=[9,12] B4=[13,16] B5=[17,19] Let's say we have finished flushing output up to position 10 in the compaction. We are currently working on A3 and B3, so obviously those cannot be deleted. Because B2 overlaps with A3, we cannot delete B2 either. Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after B2 is gone, dead data in A3 could be resurrected on failure. Now, A2 overlaps with B2, which we couldn't delete yet, so we can't delete A2. Now A2 overlaps with B1, so we can't delete B1. And B1 overlaps with A1, so we can't delete A1. So we can't delete any fragment. The problem with this approach is obvious: fragments may be prevented from being released by data dependencies, so incremental compaction efficiency is severely reduced. To fix it, let's not purge GC'able tombstones right away in the mutation compactor step. Instead, let's have compaction write them to a separate sstable run that is deleted at the end of compaction. By making sure that tombstone information from all compacting sstables is not lost, we no longer need incremental compaction to impose lots of restrictions on which fragments can be released. Now, any sstable whose data is safe in a new sstable can be released right away. In addition, incremental compaction will only take place if the compaction procedure is working with at least one multi-fragment sstable run. Fixes #4531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:36:03 -03:00
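The dependency chain in the scenario above can be reproduced with a toy model. This sketch assumes closed token ranges and treats a fragment as fully consumed when its end is at or before the flushed position; the function names are hypothetical. Under the old rule, pinning propagates through overlaps until every fragment is pinned; under the new rule, only consumption matters.

```python
from typing import Dict, Set, Tuple

Range = Tuple[int, int]  # closed token range [start, end]


def overlaps(a: Range, b: Range) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def releasable_old_rule(frags: Dict[str, Range], pos: int) -> Set[str]:
    """GC'able tombstones purged during compaction.

    A fragment not yet fully consumed (end > pos) is pinned. Any fragment that
    overlaps a pinned one is pinned too, because it might hold a tombstone
    shadowing data there. Repeat until nothing changes.
    """
    pinned = {name for name, r in frags.items() if r[1] > pos}
    changed = True
    while changed:
        changed = False
        for name, r in frags.items():
            if name not in pinned and any(overlaps(r, frags[p]) for p in pinned):
                pinned.add(name)
                changed = True
    return set(frags) - pinned


def releasable_new_rule(frags: Dict[str, Range], pos: int) -> Set[str]:
    """GC'able tombstones written to a separate run: consumption is all that matters."""
    return {name for name, r in frags.items() if r[1] <= pos}


frags = {
    "A1": (0, 3), "A2": (4, 7), "A3": (8, 11), "A4": (12, 15), "A5": (16, 18),
    "B1": (1, 4), "B2": (5, 8), "B3": (9, 12), "B4": (13, 16), "B5": (17, 19),
}
```

With position 10, `releasable_old_rule(frags, 10)` is empty, matching the walk-through, while `releasable_new_rule(frags, 10)` releases A1, A2, B1 and B2.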
Kamil Braun	ef9d5750c8	view: fix bug in virtual columns. When creating a virtual column of non-frozen map type, the wrong type was used for the map's keys. Fixes #5165.	2019-10-11 20:47:06 +03:00
Avi Kivity	f12feec2c9	Update seastar submodule * seastar 1f68be436f...e888b1df9c (8): > sharded: Make map work with mapper that returns a future > cmake: Remove FindBoost.cmake > Reduce noncopyable_function instruction cache footprint > doc: add Loops section to the tutorial > Merge "Move file related code out of reactor" from Asias > Merge "Move the io_queue code out of reactor" from Asias > cmake: expose seastar_perf_testing lib > future: class doc: explain why discarding a future is bad - main.cc now includes new file io_queue.hh - perf tests now include seastar perf utilities via user, not system, includes since those are not exported	2019-10-10 18:17:28 +03:00
Nadav Har'El	33027a36b4	alternator: Add authorization Merged patch set from Piotr Sarna: Refs #5046 This commit adds handling of the "Authorization:" header in incoming requests. The signature sent in the authorization header is recomputed server-side and compared with what the client sent. In case of a mismatch, UnrecognizedClientException is returned. The signature computation is based on the boto3 Python implementation and uses gnutls to compute HMAC hashes. This series is rebased on a previous HTTPS series in order to ease merging these two. As such, it depends on the HTTPS series being merged first. Tests: alternator(local, remote) The series also comes with a simple authorization test and a docs update. Piotr Sarna (6): alternator: migrate split() function to string_view alternator: add computing the auth signature config: add alternator_enforce_authorization entry alternator: add verifying the auth signature alternator-test: add a basic authorization test case docs: update alternator authorization entry alternator-test/test_authorization.py | 34 ++++++++ configure.py | 1 + alternator/{server.hh => auth.hh} | 22 ++--- alternator/server.hh | 3 +- db/config.hh | 1 + alternator/auth.cc | 88 ++++++++++++++++++++ alternator/server.cc | 112 +++++++++++++++++++++++--- db/config.cc | 1 + main.cc | 2 +- docs/alternator/alternator.md | 7 +- 10 files changed, 241 insertions(+), 30 deletions(-) create mode 100644 alternator-test/test_authorization.py copy alternator/{server.hh => auth.hh} (58%) create mode 100644 alternator/auth.cc	2019-10-10 15:57:46 +03:00
Nadav Har'El	df62499710	docs/isolation.md: copy-edit Minor spelling and syntax corrections. No new content or semantic changes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191010093457.20439-1-nyh@scylladb.com>	2019-10-10 15:17:28 +03:00
Piotr Dulikowski	c04e8c37aa	distributed_loader: populate non-system keyspaces in parallel Before this change, when populating non-system keyspaces, each data directory was scanned and for each entry (keyspace directory), a keyspace was populated. This was done serially: populating one keyspace was not started until the previous one was done. Loading keyspaces in such a fashion can introduce unnecessary waiting when there is a large number of keyspaces in one data directory. The population process is I/O intensive and barely uses CPU. This change enables parallel loading of keyspaces per data directory. Populating the next keyspace does not wait for the previous one. A benchmark was performed measuring startup time, with the following setup: - 1 data directory, - 200 keyspaces, - 2 tables in each keyspace, with the following schema: CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b)) WITH CLUSTERING ORDER BY (b DESC), - 1024 rows in each table, with values (i, 2i, 3i) for i in 0..1023, - ran on 6-core virtual machine running on i7-8750H CPU, - compiled in dev mode, - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes --datadir $DIR --commitlog-directory $DIR --hints-directory $DIR --view-hints-directory $DIR The benchmark tested: - boot time, by comparing timestamp of the first message in log, and timestamp of the following message: "init - Scylla version ... initialization completed." - keyspace population time, by comparing timestamps of messages: "init - loading non-system sstables" and "init - starting view update generator" The benchmark was run 5 times for sequential and parallel version, with the following results: - sequential: boot 31.620s, keyspace population 6.051s - parallel: boot 29.966s, keyspace population 4.360s Keyspace population time decreased by ~27.95%, and overall boot time by ~5.23%. Tests: unit(release) Fixes #2007	2019-10-10 15:12:23 +03:00
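The reported percentages follow from the measured times as relative reductions, (before - after) / before. A quick check:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


population = percent_reduction(6.051, 4.360)  # keyspace population, seconds
boot = percent_reduction(31.620, 29.966)      # overall boot, seconds
print(f"population: {population:.2f}%  boot: {boot:.2f}%")
# prints: population: 27.95%  boot: 5.23%
```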
Piotr Sarna	6ca55d3c83	docs: update alternator authorization entry The entry now contains a comment that computing a signature works, but is still based on a hardcoded key.	2019-10-10 13:51:00 +02:00
Piotr Sarna	23798b7301	alternator-test: add a basic authorization test case The test case ensures that passing wrong credentials results in getting an UnrecognizedClientException.	2019-10-10 13:51:00 +02:00
Piotr Sarna	97cbb9a2c7	alternator: add verifying the auth signature The signature sent in the "Authorization:" header is now verified by computing the signature server-side with a matching secret key and confirming that the signatures match. Currently the secret key is hardcoded to be "whatever" in order to work with current tests, but it should be replaced by a proper key store. Refs #5046	2019-10-10 13:51:00 +02:00
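A sketch of the verify-by-recomputing idea, assuming the AWS Signature Version 4 scheme that boto3/botocore uses (a chain of HMAC-SHA256 derivations producing a signing key). This is illustrative Python with the standard `hmac` and `hashlib` modules, not Scylla's gnutls-based C++ code. Building the canonical request and string-to-sign is omitted, and the function names are hypothetical. The key `"whatever"` mirrors the hardcoded test key mentioned above.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key: str, date: str, region: str, service: str,
                    string_to_sign: str) -> str:
    # Derive the signing key: HMAC chain over date, region, service, terminator.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def verify(client_signature: str, secret_key: str, date: str, region: str,
           service: str, string_to_sign: str) -> bool:
    expected = sigv4_signature(secret_key, date, region, service, string_to_sign)
    # Constant-time comparison; on mismatch the server would reject the request
    # (UnrecognizedClientException in alternator's case).
    return hmac.compare_digest(expected, client_signature)
```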
Piotr Sarna	e245b54502	config: add alternator_enforce_authorization entry The config entry will be used to turn authorization for alternator requests on and off. The default is currently off, since the key store is not implemented yet.	2019-10-10 13:51:00 +02:00
Piotr Sarna	589a22d078	alternator: add computing the auth signature A function for computing the auth signature from user requests is added, along with helper functions. The implementation is based on gnutls's HMAC. Refs #5046	2019-10-10 13:51:00 +02:00
Piotr Sarna	ca58b46b4c	alternator: migrate split() function to string_view The implementation of string split was based on sstring type for simplicity, but it turns out that more generic std::string_view will be beneficial later to avoid unneeded string copying. Unfortunately boost::split does not cooperate well with string views, so a simple manual implementation is provided instead.	2019-10-10 13:50:59 +02:00
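The point of splitting over views is that the pieces refer back into the original buffer instead of being copied. A rough Python analogue uses `memoryview` slices in the role of `std::string_view`. The function name is hypothetical, and keeping empty tokens (like Python's `str.split(sep)`) is an assumption about the desired semantics, not a statement about Scylla's implementation.

```python
from typing import Iterator


def split_views(buf: bytes, delim: int) -> Iterator[memoryview]:
    """Split `buf` on the byte value `delim`, yielding zero-copy slices."""
    view = memoryview(buf)
    start = 0
    for i, b in enumerate(view):  # iterating a bytes memoryview yields ints
        if b == delim:
            yield view[start:i]  # slicing a memoryview does not copy the data
            start = i + 1
    yield view[start:]
```

For example, splitting `b"a,bc,,d"` on `ord(",")` yields views over `a`, `bc`, an empty token, and `d`.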
Botond Dénes	52afbae1e5	README.md: add links to other documentation sources Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>	2019-10-10 14:15:01 +03:00
Botond Dénes	e52712f82c	docs: add README.md Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>	2019-10-10 14:14:09 +03:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have a big effect on performance and should typically be rare events. It is useful to monitor how frequently the schema changes. This patch adds a counter that is incremented each time the schema changes. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Asias He	b89ced4635	streaming: Do not open rpc stream connection if reader has no data We can use the reader::peek() to check if the reader contains any data. If not, do not open the rpc stream connection. It helps to reduce the port usage. Refs: #4943	2019-10-08 10:31:02 +02:00
Konstantin Osipov	94006d77b1	lwt: add cas_contention_timeout_in_ms to config Make the default conform to the origin. Message-Id: <20191006154532.54856-3-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Konstantin Osipov	383e17162a	lwt: implement query_options::check_serial_consistency() Both in a single-statement transaction and in a batch we expect that serial consistency is provided. Move the check to query_options class and make it available for reuse. Keep get_serial_consistency() around for use in transport/server.cc. Message-Id: <20191006154532.54856-2-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Piotr Sarna	36a1905e98	storage_proxy: handle unstarted write cancelling When another node is reported to be down, view updates queued for it are cancelled, but some of them may already be initiated. Until now, cancelling such a write resulted in an exception, but at a conceptual level it's not really an exception, since this behaviour is expected. A previous version of this patch was based on introducing a special exception type that was later handled specially, but it's not clear if it's a good direction. Instead, this patch simply makes this path non-exceptional, as was originally done by Nadav in the first version of the series that introduced handling unstarted write cancellations. Additionally, a message stating that a write was cancelled is logged at debug level.	2019-10-07 16:55:36 +03:00
Vladimir Davydov	e8bcb34ed4	api: drop /storage_proxy/metrics/cas_read/condition_not_met There's no such metric in Cassandra (although Cassandra's docs mistakenly say it exists). Having it would make no sense anyway, so let's drop it. Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>	2019-10-07 16:54:39 +03:00
Piotr Sarna	5ab134abef	alternator-test: update HTTPS section of README README.md has 3 fixes applied: - s/alternator_tls_port/alternator_https_port - conf directory is mentioned more explicitly - it now correctly states that the self-signed certificate warning is explicitly ignored in tests Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>	2019-10-07 14:51:16 +03:00
Avi Kivity	8ed6f94a16	Merge "Fix handling of schema alters and eviction in cache" from Tomasz " Fixes #5134, Eviction concurrent with preempted partition entry update after memtable flush may allow stale data to be populated into cache. Fixes #5135, Cache reads may miss some writes if schema alter followed by a read happened concurrently with preempted partition entry update. Fixes #5127, Cache populating read concurrent with schema alter may use the wrong schema version to interpret sstable data. Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may fail or cause a node crash after schema alter. " * tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla: tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read tests: row_cache_stress_test: Verify all entries are evictable at the end tests: row_cache_stress_test: Exercise single-partition reads tests: row_cache_stress_test: Add periodic schema alters tests: memtable_snapshot_source: Allow changing the schema tests: simple_schema: Prepare for schema altering row_cache: Record upgraded schema in memtable entries during update memtable: Extract memtable_entry::upgrade_schema() row_cache, mvcc: Prevent locked snapshots from being evicted row_cache: Make evict() not use invalidate_unwrapped() mvcc: Introduce partition_snapshot::touch() row_cache, mvcc: Do not upgrade schema of entries which are being updated row_cache: Use the correct schema version to populate the partition entry delegating_reader: Optimize fill_buffer() row_cache, memtable: Use upgrade_schema() flat_mutation_reader: Introduce upgrade_schema()	2019-10-07 14:43:36 +03:00
Nadav Har'El	f2f0f5eb0f	alternator: add https support Merged patch series from Piotr Sarna: This series adds HTTPS support for Alternator. The series comes with --https option added to alternator-test, which makes the test harness run all the tests with HTTPS instead of HTTP. All the tests pass, albeit with security warnings that a self-signed x509 certificate was used and it should not be trusted. Fixes #5042 Refs scylladb/seastar#685 Patches: docs: update alternator entry on HTTPS alternator-test: suppress the "Unverified HTTPS request" warning alternator-test: add HTTPS info to README.md alternator-test: add HTTPS to test_describe_endpoints alternator-test: add --https parameter alternator: add HTTPS support config: add alternator HTTPS port	2019-10-07 12:38:20 +03:00
Avi Kivity	969113f0c9	Update seastar submodule * seastar c21a7557f9...1f68be436f (6): > scheduling: Add per scheduling group data support > build: Include dpdk as a single object in libseastar.a > sharded: fix foreign_ptr's move assignment > build: Fix DPDK libraries linking in pkg-config file > http server: https using tls support > Make output_stream blurb Doxygen	2019-10-07 12:18:49 +03:00
Nadav Har'El	754add1688	alternator: fix Expected's BEGINS_WITH error handling The BEGINS_WITH condition in conditional updates (via Expected) requires that the given operand be either a string or a binary. Any other operand should result in a validation exception - not a failed condition as we generate now. This patch fixes the test for this case so it will succeed against Amazon DynamoDB (before this patch it fails - this failure was masked by a typo before commit `332ffa77ea`). The patch then fixes our code to handle this case correctly. Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad type in the operand is now handled differently from a bad type in the attribute's value. We add another check to the test to verify that this is the case. Fixes #5141 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191006080553.4135-1-nyh@scylladb.com>	2019-10-06 17:16:55 +03:00
Botond Dénes	d0fa5dc34d	scylla-gdb.py: introduce the downcast_vptr convenience function When debugging, one constantly has to inspect objects for which only a "virtual pointer" is available, that is, a pointer that points to a common parent class or interface. Finding the concrete type and downcasting the pointer is easy enough, but why do it manually when it is possible to automate it trivially? $downcast_vptr() returns any virtual pointer given to it, cast to the actual concrete object type. Example: (gdb) p $1 $2 = (flat_mutation_reader::impl *) 0x60b03363b900 (gdb) p $downcast_vptr(0x60b03363b900) $3 = (combined_mutation_reader *) 0x60b03363b900 # The return value can also be dereferenced on the spot. (gdb) p *$downcast_vptr($1) $4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...	2019-10-04 17:45:47 +03:00
Botond Dénes	434a41d39b	scylla-gdb.py: introduce the dereference_lw_shared_ptr convenience function Dereferencing a `seastar::lw_shared_ptr` is another tedious manual task. The stored pointer (`_p`) has to be cast to the right subclass of `lw_shared_ptr_counter_base`, which involves inspecting the code and then writing a cast expression that gdb is willing to parse. This is something machines are much better at doing. `$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to object, given an instance of `seastar::lw_shared_ptr`. Example: (gdb) p $1._read_context $2 = {_p = 0x60b00b068600} (gdb) p $dereference_lw_shared_ptr($1._read_context) $3 = {<seastar::enable_lw_shared_from_this<cache::read_context>> = {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...	2019-10-04 17:45:47 +03:00
Botond Dénes	f5de002318	scylla-gdb.py: scylla_sstables: also print the sstable filename And expose the method that obtains the file name of an sstable object to python code.	2019-10-04 17:45:32 +03:00
Botond Dénes	ad7a668be9	scylla-gdb.py: scylla_task_histogram: expose internal parameters Make all the parameters of the sampling tweakable via command line arguments. I strived to keep full backward compatibility, but due to the limitations of `argparse` there is one "breaking" change. The optional positional size argument is now a non-positional argument as `argparse` doesn't support optional positional arguments. Added documentation for both the command itself as well as for all the arguments.	2019-10-04 17:44:40 +03:00
Botond Dénes	7767cc486e	scylla-gdb.py: make scylla_find usable from python code	2019-10-04 17:44:40 +03:00
Botond Dénes	9cdea440ef	scylla-gdb.py: add std_variant, a wrapper for std::variant Allows conveniently obtaining the active member via calling `get()`.	2019-10-04 17:44:40 +03:00
Botond Dénes	55e9097dd9	scylla-gdb.py: add std_list, a wrapper for an std::list std_list makes an `std::list` instance accessible from python code just like a regular (read-only) python container.	2019-10-04 17:44:40 +03:00
Botond Dénes	b8f0b3ba93	std_optional: fix get() Apparently there is now another layer of indirection: `std::_Storage`.	2019-10-04 17:43:40 +03:00
Tomasz Grabiec	020a537ade	tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read	2019-10-04 11:38:13 +02:00
Tomasz Grabiec	ebedefac29	tests: row_cache_stress_test: Verify all entries are evictable at the end	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	1b95f5bf60	tests: row_cache_stress_test: Exercise single-partition reads make_single_key_reader() currently doesn't actually create single-partition readers because it doesn't set mutation_reader::forwarding::no when it creates individual readers. The readers will default to mutation_reader::forwarding::yes and actually create scanning readers in preparation for fast-forwarding across partitions. Fix by passing mutation_reader::forwarding::no.	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	81dd17da4e	tests: row_cache_stress_test: Add periodic schema alters Reproduces #5127.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	2fc144e1a8	tests: memtable_snapshot_source: Allow changing the schema	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	22dde90dba	tests: simple_schema: Prepare for schema altering Currently, methods of simple_schema assume that the table's schema doesn't change. Accessors like get_value() assume that rows were generated using simple_schema::_s. Because of that, the column_definition& for the "v" column is cached in the instance. That column_definition& cannot be used to access objects created with a different schema version. To allow using simple_schema after schema changes, column_definition& caching is now tagged with the table schema version of origin. Methods which access schema-dependent objects, like get_value(), now accept the schema& corresponding to the objects. Also, it's now possible to tell simple_schema to use a different schema version in its generator methods.	2019-10-03 22:03:29 +02:00
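The version-tagged caching idea can be sketched in Python: cache the looked-up column together with the schema version it came from, and recompute when a different version is passed in. `Schema` and `VersionTaggedColumnCache` are hypothetical stand-ins; a column id plays the role of the cached `column_definition&`.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Schema:
    # Hypothetical schema: a version number and column names in id order.
    version: int
    columns: Tuple[str, ...]

    def column_id(self, name: str) -> int:
        return self.columns.index(name)


@dataclass
class VersionTaggedColumnCache:
    """Caches a column id together with the schema version it was computed for."""
    name: str
    cached: Optional[Tuple[int, int]] = field(default=None, init=False)  # (version, id)

    def get(self, schema: Schema) -> int:
        # A cached id from another schema version may point at the wrong column,
        # so it is only reused when the versions match.
        if self.cached is None or self.cached[0] != schema.version:
            self.cached = (schema.version, schema.column_id(self.name))
        return self.cached[1]
```

After altering a schema to add column `s` before `v`, an untagged cache would keep returning `v`'s old id, which now belongs to `s`. The tagged cache notices the version change and looks the column up again.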
Tomasz Grabiec	e6afc89735	row_cache: Record upgraded schema in memtable entries during update Cache update may defer in the middle of moving a partition entry from a flushed memtable to the cache. If the schema was changed since the entry was written, it upgrades the schema of the partition_entry first but doesn't update the schema_ptr in memtable_entry. The entry is removed from the memtable afterward. If a memtable reader encounters such an entry, it will try to upgrade it assuming it's still at the old schema. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. It's more likely to happen the more rows there are in a partition. It's unlikely to happen with single-row partitions. Introduced in `70c7277`. Fixes #5128.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	ea461a3884	memtable: Extract memtable_entry::upgrade_schema()	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	90d6c0b9a2	row_cache, mvcc: Prevent locked snapshots from being evicted If the whole partition entry is evicted while being updated from the memtable, a subsequent read may populate the partition using the old version of data if it attempts to do it before cache update advances past that partition. Partial eviction is not affected because populating reads will notice that there is a newer snapshot corresponding to the updater. This can happen only in OOM situations where the whole cache gets evicted. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. Introduced in `70c7277`. Fixes #5134.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	57a93513bd	row_cache: Make evict() not use invalidate_unwrapped() invalidate_unwrapped() calls cache_entry::evict(), which cannot be called concurrently with cache update. invalidate() serializes it properly by calling do_update(), but evict() doesn't. The purpose of evict() is to stress eviction in tests, which can happen concurrently with cache update. Switch it to use memory reclaimer, so that it's both correct and more realistic. evict() is used only in tests.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	c88a4e8f47	mvcc: Introduce partition_snapshot::touch()	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	25e2f87a37	row_cache, mvcc: Do not upgrade schema of entries which are being updated When a read enters a partition entry in the cache, it first upgrades it to the current schema of the cache. The same happens when an entry is updated after a memtable flush. Upgrading the entry is currently performed by squashing all versions and replacing them with a single upgraded version. That has a side effect of detaching all snapshots from the partition entry. Partition entry update on memtable flush is writing into a snapshot. If that snapshot is detached by a schema upgrade, the entry will be missing writes from the memtable which fall into continuous ranges in that entry which have not yet been updated. This can happen only if the update of the entry is preempted and the schema was altered during that, and a read hit that partition before the update went past it. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. The problem is fixed by locking updated entries and not upgrading schema of locked entries. cache_entry::read() is prepared for this, and will upgrade on-the-fly to the cache's schema. Fixes #5135	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	0675088818	row_cache: Use the correct schema version to populate the partition entry The sstable reader which populates the partition entry in the cache is using the schema of the partition entry snapshot, which will be the schema of the cache at the time the partition was entered. If there was a schema change after the cache reader entered the partition but before it created the sstable reader, the cache populating reader will interpret sstable fragments using the wrong schema version. That is more likely if partitions have many rows, and the front of the partition is populated. With single-row partitions that's unlikely to happen. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. Fixes #5127.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	10992a8846	delegating_reader: Optimize fill_buffer() Use move_buffer_content_to() which is faster than fill_buffer_from() because it doesn't involve popping and pushing the fragments across buffers. We save on size estimation costs.	2019-10-03 22:03:28 +02:00
Piotr Sarna	07ac3ea632	docs: update alternator entry on HTTPS The HTTPS entry is updated - it's now supported, but it still lacks the same features that HTTP lacks - CRC headers, etc.	2019-10-03 19:10:30 +02:00
Piotr Sarna	b63077a8dc	alternator-test: suppress the "Unverified HTTPS request" warning Running with --https and a self-signed certificate results in a flood of expected warnings that the connection is not to be trusted. These warnings are silenced, as users running a local test with --https usually use self-signed certificates.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e65fd490da	alternator-test: add HTTPS info to README.md A short paragraph about running tests with `--https` and configuring the cluster to work correctly with this parameter is added to README.md.	2019-10-03 19:10:30 +02:00
Piotr Sarna	0d28d7f528	alternator-test: add HTTPS to test_describe_endpoints The test_describe_endpoints test spawns another client connection to the cluster, so it needs to be HTTPS-aware in order to work properly with --https parameter.	2019-10-03 19:10:30 +02:00
Piotr Sarna	9fd77ed81d	alternator-test: add --https parameter Running with --https parameter will result in sending the requests via HTTPS instead of HTTP. By default, port 8043 is used for a local cluster. Before running pytest --https, make sure that Scylla was properly configured to initialize an HTTPS alternator server by providing the alternator_tls_port parameter. The HTTPS-based connection runs with verification disabled, otherwise it would not work with self-signed certificates, which are useful for tests.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e1b0537149	alternator: add HTTPS support By providing a server based on a TLS socket, it's now possible to serve HTTPS requests in alternator. The HTTPS server is enabled by setting its port in scylla.yaml: alternator_tls_port=XXXX. Alternator TLS relies on the existing TLS configuration, which is provided by certificate, keyfile, truststore, priority_string options. Fixes #5042	2019-10-03 19:10:30 +02:00
Piotr Sarna	b42eb8b80a	config: add alternator HTTPS port The config variable will be used to set up a TLS-based server for serving alternator HTTPS requests.	2019-10-03 19:10:29 +02:00
Nadav Har'El	9d4e71bbc6	alternator-test: fix misleading xfail message The test test_update_expression_function_nesting() fails because DynamoDB doesn't allow an expression like list_append(list_append(:val1, :val2), :val3) but Alternator doesn't check for this (and supports this expression). The "xfail" message was outdated, suggesting that the test fails because the "SET" expression isn't supported - but it is. So replace the message by a more accurate one. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190915104708.30471-1-nyh@scylladb.com>	2019-10-03 18:45:03 +03:00
Nadav Har'El	9747019e7b	alternator: implement additional Expected operators Merged patch set from Dejan Mircevski implementing some of the missing operators for Expected: NE, IN, NULL and NOT_NULL. Patches: alternator: Factor out Expected operand checks alternator: Implement NOT_NULL operator in Expected alternator: Implement NULL operator in Expected alternator: Fix expected_1_null testcase alternator: Implement IN operator in Expected alternator: Implement NE operator in Expected alternator: Factor out common code in Expected	2019-10-03 18:12:38 +03:00
Konstantin Osipov	25ffd36d21	lwt: prepare the expression tree for IF condition evaluation Frozen empty lists/maps/sets are not equal to a null value, while multi-cell empty lists/maps/sets are equal to null values. Return a NULL value for an empty multi-cell set or list if we know the receiver is not frozen - this makes it easy to compare the parameter with the receiver. Add a test case for inserting an empty list or set - the result is indistinguishable from NULL value. Message-Id: <20191003092157.92294-2-kostja@scylladb.com>	2019-10-03 14:56:25 +02:00
Avi Kivity	3cb081eb84	Merge " hinted handoff: fix races during shutdown and draining" from Vlad " Fix races that may lead to use-after-free events and file system level exceptions during shutdown and drain. The root cause of use-after-free events in question is that space_watchdog blocks on end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as it's accessed even if the corresponding end_point_hints_manager instance is destroyed in the context of manager::drain_for(). File system exceptions may occur when space_watchdog attempts to scan a directory while it's being deleted from the drain_for() context. In case of such an exception new hints generation is going to be blocked - including for materialized views, till the next space_watchdog round (in 1s). Issues that are fixed are #4685 and #4836. Tested as follows: 1) Patched the code in order to trigger the race with (a lot) higher probability and running slightly modified hinted handoff replace dtest with a debug binary for 100 times. A side effect of this testing was the discovery of #4836. 2) Using the same patch as above tested that there are no crashes and nodes survive stop/start sequences (they did not without this series) in the context of all hinted handoff dtests. Ran the whole set of tests with dev binary for 10 times. " * 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla: hinted handoff: fix a race on a directory removal between space_watchdog and drain_for() hinted handoff: make taking file_update_mutex safe db::hints::manager::drain_for(): fix alignment db::hints::manager: serialize calls to drain_for() db::hints: cosmetics: indentation and missing method qualifier	2019-10-03 14:38:00 +03:00
Tomasz Grabiec	aad1307b14	row_cache, memtable: Use upgrade_schema()	2019-10-03 13:28:33 +02:00
Tomasz Grabiec	3177732b35	flat_mutation_reader: Introduce upgrade_schema()	2019-10-03 13:28:33 +02:00
Asias He	a9b95f5f01	repair: Fix tracker::start and tracker::done in case of error The operation after gate.enter() in tracker::start() can fail and throw, in which case we should call gate.leave() to avoid unbalanced enter and leave calls. tracker::done() has a similar issue too. Fix it by removing the gate enter and leave logic in tracker start and done. A helper tracker::run() is introduced to take care of the gate and repair status. In addition, the error log is improved. It now logs exceptions on all shards in the summary. e.g., [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)}) Fixes #5074	2019-10-03 13:33:02 +03:00
Botond Dénes	00b432b61d	querier_cache: correctly account entries evicted on insertion in the population Currently, the population stat is not increased for entries that are evicted immediately on insert, however the code that does the eviction still decreases the population stat, leading to an imbalance and in some cases the underflow of the population stat. To fix, unconditionally increase the population stat upon inserting an entry, regardless of whether it is immediately evicted or not. Fixes: #5123 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>	2019-10-03 11:49:44 +03:00
Dejan Mircevski	ac98385d04	alternator: Factor out Expected operand checks Put all AttributeValueList size verification under verify_operand_count(), rather than have some cases invoke verify_operand_count() while others verify it in check_*() functions. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 17:11:58 -04:00
Dejan Mircevski	de18b3240b	alternator: Implement NOT_NULL operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:23:59 -04:00
Dejan Mircevski	75960639a4	alternator: Implement NULL operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:19:14 -04:00
Dejan Mircevski	e4fd5f3ef0	alternator: Fix expected_1_null testcase Testcase "For NULL, AttributeValueList must be empty" accidentally used NOT_NULL instead of NULL. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:19:14 -04:00
Dejan Mircevski	b7ac510581	alternator: Implement IN operator in Expected Add check_IN() and a switch case that invokes it. Reactivate IN tests. Add a testcase for non-scalar attribute values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:17:38 -04:00
Dejan Mircevski	56efa55a06	alternator: Implement NE operator in Expected Recognize "NE" as a new operator type, add check_NE() function, invoke it in verify_expected_one(), and reactivate NE tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 14:47:13 -04:00
Dejan Mircevski	af0462d127	alternator: Factor out common code in Expected Operand-count verification will be repeated a lot as more operators are implemented, so factor it out into verify_operand_count(). Also move `got` null checks to check_* functions, which reduces duplication at call sites. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 14:36:57 -04:00
Konstantin Osipov	e8c13efb41	lwt: move mutation hashers to mutation.hh Prepare mutation hashers for reuse in CAS implementation. Message-Id: <20190930202409.40561-2-kostja@scylladb.com>	2019-10-01 19:49:31 +02:00
Konstantin Osipov	6cde985946	lwt: remove code that no longer serves as a reference Remove ifdef'ed Java code, since LWT implementation is based on the current state of the origin. Message-Id: <20190930201022.40240-2-kostja@scylladb.com>	2019-10-01 19:46:15 +02:00
Konstantin Osipov	4d214b624b	lwt: ensure enum_set::of is constexpr. This allows using it to initialize const static members. Message-Id: <20190930200530.40063-2-kostja@scylladb.com>	2019-10-01 19:45:56 +02:00
Tomasz Grabiec	3b9bf9d448	Merge "storage_proxy: replace variadic futures with structs" from Avi Seastar variadic futures are deprecated, so replace with structs to avoid nasty deprecation warnings.	2019-10-01 19:32:55 +02:00
Avi Kivity	162730862d	storage_proxy: remove variadic future from query_partition_key_range_concurrent() Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:33:44 +03:00
Avi Kivity	968b34a2b4	storage_proxy: remove variadic future from digest_read_resolver Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:32:17 +03:00
Avi Kivity	90096da9f3	managed_ref: add get() accessor While a managed_ref emulates a reference more closely than it does a pointer, it is still nullable, so add a get() (similar to unique_ptr::get()) that can be nullptr if the reference is null. The immediate use will be mutation_partition::_static_row, which is often empty and takes up about 10% of a cache entry.	2019-09-30 20:55:36 +03:00
Nadav Har'El	c9aae13fae	docs/alternator/getting-started.md: fix indentation in example code The example Python code had wrong indentation, and wouldn't actually work if naively copy-pasted. Noticed by Noam Hasson. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190929091440.28042-1-nyh@scylladb.com>	2019-09-30 13:03:29 +03:00
Avi Kivity	c6b66d197b	Merge "Couple of preparatory patches for lwt" from Gleb " This is a collection of assorted patches that will be needed for LWT. Most of them are trivial, but one touches a lot of files, so it has a good chance of causing rebase headaches (I already had to rebase it on top of Alternator). Let's push them earlier instead of carrying them in the lwt branch. " * 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev: lwt: make _last_timestamp_micros static lwt: Add client_state::get_timestamp_for_paxos() function lwt: Pass client_state reference all the way to storage_proxy::query exceptions: Add a constructor for unavailable_exception that allows providing a custom message serializer: Add std::variant support lwt: Add missing functions to utils/UUID_gen.hh	2019-09-29 13:02:26 +03:00
Avi Kivity	9e990725d9	Merge "Simplify and explain from_varint_to_integer #5031 " from Rafael " This is the second version of the patch series. The previous one was just the second patch, this one adds more tests and another patch to make it easier to test that the new code has the same behavior as the old one. " * 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla: types: Simplify and explain from_varint_to_integer Add more cast tests	2019-09-29 11:27:55 +03:00
Tomasz Grabiec	b0e0f29b06	db: read: Filter-out sstables using its first and last keys Affects single-partition reads only. Refs #5113 When executing a query on the replica we do several things in order to narrow down the sstable set we read from. For tables which use LeveledCompactionStrategy, we store sstables in an interval set and we select only sstables whose partition ranges overlap with the queried range. Other compaction strategies don't organize the sstables and will select all sstables at this stage. The reasoning behind this is that for non-LCS compaction strategies the sstables' ranges will typically overlap and using interval sets in this case would not be effective and would result in quadratic (in sstable count) memory consumption. The assumption for overlap does not hold if the sstables come from repair or streaming, which generates non-overlapping sstables. At a later stage, for single-partition queries, we use the sstables' bloom filter (kept in memory) to drop sstables which surely don't contain given partition. Then we proceed to sstable indexes to narrow down the data file range. Tables which don't use LCS will do unnecessary I/O to read index pages for single-partition reads if the partition is outside of the sstable's range and the bloom filter is ineffective (Refs #5112). This patch fixes the problem by consulting sstable's partition range in addition to the bloom filter, so that the non-overlapping sstables will be filtered out with certainty and not depend on bloom filter's efficiency. It's also faster to drop sstables based on the keys than the bloom filter. Tests: - unit (dev) - manual using cqlsh Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>	2019-09-28 19:42:57 +03:00
Tomasz Grabiec	b93cc21a94	sstables: Fix partition key count estimation for a range The method sstable::estimated_keys_for_range() was severely under-estimating the number of partitions in an sstable for a given token range. The first reason is that it underestimated the number of sstable index pages covered by the range, by one. In the extreme case, if the requested range falls into a single index page, we will assume 0 pages, and report 1 partition. The reason is that we were using get_sample_indexes_for_range(), which returns entries with the keys falling into the range, not entries for pages which may contain the keys. A single page can have a lot of partitions though. By default, there is a 1:20000 ratio between summary entry size and the data file size covered by it. If partitions are small, that can be many hundreds of partitions. Another reason is that we underestimate the number of partitions in an index page. We multiply the number of pages by: (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval) / _components->summary.header.sampling_level Using defaults, that means multiplying by 128. In the cassandra-stress workload a single partition takes about 300 bytes in the data file and a summary entry is 22 bytes. That means a single page covers 22 * 20'000 = 440'000 bytes of the data file, which contains about 1'466 partitions. So we underestimate by an order of magnitude. Underestimating the number of partitions will result in too small bloom filters being generated for the sstables which are the output of repair or streaming. This will make the bloom filters ineffective which results in reads selecting more sstables than necessary. The fix is to base the estimation on the number of index pages which may contain keys for the range, and multiply that by the average key count per index page. Fixes #5112. Refs #4994.
The output of test_key_count_estimation:
                                  Before   After
count                              10000   10000
est                                10112   10112
est([-inf; +inf])                    512   10112
est([0; 0])                          128    2528
est([0; 63])                         128    2528
est([0; 255])                        128    2528
est([0; 511])                        128    2528
est([0; 1023])                       128    2528
est([0; 4095])                       256    5056
est([0; 9999])                       512   10112
est([5000; 5000])                      1    2528
est([5000; 5063])                      1    2528
est([5000; 5255])                      1    2528
est([5000; 5511])                      1    2528
est([5000; 6023])                    128    5056
est([5000; 9095])                    256    7584
est([5000; 9999])                    256    7584
est(non-overlapping to the left)       1       0
est(non-overlapping to the right)      1       0
Tests: - unit (dev) Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>	2019-09-28 19:36:43 +03:00
Piotr Sarna	10f90d0e25	types: remove deprecated comment The comment does not apply anymore, as this definition is no longer in database.hh. Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>	2019-09-27 19:32:17 +02:00
Dejan Mircevski	9a89e0c5ec	dbuild: Update README on interactive mode `dbuild` was recently (`24c732057`) updated to run in interactive mode when given no arguments; we can now update the README to mention that. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:33:27 +02:00
Dejan Mircevski	f8638d8ae1	alternator: Add build byproducts to .gitignore Add .pytest_cache and expressions.tokens to the top-level .gitignore. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:18:45 +02:00
Dejan Mircevski	332ffa77ea	alternator: Actually use BEGINS_WITH in its tests For some reason, BEGINS_WITH tests used EQ as comparison operator. Tests: pytest test_expected.py Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-26 22:41:34 +03:00
Tomasz Grabiec	5b0e48f25b	Merge "toppartitions: don't transport schema_ptr across shards" from Avi When the toppartitions operation gathers results, it copies partition keys with their schema_ptr:s. When these schema_ptr:s are copied or destroyed, they can cause leaks or premature frees of the schema in its original shard since reference count operations are not atomic. Fix that by converting the schema_ptr to a global_schema_ptr during transportation. Fixes #5104 (direct bug) Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node) Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node) Tests: new test added that fails with the existing code in debug mode, manual toppartitions test	2019-09-26 17:09:54 +02:00
Avi Kivity	36b4d55b28	tests: add test for toppartitions cross-shard schema_ptr copy	2019-09-26 17:40:46 +03:00
Avi Kivity	670f398a8a	toppartitions: do not copy schema_ptr:s in item keys across shards Copying schema_ptrs across shards results in memory corruption since lw_shared_ptr does not use atomic operations for reference counts. Prevent that by converting schema_ptr:s to global_schema_ptr:s before shipping them across shards in the map operation, and converting them back to local schema_ptr:s in the reduce operation.	2019-09-26 17:26:40 +03:00
Avi Kivity	f015bd69b7	toppartitions: compare schemas using schema::id(), not pointer to schema This allows keys from different stages in the schema's life to compare equal. This is safe since the partition key cannot change, unlike the rest of the schema. More importantly, it will allow us to compare keys made local after a pass through global_schema_ptr, which does not guarantee that the schema_ptr conversion will be the same even when starting with the same global_schema_ptr.	2019-09-26 17:15:46 +03:00
Avi Kivity	ea4976a128	schema_registry: mark global_schema_ptr move constructor noexcept Throwing move constructors are a pain, so we should try to make them noexcept. Currently, global_schema_ptr's move constructor throws an exception if used illegally (moving from a different shard); this patch changes it to an assert, on the grounds that this error is impossible to recover from. The direct motivation for the patch is the desire to store objects containing a global_schema_ptr in a chunked_vector, to move lists of partition keys across shards for the toppartitions functionality. chunked_vector currently requires noexcept move constructors for its value_type.	2019-09-26 16:56:59 +03:00
Avi Kivity	ba64ec78cf	messaging_service: use rpc::tuple instead of variadic futures for rpc Since variadic future<> is deprecated, switch to rpc::tuple for multiple return values in rpc calls. This is more or less mechanical translation.	2019-09-26 12:09:31 +02:00
Tomasz Grabiec	9183e28f2c	Merge "Recreate dependent user types" from Rafael When a user type changes we were not recreating other user types that use it. This patch series fixes that and makes it clear which code is responsible for it. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has a direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when an entry in system.types changes. Fixes #5049	2019-09-26 12:06:32 +02:00
Gleb Natapov	e0b303b432	lwt: make _last_timestamp_micros static If each client_state has its own copy of the variable two clients may generate timestamps that clash and needlessly create contention. Making the variable shared between all client_state on the same shard will make sure this will not happen to two clients on the same shard. It may still happen for two clients on two different shards or two different nodes.	2019-09-26 11:44:00 +03:00
Gleb Natapov	622d21f740	lwt: Add client_state::get_timestamp_for_paxos() function Paxos needs a unique timestamp that is greater than some other timestamp, so that the next round has a better chance to succeed. Add a function that returns such a timestamp.	2019-09-26 11:44:00 +03:00
Gleb Natapov	e72a105b5e	lwt: Pass client_state reference all the way to storage_proxy::query client_state holds a state to generate monotonically increasing unique timestamp. Queries with a SERIAL consistency level need it to generate a paxos round.	2019-09-26 11:44:00 +03:00
Gleb Natapov	556f65e8a1	exceptions: Add a constructor for unavailable_exception that allows providing a custom message	2019-09-26 11:44:00 +03:00
Gleb Natapov	209414b4eb	serializer: Add std::variant support	2019-09-26 11:44:00 +03:00
Gleb Natapov	f9209e27d4	lwt: Add missing functions to utils/UUID_gen.hh Some lwt related code is missing in our UUID implementation. Add it.	2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola	5af8b1e4a3	types: recreate dependent user types. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has a direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when an entry in system.types changes. Fixes #5049 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	4c3209c549	types: Don't include dependent user types in update. The way schema changes propagate is by editing the system tables and comparing the before and after state. When a user type A uses another user type B and we modify B, the representation of A in the system table doesn't change, so this code was not producing any changes on the diff that the receiving side uses. Deleting it makes it clear that it is the receiver's responsibility to handle dependent user types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	34eddafdb0	types: Don't modify the type list in db::cql_type_parser::raw_builder With this patch db::cql_type_parser::raw_builder creates a local copy of the list of existing types and uses that internally. By doing that build() should have no observable behavior other than returning the new types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	d6b2e3b23b	types: pass a reference to prepare_internal We were never passing a null pointer and never saving a copy of the lw_shared_ptr. Passing a reference is more flexible as not all callers are required to hold the user_types_metadata in a lw_shared_ptr. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:40:30 -07:00
Avi Kivity	03260dd910	Update seastar submodule * seastar b56a8c5045...c21a7557f9 (3): > net: socket::{set,get}_reuseaddr() should not be virtual > iotune: print verbose message in case of shutdown errors > iotune: close test file on shutdown Fixes #4946.	2019-09-25 16:08:32 +03:00
Tomasz Grabiec	06b9818e98	Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny 1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand. 2. Lookup the view_update_write_response_handler id before calling timeout_cb and tolerate it not found. Just log a warning if this happened. Fixes #5032	2019-09-25 14:49:42 +02:00
Avi Kivity	83bc59a89f	Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099 )" from Tomasz " Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during table schema altering when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095. " * 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla: mvcc: Fix incorrect schema version being used to copy the mutation when applying mutation_partition: Track and validate schema version in debug builds tests: Use the correct schema to access mutation_partition	2019-09-25 15:30:22 +03:00
Tomasz Grabiec	11440ff792	mvcc: Fix incorrect schema version being used to copy the mutation when applying Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect, because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during alter when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095.	2019-09-25 11:28:07 +02:00
Tomasz Grabiec	bce0dac751	mutation_partition: Track and validate schema version in debug builds This patch makes mutation_partition validate the invariant that it's supposed to be accessed only with the schema version which it conforms to. Refs #5095	2019-09-25 10:27:06 +02:00
Avi Kivity	721fa44c4f	Update seastar submodule * seastar e51a1a8ed9...b56a8c5045 (3): > net: add support for UNIX-domain sockets > future: Warn on promise::set_exception with no corresponding future or task > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael	2019-09-25 11:21:57 +03:00
Benny Halevy	e9388b3f03	storage_proxy::drain_on_shutdown fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	b7c7af8a75	storage_proxy: validate id from view_update_handlers_list Handle a race where a write handler is removed from _response_handlers but not yet from _view_update_handlers_list. Fixes #5032 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	1fea5f5904	storage_proxy: refactor remove_response_handler Refactor remove_response_handler_entry out of remove_response_handler, to be called on a valid iterator found by _response_handlers.find(id). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	592c4bcfc2	storage_proxy: remove_response_handler: assert id was found Help identify cases like seen in #5032 where the handler id wasn't found from the on_down -> timeout_cb path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Raphael S. Carvalho	571fa94eb5	sstables/compaction_manager: Don't perform upgrade on shared SSTables compaction_manager::perform_sstable_upgrade() fails when it feeds the compaction mechanism with shared sstables. Shared sstables should be skipped when performing upgrade, leaving them for reshard to pick up in parallel. Whenever a shared sstable is brought up either on restart or via refresh, the reshard procedure kicks in. Reshard picks the highest supported format, so the upgrade for a shared sstable will naturally take place. Fixes #5056. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>	2019-09-25 11:18:40 +03:00
Asias He	19e8c14ad1	gossiper: Improve the gossip timer callback lock handling (#5097 ) - Update the outdated comments in do_stop_gossiping. It was storage_service not storage_proxy that used the lock. More importantly, storage_service does not use it any more. - Drop the unused timer_callback_lock and timer_callback_unlock API - Use with_semaphore to make sure the semaphore usage is balanced. - Add log in gossiper::do_stop_gossiping when it tries to take the semaphore to help debug hangs during shutdown. Refs: #4891 Refs: #4971	2019-09-25 10:46:38 +03:00
Tomasz Grabiec	4d9b176aaa	tests: Use the correct schema to access mutation_partition	2019-09-24 19:46:57 +02:00
Botond Dénes	425cc0c104	doc: add debugging.md A documentation file that is intended to be a place for anything debugging related: getting started tutorial, tips and tricks and advanced guides. For now it contains a short introduction, some selected links to more in-depth documentation and some tips and tricks that I could think of off the top of my head. One of those tricks describes how to load cores obtained from relocatable packages inside the `dbuild` container. I originally intended to add that to `tools/toolchain/README.md` but was convinced that `docs/debugging.md` would be a better place for this. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>	2019-09-24 20:18:45 +03:00
Botond Dénes	d57ab83bc8	querier_cache: add `inserted` stat Recently we have seen a case where the population stat of the cache was corrupt, either due to misaccounting or some more serious corruption. When debugging something like that, it would have been useful to know how many items have been inserted into the cache. I also believe that such a counter could be generally useful. Refs: #4918 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>	2019-09-24 10:52:49 +02:00
Avi Kivity	8e8a048ada	Merge "lsa: Assert no cross-shard region locking #5090" from Tomasz " We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by a cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such a lock. This patch adds an assert which would catch such incorrect locking at the time of the attempt. Refs #4978 Tests: - unit (dev, release, debug) " * 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla: lsa: Assert no cross-shard region locking tests: Make managed_vector_test a seastar test	2019-09-23 19:52:47 +03:00
Avi Kivity	79d17f3c80	Update seastar submodule * seastar 2a526bb120...e51a1a8ed9 (2): > rpc: introduce rpc::tuple as a way to move away from variadic future > shared_future: don't warn on broken futures	2019-09-23 19:50:40 +03:00
Avi Kivity	1b8009d10c	sstables: compaction_manager: #include seastarx.hh Make it easier for the IDE to resolve references to the seastar namespace. In any case include files should be stand-alone and not depend on previously included files.	2019-09-23 16:12:49 +02:00
Avi Kivity	07af9774b3	relocatable: erase build directory from executable and debug info The build directory is meaningless, since it is typically some directory in a continuous integration server. That means someone debugging the relocatable package needs to issue the gdb command 'set substitute-path' with the correct arguments, or they lose source debugging. Doing so in the relocatable package build saves this step. The default build is not modified, since a typical local build benefits from having the paths hardcoded, as the debugger will find the sources automatically.	2019-09-23 13:08:15 +02:00
Tomasz Grabiec	eb08ab7ed9	lsa: Assert no cross-shard region locking We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by a cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such a lock. This patch adds an assert which would catch such incorrect locking at the time of the attempt. Refs #4978	2019-09-23 12:51:29 +02:00
Tomasz Grabiec	8bedcd6696	tests: Make managed_vector_test a seastar test LSA will depend on seastar reactor being present.	2019-09-23 12:51:24 +02:00
Raphael S. Carvalho	b4cf429aab	sstables/LCS: Fix increased write amplification due to incorrect SSTable demotion LCS demotes an SSTable from a given level when it thinks that level is inactive. An inactive level is one that has gone N rounds (compaction attempts) without any activity, in other words, without any SSTable being promoted to it. The problem happens because the metadata that tracks the inactivity of each level can be incorrectly updated while a compaction is ongoing. LCS has parallel compaction disabled. So if a table finds itself running a long operation like cleanup that blocks minor compaction, LCS could incorrectly conclude that many levels need demotion, and by the time cleanup finishes, some demotions would incorrectly take place. This problem is fixed by updating the counter that tracks inactivity only when compaction completes, so it is not incorrectly updated while a compaction is ongoing for the table. Fixes #4919. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>	2019-09-22 10:46:38 +03:00
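The LCS fix above can be illustrated with a minimal Python sketch. It is not Scylla's LCS code: the class, its methods, and the `n_rounds` threshold are hypothetical, and it assumes a level counts as active in a round only when a completed compaction promotes an SSTable into it. It shows that the inactivity counters move only in the completion hook, so attempts blocked by a long cleanup cannot push any level toward demotion.

```python
class LevelInactivityTracker:
    """Hypothetical model of per-level inactivity tracking in LCS."""

    def __init__(self, levels: int, n_rounds: int):
        self.n_rounds = n_rounds
        # rounds_without_promotion[i]: completed rounds with no promotion into level i
        self.rounds_without_promotion = [0] * levels

    def on_compaction_completed(self, promoted_to_level: int) -> None:
        # The only place counters change: the destination level becomes active
        # again, every other level goes one more round without activity.
        for level in range(len(self.rounds_without_promotion)):
            if level == promoted_to_level:
                self.rounds_without_promotion[level] = 0
            else:
                self.rounds_without_promotion[level] += 1

    def should_demote(self, level: int) -> bool:
        return self.rounds_without_promotion[level] >= self.n_rounds
```

Because a blocked or still-running compaction never reaches `on_compaction_completed`, no level accumulates inactivity while cleanup holds compaction back.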
Eliran Sinvani	280715ad45	Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent A recent fix to #3767 limited the number of ranges that can be returned from query_ranges_to_vnodes_generator. Combined with a large number of token ranges, this can lead to an infinite recursion. The algorithm multiplies the number of requested tokens by a factor of 2 (actually a shift left by one) in each recursion iteration. As long as the requested number of ranges is greater than 0, the recursion is implicit, and each call is scheduled separately since the call is inside a continuation of a map reduce. But if the number of iterations is large enough (~32), the counter for requested ranges becomes zero, and from that moment on two things will happen: 1. The counter will remain 0 forever (0 * 2 == 0) 2. The map reduce future will be immediately available, and this will result in the continuation being invoked immediately. The latter causes the recursive call to be a "regular" recursive call, i.e., through the stack and not the task queue of the scheduler, and the former causes this recursion to be infinite. The combination creates a stack that keeps growing and eventually overflows, resulting in undefined behavior (due to memory overrun). This patch prevents the problem from happening: it prevents the concurrency counter from growing beyond twice the last number of tokens returned by query_ranges_to_vnodes_generator, and also makes sure it does not get stuck at zero. Testing: Unit test in dev mode. * Modified add 50 dtest that reproduces the problem Fixes #4944 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>	2019-09-22 10:33:31 +03:00
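The counter arithmetic behind this fix can be reproduced in a few lines. This Python sketch assumes the counter behaves like a 32-bit unsigned integer (consistent with the "~32 iterations" above); `naive_next` and `capped_next` are hypothetical names, not Scylla functions. The first shows the shift-left doubling wrapping to zero and staying there; the second applies the described cap of twice the last returned count with a floor of 1.

```python
MASK32 = 0xFFFFFFFF  # assumption: the counter behaves like a 32-bit unsigned integer


def naive_next(concurrency: int) -> int:
    # Doubling by a left shift, truncated to 32 bits: after 32 doublings of 1
    # the value wraps to 0, and 0 << 1 stays 0 forever.
    return (concurrency << 1) & MASK32


def capped_next(concurrency: int, last_returned: int) -> int:
    # Sketch of the described fix: grow at most to twice the number of ranges
    # the generator last returned, and never drop below 1.
    cap = 2 * max(last_returned, 1)
    return max(1, min(concurrency * 2, cap))


c = 1
for _ in range(32):
    c = naive_next(c)
print(c)  # 0

c = 1
for _ in range(100):
    c = capped_next(c, last_returned=1000)
print(c)  # 2000
```

With the cap in place, the counter settles at a bounded positive value instead of reaching the zero state that made the continuation run synchronously.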
Gleb Natapov	73e3d0a283	messaging_service: enable reuseaddr on messaging service rpc Fixes #4943 Message-Id: <20190918152405.GV21540@scylladb.com>	2019-09-19 11:43:03 +03:00
Rafael Ávila de Espíndola	4d0916a094	commitlog: Handle gate_closed_exception Before this patch, if the _gate is closed, with_gate throws and forward_to is not executed. When the promise<> p is destroyed it marks its _task as a broken promise. What happens next depends on the branch. On master, we warn when the shared_future is destroyed, so this patch changes the warning from a broken_promise to a gate closed. On 3.1, we warn when the promises in shared_future::_peers are destroyed since they no longer have a future attached: The future that was attached was the "auto f" just before the with_gate call, and it is destroyed when with_gate throws. The net result is that this patch fixes the warning in 3.1. I will send a patch to seastar to make the warning on master more consistent with the warning in 3.1. Fixes #4394 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190917211915.117252-1-espindola@scylladb.com>	2019-09-17 23:41:21 +02:00
Avi Kivity	60656d1959	Update seastar submodule * seastar 84d8e9fe9b...2a526bb120 (1): > iotune: fix exception handling in case test file creation fails Fixes #5001.	2019-09-16 19:39:14 +03:00
Glauber Costa	c9f2d1d105	do not crash in user-defined operations if the controller is disabled Scylla currently crashes if we run manual operations like nodetool compact with the controller disabled. While we neither like nor recommend running with the controller disabled, due to some corner cases in the controller algorithm we are not yet at the point at which we can deprecate this, and are sometimes forced to disable it. The reason for the crash is that manual operations will invoke _backlog_of_shares, which returns the backlog needed to create a certain number of shares. That function scans the existing control points, but when we run without the controller there are no control points and we crash. Backlog doesn't matter if the controller is disabled, and the return value of this function will be immaterial in this case. So to avoid the crash, we return something right away if the controller is disabled. Fixes #5016 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-16 18:26:57 +02:00
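A hypothetical Python model of the guard described above. It assumes control points are `(backlog, shares)` pairs sorted by backlog and that the backlog is found by inverting a piecewise-linear curve; Scylla's actual controller may differ, and returning `0.0` is an arbitrary choice, since the commit only says "return something right away". Without the early return, indexing the empty list would raise `IndexError`, the Python analogue of the crash.

```python
from typing import List, Tuple

ControlPoint = Tuple[float, float]  # (backlog, shares), sorted by backlog


def backlog_of_shares(shares: float, control_points: List[ControlPoint],
                      enabled: bool = True) -> float:
    # Early return mirrors the fix: with the controller disabled there are no
    # control points, and the result is immaterial anyway.
    if not enabled or not control_points:
        return 0.0
    # At or below the first control point: clamp to its backlog.
    if shares <= control_points[0][1]:
        return control_points[0][0]
    # Invert the piecewise-linear backlog -> shares curve.
    for (b0, s0), (b1, s1) in zip(control_points, control_points[1:]):
        if s0 <= shares <= s1:
            if s1 == s0:
                return b0
            return b0 + (shares - s0) * (b1 - b0) / (s1 - s0)
    # Above the last control point: saturate at the largest backlog.
    return control_points[-1][0]
```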
Avi Kivity	d77171e10e	build: adjust libthread_db file name to match gdb expectations gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather than the file name of libthread_db-1.0.so, so use that name to store the file in the archive. Fixes #4996.	2019-09-16 14:48:42 +02:00
Avi Kivity	7502985112	Update seastar submodule * seastar b3fb4aaab3...84d8e9fe9b (8): > Use aio fsync if available > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb > lz4: use LZ4_decompress_safe > reactor: document seastar::remove_file() > core/file.hh: remove redundant std::move() > core/{file,sstring}: do not add `const` to return value > http/api_docs: always call parent constructor > Add input_stream blurb	2019-09-16 11:52:55 +03:00
Piotr Sarna	feec3825aa	view: degrade shutdown bookkeeping update failures log to warn Currently, if updating bookkeeping operations for view building fails, we log the error message and continue. However, during shutdown, some errors are more likely to happen due to existing issues like #4384. To differentiate actual errors from semi-expected errors during shutdown, the latter are now logged with a warning level instead of error. Fixes #4954	2019-09-16 10:13:06 +03:00
Piotr Sarna	f912122072	main: log unexpected errors thrown on shutdown (#4993) Shutdown routines are usually implemented via the deferred_action mechanism, which runs a function in its destructor. We thus expect the function to be noexcept, but unfortunately it's not always the case. Throwing in the destructor results in terminating the program anyway, but before we do that, the exception can be logged so it's easier to investigate and pinpoint the issue. Example output before the patch: INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder terminate called without an active exception Aborting on shard 0. Backtrace: 0x000000000184a9ad (...) Example output after the patch: INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!) terminate called without an active exception Aborting on shard 0. Backtrace: 0x000000000184a9ad (...)	2019-09-16 09:42:55 +03:00
Rafael Ávila de Espíndola	1d9ba4c79b	types: Simplify and explain from_varint_to_integer This simplifies the implementation of from_varint_to_integer and avoids relying on the fact that a static_cast from cpp_int to uint64_t seems to just keep the low 64 bits. The boost release notes (https://www.boost.org/users/history/version_1_67_0.html) imply that the conversion function should return the maximum value a uint64_t can hold if the original value is too large. The idea of using a bitwise & with ~0 is a suggestion from the boost release notes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-15 14:44:54 -07:00
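Python integers are arbitrary precision, much like boost's `cpp_int`, so the masking idea can be shown directly. In this sketch, `varint_to_integer` is a hypothetical name: it ANDs with an all-ones value of the target width (the `~0` idea) to keep only the low bits. It then assumes the target is a signed two's-complement type and reinterprets those bits accordingly; that last step is an assumption, not something the commit states.

```python
def varint_to_integer(value: int, bits: int = 64) -> int:
    """Truncate an arbitrary-precision integer to `bits` bits, two's complement."""
    mask = (1 << bits) - 1          # all-ones in the target width, i.e. "~0"
    low = value & mask              # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    # Reinterpret the low bits as a signed value of the target width.
    return low - (1 << bits) if low & sign_bit else low
```

Unlike a saturating conversion, the mask makes the wraparound explicit, so the result does not depend on how a library chooses to handle out-of-range values.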
Rafael Ávila de Espíndola	6611e9faf7	Add more cast tests These cover converting a varint to a value smaller than 64 bits. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-15 14:44:54 -07:00
Benny Halevy	c22ad90c04	scyllatop: livedata, metric: expire absent metrics Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 19:48:09 +03:00
Benny Halevy	6e807a56e1	scyllatop: livedata: update all metrics based on new discovered list Update the current results dictionary using the Metric.discover method. New results are added and missing results are marked as absent (this applies to both full metrics and specific keys). Previously, with Prometheus, each metric.update called query_list, resulting in O(n^2) when all metrics were updated, as in the scylla_top dtest, causing a test timeout when testing the debug build. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 19:45:34 +03:00
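A minimal Python sketch of the single-pass update described above. It is not scyllatop's code: `update_results` and the `{"value": ..., "absent": ...}` entry layout are hypothetical. The idea it shows is that one discovery call returning a dict keyed by symbol lets every metric be refreshed with O(1) lookups, instead of issuing one query per metric (the O(n^2) pattern).

```python
from typing import Dict


def update_results(results: Dict[str, dict], discovered: Dict[str, float]) -> None:
    """Single-pass refresh keyed by metric symbol (hypothetical structures)."""
    # Add or refresh every metric reported by the latest discovery.
    for symbol, value in discovered.items():
        entry = results.setdefault(symbol, {})
        entry["value"] = value
        entry["absent"] = False
    # Anything we knew about but discovery no longer reports is marked absent.
    for symbol, entry in results.items():
        if symbol not in discovered:
            entry["absent"] = True
```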
Benny Halevy	16de4600a0	scyllatop: metric: return discover results as dict So that we can easily search by symbol for updating multiple results in a single pass. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	02707621d4	scyllatop: metric: update_info in discover So that all metric information can be retrieved in a single pass. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	3861460d3b	scyllatop: metric: refactor update method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	99ab60fc27	scyllatop: metric: add_to_results In preparation for changing results to a dict, use a method to add a new metric to the results. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	b489556807	scyllatop: metric: refactor discover and discover_with_help Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	8f7c721907	scyllatop: livedata: get rid of _setupUserSpecifiedMetrics Add self._metricPatterns member and merge _setupUserSpecifiedMetrics with _initializeMetrics. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	c17aee0dd3	scyllatop: add debug logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Tomasz Grabiec	79935df959	commitlog: replay: Respect back-pressure from memtable space to prevent OOM Commit log replay was bypassing memtable space back-pressure, and if replay was faster than memtable flush, it could lead to OOM. The fix is to call database::apply_in_memory() instead of table::apply(). The former blocks when memtable space is full. Fixes #4982. Tests: - unit (release) - manual, replay with memtable flush failing and without failing Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 11:51:56 +03:00
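The back-pressure principle can be sketched with Python's `asyncio.Queue`, whose `put()` suspends while the queue is full. This is an analogy, not Scylla's dirty-memory manager: memtable space is modeled as a bounded queue, and `replay`, `flush`, and `main` are hypothetical names. Because the replayer awaits space instead of bypassing the limit, the amount of buffered data never exceeds the capacity, however slow the flusher is.

```python
import asyncio


async def replay(entries, memtable: asyncio.Queue, peak: list) -> None:
    for entry in entries:
        # put() suspends while the queue is full: this is the back-pressure
        # that replay previously bypassed.
        await memtable.put(entry)
        peak[0] = max(peak[0], memtable.qsize())


async def flush(memtable: asyncio.Queue, count: int, flushed: list) -> None:
    for _ in range(count):
        entry = await memtable.get()
        await asyncio.sleep(0)  # yield, simulating a flusher slower than replay
        flushed.append(entry)


async def main(n: int = 100, capacity: int = 4):
    memtable: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    peak, flushed = [0], []
    entries = list(range(n))
    await asyncio.gather(replay(entries, memtable, peak), flush(memtable, n, flushed))
    return peak[0], flushed


peak, flushed = asyncio.run(main())
print(peak <= 4, flushed == list(range(100)))  # True True
```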
Tomasz Grabiec	3c49b2960b	gdb: Introduce 'scylla memtables' Example output: (gdb) scylla memtables table "ks_truncate"."standard1": (memtable) 0x60c0005a5500: total=131072, used=131072, free=0, flushed=0 table "keyspace1"."standard1": (memtable) 0x60c0005a6000: total=5144444928, used=4512728524, free=631716404, flushed=0 (memtable) 0x60c0005a8a80: total=426901504, used=374294312, free=52607192, flushed=0 (memtable) 0x60c000eb6a80: total=0, used=0, free=0, flushed=0 table "system_traces"."sessions_time_idx": (memtable*) 0x60c0005a4d80: total=131072, used=131072, free=0, flushed=0 Message-Id: <1568133476-22463-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 10:39:55 +03:00
Kamil Braun	9bf4fe669f	Auto-expand replication_factor for NetworkTopologyStrategy (#4667) If the user supplies the 'replication_factor' to the 'NetworkTopologyStrategy' class, it will expand into a replication factor for each existing DC for their convenience. Resolves #4210. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-09-15 10:38:09 +03:00
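A Python sketch of the expansion described above, for options such as `{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}` given in a `CREATE KEYSPACE ... WITH replication = ...` statement. The function name is hypothetical. The behavior where an explicitly listed DC keeps its own value is an assumption; the commit does not say how the two forms interact.

```python
from typing import Dict, Iterable


def expand_replication_factor(options: Dict[str, str],
                              existing_dcs: Iterable[str]) -> Dict[str, str]:
    expanded = dict(options)
    rf = expanded.pop("replication_factor", None)
    if rf is None:
        return expanded  # nothing to expand
    for dc in existing_dcs:
        # Assumption: an explicitly listed DC keeps its own value.
        expanded.setdefault(dc, rf)
    return expanded
```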
Tomasz Grabiec	8517eecc28	Revert "Simplify db::cql_type_parser::parse" This reverts commit `7f64a6ec4b`. Fixes #5011 The reverted commit exposes #3760 for all schemas, not only those which have UDTs. The problem is that table schema deserialization now requires keyspace to be present. If the replica hasn't received schema changes which introduce the keyspace yet, the write will fail.	2019-09-12 12:45:21 +02:00
Nadav Har'El	67a07e9cbc	README.md: mention Alternator Mention on the top-level README.md that Scylla by default is compatible with Cassandra, but also has experimental support for DynamoDB's API. Provide links to alternator/alternator.md and alternator/getting-started.md with more information about this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190911080913.10141-1-nyh@scylladb.com>	2019-09-11 18:01:58 +03:00
Avi Kivity	c08921b55a	Merge "Alternator - Add support for DynamoDB Compatible API in Scylla" from Nadav & Piotr " In this patch set, written by Piotr Sarna and myself, we add Alternator - a new Scylla feature adding compatibility with the API of Amazon DynamoDB(TM). DynamoDB's API uses JSON-encoded requests and responses which are sent over an HTTP or HTTPS transport. It is described in detail on Amazon's site: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/ Our goal is that any application written to use Amazon DynamoDB could be run, unmodified, against Scylla with Alternator enabled. However, at this stage the Alternator implementation is incomplete, and some of DynamoDB's API features are not yet supported. The extent of Alternator's compatibility with DynamoDB is described in the document docs/alternator/alternator.md included in this patch set. The same document also describes Alternator's design (and also points to a longer design document). By default, Scylla continues to listen only to Cassandra API requests and not DynamoDB API requests. To enable DynamoDB-API compatibility, you must set the alternator-port configuration option (via command line or YAML) to the port on which you wish to listen for DynamoDB API requests. For more information, see docs/alternator/alternator.md. The document docs/alternator/getting-started.md also contains some examples of how to get started with Alternator. 
" * 'alternator' of https://github.com/nyh/scylla: (272 commits) Added comments about DAX, monitoring and more alternator: fix usage of client_state alternator-test: complete test_expected.py for rest of comparison operators alternator-test: reproduce bug in Expected with EQ of set value alternator: implement the Expected request parameter alternator: add returning PAY_PER_REQUEST billing mode alternator: update docs/alternator.md on GSI/LSI situation Alternator: Add getting started document for alternator move alternator.md to its own directory alternator-test: add xfail test for GSI with 2 regular columns alternator/executor.cc: Latencies should use steady_clock alternator-test: fix LSI tests alternator-test: fix test_describe_endpoints.py for AWS run alternator-test: test_describe_endpoints.py without configuring AWS alternator: run local tests without configuring AWS alternator-test: add LSI tests alternator-test: bump create table time limit to 200s alternator: add basic LSI support alternator: rename reserved column name "attrs" alternator: migrate make_map_element_restriction to string view ...	2019-09-11 18:01:05 +03:00
Dor Laor	7d639d058e	Added comments about DAX, monitoring and more	2019-09-11 18:01:05 +03:00
Nadav Har'El	c953aa3e20	alternator-test: complete test_expected.py for rest of comparison operators This patch adds tests for all the missing comparison operators in the Expected parameter (the old-style parameter for conditional operations). All these new tests are now xfailing on Alternator (and succeeding on DynamoDB), because these operators are not yet implemented in Alternator (so far we have only implemented EQ and BEGINS_WITH; the rest are easy but still need to be implemented). The test_expected.py is now hopefully comprehensive, covering the entire feature set of the "Expected" parameter and all its various cases and subcases. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190910092208.23461-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	23bb3948ee	alternator-test: reproduce bug in Expected with EQ of set value Our implementation of the "EQ" operator in Expected (conditional operation) just compares the JSON representation of the values. This is almost always correct, but unfortunately incorrect for sets, where two equal sets can list their elements in a different order. This patch just adds an (xfailing) test for this bug. The bug itself can be fixed in the future in one of several ways, including changing the implementation of EQ, or changing the serialization of sets so they'll always be sorted in the same way. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190909125147.16484-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
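The bug and one of the suggested remedies (normalizing set order before comparing) can be demonstrated in Python. `SS`, `NS` and `BS` are DynamoDB's string, number and binary set type descriptors. The functions are hypothetical illustrations, not Alternator code, and this sketch only fixes element ordering; it does not treat numerically equal strings such as "1" and "1.0" as equal.

```python
import json

SET_TYPES = {"SS", "NS", "BS"}  # DynamoDB string, number and binary set descriptors


def naive_eq(a: dict, b: dict) -> bool:
    # Compare serialized JSON, which is order-sensitive for set payloads.
    return json.dumps(a) == json.dumps(b)


def normalize(value: dict) -> dict:
    # A DynamoDB attribute value is a single-entry dict such as {"SS": [...]}.
    (type_tag, payload), = value.items()
    if type_tag in SET_TYPES:
        return {type_tag: sorted(payload)}
    return value


def set_aware_eq(a: dict, b: dict) -> bool:
    return (json.dumps(normalize(a), sort_keys=True)
            == json.dumps(normalize(b), sort_keys=True))
```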
Nadav Har'El	13d657b20d	alternator: implement the Expected request parameter In this patch we implement the Expected parameter for the UpdateItem, PutItem and DeleteItem operations. This parameter allows a conditional update - i.e., do an update only if the existing value of the item matches some condition. This is the older form of conditional updates, but is still used by many applications, including Amazon's Tic-Tac-Toe demo. As usual, we do not yet provide isolation guarantees for read-modify-write operations - the item is simply read before the modification, and there is no protection against concurrent operations. This will of course need to be addressed in the future. The Expected parameter has a relatively large number of variations, and most of them are supported by this code, except that currently only two comparison operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the documentation. The rest will be implemented later. This patch also includes comprehensive tests for the Expected feature. These tests are almost exhaustive, except for one missing part (labeled FIXME) - among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH operators. We'll later need to add checks for the rest of them as well. As usual, all the tests pass on Amazon DynamoDB, and after this patch all of them succeed on Alternator too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190905125558.29133-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	c5fc48d1ee	alternator: add returning PAY_PER_REQUEST billing mode In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST billing mode entry is returned when describing a table with a DescribeTable request. Also, one test case in test_describe_table.py is no longer marked XFAIL. Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b58eadd6c9	alternator: update docs/alternator.md on GSI/LSI situation Update docs/alternator.md on the current level of compatibility of our GSI and LSI implementation vs. DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190904120730.12615-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Eliran Sinvani	a6f600c54f	Alternator: Add getting started document for alternator This patch adds a getting-started document for Alternator. It explains how to start up a cluster that has an Alternator API port open, and how to test that it works using either an application or some simple, minimal Python scripts. The goal of the document is to get a user up and running with a Docker-based cluster with Alternator support in the shortest time possible.	2019-09-11 18:01:05 +03:00
Eliran Sinvani	573ff2de35	move alternator.md to its own directory As part of trying to make alternator more accessible to users, we expect more documents to be created so it seems like a good idea to give all of the alternator docs their own directory.	2019-09-11 18:01:05 +03:00
Piotr Sarna	6579a3850a	alternator-test: add xfail test for GSI with 2 regular columns When updating the second regular base column that is also a view key, the code in Scylla will assume it only needs to update an entry instead of replacing an old one. This leads to inconsistencies exposed in the test case. Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Amnon Heiman	722b4b6e98	alternator/executor.cc: Latencies should use steady_clock To get correct latency estimations, the executor should use a higher-resolution clock. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	b470137cea	alternator-test: fix LSI tests LSI tests are amended, so they no longer needlessly XPASS: * two xpassing tests are no longer marked XFAIL * there's an additional test for partial projection that succeeds on DynamoDB and does not yet work in Alternator Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	dc1d577421	alternator-test: fix test_describe_endpoints.py for AWS run The previous patch fixed test_describe_endpoints.py for a local run without an AWS configuration. But when running with "--aws", we do need to use that AWS configuration, and this patch fixes this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	897dffb977	alternator-test: test_describe_endpoints.py without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. The previous patch already fixed this for most tests, this patch fixes the same issue in test_describe_endpoints.py, which had a separate copy of the problematic code. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b39101cd04	alternator: run local tests without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. Also modified the README to be clearer, and more focused on the local runs. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708121420.7485-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	efff187deb	alternator-test: add LSI tests Cases for local secondary indexes are added - loosely based on test_gsi.py suite.	2019-09-11 18:01:05 +03:00
Piotr Sarna	927dc87b9c	alternator-test: bump create table time limit to 200s Unfortunately the previous 100s limit proved insufficient for creating tables with both local and global indexes attached to them. Empirically, 200s was chosen as a safe default, as the longest test oscillated around 100s with a deviation of 10s.	2019-09-11 18:01:05 +03:00
Piotr Sarna	2fcd1ff8a9	alternator: add basic LSI support With this patch, LocalSecondaryIndexes can be added to a table during its creation. The implementation is heavily shared with GlobalSecondaryIndexes and as such suffers from the same TODOs: projections, describing more details in DescribeTable, etc.	2019-09-11 18:01:05 +03:00
Nadav Har'El	7b8917b5cb	alternator: rename reserved column name "attrs" We currently reserve the column name "attrs" for a map of attributes, so the user is not allowed to use this name as a name of a key. We plan to lift this reservation in a future patch, but until we do, let's at least choose a more obscure name to forbid - in this patch ":attrs". It is even less likely that a user will want to use this specific name as a column name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190903133508.2033-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	ef7903a90f	alternator: migrate make_map_element_restriction to string view In order to elide unnecessary copying and allow more copy elision in the future, make_map_element_restriction helper function uses string_view instead of a const string reference. Message-Id: <1a3e82e7046dc40df604ee7fbea786f3853fee4d.1567502264.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	fc946ddfba	alternator: clean error, not a crash, on reserved column name Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user cannot use it as a key column name (key of the base table or GSI or LSI) because we use this name for the attribute map we add to the schema. Currently, if the user does attempt to create such a key column, the result is undefined (sometimes corrupt sstables, sometimes outright crashes). This patch fixes it to become a clean error, saying that this column name is currently reserved. The test test_create_table_special_column_name now cleanly fails, instead of crashing Scylla, so it is converted from "skip" to "xfail". Eventually we need to solve this issue completely (e.g., in rare cases rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME, or alternatively, instead of using a fixed name ATTRS_COLUMN_NAME, pick a name different from the key column names). But until we do, it is better to fail with a clear error than to crash. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190901102832.7452-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
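The change amounts to validating key column names up front instead of letting the clash reach storage. This is a hypothetical Python sketch: `validate_key_column_names` and the local `ValidationException` class are illustrative, and whether Alternator reports this case as a DynamoDB `ValidationException` is an assumption. The reserved name `"attrs"` is the one named in this commit.

```python
ATTRS_COLUMN_NAME = "attrs"  # name reserved at the time of this commit


class ValidationException(Exception):
    """Illustrative stand-in for the error returned to the client."""


def validate_key_column_names(key_names, reserved: str = ATTRS_COLUMN_NAME) -> None:
    # Reject the reserved name before any schema is created, so the user gets
    # a clean error instead of corrupt sstables or a crash.
    for name in key_names:
        if name == reserved:
            raise ValidationException(f"Column name '{name}' is currently reserved")
```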
Piotr Sarna	d64980f2ae	alternator-test: add initial test_condition_expression file The file initially consists of a very simple case that succeeds with `--aws` and expectedly fails without it, because the expression is not implemented yet.	2019-09-11 18:01:05 +03:00
Piotr Sarna	80edc00f62	alternator-test: add tests for unsupported expressions The test cases are marked XFAIL, as their expressions are not yet supported in alternator. With `--aws`, they pass.	2019-09-11 18:01:05 +03:00
Pekka Enberg	380a7be54b	dist/docker: Add support for Alternator This adds "alternator-address" and "alternator-port" configuration options to the Docker image, so people can enable Alternator with "docker run": docker run --name some-scylla -d <image> --alternator-port=8080 Message-Id: <20190902110920.19269-1-penberg@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	3fae8239fa	alternator: throw on unsupported expressions When an unsupported expression parameter is encountered - KeyConditionExpression, ConditionExpression or FilterExpression are such - alternator will return an error instead of ignoring the parameter.	2019-09-11 18:01:05 +03:00
Amnon Heiman	811df711fb	alternator/executor: update the latencies histogram This patch updates the latencies histogram for get, put, delete and update. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Amnon Heiman	4a6d1f5559	alternator/stats metrics: use labels and estimated histogram This patch makes two changes to the alternator stats: 1. It adds an estimated_histogram for the get, put, update and delete operations. 2. It changes the metrics naming so that the operation is a label, which makes the metrics easier to handle, operate on and display. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	de53ed7cdd	alternator_test: mark test_gsi_3 as passing The test_gsi_3, involving creating a GSI with two key columns which weren't previously a base key, now passes, so drop the "xfail" marker. We still have problems with such materialized views, but not in the simple scenario tested by test_gsi_3. Later we should create a new test for the scenario which still fails, if any. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	0e6338ffd9	alternator: allow creating GSI with 2 base regular columns Creating an underlying materialized view with 2 regular base columns is risky in Scylla, as the second column's liveness will not be correctly taken into account when ensuring view row liveness. Still, if specific conditions are met: * the regular base column value is always present in the base row * no TTLs are involved then the materialized view will behave as expected. Creating a GSI with 2 base regular columns issues a warning, as it should be performed with care. Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	3325e76c6f	alternator: fix default BillingMode It is important that BillingMode should default to PROVISIONED, as it does on DynamoDB. This allows old clients, which don't specify BillingMode at all, to specify ProvisionedThroughput as allowed with PROVISIONED. Also added a test case for this case (where BillingMode is absent). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829193027.7982-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	395a97e928	alternator: correct error on missing index or table When querying on a missing index, DynamoDB returns different errors in case the entire table is missing (ResourceNotFoundException) or the table exists and just the index is missing (ValidationException). We didn't make this distinction, and always returned ValidationException, but this confuses clients that expect ResourceNotFoundException - e.g., Amazon's Tic-Tac-Toe demo. This patch adds a test for the first case (the completely missing table) - we already had a test for the second case - and returns the correct error codes. As usual the test passes against DynamoDB as well as Alternator, ensuring they behave the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829174113.5558-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
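The lookup order that produces the two different errors can be shown with a small, hypothetical in-memory catalog. The function `lookup_index`, the catalog shape and the message texts are all illustrative. They are not DynamoDB's exact wording and not Alternator's implementation.

```python
# Hypothetical sketch: pick the right error for a Query on a missing table or index.
class ResourceNotFoundException(Exception):
    """Stand-in for DynamoDB's ResourceNotFoundException."""


class ValidationException(Exception):
    """Stand-in for DynamoDB's ValidationException."""


def lookup_index(catalog: dict, table_name: str, index_name: str) -> tuple:
    # catalog maps each table name to the set of its index names (assumed shape).
    if table_name not in catalog:
        # The whole table is missing: clients expect ResourceNotFoundException.
        raise ResourceNotFoundException(f"Table {table_name} not found")
    if index_name not in catalog[table_name]:
        # The table exists but the index does not: ValidationException.
        raise ValidationException(f"Index {index_name} not found on table {table_name}")
    return table_name, index_name
```

The table check has to come first. Otherwise a missing table would be reported as a missing index, which is exactly the confusion the commit fixes.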
Nadav Har'El	62c4ed8ee3	alternator: improve request logging We needlessly split the trace-level log message for the request to two messages - one containing just the operation's name, and one with the parameters. Moreover we printed them in the opposite order (parameters first, then the operation). So this patch combines them into one log message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829165341.3600-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	f755c22577	alternator-test: reproduce bug with using "attrs" as key column name Alternator puts in the Scylla table a column called "attrs" for all the non-key attributes. If the user happens to choose the same name, "attrs", for one of the key columns, the result of writing two different columns with the same name is a mess and corrupt sstables. This test reproduces this bug (and works against DynamoDB of course). Because the test doesn't cleanly fail, but rather leaves Scylla in a bad state from which it can't fully recover, the test is marked as "skip" until we fix this bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190828135644.23248-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	6b27eaf4d0	alternator: remove redundant key checks in UpdateItem Updating key columns is not allowed in UpdateItem requests, but the series introducing GSI support for regular columns also introduced redundant duplicate checks of this kind. This condition is already checked in the resolve_update_path helper function, and the existing test_update_expression_cannot_modify_key test makes sure that the condition is checked. Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	04a117cda3	alternator-test: improve test_update_expression_cannot_modify_key The test test_update_expression_cannot_modify_key() verifies that an update expression cannot modify one of the key columns. The existing test only tried the SET and REMOVE actions - this patch makes the test more complete by also testing the ADD and DELETE actions. This patch also makes the expected exception more picky - we now expect that the exception message contains the word "key" (as it, indeed, does on both DynamoDB and Alternator). If we get any other exception, there may be a problem. The test passed before this patch, and passes now as well - it's just stricter now. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829135650.30928-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	81a97b2ac0	alternator-test: add test case for GSI with both keys A case which adds a global secondary index on a table with both hash and sort keys is added.	2019-09-11 18:01:05 +03:00
Piotr Sarna	615603877c	alternator: use from_single_value instead of from_singular in ck The code previously used clustering_key::from_singular() to compute a clustering key value. It works fine, but has two issues: 1. involves one redundant deserialization stage compared to from_single_value 2. does not work with compound clustering keys, which can appear when using indexes	2019-09-11 18:01:05 +03:00
Piotr Sarna	4474ceceed	alternator-test: enable passing tests With more GSI features implemented, tests with XPASS status are promoted to being enabled. One test case (test_gsi_describe) is partially done as DescribeTable now contains index names, but we could try providing more attributes (e.g. IndexSizeBytes and ItemCount from the test case), so the test is left in the XFAIL state.	2019-09-11 18:01:05 +03:00
Piotr Sarna	f922d6d771	alternator: Add 'mismatch' to serialization error message In order to match the tests and origin more properly, the error message for mismatched types is updated so it contains the word 'mismatch'.	2019-09-11 18:01:05 +03:00
Piotr Sarna	9dceea14f9	alternator: add describing GSI in DescribeTable The DescribeTable request now contains the list of index names as well. None of the attributes of the list are marked as 'required' in the documentation, so currently the implementation provides index names only.	2019-09-11 18:01:05 +03:00
Piotr Sarna	938a06e4c0	alternator: allow adding GSI-related regular columns to schema In order to be able to create a Global Secondary Index over a regular column, this column is upgraded from being a map entry to being a full member of the schema. As such, it's possible to use this column definition in the underlying materialized view's key.	2019-09-11 18:01:05 +03:00
Piotr Sarna	2a123925ca	alternator: add handling regular columns with schema definitions In order to prepare alternator for adding regular columns to schema, i.e. in order to create a materialized view over them, the code is changed so that updating no longer assumes that only keys are included in the table schema.	2019-09-11 18:01:05 +03:00
Piotr Sarna	befa2fdc80	alternator: start fetching all regular columns Since in the future we may want to have more regular columns in alternator tables' schemas, the code is changed accordingly, so all regular columns will be fetched instead of just the attribute map.	2019-09-11 18:01:05 +03:00
Piotr Sarna	53044645aa	alternator: avoid creating empty collection mutations If no regular column attributes are passed to PutItem, the attr collector serializes an empty collection mutation nonetheless and sends it. It's redundant, so instead, if the attr collector is empty, the collection does not get serialized and sent to replicas.	2019-09-11 18:01:05 +03:00
Nadav Har'El	317954fe19	alternator-test: add license blurbs Add copyright and license blurbs to all alternator-test source files. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825161018.10358-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	c9eb9d9c76	alternator: update license blurbs Update all the license blurbs to the one we use in the open-source Scylla project, licensed under the AGPL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825160321.10016-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	d6e671b04f	alternator: add initial tracing to requests Each request provides basic tracing information about itself. Example output from tracing: cqlsh> select request, parameters from system_traces.sessions where session_id = 39813070-c4ea-11e9-8572-000000000000; request | parameters ------------------+----------------------------------------------------- Alternator Query | {'query': '{"TableName": "alternator_test_15664", "KeyConditions": {"p": {"AttributeValueList": [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'} cqlsh> select session_id, activity from system_traces.events where session_id = 39813070-c4ea-11e9-8572-000000000000; session_id | activity --------------------------------------+----------------------------- 39813070-c4ea-11e9-8572-000000000000 | Querying 39813070-c4ea-11e9-8572-000000000000 | Performing a database query	2019-09-11 18:01:05 +03:00
Piotr Sarna	cb791abb9d	alternator: enable query tracing Probabilistic tracing can be enabled via REST API. Alternator will from now on create tracing sessions for its operations as well. Examples: # trace around 0.1% of all requests curl -X POST http://localhost:10000/storage_service/trace_probability?probability=0.001 # trace everything curl -X POST http://localhost:10000/storage_service/trace_probability?probability=1	2019-09-11 18:01:05 +03:00
Piotr Sarna	6c8c31bfc9	alternator: add client state Keeping an instance of client_state is a convenient way of being able to use tracing for alternator. It's also currently used in paging, so adding a client state to executor removes the need of keeping a dummy value.	2019-09-11 18:01:05 +03:00
Piotr Sarna	1ca9dc5d47	alternator: use correct string views in serialization String views used in JSON serialization should use not only the pointer returned by rapidjson, but also the string length, as it may contain \0 characters. Additionally, one unnecessary copy is elided.	2019-09-11 18:01:05 +03:00
Nadav Har'El	32b898db7b	alternator: docs/alternator.md: link to a longer document Add a link to a longer document (currently, around 40 pages) about DynamoDB's features and how we implemented or may implement them in Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825121201.31747-2-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	a5c3d11ccb	alternator: document choice of RF After changing the choice of RF in a previous patch, let's update the relevant part of docs/alternator.md. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825121201.31747-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	d20ec9f492	alternator: expand docs/alternator.md Expand docs/alternator.md with new sections about how to run Alternator, and a very brief introduction to its design. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818164628.12531-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	9b0ef1a311	alternator: refuse CreateTable if uses unsupported features If a user tries to create a table with an unsupported feature - a local secondary index, a user-defined encryption key or supporting streams (CDC), let's refuse the table creation, so the application doesn't continue thinking this feature is available to it. The "Tags" feature is also not supported, but it is more harmless (it is used mostly for accounting purposes) so we do not fail the table creation because of it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818125528.9091-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
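The refusal policy can be sketched as a pre-check on the CreateTable request. This is a hypothetical Python sketch and `check_supported_features` is an illustrative name. It assumes that "supporting streams" means `StreamSpecification.StreamEnabled` is true, and it approximates "a user-defined encryption key" as any enabled `SSESpecification`. The commit does not spell out the exact fields Alternator inspects.

```python
# Hypothetical sketch: refuse CreateTable requests that ask for unsupported features.
class ValidationException(Exception):
    """Stand-in for DynamoDB's ValidationException."""


def check_supported_features(request: dict) -> None:
    if request.get("LocalSecondaryIndexes"):
        raise ValidationException("LocalSecondaryIndexes are not supported")
    # Assumption: any enabled SSESpecification is treated as a custom-key request.
    if request.get("SSESpecification", {}).get("Enabled"):
        raise ValidationException("SSESpecification is not supported")
    # Assumption: streams are requested via StreamSpecification.StreamEnabled.
    if request.get("StreamSpecification", {}).get("StreamEnabled"):
        raise ValidationException("StreamSpecification is not supported")
    # "Tags" is deliberately tolerated: it is mostly used for accounting.
```

Failing early here is meant to stop an application from assuming, for example, that a stream exists when it was never created.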
Piotr Sarna	ab25472034	alternator: migrate to visitor pattern in serialization Types can now be processed with a visitor pattern, which is neater than a chain of if statements. Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	42d2910f2c	alternator: add from_string with raw pointer to rjson from_string is a family of functions that create rjson values from strings - now it's extended to accept a raw pointer and size. Message-Id: <d443e2e4dcc115471202759ecc3641ec902ed9e4.1566386758.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	2f53423a2f	alternator: automatically choose RF: 1 or 3 In CQL, before a user can create a table, they must create a keyspace to contain this table and, among other things, specify this keyspace's RF. But in the DynamoDB API, there is no "create keyspace" operation - the user just creates a table, and there is no way, and no opportunity, to specify the requested RF. Presumably, Amazon always uses the same RF for all tables, most likely 3, although this is not officially documented anywhere. The existing code creates the keyspace during Scylla boot, with RF=1. This RF=1 always works, and is a good choice for a one-node test run, but was a really bad choice for a real cluster with multiple nodes, so this patch fixes this choice: With this patch, the keyspace creation is delayed - it doesn't happen when the first node of the cluster boots, but only when the user creates the first table. Presumably, at that time, the cluster is already up, so at that point we can make the obvious choice automatically: a one-node cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of RF is logged - and the choice of RF=1 is considered a warning. Note that with this patch, keyspace creation is still automatic as it was before. The user may manually create the keyspace via CQL, to override this automatic choice. In the future we may also add additional keyspace configuration options via configuration flags or new REST requests, and the keyspace management code will also likely change as we start to support clusters with multiple regions and global tables. But for now, I think the automatic method is easiest for users who want to test-drive Alternator without reading lengthy instructions on how to set up the keyspace. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190820180610.5341-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
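The automatic RF choice reduces to a small function of the cluster size at first-table creation. This is a hypothetical Python sketch and `choose_replication_factor` is an illustrative name. The commit only states the 1-node and >=3-node cases. Treating a 2-node cluster as RF=1 is an assumption made here.

```python
import logging

log = logging.getLogger("alternator-rf-sketch")


def choose_replication_factor(node_count: int) -> int:
    # Called when the first table is created, when the cluster is presumably up.
    if node_count >= 3:
        return 3
    # Assumption: any smaller cluster (including the 2-node case, which the
    # commit message does not cover) falls back to RF=1, logged as a warning.
    log.warning("Creating keyspace with RF=1 on a %d-node cluster", node_count)
    return 1
```

Users who want a different RF can still create the keyspace manually via CQL before creating the first table.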
Piotr Sarna	1a1935eb72	alternator-test: add a test for wrong BEGINS_WITH target type The test ensures that passing a non-compatible type to BEGINS_WITH, e.g. a number, results in a validation error. Tested both locally and remotely. Message-Id: <894a10d3da710d97633dd12b6ac54edccc18be82.1566291989.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b7b998568f	alternator: add to CreateTable verification of BillingMode setting We allow BillingMode to be set to either PAY_PER_REQUEST (the default) or PROVISIONED, although neither mode is fully implemented: In the former case the payment isn't accounted, and in the latter case the throughput limits are not enforced. But other settings for BillingMode are now refused, and we add a new test to verify that. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818122919.8431-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	66a2af4f7d	alternator-test: require a new-enough boto library The alternator tests want to exercise many of the DynamoDB API features, so they need a recent enough version of the client libraries, boto3 and botocore. In particular, only in botocore 1.12.54, released a year ago, was support for BillingMode added - and we rely on this to create pay-per-request tables for our tests. Instead of letting the user run with an old version of this library and get dozens of mysterious errors, in this patch we add a test to conftest.py which cleanly aborts the test if the libraries aren't new enough, and recommends a "pip" command to upgrade these libraries. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190819121831.26101-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
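The core of such a version gate is comparing versions numerically rather than as strings, since "1.9.100" sorts after "1.12.54" as text. This is a hypothetical Python sketch: `version_tuple` and `botocore_new_enough` are illustrative helpers, not the code added to conftest.py. Only the 1.12.54 threshold comes from the commit message.

```python
# Hypothetical sketch of a botocore version gate; helper names are illustrative.
MIN_BOTOCORE = (1, 12, 54)  # first botocore release with BillingMode support


def version_tuple(version: str) -> tuple:
    # Keep the leading numeric part of each component: "1.12.54rc1" -> (1, 12, 54).
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def botocore_new_enough(version: str) -> bool:
    # Tuple comparison is numeric per component, unlike plain string comparison.
    return version_tuple(version) >= MIN_BOTOCORE
```

In a conftest.py one could pass `botocore.__version__` to this helper and call `pytest.exit()` with a "pip install --upgrade" hint when it returns False.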
Nadav Har'El	64bf2b29a8	alternator-test: exhaustive tests for DescribeTable operation The DescribeTable operation is currently implemented to return the minimal information that libraries and applications usually need from it, namely verifying that some table exists. However, this operation is actually supposed to return a lot more information fields (e.g., the size of the table, its creation date, and more) which we currently don't return. This patch adds a new test file, test_describe_table.py, testing all these additional attributes that DescribeTable is supposed to return. Several of the tests are marked xfail (expected to fail) because we did not implement these attributes yet. The test is exhaustive except for attributes that have to do with four major features which will be tested together with these features: GSI, LSI, streams (CDC), and backup/restore. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190816132546.2764-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	fbd2f5077d	alternator: enable timeouts on requests Currently Alternator starts all Scylla requests (including both reads and writes) without any timeout set. Because of bugs and/or network problems, requests can theoretically hang and waste Scylla resources for hours, long after the client has given up on them and closed their connection. The DynamoDB protocol doesn't let a user specify which timeout to use, so we should just use something "reasonable", in this patch 10 seconds. Remember that all DynamoDB read and write requests are small (even scans just scan a small piece), so 10 seconds should be above and beyond anything we actually expect to see in practice. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190812105132.18651-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b2bd3bbc1f	alternator: add "--alternator-address" configuration parameter So far we had the "--alternator-port" option allowing to configure the port on which the Alternator server listens, but the server always listened on all addresses. It is important to also be able to configure the listen address - it is useful in tests running several instances of Scylla on the same machine, and useful in multi-homed machines with several interfaces. So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0 (to listen on all interfaces). It works like the many other "--*-address" options that Scylla already has. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190808204641.28648-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	ea41dd2cf8	alternator: docs/alternator.md more about filtering support Give more details about what is, and what isn't, currently supported in filtering of Scan (and Query) results. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190811094425.30951-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	88eed415bd	alternator: fix indentation It turns out that recent rjson patches introduced some buggy tabs instead of spaces due to bad IDE configuration. The indentation is restored to spaces.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3c11428d8d	alternator-test: add QueryFilter validation cases QueryFilter validation was lately supplemented with non-key column checks, which is hereby tested.	2019-09-11 18:01:05 +03:00
Piotr Sarna	0e0dc14302	alternator-test: add scan case for key equality filtering With key equality filtering enabled, a test case for scanning is provided.	2019-09-11 18:01:05 +03:00
Piotr Sarna	f1641caa41	alternator: add filtering for key equality Until now, filtering in alternator was possible only for non-key column equality relations. This commit adds support for equality relations for key columns.	2019-09-11 18:01:05 +03:00
Piotr Sarna	a2828f9daa	alternator: add validation to QueryFilter QueryFilter, according to docs, can only contain non-key attributes.	2019-09-11 18:01:05 +03:00
Piotr Sarna	d055658fff	alternator: add computing key bounds from filtering Alternator allows passing hash and sort key restrictions as filters - it is, however, better to incorporate these restrictions directly into partition and clustering ranges, if possible. It's also necessary, as optimizations inside restrictions_filter assume that it will not be fed unneeded rows - e.g. if filtering is not needed on partition key restrictions, they will not be checked.	2019-09-11 18:01:05 +03:00
Piotr Sarna	9c05051b59	alternator: extract getting key value subfunction Currently the only utility function for getting key bytes from JSON was to parse a document with the following format: "key_column_name" : { "key_column_type" : VALUE }. However, it's also useful to parse only the inner document, i.e.: { "key_column_type" : VALUE }.	2019-09-11 18:01:05 +03:00
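The split between the outer and the inner document can be sketched as two helpers, the outer one delegating to the inner one. The names `get_key_value` and `get_key_from_typed_value` are hypothetical Python stand-ins for the C++ subfunction extracted in this commit. The values are returned as their JSON strings, without conversion to key bytes.

```python
# Hypothetical sketch of the two key-parsing helpers; names are illustrative.
def get_key_from_typed_value(typed_value: dict, expected_type: str) -> str:
    # Inner form: {"key_column_type": VALUE}, e.g. {"S": "abc"} or {"N": "42"}.
    if len(typed_value) != 1:
        raise ValueError("expected exactly one type descriptor")
    type_name, value = next(iter(typed_value.items()))
    if type_name != expected_type:
        raise ValueError(f"type mismatch: expected {expected_type}, got {type_name}")
    return value


def get_key_value(item: dict, column_name: str, expected_type: str) -> str:
    # Outer form: {"key_column_name": {"key_column_type": VALUE}}.
    return get_key_from_typed_value(item[column_name], expected_type)
```

The inner helper is what a filter needs when it already holds a bare `{"S": ...}` value, for example an element of an `AttributeValueList`.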
Piotr Sarna	c84019116a	alternator: make make_map_element_restriction static The function has no outside users and thus does not need to be exposed.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3ee99a89b1	alternator: register filtering metrics Three metrics related to filtering are added to alternator: - total rows read during filtering operations - rows read and matched by filtering - rows read and dropped by filtering	2019-09-11 18:01:05 +03:00
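The three counters relate through a simple invariant: every row read is either matched or dropped. A minimal, hypothetical Python sketch is below. `FilteringStats` and `filter_rows` are illustrative names, not the real stats struct.

```python
from dataclasses import dataclass


@dataclass
class FilteringStats:
    # Hypothetical counterparts of the three filtering metrics.
    rows_read: int = 0
    rows_matched: int = 0
    rows_dropped: int = 0


def filter_rows(rows, predicate, stats: FilteringStats) -> list:
    kept = []
    for row in rows:
        stats.rows_read += 1
        if predicate(row):
            stats.rows_matched += 1
            kept.append(row)
        else:
            stats.rows_dropped += 1
    return kept
```

A high ratio of dropped to read rows is what these metrics are meant to expose: a filter that reads much more than it returns.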
Piotr Sarna	b3e35dab26	alternator: add bumping filtering stats When filtering is used in querying or scanning, the number of total filtered rows is added to stats.	2019-09-11 18:01:05 +03:00
Piotr Sarna	a6d098d3eb	alternator: add cql_stats to alternator stats Some underlying operations (e.g. paging) make use of cql_stats structure from CQL3. As such, cql_stats structure is added to alternator stats in order to gather and use these statistics.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3ae54892cd	alternator: fix a comment typo s/Miscellenous/Miscellaneous/g	2019-09-11 18:01:05 +03:00
Piotr Sarna	ccf778578a	alternator: register read-before-write stats Read-before-write stat counters were already introduced, but the metrics need to be added to a metric group as well in order to be available for users.	2019-09-11 18:01:05 +03:00
Nadav Har'El	6f81d0cb15	alternator: initial support for GSI This patch adds partial support for GSI (Global Secondary Index) in Alternator, implemented using a materialized view in Scylla. This initial version only supports the specific cases of the index indexing a column which was already part of the base table's key - e.g., indexing what used to be a sort key (clustering key) in the base table. Indexing of non-key attributes (which today live in a map) is not yet supported in this version. Creation of a table with GSIs is supported, and so is deleting the table. UpdateTable which adds a GSI to an existing table is not yet supported. Query and Scan operations on the index are supported. DescribeTable does not yet list the GSIs as it should. Seven previously-failing tests now pass, so their "xfail" tag is removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190808090256.12374-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	33611acf44	alternator: add stats for read-before-write A simple metric counting how many read-before-writes were executed is added. Message-Id: <d8cc1e9d77e832bbdeff8202a9f792ceb4f1e274.1565274797.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	ae59340c15	alternator: complement rjson.hh comments Some comments in rjson.hh header file were not clear and are hereby amended. Message-Id: <7fa4e2cf39b95c176af31fe66f404a6a51a25bec.1565275276.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	5eb583ab09	alternator: remove missing key FIXME The case for missing key in update_item was already properly fixed along with migrating from libjsoncpp to rapidjson, but one FIXME remained in the code by mistake. Message-Id: <94b3cf53652aa932a661153c27aa2cb1207268c7.1565271432.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	436f806341	alternator: remove decimal_type FIXME Decimal precision problems were already solved by commit d5a1854d93c9448b1d22c2d02eb1c46a286c5404, but one FIXME remained in the code by mistake. Message-Id: <381619e26f8362a8681b83e6920052919acf1142.1565271198.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	b29b753196	alternator: add comments to rjson The rapidjson library needs to be used with caution in order to provide maximum performance and avoid undefined behavior. Comments added to rjson.hh describe provided methods and potential pitfalls to avoid. Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	7b02c524d0	alternator: remove a pointer-based workaround for future<json> With libjsoncpp we were forced to work around the problem of non-noexcept constructors by using an intermediate unique pointer. Objects provided by rapidjson have correct noexcept specifiers, so the workaround can be dropped.	2019-09-11 18:01:04 +03:00
Piotr Sarna	cb29d6485e	alternator: migrate to rapidjson library Profiling alternator implied that JSON parsing takes up a fair amount of CPU, and as such should be optimized. libjsoncpp is a standard library for handling JSON objects, but it also proves slower than rapidjson, which is hereby used instead. The results indicated that libjsoncpp used roughly 30% of CPU for a single-shard alternator instance under stress, while rapidjson dropped that usage to 18% without optimizations. Future optimizations should include eliding object copying, string copying and perhaps experimenting with different JSON allocators.	2019-09-11 18:01:04 +03:00
Piotr Sarna	0fd1354ef9	alternator: add handling rapidjson errors in the server If a JSON parsing error is encountered, it is transformed to a validation exception and returned to the user in JSON form.	2019-09-11 18:01:04 +03:00
Piotr Sarna	7064b3a2bf	alternator: add rapidjson helper functions Migrating from libjsoncpp to rapidjson proved to be beneficial for parsing performance. As a first step, a set of helper functions is provided to ease the migration process.	2019-09-11 18:01:04 +03:00
Piotr Sarna	0b0bfc6e54	alternator: add missing namespaces to status_type error.hh file implicitly assumed that seastar:: namespace is available when it's included, which is not always the case. To remedy that, seastar::httpd namespace is used explicitly.	2019-09-11 18:01:04 +03:00
Nadav Har'El	56309db085	alternator: correct catch table-already-exists exception Our CreateTable handler assumed that the function migration_manager::announce_new_column_family() returns a failed future if the table already exists. But in some of our code branches, this is not the case - the function itself throws instead of returning a failed future. The solution is to use seastar::futurize_apply() to handle both possibilities (direct exception or future holding an exception). This fixes a failure of the test_table.py::test_create_table_already_exists test case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:04 +03:00
Nadav Har'El	d74b203dee	alternator: add docs/alternator.md This adds a new document, docs/alternator.md, about Alternator. The scope of this document should be expanded in the future. We begin here by introducing Alternator and its current compatibility level with Amazon DynamoDB, but it should later grow to explain the design of Alternator and how it maps the DynamoDB data model onto Scylla's. Whether this document should remain a short high-level overview, or a long and detailed design document, remains an open question. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190805085340.17543-1-nyh@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	75ee13e5f2	dependencies: add rapidjson The rapidjson fast JSON parsing library is used instead of libjsoncpp in the Alternator subproject. [avi: update toolchain image to include the new dependency] Message-Id: <a48104dec97c190e3762f927973a08a74fb0c773.1564995712.git.sarna@scylladb.com>	2019-09-11 18:00:44 +03:00
Nadav Har'El	5eaf73a292	alternator: fix sharing of a seastar::shared_ptr between threads The function attrs_type() returns a supposed singleton, but because it is a seastar::shared_ptr we can't use the same one for multiple threads, and need to use a separate one per thread. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190804163933.13772-1-nyh@scylladb.com>	2019-09-11 16:06:05 +03:00
Nadav Har'El	1b1ede9288	alternator: fix cross-shard use of CQL type objects The CQL type singletons like utf8_type et al. are separate for separate shards and cannot be used across shards. So whatever hash tables we use to find them also need to be per-shard. If we fail to do this, we get errors running the debug build with multiple shards. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190804165904.14204-1-nyh@scylladb.com>	2019-09-11 16:05:39 +03:00
Nadav Har'El	7eae889513	alternator-test: some more GSI tests Expand the GSI test suite. The most important new test is test_gsi_key_not_in_index(), where the index's key includes just one of the base table's key columns, but not a second one. In this case, the Scylla implementation will nevertheless need to add the second key column to the view (as a clustering key), even though it isn't considered a key column by the DynamoDB API. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190718085606.7763-1-nyh@scylladb.com>	2019-09-11 16:05:38 +03:00
Nadav Har'El	10ad60f7de	alternator: ListTables should not list materialized views Our ListTables implementation uses get_column_families(), which lists both base tables and materialized views. We will use materialized views to implement DynamoDB's secondary indexes, and those should not be listed in the results of ListTables. The patch also includes a test for this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190717133103.26321-2-nyh@scylladb.com>	2019-09-11 16:04:29 +03:00
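The fix amounts to filtering out views when listing column families. This is a hypothetical Python sketch. The `ColumnFamily` record and its `is_view` flag stand in for schema information Scylla keeps internally, and the view name used in the test is made up.

```python
from typing import NamedTuple


class ColumnFamily(NamedTuple):
    # Hypothetical record; in Scylla this information comes from the schema.
    name: str
    is_view: bool


def list_tables(column_families) -> list:
    # Materialized views back secondary indexes and must not be listed as tables.
    return [cf.name for cf in column_families if not cf.is_view]
```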
Nadav Har'El	676ada4576	alternator-test: move list_tables to util.py The list_tables() utility function was used only in test_table.py but I want to use it elsewhere too (in GSI test) so let's move it to util.py. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190717133103.26321-1-nyh@scylladb.com>	2019-09-11 16:04:28 +03:00
Piotr Sarna	f3963865f5	alternator: make set_sum exception more user-friendly As in case of set_diff, an exception message in set_sum should include the user-provided request (ADD) rather than our internal helper function set_sum.	2019-09-11 16:03:27 +03:00
Piotr Sarna	9dd8644e4a	alternator-tests: enable DELETE case for sets UpdateExpression's case for DELETE operation for sets is enabled.	2019-09-11 16:03:26 +03:00
Piotr Sarna	2b215b159c	alternator: implement set DELETE UpdateExpression's DELETE operation for set is implemented on top of set_diff helper function.	2019-09-11 16:02:25 +03:00
Piotr Sarna	fe72a6740c	alternator: add set difference helper function A function for computing the set difference of two sets represented as JSON is added.	2019-09-11 16:01:03 +03:00
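Over DynamoDB's JSON set encoding (`{"SS": [...]}`, `{"NS": [...]}`, `{"BS": [...]}`), a set difference can be sketched as follows. This is a hypothetical Python sketch, and `unwrap_set`/`set_diff` only mirror the helper names mentioned in these commits. Returning `None` for an empty result reflects the fact that DynamoDB does not store empty sets. How the caller then removes the attribute is assumed. Numbers are compared as strings here, so "1" and "1.0" are treated as different elements.

```python
from typing import Optional

SET_TYPES = {"SS", "NS", "BS"}


def unwrap_set(value: dict) -> tuple:
    # A DynamoDB set in JSON looks like {"SS": ["a", "b"]} (or "NS" / "BS").
    if len(value) != 1:
        raise ValueError("malformed set value")
    type_name, elements = next(iter(value.items()))
    if type_name not in SET_TYPES:
        raise ValueError(f"{type_name} is not a set type")
    return type_name, elements


def set_diff(current: dict, to_remove: dict) -> Optional[dict]:
    current_type, current_elems = unwrap_set(current)
    remove_type, remove_elems = unwrap_set(to_remove)
    if current_type != remove_type:
        # Name the user-visible operation (DELETE), not this internal helper.
        raise ValueError(f"DELETE: type mismatch between {current_type} and {remove_type}")
    removed = set(remove_elems)
    remaining = [e for e in current_elems if e not in removed]
    # Empty sets cannot be stored; None signals that the attribute should go away.
    return {current_type: remaining} if remaining else None
```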
Nadav Har'El	e13c56be0b	alternator: fail attempt to create table with GSI Although we do not support GSI yet, until now we silently ignored CreateTable's GSI parameter, and the user wouldn't know the table wasn't created as intended. In this patch, GSI is still unsupported, but now CreateTable will fail with an error message that GSI is not supported. We need to change some of the tests which test the error path, and expect an error - but should not consider a table creation error as the expected error. After this patch, test_gsi.py still fails all the tests on Alternator, but much more quickly :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711161420.18547-1-nyh@scylladb.com>	2019-09-11 16:00:01 +03:00
Piotr Sarna	336c90daaa	alternator-test: add stub case for set add duplication The test case for adding two sets with common values is added. This case is a stub, because boto3 transforms the result into a Python set, which removes duplicates on its own. A proper TODO is left in order to migrate this case to a lower-level API and check the returned JSON directly for lack of duplicates.	2019-09-11 16:00:00 +03:00
Piotr Sarna	67c95cb303	alternator-test: enable tests for ADD operation Tests for UpdateExpression::ADD are enabled.	2019-09-11 15:59:59 +03:00
Piotr Sarna	f29c2f6895	alternator: add ADD operation UpdateExpression is now able to perform ADD operation on both numbers and sets.	2019-09-11 15:59:00 +03:00
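The two flavours of ADD can be sketched side by side. This is a hypothetical Python sketch and `add_values` is an illustrative name. It follows DynamoDB's documented semantics, where numbers travel as strings and ADD on a missing attribute starts from 0 or from an empty set. Whether this exact commit handled the missing-attribute case is an assumption. `Decimal` is used so that, for instance, 0.1 + 0.2 stays exact.

```python
from decimal import Decimal
from typing import Optional

SET_TYPES = {"SS", "NS", "BS"}


def add_values(old: Optional[dict], delta: dict) -> dict:
    delta_type, delta_val = next(iter(delta.items()))
    if old is None:
        # ADD on a missing attribute: a number starts from 0, a set starts empty.
        return {delta_type: list(delta_val) if delta_type in SET_TYPES else delta_val}
    old_type, old_val = next(iter(old.items()))
    if old_type != delta_type:
        raise ValueError(f"ADD: type mismatch between {old_type} and {delta_type}")
    if old_type == "N":
        # DynamoDB numbers are strings on the wire; Decimal avoids float rounding.
        return {"N": str(Decimal(old_val) + Decimal(delta_val))}
    if old_type in SET_TYPES:
        # Set sum: keep existing order and append only new elements.
        seen = set(old_val)
        merged = list(old_val)
        for elem in delta_val:
            if elem not in seen:
                seen.add(elem)
                merged.append(elem)
        return {old_type: merged}
    raise ValueError("ADD supports only numbers and sets")
```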
Piotr Sarna	a5f2926056	alternator: add helper function for adding sets A helper function that allows creating a set sum out of two sets represented in JSON is added.	2019-09-11 15:57:41 +03:00
Piotr Sarna	18686ff288	alternator: add unwrap_set It will be needed later to implement adding sets.	2019-09-11 15:56:15 +03:00
Piotr Sarna	09993cf857	alternator: add get_item_type_string helper function It will be useful later for ensuring that parameters for various functions have matching types.	2019-09-11 15:52:31 +03:00
Nadav Har'El	d54c82209c	alternator: fix Query verification of appropriate key columns The Query operation's conditions can be used to search for a particular hash key or both hash and sort keys - but not any other combinations. We previously forgot to verify most errors, so in this patch we add missing verifications - and tests to confirm we fail the query when DynamoDB does. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711132720.17248-1-nyh@scylladb.com>	2019-09-11 15:51:27 +03:00
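The allowed shapes (hash key alone, or hash key plus sort key) can be checked against the `KeyConditions` map. This is a hypothetical Python sketch. `check_key_conditions` and its messages are illustrative, not DynamoDB's exact error texts. The requirement that the hash key use `EQ` is DynamoDB's documented rule, added here as an assumption about what the verification covers.

```python
class ValidationException(Exception):
    """Stand-in for DynamoDB's ValidationException."""


def check_key_conditions(conditions: dict, hash_key: str, sort_key=None) -> None:
    # conditions follows the KeyConditions shape, e.g.
    # {"p": {"AttributeValueList": [{"S": "x"}], "ComparisonOperator": "EQ"}}
    if hash_key not in conditions:
        raise ValidationException(f"missing condition on hash key {hash_key}")
    if conditions[hash_key].get("ComparisonOperator") != "EQ":
        raise ValidationException("the hash key condition must use EQ")
    allowed = {hash_key} if sort_key is None else {hash_key, sort_key}
    extra = set(conditions) - allowed
    if extra:
        raise ValidationException(f"conditions on non-key attributes: {sorted(extra)}")
```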
Nadav Har'El	fbe63ddcc4	alternator-test: more GSI tests Add more tests for GSI - tests that DescribeTable describes the GSI, and test the case of more than one GSI for a base table. Unfortunately, creating an empty table with two GSIs routinely takes more than a full minute (!) on DynamoDB, so because we now have a test with two GSIs, I had to increase the timeout in create_test_table(). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711112911.14703-1-nyh@scylladb.com>	2019-09-11 15:51:26 +03:00
Piotr Sarna	a3be9dda7f	alternator-test: enable if_not_exists-related tests Test cases that relied on the implementation of if_not_exists are enabled.	2019-09-11 15:51:25 +03:00
Piotr Sarna	cec82490d2	alternator: implement if_not_exists The if_not_exists function is implemented on the basis of the recently added read-before-write mechanism.	2019-09-11 15:50:22 +03:00
Piotr Sarna	b14e3c0e72	alternator: rename holds_path to a more generic name The holds_path() utility function is actually used to check if a value needs read-before-write, so it is renamed to the more fitting check_needs_read_before_write.	2019-09-11 15:49:19 +03:00
Nadav Har'El	5fc7b0507e	alternator: fix bug in collection mutations Alternator currently keeps an item's attributes inside a map, and we had a serious bug in the way we build mutations for this map: We didn't know there was a requirement to build this mutation sorted by the attribute's name. When we neglect to do this sorting, this confuses Scylla's merging algorithms, which assume collection cells are thus sorted, and the result can be duplicate cells in a collection, and the visible effect is a mutation that seems to be ignored - because both old and new values exist in the collection. So this patch includes a new helper class, "attribute_collector", which helps collect attribute updates (put and del) and extract them in correctly sorted order. This helper class also eliminates some duplication of arcane code to create collection cells or deletions of collection cells. This patch includes a simple test that previously failed, and one xfail test that failed just because of this bug (this was the test that exposed this bug). Both tests now succeed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190709160858.6316-1-nyh@scylladb.com>	2019-09-11 15:48:18 +03:00
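The key idea of this fix, emitting collection-cell updates in attribute-name order regardless of the order in which they were requested, can be sketched in Python. The AttributeCollector class below is a hypothetical, simplified stand-in for the C++ attribute_collector. It models only the ordering, not Scylla's cell encoding, and in this sketch a later update to the same name replaces an earlier one.

```python
class AttributeCollector:
    """Collects attribute puts and deletions, then returns them sorted by name."""

    def __init__(self):
        # attribute name -> ("put", value) or ("del", None)
        self._updates = {}

    def put(self, name, value):
        self._updates[name] = ("put", value)

    def delete(self, name):
        self._updates[name] = ("del", None)

    def to_sorted_updates(self):
        # Merging code assumes collection cells arrive sorted by key, so
        # emit them in attribute-name order rather than insertion order.
        return [(name, op, value)
                for name, (op, value) in sorted(self._updates.items())]
```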
Nadav Har'El	5cce53fed9	alternator-test: exhaustive tests for GSI This patch adds what is hopefully an exhaustive test suite for the global secondary indexing (GSI) feature, and all its various complications and corner cases of how GSIs can be created, deleted, named, written, read, and more (the tests are heavily documented to explain what they are testing). All these tests pass on DynamoDB, and fail on Alternator, so they are marked "xfail". As we develop the GSI feature in Alternator piece by piece, we should make these tests start to pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708160145.13865-1-nyh@scylladb.com>	2019-09-11 15:48:17 +03:00
Nadav Har'El	9eea90d30d	alternator-test: another test for BatchWriteItem This adds another test for BatchWriteItem: if one of the operations is invalid - e.g., has a wrong key type - the entire batch is rejected, and none of its operations are done, not even the valid ones. The test succeeds, because we already handle this case correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190707134610.30613-1-nyh@scylladb.com>	2019-09-11 15:48:16 +03:00
Nadav Har'El	01f4cf1373	alternator-test: test UpdateItem's SET with #reference Test an operation like SET #one = #two, where the RHS has a reference to a name, rather than the name itself. Also verify that DynamoDB gives an error if ExpressionAttributeNames includes names needed by neither the left- nor the right-hand side of such assignments. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708133311.11843-1-nyh@scylladb.com>	2019-09-11 15:48:15 +03:00
Piotr Sarna	e482f27e2f	alternator-test: add test for reading key before write The test case checks if reading keys in order to use their values in read-before-write updates works fine.	2019-09-11 15:48:14 +03:00
Piotr Sarna	7b605d5bec	alternator-test: add test case for nested read-before-write A test for read-before-write in nested paths (inside a function call or inside a +/- operator) is added.	2019-09-11 15:48:13 +03:00
Piotr Sarna	da795d8733	alternator-test: enable basic read-before-write cases With unsafe read-before-write implemented, simple cases can be enabled by removing their xfail flag.	2019-09-11 15:48:12 +03:00
Piotr Sarna	2e473b901a	alternator: fix indentation	2019-09-11 15:48:09 +03:00
Piotr Sarna	bf13564a9d	alternator: add unsafe read-before-write to update_item In order to serve update requests that depend on read-before-write, a proper helper function which fetches the existing item with a given key from the database is added. This read-before-write mechanism is not considered safe, because it provides no linearizability guarantees and offers no synchronization protection. As such, it should be considered a placeholder that works fine on a single machine and/or without concurrent access to the same key.	2019-09-11 15:45:21 +03:00
Piotr Sarna	2fb711a438	alternator: add context parameters to calculate_value The calculate_value utility function is going to need more context in order to resolve paths present in the right-hand side of update_item operators: update_info and schema.	2019-09-11 15:40:17 +03:00
Piotr Sarna	cbe1836883	alternator: add allowing key columns when resolving path Historically, resolving a path checked for key columns, which are not allowed to be on the left-hand side of the assignment. However, path resolving will now also be used for right-hand side, where it should be allowed to use the key value.	2019-09-11 15:39:15 +03:00
Piotr Sarna	20a6077fb3	alternator: add optional previous item to calculate_value In order to implement read-before-write in the future, calculate_value now accepts an additional parameter: previous_item. If read-before-write was performed, previous_item will contain an item for the given key which already exists in the database at the time of the update.	2019-09-11 15:38:13 +03:00
Piotr Sarna	784aaaa8ff	alternator: move describe_item implementation up It will be needed later to add read-before-write to update_item.	2019-09-11 15:37:13 +03:00
Nadav Har'El	bd4dfa3724	alternator-test: move create_test_table() to util.py This patch moves the create_test_table() utility function, which creates a test table with a unique name, from the fixtures (conftest.py) to util.py. This will allow reusing this function in tests which need to create tables but not through the existing fixtures. In particular we will need to do this for GSI (global secondary index) tests in the next patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708104438.5830-1-nyh@scylladb.com>	2019-09-11 15:37:12 +03:00
Nadav Har'El	ce13a0538c	alternator-test: expand tests of duplicate items in BatchWriteItem The tests we had for BatchWriteItem's refusal to accept duplicate keys only used test_table_s, with just a hash key. This patch adds tests for test_table, i.e., a table with both hash and sort keys - to check that we check duplicates in that case correctly as well. Moreover, the expanded tests also verify that although identical keys are not allowed, keys that share one component (hash or sort key) but differ in the other are fine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190705191737.22235-1-nyh@scylladb.com>	2019-09-11 15:37:11 +03:00
Nadav Har'El	9bc2685a92	alternator-test: run local tests without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. Also modified the README to be clearer, and more focused on the local runs. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708121420.7485-1-nyh@scylladb.com>	2019-09-11 15:37:10 +03:00
Nadav Har'El	cb42c75e0a	alternator-test: don't hardcode us-east-1 region For "--aws" tests, use the default region chosen by the user in the AWS configuration (~/.aws/config or environment variable), instead of hard-coding "us-east-1". Patch by Pekka Enberg. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708105852.6313-1-nyh@scylladb.com>	2019-09-11 15:37:09 +03:00
Piotr Sarna	8f9e720f10	alternator-test: enable precision test for add With big_decimal-based implementation, the precision test passes. Message-Id: <6d631a43901a272cb9ebd349cb779c9677ce471e.1562318971.git.sarna@scylladb.com>	2019-09-11 15:37:08 +03:00
Piotr Sarna	78e495fac3	alternator: allow arithmetics without losing precision Calculating a value represented as 'v1 + v2' or 'v1 - v2' was previously implemented with the double type, which offers limited precision. From now on, these computations are based on big_decimal, which allows returning values without losing precision. This patch depends on the 'add big_decimal arithmetic operators' series. Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>	2019-09-11 15:36:08 +03:00
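The precision problem that motivated the switch from double to big_decimal can be demonstrated in Python. Here Python's decimal module stands in as an analogue of big_decimal, and the function names are hypothetical. Python's default decimal context carries only 28 significant digits, so the sketch raises it to DynamoDB's 38.

```python
from decimal import Decimal, localcontext

def add_as_double(a: str, b: str) -> float:
    # The previous approach: convert both operands to binary floating point.
    return float(a) + float(b)

def add_as_decimal(a: str, b: str) -> Decimal:
    # A big_decimal-style approach: exact decimal arithmetic.
    with localcontext() as ctx:
        ctx.prec = 38  # DynamoDB numbers allow up to 38 significant digits
        return Decimal(a) + Decimal(b)
```

With doubles, adding 1 to 1e30 leaves the value unchanged, and 0.1 + 0.2 is not exactly 0.3. The decimal version keeps both results exact.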
Piotr Sarna	466f25b1e8	alternator-test: enable batch duplication cases With duplication checks implemented, batch write and delete tests no longer need to be marked @xfail. Message-Id: <6c5864607e06e8249101bd711dac665743f78d9f.1562325663.git.sarna@scylladb.com>	2019-09-11 15:36:07 +03:00
Piotr Sarna	eb7ada8387	alternator: add checking for duplicate keys in batches Batch writes and batch deletes do not allow multiple entries for the same key. This patch implements checking for duplicated entries and throws an error if applicable. Message-Id: <450220ba74f26a0893430cb903e4749f978dfd31.1562325663.git.sarna@scylladb.com>	2019-09-11 15:35:01 +03:00
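A duplicate-key check of the kind described above can be sketched in Python. The function name check_no_duplicate_keys is hypothetical, and the sketch assumes each batch entry is an item in DynamoDB JSON form. Only the table's key attributes form the identity, so two entries that differ only in non-key attributes still count as duplicates.

```python
def check_no_duplicate_keys(requests, key_names):
    """Raise ValueError if two batch entries address the same primary key.

    requests  -- list of items (attribute name -> DynamoDB JSON value)
    key_names -- the table's key attribute names, e.g. ["p"] or ["p", "c"]
    """
    seen = set()
    for item in requests:
        # Build a hashable key from the key attributes only; for example,
        # {"S": "x"} becomes ("S", "x") so it can be stored in a set.
        key = tuple(next(iter(item[name].items())) for name in key_names)
        if key in seen:
            raise ValueError(f"duplicate key in batch: {key}")
        seen.add(key)
```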
Nadav Har'El	b810fa59c4	alternator-test: move utility functions to a new "util.py" Move some common utility functions to a common file "util.py" instead of repeating them in many test files. The utility functions include random_string(), random_bytes(), full_scan(), full_query(), and multiset() (the more general version, which also supports freezing nested dicts). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190705081013.1796-1-nyh@scylladb.com>	2019-09-11 15:35:00 +03:00
Nadav Har'El	2fb77ed9ad	alternator: use std::visit for reading std::variant The idiomatic way to use a std::variant depending on the type it holds is to use std::visit. This modern API makes it unnecessary to write many boilerplate functions to test and cast the type of the variant, and makes it impossible to forget one of the options. So in this patch we throw out the old ways, and welcome the new. Thanks to Piotr Sarna for the idea. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190704205625.20300-1-nyh@scylladb.com>	2019-09-11 15:33:57 +03:00
Nadav Har'El	4d07e2b7c5	alternator: support BatchGetItem This patch adds to Alternator an implementation of the BatchGetItem operation, which allows starting a number of GetItem requests in parallel in a single request. The implementation is almost complete - the only missing feature is the ability to ask only for non-top-level attributes in ProjectionExpression. Everything else should work, and this patch also includes tests which, as usual, pass on DynamoDB and now also on Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:33:50 +03:00
Nadav Har'El	d1a5512a35	alternator: fix second boot Amazingly, it appears we never tested booting Alternator a second time :-) Our initialization code creates a new keyspace, and was supposed to ignore the error if this keyspace already existed - but we thought the error would come as an exceptional future, which it didn't - it came as a thrown exception. So we need to change handle_exception() to a try/catch. With this patch, I can kill Alternator and it will correctly start again. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:22:48 +03:00
Nadav Har'El	374162f759	alternator: generate error on spurious key columns Operations which take a key as parameter, namely GetItem, UpdateItem, DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given key is missing one of the necessary key attributes, or has the wrong types for them. But they should also fail if the given key has spurious attributes beyond those actually needed in a key. So this patch adds this check, and tests to confirm that we do these checks correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:21:50 +03:00
Nadav Har'El	da4da6afbf	alternator: fix PutItem to really replace item. The PutItem operation, and also the PutRequest of BatchWriteItem, are supposed to completely replace the item - not to merge the new value with the previous value. We implemented this wrongly - we just wrote the new item, forgetting to write a tombstone to remove the old item. So this patch fixes these operations, and adds tests which confirm the fix (as usual, these tests pass on DynamoDB, failed on Alternator before this patch, and pass after the patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:20:55 +03:00
Nadav Har'El	a0fffcebde	alternator: add support for DeleteRequest in BatchWriteItem Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:20:01 +03:00
Nadav Har'El	83b91d4b49	alternator: add DeleteItem Add support for the DeleteItem operation, which deletes an item. The basic deletion operation is supported. Still not supported are: 1. Parameters to conditionally delete (ConditionExpression or Expected) 2. Parameters to return pre-delete content 3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:19:46 +03:00
Nadav Har'El	b09603ed9b	alternator: cleaner error on DeleteRequest In BatchWriteItem, we currently only support the PutRequest operation. If a user tries to use DeleteRequest (which we don't support yet), he will get a bizarre error. Let's test the request type more carefully, and print a better error message. This will also be the place where eventually we'll actually implement the DeleteRequest. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:16:02 +03:00
Nadav Har'El	a7f7ce1a73	alternator-test: tests for BatchWriteItem This patch adds more comprehensive tests for the BatchWriteItem operation, in a new file batch_test.py. The one test we already had for it was also moved from test_item.py here. Some of the tests still xfail for two reasons: 1. Support for the DeleteRequest operation of BatchWriteItem is missing. 2. Checks that forbid duplicate keys in the same request are missing. As usual, all tests succeed on DynamoDB, and hopefully (I tried...) cover all the BatchWriteItem features. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:16:01 +03:00
Nadav Har'El	a8dd3044e2	alternator: support (most of) ProjectionExpression DynamoDB has two similar parameters - AttributesToGet and ProjectionExpression - which are supported by the GetItem, Scan and Query operations. Until now we supported only the older AttributesToGet, and this patch adds support for the newer ProjectionExpression. Besides having a different syntax, the main difference between AttributesToGet and ProjectionExpression is that the latter also allows fetching only a specific nested attribute, e.g., a.b[3].c. We do not support this feature yet, although it would not be hard to add it: With our current data representation, it means fetching the top-level attribute 'a', whose value is a JSON, and then post-filtering it to take out only the '.b[3].c'. We'll do that later. This patch also adds more test cases to test_projection_expression.py. All tests except three which check the nested attributes now pass, and those three xfail (they succeed on DynamoDB, and fail as expected on Alternator), reminding us what still needs to be done. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:01 +03:00
Nadav Har'El	98c4e646a5	alternator-test: tests for yet-unimplemented ProjectionExpression Our GetItem, Query and Scan implementations support the AttributesToGet parameter to fetch only a subset of the attributes, but we don't yet support the more elaborate ProjectionExpression parameter, which is similar but has a different syntax and also allows specifying nested document paths. This patch adds extensive testing of all the ProjectionExpression features. All these tests pass against DynamoDB, but fail against the current Alternator so they are marked "xfail". These tests will be helpful for developing the ProjectionExpression feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:00 +03:00
Nadav Har'El	7c9e64ed81	alternator-test: more tests for AttributesToGet parameter The AttributesToGet parameter - saying which attributes to fetch for each item - is already supported in the GetItem, Query and Scan operations. However, we only had a test for it for Scan. This patch adds similar tests also for the GetItem and Query operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:59 +03:00
Nadav Har'El	9c53f33003	alternator-test: another test for top-level attribute overwrite Yet another test for overwriting a top-level attribute which contains a nested document - here, overwriting it by just a string. This test passes. In the current implementation we don't yet support updates to specific attribute paths (e.g. a.b[3].c) but we do correctly support writing and overwriting top-level attributes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:58 +03:00
Nadav Har'El	f6fa971e96	alternator: initial implementation of "+" and "-" in UpdateExpression This patch implements the last (finally!) syntactic feature of the UpdateExpression - the ability to do SET a=val1+val2 (where, as before, each of the values can be a reference to a value, an attribute path, or a function call). The implementation is not perfect: It adds the values as double-precision numbers, which can lose precision. So the patch adds a new test which checks that the precision isn't lost - a test that currently fails (xfail) on Alternator, but passes on DynamoDB. The pre-existing test for adding small integers now passes on Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:01 +03:00
Nadav Har'El	a5af962d80	alternator: support the list_append() function in UpdateExpression In the previous patch we added function-call support in the UpdateExpression parser. In this patch we add support for one such function - list_append(). This function takes two values, confirms they are lists, and concatenates them. After this patch only one function remains unimplemented: if_not_exists(). We also split the test we already had for list_append() into two tests: One uses only value references (":val") and passes after this patch. The second test also uses references to other attributes and will only work after we start supporting read-modify-write. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:13:07 +03:00
Nadav Har'El	9d2eba1c75	alternator: parse more types of values in UpdateExpression Until this patch, in update expressions like "SET a = :val", we only allowed the right-hand-side of the assignment to be a reference to a value stored in the request - like ":val" in the above example. But DynamoDB also allows the value to be an attribute path (e.g., "a.b[3].c"), and can also be a function of a bunch of other values. This patch adds support for parsing all these value types. This patch only adds the correct parsing of these additional types of values, but they are still not supported: reading existing attributes (i.e., read-modify-write operations) is still not supported, and neither of the two functions which UpdateExpression needs to support is supported yet. Nevertheless, the parsing is now correct, and the "unknown_function" test starts to pass. Note that DynamoDB allows the right-hand side of an assignment to be not only a single value, but also value+value and value-value. This possibility is not yet supported by the parser and will be added later. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:12:06 +03:00
Piotr Sarna	cb50207c7b	alternator-test: add initial filtering test for scans Currently the only supported case is equality on non-key attributes. More complex filtering tests are also included in test_query.py.	2019-09-11 15:12:05 +03:00
Piotr Sarna	b5eb3aed10	alternator-test: add initial filtering test for query The test cases verify that equality-based filtering on non-key attributes works fine. It also contains test stubs for key filtering and non-equality attribute filtering.	2019-09-11 15:12:04 +03:00
Piotr Sarna	319e946d8f	alternator-test: diversify attribute values in filled test table Filled test table used to have identical non-key attributes for all rows. These values are now diversified in order to allow writing filtering test cases.	2019-09-11 15:12:03 +03:00
Piotr Sarna	e4516617eb	alternator: add filtering to Query Query requests now accept QueryFilter parameter.	2019-09-11 15:11:10 +03:00
Piotr Sarna	4ea02bec89	alternator: enable filtering for Scan Scans can now accept ScanFilter parameter to perform filtering on returned rows.	2019-09-11 15:10:12 +03:00
Piotr Sarna	8cb078f757	alternator: add initial filtering implementation Filtering is currently only implemented for the equality operator on non-key attributes. Next steps (TODO) involve: 1. Implementing filtering for key restrictions 2. Implementing non-key attribute filtering for operators other than EQ. It, in turn, may involve introducing 'map value restrictions' notion to Scylla, since now it only allows equality restrictions on map values (alternator attributes are currently kept in a CQL map). 3. Implementing FilterExpression in addition to deprecated QueryFilter	2019-09-11 15:08:50 +03:00
Nadav Har'El	aa94e7e680	alternator: clean up parsing of attribute-path components Before this patch, we read either an attribute name like "name" or a reference to one "#name", as one type of token - NAME. However, while attribute paths indeed can use either one, in some other contexts - such as a function name - only "name" is allowed, so we need to distinguish between two types of tokens: NAME and NAMEREF. While separating those, I noticed that we incorrectly allowed a "#" followed by zero alphanumeric characters to be considered a NAMEREF, which it shouldn't. In other words, NAMEREF should have ALNUM+, not ALNUM*. Same for VALREF, which can't be just a ":" with nothing after it. So this patch fixes these mistakes, and adds tests for them. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:08:36 +03:00
Nadav Har'El	13476c8202	alternator: complain about unused values or names in UpdateExpression DynamoDB complains, and fails an update, if the update's ExpressionAttributeNames or ExpressionAttributeValues contain names or values which aren't used by the expression. Let's do the same, although sadly this means more work to track which of the references we've seen and which we haven't. This patch makes two previously xfail (expected fail) tests become successful tests on Alternator (they always succeeded against DynamoDB). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:35 +03:00
Nadav Har'El	c4fc02082b	alternator-test: complete test for UpdateItem's UpdateExpression The existing tests in test_update_expression.py thoroughly tested the UpdateExpression features which we currently support. But tests for features which Alternator doesn't yet support were partial. In this patch, we add a large number of new tests to test_update_expression.py aiming to cover ALL the features of UpdateExpression, regardless of whether we already support it in Alternator or not. Every single feature and esoteric edge-case I could discover is covered in these tests - and as far as I know these tests now cover the entire UpdateExpression feature. All the tests succeed on DynamoDB, and confirm our understanding of what DynamoDB actually does on all these cases. After this patch, test_update_expression.py is a whopper, with 752 lines of code and 37 separate test functions. 23 out of these 37 tests are still "xfail" - they succeed on DynamoDB but fail on Alternator, because of several features we are still missing. Those missing features include direct updates of nested attributes, read-modify-write updates (e.g., "SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"), the ADD and DELETE operations on sets, and various other small missing pieces. The benefit of this whopper test is two-fold: First, it will allow us to test our implementation as we continue to fill it (i.e., "test-driven development"). Second, all these tested edge cases basically "reverse engineer" how DynamoDB's expression parser is supposed to work, and we will need this knowledge to implement the still-missing features of UpdateExpression. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:34 +03:00
Nadav Har'El	ede5943401	alternator-test: test for UpdateItem's UpdateExpression This patch adds an extensive array of tests for UpdateItem's UpdateExpression support, which was introduced in the previous patch. The tests include verification of various edge cases of the parser, support for ":value" and "#name" references, functioning SET and REMOVE operations, combinations of multiple such operations, and much more. As usual, all these tests were run and succeed on DynamoDB, as well as on Alternator - to confirm Alternator behaves the same as DynamoDB. There are two tests marked "xfail" (expected to fail), because Alternator still doesn't support the attribute copy syntax (e.g., "SET a = b", doing a read-before-write). There are some additional areas which we don't support - such as the DELETE and ADD operations or SET with functions - but those areas aren't yet tested in these tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:33 +03:00
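The unused-reference check described above can be sketched in Python. Alternator tracks references while parsing; this sketch swaps in a simpler regex scan of the expression. The token patterns and the function name check_all_references_used are hypothetical. The sketch only detects unused entries, not references that are used but missing from the request.

```python
import re

# Hypothetical token patterns for "#name" and ":value" references.
NAME_REF = re.compile(r"#[A-Za-z0-9_]+")
VALUE_REF = re.compile(r":[A-Za-z0-9_]+")

def check_all_references_used(expression, attr_names, attr_values):
    """Raise ValueError if ExpressionAttributeNames/Values has unused entries."""
    used_names = set(NAME_REF.findall(expression))
    used_values = set(VALUE_REF.findall(expression))
    unused = sorted((set(attr_names) - used_names) |
                    (set(attr_values) - used_values))
    if unused:
        raise ValueError(f"unused expression attribute references: {unused}")
```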
Nadav Har'El	4baa0d3b67	alternator: enable support for UpdateItem's UpdateExpression For the UpdateItem operation, so far we supported updates via the AttributeUpdates parameter, specifying which attributes to set or remove and how. But this parameter is considered deprecated, and DynamoDB supports a more elaborate way to modify attributes, via an "UpdateExpression". In the previous patch we added a function to parse such an UpdateExpression, and in this patch we use the result of this parsing to actually perform the required updates. UpdateExpression is only partially supported after this patch. The basic "SET" and "REMOVE" operations are supported, but various other cases aren't fully supported and will be fixed in followup patches. The following patch will add extensive tests to confirm exactly what works correctly with the new UpdateExpression support. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:34 +03:00
Nadav Har'El	829bafd181	alternator: add expression parsers The DynamoDB protocol is based on JSON, and most DynamoDB requests describe the operation and its parameters via JSON objects such as maps and lists. However, in some types of requests an "expression" is passed as a single string, and we need to parse this string. These cases include: 1. Attribute paths, such as "a[3].b.c", are used in projection expressions as well as inside other expressions described below. 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f", used in conditional updates, filters, and other places. 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d" This patch introduces the framework to parse these expressions, and an implementation of parsing update expressions. These update expressions will be used in the UpdateItem operation in the next patch. All these expression syntaxes are very simple: Most of them could be parsed as regular expressions, or at most a simple hand-written lexical analyzer and recursive-descent parser. Nevertheless, we decided to specify these parsers in the same ANTLR3 language already used in the Scylla project for parsing CQL, hopefully making these parsers easier to reason about, and easier to change if needed - and reducing the amount of boilerplate code. The parsing of update expressions is mostly complete, except that in SET actions, only the "path = value" form is supported and not yet forms such as "path1 = path2" (which does read-before-write) or "path1 = path1 + value" or "path = function(...)". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:12 +03:00
Nadav Har'El	f0f50607a7	alternator-test: split nested-document tests to new file We need to write more tests for various cases of handling nested documents and nested attributes. Let's collect them all in the same test file. This patch mostly moves existing code, but also adds one small test, test_nested_document_attribute_write, which just writes a nested document and reads it back (it's mostly covered by the existing test_put_and_get_attribute_types, but is specifically about a nested document). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:11 +03:00
Nadav Har'El	12abe8e797	alternator-test: make local test the default We usually run Alternator tests against the local Alternator - testing against AWS DynamoDB is rarer, and usually just done when writing the test. So let's make "pytest" without parameters default to testing locally. To test against AWS, use "pytest --aws" explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:10 +03:00
Piotr Sarna	b67f22bfc6	alternator: move related functions to serialization.cc Existing functions related to serialization and deserialization are moved to serialization.cc source file. Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>	2019-09-11 15:06:05 +03:00
Piotr Sarna	fdba9866fc	alternator: apply new serialization to reads and writes Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem, UpdateItem, ...) are now serialized and deserialized in binary form instead of raw JSON, provided that their type is S, B, BOOL or N. Optimized serialization for the rest of the types will be introduced as follow-ups. Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>	2019-09-11 15:02:21 +03:00
Piotr Sarna	b3fd4b5660	alternator: add simple attribute serialization routines Attributes used to be written into the database in raw JSON format, which is far from optimal. This patch introduces more robust serialization routines for simple alternator types: S, B, BOOL, N. Serialization uses the first byte to encode attribute type and follows with serializing data in binary form. More complex types (sets, lists, etc.) are currently still serialized in raw JSON and will be optimized in follow-up patches. Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>	2019-09-11 15:01:07 +03:00
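The layout described above (one type byte followed by the value's payload) can be sketched in Python for the four simple types. The tag values in TAGS are hypothetical, since the actual values are not given here. Numbers are stored as ASCII text, which is a simplification and not the binary big_decimal encoding. B values arrive base64-encoded, as in DynamoDB JSON.

```python
import base64

# Hypothetical type tags; the real values are not specified in this log.
TAGS = {"S": 0, "B": 1, "BOOL": 2, "N": 3}
TYPES = {v: k for k, v in TAGS.items()}

def serialize(value):
    """Encode a simple DynamoDB JSON value as one type byte plus payload."""
    (typ, val), = value.items()
    if typ == "S":
        payload = val.encode("utf-8")
    elif typ == "B":
        payload = base64.b64decode(val)  # store raw bytes, not base64 text
    elif typ == "BOOL":
        payload = b"\x01" if val else b"\x00"
    elif typ == "N":
        payload = val.encode("ascii")  # simplification: number kept as text
    else:
        raise ValueError(f"type {typ} is still stored as raw JSON")
    return bytes([TAGS[typ]]) + payload

def deserialize(data):
    """Inverse of serialize(): dispatch on the leading type byte."""
    typ, payload = TYPES[data[0]], data[1:]
    if typ == "S":
        return {"S": payload.decode("utf-8")}
    if typ == "B":
        return {"B": base64.b64encode(payload).decode("ascii")}
    if typ == "BOOL":
        return {"BOOL": payload == b"\x01"}
    return {"N": payload.decode("ascii")}
```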
Piotr Sarna	27f00d1693	alternator: move error class to a separate header Error class definitions were previously in server.hh, but they are separate entities - future .cc files can use the errors without the need of including server definitions. Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>	2019-09-11 14:52:58 +03:00
Nadav Har'El	52810d1103	configure.py: move alternator source files to separate list For some unknown reason we put the list of alternator source files in configure.py inside the "api" list. Let's move it into a separate list. We could have just put it in the scylla_core list, but that would cause frequent and annoying patch conflicts when people add alternator source files and Scylla core source files concurrently. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:52:39 +03:00
Nadav Har'El	d4b3c493ad	alternator: stub support for UpdateItem with UpdateExpression So far for UpdateItem we only supported the old-style AttributeUpdates parameter, not the newer UpdateExpression. This patch begins the path to supporting UpdateExpression. First, trying to use both parameters should result in an error, and this patch does this (and tests this). Second, passing neither parameter is allowed, and should result in an empty item being created. Finally, since we do not yet support UpdateExpression, this patch will cause UpdateItem to fail if UpdateExpression is used, instead of silently ignoring it as we did so far. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:40 +03:00
Nadav Har'El	04856a81f5	alternator-tests: two simple tests for nested documents This patch adds two simple tests for nested documents, which pass: test_nested_document_attribute_overwrite() tests what happens when we UpdateItem a top-level attribute to a dictionary. We already tested this works on an empty item in a previous test, but now we check what happens when the attribute already existed, and already was a dictionary, and now we update it to a new dictionary. In the test, attribute a was {b:3, c:4} and now we update it to {c:5}. The test verifies that the new dictionary completely replaces the old one - the two are not merged. The new value of the attribute is just {c:5}, not {b:3, c:5}. The second test verifies that the AttributeUpdates parameter of UpdateItem cannot be used to update just a nested attribute. Any dots in the attribute name are considered an actual dot - not part of a path of attribute names. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:39 +03:00
Nadav Har'El	b782d1ef8d	alternator-test: test_query.py: change item list comparison Comparing two lists of items without regard for order is not trivial. For this reason some tests in test_query.py only compare arrays of sort keys, and those tests are fine. But other tests used a trick of converting a list of items into a set with set_of_frozen_elements() and comparing these sets. This trick is almost correct, but it can miss cases where items repeat. So in this patch, we replace the set_of_frozen_elements() approach by a similar one using a multiset (set with repetitions) instead of a set. A multiset in Python is "collections.Counter". This is the same approach we also started using in test_scan.py in a recent patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:38 +03:00
Nadav Har'El	15f47a351e	alternator: remove unused code Remove the incomplete and unused function to convert DynamoDB type names to ScyllaDB type objects: DynamoDB has a different set of types relevant for keys and for attributes. We already have a separate function, parse_key_type(), for parsing key types, and for attributes - we don't currently parse the type names at all (we just save them as JSON strings), so the function we removed here wasn't used, and was in fact #if'ed out. It was never completed, and it now started to decay (the type for numbers is wrong), so we're better off completely removing it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:50:44 +03:00
Nadav Har'El	b63bd037ea	alternator: implement correct "number" type for keys This patch implements a fully working number type for keys, and now Alternator fully and correctly supports every key type - strings, byte arrays, and numbers. The patch also adds a test which verifies that Scylla correctly sorts number sort keys, and also correctly retrieves them to the full precision guaranteed by DynamoDB (38 decimal digits). The implementation uses Scylla's "decimal" type, which supports arbitrary precision decimal floating point, and in particular supports the precision specified by DynamoDB. However, "decimal" is actually more general than this use requires, so it might not be optimal for the more specific requirements of DynamoDB. So a FIXME is left to optimize this case in the future. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:47 +03:00
Nadav Har'El	cb1b2b1fc2	alternator-test: test_scan.py: change item list comparison Comparing two lists of items without regard for order is not trivial. test_scan.py currently has two ways of doing this, both unsatisfactory: 1. We convert each list to a set via set_of_frozen_elements(), and compare the sets. But this comparison can miss cases where items repeat. 2. We use sorted() on the list. This doesn't work on Python 3 because it removed the ability to compare (with "<") dictionaries. So in this patch, we replace both by a new approach, similar to the first one except we use a multiset (set with repetitions) instead of a set. A multiset in Python is "collections.Counter". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:46 +03:00
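The multiset comparison can be sketched as follows. freeze() and multiset() are hypothetical helper names; the tests' own helper was set_of_frozen_elements(). freeze() recursively turns dicts, lists and sets into hashable equivalents so that items can be counted by collections.Counter.

```python
from collections import Counter


def freeze(value):
    """Recursively convert an item into a hashable value."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(freeze(v) for v in value)
    return value


def multiset(items):
    """Count frozen items: ignores order but, unlike a set, keeps repetitions."""
    return Counter(freeze(item) for item in items)


a = [{"p": "x", "c": 1}, {"p": "x", "c": 1}]
b = [{"p": "x", "c": 1}]
print({freeze(i) for i in a} == {freeze(i) for i in b})  # True: a set misses the repeat
print(multiset(a) == multiset(b))                        # False: the multiset does not
print(multiset(a + b) == multiset(b + a))                # True: order is ignored
```

Unlike sorted(), this needs no ordering between dictionaries, so it also works on Python 3.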
Nadav Har'El	4a1b6bf728	alternator-test: drop "test_2_tables" fixture Creating and deleting tables is the slowest part of our tests, so we should lower the number of tables our tests create. We had a "test_2_tables" fixture as a way to create two tables, but since our tests already create other tables for testing different key types, it's faster to reuse those tables - instead of creating two more unused tables. On my system, a "pytest --local", running all 38 tests locally, drops from 25 seconds to 20 seconds. As a bonus, we also have one fewer fixture ;-) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:45 +03:00
Nadav Har'El	013fb1ae38	alternator-test: fix errors in len/length variable name Also change "xrage" to "range" to appease Python 3 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:44 +03:00
Nadav Har'El	30a123d8ad	DynamoDB limits the size of hash keys to 2048 bytes, sort keys to 1024 bytes, and the entire item to 400 KB which therefore also limits the size of one attribute. This test checks that we can reach up to these limits, with binary keys and attributes. The test does not check what happens once we exceed these limits. In such a case, DynamoDB throws an error (I checked that manually) but Alternator currently simply succeeds. If in the future we decide to add artificial limits to Alternator as well, we should add such tests as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:43 +03:00
Nadav Har'El	b91eca28bd	alternator-test: don't use "len" as a parameter name "len" is an unfortunate choice for a variable name, in case one day the implementation may want to call the built-in "len" function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:42 +03:00
Nadav Har'El	e21e0e6a37	alternator-test: test sort-key ordering - for both string and binary keys We already have a test for string sort-key ordering of items returned by the Scan operation, and this test adds a similar test for the Query operation. We verify that items are retrieved in the desired sorted order (sorted by the aptly-named sort key) and not in creation order or any other wrong order. But beyond just checking that Query works as expected (it should, given it uses the same machinery as Scan), the nice thing about this test is that it doesn't create a new table - it uses a shared table and creates one random partition inside it. This makes this test faster and easier to write (no need for a new fixture), and most importantly - easily allows us to write similar tests for other key types. So this patch also tests the correct ordering of binary sort keys. It helped expose bugs in previous versions of the binary key implementation. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:41 +03:00
Nadav Har'El	1d058cf753	alternator-test: test item operations with binary keys Simple tests for item operations (PutItem, GetItem) with binary keys instead of strings for the hash and sort keys. We need to be able to store such keys, and then retrieve them correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:40 +03:00
Nadav Har'El	4bfd5d7ed1	alternator: add support for bytes as key columns Until now we only supported string for key columns (hash or sort key). This patch adds support for the bytes type (a.k.a. binary or blob) as well. The last missing type to be supported in keys is the number type. Note that in JSON, bytes values are represented with base64 encoding, so we need to decode them before storing the decoded value, and re-encode when the user retrieves the value. The decoding is important not just for saving storage space (the encoding is 4/3 the size of the decoded data) but also for correct sorting of the binary keys. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:35 +03:00
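The sorting argument can be checked quickly with Python's standard base64 module. The base64 alphabet does not preserve byte order, so comparing encoded strings can order keys differently than comparing the raw bytes. The two keys below are arbitrary examples.

```python
import base64

raw = [b"\x00", b"\xff"]
encoded = [base64.b64encode(k).decode("ascii") for k in raw]
print(encoded)          # ['AA==', '/w==']
print(sorted(raw))      # [b'\x00', b'\xff']
print(sorted(encoded))  # ['/w==', 'AA=='], the opposite order

# Decoding first restores the raw bytes, which then sort correctly.
print(sorted(base64.b64decode(e) for e in encoded))  # [b'\x00', b'\xff']

# Size overhead: every 3 raw bytes become 4 base64 characters.
print(len(base64.b64encode(bytes(300))))  # 400
```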
Nadav Har'El	57b46a92d7	alternator: add base64 encoding and decoding functions The DynamoDB API uses base64 encoding to encode binary blobs as JSON strings. So we need functions to do these conversions. This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64 but doesn't actually copy code from it. I didn't write any specific unit tests for this code, but it will be exercised and tested in a following patch which tests Alternator's use of these functions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:46:13 +03:00
Piotr Sarna	0980fde9d5	alternator-test: add dedicated BEGINS_WITH case to Query BEGINS_WITH behaves in a special way when a key postfix consists of <255> bytes. The initial test does not use that and instead checks UTF-8 characters, but once the bytes type is implemented for keys, it should also test specifically for corner cases, like strings that consist of <255> bytes only. Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>	2019-09-11 14:46:12 +03:00
Piotr Sarna	5bc7bb00e0	alternator-test: rename test_query_with_paginator Paginator is an implementation detail and does not belong in the name, and thus the test is renamed to test_query_basic_restrictions. Message-Id: <849bc9d210d0faee4bb8479306654f2a59e18517.1558524028.git.sarna@scylladb.com>	2019-09-11 14:46:11 +03:00
Piotr Sarna	9e2ecf5188	alternator: fix string increment for BEGINS_WITH The BEGINS_WITH statement increments a string in order to compute the upper bound for a clustering range of a query. Unfortunately, the previous implementation was not correct, as it appended a <0> byte if the last character was <255>, instead of incrementing the last-but-one character. If the string contains <255> bytes only, the returned upper bound is infinite. Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>	2019-09-11 14:45:17 +03:00
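A minimal sketch of the corrected increment. It assumes the upper bound is exclusive and that None stands for an unbounded range. begins_with_upper_bound() is a hypothetical name, not the function in Alternator's code. Trailing <255> bytes are dropped and the last remaining byte is incremented.

```python
from typing import Optional


def begins_with_upper_bound(prefix: bytes) -> Optional[bytes]:
    """Smallest key greater than every key starting with `prefix` (exclusive bound).

    Returns None, meaning "no upper bound", when the prefix is empty or made
    only of 0xFF bytes.
    """
    trimmed = prefix.rstrip(b"\xff")  # drop trailing <255> bytes
    if not trimmed:
        return None
    # Increment the last byte that is below 255; it cannot overflow here.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


print(begins_with_upper_bound(b"ab"))        # b'ac'
print(begins_with_upper_bound(b"a\xff"))     # b'b' (the old code appended a <0> byte instead)
print(begins_with_upper_bound(b"\xff\xff"))  # None
```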
Nadav Har'El	7b9180cd99	alternator: common get_read_consistency() function We had several places in the code that need to parse the ConsistentRead flag in the request. Let's add a function that does this, and while at it, checks for more error cases and also returns LOCAL_QUORUM and LOCAL_ONE instead of QUORUM and ONE. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:44:24 +03:00
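The mapping performed by this helper can be sketched in Python under two assumptions. A missing ConsistentRead flag is taken to mean an eventually consistent read (DynamoDB's default), and ValueError stands in for Alternator's api_error(). The request is modeled as a plain dict.

```python
def get_read_consistency(request: dict) -> str:
    """Map DynamoDB's ConsistentRead flag to a Scylla consistency level."""
    consistent = request.get("ConsistentRead", False)
    if not isinstance(consistent, bool):
        # Stand-in for api_error(): the flag must be a JSON boolean.
        raise ValueError("ConsistentRead flag must be a boolean")
    return "LOCAL_QUORUM" if consistent else "LOCAL_ONE"


print(get_read_consistency({"ConsistentRead": True}))  # LOCAL_QUORUM
print(get_read_consistency({}))                        # LOCAL_ONE
```

Centralizing this in one function means every read path rejects a malformed flag the same way.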
Nadav Har'El	56907bf6c6	alternator: for writes, use LOCAL_QUORUM instead of QUORUM As Shlomi suggested in the past, it is more likely that when we eventually support global tables, we will use LOCAL_QUORUM, not QUORUM. So let's switch to that now. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:44:20 +03:00
Nadav Har'El	8c347cc786	alternator-test: verify that table with only hash key also works So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem), were arbitrarily done on a test table with both a hash key and a sort key (both with string type). While this covers most of the code paths, we still need to verify that the case where there is no sort key also works fine. E.g., maybe we have a bug where a missing clustering key is handled incorrectly or an error is incorrectly reported in that case? In this patch we add tests for the hash-key-only case, and see that it already works correctly. No bug :-) We add a new fixture test_table_s for creating a test table with just a single string key. Later we'll probably add more of these test tables for additional key types. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:41:16 +03:00
Nadav Har'El	c53b2ebe4d	alternator-test: also test for missing part of key Another type of key type error can be to forget part of the key (the hash or sort key). Let's test that too (it already works correctly, no need to patch the code). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:41:15 +03:00
Nadav Har'El	f58abb76d6	alternator: gracefully handle wrong key types When a table has a hash key or sort key of a certain type (this can be string, bytes, or number), one cannot try to choose an item using values of different types. We previously did not handle this case gracefully, and PutItem handled it particularly badly - writing malformed data to the sstable and basically hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json() functions to verify the expected type, and fail gracefully if the user sent the wrong type. This patch also adds tests for these failures, for the GetItem, PutItem, and UpdateItem operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:40:23 +03:00
Nadav Har'El	9ee912d5cf	alternator: correct handling of missing item in GetItem According to the documentation, trying to GetItem a non-existent item should result in an empty response - NOT a response with an empty "Item" map as we did before this patch. This patch fixes this case, and adds a test case for it. As usual, we verify that the test case also works on Amazon DynamoDB, to verify DynamoDB really behaves the way we think it does. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:39:32 +03:00
Nadav Har'El	7f73f561d5	alternator: fix support for empty items If an empty item (i.e., no attributes except the key) is created, or an item becomes empty (by deleting its existing attributes), the empty item must be maintained - it cannot just disappear. To do this in Scylla, we must add a row marker - otherwise an empty attribute map is not enough to keep the row alive. This patch includes 4 test cases for all the various ways an item can be created empty or a non-empty item can be emptied, and verifies that the empty item can be correctly retrieved (as usual, to verify that our expectation of "correctness" is indeed correct, we run the same tests against DynamoDB). All these 4 tests failed before this patch, and now succeed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:38:40 +03:00
Nadav Har'El	95ed2f7de8	alternator: remove two unused lines of code These lines of code were superfluous and their result unused: the make_item_mutation() function finds the pk and ck on its own. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:37:49 +03:00
Nadav Har'El	eb81b31132	alternator: add statistics This patch adds a statistics framework to Alternator: Executor has (for each shard) a _stats object which contains counters for various events, and also is in charge of making these counters visible via Scylla's regular metrics API (http://localhost:9180/metrics). This patch includes a counter for each of DynamoDB's operation types, and we increase the ones we support when handled. We also added counters for total operations and unsupported operations (operation types we don't yet handle). In the future we can easily add many more counters: Define the counter in stats.hh, export it in stats.cc, and increment it where relevant in executor.cc (or server.cc). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:36:26 +03:00
Piotr Sarna	d267e914ad	alternator-test: add initial Query test The test covers simple restrictions on primary keys. Message-Id: <2a7119d380a9f8572210571c565feb8168d43001.1558356119.git.sarna@scylladb.com>	2019-09-11 14:36:25 +03:00
Piotr Sarna	b309c9d54b	alternator: implement basic Query The implementation covers the following restrictions - equality for hash key; - equality, <, <=, >, >=, between, begins_with for sort key. Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>	2019-09-11 14:36:16 +03:00
Piotr Sarna	8571046d3e	alternator: move do_query to separate function A fair portion of code from scan() will be used later to implement query(), so it's extracted as a helper function. Message-Id: <d3bc163a1cb2032402768fcbc6a447192fba52a4.1558356119.git.sarna@scylladb.com>	2019-09-11 14:31:31 +03:00
Nadav Har'El	4a8b2c794d	alternator-test: another edge case for Scan with AttributesToGet Ask to retrieve only an attribute name which none of the items have. The result should be a silly list of empty items, and indeed it is. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:30 +03:00
Nadav Har'El	c766d1153d	alternator-test: shorten test_scan.py by reusing full_scan more Use full_scan() in another test instead of open-coding the scan. There are two more tests that could have used full_scan(), but since they seem to be specifically adding more assertions or using a different API ("paginators"), I decided to leave them as-is. But new tests should use full_scan(). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:29 +03:00
Nadav Har'El	2666b29c77	alternator-test: test AttributesToGet parameter in Scan request This is a short, but extensive, test of the AttributesToGet parameter to Scan, which allows selecting only some of the attributes for output. The AttributesToGet feature has several non-obvious features. Firstly, it doesn't require that any key attributes be selected. So since each item may have different non-key attributes, some scanned items may be missing some of the selected columns, and some of the items may even be missing all the selected columns - in which case DynamoDB returns an empty item (and doesn't entirely skip this item). This test covers all these cases, and it adds yet another item to the 'filled_test_table' fixture, one which has different attributes, so we can see these issues. As usual, this test passes in both DynamoDB and Alternator, to assure we correspond to the right behavior, not just what we think is right. This test actually exposed a bug in the way our code returned empty items (items which had none of the selected columns), a bug which was fixed by the previous patch. Instead of having yet another copy of table-scanning code, this patch adds a utility function full_scan(), to scan an entire table (with optional extra parameters for the scan) and return the result as an array. We should simplify existing tests in test_scan.py by using this new function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:28 +03:00
Avi Kivity	446faba49c	Merge "dbuild: add --image option, help, and usage" from Benny * tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla: dbuild: add usage dbuild: add help option dbuild: list available images when no image arg is given dbuild: add --image option	2019-09-11 14:30:45 +03:00
Nadav Har'El	f871a4bc87	alternator: fix bug in returning an empty item in a Scan When a Scan selects only certain attributes, and none of the key attributes are selected, for some of the scanned items nothing will remain to be output, but still Dynamo outputs an empty item in this case. Our code had a bug where after each item we "moved" the object leaving behind a null object, not an empty map, so a completely empty item wasn't output as an empty map as expected, and resulted in boto3 failing to parse the response. This simple one-line patch fixes the bug, by resetting the item to an empty map after moving it out. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:30:37 +03:00
Piotr Sarna	8525b14271	alternator: add lookup table for requests Instead of using a really long if-else chain, requests are now looked up via a routing table. Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>	2019-09-11 14:29:59 +03:00
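The routing-table idea can be sketched in Python. This assumes the operation name comes from the X-Amz-Target header that DynamoDB clients send (for example "DynamoDB_20120810.PutItem"). The handlers are hypothetical placeholders; the real table lives in Alternator's C++ server code.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def make_router(routes: Dict[str, Handler]) -> Callable[[str, dict], dict]:
    """Return a function that dispatches a request with one dictionary lookup."""

    def route(target: str, body: dict) -> dict:
        # "DynamoDB_20120810.PutItem" -> "PutItem"
        op = target.rpartition(".")[2]
        handler = routes.get(op)
        if handler is None:
            raise ValueError(f"Unsupported operation: {op}")
        return handler(body)

    return route


# Hypothetical handlers standing in for the executor's methods.
router = make_router({
    "PutItem": lambda body: {"handled": "PutItem"},
    "GetItem": lambda body: {"handled": "GetItem"},
})
print(router("DynamoDB_20120810.GetItem", {}))  # {'handled': 'GetItem'}
```

Adding an operation then means adding one table entry rather than another branch in an if-else chain.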
Piotr Sarna	f3440f2e4a	alternator-test: migrate filled_test_table to use batches Filled test table fixture now takes advantage of batch writes in order to run faster. Message-Id: <e299cdffa9131d36465481ca3246199502d65e0c.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:58 +03:00
Piotr Sarna	4c3bdd3021	alternator-test: add batch writing test case Message-Id: <a950799dd6d31db429353d9220b63aa96676a7a7.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:57 +03:00
Piotr Sarna	c0ecd1a334	alternator: add basic BatchWriteItem The initial implementation only supports PutRequest requests, without serving DeleteRequest properly. Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:50 +03:00
Nadav Har'El	9a0c13913d	alternator: improve where DescribeEndpoints gets its information Instead of blindly returning "localhost:8000" in response to DescribeEndpoints and for sure causing us problems in the future, the right thing to do is to return the same domain name which the user originally used to get to us, be it "localhost:8000" or "some.domain.name:1234". But how can we know what this domain name was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header, and the DynamoDB driver I tested (boto3) adds it as expected, indeed with the expected value of "localhost:8000" on my local setup. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:25:22 +03:00
Nadav Har'El	a4a3b2fe43	alternator-test: test for sort order of items in a single partition Although different partitions are returned by a Scan in (seemingly) random order, items in a single partition need to be returned sorted by their sort key. This adds a test to verify this. This patch adds to the filled_test_table fixture, which until now had just one item in each partition, another partition (with the key "long") with 164 additional items. The test_scan_sort_order_string test then scans this table, and verifies that the items are really returned in sorted order. The sort order is, of course, string order. So we have the first item with sort key "1", then "10", then "100", then "101", "102", etc. When we implement numeric keys we'll need to add a version of this test which uses a numeric clustering key and verifies the sort order is numeric. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:25:21 +03:00
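A tiny Python check shows why string order differs from numeric order. The sort keys are assumed to be str(i) for i from 1 to 164, since the commit message only says the partition holds 164 items.

```python
keys = [str(i) for i in range(1, 165)]  # assumed sort keys "1" .. "164"

string_order = sorted(keys)
numeric_order = sorted(keys, key=int)

print(string_order[:5])   # ['1', '10', '100', '101', '102']
print(numeric_order[:5])  # ['1', '2', '3', '4', '5']
```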
Nadav Har'El	32c388b48c	alternator: fix clustering key setup Because of a typo, we incorrectly set the table's sort key as a second partition key column instead of a clustering key column. This has bad but subtle consequences - such as that the items are not sorted according to the sort key. So in this patch we fix the typo. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:24:30 +03:00
Nadav Har'El	29e0f68ee0	alternator: add initial implementation of DescribeEndpoints DescribeEndpoints is not a very important API (and by default, clients don't use it) but I wanted to understand how DynamoDB responds to it, and what better way than to write a test :-) And then, if we already have a test, let's implement this request in Scylla as well. This is a silly implementation, which always returns "localhost:8000". In the future, this will need to be configurable - we are not supposed to return this server's IP address here, but rather a domain name which can be used to get to all servers. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:22:47 +03:00
Avi Kivity	211b0d3eb4	Merge "sstables, gdb: Centralize tracking of sstable instances" from Tomasz " Currently, GDB scripts locate sstables by scanning the heap for bag_sstable_set containers. That has disadvantages: - not all containers are considered - it's extremely slow on large heaps - fragile, new containers can be added, and we won't even know This series fixes all above by adding a per-shard sstable tracker which tracks sstable objects in a linked-list. " * 'sstable-tracker' of github.com:tgrabiec/scylla: gdb: Use sstable tracker to get the list of sstables gdb: Make intrusive_list recognize member_hook links sstables: Track whether sstable was already open or not sstables: Track all instances of sstable objects sstables: Make sstable object not movable sstables: Move constructor out of line	2019-09-11 14:22:41 +03:00
Nadav Har'El	982b5e60e7	alternator: unify and improve TableName field handling Most of the request types need a TableName parameter, specifying the name of the table they operate on. There's a lot of boilerplate code required to get this table name and verify that it is valid (the parameter exists, is a string, passes DynamoDB's naming rules, and the table actually exists), which resulted in a lot of code duplication - and in some cases missing checks. So this patch introduces two utility functions, get_table_name() and get_table(), to fetch a table name or the schema of an existing table, from the request, with all necessary validation. If validation fails, the appropriate api_error() is thrown so the user gets the right error message. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:21:53 +03:00
Nadav Har'El	b8fc783171	alternator-test: clean up conftest.py Remove unused random-string code from conftest.py, and also add a TODO comment how we should speed up filled_test_table fixture by using a batch write - when that becomes available in Alternator. (right now this fixture takes almost 4 seconds to prepare on a local Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:21:52 +03:00
Piotr Sarna	a4387079ac	alternator-test: add initial scan test Message-Id: <c28ff1d38930527b299fe34e9295ecd25607398c.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:51 +03:00
Piotr Sarna	b6d148c9e0	alternator-test: add filled test table fixture The fixture creates a test table and fills it with random data, which can be later used for testing reads. Message-Id: <649a8b8928e1899c5cbd82d65d745a464c1163c8.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:50 +03:00
Piotr Sarna	4def674731	alternator: implement basic scan The most basic version of Scan request is implemented. It still contains a list of TODOs, among which the support for Segments parameter for scan parallelism. Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:39 +03:00
Piotr Sarna	0ce3866fb5	alternator: lower debug messages verbosity in the HTTP server The HTTP server still uses WARN log level to log debug messages, which is way higher than necessary. These messages are degraded to TRACE level. Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>	2019-09-11 14:12:40 +03:00
Nadav Har'El	d45220fb39	alternator-test: simplify test_put_and_get_attribute_types The test test_put_and_get_attribute_types needlessly named all the different attributes and their variables, causing a lot of repetition and chance for mistakes when adding additional attributes to the test. In this rewrite, we only have a list of items, and automatically build attributes with them as values (using sequential names for the attributes) and check we read back the same item (Python's dict equality operator checks the equality recursively, as expected). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:39 +03:00
Nadav Har'El	ea32841dab	alternator-test: test all attribute types Although we planned to initially support only string types, it turns out for the attributes (not the key), we actually support all types already, including all scalar types (string, number, bool, binary and null) and more complex types (list, nested document, and sets). This adds a test which uses PutItem to write these types and verifies that we can retrieve them. Note that this test deals with top-level attributes only. There is no attempt to modify only a nested attribute (and with the current code, it wouldn't work). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:38 +03:00
Nadav Har'El	c645538061	alternator-test: rewrite ListTables test In our tests, we cannot really assume that ListTables should return only the tables we created for the test, or even that a page size of 100 will be enough to list our 3 tables. The issue is that on a shared DynamoDB, or in hypothetical cases where multiple tests are run in parallel, or previous tests had catastrophic errors and failed to clean up, we have no idea how many unrelated tables there are in the system. There may be hundreds of them. So every ListTables test will need to use paging. So in this re-implementation, we begin with a list_tables() utility function which calls ListTables multiple times to fetch all tables, and returns the resulting list (we assume this list isn't so huge it becomes unreasonable to hold it in memory). We then use this utility function to fetch the table list with various page sizes, and check that the test tables we created are listed in the resulting list. There's no longer a separate test for "all" tables (really a page of 100 tables) and smaller pages (1,2,3,4) - we now have just one test that does the page sizes 1,2,3,4, 50 and 100. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:37 +03:00
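The paging loop can be sketched against boto3's DynamoDB client. Its list_tables() call takes Limit and ExclusiveStartTableName and returns TableNames, plus LastEvaluatedTableName while more pages may remain. FakeClient is a hypothetical in-memory stand-in so the sketch runs without a DynamoDB endpoint; a real boto3.client("dynamodb") could be passed instead. The table names are made up.

```python
def list_tables(client, limit: int) -> list:
    """Call ListTables repeatedly, following LastEvaluatedTableName, and return all names."""
    names = []
    kwargs = {"Limit": limit}
    while True:
        page = client.list_tables(**kwargs)
        names.extend(page["TableNames"])
        last = page.get("LastEvaluatedTableName")
        if last is None:
            return names
        kwargs["ExclusiveStartTableName"] = last


class FakeClient:
    """Hypothetical in-memory stand-in for a boto3 DynamoDB client."""

    def __init__(self, tables):
        self._tables = sorted(tables)

    def list_tables(self, Limit, ExclusiveStartTableName=None):
        start = 0
        if ExclusiveStartTableName is not None:
            start = self._tables.index(ExclusiveStartTableName) + 1
        page = self._tables[start:start + Limit]
        response = {"TableNames": page}
        if start + Limit < len(self._tables):
            response["LastEvaluatedTableName"] = page[-1]
        return response


ours = {"test_table_a", "test_table_b", "test_table_c"}
client = FakeClient(ours | {f"unrelated_{i}" for i in range(120)})
for page_size in (1, 2, 3, 4, 50, 100):
    # Only check that our tables are present; unrelated tables may exist too.
    assert ours <= set(list_tables(client, page_size))
```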
Piotr Sarna	6b83e17b74	alternator: add tests to ListTables command Test cases cover both listing appropriate table names and pagination. Message-Id: <e7d5f1e5cce10c86c47cdfb4d803149488935ec0.1557402320.git.sarna@scylladb.com>	2019-09-11 14:12:36 +03:00
Piotr Sarna	dfbf4ffe0f	alternator-test: add 2 tables fixture For some tests, more than 1 table is needed, so another fixture that provides two additional test tables is added. Message-Id: <75ae9de5cc1bca19594db1f0bc03260f83459380.1557402320.git.sarna@scylladb.com>	2019-09-11 14:12:35 +03:00
Piotr Sarna	b6dde25bcc	alternator: implement ListTables ListTables is used to extract all table names created so far. Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>	2019-09-11 14:10:54 +03:00
Piotr Sarna	b73a9f3744	alternator: use trace level for debug messages In the early development stage, warn level was used for all debug messages, while it's more appropriate to use 'trace' or 'debug'. Message-Id: <419ca5a22bc356c6e47fce80b392403cefbee14d.1557402320.git.sarna@scylladb.com>	2019-09-11 14:10:02 +03:00
Nadav Har'El	4ed9aa4fb4	alternator-test: cleanup in conftest.py This patch cleans up some comments and reorganizes some functions in conftest.py, where the test_table fixture was defined. The goal is to later add additional types of test tables with different schemas (e.g., just a partition key, different key types, etc.) without too much code duplication. This patch doesn't change anything functional in the tests, and they still pass ("pytest --local" runs all tests against the local Alternator). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:10:01 +03:00
Nadav Har'El	5c564b7117	alternator: make ck_from_json() easier to use The ck_from_json() utility function is easier to use if it handles the no-clustering-key case itself, as its callers need, instead of requiring them to handle that case separately. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:09:06 +03:00
Nadav Har'El	3ae0066aae	alternator: add support for UpdateItem's DELETE operation So far we supported UpdateItem only with PUT operations - this patch adds support for DELETE operations, to delete specific attributes from an item. Only the case of a missing value is supported. DynamoDB also provides the ability to pass the old value, and only perform the deletion if the value and/or its type is still up-to-date - but we don't support this yet and fail such a request if it is attempted. This patch also includes a test for this case in alternator-test/ Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:08:57 +03:00
Nadav Har'El	81679d7401	alternator-test: add tests for UpdateItem Add initial tests for UpdateItem. Only the features currently supported by our code (only string attributes, only "PUT" action) are tested. As usual, this test (like all others) was tested to pass on both DynamoDB and Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:03:10 +03:00
Nadav Har'El	0c2a440f7f	alternator: add initial UpdateItem implementation Add an initial UpdateItem implementation. As with PutItem and GetItem, we are still limited to string attributes. This initial implementation of UpdateItem implements only the "PUT" action (not "DELETE" and certainly not "ADD") and not any of the more advanced options. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:03:00 +03:00
Piotr Sarna	686d1d9c3c	alternator: add attrs_column() helper function Message-Id: <d93ae70ccd27fe31d0bc6915a20d83d7a85342cf.1557223199.git.sarna@scylladb.com>	2019-09-11 13:08:52 +03:00
Piotr Sarna	6ad9b10317	alternator: make constant names more explicit KEYSPACE and ATTRS constants refer to their names, not objects, so they're named more explicitly. Message-Id: <14b1f00d625e041985efbc4cbde192bd447cbf03.1557223199.git.sarna@scylladb.com>	2019-09-11 13:07:14 +03:00
Piotr Sarna	2975ca668c	alternator: remove inaccessible return statement Message-Id: <afaef20e7e110fa23271fb8c3dc40cec0716efb6.1557223199.git.sarna@scylladb.com>	2019-09-11 13:06:21 +03:00
Piotr Sarna	6e8db5ac6a	alternator: inline keywords It was decided that all alternator-specific keywords can be inlined in code instead of defining them as constants. Message-Id: <6dffb9527cfab2a28b8b95ac0ad614c18027f679.1557223199.git.sarna@scylladb.com>	2019-09-11 13:04:38 +03:00
Nadav Har'El	50a69174b3	alternator: some cleanups in validate_table_name() Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:03:44 +03:00
Nadav Har'El	0e06d82a1f	alternator: clean up api_error() interface All operation-generated error messages should have the 400 HTTP error code. It's a real nag to have to type it every time. So make it the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:01:47 +03:00
Nadav Har'El	0634629a79	alternator-test: test for error on creating an already-existing table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:01:46 +03:00
Nadav Har'El	6fe6cf0074	alternator: correct error when trying to CreateTable an existing table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:00:54 +03:00
Nadav Har'El	871dd7b908	alternator: fix return object from PutItem Without special options, PutItem should return nothing (an empty JSON result). Previously we had trouble doing this, because instead of returning an empty JSON result, we converted an empty string into JSON :-) So the existing code had an ugly workaround which worked, sort of, for the Python driver but not for the Java driver. The correct fix, in this patch, is to invent a new type json_string which is a string already in JSON and doesn't need further conversion, so we can use it to return the empty result. PutItem now works from YCSB's Java driver. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:00:47 +03:00
Nadav Har'El	ae1ee91f3c	alternator-test: more examples in README.md Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:56:07 +03:00
Nadav Har'El	886438784c	alternator-test: test table name limit of 222 bytes, instead of 255. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:56:06 +03:00
Nadav Har'El	28e7fa20ed	alternator: limit table names to 222 bytes Although we would like to allow table names up to 255 bytes, as DynamoDB does, this is not currently possible because Scylla tacks an additional 33 bytes onto the name to create a directory name, and directory names are limited to 255 bytes (222 + 33 = 255). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:55:07 +03:00
Nadav Har'El	a702e5a727	alternator-test: verify appropriate error when invalid key type is used Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:55:06 +03:00
Nadav Har'El	8af58b0801	alternator: better key type parsing The supported key types are just S(tring), B(lob), or N(umber). Other types are valid for attributes, but not for keys, and should not be accepted. Using a wrong type should result in the appropriate user-visible error. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:54:12 +03:00
Nadav Har'El	6cdcf5abac	alternator-test: additional cases of invalid schemas in CreateTable Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:54:11 +03:00
Nadav Har'El	9839183157	alternator: better invalid schema detection for CreateTable To be correct, CreateTable's input parsing needs to work in reverse from what it did: First, the key columns are listed in KeySchema, and then each of these (and potentially more, e.g., from indexes) needs to appear in AttributeDefinitions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:53:22 +03:00
Nadav Har'El	8bfbc1bae5	alternator-test: tests for CreateTable with bad schema Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:53:21 +03:00
Benny Halevy	0f01a4c1b8	dbuild: add usage Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:53:02 +03:00
Benny Halevy	f43bffdf9c	dbuild: add help option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:52:50 +03:00
Nadav Har'El	dc34c92899	alternator: better error handling for schema errors in CreateTable Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:52:31 +03:00
Nadav Har'El	77de0af40f	alternator-test: test for PutItem to nonexistent table We expect to see the right error code, not some "internal error". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:52:30 +03:00
Nadav Har'El	ca3553c880	alternator: PutItem: appropriate error for a non-existent table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:38 +03:00
Nadav Har'El	275a07cf10	alternator-test: add another column to test_basic_string_put_and_get() Just to make sure our success isn't limited to just a single non-key attribute, let's add another one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:37 +03:00
Nadav Har'El	6ca72b5fed	alternator: GetItem should by default return all the columns, not none The test pytest --local test_item.py::test_basic_string_put_and_get now passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:31 +03:00
Benny Halevy	c840c43fa7	dbuild: list available images when no image arg is given Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:51:26 +03:00
Nadav Har'El	9920143fb7	alternator: change empty return of PutItem Without any arguments, PutItem should return no data at all. But somehow, for reasons I don't understand, the boto3 driver gets confused by an empty JSON, thinking it isn't JSON at all. If we return a structure with an empty "attributes" field, boto3 is happy. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:49:20 +03:00
Nadav Har'El	8dec31d23b	alternator: add initial implementation of DeleteTable Add an initial implementation of DeleteTable, enough to make the test pytest --local test_table.py::test_create_and_delete_table pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:45:42 +03:00
Nadav Har'El	41d4b88e78	alternator: on unknown operation, return standard API error When given an unknown operation (we haven't implemented many of them yet...) we should throw the appropriate api_error, not some random exception. This allows the client to understand the operation is not supported and stop retrying - instead of retrying thinking this was a weird internal error. For example, the test pytest --local test_table.py::test_create_and_delete_table now fails immediately, saying Unsupported operation DeleteTable. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:45:04 +03:00
Nadav Har'El	1b1921bc94	alternator: fix JSON in DescribeTable response The structure's name in DescribeTable's output is supposed to be called "Table", not "TableDescription". Putting it in the wrong place caused the driver's table creation waiters to fail. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:44:14 +03:00
Nadav Har'El	6a455035ba	alternator: validate table name in CreateTable Validate the table name in CreateTable, and if it doesn't fit DynamoDB's requirements, return the appropriate error as drivers expect. With this patch, test_table.py::test_create_table_unsupported_names now passes (albeit with a one-minute pause - this is a bug with keep-alive support...). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:24 +03:00
Nadav Har'El	0da214c2fe	alternator-test: test_create_table_unsupported_names minor fix Check that the expected error message contains just ValidationException instead of an overly specific text message from DynamoDB, so we aren't so constrained in our own messages' wording. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:23 +03:00
Nadav Har'El	4f721a0637	alternator-test: test for creating table with very long name Dynamo allows table names up to 255 characters, but when this is tested on Alternator, the results are disastrous: mkdir with such a long directory name fails, Scylla considers this an unrecoverable "I/O error", and exits the server. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:22 +03:00
Nadav Har'El	6967dd3d8f	test-table: test DescribeTable on non-existent table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:21 +03:00
Nadav Har'El	d0cdc65b4c	Add "--local" option to run test against local Scylla installation For example "pytest --local test_item.py" Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:21 +03:00
Nadav Har'El	079c7c3737	test_item.py: basic string put and get test Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:20 +03:00
Nadav Har'El	4550f3024d	test_table fixture: be quicker to realize table was created. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:19 +03:00
Nadav Har'El	f1f76ed17b	test_table fixture: automatically delete Automatically delete the test table when the test ends. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:18 +03:00
Nadav Har'El	a946e255c6	test_item.py: start testing CRUD operations Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:17 +03:00
Nadav Har'El	4d7d871930	Start to use "test fixtures" Start to use "test fixtures" defined in conftest.py: The connection to the DynamoDB API, and also temporary tables, can be reused between multiple tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:16 +03:00
Nadav Har'El	6984ccf462	Add some table tests and README Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:15 +03:00
Nadav Har'El	f66ec337f7	alternator: very initial implementation of DescribeTable This initial implementation is enough to pass a test of getting a failure for a non-existent table - test_table.py::test_describe_table_non_existent_table and to recognize an existing table. But it's still missing a lot of fields for an existing table (among others, the schema). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:41:32 +03:00
Nadav Har'El	ad9eb0a003	alternator: errors should be output from server as Dynamo drivers expect Exceptions from the handlers need to be output in a certain way - as a JSON with specific fields - as DynamoDB drivers expect them to be. If a handler throws an alternator::api_error with these specific fields, they are output, but any other exception is converted into the same format as an "Internal Error". After this patch, executor code can throw an alternator::api_error and the client will receive this error in the right format. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:40:55 +03:00
Nadav Har'El	db49bc6141	alternator: add alternator::api_error exception type DynamoDB error messages are returned in JSON format and expect specific information: Some HTTP error code (often but not always 400), a string error "type" and a user-readable message. Code that wants to return user-visible exceptions should use this type, and in the next patch we will translate it to the appropriate JSON string. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:39:26 +03:00
Nadav Har'El	9d72bc3167	alternator: table creation time is in seconds The "Timestamp" type returned for CreationDateTime can be one of several things but if it is a number, it is supposed to be the time in seconds since the epoch - not in milliseconds. Returning milliseconds as we wrongly did causes boto3 (AWS's Python driver) to throw a parse exception on this response. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:38:41 +03:00
Nadav Har'El	c0518183c2	alternator: require alternator-port configuration Until now, we always opened the Alternator port along with Scylla's regular ports (CQL etc.). This should really be made optional. With this patch, by default Alternator does NOT start and does not open a port. Run Scylla with --alternator-port=8000 to open an Alternator API port on port 8000, as was the default until now. It's also possible to set this in scylla.yaml. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:38:31 +03:00
Piotr Sarna	2ec78164bc	alternator: add minimal HTTP interface The interface works on port 8000 by default and provides the most basic alternator operations - it's an incomplete set without validation, meant to allow testing as early as possible.	2019-09-11 12:34:18 +03:00
Benny Halevy	443e0275ab	dbuild: add --image option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 11:46:33 +03:00
Tomasz Grabiec	06154569d5	gdb: Use sstable tracker to get the list of sstables	2019-09-10 17:05:19 +02:00
Tomasz Grabiec	a141d30eca	gdb: Make intrusive_list recognize member_hook links GDB now gives "struct boost::intrusive::member_hook" from template_arguments()	2019-09-10 17:05:19 +02:00
Tomasz Grabiec	c014c79d4b	sstables: Track whether sstable was already open or not Some sstable objects correspond to sstables which are being written and are not sealed yet. Such sstables don't have all the fields filled-in. Tools which calculate statistics (like GDB scripts) need to distinguish such sstables.	2019-09-10 17:05:18 +02:00
Tomasz Grabiec	33bef82f6b	sstables: Track all instances of sstable objects Will make it easier to collect statistics about sstable in-memory metadata.	2019-09-10 17:05:16 +02:00
Tomasz Grabiec	fd74504e87	sstables: Make sstable object not movable Will be easier to add non-movable fields. We don't really need it to be movable, all instances should be managed by a shared pointer.	2019-09-10 17:04:54 +02:00
Tomasz Grabiec	589c7476e0	sstables: Move constructor out of line	2019-09-10 17:04:54 +02:00
Tomasz Grabiec	785fe281e7	gdb: scylla sstables: Print table name Message-Id: <1568121825-32008-1-git-send-email-tgrabiec@scylladb.com>	2019-09-10 16:36:21 +03:00
Glauber Costa	6651f96a70	sstables: do not keep sharding information from scylla metadata in memory (#4915) There is no reason to keep parts of the Scylla Metadata component in memory after it is read, parsed, and its information fed into the SSTable. We have seen systems in which the Scylla metadata component is one of the heaviest memory users, more than the Summary and Filter. In particular, we use the token metadata, which is the largest part of the Scylla component, to compute just one result: the shards that are responsible for this SSTable. Once we do that, we never use it again. Tests: unit (release/debug), + manual scylla write load + reshard. Fixes #4951 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-09 22:28:51 +03:00
Tomasz Grabiec	a09479e63c	Merge "Validate position in partition monotonicity" from Benny Introduce mutation_fragment_stream_validator class and use it as a Filter to flat_mutation_reader::consume_in_thread from sstable::write_components to validate partition region and optionally clustering key monotonicity. Fixes #4803	2019-09-09 15:38:31 +02:00
Benny Halevy	42f6462837	config: enable_sstable_key_validation by default in debug build Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	34d306b982	config: add enable_sstable_key_validation option Key monotonicity validation has overhead: the last key must be stored and each new key compared against it. Therefore, provide an option to enable/disable it (disabled by default). Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	507c99c011	mutation_fragment_stream_validator: add compare_keys flag Storing and comparing keys is expensive. Add a flag to enable/disable this feature (disabled by default). Without the flag, only the partition region monotonicity is validated, allowing repeated clustering rows, regardless of clustering keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	bc2ef1d409	mutation_fragment: declare partition_region operator<< in header file Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	496467d0a2	sstables: writer: Validate input mutation fragment stream Fixes #4803 Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	a37acee68f	position_in_partition: define operator=(position_in_partition_view) The respective constructor is explicit. Define this assignment operator to be used by flat_mutation_reader mutation_fragment_stream_validator filter so that it can use mutation_fragment::position() verbatim and keep its internal state as position_in_partition. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	41b60b8bc5	compaction: s/filter_func/make_partition_filter/ It better expresses the purpose of this function, as suggested by Tomasz Grabiec. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	24c7320575	dbuild: run interactive shell by default If not given any other args to run, just run an interactive shell. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190909113140.9130-1-bhalevy@scylladb.com>	2019-09-09 15:15:57 +03:00
Nadav Har'El	2543760ee6	docs/metrics.md: document additional "labels" Recently we started to use more the concept of metric labels - several metrics which share the same name, but differ in the value of some label such as "group" (for different scheduling groups). This patch documents this feature in docs/metrics.md, gives the example of scheduling groups, and explains a couple more relevant Prometheus syntax tricks. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190909113803.15383-1-nyh@scylladb.com>	2019-09-09 15:15:57 +03:00
Botond Dénes	59a96cd995	scylla-gdb.py: introduce scylla task-queues This command provides an overview of the reactor's task queues. Example: id name shares tasks A 00 "main" 1000.00 4 01 "atexit" 1000.00 0 02 "streaming" 200.00 0 A 03 "compaction" 171.51 1 04 "mem_compaction" 1000.00 0 *A 05 "statement" 1000.00 2 06 "memtable" 8.02 0 07 "memtable_to_cache" 200.00 0 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190906060039.42301-1-bdenes@scylladb.com>	2019-09-09 15:15:57 +03:00
Avi Kivity	8e8975730d	Update seastar submodule * seastar cb7026c16f...b3fb4aaab3 (10): > Revert "scheduling groups: Adding per scheduling group data support" > scheduling groups: Adding per scheduling group data support > rpc: check that two servers are not created with the same streaming id > future: really ignore exceptions in ignore_ready_future > iostream: Constify eof() function > apply.hh: add missing #include for size_t > scheduling_group_demo: add explicit yields since future::get() no longer does > Fix buffer size used when calling accept4() > future-util: reduce allocations and continuations in parallel_for_each > rpc: lz4_decompressor: Add a static constexpr variable decleration for Cpp14 compatibility	2019-09-09 15:15:34 +03:00
Gleb Natapov	9e9f64d90e	messaging_service: configure different streaming domain for each rpc server A streaming domain identifies a server across shards. Each server should have a different one. Fixes: #4953 Message-Id: <20190908085327.GR21540@scylladb.com>	2019-09-08 14:05:40 +03:00
Piotr Sarna	01410c9770	transport: make sure returning connection errors happens inside the gate. Previously, the gate could get closed too early, which would result in shutting down the server before it had an opportunity to respond to the client. Refs #4818	2019-09-08 13:23:20 +03:00
Avi Kivity	5663218fac	Merge "types: Fix decimal to integer and varint to integer conversion" from Rafael " The release notes for boost 1.67.0 includes: Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target Since we just moved out of boost 1.66, we have to update our code. This fixes issue #4960 " * 'espindola/fix-4960' of https://github.com/espindola/scylla: types: fix varint to integer conversion types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer types: fix decimal to integer conversion types: extract helper for converting a decimal to a cppint types: rename and detemplate make_castas_fctn_from_decimal_to_integer	2019-09-08 10:45:42 +03:00
Avi Kivity	244218e483	Merge "simplify date type" from Rafael " With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl. " * 'espindola/simplify-date-type' of https://github.com/espindola/scylla: types: Reduce duplication around date_type_impl types: Don't use date_type_native_type when we want a timestamp types: Remove timestamp_native_type types: Don't specialize data_type_for for db_clock::time_point types: Make it harder to create date_type	2019-09-08 10:21:48 +03:00
Rafael Ávila de Espíndola	3bac4ebac7	types: Reduce duplication around date_type_impl According to the comments, the only difference between date_type_impl and timestamp_type_impl is the comparison function. This patch makes that explicit by merging all code paths except: * The warning when converting between the two * The compare function The date_type_impl type can still be user visible via very old sstables or via the thrift protocol. It is not clear if we still need to support either, but with this patch it is easy to do so. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	36d40b4858	types: Don't use date_type_native_type when we want a timestamp In these cases it is pretty clear that the original code wanted to create a timestamp_type data_value but was creating a date_type one because of the old defaults. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	01cd21c04d	types: Remove timestamp_native_type Now that we know that anything expecting a date_type has been converted to date_type_native_type, switch to using db_clock::time_point when we want a timestamp_type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	df6c2d1230	types: Don't specialize data_type_for for db_clock::time_point This also moves every user to date_type_native_type. A followup patch will convert to timestamp_type when appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	e09fa2dcff	types: Make it harder to create date_type date_type was replaced with timestamp_type, but it was very easy to create a date_type instead of a timestamp_type by accident. This patch changes the code so that a date_type is no longer implicitly used when constructing a data_value. All existing code that was depending on this is converted to explicitly using date_type_native_type. A followup patch will convert to timestamp_type when appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Gleb Natapov	f78b2c5588	transport: remove remaining cruft related to cql's server load balancing Commit `7e3805ed3d` removed the load balancing code from the cql server, but it did not remove most of the cruft that load balancing introduced. Most of the complexity (and probably the main reason the code never worked properly) is around the service::client_state class, which is copied before being passed to the request processor (because in the past the processing could have happened on another shard) and then merged back into the "master copy" because request processing may have changed it. This commit removes all this copying. The client_state is passed as a reference all the way to the lowest layer that needs it, and its copy constructor is removed to make sure nobody copies it by mistake. tests: dev, default c-s load of 3 node cluster Message-Id: <20190906083050.GA21796@scylladb.com>	2019-09-07 18:17:53 +03:00
Avi Kivity	3b5aa13437	Merge "Optimize type find" from Rafael " This avoids a double dispatch on _kind and also removes a few shared_ptr copies. The extra work was a small regression from the recent types refactoring. " * 'espindola/optimize_type_find' of https://github.com/espindola/scylla: types: optimize type find implementation types: Avoid shared_ptr copies	2019-09-07 18:14:36 +03:00
Gleb Natapov	5b9dc00916	test: fix query_processor_test::test_query_counters to use SERIAL consistency correctly It is not possible to scan a table with SERIAL consistency, only to read a single partition. Message-Id: <20190905143023.GQ21540@scylladb.com>	2019-09-07 18:07:01 +03:00
Gleb Natapov	e52ebfb957	cql3: remove unused next_timestamp() function next_timestamp() just calls get_timestamp() directly and nobody uses it anyway. Message-Id: <20190905101648.GO21540@scylladb.com>	2019-09-05 17:20:21 +03:00
Botond Dénes	783277fb02	stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase Currently when an error happens during the receive and distribute phase it is swallowed and we just return a -1 status to the remote. We only log errors that happen during responding with the status. This means that when streaming fails, we only know that something went wrong, but the node on which the failure happened doesn't log anything. Fix by also logging errors happening in the receive and distribute phase. Also mention the phase in which the error happened in both error log messages. Refs: #4901 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>	2019-09-05 13:43:00 +02:00
Rafael Ávila de Espíndola	dd81e94684	types: fix varint to integer conversion The previous code was using the boost::multiprecision::cpp_int to integer conversion, but that doesn't have the same semantics as CQL for signed numbers. This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test. Fixes #4960 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	263e18b625	types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer It will be used when converting varint to integer too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	2d453b8e17	types: fix decimal to integer conversion The previous code was using the boost::multiprecision::cpp_rational to integer conversion, but that doesn't have the same semantics as CQL. This patch avoids creating a cpp_rational in the first place and works just with integers. This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	fb760774dd	types: extract helper for converting a decimal to a cppint It will also be used in the decimal to integer conversion. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:07 -07:00
Rafael Ávila de Espíndola	40e6882906	types: rename and detemplate make_castas_fctn_from_decimal_to_integer It was only ever used for varint. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 14:54:47 -07:00
Avi Kivity	301246f6c0	storage_proxy: protect _view_update_handlers_list iterators from invalidation on_down() iterates over _view_update_handlers_list, but it yields during iteration, and while it yields, elements in that list can be removed, resulting in a use-after-free. Prevent this by registering iterators that can be potentially invalidated, and any time we remove an element from the list, check whether we're removing an element that is being pointed to by a live iterator. If that is the case, advance the iterator so that it points at a valid element (or at the end of the list). Fixes #4912. Tests: unit (dev)	2019-09-04 17:19:28 +03:00
Tomasz Grabiec	9f5826fd4b	Merge "Use canonical mutations for background schema sync" from Botond Currently the background schema sync (push/pull) uses frozen mutation to send the schema mutations over the wire to the remote node. For this to work correctly, both nodes have to have the exact same schema for the system schema tables, as attempting to unpack the frozen mutation with the wrong schema leads to undefined behaviour. To avoid this and to ensure syncing schema between nodes with different schema table schema versions is defined we migrate the background schema sync to use canonical mutations for the transfer of the schema mutations. Canonical mutations are immune to this problem, as they support deserializing with any version of the schema, older or newer one. The foreground schema sync mechanisms -- the on-demand schema pulls on reads and writes -- already use canonical mutations to transmit the schema mutations. It is important to note that due to this change, column-level incompatibilities between the schema mutations and the schema used to deserialize them will be hidden. This is undesired and should be fixed in a follow-up (#4956). Table level incompatibilities are detected and schema mutations containing such mutations will be rejected just like before. This patch adds canonical mutation support to the two background schema sync verbs: * `DEFINITIONS_UPDATE` (schema push) * `MIGRATION_REQUEST` (schema pull) Both verbs still support the old frozen mutation schema transfer, albeit that path is now much less efficient. After all nodes are upgraded, the pull verb can effectively avoid sending frozen mutations altogether, completely migrating to canonical mutations. Unfortunately this was not possible for the push verb, so that one now has an overhead as it needs to send both the frozen and canonical mutations. Fixes: #4273	2019-09-04 13:58:14 +02:00
Benny Halevy	bc29520eb8	flat_mutation_reader: consume_in_thread: add mutation_filter For validating mutation_fragment's monotonicity. Note: forwarding constructor allows implicit conversion by current callers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-04 13:42:37 +03:00
Rafael Ávila de Espíndola	000514e7cc	sstable: close file_writer if an exception is thrown The previous code was not exception safe and would eventually cause a file to be destroyed without being closed, causing an assert failure. Unfortunately it doesn't seem to be possible to test this without error injection, since using an invalid directory fails before this code is executed. Fixes #4948 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190904002314.79591-1-espindola@scylladb.com>	2019-09-04 13:28:55 +03:00
Botond Dénes	7adc764b6e	messaging_service: add canonical_support to schema pull and push verbs The verbs are: * DEFINITIONS_UPDATE (push) * MIGRATION_REQUEST (pull) Support was added in a backward-compatible way. The push verb sends both the old frozen mutation parameter and the new optional canonical mutation parameter. It is expected that new nodes will use the latter, while old nodes will fall back to the former. The pull verb has a new optional `options` parameter, which for now contains a single flag: `remote_supports_canonical_mutation_retval`. This flag, if set, means that the remote node supports the new canonical mutation return value, thus the old frozen mutations return value can be left empty.	2019-09-04 10:32:44 +03:00
Botond Dénes	d9a8ff15d8	service::migration_manager: add canonical_mutation merge_schema_from() overload Add an overload which takes a vector of canonical mutations. Going forward, this is the overload to use.	2019-09-04 10:32:44 +03:00
Botond Dénes	e02b93cae1	schema_tables: convert_schema_to_mutations: return canonical_mutations In preparation for migrating the schema push/pull to canonical mutations, convert the method producing the schema mutations to return a vector of canonical mutations. The only user, the MIGRATION_REQUEST verb, converts the canonical mutations back to frozen mutations. This is very inefficient, but this path will only be used in mixed clusters. After all nodes are upgraded, the verb will send the canonical mutations directly instead.	2019-09-04 08:47:20 +03:00
Rafael Ávila de Espíndola	b100f95adc	types: optimize type find implementation This turns find into a template so there is only one switch over the kind of each type in the search. To evaluate the change in code size, I added [[noinline]] to find and obtained the following results. In the before case, each release entry has one extra term, because the functions are complex enough to trigger gcc to split them into hot + cold parts. before (dev; release with hot + cold split): find: dev 0x35f = 863, release 0x3d5 + 0x112 = 1255; references_duration: dev 0x62 + 0x22 + 0x8 = 140, release 0x55 + 0x1f + 0x2a + 0x8 = 166; references_user_type: dev 0x6b + 0x26 + 0x111 = 418, release 0x65 + 0x1f + 0x32 + 0x11b = 465. after (dev; release): find: dev 0xd6 + 0x1b4 = 650, release 0xd2 + 0x1f5 = 711; references_duration: dev 0x13 = 19, release 0x13 = 19; references_user_type: dev 0x1a = 26, release 0x21 = 33. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-03 08:23:21 -07:00
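A minimal sketch of making the search a template over its predicate, so that each query (`references_duration`, `references_user_type`) gets its own instantiation with the predicate inlined into the walk. The `kind` enum and the `abstract_type` struct below are simplified assumptions, not Scylla's type hierarchy; children are stored uniformly in a vector, which sidesteps the per-kind switch the real code needs in order to reach them.

```cpp
#include <memory>
#include <vector>

// Hypothetical, heavily simplified type tree; kind names are illustrative.
enum class kind { int32, text, duration, list, map, user };

struct abstract_type {
    kind k;
    std::vector<std::shared_ptr<const abstract_type>> children; // element/field types
};

// One templated walk: the predicate is a template parameter, so the compiler
// can inline it into each instantiation instead of calling it indirectly.
template <typename Predicate>
bool find(const abstract_type& t, Predicate pred) {
    if (pred(t)) {
        return true;
    }
    for (const auto& child : t.children) {
        if (find(*child, pred)) {
            return true;
        }
    }
    return false;
}

bool references_duration(const abstract_type& t) {
    return find(t, [](const abstract_type& x) { return x.k == kind::duration; });
}

bool references_user_type(const abstract_type& t) {
    return find(t, [](const abstract_type& x) { return x.k == kind::user; });
}
```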
Rafael Ávila de Espíndola	e0065b414e	types: Avoid shared_ptr copies They are somewhat expensive (in code size at least) and not needed everywhere. Inside the getter the variables are 'const data_type&', so we can return that. Everything still works when a copy is needed, but in code that just wants to check a property we avoid the copy. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-03 07:43:35 -07:00
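The pattern can be sketched as follows, assuming `std::shared_ptr` in place of Scylla's own pointer type; `abstract_type`, `data_type` and `list_type_impl` here are simplified stand-ins. The getter returns `const data_type&`, so a caller that only inspects a property does not touch the reference count, while a caller that needs ownership can still make a copy.

```cpp
#include <memory>
#include <string>

// Simplified stand-in for a column type.
struct abstract_type {
    std::string name;
    bool is_collection = false;
};
using data_type = std::shared_ptr<const abstract_type>;

class list_type_impl {
    data_type _elements;
public:
    explicit list_type_impl(data_type elements) : _elements(std::move(elements)) {}
    // Returning by const reference avoids a reference-count increment
    // (and the matching decrement) on every call.
    const data_type& get_elements_type() const { return _elements; }
};

// A caller that only checks a property never needs its own copy.
bool elements_are_collections(const list_type_impl& l) {
    return l.get_elements_type()->is_collection;
}
```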
Benny Halevy	bdfb73f67d	scripts/create-relocatable-package: ldd: print executable name in exception Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190903080511.534-1-bhalevy@scylladb.com>	2019-09-03 15:34:38 +03:00
Avi Kivity	294a86122e	Merge "nonroot installer" from Takuya " This is nonroot installer patchset v9. " * 'nonroot_v9' of https://github.com/syuu1228/scylla: dist/common/scripts: support nonroot mode on setup scripts reloc/python3: add install.sh on python relocatable package install.sh: add --nonroot mode dist/common/systemd: untemplatize *.service, use drop-in units instead dist/debian: delete debian/*.install, debian/*.dirs	2019-09-03 15:33:20 +03:00
Piotr Sarna	7b297865e1	transport: wait for the connections to finish when stopping (#4818) During CQL request processing, a gate is used to ensure that the connection is not shut down until all ongoing requests are done. However, the gate might have been left too early if the database was not ready to respond immediately - which could result in trying to respond to an already closed connection later. This issue is solved by postponing leaving the gate until the continuation chain that handles the request is finished. Refs #4808	2019-09-03 14:49:11 +03:00
Avi Kivity	8fb59915bb	Merge "Minor cleanup patches for sstables" from Asias * 'cleanup_sstables' of https://github.com/asias/scylla: sstables: Move leveled_compaction_strategy implementation to source file sstables: Include dht/i_partitioner.hh for dht::partition_range	2019-09-03 14:47:44 +03:00
Takuya ASADA	31ddb2145a	dist/common/scripts: support nonroot mode on setup scripts Since nonroot mode requires running everything as a non-privileged user, most setup scripts cannot be used in nonroot mode. We only provide the following functions in nonroot mode: - EC2 check - IO setup - Node exporter installer - Dev mode setup The rest of the functions will be skipped by scylla_setup. To implement nonroot mode in the setup scripts, scylla_util provides utility functions that abstract the difference in directory structure between a normal installation and nonroot mode.	2019-09-03 20:06:35 +09:00
Takuya ASADA	cfa8885ae1	reloc/python3: add install.sh on python relocatable package To support nonroot installation of scylla-python3, add install.sh to the scylla-python3 relocatable package.	2019-09-03 20:06:30 +09:00
Takuya ASADA	2de14e0800	install.sh: add --nonroot mode This implements a way to install Scylla that does not require root privileges, is not distribution-dependent, and does not use a package manager.	2019-09-03 20:06:24 +09:00
Takuya ASADA	cde798dba5	dist/common/systemd: untemplatize *.service, use drop-in units instead Since a systemd unit's parameters can be overridden using drop-in units, we don't need mustache templates for them. Also, drop the --disttype and --target options from install.sh since they are no longer required, and introduce --sysconfdir instead for non-Red Hat distributions.	2019-09-03 20:06:15 +09:00
Takuya ASADA	49a360f234	dist/debian: delete debian/*.install, debian/*.dirs Since `ac9b115`, we use install.sh on Debian, so we no longer rely on .deb-specific packaging scripts. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2019-09-03 20:06:09 +09:00
Benny Halevy	7827e3f11d	tests: test_large_data: do not stop database Now that compaction returns only after the compacted sstables are deleted, we no longer need to stop the database to force waiting for deletes (that were previously done asynchronously). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	19b67d82c9	table::on_compaction_completion: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	8dd6e13468	table::on_compaction_completion: wait for background deletes Don't let background deletes accumulate uncontrollably. Fixes #4909 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	da6645dc2c	table: refresh_snapshot before deleting any sstables The row cache must not hold references to any sstable we're about to delete. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:29 +03:00
Nadav Har'El	6c4ad93296	api/compaction_manager: do not hold map on the stack Merged patch series by Amnon Heiman: This patch fixes a bug where a map is held on the stack and then used by a future. Instead, the map is now moved into the relevant lambda function. Fixes #4824	2019-09-01 13:16:34 +03:00
Avi Kivity	e962beea20	toolchain: update to Fedora 30 and gcc 9.2 In Fedora 30 we have a new boost version, so we no longer need to use our patched boost, so we also remove the scylladb/toolchain copr.	2019-09-01 12:05:26 +03:00
Piotr Sarna	23c891923e	main: make sure view_builder doesn't propagate semaphore errors Stopping services, which occurs in the destructor of a deferred_action, must not throw, or the program will end with terminate(). The view builder breaks a semaphore during its shutdown, which results in propagating a broken_semaphore exception, which in turn results in an exception being thrown during stop().get(). To fix that issue, semaphore exceptions are explicitly ignored, since they're expected to appear during shutdown. Fixes #4875	2019-09-01 11:59:57 +03:00
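A sketch of the approach, with a hypothetical `broken_semaphore` type standing in for Seastar's exception and a hypothetical `stop_ignoring_broken_semaphore()` helper: only the exception that is expected during shutdown is swallowed, and anything else still propagates to the caller.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for Seastar's broken_semaphore exception.
struct broken_semaphore : std::exception {
    const char* what() const noexcept override { return "Semaphore broken"; }
};

// Runs a service's stop routine during shutdown. A broken semaphore is an
// expected side effect of shutting down, so it is ignored; any other
// exception is not caught here and propagates.
void stop_ignoring_broken_semaphore(const std::function<void()>& stop) {
    try {
        stop();
    } catch (const broken_semaphore&) {
        // Expected during shutdown: nothing to do.
    }
}
```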
Tomasz Grabiec	c8f8a9450f	Merge "Improve cpu instruction set support checks" from Avi To prevent termination with SIGILL, tighten the instruction set support checks. First, check for CLMUL too. Second, add a check in scylla_prepare to catch the problem early. Fixes #4921.	2019-08-30 16:54:04 +02:00
Avi Kivity	07010af44c	scylla_prepare: verify processor satisfies instruction set requirements Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them. There is a check in main(), but that happens after the code is running and it may already be too late. Add a check in scylla_prepare which runs before the main executable.	2019-08-29 15:34:29 +03:00
Avi Kivity	9579946e72	main: extend CPU feature check to verify that PCLMUL is available Since `79136e895f`, we use the pclmul instruction set, so check it is there.	2019-08-29 15:13:32 +03:00
Gleb Natapov	e61a86bbb2	to_string: Add operator<< overload for std::tuple. Message-Id: <20190829100902.GN21540@scylladb.com>	2019-08-29 13:35:02 +03:00
Rafael Ávila de Espíndola	036f51927c	sstables: Remove unused include Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190827210424.37848-1-espindola@scylladb.com>	2019-08-28 11:32:44 +03:00
Benny Halevy	869b518dca	sstables: auto-delete unsealed sstables Fixes #4807 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190827082044.27223-1-bhalevy@scylladb.com>	2019-08-28 09:46:17 +03:00
Botond Dénes	969aa22d51	configure.py: promote unused result warning to error Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190827111428.6829-2-bdenes@scylladb.com>	2019-08-28 09:46:17 +03:00
Botond Dénes	480b42b84f	tests/gossip_test: silence discarded future warning Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190827111428.6829-1-bdenes@scylladb.com>	2019-08-28 09:46:17 +03:00
Avi Kivity	d85339e734	Update seastar submodule * seastar 20bfd61955...cb7026c16f (2): > net: dpdk: suppress discarded future warning > Merge "Optimize promises in then/then_wrapped" from Rafael	2019-08-28 09:46:17 +03:00
Avi Kivity	f1d73d0c13	Merge "systemd: put scylla processes in systemd slices. #4743" from Glauber " It is well known that seastar applications, like Scylla, do not play well with external processes: CPU usage from external processes may confuse the I/O and CPU schedulers and create stalls. We have also recently seen that memory usage from other applications' anonymous and page cache memory can bring the system to OOM. Linux has a very good infrastructure for resource control contributed by amazingly bright engineers in the form of cgroup controllers. This infrastructure is exposed by SystemD in the form of slices: a hierarchical structure to which controllers can be attached. In true systemd way, the hierarchy is implicit in the filenames of the slice files. A "-" symbol defines the hierarchy, so the files that this patch presents, scylla-server and scylla-helper, essentially create a "scylla" cgroup at the top level with "server" and "helper" children. Later we mark the Services needed to run scylla as belonging to one or the other through the Slice= directive. Scylla DBAs can benefit from this setup by using the systemd-run utility to fire ad-hoc commands. Let's say for example that someone wants to hypothetically run a backup and transfer files to an external object store like S3, making sure that the amount of page cache used won't create swap pressure leading to database timeouts. One can then run something like: sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool (or even better, the backup tool can itself be a systemd timer) " * 'slices' of https://github.com/glommer/scylla: systemd: put scylla processes in systemd slices. move postinst steps to an external script	2019-08-26 20:16:55 +03:00
Benny Halevy	20083be9f6	sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>	2019-08-26 19:50:21 +03:00
Avi Kivity	b9e9d7d379	Merge "Resolve discarded future warnings" from Botond " The warning for discarded futures will only become useful once we can silence all present warnings and flip the flag to make it an error. Then it will start being useful in finding new, accidental discarding of futures. This series silences all remaining warnings in the Scylla codebase. For those cases where it was obvious that the future is discarded on purpose, with the author taking all necessary precautions (handling exceptions), the warning was simply silenced by casting the future to void and adding a relevant comment. Where the discarding seems to have been done in error, I have fixed the code to not discard it. To the remaining sites I added a FIXME to fix the discarding. " * 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla: treewide: silence discarded future warnings for questionable discards treewide: silence discarded future warnings for legit discards tests: silence discarded future warnings tests/cql_query_test.cc: convert some tests to thread	2019-08-26 19:40:25 +03:00
Botond Dénes	136fc856c5	treewide: silence discarded future warnings for questionable discards This patch silences the remaining discarded future warnings: those where it cannot be determined with reasonable confidence whether discarding was indeed the actual intent of the author, or whether discarding the future could lead to problems. For all those places a FIXME is added, with the intent that these will soon be followed up with an actual fix. I deliberately haven't fixed any of these, even if the fix seems trivial. It is too easy to overlook a bad fix mixed in with so many mechanical changes.	2019-08-26 19:28:43 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they took the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places that lacked it but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Botond Dénes	cff4c4932d	tests: silence discarded future warnings	2019-08-26 18:54:44 +03:00
Botond Dénes	486fa8c10c	tests/cql_query_test.cc: convert some tests to thread Some tests are currently discarding futures unjustifiably, however adding code to wait on these futures is quite inconvenient due to the continuation style code of these tests. Convert them to run in a seastar thread to make the fix easier.	2019-08-26 18:54:44 +03:00
Tomasz Grabiec	ac5ff4994a	service: Announce the new schema version when features are enabled Introduced in `c96ee98`. We call update_schema_version() after features are enabled and we recalculate the schema version. This method does not update gossip, though. The node will still use its database::version() to decide on syncing, so it will not sync and will stay inconsistent in gossip until the next schema change. We should call update_schema_version_and_announce() instead so that the gossip state is also updated. There is no actual schema inconsistency, but the joining node will think there is and will wait indefinitely. Making a random schema change would unblock it. Fixes #4647. Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>	2019-08-26 17:54:59 +03:00
Avi Kivity	a7b82af4c3	Update seastar submodule * seastar afc5bbf511...20bfd61955 (18): > reactor: closing file used to check if direct_io is supported > future: set_coroutine(): s/state()/_state/ > tests/perf/perf_test.hh: suppress discarded future warning > tests: rpc: fix memory leak in timeout wraparound tests > Revert "future-util: reduce allocations and continuations in parallel_for_each" > reactor: fix rename_priority_class() build failure in C++14 mode > future: mark future_state_base::failed() as unlikely > future-util: reduce allocations and continuations in parallel_for_each > future-utils: generalize when_all_estimate_vector_capacity() > output_stream: Add comment on sequentiality > docs/tutorial.md: minor cleanups in first section > core: fix a race in execution stages (Fixes #4856, fixes #4766) > semaphore: use semaphore's clock type in with_semaphore()/get_units() > future: fix doxygen documentation for promise<> > sharded: fixed detecting stop method when building with clang > reactor: fixed locking error in rename_priority_class > Assert that append_challenged_posix_file_impl are closed. > rpc: correctly handle huge timeouts	2019-08-26 15:37:58 +03:00
Asias He	3ea1255020	storage_service: Use sleep_abortable instead of sleep (#4697) Make the sleep abortable so that it is able to break the loop during shutdown. Fixes #4885	2019-08-26 13:35:44 +03:00
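Seastar's `sleep_abortable()` is what the real code uses; the sketch below only approximates the behaviour with a condition variable. The `simple_abort_source` class and its methods are hypothetical. A sleep returns early, with `false`, once an abort has been requested, which lets a retry loop such as `while (as.sleep_abortable(std::chrono::seconds(1))) { /* retry */ }` exit promptly during shutdown.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an abort source; not Seastar's API.
class simple_abort_source {
    std::mutex _m;
    std::condition_variable _cv;
    bool _aborted = false;
public:
    void request_abort() {
        {
            std::lock_guard<std::mutex> lk(_m);
            _aborted = true;
        }
        _cv.notify_all(); // wake any sleeper immediately
    }

    // Sleeps for up to `d`. Returns true if the whole duration elapsed and
    // false if the sleep was cut short (or skipped) because of an abort.
    template <typename Rep, typename Period>
    bool sleep_abortable(std::chrono::duration<Rep, Period> d) {
        std::unique_lock<std::mutex> lk(_m);
        return !_cv.wait_for(lk, d, [this] { return _aborted; });
    }
};
```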
Asias He	2f24fd9106	sstables: Move leveled_compaction_strategy implementation to source file It is better than putting everything in header.	2019-08-26 16:49:48 +08:00
Asias He	b69138c4e4	sstables: Include dht/i_partitioner.hh for dht::partition_range Get rid of one FIXME.	2019-08-26 16:35:18 +08:00
Nadav Har'El	b60d201a11	API: column_family.cc Add get_built_indexes implementation Merged patch series from Amnon Heiman amnon@scylladb.com This patch adds an implementation of the get built index API and removes a FIXME. The API returns a list of the secondary indexes that belong to a column family and have already been fully built. Example: CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) ); CREATE index on scylla_demo.mytableID (time); $ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid' ["mytableid_time_idx"]	2019-08-25 18:37:44 +03:00
Amnon Heiman	2d3185fa7d	column_family.cc: remove unhandled future The sum_ratio struct is a helper struct that is used when calculating a ratio over multiple shards. Originally it was created thinking that it may need to use a future; in practice it never did, and the future was ignored. This patch removes the future from the implementation, eliminating an unhandled future warning from the compilation. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:51:14 +03:00
Amnon Heiman	21dee3d8ef	API:column_family.cc Add get_build_index implementation This patch adds an implementation of the get build index API and removes a FIXME. The API returns the list of the built secondary indexes that belong to a column family. Example: CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) ); CREATE index on scylla_demo.mytableID (time); $ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid' ["mytableid_time_idx"] Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:46:49 +03:00
Juliana Oliveira	711ed76c82	auth: standard_role_manager: read null columns as false When a role is created through the `create role` statement, the 'is_superuser' and 'can_login' columns are set to false by default. Likewise, `list roles`, `alter roles` and `* roles` operations expect to find a boolean when reading the same columns. This is not the case, though, when a user directly inserts into `system_auth.roles` and doesn't set those columns. Even though manually creating roles is not a desired day-to-day operation, it is an insert just like any other and it should work. `* roles` operations, on the other hand, are not prepared for these deviations. If a user manually creates a role and doesn't set boolean values for those columns, `* roles` will return all sorts of errors. This happens because `* roles` is explicitly expecting a boolean and casting to it. This patch makes `* roles` more friendly by treating the boolean value as `false` - inside the `* roles` context - if the actual value is `null`; it won't change the stored `null` value. Fixes #4280 Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20190816032617.61680-1-juliana@scylladb.com>	2019-08-25 11:52:43 +03:00
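A sketch of the read-side interpretation, assuming null cells surface as empty `std::optional<bool>` values; `role_row`, `role_record` and `to_record()` are hypothetical names, not Scylla's auth code. Only the in-memory record treats null as `false`; the stored row is left untouched.

```cpp
#include <optional>
#include <string>

// Hypothetical row as read from system_auth.roles: a null cell is an empty optional.
struct role_row {
    std::string role;
    std::optional<bool> is_superuser;
    std::optional<bool> can_login;
};

// Hypothetical in-memory record used by the role operations.
struct role_record {
    std::string name;
    bool is_superuser;
    bool can_login;
};

// Interpret null booleans as false when building the record; nothing is
// written back to the stored row.
role_record to_record(const role_row& row) {
    return role_record{row.role,
                       row.is_superuser.value_or(false),
                       row.can_login.value_or(false)};
}
```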
Pekka Enberg	118a141f5d	scylla_blocktune.py: Kill btrfs related FIXME The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no longer relevant for Scylla deployments, as Red Hat dropped support for the file system in 2017. Message-Id: <20190823114013.31112-1-penberg@scylladb.com>	2019-08-24 20:40:08 +03:00
Botond Dénes	18581cfb76	multishard_mutation_query: create_reader(): use the caller's priority class The priority class the shard reader was created with was hardcoded to be `service::get_local_sstable_query_read_priority()`. At the time this code was written, priority classes could not be passed to other shards, so this method, receiving its priority class parameters from another shard, could not use it. This is now fixed, so we can just use whatever the caller wants us to use. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>	2019-08-23 16:10:43 +02:00
Tomasz Grabiec	080989d296	Merge "cql3: cartesian product limits" from Avi Cartesian products (generated by IN restrictions) can grow very large, even for short queries. This can overwhelm server resources. Add limit checking for cartesian products, and configuration items for users that are not satisfied with the default of 100 records fetched. Fixes #4752. Tests: unit (dev), manual test with SIGHUP.	2019-08-21 19:35:59 +02:00
Avi Kivity	67b0d379e0	main: add glue between db::config and cql3::cql_config Copy values between the flat db::config and the hierarchical cql_config, adding observers to keep the values updated.	2019-08-21 19:35:59 +02:00
Avi Kivity	8c7ad1d4cd	cql: single_column_clustering_key_restrictions: limit cartesian products Cartesian products (via IN restrictions) make it easy to generate huge primary key sets with simple queries, overflowing server resources. Limit them in the coordinator and report an exception instead of trying to execute a query that would consume all of our memory. A unit test is added.	2019-08-21 19:35:59 +02:00
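One way to enforce such a limit is to multiply the IN-list sizes incrementally and stop as soon as the running product would exceed the configured maximum (the merge message above mentions a default of 100). The function name and the exception type below are assumptions for illustration, not Scylla's API. Checking `product > limit / n` before multiplying is equivalent to `product * n > limit` for integers and cannot overflow.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns how many clustering keys the cartesian product of the IN lists
// would produce, or throws if that number exceeds `limit`.
std::size_t checked_cartesian_product_size(const std::vector<std::size_t>& in_list_sizes,
                                           std::size_t limit) {
    for (std::size_t n : in_list_sizes) {
        if (n == 0) {
            return 0; // an empty IN list selects nothing
        }
    }
    std::size_t product = 1;
    for (std::size_t n : in_list_sizes) {
        // product * n > limit  <=>  product > limit / n (integer division)
        if (product > limit / n) {
            throw std::invalid_argument("cartesian product size exceeds limit of " +
                                        std::to_string(limit));
        }
        product *= n;
    }
    return product;
}
```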
Avi Kivity	3a44fa9988	cql3, treewide: introduce empty cql3::cql_config class and propagate it We need a way to configure the cql interpreter and runtime. So far we relied on accessing the configuration class via various backdoors, but that causes its own problems around initialization order and testability. To avoid that, this patch adds an empty cql_config class and propagates it from main.cc (and from tests) to the cql interpreter via the query_options class, which is already passed everywhere. Later patches will fill it with contents.	2019-08-21 19:35:59 +02:00
Rafael Ávila de Espíndola	86c29256eb	types: Fix references_user_type This was broken since the type refactoring. It was checking the static type, which is always abstract_type. Unfortunately we only had dtests for this. This can probably be optimized to avoid the double switch over kind, but it is probably better to do the simple fix first. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190821155354.47704-1-espindola@scylladb.com>	2019-08-21 19:13:59 +03:00
Dejan Mircevski	ea9d358df9	cql3: Optimize LIKE regex construction Currently we create a regex from the LIKE pattern for every row considered during filtering, even though the pattern is always the same. This is wasteful, especially since we require costly optimization in the regex compiler. Fix it by reusing the regex whenever the pattern is unchanged since the last call. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-21 16:45:47 +03:00
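A sketch of the caching idea using `std::regex`. The `like_matcher` class and the `like_to_regex()` translation are hypothetical: `%` becomes `.*`, `_` becomes `.`, regex metacharacters are escaped, and LIKE escape sequences are not handled. The compiled regex is reused until a different pattern arrives, so the costly `std::regex::optimize` compilation happens once per distinct pattern rather than once per row.

```cpp
#include <regex>
#include <string>

// Translate a LIKE pattern into an ECMAScript regex (escapes not handled).
std::string like_to_regex(const std::string& pattern) {
    std::string re;
    for (char c : pattern) {
        switch (c) {
        case '%': re += ".*"; break;
        case '_': re += '.'; break;
        default:
            // Escape characters that are special in ECMAScript regexes.
            if (std::string("\\^$.|?*+()[]{}").find(c) != std::string::npos) {
                re += '\\';
            }
            re += c;
        }
    }
    return re;
}

// Caches the compiled regex and recompiles only when the pattern changes.
class like_matcher {
    std::string _last_pattern;
    std::regex _re;
    bool _compiled = false;
public:
    int compilations = 0; // exposed for illustration only

    bool operator()(const std::string& value, const std::string& pattern) {
        if (!_compiled || pattern != _last_pattern) {
            _re = std::regex(like_to_regex(pattern),
                             std::regex::ECMAScript | std::regex::optimize);
            _last_pattern = pattern;
            _compiled = true;
            ++compilations;
        }
        return std::regex_match(value, _re);
    }
};
```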
Piotr Sarna	526f4c42aa	storage_proxy: fix iterator liveness issue in on_down (#4876) The loop over view update handlers used a guard in order to ensure that the object is not prematurely destroyed (thus invalidating the iterator), but the guard itself was not in the right scope. Fixed by replacing a 'for' loop with a 'while' loop, which moves the iterator incrementation inside the scope in which it's still guarded and valid. Fixes #4866	2019-08-21 15:44:43 +03:00
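A synchronous sketch of the corrected loop shape, with hypothetical `view_update_handler` and `handler_list` types and a hypothetical `on_node_down()` function. A copy of the `shared_ptr` keeps the current handler alive, and the iterator is advanced while that handler is still guarded, so the handler may safely erase itself from the list (erasing from a `std::list` invalidates only iterators to the erased element). The real code is asynchronous, which this sketch does not model.

```cpp
#include <list>
#include <memory>

struct view_update_handler;
using handler_list = std::list<std::shared_ptr<view_update_handler>>;

// Hypothetical handler; on_down() unregisters itself from the list.
struct view_update_handler {
    bool targets_down_node = false;
    bool aborted = false;
    void on_down(handler_list& registry, handler_list::iterator self) {
        aborted = true;
        registry.erase(self); // drops the list's reference to this handler
    }
};

void on_node_down(handler_list& handlers) {
    auto it = handlers.begin();
    while (it != handlers.end()) {
        auto current = it;
        auto guard = *current; // keeps the handler alive during the call
        ++it;                  // advance while `current` is still valid
        if (guard->targets_down_node) {
            guard->on_down(handlers, current);
        }
    }
}
```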
Avi Kivity	4ef7429c4a	build: build seastar in build directory Currently, seastar is built in seastar/build/{mode}. This means we have two build directories: build/{mode} and seastar/build/{mode}. This patch changes that to have only a single build directory (build/{mode}). It does that by calling Seastar's cmake directly instead of through Seastar's ./configure.py. However, to support dpdk, if that is enabled it calls cmake through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does). All ./configure.py flags are translated to cmake variables, in the same way that Seastar does. Contains fix from Rafael to pass the flags for the correct mode.	2019-08-21 13:10:17 +02:00
Rafael Ávila de Espíndola	278b6abb2b	Improve documentation on the system.large_* tables This clarifies that "rows" are clustering rows and that there is no information about individual collection elements. The patch also documents some properties common to all these tables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190820171204.48739-1-espindola@scylladb.com>	2019-08-21 10:36:25 +03:00
Vlad Zolotarov	d253846c91	hinted handoff: fix a race on a directory removal between space_watchdog and drain_for() The endpoint directories scanned by space_watchdog may get deleted by manager::drain_for(). If a deleted directory is passed to lister::scan_dir(), this will result in an exception, and as a result space_watchdog will skip this round and hinted handoff is going to be disabled (for all agents including MVs) for the whole space_watchdog round. Let's make sure this doesn't happen by serializing the scanning and deletion using end_point_hints_manager::file_update_mutex. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 11:46:46 -04:00
Vlad Zolotarov	b34c36baa2	hinted handoff: make taking file_update_mutex safe end_point_hints_manager::file_update_mutex is taken by space_watchdog but while space_watchdog is waiting for it the corresponding end_point_hints_manager instance may get destroyed by manager::drain_for() or by manager::stop(). This will end up in a use-after-free event. Let's change the end_point_hints_manager's API in a way that would prevent such an unsafe locking: - Introduce the with_file_update_mutex(). - Make end_point_hints_manager::file_update_mutex() method private. Fixes #4685 Fixes #4836 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 11:26:19 -04:00
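A sketch of the API shape only, using `std::mutex` in place of Seastar primitives; the class is a hypothetical simplification of `end_point_hints_manager`. The mutex is private and reachable only through `with_file_update_mutex()`, which runs a function while holding the lock, so callers cannot keep a reference to the mutex itself. The lifetime protection the real change adds while a caller waits for the lock is not modelled here.

```cpp
#include <mutex>
#include <utility>

// Hypothetical simplification: the lock is only available via the wrapper.
class end_point_hints_manager {
    std::mutex _file_update_mutex; // private: no direct access for callers
    int _files_on_disk = 0;
public:
    // Runs `func(*this)` while holding the file update mutex and returns its result.
    template <typename Func>
    auto with_file_update_mutex(Func&& func) {
        std::lock_guard<std::mutex> lock(_file_update_mutex);
        return std::forward<Func>(func)(*this);
    }

    void add_file() { ++_files_on_disk; }
    int files_on_disk() const { return _files_on_disk; }
};
```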
Vlad Zolotarov	dbad9fcc7d	db::hints::manager::drain_for(): fix alignment Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Vlad Zolotarov	7a12b46fc9	db::hints::manager: serialize calls to drain_for() If drain_for() is running together with itself: one instance for the local node and one for some other node, erasing of elements from the _ep_managers map may lead to a use-after-free event. Let's serialize drain_for() calls with a semaphore. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Vlad Zolotarov	09600f1779	db::hints: cosmetics: indentation and missing method qualifier Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Avi Kivity	698b72b501	relocatable: switch from run-time relocation to install-time relocation Our current relocation works by invoking the dynamic linker with the executable as an argument. This confuses gdb since the kernel records the dynamic linker as the executable, not the real executable. Switch to install-time relocation with patchelf: when installing the executable and libraries, all paths are known, and we can update the path to the dynamic loader and to the dynamic libraries. Since patchelf itself is dynamically linked, we have to relocate it dynamically (with the old method of invoking it via the dynamic linker). This is okay since it's a one-time operation and since we don't expect to debug core dumps of patchelf crashes. We lose the ability to run scylla directly from the uninstalled tarball, but since the nonroot installer is already moving in the direction of requiring install.sh, that is not a great loss, and certainly the ability to debug is more important. dh_strip barfs on some binaries which were treated with patchelf, so exclude them from dh_strip. This doesn't lose any functionality, since these binaries didn't have debug information to begin with (they are already-stripped Fedora executables). Fixes #4673.	2019-08-20 00:25:43 +02:00
Botond Dénes	4cb873abfe	query::trim_clustering_row_ranges_to(): fix handling of non-full prefix keys Non-full prefix keys are currently not handled correctly, as all keys are treated as if they were full prefixes, and therefore as if they represent a point in the key space. Non-full prefixes, however, represent a sub-range of the key space and therefore require null extending before they can be treated as a point. As a quick reminder, `key` is used to trim the clustering ranges such that they only cover positions >= key. Thus, `trim_clustering_row_ranges_to()` does the equivalent of intersecting each range with (key, inf). When `key` is a prefix, this would exclude all positions that are prefixed by key as well, which is not desired. Fixes: #4839 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>	2019-08-20 00:24:51 +02:00
Avi Kivity	21d6f0bb16	Merge "Add LIKE test cases for all non-string types #4859" from Dejan " Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types. Tests: unit (dev) " * 'like-nonstring' of https://github.com/dekimir/scylla: cql_query_test: Add LIKE tests for all types cql_query_test: Remove LIKE-nonstring-pattern case cql_query_test: Move a testcase elsewhere in file	2019-08-20 00:24:51 +02:00
Tomasz Grabiec	6813ae22b0	Merge "Handle termination signals during streaming" from Avi In `b197924`, we changed the shutdown process not to rely on the global reactor-defined exit, but instead added a local variable to hold the shutdown state. However, we did not propagate that state everywhere, and now streaming processes are not able to abort. Fix that by enhancing stop_signal with a sharded<abort_source> member that can be propagated to services. Propagate it to storage_service and thence to boot_strapper and range_streamer so that streaming processes can be aborted. Fixes #4674 Fixes #4501 Tests: unit (dev), manual bootstrap test	2019-08-20 00:24:51 +02:00
Avi Kivity	2c7435418a	Merge "database: assign proper io priority for streaming view updates" from Piotr " Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority. Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...} Tests: unit(dev) " * 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla: db,view: wrap view update generation in stream scheduling group database: assign proper io priority for streaming view updates	2019-08-20 00:24:51 +02:00
Pekka Enberg	d0eecbf3bb	api/storage_proxy: Wire up hinted-handoff status to API We support hinted-handoff now, so let's return its status via the API. Message-Id: <20190819080006.18070-1-penberg@scylladb.com>	2019-08-20 00:24:50 +02:00
Piotr Sarna	3cc5a04301	db,view: wrap view update generation in stream scheduling group Generating view updates is used by streaming, so the service itself should also run under the matching scheduling group.	2019-08-20 00:24:50 +02:00
Piotr Sarna	1ab07b80b4	database: assign proper io priority for streaming view updates Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority.	2019-08-20 00:24:50 +02:00
Tomasz Grabiec	b9447d0319	Revert "relocatable: switch from run-time relocation to install-time relocation" This reverts commit `4ecce2d286`. Should be committed via the next branch.	2019-08-20 00:22:30 +02:00
Avi Kivity	4ecce2d286	relocatable: switch from run-time relocation to install-time relocation Our current relocation works by invoking the dynamic linker with the executable as an argument. This confuses gdb since the kernel records the dynamic linker as the executable, not the real executable. Switch to install-time relocation with patchelf: when installing the executable and libraries, all paths are known, and we can update the path to the dynamic loader and to the dynamic libraries. Since patchelf itself is dynamically linked, we have to relocate it dynamically (with the old method of invoking it via the dynamic linker). This is okay since it's a one-time operation and since we don't expect to debug core dumps of patchelf crashes. We lose the ability to run scylla directly from the uninstalled tarball, but since the nonroot installer is already moving in the direction of requiring install.sh, that is not a great loss, and certainly the ability to debug is more important. dh_strip barfs on some binaries which were treated with patchelf, so exclude them from dh_strip. This doesn't lose any functionality, since these binaries didn't have debug information to begin with (they are already-stripped Fedora executables). Fixes #4673.	2019-08-20 00:20:19 +02:00
Glauber Costa	da260ecd61	systemd: put scylla processes in systemd slices. It is well known that seastar applications, like Scylla, do not play well with external processes: CPU usage from external processes may confuse the I/O and CPU schedulers and create stalls. We have also recently seen that memory usage from other applications' anonymous and page cache memory can bring the system to OOM. Linux has a very good infrastructure for resource control contributed by amazingly bright engineers in the form of cgroup controllers. This infrastructure is exposed by systemd in the form of slices: a hierarchical structure to which controllers can be attached. In true systemd fashion, the hierarchy is implicit in the filenames of the slice files. A "-" symbol defines the hierarchy, so the files that this patch presents, scylla-server and scylla-helper, essentially create a "scylla" cgroup at the top level with "server" and "helper" children. Later we mark the services needed to run scylla as belonging to one or the other through the Slice= directive. Scylla DBAs can benefit from this setup by using the systemd-run utility to fire ad-hoc commands. Let's say for example that someone wants to hypothetically run a backup and transfer files to an external object store like S3, making sure that the amount of page cache used won't create swap pressure leading to database timeouts. One can then run something like: ``` sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool ``` (or even better, the backup tool can itself be a systemd timer) Changes from last version: - No longer use the CPUQuota - Minor typo fixes - postinstall fixup for small machines Benchmark results: ================== Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs). We have to fill the cache as we read, so this should stress CPU, memory and disk I/O. 
cassandra-stress command: ``` cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform$1..150000000$ ``` Baseline results: ``` Results: Op rate : 13,830 op/s [READ: 13,830 op/s] Partition rate : 13,830 pk/s [READ: 13,830 pk/s] Row rate : 13,830 row/s [READ: 13,830 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.8 ms [READ: 2.8 ms] Latency 99.9th percentile : 3.4 ms [READ: 3.4 ms] Latency max : 12.0 ms [READ: 12.0 ms] Total partitions : 4,149,130 [READ: 4,149,130] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Question 1: =========== Does putting scylla in a special slice affect its performance ? Results with Scylla running in a slice: ``` Results: Op rate : 13,811 op/s [READ: 13,811 op/s] Partition rate : 13,811 pk/s [READ: 13,811 pk/s] Row rate : 13,811 row/s [READ: 13,811 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.2 ms [READ: 2.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.6 ms] Latency 99.9th percentile : 3.3 ms [READ: 3.3 ms] Latency max : 23.2 ms [READ: 23.2 ms] Total partitions : 4,151,409 [READ: 4,151,409] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion : No significant change Question 2: =========== What happens when there is a CPU hog running in the same server as scylla? 
CPU hog: ``` taskset -c 0 /bin/sh -c "while true; do true; done" & taskset -c 1 /bin/sh -c "while true; do true; done" & taskset -c 2 /bin/sh -c "while true; do true; done" & taskset -c 3 /bin/sh -c "while true; do true; done" & sleep 330 ``` Scenario 1: CPU hog runs freely: ``` Results: Op rate : 2,939 op/s [READ: 2,939 op/s] Partition rate : 2,939 pk/s [READ: 2,939 pk/s] Row rate : 2,939 row/s [READ: 2,939 row/s] Latency mean : 6.8 ms [READ: 6.8 ms] Latency median : 5.3 ms [READ: 5.3 ms] Latency 95th percentile : 11.0 ms [READ: 11.0 ms] Latency 99th percentile : 14.9 ms [READ: 14.9 ms] Latency 99.9th percentile : 17.1 ms [READ: 17.1 ms] Latency max : 26.3 ms [READ: 26.3 ms] Total partitions : 884,460 [READ: 884,460] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: CPU hog runs inside scylla-helper slice ``` Results: Op rate : 13,527 op/s [READ: 13,527 op/s] Partition rate : 13,527 pk/s [READ: 13,527 pk/s] Row rate : 13,527 row/s [READ: 13,527 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.8 ms [READ: 3.8 ms] Latency max : 18.7 ms [READ: 18.7 ms] Total partitions : 4,069,934 [READ: 4,069,934] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Question 3: =========== What happens when there is an I/O hog running in the same server as scylla? 
I/O hog: (Data in the cluster is 2x size of memory) ``` while true; do find /var/lib/scylla/data -type f -exec grep glauber {} + done ``` Scenario 1: I/O hog runs freely: ``` Results: Op rate : 7,680 op/s [READ: 7,680 op/s] Partition rate : 7,680 pk/s [READ: 7,680 pk/s] Row rate : 7,680 row/s [READ: 7,680 row/s] Latency mean : 2.6 ms [READ: 2.6 ms] Latency median : 1.3 ms [READ: 1.3 ms] Latency 95th percentile : 7.8 ms [READ: 7.8 ms] Latency 99th percentile : 10.9 ms [READ: 10.9 ms] Latency 99.9th percentile : 16.9 ms [READ: 16.9 ms] Latency max : 40.8 ms [READ: 40.8 ms] Total partitions : 2,306,723 [READ: 2,306,723] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: I/O hog runs in the scylla-helper systemd slice: ``` Results: Op rate : 13,277 op/s [READ: 13,277 op/s] Partition rate : 13,277 pk/s [READ: 13,277 pk/s] Row rate : 13,277 row/s [READ: 13,277 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.5 ms [READ: 3.5 ms] Latency max : 183.4 ms [READ: 183.4 ms] Total partitions : 3,984,080 [READ: 3,984,080] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-19 14:31:28 -04:00
Avi Kivity	c32f9a8f7b	dht: check for aborts during streaming Propagate the abort_source from main() into boot_strapper and range_stream and check for aborts at strategic points. This includes aborting running stream_plans and aborting sleeps between retries. Fixes #4674	2019-08-18 20:41:07 +03:00
Avi Kivity	5af6f5aa22	main: expose SIGINT/SIGTERM as abort_source In order to propagate stop signals, expose them as sharded<abort_source>. This allows propagating the signal to all shards, and integrating it with sleep_abortable(). Because sharded<abort_source>::stop() will block, we'll now require stop_signal to run in a thread (which is already the case).	2019-08-18 20:15:26 +03:00
Avi Kivity	20aed3398d	Merge "Simplify types" from Rafael " This is hopefully the last large refactoring on the way to UDF. In UDF we have to convert internal types to Lua and back. Currently almost all our types are hidden in types.cc and expose functionality via virtual functions. While it should be possible to add convert_{to|from}_lua virtual functions, that seems like a bad design. In compilers, the type definition is normally public and different passes know how to reason about each type. The alias analysis knows about int and floats, not the other way around. This patch series is inspired by both the LLVM RTTI (https://www.llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) and std::variant. The series makes the types public, adds a visit function and converts the various virtual methods to just use visit. As a small example of why this is useful, it then moves a bit of cql3 and json specific logic out of types.cc and types.hh. In a similar way, the UDF code will be able to use visit to convert objects to Lua. In comparison with the previous versions, this series doesn't require the intermediate step of converting void* to data_value& in a few member functions. This version also has fewer double dispatches, and I am fairly confident it has all the tools for avoiding all double dispatches. 
" * 'simplify-types-v3' of https://github.com/espindola/scylla: (80 commits) types: Move abstract_type visit to a header types: Move uuid_type_impl to a header types: Move inet_addr_type_impl to a header types: Move varint_type_impl to a header types: Move timeuuid_type_impl to a header types: Move date_type_impl to a header types: Move bytes_type_impl to a header types: Move utf8_type_impl to a header types: Move ascii_type_impl to a header types: Move string_type_impl to a header types: Move time_type_impl to a header types: Move simple_date_type_impl to a header types: Move timestamp_type_impl to a header types: Move duration_type_impl to a header types: Move decimal_type_impl to a header types: Move floating point types to a header types: Move boolean_type_impl to a header types: Move integer types to a header types: Move integer_type_impl to a header types: Move simple_type_impl to a header ...	2019-08-18 19:04:05 +03:00
Takuya ASADA	f574112301	dist/debian: handle --dist correctly Commit `ac9b115` mistakenly ignores the --dist option. It should set the 'housekeeping' template variable to 'enable'. Fixes #4857 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190816120127.14099-1-syuu@scylladb.com>	2019-08-18 15:00:33 +03:00
Avi Kivity	14d40cc659	Update seastar submodule * seastar fe2b5b0c6...afc5bbf51 (4): > Merge "handle discarded futures or suppress warning" from Benny > Remove variadic futures from the Seastar implementation > Revert "Merge "handle discarded futures or suppress warning" from Benny" > io_queue: Forward declare smp class	2019-08-17 12:18:18 +03:00
Dejan Mircevski	48bb89fcb7	cql_query_test: Add LIKE tests for all types As requested in a prior code review [1], ensure that LIKE cannot be used on any non-string type. [1] https://github.com/scylladb/scylla/pull/4610#pullrequestreview-255590129 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:55:35 -04:00
Dejan Mircevski	ef071bf7ce	cql_query_test: Remove LIKE-nonstring-pattern case This testcase was previously commented out, pending a fix that cannot be made. Currently it is impossible to validate the marker-value type at filtering time. The value is entered into the options object under its presumed type of string, regardless of what it was made from. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:07:44 -04:00
Dejan Mircevski	20e688e703	cql_query_test: Move a testcase elsewhere in file Somehow this test case sits in the middle of LIKE-operator tests: test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash Move it so LIKE test cases are contiguous. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:07:44 -04:00
Glauber Costa	ffc328c924	move postinst steps to an external script There are systemd-related steps done in both rpm and deb builds. Move them to a script so we avoid duplication. The tests are so far somewhat specific to each distribution, so they need to be adapted a bit. Note that this also fixes a bug with rpm as a side effect: rpm does not call daemon-reload after potentially changing the systemd files (it is only implied during postun operations, which happen during uninstall). daemon-reload was called explicitly for debian packages, and now it is called for both. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-15 10:43:17 -04:00
Rafael Ávila de Espíndola	7f0a434cfa	types: Move abstract_type visit to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	dccefd1ddb	types: Move uuid_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	038728a381	types: Move inet_addr_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	1966416cb3	types: Move varint_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	9229f99c86	types: Move timeuuid_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	993f132619	types: Move date_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	a299ed3b9b	types: Move bytes_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	09ac2a1bc6	types: Move utf8_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	da472a65ec	types: Move ascii_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	b98bac65b0	types: Move string_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	3e5b1e2630	types: Move time_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	909df932ac	types: Move simple_date_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	8f3bebb6e8	types: Move timestamp_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	3260153d35	types: Move duration_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	2f6a26b1c1	types: Move decimal_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	480ca52b59	types: Move floating point types to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	6a4ec7488e	types: Move boolean_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	404b26d3fa	types: Move integer types to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	bd3e725605	types: Move integer_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	03aca28dc5	types: Move simple_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	e8ba37fa5a	types: Move counter_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	cb03c79a48	types: Move empty_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	1cb7127bf3	types: Make abstract_type::serialize a static helper Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	b175657ee7	types: Devirtualize abstract_type::validate Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	bf96f1111c	types: Make abstract_type::serialized_size a static helper Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:41 -07:00
Rafael Ávila de Espíndola	6831e05471	types: Move functions that use abstract_type::serialized_size out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	047e34a31d	types: Remove serialize_value It is no longer needed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1e0663c56c	types: Devirtualize abstract_type::from_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	68b26047cc	types: Devirtualize abstract_type::serialize Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	18da5f9001	types: Devirtualize abstract_type::from_json_object Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	b987b2dcbe	types: Devirtualize abstract_type::to_json_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	b4bc888eac	types: Refactor abstract_type::serialized_size The following logic was duplicated: * For all types, if value is null, the result is zero. * For non collection types, if the native object is empty, the result is zero. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	968365b7e3	types: Devirtualize abstract_type::serialized_size Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	793bc50d69	types: Delete abstract_type::validate_collection_member Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	37686964f0	types: Devirtualize abstract_type::hash Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	396f5c7656	types: Devirtualize abstract_type::native_typeid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	492043a77d	types: Devirtualize abstract_type::native_value_delete Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	4d849d7742	types: Devirtualize abstract_type::native_value_clone Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ba887b7e56	types: Delete abstract_type::native_value_destroy Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5c0e78d70c	types: Delete abstract_type::native_value_move Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	2bc6471a1e	types: Delete abstract_type::native_value_copy Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	33394dfdc1	types: Delete abstract_type::native_value_size Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	c22ca2f9c9	types: Delete abstract_type::native_value_alignment Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	37c0f5b985	types: Devirtualize get_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f633f70616	types: Devirtualize abstract_type::is_value_compatible_with_internal It is now a static helper. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	19c9a033d9	types: Devirtualize abstract_type::is_compatible_with Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d245d08045	types: Devirtualize abstract_type::is_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ae30d78ca9	types: Devirtualize abstract_type::equal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f087756684	types: Implement less with compare We defined less for some types and compare for others. There is no type for which compare is substantially more expensive, so define it for all types and implement less with compare. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	9bbf55e9c0	types: Devirtualize abstract_type::compare Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a5daa8d258	types: Devirtualize abstract_type::less Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a3e898a648	types: Devirtualize abstract_type::deserialize Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	8145faa66f	types: Inline is_byte_order_comparable into only user Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	325418db16	types: Devirtualize abstract_type::is_byte_order_comparable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d2b063877b	types: Devirtualize abstract_type::is_byte_order_equal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	21da060b24	types: Devirtualize abstract_type::update_user_type The type walking is similar to what the find function does, but refactoring it doesn't seem worth it if these are the only two uses. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ae6e96a1e2	types: Refactor references_duration and references_user_type With this patch the logic for walking all nested types is moved to a helper function. It also fixes reversed_type_impl not being handled in references_duration. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	25a5631a46	types: Devirtualize abstract_type::references_user_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	544337f380	types: Devirtualize abstract_type::references_duration Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a6b48bda03	types: Devirtualize abstract_type::is_native Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f5b4fe5685	types: Devirtualize abstract_type::is_atomic Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ec09fb94cb	types: Devirtualize abstract_type::is_multi_cell Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1bea7747ce	types: Devirtualize abstract_type::is_tuple Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1581805a8d	types: Devirtualize abstract_type::is_collection Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1137695cb2	types: Devirtualize abstract_type::is_counter Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d3ba0d132a	types: Devirtualize abstract_type::is_user_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	0ff539500f	types: Devirtualize abstract_type::cql3_type_name_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5314b489e3	types: Devirtualize abstract_type::get_cql3_kind_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	2f0c64844f	types: Devirtualize abstract_type::is_reversed Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	33d2ec8e1c	types: Devirtualize abstract_type::underlying_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	064db9b92e	types: Devirtualize abstract_type::to_string_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	69d6fd21d2	types: Add a listlike_collection_type_impl class With this we can share code that wants to access the element type of set and list. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a4837301a6	types: Move _is_multi_cell to collection_type_impl It was duplicated in each concrete collection type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	de6d6c46a1	types: Remove collection_type_impl::kind All uses have been switched to abstract_type::kind. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	c80c19459e	types: Add a visitor over data_value Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5701051857	types: Add a generic visit over abstract_type The API is inspired by std::variant. This bridges the runtime type of an abstract_type object to compile-time overload resolution. For example, a single lambda can visit a string_type_impl, which corresponds to two leaf types (ascii and utf8). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	e5c7deaeb5	types: Add a kind to abstract_type The type hierarchy is closed, so we can give each leaf an enum value. This will be used to implement a visitor pattern and reduce code duplication. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5c098eb7d0	types: Add more tests for abstract_type::to_string_impl The corresponding code is correct, but I noticed no tests would fail if it was broken while refactoring it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	096de10eee	types: Remove abstract_type::equals All types are interned, so we can just compare the pointers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	6a8ffb35ff	types: Make a few concrete_type member functions public These only use public member functions from data_value, so there is no reason for not making them public too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Gleb Natapov	1779c3b7f6	move admission control semaphore from cql server to storage_service There are two reasons for the move. First, the cql server's lifetime is shorter than storage_proxy's, and the latter stores references to the semaphore in each service_permit it holds. Second, we want thrift (and in the future other user APIs) to share the same admission control memory pool. Fixes #4844 Message-Id: <20190814142614.GT17984@scylladb.com>	2019-08-14 18:49:56 +03:00
Gleb Natapov	a1e9e6faa2	storage_service: remove outdated comment We in fact do stop cql server in storage_service::drain_on_shutdown() which is called in main.cc during shutdown. Message-Id: <20190814085027.GP17984@scylladb.com>	2019-08-14 11:52:49 +03:00
Avi Kivity	9f512509c7	github: remove github pull request template (#4833) Since we do accept pull requests (in a long-running experiment), the pull request template suggesting not to use them is inaccurate, and many requesters forget to remove the boilerplate. Remove the outdated template.	2019-08-14 09:28:39 +03:00
Pekka Enberg	595434a554	Merge "docker: relax permission checks" from Avi "Commit `e3f7fe4` added file owner validation to prevent Scylla from crashing when it tries to touch a file it doesn't own. However, under docker, we cannot expect to pass this check since user IDs are from different namespaces: the process runs in a container namespace, but the data files usually come from a mounted volume, and so their uids are from the host namespace. So we need to relax the check. We do this by reverting `b1226fb`, which causes Scylla to run as euid 0 in docker, and by special-casing euid 0 in the ownership verification step. Fixes #4823." * 'docker-euid-0' of git://github.com/avikivity/scylla: main: relax file ownership checks if running under euid 0 Revert "dist/docker/redhat: change user of scylla services to 'scylla'"	2019-08-13 19:55:05 +03:00
Tomasz Grabiec	64ff1b6405	cql: alter type: Format field name as text instead of hex Fixes #4841 Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>	2019-08-13 16:25:48 +03:00
Tomasz Grabiec	34cff6ed6b	types: Fix abort on type alter which affects a compact storage table with no regular columns Fixes #4837 Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>	2019-08-13 16:25:02 +03:00
Avi Kivity	1ed3356e0e	main: relax file ownership checks if running under euid 0 During startup, we check that the data files are owned by our euid. But in a container environment, this is impossible to enforce because uid/username mappings are different between the host and the container, and the data files are likely to be mounted from the host. To allow for such environments, relax the checks if euid=0. This both matches what happens in a container (programs run as root) and the kernel access checks (euid 0 can do anything). We can reconsider this when container uid mapping is better developed. Fixes #4823. Fixes #4536.	2019-08-13 14:36:08 +03:00
Avi Kivity	ca28fdc37d	Revert "dist/docker/redhat: change user of scylla services to 'scylla'" This reverts commit `b1226fb15a`. When the data volume is mounted from the host (as is usual in container deployments), we can't expect that the files will be owned by the in-container scylla user. So that commit didn't really fix #4536. A follow-up patch will relax the check so it passes in a container environment.	2019-08-13 14:36:00 +03:00
Pekka Enberg	fed38f5179	reloc/build_reloc.sh: Add '--configure-flags' command line option This adds a '--configure-flags FLAGS' command line option, which overrides the flags passed to scylla.git 'configure.py' script. We need this for flexibility of custom builds in Jenkins pipelines, for example. Message-Id: <20190813095428.13590-1-penberg@scylladb.com>	2019-08-13 14:05:25 +03:00
Tomasz Grabiec	0cf4fab2ca	Merge "Multishard combining reader more robust reader recreation" from Botond Make the reader recreation logic more robust, by moving away from deciding which fragments have to be dropped based on a bunch of special cases, instead replacing this with a general logic which just drops all already seen fragments (based on their position). Special handling is added for the case when the last position is a range tombstone with a non full prefix starting position. Reproducer unit tests are added for both cases. Refs #4695 Fixes #4733	2019-08-13 11:53:07 +02:00
Gleb Natapov	00c4078af3	cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state We should wait for the future returned from add_local_application_state() to resolve before issuing a new calculation; otherwise two add_local_application_state() calls may run concurrently for the same state. Fixes #4838. Message-Id: <20190812082158.GE17984@scylladb.com>	2019-08-13 11:48:38 +03:00
Botond Dénes	fe58324fb9	tests: test_multishard_combining_reader_as_mutation_source: don't copy mutations cross shard Copying mutations across shards is illegal. Freeze and unfreeze them instead when crossing shard boundaries.	2019-08-13 10:16:02 +03:00
Botond Dénes	d746fb59a7	mutation_reader_test: harden test_multishard_combining_reader_as_mutation_source Add `single_fragment_buffer` test variable. When set, the shard readers are created with a max buffer size of 1, effectively causing them to read a single fragment at a time. This, when combined with `evict_readers=true` will stress the recreate reader logic to the max.	2019-08-13 10:16:02 +03:00
Botond Dénes	899afc0661	flat_mutation_reader_assertions: produces_range_tombstone(): be more lenient Be more tolerant with different but equivalent representation of range deletions. When expecting a range tombstone, keep reading range tombstones while these can be merged with the cumulative range tombstone, resulting from the merging of the previous range tombstones. This results in tests tolerating range tombstones that are split into several, potentially overlapping range tombstones, representing the same underlying deletion.	2019-08-13 10:16:02 +03:00
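A simplified sketch of the lenient matching. Tombstones are modeled as closed integer intervals with a timestamp, which is a hypothetical representation and not Scylla's range_tombstone. Consecutive tombstones are folded into a cumulative one as long as they overlap and carry the same timestamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical range tombstone: closed interval [start, end] plus timestamp.
struct range_tombstone {
    int start;
    int end;
    long timestamp;
};

// Two tombstones describe one continuous deletion if they have the same
// timestamp and their intervals overlap or touch.
bool can_merge(const range_tombstone& acc, const range_tombstone& next) {
    return acc.timestamp == next.timestamp && next.start <= acc.end && next.end >= acc.start;
}

// Starting at `i`, keep consuming tombstones while they merge into the
// cumulative one; returns the merged tombstone and advances `i` past them.
range_tombstone read_merged(const std::vector<range_tombstone>& stream, std::size_t& i) {
    range_tombstone acc = stream[i++];
    while (i < stream.size() && can_merge(acc, stream[i])) {
        acc.start = std::min(acc.start, stream[i].start);
        acc.end = std::max(acc.end, stream[i].end);
        ++i;
    }
    return acc;
}
```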
Botond Dénes	53e1dca5ca	tests/mutation_source_test: generate_mutation_sets() add row that falls into deleted prefix This is tailored to the multishard_combining_reader, to make sure it doesn't lose rows following a range tombstone with a prefix starting position (whose prefix their keys fall into).	2019-08-13 09:47:55 +03:00
Botond Dénes	6bfe468a17	multishard_combining_reader: remote_reader::recreate_reader(): restore indentation	2019-08-13 09:47:55 +03:00
Botond Dénes	68353acc1c	multishard_combining_reader: remote_reader: use next instead of last pos Currently the remote reader uses the last seen fragment's position to calculate the position the reader should continue from when the reader is recreated after having been evicted. Recently it was discovered that this logic breaks down badly when this last position is a non-full clustering prefix (a range tombstone start bound). In this case, if only the last position is available, there is no good way of computing the starting position. Starting after this position will potentially miss any rows that fall into the prefix (the current behaviour). Starting from before it will cause all range tombstones with said prefix to be re-emitted, causing other problems. A better solution is to exploit the fact that sometimes we also know what the next fragment is. These "some" times are exactly the times that are problematic with the current approach: when the last fragment is a range tombstone. Exploiting this extra knowledge allows for a much better way of calculating the starting position: instead of maintaining the last position, we maintain the next position, which is always safe to start from. This is not always possible, but in many cases we can know for sure what the next position is; for example, if the last position was a static row, we can be sure the next position is the first clustering position (or partition end). In the few cases where we cannot calculate the next position, we fall back to the previous logic and start from after the last position. The good news is that in these remaining cases (the last fragment is a clustering row) it is safe to do so. This patch also does some refactoring of the remote-reader internals: all fill-buffer related logic is grouped together in a single `fill_buffer()` method.	2019-08-13 09:47:55 +03:00
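A rough sketch of the resume-position choice described above. The enum, struct and function names are hypothetical, and positions are plain strings. The sketch assumes, as the message says, that the next position is known whenever the last fragment is a range tombstone.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical fragment kinds relevant to the decision.
enum class fragment_kind { static_row, clustering_row, range_tombstone };

// Where a recreated reader resumes: at `pos` (inclusive) or strictly after it.
struct resume_point {
    bool inclusive;
    std::string pos;
};

resume_point next_resume_point(fragment_kind last_kind, const std::string& last_pos,
                               const std::optional<std::string>& next_pos) {
    // If the next fragment is already known, starting at it is always safe.
    if (next_pos) {
        return {true, *next_pos};
    }
    // After a static row, the next position is the first clustering position
    // (or the partition end).
    if (last_kind == fragment_kind::static_row) {
        return {true, "first_clustering_position"};
    }
    // Fallback: start strictly after the last position, which the message says
    // is safe when the last fragment is a clustering row. Assumption: callers
    // always supply next_pos after a range tombstone.
    assert(last_kind != fragment_kind::range_tombstone);
    return {false, last_pos};
}
```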
Botond Dénes	3949189918	multishard_combining_reader: remote_reader::do_fill_buffer(): reorganize drop logic To make it more readable.	2019-08-13 09:47:55 +03:00
Botond Dénes	20c06adf80	position_in_partition: add for_partition_start()	2019-08-13 09:47:55 +03:00
Botond Dénes	87973498a1	query: refactor trim_clustering_row_ranges_to() Allow expressing `pos` in terms of a `position_in_partition_view`, which gives finer control over the exact position, making it possible to specify a position before, at or after a certain key. The previous overload is kept for backward compatibility, invoking the new overload behind the scenes.	2019-08-13 09:47:55 +03:00
Botond Dénes	3a5e7db9b6	tests: add unit test for query::trim_clustering_row_ranges_to() We are about to do a major refactoring of this method. Add extensive unit tests to ensure we don't break it in the process.	2019-08-13 09:47:55 +03:00
Botond Dénes	1b4e88b972	position_in_partition_view: add get_bound_weight()	2019-08-13 09:47:55 +03:00
Avi Kivity	0d0ee20f76	Merge "Implement `sstable_info` API command (info on sstables)" from Calle " Refs #4726 Implement the api portion of a "describe sstables" command. Adds rest types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint) " * 'sstabledesc' of https://github.com/elcallio/scylla: api/storage_service: Add "sstable_info" command sstables/compress: Make compressor pointer accessible from compression info sstables.hh: Add attribute description API to file extension sstables.hh: Add compression component accessor sstables.hh: Make "has_component" public	2019-08-12 21:16:08 +03:00
Dejan Mircevski	8be147d069	cql3: Handle empty LIKE pattern Match SQL's LIKE in allowing an empty pattern, which matches only an empty text field. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-12 19:48:31 +03:00
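The following is a minimal LIKE matcher that shows the empty-pattern behaviour. In it, `%` matches any sequence, `_` matches exactly one character, and an empty pattern matches only empty text. It is a generic dynamic-programming sketch, not Scylla's cql3 implementation, and it does not handle escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

bool like_match(const std::string& text, const std::string& pattern) {
    const std::size_t n = text.size(), m = pattern.size();
    // dp[j] is true when the first i chars of text match the first j of pattern.
    std::vector<bool> dp(m + 1, false);
    dp[0] = true; // empty pattern matches empty text
    for (std::size_t j = 1; j <= m; ++j) {
        dp[j] = dp[j - 1] && pattern[j - 1] == '%';
    }
    for (std::size_t i = 1; i <= n; ++i) {
        bool prev_diag = dp[0]; // value for (i - 1, 0)
        dp[0] = false;          // non-empty text never matches an empty pattern
        for (std::size_t j = 1; j <= m; ++j) {
            bool up = dp[j];    // value for (i - 1, j)
            char p = pattern[j - 1];
            if (p == '%') {
                dp[j] = dp[j - 1] || up; // '%' matches empty, or one more char
            } else {
                dp[j] = prev_diag && (p == '_' || p == text[i - 1]);
            }
            prev_diag = up;
        }
    }
    return dp[m];
}
```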
Rafael Ávila de Espíndola	99c7f8457d	logalloc: Add a migrators_base that is common to debug and release This simplifies the debug implementation and it now should work with scylla-gdb.py. It is not clear what, if anything, is lost by not using random ids. They were never being reused in the debug implementation anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190618144755.31212-1-espindola@scylladb.com>	2019-08-12 19:44:55 +03:00
Calle Wilund	2b19bfbfbc	types: Remove obsolete "FIXME" inet_addr_type_impl has supported ipv6 for some time now. Message-Id: <20190812142731.6384-1-calle@scylladb.com>	2019-08-12 17:30:15 +03:00
Calle Wilund	1afc899e37	type_parser: Fix/improve exception messages Removes long-standing FIXME for message detail Also simplifies some code, removing duplication. Message-Id: <20190812134144.2417-1-calle@scylladb.com>	2019-08-12 17:03:43 +03:00
Calle Wilund	fdf2017487	cql3::term: Remove unneeded const_cast Removed no longer needed FIXME (to_string became const long ago) Message-Id: <20190812133943.2011-1-calle@scylladb.com>	2019-08-12 17:00:46 +03:00
Amnon Heiman	6a0490c419	api/compaction_manager: indentation	2019-08-12 14:04:40 +03:00
Amnon Heiman	8181601f0e	api/compaction_manager: do not hold map on the stack This patch fixes a bug where a map is held on the stack and then used by a future. Instead, the map is now wrapped with do_with. Fixes #4824 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-12 14:04:00 +03:00
Asias He	131acc09cc	repair: Adjust parallelism according to memory size (#4696) After commit `8a0c4d5` (Merge "Repair switch to rpc stream" from Asias), we increased the row buffer size for repair from 512KiB to 32MiB per repair instance. We allow repairing 16 ranges (16 repair instances) in parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB per user-requested repair. In addition, the repair master node can hold data from all the repair followers, so the memory usage on the repair master can be larger than 512MiB. We need to provide a way to limit the memory usage. In this patch, we limit the total memory used by repair to 10% of the shard memory. The number of ranges that can be repaired in parallel is: max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range. For example, if each shard has 4096MiB of memory, then max_repair_memory = 10% * 4096MiB = 409.6MiB, and we will have max_repair_ranges_in_parallel = 409.6MiB / 32MiB = 12 (rounded down). Fixes #4675	2019-08-12 11:09:27 +03:00
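The formula above can be sketched as follows. The 10% share and the 32MiB per-range buffer come from the message. Clamping the result to [1, 16] is an assumption, since the message only mentions 16 as the previous fixed parallelism. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024 * 1024;

// Per-range row buffer size after the switch to RPC streaming (from the text).
constexpr std::uint64_t max_repair_memory_per_range = 32 * MiB;

// Previous fixed number of ranges repaired in parallel per request (from the text).
constexpr std::uint64_t max_parallel_ranges_cap = 16;

std::uint64_t max_repair_ranges_in_parallel(std::uint64_t shard_memory) {
    // Repair may use at most 10% of the shard's memory.
    const std::uint64_t max_repair_memory = shard_memory / 10;
    const std::uint64_t ranges = max_repair_memory / max_repair_memory_per_range;
    // Assumption: clamp to [1, 16] so repair always makes progress and never
    // exceeds the previous fixed parallelism.
    return std::clamp<std::uint64_t>(ranges, 1, max_parallel_ranges_cap);
}
```

With 4096MiB per shard this yields 12, matching the example in the message. With a 16GiB shard the raw value (51) is capped at 16.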
Avi Kivity	e6cde72d2b	Merge "Fix cql server admission control to take all leftover work into account" from Gleb " Current admission control takes a permit when a cql request starts and releases it when the reply is sent, but some requests may leave background work behind after that point (some because there is genuine background work to do, like completing a write or doing a read repair, and some because a read/write may be stuck in a queue longer than the request's timeout), so after Scylla replies with a timeout some resources are still occupied. The series fixes this by passing the permit down to storage_proxy, where it is held until all background work is completed. Fixes #4768 " * 'gleb/admission-v3' of github.com:scylladb/seastar-dev: transport: add a metric to follow memory available for service permit. storage_proxy: store a permit in a read executor storage_proxy: store a permit in a write response handler Pass service permit to storage_proxy transport: introduce service_permit class and use it instead of semaphore_units transport: hold admission a permit until a reply is sent transport: remove cql server load balancer	2019-08-12 11:02:37 +03:00
Gleb Natapov	3e27c2198a	transport: add a metric to follow memory available for service permit. Add a metric to follow memory available for service permit. When this memory is close to zero, the cql server stops admitting new requests.	2019-08-12 10:20:43 +03:00
Gleb Natapov	7d7b1685aa	storage_proxy: store a permit in a read executor A read executor exists until the read operation completes in its entirety, so storing a permit there guarantees that it will be freed only after no background work is left for the request on this server.	2019-08-12 10:20:43 +03:00
Gleb Natapov	d5ced800f0	storage_proxy: store a permit in a write response handler A write response handler exists until the write operation completes in its entirety, so storing a permit there guarantees that it will be freed only after no background work is left for the request on this server.	2019-08-12 10:20:43 +03:00
Gleb Natapov	6a4207f202	Pass service permit to storage_proxy Current cql transport code acquires a permit before processing a query and releases it when the query gets a reply, but some queries leave work behind. If this work is allowed to accumulate without any limit, a server may eventually run out of memory. To prevent that, the permit system should account for the background work as well. This patch is a first step in this direction. It passes a permit down to storage proxy, where it will later be held by background work.	2019-08-12 10:20:43 +03:00
Raphael S. Carvalho	b436c41128	compaction_manager: Prevent sstable runs from being partially compacted The manager trims sstables off to allow compaction jobs to proceed in parallel according to their weights. The problem is that the trimming procedure is not sstable-run aware, so it could incorrectly remove only a subset of an sstable run, leading to partial sstable run compaction. Partial compaction of an sstable run can lead to inefficiency because the run structure gets messed up, affecting all amplification factors, and the same generation could even end up being compacted twice. This is fixed by making the trim procedure respect sstable runs. Fixes #4773. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>	2019-08-11 17:20:20 +03:00
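A sketch of run-aware trimming. It assumes trimming is driven by a simple size budget, whereas the real compaction manager weighs jobs differently, and it uses a hypothetical sstable_desc type. Runs are kept or dropped as a whole, so no run is ever partially selected.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sstable descriptor: its size and the id of the run it belongs to.
struct sstable_desc {
    int run_id;
    std::uint64_t size;
};

// Trim `candidates` to fit `budget`, keeping or dropping whole runs only.
// Runs are considered in the order in which they first appear.
std::vector<sstable_desc> trim_respecting_runs(const std::vector<sstable_desc>& candidates,
                                               std::uint64_t budget) {
    std::vector<int> order;
    std::map<int, std::vector<sstable_desc>> runs;
    for (const auto& sst : candidates) {
        auto& run = runs[sst.run_id];
        if (run.empty()) {
            order.push_back(sst.run_id);
        }
        run.push_back(sst);
    }
    std::vector<sstable_desc> result;
    std::uint64_t used = 0;
    for (int id : order) {
        std::uint64_t run_size = 0;
        for (const auto& s : runs[id]) {
            run_size += s.size;
        }
        if (used + run_size > budget) {
            break; // stop rather than split a run
        }
        used += run_size;
        result.insert(result.end(), runs[id].begin(), runs[id].end());
    }
    return result;
}
```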
Gleb Natapov	ddff7f48cf	transport: introduce service_permit class and use it instead of semaphore_units service_permit is a new class that allows sharing a permit among different parts of request processing, many of which can complete at different times.	2019-08-11 16:08:55 +03:00
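A minimal sketch of the sharing idea, using std::shared_ptr in place of Seastar primitives. The types admission_semaphore and shared_permit are hypothetical and are not the real service_permit API. Each copy of the permit keeps the units held, and the units return to the semaphore only when the last copy is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical admission semaphore: only tracks available units.
struct admission_semaphore {
    std::size_t available;
};

// RAII holder: takes `units` on construction, returns them on destruction.
class semaphore_units_holder {
    admission_semaphore& _sem;
    std::size_t _units;
public:
    semaphore_units_holder(admission_semaphore& sem, std::size_t units)
        : _sem(sem), _units(units) {
        assert(_sem.available >= units); // sketch: admission already checked
        _sem.available -= units;
    }
    ~semaphore_units_holder() { _sem.available += _units; }
    semaphore_units_holder(const semaphore_units_holder&) = delete;
    semaphore_units_holder& operator=(const semaphore_units_holder&) = delete;
};

// Copyable handle: the reply path and any background work each keep a copy.
class shared_permit {
    std::shared_ptr<semaphore_units_holder> _units;
public:
    shared_permit(admission_semaphore& sem, std::size_t units)
        : _units(std::make_shared<semaphore_units_holder>(sem, units)) {}
};
```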
Gleb Natapov	2daa72b7dc	transport: hold admission a permit until a reply is sent Current code releases the admission permit too soon. If new requests are admitted faster than the client reads replies back, the reply queue can grow very large. The patch delays the service permit release until after a reply is sent.	2019-08-11 16:08:55 +03:00
Gleb Natapov	7e3805ed3d	transport: remove cql server load balancer It is buggy, unused and unnecessarily complicates the code.	2019-08-11 16:08:52 +03:00
Nadav Har'El	f9d6eaf5ff	reconcilable_result: switch to chunked_vector Merged patch series from Avi Kivity: In rare but valid cases (reconciling many tombstones, paging disabled), a reconcilable_result can grow large. This triggers large allocation warnings. Switch to chunked_vector to avoid the large allocation. In passing, fix chunked_vector's begin()/end() const correctness, and add the reverse iterator function family which is needed by the conversion. Fixes #4780. Tests: unit (dev) Commit Summary utils: chunked_vector: make begin()/end() const correct utils::chunked_vector: add rbegin() and related iterators reconcilable_result: use chunked_vector to hold partitions	2019-08-11 16:03:13 +03:00
Avi Kivity	ce2b0b2682	Merge "Add listen/rpc "prefer_ipv6" options to DNS lookup #4775" from Calle " Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus, etc. Fixes #4751 Adds the use of a preferred address family to dns name lookups related to listen address and rpc address, adhering to the respective "prefer" options. API, prometheus and broadcast address are all considered to be covered by the "listen_interface_prefer_ipv6" option. Note: scylla does not yet support actual interface binding, but these options should apply equally to address name parameters. Setting a "prefer_ipv6" option automatically enables ipv6 dns family query. " * 'calle/ipv6' of https://github.com/elcallio/scylla: init: Use the "prefer_ipv6" options available for rpc/listen address/interface inet_address: Add optional "preferred type" to lookup config: Add rpc_interface_prefer_ipv6 parameter config: Add listen_interface_perfer_ipv6 parameter config.cc: Fix enable_ipv6_dns_lookup actual param name	2019-08-11 15:21:45 +03:00
Pekka Enberg	73113c0ea4	utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code The FIXME was added in the very first commit ("utils: Convert utils/FBUtilities.java") that introduced the fb_utilities class as a stub. However, we have long implemented the parts that we actually use, so drop the FIXME as obsolete. In addition, drop the remaining commented-out Java code as unused and also obsolete. Message-Id: <20190808182758.1155-1-penberg@scylladb.com>	2019-08-11 10:26:36 +03:00
Botond Dénes	fd925f6049	position_in_partition_view: add constructor with bound_weight This is a low level constructor which allows directly providing a bound weight to go with the key.	2019-08-09 10:54:27 +03:00
Pekka Enberg	547c072f93	dbuild: Make Maven local repository accessible The Maven build tool ("mvn"), which is used by scylla-jmx and scylla-tools-java, stores dependencies in a local repository located at $HOME/.m2. Make sure it's accessible to dbuild. Message-Id: <20190808140216.26141-1-penberg@scylladb.com>	2019-08-08 17:36:13 +03:00
Avi Kivity	8f19b16fe4	Update seastar submodule * seastar ed608e3c9e...fe2b5b0c6b (2): > Merge "handle discarded futures or suppress warning" from Benny > output_stream: Add close() blurb	2019-08-08 16:22:38 +03:00
Avi Kivity	4a5ec61438	Update seastar submodule * seastar a1cf07858b...ed608e3c9e (4): > core: Add ability to abort on EBADF and ENOTSOCK > Revert "Merge "handle discarded futures or suppress warning" from Benny" > Merge "handle discarded futures or suppress warning" from Benny > reactor: remove replace variadic future<pollable_fd, socket_address> with future<tuple>	2019-08-08 14:22:29 +03:00
Raphael S. Carvalho	76cde84540	sstables/compaction_manager: Fix logic for filtering out partial sstable runs ignore_partial_runs() is confusing because i__p__r() being true doesn't mean partial runs are filtered out of compaction; it actually means not caring about compaction of a partial run. The logic was wrong because, for any compaction strategy that chooses not to ignore partial sstable runs[1], every fragment composing such a run would incorrectly become a candidate for compaction. This problem could make compaction include only a subset of the fragments composing the partial run, or even make the same fragment be compacted twice due to parallel compaction. [1]: a partial sstable run is an sstable run that is still being generated by compaction and as a result cannot be selected as a candidate at all. The fix makes sure that none of a partial sstable run's fragments are selected for compaction, and also renames i__p__r. Fixes #4729. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	7d4bf10d87	docs/building-packages.md: Document how to build Scylla packages This documents the steps needed to build Scylla's Linux packages with the relocatable package infrastructure we use today. Message-Id: <20190807134017.4275-1-penberg@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	79cece9f33	toolchain: Fix default command for dbuild Docker image Running "dbuild" without a build command fails as follows: $ ./tools/toolchain/dbuild Error: This command has to be run under the root user. Israel Fruchter discovered that the default command of our Docker image is this: "Cmd": [ "bash", "-c", "dnf -y install python3-cassandra-driver && dnf clean all" ] Let's make "/bin/bash" the default command instead, which will make "dbuild" with no build command return to the host shell. Message-Id: <20190807133955.4202-1-penberg@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	76cdec222f	build_reloc.sh: Remove "--with" passed to "configure.py" The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to the configure.py script. This is redundant as the "scylla-package.tar.gz" target of ninja already limits itself to them. Removing the "--with" options allows building unit tests after a relocatable package has been built without having to rebuild anything. Message-Id: <20190807130505.30089-1-penberg@scylladb.com>	2019-08-07 16:28:00 +03:00
Avi Kivity	e548bdb2e8	thrift, transport: switch to new seastar accept() API (#4814) Seastar switched accept() to return a single struct instead of a variadic future, adjust the code to the new API to avoid deprecation warnings.	2019-08-07 15:23:26 +02:00
Pekka Enberg	f68fffd99a	reloc/build_reloc.sh: Make build mode configurable Add a '--mode <mode>' command line option to the 'build_reloc.sh' script so that we can create relocatable packages for debug builds. The '--mode' command line option defaults to 'release' so existing users are unaffected. Message-Id: <20190807120759.32634-1-penberg@scylladb.com>	2019-08-07 16:19:37 +03:00
Asias He	fee26b9f6e	repair: Fix use after free in do_estimate_partitions_on_local_shard (#4813) We need to keep the sstables object alive during the operation of do_for_each. Notes: No need to backport to 3.1. Fixes #4811	2019-08-07 15:19:21 +02:00
Asias He	49a73aa2fc	streaming: Move stream_mutation_fragments_cmd to a new file (#4812) Avoid including the lengthy stream_session.hh in messaging_service. More importantly, fix the build because currently messaging_service.cc and messaging_service.hh do not include stream_mutation_fragments_cmd. I am not sure why it builds on my machine. Spotted this when backporting the "streaming: Send error code from the sender to receiver" to 3.0 branch. Refs: #4789	2019-08-07 14:59:46 +02:00
Asias He	288371ce75	streaming: Do not call rpc stream flush in send_mutation_fragments The stream close() guarantees the data sent will be flushed. No need to call the stream flush() since the stream is not reused. Follow up fix for commit `bac987e32a` (streaming: Send error code from the sender to receiver). Refs #4789	2019-08-07 14:31:17 +02:00
Avi Kivity	689fc72bab	Update seastar submodule * seastar d199d27681...a1cf07858b (1): > Merge 'Do not return a variadic future form server_socket::accept()' from Avi Seastar configure.py now has --api-level=1, to keep us on the old variadic future server_socket::accept() API.	2019-08-06 18:37:27 +03:00
Avi Kivity	97f66c72af	Update seastar submodule * seastar d90834443c...d199d27681 (3): > sharded: support for non-cooperative service types > shared_future: silence warning about discarded future > Fix backtrace suppression message in cpu_stall_detector. Fixes #4560.	2019-08-06 18:00:48 +03:00
Asias He	bac987e32a	streaming: Send error code from the sender to receiver In case of error on the sender side, the sender does not propagate the error to the receiver. The sender will close the stream. As a result, the receiver will get nullopt from the source in get_next_mutation_fragment and pass mutation_fragment_opt with no value to the generating_reader. In turn, the generating_reader generates end of stream. However, the last element that the generating_reader has generated can be any type of mutation_fragment. This makes the sstable that consumes the generating_reader violate the mutation_fragment stream rule. To fix this, we need to propagate the error. However, RPC streaming does not support propagating the error in the framework. The user has to send an error code explicitly. Fixes: #4789	2019-08-06 16:54:56 +02:00
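A hypothetical sketch of the explicit error code. Each message on the stream carries a command tag, and the receiver throws on an error command instead of treating the end of the stream as a clean end of data. The names stream_cmd, stream_message and receive_all() are illustrative only. Treating a stream that closes without an explicit end marker as an error is an additional assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical command tags sent alongside fragments over the RPC stream.
enum class stream_cmd { fragment, error, end_of_stream };

struct stream_message {
    stream_cmd cmd;
    std::string payload; // serialized fragment; empty for control commands
};

// Receiver: collect fragments until an explicit end_of_stream. An explicit
// error, or the stream ending without end_of_stream, raises an exception.
std::vector<std::string> receive_all(const std::vector<stream_message>& source) {
    std::vector<std::string> fragments;
    for (const auto& msg : source) {
        switch (msg.cmd) {
        case stream_cmd::fragment:
            fragments.push_back(msg.payload);
            break;
        case stream_cmd::error:
            throw std::runtime_error("sender reported an error");
        case stream_cmd::end_of_stream:
            return fragments;
        }
    }
    throw std::runtime_error("stream closed without end_of_stream");
}
```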
Piotr Jastrzebski	24f6d90a45	sstables: add test of sstables_mutation_reader for missing partition_end Reproduces #4783 Issue was fixed by `9b8ac5ecbc` Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-08-06 15:11:19 +03:00
Calle Wilund	6c62e5741e	init: Use the "prefer_ipv6" options available for rpc/listen address/interface Fixes #4751 Adds the use of a preferred address family to dns name lookups related to listen address and rpc address, adhering to the respective "prefer" options. API, prometheus and broadcast address are all considered to be covered by the "listen_interface_prefer_ipv6" option. Note: scylla does not yet support actual interface binding, but these options should apply equally to address name parameters. Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.	2019-08-06 08:32:10 +00:00
Calle Wilund	6c0c1309b3	inet_address: Add optional "preferred type" to lookup Allows prioritizing an address family in dns lookups, i.e. preferring ipv4 or ipv6 if available.	2019-08-06 08:32:10 +00:00
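A sketch of the preference rule. It assumes the lookup has already returned a list of candidate addresses. The resolved_address type and pick_address() are hypothetical. The first address of the preferred family wins; otherwise the first address of any family is used.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <sys/socket.h> // AF_INET, AF_INET6
#include <vector>

// Hypothetical resolved address: family plus textual form.
struct resolved_address {
    int family; // AF_INET or AF_INET6
    std::string text;
};

std::optional<resolved_address> pick_address(const std::vector<resolved_address>& results,
                                             int preferred_family) {
    for (const auto& a : results) {
        if (a.family == preferred_family) {
            return a;
        }
    }
    if (!results.empty()) {
        return results.front(); // fall back to whatever the lookup returned first
    }
    return std::nullopt;
}
```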
Calle Wilund	d3410f0e48	config: Add rpc_interface_prefer_ipv6 parameter As already existing in scylla.yaml	2019-08-06 08:32:10 +00:00
Calle Wilund	0028cecb8e	config: Add listen_interface_perfer_ipv6 parameter As already existing in scylla.yaml. https://github.com/apache/cassandra/blob/cassandra-3.11/conf/cassandra.yaml#L622	2019-08-06 08:32:10 +00:00
Calle Wilund	39d18178eb	config.cc: Fix enable_ipv6_dns_lookup actual param name When the option was added (and while iterating through the config refactoring), the member name and the config param name got out of sync.	2019-08-06 08:32:09 +00:00
Calle Wilund	298da3fc4b	api/storage_service: Add "sstable_info" command Assembles information and attributes of sstables in one or more column families. v2: * Use (not really legal) nested "type" in json * Rename "table" param to "cf" for consistency * Some comments on data sizes * Stream result to avoid huge string allocations on final json	2019-08-06 08:14:15 +00:00
Calle Wilund	95a8ff12e7	sstables/compress: Make compressor pointer accessible from compression info	2019-08-06 07:07:44 +00:00
Calle Wilund	d15c63627c	sstables.hh: Add attribute description API to file extension	2019-08-06 07:07:44 +00:00
Calle Wilund	4c67d702c2	sstables.hh: Add compression component accessor	2019-08-06 07:07:44 +00:00
Calle Wilund	770f912221	sstables.hh: Make "has_component" public	2019-08-06 07:07:44 +00:00
Avi Kivity	b77c4e68c2	Merge "Add Zstandard compression #4802" from Kamil " This adds the option to compress sstables using the Zstandard algorithm (https://facebook.github.io/zstd/). To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor' to the 'compression' argument when creating a table. You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available compression levels. Resolves #2613. This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes was passed to the compressor as chunk length in kilobytes. Fortunately, none of the compressors implemented until now used this parameter. Example usage (assuming there exists a keyspace 'a'): create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'}; Notes: 1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule. 2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK. 3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit. " * 'zstd' of https://github.com/kbr-/scylla: Configure: rename seastar_pool to submodule_pool, add more submodules to the pool Add unit tests for Zstd compression Enable tests that use compressed sstable files Add ZStandard compression Fix the value of the chunk length parameter passed to compressors	2019-08-05 16:29:27 +03:00
Botond Dénes	23cc6d6fb2	make_flat_mutation_reader_from_fragments: reader: silence discarded future warning The fragment reader calls `fast_forward_to()` from its constructor to discard fragments that fall outside the query range. Move the fast-forward code into an internal void-returning method, and call that from both the constructor and `fast_forward_to()`, to avoid a warning on a discarded future<>. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>	2019-08-05 16:21:50 +03:00
Kamil Braun	3a0308f76f	Configure: rename seastar_pool to submodule_pool, add more submodules to the pool Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	c3c7c06e10	Add unit tests for Zstd compression Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	8b58cdab0a	Enable tests that use compressed sstable files The files in tests/sstables/3.x/compressed/ were not used in the tests. This commit: - renames the directory to tests/sstables/3.x/lz4/, - adds analogous directories and files for other compressors, - adds tests using these files, - does some minor refactoring. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	f14e6e73bb	Add ZStandard compression This adds the option to compress sstables using the Zstandard algorithm (https://facebook.github.io/zstd/). To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor' to the 'compression' argument when creating a table. You can also specify a 'compression_level'. See Zstd documentation for the available compression levels. Resolves #2613. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:53 +02:00
Kamil Braun	7a61bcb021	Fix the value of the chunk length parameter passed to compressors This commit also fixes a bug in sstables/compress.cc, where chunk length in bytes was passed to the compressor as chunk length in kilobytes. Fortunately, none of the compressors implemented until now used this parameter. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:31:33 +02:00
Avi Kivity	95c0804731	Merge "Catch unclosed partition sstable write #4794" from Tomasz " Not emitting partition_end for a partition is incorrect. The sstable writer assumes that it is emitted. If it's not, the sstable will not be written correctly. The partition index entry for the last partition will be left partially written, which will result in errors during reads. Also, statistics and sstable key ranges will not include the last partition. It's better to catch this problem at the time of writing, and not generate bad sstables. Another way of handling this would be to implicitly generate a partition_end, but I don't think that we should do this. We cannot trust the mutation stream when invariants are violated; we don't know if this was really the last partition which was supposed to be written. So it's safer to fail the write. Enabled for both mc and la/ka. Passing --abort-on-internal-error on the command line will switch to aborting instead of throwing an exception. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down. " * 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla: sstables: writer: Validate that partition is closed when the input mutation stream ends config, exceptions: Add helper for handling internal errors utils: config_file: Introduce named_value::observe()	2019-08-04 15:18:31 +03:00
Asias He	3b39a59135	storage_service: Replicate and advertise tokens early in the boot up process When a node is restarted, there is a race between gossip starting (other nodes will mark this node up again and send requests) and the tokens being replicated to other shards. Here is an example: - n1, n2 - n2 is down, n1 thinks n2 is down - n2 starts again, n2 starts the gossip service, n1 thinks n2 is up and sends reads/writes to n2, but n2 hasn't replicated the token_metadata to all the shards yet. - n2 complains: token_metadata - sorted_tokens is empty in first_token_index! (repeated six times) storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error (sorted_tokens is empty in first_token_index!) The code path looks like below: 0 storage_service::init_server 1 prepare_to_join() 2 add gossip application state of NET_VERSION, SCHEMA and so on. 3 _gossiper.start_gossiping().get() 4 join_token_ring() 5 _token_metadata.update_normal_tokens(tokens, get_broadcast_address()); 6 replicate_to_all_cores().get() 7 storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS The race described above is between line 3 and line 6. To fix this, we replicate the token_metadata early, right after it is filled with the tokens read from the system table and before gossip starts, so that when other nodes think this restarting node is up, the tokens are already replicated to all the shards. In addition, this patch also fixes the issue that other nodes might see a node missing the TOKENS and STATUS application states in gossip if that node failed in the middle of a restart, i.e., it was killed after line 3 and before line 7. As a result, we could not replace the node. Tests: update_cluster_layout_tests.py Fixes: #4709 Fixes: #4723	2019-08-04 15:18:31 +03:00
Avi Kivity	aebb9bd755	Merge "tests/mutation_source_test: pass query time to populate" from Botond " Although `733c68cb1` made sure to synchronize the query time used for compaction happening in the mutation_source_test suite and that happening in the `flat_mutation_assertions` class, there remained another hidden compaction that potentially could use a different timestamp and hence produce false positive test failures. This was hastily fixed by `cea3338e3`, by just increasing the TTL of cells, thus avoiding possible differences in compaction output. This mini-series is the proper fix to this problem. It passes a query time to the populate function, allowing the users of the mutation source test suite to forward it to any compaction they might be doing on the data. The quick fix is reverted in favor of the proper fix. Refs: #4747 " * 'mutation_source_tests_proper_ttl_fix/v1' of https://github.com/denesb/scylla: Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl" tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time tests/mutation_source_test: add populate_fn overload with query_time	2019-08-04 15:18:31 +03:00
Tomasz Grabiec	43c7144133	sstables: writer: Validate that partition is closed when the input mutation stream ends Not emitting partition_end for a partition is incorrect. The sstable writer assumes that it is emitted. If it's not, the sstable will not be written correctly. The partition index entry for the last partition will be left partially written, which may result in errors during reads. Also, statistics and sstable key ranges will not include the last partition. It's better to catch this problem at the time of writing, and not generate bad sstables. Another way of handling this would be to implicitly generate a partition_end, but I don't think that we should do this. We cannot trust the mutation stream when invariants are violated; we don't know if this was really the last partition which was supposed to be written. So it's safer to fail the write. Enabled for both mc and la/ka.	2019-08-02 11:13:54 +02:00
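The end-of-stream validation described in the sstable writer entry above boils down to tracking whether a partition is still open. The following is a minimal sketch under stated assumptions: `sstable_writer_sketch` and its methods are hypothetical, and a real writer also emits index entries, statistics and key ranges when a partition ends.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily simplified writer consumer: it only tracks whether
// a partition is currently open and how many partitions were completed.
class sstable_writer_sketch {
    bool _partition_open = false;
    int _partitions_written = 0;
public:
    void consume_new_partition() { _partition_open = true; }

    void consume_end_of_partition() {
        _partition_open = false;
        ++_partitions_written;
    }

    // Called when the input mutation stream ends. Instead of silently
    // producing an sstable with a half-written last partition (or guessing
    // a partition_end), fail the write.
    int consume_end_of_stream() {
        if (_partition_open) {
            throw std::runtime_error("mutation stream ended without partition_end");
        }
        return _partitions_written;
    }
};
```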
Tomasz Grabiec	bf70ee3986	config, exceptions: Add helper for handling internal errors The handler is intended to be called when internal invariants are violated and the operation cannot safely continue. The handler either throws (default) or aborts, depending on a configuration option. Passing --abort-on-internal-error on the command line will switch to aborting. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down.	2019-08-02 11:13:54 +02:00
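The throw-or-abort policy from the internal-error entry above can be sketched as follows. The global flag `abort_on_internal_error` and the function `on_internal_error_sketch()` are hypothetical stand-ins; in Scylla the switch is the `--abort-on-internal-error` configuration option.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the --abort-on-internal-error option.
inline bool abort_on_internal_error = false;

// Called when an internal invariant is violated and the current operation
// cannot safely continue. Never returns normally.
[[noreturn]] inline void on_internal_error_sketch(const std::string& msg) {
    if (abort_on_internal_error) {
        // Abort to get a core dump for debugging; this takes the node down.
        std::cerr << "internal error, aborting: " << msg << '\n';
        std::abort();
    }
    // Default: fail only the affected operation (e.g. a repair).
    throw std::runtime_error(msg);
}
```

Throwing is the default because it confines the damage to one operation; enabling aborts on a single node trades that node's availability for a core dump.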
Tomasz Grabiec	61a9cfbfa9	utils: config_file: Introduce named_value::observe()	2019-08-02 11:13:53 +02:00
Avi Kivity	093d2cd7e5	reconcilable_result: use chunked_vector to hold partitions Usually, a reconcilable_result holds very few partitions (1 is common), since the page size is limited by 1MB. But if we have paging disabled or if we are reconciling a range full of tombstones, we may see many more. This can cause large allocations. Change to chunked_vector to prevent those large allocations, as they can be quite expensive. Fixes #4780.	2019-08-01 18:49:13 +03:00
Avi Kivity	eaa9a5b0d7	utils::chunked_vector: add rbegin() and related iterators Needed as an std::vector replacement.	2019-08-01 18:39:47 +03:00
Avi Kivity	df6faae980	utils: chunked_vector: make begin()/end() const correct begin() of a const vector should return a const_iterator, to avoid giving the caller the ability to mutate it. This slipped through since iterator's constructor does a const_cast. Noticed by code inspection.	2019-08-01 18:38:53 +03:00
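The chunked_vector entries above (using it for reconcilable_result, adding rbegin(), and making begin()/end() const-correct) can be illustrated with a much-simplified container. This is a hypothetical sketch, not utils::chunked_vector: elements live in fixed-size chunks so growth never needs one large contiguous allocation of elements, a single iterator template serves both constness variants so no const_cast is needed, and reverse iteration comes from std::reverse_iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-in for utils::chunked_vector.
template <typename T, std::size_t ChunkSize = 4>
class chunked_vector_sketch {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;

    // A const container can only build Const == true iterators.
    template <bool Const>
    class basic_iterator {
        using owner = std::conditional_t<Const, const chunked_vector_sketch, chunked_vector_sketch>;
        owner* _v = nullptr;
        std::size_t _i = 0;
    public:
        using iterator_category = std::bidirectional_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = std::conditional_t<Const, const T*, T*>;
        using reference = std::conditional_t<Const, const T&, T&>;

        basic_iterator() = default;
        basic_iterator(owner* v, std::size_t i) : _v(v), _i(i) {}

        reference operator*() const { return (*_v)[_i]; }
        pointer operator->() const { return &**this; }
        basic_iterator& operator++() { ++_i; return *this; }
        basic_iterator operator++(int) { auto t = *this; ++_i; return t; }
        basic_iterator& operator--() { --_i; return *this; }
        basic_iterator operator--(int) { auto t = *this; --_i; return t; }
        bool operator==(const basic_iterator& o) const { return _i == o._i; }
        bool operator!=(const basic_iterator& o) const { return _i != o._i; }
    };

public:
    using iterator = basic_iterator<false>;
    using const_iterator = basic_iterator<true>;
    using reverse_iterator = std::reverse_iterator<iterator>;
    using const_reverse_iterator = std::reverse_iterator<const_iterator>;

    void push_back(const T& x) {
        if (_size % ChunkSize == 0) {
            _chunks.emplace_back();  // start a new fixed-size chunk
            _chunks.back().reserve(ChunkSize);
        }
        _chunks.back().push_back(x);
        ++_size;
    }
    std::size_t size() const { return _size; }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    const T& operator[](std::size_t i) const { return _chunks[i / ChunkSize][i % ChunkSize]; }

    // Const-correct: a const container hands out only const_iterators.
    iterator begin() { return {this, 0}; }
    iterator end() { return {this, _size}; }
    const_iterator begin() const { return {this, 0}; }
    const_iterator end() const { return {this, _size}; }

    reverse_iterator rbegin() { return reverse_iterator(end()); }
    reverse_iterator rend() { return reverse_iterator(begin()); }
    const_reverse_iterator rbegin() const { return const_reverse_iterator(end()); }
    const_reverse_iterator rend() const { return const_reverse_iterator(begin()); }
};
```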
Botond Dénes	0b748bb8fe	Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl" This reverts commit `cea3338e38`. The above was a quick fix to allow the tests to pass, there is a proper fix now.	2019-08-01 13:05:46 +03:00
Botond Dénes	ac91f1f6b8	tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time Use the query_time passed in to the populate function and forward it to the sstable constructor, so that the compaction happening during sstable write uses the same query time that any compaction done by the mutation source test suite does.	2019-08-01 13:04:21 +03:00
Botond Dénes	ce1ed2cb70	tests/mutation_source_test: add populate_fn overload with query_time So tests that do compaction can pass the query_time they used for it to clients that do some compaction themselves, making sure all compactions happen with the same query time, avoiding false positive test failures.	2019-08-01 13:03:03 +03:00
Vlad Zolotarov	15eaf2fd8e	dist: scylla_util.py: get_mode_cpuset(): don't let false alarm error messages Don't let perftune.py print false-alarm error messages when we calculate a compute CPU set for tuning modes. This may happen when we calculate a CPU set for non-MQ tuning modes on small systems on which these modes are forbidden because they would result in a zero CPU set, e.g. sq_split on a system with a single physical core. We are going to use the --get-cpu-mask-quiet execution mode newly added to seastar/script/perftune.py by the "perftune.py: introduce --get-cpu-mask-quiet" series, which returns a zero CPU set if that's what it turns out to be, instead of exiting with an error as --get-cpu-mask would do in such a case. The rest of the scylla_util.py logic handles a zero CPU set returned by get_mode_cpuset() correctly. Fixes #4211 Fixes #4443 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190731212901.9510-1-vladz@scylladb.com>	2019-08-01 11:14:39 +03:00
Botond Dénes	339be3853d	foreign_reader: silence warning about discarded future And add a comment explaining why this is fine. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801062234.69081-1-bdenes@scylladb.com>	2019-08-01 10:11:24 +03:00
Avi Kivity	47b0f40d27	Merge "introduce metrics for non-local queries" from Konstantin " A fix for #4338 "storage_proxy add a counter for cql requests that arrived to a non replica" Such requests should be tracked since forwarding them to a correct replica can create a lot of network noise and incur significant performance penalty. The current metrics are considered insufficient after introduction of heat-weighted load balancing. " Fixes #4388. * 'gh-4338' of https://github.com/kostja/scylla: metrics: introduce a metric for non-local reads metrics: account writes forwarded by a coordinator in an own metric.	2019-08-01 10:09:33 +03:00
Avi Kivity	77686ab889	Merge "Make SSTable cleanup run aware" from Raphael " Fixes #4663. Fixes #4718. " * 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla: tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id table: Make SSTable cleanup run aware compaction: introduce constants for compaction descriptor compaction: Make it possible to config the identifier of the output sstable run table: do not rely on undefined behavior in cleanup_sstables	2019-07-31 19:10:22 +03:00
Botond Dénes	a41e8f0bcf	query::consume_page: move away from variadic future Require the `consumer` to return 0 or 1 value in its future. Update all downstream code. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190731140440.57295-1-bdenes@scylladb.com>	2019-07-31 18:49:47 +03:00
Avi Kivity	320fd2be60	Update seastar submodule * seastar 3f88e9068b...d90834443c (12): > Print warning when somaxconn lower than backlog parameter used for listen() > Merge "perftune.py: introduce --get-cpu-mask-quiet" from Vlad > seastar-json2code: Handle "$ref"-usage for nested object types properly > Make future [[nodiscard]] > Allow pass listen_options to http_server::listen > Handle EPOLLHUP and EPOLLERR from epoll explicitly > reactor: fix false positives in the stall detector due to large task queue > Merge "Small asan related improvements" from Rafael > thread: reduce allocations during context switch > thread: remove deprecated thread_scheduling_group and its unit test > reactor: make _polls to be non atomic > reactor: remove unused _tasks_processed variable	2019-07-31 18:30:10 +03:00
Takuya ASADA	60ec8b2a04	install.sh: install everything when --pkg is not specified Since commit `ac9b115a8f`, install.sh requires a single package to be specified using --pkg, and there is no way to select all of them. It should select all packages when install.sh is run without --pkg. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190731013245.5857-1-syuu@scylladb.com>	2019-07-31 16:43:57 +03:00
Asias He	5d3e4d7b73	messaging_service: Check if messaging_service is stopped before get_rpc_client get_rpc_client assumes the messaging_service is not stopped. We should check is_stopping() before we call get_rpc_client. We do such check in existing code, e.g., send_message and friends. Do the same check in the newly introduced make_sink_and_source_for_stream_mutation_fragments() and friends for row level repair. Fixes: #4767	2019-07-31 11:44:57 +03:00
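The guard pattern from the messaging_service entry above (check is_stopping() before calling get_rpc_client(), as send_message and friends already do) is sketched below. All names are hypothetical and mirror the commit message only loosely; the real entry points are the row-level-repair functions such as make_sink_and_source_for_stream_mutation_fragments().

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

struct rpc_client_sketch {
    std::string peer;
};

// Hypothetical sketch, not Scylla's messaging_service.
class messaging_service_sketch {
    bool _stopping = false;

    // Like the real get_rpc_client(), this assumes the service is running.
    std::shared_ptr<rpc_client_sketch> get_rpc_client(const std::string& peer) {
        assert(!_stopping && "get_rpc_client() called on a stopping service");
        return std::make_shared<rpc_client_sketch>(rpc_client_sketch{peer});
    }
public:
    bool is_stopping() const { return _stopping; }
    void stop() { _stopping = true; }

    // Every entry point checks is_stopping() before touching the clients.
    std::shared_ptr<rpc_client_sketch> make_sink_and_source_sketch(const std::string& peer) {
        if (is_stopping()) {
            throw std::runtime_error("messaging_service is stopping");
        }
        return get_rpc_client(peer);
    }
};
```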
Avi Kivity	74349bdf7e	Merge "Partially devirtualize CQL restrictions" from Piotr " This series is a batch of first small steps towards devirtualizing CQL restrictions: - one artificial parent class in the hierarchy is removed: abstract_restriction - the following functions are devirtualized: * is_EQ() * is_IN() * is_slice() * is_contains() * is_LIKE() * is_on_token() * is_multi_column() Future steps can involve the following: - introducing a std::variant of restriction targets: it's either a column or a vector of columns - introducing a std::variant of restriction values: it's one of: {term, term_slice, std::vector<term>, abstract_marker} The steps above will allow devirtualizing most of the remaining virtual functions in favor of std::visit. They will also reduce the number of subclasses, e.g. what's currently `token_restriction::IN_with_values` can be just an instance of `restriction`, knowing that it's on a token, having a target of std::vector<column> and a value of std::vector<term>. Tests: unit(dev), dtest: cql_tests, cql_additional_tests " * 'refactor_restrictions_2' of https://github.com/psarna/scylla: cql3: devirtualize is_on_token() cql3: devirtualize is_multi_column() cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE tests: add enum_set adding case cql3: allow adding enum_sets cql3: remove abstract_restriction class	2019-07-31 11:44:57 +03:00
Vlad Zolotarov	9df53b8bca	configure.py: ignore 'thrift -version' exit code (At least) on Ubuntu 19 'thrift -version' prints the expected string but its exit status is non-zero: $ thrift -version Thrift version 0.9.1 $ echo $? 1 We don't really care about the exit status but rather about the printed version string. If there is a problem with the command (e.g. it is missing), the printed string will not be as expected anyway, so verify that explicitly by checking the format of the returned string. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190722211729.24225-1-vladz@scylladb.com>	2019-07-31 11:44:57 +03:00
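The "ignore the exit status, validate the output" check from the configure.py entry above is shown here as a C++ sketch for consistency with the other examples; configure.py itself is Python, and `parse_thrift_version()` plus the exact regular expression are assumptions, not the script's actual code.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical check: the exit status of `thrift -version` is ignored and
// only the printed text is trusted, and only if it has the expected format.
std::optional<std::string> parse_thrift_version(const std::string& output) {
    static const std::regex pattern(R"(^Thrift version (\d+\.\d+\.\d+)\s*$)");
    std::smatch m;
    if (std::regex_match(output, m, pattern)) {
        return m[1].str();
    }
    return std::nullopt;  // missing binary, unexpected output, etc.
}
```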
Botond Dénes	cea3338e38	tests/mutation_source_tests: generate_mutation_sets() use larger ttl Currently all cells generated by this method use a ttl of 1. This causes test flakiness as tests often compact the input and output mutations to weed out artificial differences between them. If this compaction is not done with the exact same query time, then some cells will be expired in one compaction but not in the other. `733c68cb1` attempted to solve this by passing the same query time to `flat_mutation_reader_assertions::produce_compacted()` as well as `mutation_partition::compact_for_query()` when compacting the input mutation. However, a hidden compaction spot remained: the ka/la sstable writer also does some compaction, and for this it uses the time point passed to the `sstable` constructor, which defaults to `gc_clock::now()`. This leads to false positive failures in `sstable_mutation_test.cc`. At this point I don't know what the original intent was behind this low `ttl` value. To solve the immediate problem of the tests failing, I increased it. If it turns out that this `ttl` value has a good reason, we can do a more involved fix, of making sure all sstables written also get the same query time as that used for the compaction. Fixes: #4747 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190731081522.22915-1-bdenes@scylladb.com>	2019-07-31 11:44:57 +03:00
Piotr Sarna	2f65144a20	cql3: devirtualize is_on_token() Instead of being a virtual function, is_on_token leverages the existing enum inside the `restriction` class.	2019-07-29 17:18:50 +02:00
Piotr Sarna	68aa42c545	cql3: devirtualize is_multi_column() Instead of being a virtual function, is_multi_column leverages an enum.	2019-07-29 17:18:50 +02:00
Piotr Sarna	83fbfe5a4f	cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE Instead of using virtual functions, the operation of each restriction is determined by an enum value it stores.	2019-07-29 17:18:49 +02:00
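The three devirtualization entries above replace per-subclass virtual predicates with enum values stored in the base class. This is a minimal sketch assuming one enum for the operator and one for the restriction target; `restriction_sketch` and `token_IN_restriction_sketch` are hypothetical names, not the cql3 classes.

```cpp
#include <cassert>

// Hypothetical sketch: the base class stores the operator and target kinds,
// so the predicates are plain non-virtual functions.
class restriction_sketch {
public:
    enum class op { EQ, IN, SLICE, CONTAINS, LIKE };
    enum class target { SINGLE_COLUMN, MULTIPLE_COLUMNS, TOKEN };

    restriction_sketch(op o, target t) : _op(o), _target(t) {}
    virtual ~restriction_sketch() = default;

    bool is_EQ() const { return _op == op::EQ; }
    bool is_IN() const { return _op == op::IN; }
    bool is_slice() const { return _op == op::SLICE; }
    bool is_contains() const { return _op == op::CONTAINS; }
    bool is_LIKE() const { return _op == op::LIKE; }
    bool is_multi_column() const { return _target == target::MULTIPLE_COLUMNS; }
    bool is_on_token() const { return _target == target::TOKEN; }
private:
    op _op;
    target _target;
};

// A subclass only passes its kinds to the base constructor; it no longer
// overrides any of the predicates.
struct token_IN_restriction_sketch : restriction_sketch {
    token_IN_restriction_sketch() : restriction_sketch(op::IN, target::TOKEN) {}
};
```

The remaining virtual destructor keeps the hierarchy polymorphic; only the kind queries stop going through the vtable.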
Piotr Sarna	e9798354ae	tests: add enum_set adding case	2019-07-29 17:15:51 +02:00
Piotr Sarna	989c31f68b	cql3: allow adding enum_sets An enum_set can now be added to another enum_set in order to create the sum (union) of both.	2019-07-29 17:15:51 +02:00
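Adding two enum sets, as in the two enum_set entries above, can be modelled as a bitwise OR of their masks. This is a hypothetical sketch, not Scylla's enum_set: `enum_set_sketch` stores a std::bitset indexed by enumerator value, and `op` is an example enum mirroring the restriction operators.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Example enum (hypothetical), mirroring the restriction operators.
enum class op { EQ, IN, SLICE, CONTAINS, LIKE };

// Hypothetical minimal enum_set: one bit per enumerator value.
template <typename Enum, std::size_t N>
class enum_set_sketch {
    std::bitset<N> _mask;
public:
    enum_set_sketch() = default;
    enum_set_sketch(std::initializer_list<Enum> items) {
        for (Enum e : items) {
            set(e);
        }
    }
    void set(Enum e) { _mask.set(static_cast<std::size_t>(e)); }
    bool contains(Enum e) const { return _mask.test(static_cast<std::size_t>(e)); }
    std::size_t size() const { return _mask.count(); }

    // "Adding" two sets yields the union of their members.
    enum_set_sketch& operator+=(const enum_set_sketch& other) {
        _mask |= other._mask;
        return *this;
    }
    friend enum_set_sketch operator+(enum_set_sketch a, const enum_set_sketch& b) {
        return a += b;
    }
};
```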
Piotr Sarna	5e06801f12	cql3: remove abstract_restriction class All restrictions inherit from `abstract_restriction` class, which has only one parent class: `restriction`. To simplify the inheritance tree, `restriction` and `abstract_restriction` are merged into one class named `restriction`.	2019-07-29 15:54:39 +02:00
Botond Dénes	733c68cb13	tests: flat_reader_assertions::produces_compacted(): add query_time param `produces_compacted()` is usually used in tandem with another compaction done on the expected output (`m` param). This is done so that, even though the reader works with an uncompacted stream, checking the result will not fail due to insignificant differences in the data, e.g. expired collection cells dropped while merging two collections. Currently, the two compactions, the one inside `produces_compacted()` and the one done by the caller, use two separate calls to `gc_clock::now()` to obtain the query time. This can lead to off-by-one errors between the two query times and subsequently to artificial differences between the two compacted mutations, ultimately failing the test with a false positive. To prevent this, allow callers to pass in a query time, the same one they used to compact the input mutation (`m`). This removes another source of flakiness in unit tests using the mutation source test suite. Refs: #4695 Fixes: #4747 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190726144032.3411-1-bdenes@scylladb.com>	2019-07-28 10:59:50 +03:00
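Why two separate `gc_clock::now()` readings can make two compactions of the same data disagree, as described in the query_time entries in this log, is shown by the sketch below. It is hypothetical: `ttl_cell_sketch`, `compact_sketch()` and `same_values()` are invented names, times are plain seconds, and a cell is assumed to be expired once its expiry time is less than or equal to the query time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical TTL'd cell; times in seconds, like gc_clock.
struct ttl_cell_sketch {
    std::int64_t value;
    std::int64_t expiry;
};

// Compaction drops cells that are expired as of the given query time.
std::vector<ttl_cell_sketch> compact_sketch(std::vector<ttl_cell_sketch> cells, std::int64_t query_time) {
    cells.erase(std::remove_if(cells.begin(), cells.end(),
                               [query_time](const ttl_cell_sketch& c) { return c.expiry <= query_time; }),
                cells.end());
    return cells;
}

bool same_values(const std::vector<ttl_cell_sketch>& a, const std::vector<ttl_cell_sketch>& b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end(),
                      [](const ttl_cell_sketch& x, const ttl_cell_sketch& y) { return x.value == y.value; });
}
```

Two "now()" readings that straddle a second boundary (99 and 100 below) expire different cells; a single shared query time cannot.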
Botond Dénes	f215286525	tests/mutation_reader_tests: move away from variadic futures Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724101005.19126-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Botond Dénes	0f30bc0004	mutation_reader: move away from variadic futures Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724102246.20450-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Botond Dénes	6742c77229	scylla-gdb.py: fix scylla_ptr Broken since `b3adabda2`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190726140532.124406-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Avi Kivity	b272db368f	sstable: index_reader: close index_reader::reader more robustly If we had an error while reading, then we would have failed to close the reader, which in turn can cause memory corruption. Make the closing more robust by using then_wrapped (that doesn't skip on exception) and log the error for analysis. Fixes #4761.	2019-07-26 14:26:04 +02:00
Avi Kivity	fcf3195e54	Update seastar submodule * seastar c1be3c912f...3f88e9068b (3): > reactor: improve handling of connect storms > json: Make date formatter use RFC8601/RFC3339 format > reactor: fix deadlock of stall detector vs dlopen Fixes #4759.	2019-07-25 18:29:54 +03:00
Takuya ASADA	ac9b115a8f	dist/debian: use install.sh on Debian Currently, install.sh is only used for building the .rpm, while a similar build script lives under dist/debian and sometimes becomes inconsistent with install.sh. Since most of the package build process is the same, we should share install.sh between the .rpm and .deb package builds. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190725123207.2326-1-syuu@scylladb.com>	2019-07-25 18:22:42 +03:00
Botond Dénes	6dd8c4da83	test_multishard_combining_reader_non_strictly_monotonic_positions: use the same deletion_time for tombstones Across all calls to `make_fragments_with_non_monotonic_positions()`, to prevent off-by-one errors between the separately generated test input and expected output. This problem was already supposed to be fixed by `5f22771ea8`, but for some reason that fix only used the same deletion time inside a single call, which still falls short in some cases. This should hopefully fix this problem for good. Refs: #4695 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724073240.125975-1-bdenes@scylladb.com>	2019-07-25 12:37:34 +02:00
Kamil Braun	148d4649d6	Add option to create an XUnit output file for non-boost tests in test.py. (#4757) If the user specifies an output file name using "--xunit=<filename>", test.py will write the test results of non-boost tests to the file in the XUnit XML format. Every boost test creates its own results file already. Resolves #4680. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-25 12:47:47 +03:00
Vlad Zolotarov	53cf90b075	ec2_snitch: properly build the AWS meta server address Explicitly pass the port number of the AWS metadata server API when creating a corresponding socket. This patch fixes the regression introduced by `4ef940169f`. Fixes #4719 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-07-25 10:50:01 +03:00
Tomasz Grabiec	3af8431a40	Merge "compaction: allow collecting purged data" from Botond compaction: allow collecting purged data Allow the compaction initiator to pass an additional consumer that will consume any data that is purged during the compaction process. This allows the separate retention of these dead cells and tombstones until some long-running process like compaction safely finishes. If the process fails or is interrupted, the purged data can be used to prevent data resurrection. This patch was developed to serve as the basis for a solution to #4531 but it is not a complete solution in and of itself. This series is a continuation of the patch: "[PATCH v1 1/3] Introduce Garbage Collected Consumer to Mutation Compactor" by Raphael S. Carvalho <raphaelsc@scylladb.com>. Refs: #4531 * https://github.com/denesb/scylla.git compaction_collect_purged_data/v8: Introduce compaction_garbage_collector interface collection_type_impl::mutation: compact_and_expire() add collector parameter row: add garbage_collector row_marker: de-inline compact_and_expire() row_marker: add garbage_collector Introduce Garbage Collected Consumer to Mutation Compactor tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations() tests/random_schema: generate_random_mutations(): remove `engine` parameter tests/random_schema: add assert to make_clustering_key() tests/random_schema: generate_random_mutations(): allow customizing generated data tests: random_schema: futurize generate_random_mutations() random_schema: generate_random_mutations(): restore indentation data_model: extend ttl and expiry support tests/random_schema: generate_random_mutations(): generate partition tombstone random_schema: add ttl and expiry support tests/random: add get_bool() overload with random engine param random_schema: generate_random_mutations(): ensure partitions are unique tests: add unit tests for the data stream split in compaction	2019-07-23 17:12:28 +02:00
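The idea in the "allow collecting purged data" entry above, handing purged data to an optional extra consumer instead of discarding it, can be sketched as below. This is not Scylla's compaction_garbage_collector interface: `ttl_cell`, `purged_collector` and `compact_with_collector()` are hypothetical, and only expired cells (not tombstones) are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical TTL'd cell; times in seconds.
struct ttl_cell {
    std::int64_t value;
    std::int64_t expiry;
};

// Optional consumer for everything the compaction purges.
using purged_collector = std::function<void(const ttl_cell&)>;

std::vector<ttl_cell> compact_with_collector(const std::vector<ttl_cell>& input,
                                             std::int64_t query_time,
                                             const purged_collector& collector = nullptr) {
    std::vector<ttl_cell> live;
    for (const auto& c : input) {
        if (c.expiry > query_time) {
            live.push_back(c);
        } else if (collector) {
            collector(c);  // retained separately, e.g. to prevent resurrection
        }
    }
    return live;
}
```

Without a collector the behavior is ordinary compaction; with one, the initiator can keep the purged data until the long-running operation has finished.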
Avi Kivity	44b5878011	Merge "Fix possible stalls in row level repair" from Asias " After switching to rpc stream interface, we increased the row buffer size. Code that works on the buffer without yielding can stall the reactor. This series fixes the issue by futurizing or running the code in thread and yield. Fixes: #4642 " * 'repair_switch_to_rpc_stream_fix_stall' of https://github.com/asias/scylla: repair: Enable rpc stream in row level repair repair: Wrap with foreign_ptr to avoid cross cpu free repair: Futurize get_repair_rows_size and row_buf_size repair: Avoid calling get_repair_rows_size in get_sync_boundary repair: Futurize row_buf_csum repair: Yield inside get_set_diff repair: Use get_repair_rows_size helper in get_sync_boundary repair: Avoid stall in do_estimate_partitions_on_local_shard remove get_row_diff repair: Futurize get_row_diff to avoid stall repair: Fix possible stall in request_row_hashes repair: Allow default construct for repair_row repair: Remove apply_rows repair: Run get_row_diff_with_rpc_stream in a thread repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread repair: Run get_row_diff inside a thread repair: Add apply_rows_on_master_in_thread repair: Add apply_rows_on_follower repair: Futurize working_row_hashes repair: Remove get_full_row_hashes helper	2019-07-22 15:54:06 +03:00
Avi Kivity	9e630eb734	Update seastar submodule * seastar 44a300cd50...c1be3c912f (9): > execution_stage: prevent unbounded growth > io queues: Add renaming functionality to io priority class > scheduling: Add rename functionality to scheduling groups > net: Add listen_backlog option for posix stack > future: deprecate variadic futures > include,tests: add workaround for missing guaranteed copy elision > core/dpdk_rte: handle 64+ cores > perftune: add a dry-run mode > build: support building dpdk on arm64 Fixes #4749.	2019-07-22 15:41:54 +03:00
Avi Kivity	e03c7003f1	toppartitions: fix race between listener removal and reads Data listener reads are implemented as flat_mutation_readers, which take a reference to the listener and then execute asynchronously. The listener can be removed between the time when the reference is taken and actual execution, resulting in a dangling pointer dereference. Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes don't need protection because they execute atomically. Fixes #4661. Tests: unit (dev)	2019-07-22 13:26:18 +02:00
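The toppartitions fix above replaces a plain reference to the listener with a weak pointer. The sketch below uses std::weak_ptr as a stand-in for Seastar's weak_ptr (which works with single-owner objects rather than shared_ptr); `listener_sketch` and `listener_reader_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical listener that records the keys it sees.
struct listener_sketch {
    std::vector<std::string> seen;
};

// A reader that may run after its listener was removed. It holds a weak
// reference, so a removed listener is detected instead of dereferenced.
class listener_reader_sketch {
    std::weak_ptr<listener_sketch> _listener;
public:
    explicit listener_reader_sketch(std::weak_ptr<listener_sketch> l) : _listener(std::move(l)) {}

    // Returns false if the listener is gone; the read is simply not recorded.
    bool on_read(const std::string& key) {
        auto l = _listener.lock();
        if (!l) {
            return false;
        }
        l->seen.push_back(key);
        return true;
    }
};
```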
Avi Kivity	d730969278	Merge "make sure failure to create snapshots won't crash the node" from Glauber " Issue #4558 describes a situation in which failure to execute clearsnapshots will hard crash the node. The problem is that clearsnapshots will internally use lister::rmdir, which in turn has two in-tree users: clearing snapshots and clearing temporary directories during sstable creation. The way it is currently coded, it wraps the io functions in io_check, which means that failures to remove the directory will crash the database. We recently saw how benign failures crashed a database during clearsnapshot: we had snapshot creation running in parallel, adding more files to the directory that wasn't empty by the time of deletion. I have also often seen users add files to existing directories by accident, which is another possibility to trigger that. This patch removes the io_check from lister, and moves it to the caller in which we want to be more strict. We still want to be strict about the creation of temporary directories, since users shouldn't be touching that in any way. Also while working on that, I realized we have no tests for snapshots of any kind in tree, so let's write some " * 'snapshots' of https://github.com/glommer/scylla: tests: add tests for snapshots. lister: don't crash the node on failure to remove snapshot	2019-07-22 11:09:23 +03:00
Rafael Ávila de Espíndola	636e2470b1	Always close commitlog files We were using segment::_closed to decide whether _file was already closed. Unfortunately they are not exactly the same thing. As far as I understand it, segments can be closed and reused without actually closing the file. Found with a seastar patch that asserts on destroying an open append_challenged_posix_file_impl. Fixes #4745. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190721171332.7995-1-espindola@scylladb.com>	2019-07-22 10:08:57 +03:00
Vlad Zolotarov	5632c0776e	tests: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Fix this by explicitly using to_hex() converter. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190716221231.22605-3-vladz@scylladb.com>	2019-07-21 16:42:54 +03:00
Nadav Har'El	db8d4a0cc6	Add computed columns Merged patch series by Piotr Sarna: This series introduces the concept of "computed" column, which represents values not provided directly by the user, but computed on the fly - possibly using other column values. It will be used in the future to implement map value indexing, collection indexing, etc. Right now the only use is the token column for secondary indexes - which is a column computed from the base partition key value. After this series, another one that depends on it and adds map value indexing will be pushed. Tests: unit(dev) Piotr Sarna (14): schema: add computed info to column definition schema: add implementation of computing token column schema: allow marking columns as computed in schema builder service: add computed columns feature view: check for computed columns in view view: remove unused token_for function database: add fixing previous secondary index schemas tests: disable computed columns feature in schema change test tests: add schema change test regeneration comment db: add system_schema.computed_columns docs: init system_schema_keyspace.md with column computations tests: generate new test case for schema change + computed cols index: mark token column as 'computed' when creating mv tests: add checking computed columns in SI column_computation.hh | 63 ++++++++ db/schema_features.hh | 4 +- db/schema_tables.hh | 4 + idl/frozen_schema.idl.hh | 1 + schema.hh | 40 +++++ schema_builder.hh | 4 +- schema_mutations.hh | 18 ++- service/storage_service.hh | 8 + view_info.hh | 2 - database.cc | 6 +- db/schema_tables.cc | 146 ++++++++++++++++-- db/view/view.cc | 46 +++--- index/secondary_index_manager.cc | 2 +- schema.cc | 58 ++++++- schema_mutations.cc | 14 +- service/storage_service.cc | 5 + tests/schema_change_test.cc | 63 ++++++-- tests/secondary_index_test.cc | 28 ++++ docs/system_schema_keyspace.md | 40 +++++ plus about 200 new test sstable files	2019-07-21 13:05:46 +03:00
Piotr Sarna	4d1eaf8478	tests: add checking computed columns in SI The test case checks if token column generated for global indexing is indeed only present in global indexes and is marked as a computed column.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a8f7d64a08	index: mark token column as 'computed' when creating mv Secondary indexes use a computed token column to preserve proper query ordering. This column is now marked as 'computed'.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1c0ef5f9e9	tests: generate new test case for schema change + computed cols The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after computed columns are allowed, in order to make sure that the digest computed including computed columns does not change spuriously as well.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1e54752167	docs: init system_schema_keyspace.md with column computations The documentation file for system_schema keyspace is introduced, and its first entry describes the column_computation table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	c1d5aef735	db: add system_schema.computed_columns Information on which columns of a table are 'computed' is now kept in system_schema.computed_columns system table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	589200f5a2	tests: add schema change test regeneration comment Schema change test might need regenerating every time a system table is added. In order to save future developers' time on debugging this test, a short description of that requirement is added.	2019-07-19 11:58:42 +02:00
Piotr Sarna	03ade01db7	tests: disable computed columns feature in schema change test In order to make sure that old schema digest is not recomputed and can be verified - computed columns feature is initially disabled in schema_change_test. The reason for that is as follows: running CQL test env assumes that we are running the newest cluster with all features enabled. However, the mere existence of some features might influence digest calculation. So, in order for the existing test to work correctly, it should have exactly the same set of cluster supported features as it had during its creation. It used to be "all features", but now it's "all features except computed columns". One can think of that as running a cluster with some nodes not yet knowing what computed columns are, so they are not taken into account when computing digests. Additionally, a separate test case that takes computed column digest into account will be generated and added in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	17c323c096	database: add fixing previous secondary index schemas If a schema was created before computed columns were implemented, its token column may not have been marked as computed. To remedy this, if no computed column is found, the schema will be recreated. The code works correctly even without this patch, in order to support upgrading from legacy versions, but the patch is still important: it transforms token columns from the legacy format to the new computed format, which will eventually (after a few release cycles) allow dropping the support for the legacy format altogether.	2019-07-19 11:58:42 +02:00
Piotr Sarna	3c5dd94306	view: remove unused token_for function The function was only used once in code removed in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	6a6871aa0e	view: check for computed columns in view Currently, having a 'computed' column in view update generation indicates that a token value needs to be generated and assigned to it.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a0e02df36a	service: add computed columns feature Computed columns feature should be checked before creating index schemas the new way - by adding computed column names to system_schema.computed_columns.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a1100e3737	schema: allow marking columns as computed in schema builder In order to be able to transform legacy materialized view definitions, builder is now able to mark an existing column as computed.	2019-07-19 11:58:41 +02:00
Piotr Sarna	65bf6d34fe	schema: add implementation of computing token column Computed column of 'token' type can now have its value computed.	2019-07-19 11:47:48 +02:00
Piotr Sarna	491b7a817f	schema: add computed info to column definition Some columns may represent not user-provided values, but ones computed from other columns. Currently an example is the token column used in secondary indexes to provide proper ordering. In order to avoid hardcoding special cases in the execution stage, optional additional information for computed columns is stored in the column definition.	2019-07-19 11:47:46 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Asias He	64a4c0ede2	streaming: Do not open rpc stream connection if ranges are not relevant to a shard Given a list of ranges to stream, stream_transfer_task will create a reader with the ranges and create an rpc stream connection on all the shards. When the user provides ranges to repair with the -st -et options, e.g., using scylla-manager, such ranges can belong to only one shard, and repair will pass such ranges to streaming. As a result, only one shard will have data to send while the rpc stream connections are created on all the shards, which can cause the kernel to run out of ports on some systems. To mitigate the problem, do not open the connection if the ranges do not belong to the shard at all. Refs: #4708	2019-07-18 18:31:21 +03:00
Avi Kivity	51cff8ad23	Merge "Fix storage service for tests" from Botond " Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard. Refs: #4695 " * 'fix_storage_service_for_tests' of https://github.com/denesb/scylla: tests: storage_service_for_tests: broadcast config to all shards tests: move storage_service_for_tests impl to test_services.cc	2019-07-18 18:27:47 +03:00
Nadav Har'El	997b92a666	migration_manager: allow dropping table and all its views The function announce_column_family_drop() drops (deletes) a base table and all the materialized-views used for its secondary indexes, but not other materialized views - if there are any, the operation refuses to continue. This is exactly what CQL's "DROP TABLE" needs, because it is not allowed to drop a table before manually dropping its views. But there is no inherent reason why we can't support an operation to delete a table and all its views - not just those related to indexes. This patch adds such an option to announce_column_family_drop(). This option is not used by the existing CQL layer, but can be used by other code automating operations programmatically without CQL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716150559.11806-1-nyh@scylladb.com>	2019-07-18 13:26:25 +02:00
Takuya ASADA	bd7d1b2d38	dist/common/systemd: change stop timeout sec to 900s Currently scylla-server.service uses DefaultTimeoutStopSec = 90; if Scylla is unable to shut down cleanly within 90 seconds, we may have data corruption on the node. Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both TimeoutStartSec and TimeoutStopSec to 900. See #4700 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190717095416.10652-1-syuu@scylladb.com>	2019-07-17 15:37:47 +03:00
Nadav Har'El	759752947b	drop_index_statement: fix column_family() All statement objects which derive from cf_statement, including drop_index_statement, have a column_family() returning the name of the column family involved in this statement. For most statements this is known at the time of construction, because it is part of the statement, but for "DROP INDEX", the user doesn't specify the table's name - just the index name. So we need to override column_family() to find the table name. The existing implementation assert()ed that we can always find such a table, but this is not true - for example, in a DROP INDEX with "IF EXISTS", it is perfectly fine for no such table to exist. In this case we don't want a crash, and not even an exception - it's fine that we just return an empty table name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716180104.15985-1-nyh@scylladb.com>	2019-07-17 09:44:47 +03:00
Glauber Costa	be26cbd952	tests: add tests for snapshots. While inspecting the snapshot code, I realized that we don't have any tests for it. So I decided to add some. Unfortunately I couldn't come up with a test of clearsnapshot reliably failing to remove the directory: relying on create snapshot + clearsnapshot is racy (it does not always happen), and other tricks that can be used to reproduce this -- like creating a root-owned file inside the snapshots directory -- are environment-dependent, and a bit ugly for unit tests. Dtests would probably be a better place for that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-07-16 13:35:53 -04:00
Glauber Costa	2008d982c3	lister: don't crash the node on failure to remove snapshot lister::rmdir has two in-tree users: clearing snapshots and clearing temporary directories during sstable creation. The way it is currently coded, it wraps the io functions in io_check, which means that failures to remove the directory will crash the database. We recently saw how benign failures crashed a database during clearsnapshot: we had snapshot creation running in parallel, adding more files to the directory, so it was no longer empty by the time of deletion. I have also often seen users add files to existing directories by accident, which is another way to trigger that. This patch removes the io_check from lister, and moves it to the caller in which we want to be more strict. We still want to be strict about the creation of temporary directories, since users shouldn't be touching that in any way. Fixes #4558 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-07-16 13:35:36 -04:00
Kamil Braun	4417e78125	Fix timestamp_type_impl::timestamp_from_string. Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00. Fixes #4641. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-16 19:16:56 +03:00
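To illustrate what accepting the 'z'/'Z' designator means, here is a minimal sketch of timezone-suffix parsing. `parse_utc_offset()` is a hypothetical helper written for this example, not the actual `timestamp_type_impl::timestamp_from_string` code, and it only handles the suffix, not the date and time parts.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical helper (not Scylla's parser): maps the timezone designator
// that ends a timestamp string to an offset from UTC in minutes.
// Accepts "z"/"Z" (UTC+00:00) as well as "+hh:mm", "-hh:mm", "+hhmm", "-hhmm".
std::optional<int> parse_utc_offset(std::string_view tz) {
    if (tz == "z" || tz == "Z") {
        return 0;  // 'Z' ("Zulu") denotes UTC+00:00
    }
    if ((tz.size() != 5 && tz.size() != 6) || (tz[0] != '+' && tz[0] != '-')) {
        return std::nullopt;
    }
    const bool colon = tz.size() == 6;
    if (colon && tz[3] != ':') {
        return std::nullopt;
    }
    // Collect the four digits hhmm, skipping the optional ':'.
    const char d[4] = {tz[1], tz[2], tz[colon ? 4 : 3], tz[colon ? 5 : 4]};
    for (char c : d) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            return std::nullopt;
        }
    }
    const int hours = (d[0] - '0') * 10 + (d[1] - '0');
    const int minutes = (d[2] - '0') * 10 + (d[3] - '0');
    if (hours > 23 || minutes > 59) {
        return std::nullopt;
    }
    const int total = hours * 60 + minutes;
    return tz[0] == '-' ? -total : total;
}
```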
Asias He	722ab3bb65	repair: Log repair id in check_failed_ranges Add the word `id` before the repair id in the log. It makes it easier to figure out from the log what the number stands for.	2019-07-16 19:10:19 +03:00
Avi Kivity	43690ecbdf	Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny " disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize with background deletions done by on_compaction_completion to ensure no sstables will be created or deleted during reshuffle_sstables after storage_service::load_new_sstables disables sstable writes. Fixes #4622 Test: unit(dev), nodetool_additional_test.py migration_test.py " * 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla: table: document _sstables_lock/_sstable_deletion_sem locking order table: disable_sstable_write: acquire _sstable_deletion_sem table: uninline enable_sstable_write table: reshuffle_sstables: add log message	2019-07-16 19:06:58 +03:00
Amnon Heiman	399d79fc6f	init: do not allow replace-address for seeds If a node is a seed node, it cannot be started with the replace-address-first-boot or the replace-address flag. The issue is that as a seed node it will generate new tokens instead of replacing the existing ones, which is what the user expects when supplying the flags. This patch will throw a bad_configuration_error exception in this case. Fixes #3889 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-16 18:53:19 +03:00
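A minimal sketch of the startup check described above, assuming a hypothetical `startup_options` struct. Only the error type name mirrors the commit, and the local `bad_configuration_error` below is a stand-in rather than Scylla's class.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Scylla's bad_configuration_error.
struct bad_configuration_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical subset of a node's startup options.
struct startup_options {
    bool is_seed = false;
    std::string replace_address;             // --replace-address
    std::string replace_address_first_boot;  // --replace-address-first-boot
};

// A seed node generates fresh tokens instead of taking over the tokens of
// the node it would replace, so the two settings are rejected together.
void validate_replace_options(const startup_options& opts) {
    const bool replacing = !opts.replace_address.empty() ||
                           !opts.replace_address_first_boot.empty();
    if (opts.is_seed && replacing) {
        throw bad_configuration_error(
            "replace-address is not supported on a seed node");
    }
}
```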
Calle Wilund	dbc3499fd1	server: Fix cql notification inet address serialization Fixes #4717 A bug in the ipv6 support series caused inet_address serialization to include an additional "size" parameter in the address chunk. Message-Id: <20190716134254.20708-1-calle@scylladb.com>	2019-07-16 16:51:59 +03:00
Botond Dénes	b40cf1c43d	tests: storage_service_for_tests: broadcast config to all shards Due to recent changes to the config subsystem, configuration has to be broadcast to all shards if one wishes to use it on them. The `storage_service_for_tests` has a `sharded<gms::gossiper>` member, which reads config values on initialization on each shard, causing a crash as the configuration was initialized only on shard 0. Add a call to `config::broadcast_to_all_shards()` to ensure all shards have access to valid config values.	2019-07-16 10:37:17 +03:00
Botond Dénes	fc9f46d7c1	tests: move storage_service_for_tests impl to test_services.cc Let's make it easier to find.	2019-07-16 10:36:49 +03:00
Raphael S. Carvalho	7180731d43	tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:50 -03:00
Raphael S. Carvalho	332c2ff710	table: Make SSTable cleanup run aware The cleanup procedure will move any sstable out of its sstable run because sstables are cleaned up individually and they end up receiving a new run identifier, meaning a table may potentially end up with a new sstable run for each of the sstables cleaned. SSTable cleanup needs to be run aware, so that the run structure is not messed up after the operation is done. Given that only some of the fragments composing an sstable run may need cleanup, it's better to keep them in their original sstable run. Fixes #4663. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:47 -03:00
Raphael S. Carvalho	8c97e0e43e	compaction: introduce constants for compaction descriptor Make it easier for users, and also avoid duplicating knowledge about descriptor defaults across the codebase. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:44 -03:00
Raphael S. Carvalho	a1db29e705	compaction: Make it possible to config the identifier of the output sstable run Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:38 -03:00
Raphael S. Carvalho	0e732ed1cf	table: do not rely on undefined behavior in cleanup_sstables It shouldn't rely on argument evaluation order, which is ub. Fixes #4718. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:22 -03:00
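The underlying hazard is generic C++: the order in which function arguments (and their parameter initializations) are evaluated is unspecified, so reading an object in one argument while moving from it in another may observe the moved-from state. The sketch below uses hypothetical `consume()`/`consume_safely()` functions rather than the actual `cleanup_sstables` code, and shows the usual fix of sequencing the read into a local first.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Takes ownership of `items`; `expected` was computed by the caller.
// Returns true if the caller's count matches what was actually received.
bool consume(std::vector<int> items, std::size_t expected) {
    return items.size() == expected;
}

// Unsafe (do not write this): parameter initialization order is
// unspecified, so `items.size()` may be evaluated after the first
// parameter has already been move-constructed from `items`, yielding 0:
//     consume(std::move(items), items.size());

// Safe: read what you need into a local before moving.
bool consume_safely(std::vector<int> items) {
    const std::size_t n = items.size();
    return consume(std::move(items), n);
}
```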
Paweł Dziepak	060e3f8ac2	mutation_partition: verify row::append_cell() precondition row::append_cell() has a precondition that the new cell column id needs to be larger than that of any other already existing cell. If this precondition is violated the row will end up in an invalid state. This patch adds an assertion to make sure we fail early in such cases.	2019-07-15 23:25:06 +02:00
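A minimal sketch of such a precondition check, using a hypothetical simplified `row` that stores cells sorted by column id; Scylla's real `row` has a different, more compact representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using column_id = std::uint32_t;

// Hypothetical, simplified row: cells are kept sorted by column id, which
// makes appending O(1) and lets lookups binary-search.
class row {
    std::vector<std::pair<column_id, std::string>> _cells;
public:
    // Precondition: `id` is larger than every column id already present.
    void append_cell(column_id id, std::string value) {
        // Fail early instead of silently producing an unsorted row.
        assert(_cells.empty() || _cells.back().first < id);
        _cells.emplace_back(id, std::move(value));
    }
    std::size_t size() const { return _cells.size(); }
};
```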
Botond Dénes	5f22771ea8	tests/mutation_reader_test: stabilize test_multishard_combining_reader_non_strictly_monotonic_positions Currently test_multishard_combining_reader_non_strictly_monotonic_positions is flaky. The test is somewhat unconventional, in that it doesn't use the same instance of data as the input to the test and as its expected output; instead it invokes the method which generates this data (`make_fragments_with_non_monotonic_positions()`) twice, first to generate the input, and secondly to generate the expected output. This means that the test is prone to any deviation in the data generated by said method. One such deviation, discovered recently, is that the method doesn't explicitly specify the deletion time of the generated range tombstones. This results in this deletion time sometimes differing between the test input and the expected output. Solve by explicitly passing the same deletion time to all created range tombstones. Refs: #4695	2019-07-15 23:24:16 +02:00
Tomasz Grabiec	14700c2ac4	Merge "Fix the system.size_estimates table" from Kamil Fixes a segfault when querying for an empty keyspace. Also, fixes an infinite loop on smp > 1. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. Fixes #4689.	2019-07-15 22:09:30 +02:00
Asias He	8774adb9d0	repair: Avoid deadlock in remove_repair_meta Start n1 and n2, create a ks with rf = 2, run repair on n2, and stop n2 in the middle of the repair. n1 will notice n2 is DOWN, and the gossip handler will remove the repair instance with n2, which calls remove_repair_meta(). Inside remove_repair_meta(), we have: ``` 1 return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) { 2 return rm->stop(); 3 }).then([repair_metas, from] { 4 rlogger.debug("Removed all repair_meta for single node {}", from); 5 }); ``` Since 3.1, we start 16 repair instances in parallel, which will create 16 readers. The reader semaphore count is 10. At line 2, it calls ``` 6 future<> stop() { 7 auto gate_future = _gate.close(); 8 auto writer_future = _repair_writer.wait_for_writer_done(); 9 return when_all_succeed(std::move(gate_future), std::move(writer_future)); 10 } ``` The gate protects the reader while it reads data from disk: ``` 11 with_gate(_gate, [] { 12 read_rows_from_disk 13 return _repair_reader.read_mutation_fragment() --> calls reader() to read data 14 }) ``` So line 7 won't return until all 16 readers return from their call to reader(). The problem is that a reader won't release the reader semaphore until the reader is destroyed! So, even if 10 out of the 16 readers have finished reading, they won't release the semaphore, and the remaining 6 can never be admitted. As a result, stop() hangs forever. As a short-term fix, we delete the reader, i.e., drop the repair_meta object, once it is stopped. Refs: #4693	2019-07-15 21:51:57 +02:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Botond Dénes	135c84c29a	tests: add unit tests for the data stream split in compaction	2019-07-15 17:38:00 +03:00
Botond Dénes	719ad51bea	random_schema: generate_random_mutations(): ensure partitions are unique Duplicate partitions can appear as a result of the same partition key being generated more than once. For now we simply remove any duplicates. This means that in some circumstances fewer partitions will be generated than asked for.	2019-07-15 17:38:00 +03:00
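One way to drop duplicates while keeping the first generated partition for each key is a stable sort followed by `std::unique`. This sketch assumes a hypothetical `partition` struct with a string key; real partitions are ordered by decorated key (token first), not by plain string comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated partition: only the key matters here.
struct partition {
    std::string key;
    int payload;
};

// Drops partitions whose key was generated more than once, keeping the
// first occurrence. The result may hold fewer partitions than requested.
void remove_duplicate_partitions(std::vector<partition>& parts) {
    // stable_sort keeps equal keys in generation order...
    std::stable_sort(parts.begin(), parts.end(),
        [] (const partition& a, const partition& b) { return a.key < b.key; });
    // ...and unique keeps the first element of each run of equal keys.
    auto last = std::unique(parts.begin(), parts.end(),
        [] (const partition& a, const partition& b) { return a.key == b.key; });
    parts.erase(last, parts.end());
}
```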
Botond Dénes	eaedbed069	tests/random: add get_bool() overload with random engine param	2019-07-15 17:38:00 +03:00
Botond Dénes	057f9aa655	random_schema: add ttl and expiry support When generating data, the user can now also generate ttls and expiry for all generated atoms. This happens in a controlled way, via a generator functor, very similar to how the timestamps are generated. This functor is also used by `random_schema` to generate `deletion_time` for all tombstones, so the user now has full control of when all of the atoms can be GC'd.	2019-07-15 17:38:00 +03:00
Botond Dénes	76a853e345	tests/random_schema: generate_random_mutations(): generate partition tombstone	2019-07-15 17:38:00 +03:00
Botond Dénes	4d9f3e5705	data_model: extend ttl and expiry support	2019-07-15 17:38:00 +03:00
Botond Dénes	96d3c1efb1	random_schema: generate_random_mutations(): restore indentation	2019-07-15 17:38:00 +03:00
Botond Dénes	b26fe76fc1	tests: random_schema: futurize generate_random_mutations() To avoid reactor stalls when generating many and/or large partitions.	2019-07-15 17:38:00 +03:00
Botond Dénes	cf135c6257	tests/random_schema: generate_random_mutations(): allow customizing generated data Allow callers to specify the number of partitions generated, as well as the number of clustering rows and range tombstones generated per partition.	2019-07-15 17:38:00 +03:00
Botond Dénes	d2930ffa53	tests/random_schema: add assert to make_clustering_key() Verify that the schema does indeed have clustering columns. Better an assert than a cryptic "division by 0" exception deeper in the call stack.	2019-07-15 17:38:00 +03:00
Botond Dénes	d90ac6bd7b	tests/random_schema: generate_random_mutations(): remove `engine` parameter Use an internally created instance of random engine. Passing a readily seeded engine from the outside is pointless now that we have a mechanism to seed entire test suites with a command line argument: the internal engine is seeded from tests::random, so the seed of the test suite determines the internal seed as well. Update the sole user of this method (mutation_writer_test.cc) to not generate local seeds anymore.	2019-07-15 17:38:00 +03:00
Botond Dénes	fd2f53f292	tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations() We plan on allowing other tests to use this method. The first step is to make it available in a header.	2019-07-15 17:38:00 +03:00
Botond Dénes	7a4a609e88	Introduce Garbage Collected Consumer to Mutation Compactor Introduce a consumer in the mutation compactor that will only consume data that is purged away from the regular consumer. The goal is to allow the compaction implementation to do whatever it wants with the garbage collected data, such as saving it to prevent data resurrection from ever happening, as described in issue #4531. noop_compacted_fragments_consumer is made available for users that don't need this capability. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 17:38:00 +03:00
Botond Dénes	4c2781edaa	row_marker: add garbage_collector The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed the row_marker when it has expired and would be discarded. The collector param is optional and defaults to nullptr.	2019-07-15 17:38:00 +03:00
Botond Dénes	7db2006162	row_marker: de-inline compact_and_expire()	2019-07-15 17:38:00 +03:00
Botond Dénes	4c7a7ffe8f	row: add garbage_collector The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed all atoms that are expired and would be discarded. The body of `compact_and_expire()` was changed so that it checks cells' tombstone coverage before it checks their expiry, so that cells that are both covered by a tombstone and also expired are not passed to the collector. The collector is forwarded to `collection_type_impl::mutation::compact_and_expire()` as well. The collector param is optional and defaults to nullptr.	2019-07-15 17:38:00 +03:00
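The ordering rule above (tombstone coverage first, expiry second) can be sketched as follows. All types here (`cell`, `garbage_collector`, `counting_collector`) are hypothetical simplifications in the spirit of `compaction_garbage_collector`; the real interface and atom types differ.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using column_id = std::uint32_t;
using timestamp_t = std::int64_t;

// Hypothetical, simplified atom: a write timestamp plus an expiry time
// (0 means the cell never expires). Real cells carry much more state.
struct cell {
    timestamp_t ts;
    std::int64_t expiry;
};

// Interface in the spirit of compaction_garbage_collector: it receives the
// atoms that compaction is about to drop because they expired.
struct garbage_collector {
    virtual ~garbage_collector() = default;
    virtual void collect(column_id id, const cell& c) = 0;
};

// Example collector that just counts what it is given.
struct counting_collector : garbage_collector {
    int collected = 0;
    void collect(column_id, const cell&) override { ++collected; }
};

// Returns the live cells. Tombstone coverage is checked before expiry, so a
// cell that is both shadowed by the tombstone and expired is dropped without
// being passed to the collector. The collector is optional.
std::vector<std::pair<column_id, cell>> compact_and_expire(
        const std::vector<std::pair<column_id, cell>>& cells,
        timestamp_t tombstone_ts, std::int64_t now,
        garbage_collector* collector = nullptr) {
    std::vector<std::pair<column_id, cell>> live;
    for (const auto& [id, c] : cells) {
        if (c.ts <= tombstone_ts) {
            continue;  // covered by the tombstone: not collected
        }
        if (c.expiry != 0 && c.expiry <= now) {
            if (collector) {
                collector->collect(id, c);  // expired: hand it over
            }
            continue;
        }
        live.emplace_back(id, c);
    }
    return live;
}
```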
Botond Dénes	307b48794d	collection_type_impl::mutation: compact_and_expire() add collector parameter The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed all atoms that are expired and would be discarded. The body of `compact_and_expire()` was changed so that it checks cells' tombstone coverage before it checks their expiry, so that cells that are both covered by a tombstone and also expired are not passed to the collector. The collector param is optional and defaults to nullptr. To accommodate the collector, which needs to know the column id, a new `column_id` parameter was added as well.	2019-07-15 17:37:55 +03:00
Calle Wilund	1ed9a44396	utils::config_file: Propagate broadcast_to_all_shards to dependent files Fixes #4713 Modifying config files to use sharded storage misses the fact that extensions are allowed to add non-member config fields to the main configuration, typically from "extra" config_file objects. Unless those "extra" files are broadcast when the main file is broadcast, the values will not be readable from other shards. This patch propagates the broadcast to all other config files whose entries are in the top level object. This ensures we always keep data up to date on config reload. Message-Id: <20190715135851.19948-1-calle@scylladb.com>	2019-07-15 17:02:09 +03:00
Nadav Har'El	9cc9facbea	configure.py: atomically overwrite build.ninja configure.py currently takes some time to write build.ninja. If the user interrupts (e.g., control-C) configure.py, it can leave behind a partial or even empty build.ninja file. This is most frustrating when the user didn't explicitly run "configure.py", but rather just ran "ninja" and ninja decided to run configure.py, and after interrupting it the user cannot run "ninja" again because build.ninja is gone. Another result of losing build.ninja is that the user now needs to remember which parameters to pass to "configure.py", because the old ones stored in build.ninja were lost. The solution in this patch is simple: We write the new build.ninja contents into a temporary file, not directly into build.ninja. Then, only when the entire file has been successfully written, do we rename the temporary file to its intended name - build.ninja. Fixes #4706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190715122129.16033-1-nyh@scylladb.com>	2019-07-15 15:34:48 +03:00
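The patch itself changes configure.py, which is Python; the same write-to-temporary-then-rename idea looks like this in C++ with `std::filesystem`. The `.new` suffix and the `atomic_write()` name are assumptions for illustration, and the sketch does not `fsync`, so it protects against interruption (as in the commit) but not against power loss.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Writes `contents` to `target` so that readers see either the old file or
// the complete new one, never a truncated file: the data goes to a temporary
// file in the same directory, which is then renamed over the target.
void atomic_write(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += ".new";  // hypothetical temporary-file naming scheme
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
        out.flush();
        if (!out) {
            throw std::runtime_error("failed to write " + tmp.string());
        }
    }  // the temporary file is closed here
    // On POSIX, rename() atomically replaces an existing target on the
    // same file system.
    fs::rename(tmp, target);
}
```

Keeping the temporary file in the same directory matters: a rename across file systems is not atomic and may fail outright.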
Botond Dénes	5002ebb73f	Introduce compaction_garbage_collector interface This interface can be used to implement a garbage collector that collects atoms that are purged due to expiry during compaction. The intended usage is collecting purged atoms for safekeeping until the compaction process finishes safely, to be dropped only at the end when the compaction is known to have finished successfully.	2019-07-15 15:30:43 +03:00
Eliran Sinvani	997a146c7f	auth: Prevent race between role_manager and password_authenticator When scylla is started for the first time with PasswordAuthenticator enabled, a record of the default superuser can be created in the table with can_login and is_superuser set to null. It happens because the module in charge of creating the row is the role manager and the module in charge of setting the default password salted hash value is the password authenticator. Those two modules are started together; if the password authenticator finishes its initialization first, then until the role manager completes its own initialization the row contains those null columns, and any login attempt in this period will cause a memory access violation since those columns are not expected to ever be null. This patch removes the race by starting the password authenticator and authorizer only after the role manager has finished its initialization. Tests: 1. Unit tests (release) 2. Auth and cqlsh auth related dtests. Fixes #4226 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>	2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola	67c624d967	Add documentation for large_rows and large_cells Fixes #4552 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190614151907.20292-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Amnon Heiman	1c6dec139f	API: compaction_manager add get pending tasks by table The pending tasks by table name API returns an array of pending tasks by keyspace/table names. After this patch the following command would work: curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table' Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-12 19:21:26 +03:00
Takuya ASADA	842f75d066	reloc: provide libthread_db.so.1 to debug thread on gdb In the scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug but we do not actually have libthread_db.so.1 in /opt/scylladb/libreloc since it does not appear in the ldd output for the scylla binary. To debug threads, we need to add the library to the relocatable package manually. Fixes #4673 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190711111058.7454-1-syuu@scylladb.com>	2019-07-12 19:21:26 +03:00
Piotr Sarna	ac7531d8d9	db,hints: decouple in-flight hints limits from resource manager The resource manager is used to manage common resources between various hints managers. In-flight hints used to be one of the shared resources, but this turned out to cause starvation when one manager consumes the whole limit - which may be especially painful if the background materialized views hints manager starves the regular hints manager, which can in turn start failing user writes because of admission control. This patch makes the limit per-manager again, which effectively reverts the limit to its original behavior. Fixes #4483 Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>	2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola	4e7ffb80c0	cql: Fix use of UDT in reversed columns We were missing calls to underlying_type in a few locations and so the insert would think the given literal was invalid and the select would refuse to fetch a UDT field. Fixes #4672 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190708200516.59841-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Kamil Braun	60a4867a5b	Fix infinite looping when performing a range query on system.size_estimates. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. This commit fixes the issue and closes #4689.	2019-07-12 18:09:15 +02:00
Kamil Braun	ba5a02169e	Fix segmentation fault when querying system.size_estimates for an empty keyspace.	2019-07-12 18:02:10 +02:00
Kamil Braun	a1665b74a9	Refactor size_estimates_virtual_reader Move the implementation of size_estimates_mutation_reader to a separate compilation unit to speed up compilation times and increase readability. Refactor tests to use seastar::thread.	2019-07-12 17:53:00 +02:00
Benny Halevy	6dad9baa1c	table: disable_sstable_write: acquire _sstable_deletion_sem `disable_sstable_write` needs to acquire `_sstable_deletion_sem` to properly synchronize with background deletions done by `on_compaction_completion` to ensure no sstables will be created or deleted during `reshuffle_sstables` after `storage_service::load_new_sstables` disables sstable writes. Fixes #4622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	c6bad3f3c2	table: reshuffle_sstables: add log message To mark the point in time writes are disabled and scanning of the data directory is beginning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Asias He	aa8d7af4f0	repair: Enable rpc stream in row level repair Add the row_level_diff_detect_algorithm::send_full_set_rpc_stream as supported algo. If both repair master and followers support it, the master will use the rpc stream interface, otherwise use the old rpc verb interface.	2019-07-11 08:59:48 +08:00
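A minimal sketch of the negotiation rule, assuming each participant advertises a list of supported algorithms. Only `send_full_set_rpc_stream` is named in the commit; `send_full_set`, `supports()` and `choose_algorithm()` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Only send_full_set_rpc_stream comes from the commit; send_full_set is a
// hypothetical name for the older RPC-verb-based algorithm.
enum class row_level_diff_detect_algorithm {
    send_full_set,
    send_full_set_rpc_stream,
};

// The algorithms one participant advertises.
using supported_set = std::vector<row_level_diff_detect_algorithm>;

bool supports(const supported_set& s, row_level_diff_detect_algorithm a) {
    return std::find(s.begin(), s.end(), a) != s.end();
}

// Use the RPC stream interface only if the master and every follower
// support it; otherwise fall back to the old RPC verb interface.
row_level_diff_detect_algorithm choose_algorithm(
        const supported_set& master, const std::vector<supported_set>& followers) {
    using algo = row_level_diff_detect_algorithm;
    const bool all_stream = supports(master, algo::send_full_set_rpc_stream) &&
        std::all_of(followers.begin(), followers.end(), [] (const supported_set& f) {
            return supports(f, algo::send_full_set_rpc_stream);
        });
    return all_stream ? algo::send_full_set_rpc_stream : algo::send_full_set;
}
```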
Asias He	38b72b398b	repair: Wrap with foreign_ptr to avoid cross cpu free The moved set_diff and rows will be freed on the target cpu instead of the source cpu, which will cause a lot of cross-cpu frees. To fix, wrap them in foreign_ptr.	2019-07-11 08:59:48 +08:00
Asias He	06c84be257	repair: Futurize get_repair_rows_size and row_buf_size To prevent stalls when the number of rows inside the row buf is large.	2019-07-11 08:36:39 +08:00
Asias He	809c992b30	repair: Avoid calling get_repair_rows_size in get_sync_boundary Instead of calling get_repair_rows_size(), which might stall with a large number of rows, return the size of the rows from read_rows_from_disk.	2019-07-11 08:36:39 +08:00
Asias He	4d41f8e57e	repair: Futurize row_buf_csum To prevent stalls when the number of rows inside the row buf is large.	2019-07-11 08:36:39 +08:00
Asias He	0ef167c9c8	repair: Yield inside get_set_diff get_set_diff always runs inside a thread, so we can call thread::maybe_yield() to avoid stalls.	2019-07-11 08:36:39 +08:00
Asias He	f871d9edd4	repair: Use get_repair_rows_size helper in get_sync_boundary We have a helper get_repair_rows_size to get the row size in the list.	2019-07-11 08:36:39 +08:00
Asias He	ccbc9fb0ca	repair: Avoid stall in do_estimate_partitions_on_local_shard Do not use boost::accumulate, which does not yield. Use do_for_each over the sstables to avoid stalls.	2019-07-11 08:36:39 +08:00
Asias He	b7b5cb33e8	remove get_row_diff	2019-07-11 08:36:39 +08:00
Rafael Ávila de Espíndola	281f3a69f8	mc writer: Fix exception safety when closing _index_writer This fixes a possible cause of #4614. From the backtrace in that issue, it looks like a file is being closed twice. The first point in the backtrace where that seems likely is in the MC writer. My first idea was to add a writer::close and make it the responsibility of the code using the writer to call it. That way we would move work out of the destructor. That is a bit hard since the writer is destroyed from flat_mutation_reader::impl::~consumer_adapter and that would need to get a close function too. This patch instead just fixes an exception safety issue. If _index_writer->close() throws, _index_writer is still valid and ~writer will try to close it again. If the exception was thrown after _completed.set_value(), that would explain the assert about _completed.set_value() being called twice. With this patch the path outside of the destructor now moves the writer to a local variable before trying to close it. Fixes #4614 Message-Id: <20190710171747.27337-1-espindola@scylladb.com>	2019-07-10 19:27:19 +02:00
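The move-to-local pattern can be shown in isolation. `index_writer`, `writer` and `consume_end_of_stream()` below are hypothetical stand-ins for the MC writer's members, with a `close()` that may throw. After the move, the destructor sees an empty pointer and does not close the file a second time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical index writer whose close() may throw; it counts how many
// times close() was attempted.
struct index_writer {
    int* close_calls;
    bool fail;
    void close() {
        ++*close_calls;
        if (fail) {
            throw std::runtime_error("close failed");
        }
    }
};

class writer {
    std::unique_ptr<index_writer> _index_writer;
public:
    explicit writer(std::unique_ptr<index_writer> w) : _index_writer(std::move(w)) {}

    // Move the member into a local before closing: if close() throws,
    // _index_writer is already empty, so ~writer() will not close it again.
    void consume_end_of_stream() {
        auto w = std::move(_index_writer);
        w->close();
    }

    ~writer() {
        if (_index_writer) {
            try {
                _index_writer->close();
            } catch (...) {
                // Destructors must not let exceptions escape.
            }
        }
    }
};
```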
Paweł Dziepak	eb7d17e5c5	lsa: make sure align_up_for_asan() doesn't cause reads past end of segment In debug mode the LSA needs objects to be 8-byte aligned in order to maximise coverage from the AddressSanitizer. Usually `close_active()` creates a dummy object that covers the end of the segment being closed. However, if the last real object ends in the last eight bytes of the segment, then that dummy won't be created because of the alignment requirements. This broke exit conditions on loops trying to read all objects in the segment and caused them to attempt to dereference the address at the end of the segment. This patch fixes that. Fixes #4653.	2019-07-10 19:19:24 +02:00
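A hedged sketch of the alignment arithmetic and the loop condition it affects. `align_up()` is the usual power-of-two rounding; `count_objects()` is a hypothetical segment walk that takes object sizes from an array instead of reading real LSA object headers. After aligning, the next offset can land exactly at the end of the segment, so the bound must be re-checked before anything is read there.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical segment walk: objects are laid out back to back, each
// starting at an 8-byte aligned offset. If the last object ends within the
// final 8 bytes, aligning the next offset lands exactly at (or past) the
// segment end, so the loop re-checks the bound instead of reading there.
std::size_t count_objects(const std::size_t* sizes, std::size_t n,
                          std::size_t segment_size) {
    std::size_t offset = 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offset = align_up(offset, 8);
        if (offset >= segment_size) {
            break;  // nothing more can start inside the segment
        }
        offset += sizes[i];
        ++count;
    }
    return count;
}
```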
Avi Kivity	e32bdb6b90	Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil " If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481 and #4651. " * 'multidc' of https://github.com/kbr-/scylla: Warn user about using SimpleStrategy with Multi DC deployment Add warning support to the CQL binary protocol implementation	2019-07-10 16:47:07 +03:00
Avi Kivity	138b28ae43	Merge "Fix command line parsing and add logging." from Kamil " Fixes #4203 and #4141. " * 'cmdline' of https://github.com/kbr-/scylla: Add logging of parsed command line options Fix command line argument parsing in main.	2019-07-10 16:40:57 +03:00
Avi Kivity	405fd517b0	Merge "IPv6 support" from Calle " Fixes #2027 Modifies inet address type in scylla to use seastar::net::inet_address, and removes explicit use of ipv4_addr in various network code in favour of socket_address. Thus capable of resolving and binding to ipv6. Adds config option to enable/disable ipv6 (default enabled), so upgrading cluster can continue to work while running mixed version nodes (since gossip message address serialization becomes different). " * 'calle/ipv6' of https://github.com/elcallio/scylla: test-serialization: Add small roundtrip test for inet address (v4 + v6) inet_address/init: Make ipv6 default enabled db::config: Add enable ipv6 switch (default off) gms::inet_address: Make serialization ipv6 aware Remove usage of inet_address::raw_addr() Replace use of "ipv4_addr" with socket_address inet_address: Add optional family to lookup gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address types: Add ipv6_address support	2019-07-10 15:07:56 +03:00
Benny Halevy	b4dc118639	tests: logalloc_test: scale down test_region_groups Post commit `b3adabda2d` (Reduce logalloc differences between debug and release) logalloc_test's memory footprint has grown, in particular in test_region_groups, and it triggers the oom killer on our test automation machines. This patch scales down this test case so it requires less memory. Fixes #4669 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-10 12:06:10 +02:00
Pekka Enberg	bb53c109b4	test.py: Add option for repeating test execution This adds a '--repeat N' command line option to test.py, which can be used to execute the tests N times. This is useful for finding flakey tests, for example. Message-Id: <20190710092115.15960-1-penberg@scylladb.com>	2019-07-10 12:42:39 +03:00
Botond Dénes	ce647fac9f	timestamp_based_splitting_writer: fix the handling of partition tombstone Currently the handling of partition tombstones is broken in multiple ways: * The partition-tombstone is lost when the bucket is calculated for its timestamp (due to a misplaced `std::exchange()`). * When the `partition_start` fragment (containing the partition tombstone) is actually written to the bucket we emit another `partition_start` fragment before it because the bucket has not seen that partition before and we fail to notice that we are actually writing the partition header. This bug was allowed to fly under the radar because the unit test was accidentally not creating partition tombstones in the generated data (due to a mistake). It was discovered while working on unit tests for another test and fixing the data generation function to actually generate partition tombstones. This patch fixes both problems in the handling of partition tombstones but it doesn't yet fixes the test. That is deferred until the patch series which uncovered this bug is merged to avoid merge conflicts. The other series mentioned here is: [PATCH v6 00/15] compaction: allow collecting purged data Fixes: #4683 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>	2019-07-10 12:36:57 +03:00
Pekka Enberg	e6cc90aa98	test: add 'eventually' block to index paging test (#4681) Without 'eventually', the test is flaky because the index may not yet be up to date when its conditions are checked. Fixes #4670 Tests: unit(dev)	2019-07-10 11:46:03 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Amnon Heiman	2fbc5ea852	config_file.hh: get_value return a pointer to the value The get_value method returns a pointer to the value that is used by the value_to_json method. The assumption is that the void pointer points to the actual value. Fixes #4678 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-10 10:40:35 +03:00
Piotr Sarna	ebbe038d19	test: add 'eventually' block to index paging test Without 'eventually', the test is flaky because the index may not yet be up to date when its conditions are checked. Fixes #4670	2019-07-09 17:07:16 +02:00
Asias He	39ca044dab	repair: Allow repair when a replica is down Since commit `bb56653` (repair: Sync schema from follower nodes before repair), the handling of down nodes during repair has changed: if a repair follower is down, syncing the schema with it fails and the repair of the range is skipped. This means a range cannot be repaired unless all the replica nodes are up. To fix this, we filter out the nodes that are down, mark the repair as partial, and repair with the nodes that are still up. Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test Fixes: #4616 Backports: 3.1 Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>	2019-07-09 10:07:36 +03:00
Konstantin Osipov	56f3bda4c7	metrics: introduce a metric for non-local reads A read which arrived at a non-replica and had to be forwarded to a replica by the coordinator is accounted in its own metric, reads_coordinator_outside_replica_set. Most often such a read is produced by a driver which is unaware of the token distribution on the ring. If a read was forwarded to another replica due to heat-weighted load balancing or a query preference set by the user, it is not accounted in the metric. In the case of a multi-partition read (a query using an IN statement, e.g. x in (1, 2, 3)), if any of the keys is read from a non-local node, the read is accounted as non-local. The rationale is that if users are careful to send IN queries only to the same vnode, they are rewarded with the counter staying at zero, while if they send multi-partition IN queries without any precautions, they will see the metric go up, which gives them a starting point for investigating performance problems. Closes #4338	2019-07-08 19:23:38 +03:00
Calle Wilund	5dfc356380	test-serialization: Add small roundtrip test for inet address (v4 + v6) Verify we get back what we put in.	2019-07-08 15:28:21 +00:00
Konstantin Osipov	da1d1b74da	metrics: account writes forwarded by a coordinator in their own metric. Add a metric to account writes which arrived at a non-replica and had to be forwarded by a coordinator to a replica. The name of the added metric is 'writes_coordinator_outside_replica_set'. Do not account forwarded read repair writes, since they are already accounted by the reads_coordinator_outside_replica_set metric, added in a subsequent patch. In scope of #4338.	2019-07-08 18:17:48 +03:00
Calle Wilund	3cfb79e0ff	inet_address/init: Make ipv6 default enabled Makes lookup find any address (including numeric ipv6 addresses). Init will look at enable_ipv6 and use an explicit ipv4 family lookup if it is not enabled.	2019-07-08 14:13:10 +00:00
Calle Wilund	1f5e1d22bf	db::config: Add enable ipv6 switch (default off) Off by default to prevent problems during cluster migration when needing to gossip with non-ipv6 aware nodes.	2019-07-08 14:13:09 +00:00
Calle Wilund	c540e36fe2	gms::inet_address: Make serialization ipv6 aware Because inet_address was initially hardcoded to ipv4, its wire format is not very forward compatible. Since we potentially need to communicate with nodes running older versions, we manually define the new serial format for inet_address to be: ipv4: the 4 address bytes; ipv6: a 4-byte marker 0xffffffff (an invalid address) followed by the 16 address bytes.	2019-07-08 14:13:09 +00:00
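The serial format described in the commit above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from the commit: the `ipv4_bytes`, `ipv6_bytes` and `address` types and the `serialize`/`deserialize` helpers are hypothetical (the real gms::inet_address wraps seastar::net::inet_address), and the address bytes are assumed to be written exactly as stored.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the real address types.
using ipv4_bytes = std::array<uint8_t, 4>;
using ipv6_bytes = std::array<uint8_t, 16>;
using address = std::variant<ipv4_bytes, ipv6_bytes>;

// Each byte of the 4-byte ipv6 marker (0xffffffff).
constexpr uint8_t ipv6_marker_byte = 0xff;

std::vector<uint8_t> serialize(const address& a) {
    std::vector<uint8_t> out;
    if (auto v4 = std::get_if<ipv4_bytes>(&a)) {
        // ipv4: just the 4 address bytes, same layout as the legacy format.
        out.assign(v4->begin(), v4->end());
    } else {
        // ipv6: 4-byte marker, then the 16 address bytes.
        const auto& v6 = std::get<ipv6_bytes>(a);
        out.assign(4, ipv6_marker_byte);
        out.insert(out.end(), v6.begin(), v6.end());
    }
    return out;
}

address deserialize(const std::vector<uint8_t>& in) {
    if (in.size() < 4) {
        throw std::runtime_error("truncated inet_address");
    }
    bool marker = in[0] == 0xff && in[1] == 0xff && in[2] == 0xff && in[3] == 0xff;
    if (!marker) {
        ipv4_bytes v4;
        std::copy(in.begin(), in.begin() + 4, v4.begin());
        return v4;
    }
    if (in.size() < 20) {
        throw std::runtime_error("truncated ipv6 inet_address");
    }
    ipv6_bytes v6;
    std::copy(in.begin() + 4, in.begin() + 20, v6.begin());
    return v6;
}
```

Because the decoder keys off the 0xffffffff marker, this layout only works if 255.255.255.255 is never a legitimate node address, which is what the commit means by calling the marker an invalid address.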
Calle Wilund	e9816efe06	Remove usage of inet_address::raw_addr()	2019-07-08 14:13:09 +00:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
Calle Wilund	5ba545f493	inet_address: Add optional family to lookup	2019-07-08 14:13:09 +00:00
Calle Wilund	5fd811ec8a	gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address This way we handle all address types net::inet_address can handle, i.e. ipv6 as well.	2019-07-08 14:13:09 +00:00
Calle Wilund	482fd72ca2	types: Add ipv6_address support As with ipv4, just redirect to inet_address.	2019-07-08 14:09:25 +00:00
Asias He	b7abaa04da	repair: Futurize get_row_diff to avoid stall The copy of _working_row_buf and boost::copy_range can stall if the number of rows is large. Futurize get_row_diff to avoid the stall.	2019-07-08 15:22:16 +08:00
Asias He	a4b24e44a3	repair: Fix possible stall in request_row_hashes std::find_if and std::copy can stall if the number of rows is large. Introduce a helper, move_row_buf_to_working_row_buf, that moves the rows and yields to avoid stalls.	2019-07-08 15:22:16 +08:00
Asias He	b48dc42e73	repair: Allow default construct for repair_row All members of repair_row are now optional. Enable the default constructor so that _row_buf.resize() can work.	2019-07-08 15:22:16 +08:00
Asias He	18fb0714a0	repair: Remove apply_rows It is not used any more. The user now calls apply_rows_on_master_in_thread and apply_rows_on_follower instead.	2019-07-08 15:22:16 +08:00
Asias He	882530ce26	repair: Run get_row_diff_with_rpc_stream in a thread This lets get_row_diff_source_op run inside a thread, so it can now call apply_rows_on_master_in_thread, which eliminates the stall.	2019-07-08 15:22:16 +08:00
Asias He	948b833d74	repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread So it can use apply_rows_on_master_in_thread, which eliminates the stall.	2019-07-08 15:22:16 +08:00
Asias He	7f29d13984	repair: Run get_row_diff inside a thread So it can use apply_rows_on_master_in_thread, which eliminates the stall.	2019-07-08 15:22:16 +08:00
Asias He	6b2e3946fb	repair: Add apply_rows_on_master_in_thread Like apply_rows, except it runs inside a thread and runs on master node only.	2019-07-08 15:22:16 +08:00
Asias He	7c6a29027f	repair: Add apply_rows_on_follower Add a version for apply_rows on follower node only.	2019-07-08 15:22:16 +08:00
Asias He	cc14c6e0c4	repair: Futurize working_row_hashes To avoid stalls when the number of rows is large.	2019-07-08 15:22:16 +08:00
Asias He	f3d2ba6ec7	repair: Remove get_full_row_hashes helper It is a thin wrapper around working_row_hashes and is used only once. Remove it.	2019-07-08 15:22:16 +08:00
Benny Halevy	a0499bbd31	lister::guarantee_type: do not follow symlink Similar to commit `9785754e0d`, lister::guarantee_type needs to check the type of the entry itself, not of the file the symlink may point to. Fixes #4606 The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken symlink, and following it fails, as it should, with the default follow_symlink::yes. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>	2019-07-07 15:29:28 +03:00
Avi Kivity	63edd46562	Merge "Expand big decimal with arithmetic operators" from Piotr " This miniseries expands big_decimal interface with convenience operators (-=, +, -), provides test cases for it and makes one of the constructors explicit. Tests: unit(dev) " * 'expand_big_decimal_interface' of https://github.com/psarna/scylla: utils: make string-based big decimal constructor explicit tests: add more operators to big decimal tests utils: add operators to big_decimal	2019-07-06 12:26:08 +03:00
Avi Kivity	24caf0824d	Merge "Complete the LIKE operator" from Dejan " Implement LIKE parsing, intermediate representation, and query processing. Add tests for this implementation (leaving the LIKE functionality tests in tests/like_matcher_test.cc). Refs #4477. " * 'finish-like' of https://github.com/dekimir/scylla: cql3: Add LIKE operator to CQL grammar cql3: Ensure LIKE filtering for partition columns cql3: Add LIKE restriction cql3: Add LIKE relation	2019-07-06 12:26:08 +03:00
kbr-	8995945052	Implement tuple_type_impl::to_string_impl. (#4645) Resolves #4633. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-06 12:26:08 +03:00
Avi Kivity	187859ad78	review-checklist: mention that the guidelines are not absolute rules and can be overridden	2019-07-06 12:26:08 +03:00
Kamil Braun	c0915c40eb	Warn user about using SimpleStrategy with Multi DC deployment If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-05 09:25:03 +02:00
Kamil Braun	35dbe9371c	Add warning support to the CQL binary protocol implementation The CQL binary protocol v4 adds support for server-side warnings: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec This adds a convenient API to add warnings to messages returned to the user. Resolves #4651. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-05 09:24:56 +02:00
Kamil Braun	2f0f53ac72	Add logging of parsed command line options The recognized command line options are now being printed when Scylla is run, together with the whole command used. Fixes #4203.	2019-07-05 09:00:28 +02:00
Piotr Sarna	eed2543bcc	utils: make string-based big decimal constructor explicit As a rule of thumb, single-parameter constructors should be explicit in order to avoid unexpected implicit conversions.	2019-07-04 11:33:00 +02:00
Piotr Sarna	7e722f8dd5	tests: add more operators to big decimal tests	2019-07-04 11:32:57 +02:00
Piotr Sarna	a5e41408ec	utils: add operators to big_decimal For convenience, operators -=, + and - are implemented on top of +=.	2019-07-04 11:32:53 +02:00
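The layering described above (the other operators defined in terms of `+=`) is the usual C++ operator idiom. A minimal sketch follows; `toy_decimal` is a hypothetical fixed-scale stand-in rather than big_decimal (which is arbitrary-precision), and it additionally assumes a unary minus so that `-=` can be expressed as `+=` of the negated operand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: value = unscaled_ / 100 (a fixed scale of 2 is assumed).
class toy_decimal {
    int64_t unscaled_;
public:
    // Single-argument constructor kept explicit to avoid implicit conversions.
    explicit toy_decimal(int64_t unscaled) : unscaled_(unscaled) {}
    int64_t unscaled() const { return unscaled_; }

    // The one primitive operation.
    toy_decimal& operator+=(const toy_decimal& other) {
        unscaled_ += other.unscaled_;
        return *this;
    }
    toy_decimal operator-() const { return toy_decimal(-unscaled_); }

    // Everything else is derived from +=.
    toy_decimal& operator-=(const toy_decimal& other) { return *this += -other; }
    friend toy_decimal operator+(toy_decimal lhs, const toy_decimal& rhs) {
        lhs += rhs;
        return lhs;
    }
    friend toy_decimal operator-(toy_decimal lhs, const toy_decimal& rhs) {
        lhs -= rhs;
        return lhs;
    }
};
```

Taking the left operand of the binary operators by value lets them reuse the compound operator on a copy, so only `+=` touches the representation.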
Dejan Mircevski	6727e8f073	cql3: Add LIKE operator to CQL grammar Extend the grammar with LIKE and add CQL query tests for it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 11:01:13 +02:00
Dejan Mircevski	1c583de8bb	cql3: Ensure LIKE filtering for partition columns Partition columns are implicitly filtered whenever possible, avoiding expensive post-processing. But there are exceptions, e.g., when the partition key is only partially restricted, or for CONTAINS expressions. Here we add LIKE to this list of exceptions. Also fix compute_bounds() to punt on LIKE restrictions, which cannot be translated into meaningful bounds. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:59:13 +02:00
Dejan Mircevski	63cec653e5	cql3: Add LIKE restriction This restriction leverages like_matcher to perform filtering. Make single_column_relation::new_LIKE_restriction() return this new restriction. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:58:56 +02:00
Dejan Mircevski	21d7722594	cql3: Add LIKE relation Add a new type of relation with operator LIKE. Handle it in relation::to_restriction by introducing a new virtual method for it. The temporary implementation of this method returns null; that will be replaced in a subsequent patch. Add abstract_type::is_string() to recognize string columns and disallow LIKE operator on non-string columns. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:54:30 +02:00
Kamil Braun	f155a2d334	Fix command line argument parsing in main. Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run. The first parse is there to check whether the "--version" flag is present; in that case the version is printed and the program exits. The second parse is correct; however, during the first parse most of the arguments were improperly treated as positional arguments (e.g., "--network host" would treat "host" as a positional argument). This happened because the arguments weren't yet known to the command line parser. This commit fixes the issue by moving the first parse until after the arguments are registered. Resolves #4141. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-03 14:11:34 +02:00
Avi Kivity	8a0c4d508a	Merge "Repair switch to rpc stream" from Asias " The put_row_diff, get_row_diff and get_full_row_hashes verbs are switched to use rpc stream instead of rpc verb. They are the verbs that could send big rpc messages. The rpc stream sink and source are created per repair follower for each of the above 3 verbs. The sink and source are shared by multiple requests during the entire repair operation for a given range, so there is no overhead of setting up an rpc stream for each request. The row buffer is now increased to 32MiB from 256KiB, giving better bandwidth on high-latency links. The downside of a bigger row buffer is a reduced likelihood that all the rows inside a row buffer are identical. This causes more full hashes to be exchanged. To address this issue, the plan is to add a better set reconciliation algorithm in addition to the current sending of full hashes. I compared rebuild using a regular stream plan with repair using rpc stream. With 2 nodes, 1 smp, 8M rows, deleting all data on one of the nodes before repair or rebuild: 1) repair using seastar rpc verb: time to complete 82.17s; 2) rebuild using regular streaming, which uses seastar rpc stream: time to complete 63.87s; 3) repair using seastar rpc stream: time to complete 68.48s. For 1) and 3), the improvement is 16.6% (repair using rpc verb vs. repair using rpc stream). For 2) and 3), the difference is 7.2% (repair vs. stream). The result is promising for the future repair-based bootstrap/replace node operations. NOTE: We do not actually enable rpc stream in row level repair for now. We will enable it after we fix the stall issues caused by handling bigger row buffers. Fixes #4581 " * 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits) docs: Add RPC stream doc for row level repair repair: Mark some of the helper functions static repair: Increase max row buf size repair: Hook rpc stream version of verbs in row level repair repair: Add use_rpc_stream to repair_meta repair: Add is_rpc_stream_supported repair: Add needs_all_rows flag to put_row_diff repair: Optimize get_row_diff repair: Register repair_get_full_row_hashes_with_rpc_strea repair: Register repair_put_row_diff_with_rpc_stream repair: Register repair_get_row_diff_with_rpc_stream repair: Add repair_get_full_row_hashes_with_rpc_stream_handler repair: Add repair_put_row_diff_with_rpc_stream_handler repair: Add repair_get_row_diff_with_rpc_stream_handler repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op repair: Add repair_put_row_diff_with_rpc_stream_process_op repair: Add repair_get_row_diff_with_rpc_stream_process_op repair: Add put_row_diff_with_rpc_stream repair: Add put_row_diff_sink_op repair: Add put_row_diff_source_op ...	2019-07-03 10:08:55 +03:00
Asias He	f686f0b9d6	docs: Add RPC stream doc for row level repair This documents RPC stream usage in row level repair.	2019-07-03 08:09:57 +08:00
Asias He	78ae5af203	repair: Mark some of the helper functions static They are used only inside repair/row_level.cc. Make them static.	2019-07-03 08:09:57 +08:00
Asias He	e8c13444ba	repair: Increase max row buf size If the cluster supports row level repair with rpc stream interface, we can use bigger row buf size to have better repair bandwidth in high latency links.	2019-07-03 08:01:37 +08:00
Asias He	7d08a8d223	repair: Hook rpc stream version of verbs in row level repair If rpc stream is supported, use the rpc stream version of the get_row_diff, put_row_diff, get_full_row_hashes.	2019-07-03 08:01:37 +08:00
Asias He	fccaa0324f	repair: Add use_rpc_stream to repair_meta Determine if rpc stream should be used.	2019-07-03 08:01:37 +08:00
Asias He	7bf0c646be	repair: Add is_rpc_stream_supported Given a row_level_diff_detect_algorithm, return whether this algorithm supports the rpc stream interface.	2019-07-03 08:01:04 +08:00
Asias He	1c92643f02	repair: Add needs_all_rows flag to put_row_diff So we can avoid copying _working_row_buf in get_row_diff on the master node if there is only one follower node and all repair rows are needed by that follower.	2019-07-03 07:56:22 +08:00
Asias He	6595417567	repair: Optimize get_row_diff Move _working_row_buf instead of copying it if this is a follower node, or a master node with only one follower. In these cases _working_row_buf is not used after this function, so we can move it.	2019-07-03 07:56:22 +08:00
Asias He	c4eb0ee361	repair: Register repair_get_full_row_hashes_with_rpc_strea Register the get_full_row_hashes rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	b56cced5b8	repair: Register repair_put_row_diff_with_rpc_stream Register the put_row_diff rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	67130031b1	repair: Register repair_get_row_diff_with_rpc_stream Register the get_row_diff rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	f255f902bd	repair: Add repair_get_full_row_hashes_with_rpc_stream_handler It is the handler for the get_full_row_hashes rpc stream verb on the receiving side.	2019-07-03 07:56:17 +08:00
Asias He	e3267ad98c	repair: Add repair_put_row_diff_with_rpc_stream_handler It is the handler for the put_row_diff rpc stream verb on the receiving side.	2019-07-03 07:55:24 +08:00
Asias He	06ac014261	repair: Add repair_get_row_diff_with_rpc_stream_handler It is the handler for the get_row_diff rpc stream verb on the receiving side.	2019-07-03 07:54:43 +08:00
Asias He	5f25969da3	repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op It is the helper for the get_full_row_hashes rpc stream verb handler.	2019-07-03 07:54:03 +08:00
Asias He	39d5a9446e	repair: Add repair_put_row_diff_with_rpc_stream_process_op It is the helper for the put_row_diff rpc stream verb handler.	2019-07-03 07:53:21 +08:00
Asias He	049e793fe5	repair: Add repair_get_row_diff_with_rpc_stream_process_op It is the helper for the get_row_diff rpc stream verb handler.	2019-07-03 07:52:12 +08:00
Avi Kivity	fca1ae69ff	database: convert _cfg from a pointer to a reference _cfg cannot be null, so it can be converted to a reference to indicate this. Follow-up to `fe59997efe`.	2019-07-02 17:57:50 +02:00
Calle Wilund	f317d7a975	commitlog: Simplify commitlog extension iteration Fixes #4640 Iterating extensions in commitlog.cc should mimic that in sstables.cc, i.e. a simple future chain. It should also use the same order when opening for read and for write, as we should preserve the transformation stack order. Message-Id: <20190702150028.18042-1-calle@scylladb.com>	2019-07-02 18:37:44 +03:00
Takuya ASADA	332a6931c4	dist/redhat: fix install path of scripts After recent changes, install.sh mistakenly copies dist/common/scripts to /opt/scylladb/scripts/scripts; it should be /opt/scylladb/scripts. The same applies to /opt/scylladb/scyllatop. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190702120030.13729-1-syuu@scylladb.com>	2019-07-02 17:29:33 +03:00
Asias He	b1188f299e	repair: Add put_row_diff_with_rpc_stream It is the rpc stream version of put_row_diff. It uses rpc stream instead of rpc verb to put the repair rows to follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	31b30486a7	repair: Add put_row_diff_sink_op It is a helper that works on the sink() of the put_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	dbe035649b	repair: Add put_row_diff_source_op It is a helper that works on the source() of the put_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	72d3563da1	repair: Add get_row_diff_with_rpc_stream It is the rpc stream version of get_row_diff. It uses rpc stream instead of rpc verb to get the repair rows from follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	4cb44baa08	repair: Add get_row_diff_sink_op It is a helper that works on the sink() of the get_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	a1e19514f9	repair: Add get_row_diff_source_op It is a helper that works on the source() of the get_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	473bd7599c	repair: Add get_full_row_hashes_with_rpc_stream It is the rpc stream version of get_full_row_hashes. It uses rpc stream instead of rpc verb to get the repair hashes data from follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	1e2a598fe7	repair: Add get_full_row_hashes_sink_op It is a helper that works on the sink() of the get_full_row_hashes rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	149c54b000	repair: Add get_full_row_hashes_source_op It is a helper that works on the source() of the get_full_row_hashes rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	b3e7299032	repair: Add sink and source object into repair_meta They will soon be used to sync repair hashes and repair rows between master and follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	acd40fd529	repair: Add sink_source_for_put_row_diff Use sink_source_for_repair to define sink_source_for_put_row_diff with sink = repair_row_on_wire_with_cmd, source = repair_stream_cmd for REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	4405f7a6ff	repair: Add sink_source_for_get_row_diff Use sink_source_for_repair to define sink_source_for_get_row_diff with sink = repair_hash_with_cmd, source = repair_row_on_wire_with_cmd for REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	0bffd07e7e	repair: Add sink_source_for_get_full_row_hashes Use the sink_source_for_repair to define sink_source_for_get_full_row_hashes with sink = repair_stream_cmd, source = repair_hash_with_cmd for REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	8400dafa12	repair: Add sink_source_for_repair helper class It is used to store the sink and source objects for the rpc stream verbs used by row level repair.	2019-07-02 21:22:41 +08:00
Asias He	37b3de4ea0	messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	a7c7ba9765	messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	dc92bda93b	messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support	2019-07-02 21:18:55 +08:00
Asias He	f312c95b74	messaging_service: Add do_make_sink_source helper It is used by the row level repair rpc stream verbs to make sink and source object.	2019-07-02 21:18:55 +08:00
Asias He	bc295a00a6	messaging_service: Add rpc stream verb for row level repair - REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM: Get repair rows from follower nodes - REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM: Put repair rows to follower nodes - REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM: Get full hashes from follower nodes	2019-07-02 21:18:55 +08:00
Asias He	c93113f3a5	idl: Add repair_row_on_wire_with_cmd	2019-07-02 21:18:54 +08:00
Asias He	a90fb24efc	idl: Add repair_hash_with_cmd	2019-07-02 21:18:37 +08:00
Asias He	599d40fbe9	idl: Add repair_stream_cmd	2019-07-02 21:18:15 +08:00
Asias He	672c24f6b0	idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm	2019-07-02 21:17:36 +08:00
Avi Kivity	c987397e52	transport: reject initial frames with wild body sizes (#4620) If someone opens a connection to port 9042 and sends some random bytes, there is a 1 in 64 probability we'll recognize it as a valid frame (since we only check the version byte, allowing versions 1-4) and we'll try to read frame.length bytes for the body. If this value is very large, we'll run out of memory very quickly. Fix this by checking for reasonable body size (100kB). The initial message must be a STARTUP, whose body is a [string map] of options, of which just three are recognized. 100kB is plenty for future expansion. Note that this does not replace true security on listening ports and only serves to protect against mistakes, not attacks. An attacker can easily exhaust server memory by opening many connections and trickle-feeding them small amounts of data so they appear alive. We can't use the config item native_transport_max_frame_size_in_mb, because that can be legitimately large (and the default is atrocious, 256MB). Fixes #4366.	2019-07-01 19:02:34 +02:00
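A minimal sketch of the kind of sanity check described above, under assumptions: `plausible_initial_frame`, `read_be32` and `max_initial_body_size` are hypothetical names, the cap is taken as 100 * 1024 bytes because the commit only says "100kB", and only the version byte and the 4-byte big-endian body length from the frame header are considered. The `random_version_pass_rate` helper reproduces the 1-in-64 figure: 4 accepted version values out of 256 possible first bytes.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "100kB" interpreted as 100 * 1024 bytes.
constexpr uint32_t max_initial_body_size = 100 * 1024;

// Decode the 4-byte big-endian body length field of a frame header.
uint32_t read_be32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// A request carries protocol version 1-4 in the version byte (the high bit,
// which marks responses, must be clear). Reject the first frame on a
// connection if its declared body is implausibly large for a STARTUP message.
bool plausible_initial_frame(uint8_t version_byte, uint32_t body_length) {
    bool version_ok = version_byte >= 1 && version_byte <= 4;
    return version_ok && body_length <= max_initial_body_size;
}

// Fraction of uniformly random first bytes that pass the version check alone.
constexpr double random_version_pass_rate() {
    int accepted = 0;
    for (int b = 0; b < 256; ++b) {
        if (b >= 1 && b <= 4) {
            ++accepted;
        }
    }
    return accepted / 256.0;
}
```

The length check is what bounds the allocation: without it, a random 4-byte length field would be trusted as-is after passing the weak version check.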
Tomasz Grabiec	eb496b5eae	Merge "Allow changing configuration at runtime" from Avi This patchset allows changing the configuration at runtime. The user triggers this by editing the configuration file normally, then signalling the database with SIGHUP (as is traditional). The implementation is somewhat complicated due to the need to store non-atomic mutable state per shard and to synchronize the values across all shards. This is somewhat similar to Seastar's sharded<>, but that cannot be used since the configuration is read before Seastar is initialized (due to the need to read command-line options). Tests: unit (dev, debug), manual test with extra prints (dev) Ref #2689 Fixes #2517.	2019-07-01 15:04:59 +02:00
Avi Kivity	28a514820d	Update seastar submodule * seastar a5b9f77d52...44a300cd50 (1): > build: fix dpdk library link order Should fix the build with dpdk enabled.	2019-07-01 11:56:59 +03:00
Takuya ASADA	02c6db29c8	dist/debian: manage .pyc as a part of package Since `828b63f4fb` only added .pyc files to the .rpm package, we also need to add them to the .deb package. See #4612 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190629023739.8472-1-syuu@scylladb.com>	2019-06-30 15:54:42 +03:00
Avi Kivity	af2a3859f6	Update seastar submodule * seastar b629d5ef7a...a5b9f77d52 (6): > perftune.py: add comment explaining why we don't log errors when binding NVMe IRQs for all but i3.nonmetal machines > sharded: do a two phase shutdown for sharded services > chunked_fifo: add iterator > perftune.py: fix the i3 metal detection pattern > core/memory: remove translation api > reactor: file_type: offer option to not follow symbolic links	2019-06-30 11:32:21 +03:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	c98d1ea942	tests: cql_test_env: prepare config for updateable values Once we start using updateable_value<>, we must make it refer to the updateable_value_source<> on the same shard, and to do that we need to call broadcast_to_all_shards() first (this creates the per-shard copy).	2019-06-28 16:43:25 +03:00
Avi Kivity	8cffec37aa	main: re-read configuration file on SIGHUP Trap SIGHUP and signal a loop to re-read the configuration file.	2019-06-28 16:43:25 +03:00
Avi Kivity	2ee07bb09b	main: preserve config::client_encryption_options configuration source With dynamically updateable configuration, tracking the source of a value is more important, since we'll accept or reject updates depending on the source. Fix the source of client_encryption_options, which we RMW, by preserving the original source.	2019-06-28 16:43:25 +03:00
Avi Kivity	6061a833a3	config: make values updateable Replace the per-shard value we store with an updateable_value_source, which allows updating it dynamically and allows users to track changes. The broadcast_to_all_shards() function is augmented to apply modifications when called on a live system.	2019-06-28 16:43:25 +03:00
Avi Kivity	f7de01d082	config: store copies of config items per shard Since some of our values are not atomic (strings) and the administrative information needed to track references to values is also not atomic, we will need to store them per-shard. To do that we add a vector of per-shard data to config_file, where each element is itself a vector of configuration items. Since we need to operate generically on items (copying them from shard to shard) we store them in a type-erased form. Only mutable state is stored per-shard.	2019-06-28 16:43:25 +03:00
Avi Kivity	fb23cd1ff6	Introduce updateable_value The updateable_value and updateable_value_source classes allow broadcasting configuration changes across the application. The updateable_value_source class represents a value that can be updated, and updateable_value tracks its source and reflects changes. A typical use replaces "uint64_t config_item" with "updateable_value<uint64_t> config_item", and from now on changes to the source will be reflected in config_item. For more complicated uses, which must run some callback when configuration changes, you can also call config_item.observe(callback) to be actively notified of changes.	2019-06-28 16:43:25 +03:00
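The source/value relationship and the `observe(callback)` hook described above can be sketched as a plain observer pattern. This is a toy, single-threaded illustration: `toy_value_source` and `toy_updateable_value` are hypothetical names, not Scylla's actual updateable_value API, and the sketch deliberately omits per-shard copies and observer unregistration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Owns the value and the list of callbacks to run when it changes.
template <typename T>
class toy_value_source {
    T value_;
    std::vector<std::function<void(const T&)>> observers_;
public:
    explicit toy_value_source(T v) : value_(std::move(v)) {}
    const T& get() const { return value_; }
    void set(T v) {
        if (v == value_) {
            return; // unchanged: skip firing notifiers
        }
        value_ = std::move(v);
        for (auto& cb : observers_) {
            cb(value_);
        }
    }
    void add_observer(std::function<void(const T&)> cb) {
        observers_.push_back(std::move(cb));
    }
};

// Tracks a source; reading it always yields the source's current value.
template <typename T>
class toy_updateable_value {
    toy_value_source<T>* source_;
public:
    explicit toy_updateable_value(toy_value_source<T>& source) : source_(&source) {}
    const T& operator()() const { return source_->get(); }
    void observe(std::function<void(const T&)> cb) {
        source_->add_observer(std::move(cb));
    }
};
```

Readers that only need the latest value call `item()`; components that must react (for example, re-triggering work) register a callback with `observe`. The equality check in `set` is why a config type needs a comparison operator.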
Avi Kivity	8d7c1c7231	db: seed_provider_type: add operator==() Dynamically updateable configuration requires checking whether configuration items changed or not, so we can skip firing notifiers for the common case where nothing changed. This patch adds a comparison operator for seed_provider_type, which was missing it.	2019-06-28 16:43:25 +03:00
Avi Kivity	da2a98cde6	config: don't allow assignment to config values Currently, we allow adjusting configuration via cfg.whatever() = 5; by returning a mutable reference from cfg.whatever(). Soon, however, this operation will have side effects (updating all references to the config item, and triggering notifiers). While this can be done with a proxy, it is too tricky. Switch to an ordinary setter interface: cfg.whatever.set(5); Because boost::program_options no longer gets a reference to the value to be written to, we have to move the update to a notifier, and the value_ex() function has to be adjusted to infer whether it was called with a vector type after it is called, not before.	2019-06-28 16:43:25 +03:00
Avi Kivity	b146fd1356	config: make noncopyable config_file and db::config are soon not going to be copyable. The reason is that in order to support live updating, we'll need per-shard copies of each value, and per-shard tracking of references to values. While these can be copied, it will be an asynchronous operation and thus cannot be done from a copy constructor. So to prepare for these changes, replace all copies of db::config by references and delete config_file's copy constructor. Some existing references had to be made const in order to accommodate the const-ness of db::config now being propagated (rather than being terminated by a non-const copy).	2019-06-28 16:43:25 +03:00
Avi Kivity	fe59997efe	database: don't copy config object Copying the config object breaks the link between the original and the copied object, so updates to config items will not be visible. To allow updates, don't copy any more, and instead keep a pointer. The pointer won't work well once config is updateable, since the same object is shared across multiple shards, but that can be addressed later.	2019-06-28 15:20:39 +03:00
Avi Kivity	339699b627	database: remove default constructor Currently, database::_cfg is a copy of the global configuration. But this means that we have multiple master copies of the configuration, which makes updating the configuration harder. In order to eliminate the copy we have to eliminate the database default constructor, which creates a config object, so that all remaining constructors can receive config by reference and retain that reference.	2019-06-28 15:20:39 +03:00
Avi Kivity	70d8127400	gossip_test: pass configuration to database object We want to eliminate the default database constructor (to be explained in the next patch), so eliminate its only use in gossip_test, using the regular constructor instead.	2019-06-28 15:20:39 +03:00
Glauber Costa	d916601ea4	toppartitions: fix typo toppartitons -> toppartitions Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190627160937.7842-1-glauber@scylladb.com>	2019-06-27 19:13:58 +03:00
Tomasz Grabiec	e071445373	Merge "More precise poisoning in logalloc" from Rafael With this unused descriptors and objects should always be poisoned. * https://github.com/espindola/scylla/ align-descriptors-so-that-they-are-poisoned-v4: Convert macros to inline functions More precise poisoning in logalloc	2019-06-27 16:30:40 +02:00
Takuya ASADA	eabb872789	dist/redhat: install /usr/sbin symlinks correctly On current scylla.spec, the shell glob pattern "scylla_setup" is not expanded correctly, so it mistakenly created a symlink named "/usr/sbin/scylla_setup". We need to expand the pattern and create a symlink for each setup script. Fixes #4605 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190627053530.10406-2-syuu@scylladb.com>	2019-06-27 14:22:40 +03:00
Takuya ASADA	828b63f4fb	dist/redhat: manage .pyc as a part of package Since we don't install .pyc files in our package, python3 generates .pyc files the first time a setup script is launched. These files are not managed by the package, so they remain under the script directory when the Scylla package is upgraded or removed. We need to compile the .py files when generating the relocatable package, and add the compiled .pyc files to the .rpm/.deb packages. Fixes #4612 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190627053530.10406-1-syuu@scylladb.com>	2019-06-27 14:22:39 +03:00
Rafael Ávila de Espíndola	d8dbacc7f6	More precise poisoning in logalloc This change aligns descriptors and values to 8 bytes so that poisoning a descriptor or value doesn't interfere with other descriptors and values. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-26 13:13:48 -07:00
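Why 8 bytes: AddressSanitizer tracks poisoning in 8-byte shadow granules, so two objects sharing a granule cannot be poisoned independently. A minimal sketch of the rounding involved, assuming a hypothetical `align_up` helper (not the logalloc code itself):

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of alignment (alignment must be a power of two).
// Placing every descriptor/value at an 8-byte boundary means poisoning one
// object never touches the granule of its neighbour.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

static_assert(align_up(9, 8) == 16, "9 bytes occupy two 8-byte granules");
```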
Rafael Ávila de Espíndola	6a2accb483	Convert macros to inline functions Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-26 13:13:48 -07:00
Avi Kivity	dd76943125	Merge "Segregate data when streaming by timestamp for time window compaction strategy" from Botond " When writing streamed data into sstables, while using time window compaction strategy, we have to emit a new sstable for each time window. Otherwise we can end up with sstables mixing data from wildly different windows, ruining the compaction strategy's ability to drop entire sstables when all data within is expired. This gets worse as these mixed sstables get compacted together with sstables that used to contain a single time window. This series provides a solution to this by segregating the data by the time windows of its atoms. This is done on the new RPC streaming and the new row-level repair, memtable-flush and compaction, ensuring that the segregation requirement is respected at all times. Fixes: #2687 " * 'segregate-data-into-sstables-by-time-window-streaming/v2.1' of ssh://github.com/denesb/scylla: streaming,repair: restore indentation repair: pass the data stream through the compaction strategy's interposer consumer streaming: pass the data stream through the compaction strategy's interposer consumer TWCS: implement add_interposer_consumer() compaction_strategy: add add_interposer_consumer() Add mutation_source_metadata tests: add unit test for timestamp_based_splitting_writer Add timestamp_based_splitting_writer Introduce mutation_writer namespace	2019-06-26 19:18:52 +03:00
Tomasz Grabiec	3e30a33e31	Merge "Introduce tests::random_schema" from Botond Most of our tests use overly simplistic schemas (`simple_schema`) or very specialized ones that focus on exercising a specific area of the tested code. This is fine in most places as not all code is schema dependent, however practice has shown that there can be nasty bugs hiding in dark corners that only appear with a schema that has a specific combination of types. This series introduces `tests::random_schema` a utility class for generating random schemas and random data for them. An important goal is to make using random schemas in tests as simple and convenient as possible, therefore fostering the appearance of tests using random schemas. Random schema was developed to help testing code I'm currently working on, which segregates data by time-windows. As I wasn't confident in my ability to think of every possible combination of types that can break my code I came up with random-schema to help me find these corner cases. So far I consider it a success, it already found bugs in my code that I'm not sure I would have found if I had relied on specific schemas. It also found bugs in unrelated areas of the code, which proves my point in the first paragraph. * https://github.com/denesb/scylla.git random_schema/v5: tests/data_model: approximate to the modeled data structures data_value: add ascii constructor tests/random-utils.hh: add stepped_int_distribution tests/random-utils.hh: get_int() add overloads that accept external rand engine tests/random-utils.hh: add get_real() tests: introduce random_schema	2019-06-26 18:10:20 +02:00
Botond Dénes	12b8405720	streaming,repair: restore indentation Deferred from the previous two patches.	2019-06-26 18:45:36 +03:00
Botond Dénes	e3f4692868	repair: pass the data stream through the compaction strategy's interposer consumer	2019-06-26 18:45:36 +03:00
Botond Dénes	9c2407573c	streaming: pass the data stream through the compaction strategy's interposer consumer	2019-06-26 18:45:36 +03:00
Botond Dénes	ee563928df	TWCS: implement add_interposer_consumer() Exploit the interposer customization point to inject a consumer that will segregate the mutation stream based on the contained atoms' timestamps, allowing the requirements of TWCS to be maintained every time sstables are written to disk. For the implementation, `timestamp_based_splitting_writer` is used, with a classifier that maps timestamps to windows.	2019-06-26 18:45:36 +03:00
Tomasz Grabiec	2d3e3640df	Merge "Collection: use utils::chunked_vector to store the cells" from Botond This is a band-aid patch that is supposed to fix the immediate problem of large collections causing large allocations. The proper fix is to use IMR but that will take time. In the meantime, alleviate the pressure on the memory allocator by using a chunked storage collection (utils::chunked_vector) instead of std::vector. In the linked issue seastar::chunked_fifo was also proposed as the container to use, however chunked fifo is not traversable in reverse which disqualifies it from this role. Refs: #3602	2019-06-26 15:32:25 +02:00
Botond Dénes	a280dcfe4c	compaction_strategy: add add_interposer_consumer() This will be the customization point for compaction strategies, used to inject a specific interposer consumer that can manipulate the fragment stream so that it satisfies the requirements of the compaction strategy. For now the only candidate for injecting such an interposer is time-window compaction strategy, which needs to write sstables that only contain atoms belonging to the same time-window. By default no interposer is injected. Also add an accompanying customization point `adjust_partition_estimate()` which returns the per-sstable partition estimate that the interposer will produce.	2019-06-26 15:45:59 +03:00
Botond Dénes	3ce902a4be	Add mutation_source_metadata This struct contains metadata regarding a mutation_source. Currently it contains the min and max timestamp. This will be used later by compaction strategies to determine whether a given mutation stream has to be split or not.	2019-06-26 15:45:59 +03:00
Botond Dénes	25d7cbedc0	tests: add unit test for timestamp_based_splitting_writer	2019-06-26 15:45:59 +03:00
Botond Dénes	df29600eec	Add timestamp_based_splitting_writer This writer implements the core logic of time-window based data segregation. It splits the fragment stream provided by a reader, such that each atom (cell) in the stream will be written into a consumer based on the time-window its timestamp belongs to. The end result is that each consumer will only see fragments whose atoms all have timestamps belonging to the same time-window. When a mutation fragment has atoms belonging to different time-windows, it is split into as many fragments as needed so each has only atoms that belong to the same time-window.	2019-06-26 15:45:59 +03:00
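The splitting rule above can be sketched on a simplified model. This assumes hypothetical `cell`/`fragment` types and fixed-size windows aligned to the epoch; the real writer works on mutation fragments and a classifier supplied by TWCS.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified model: a fragment is a list of cells with write timestamps.
struct cell {
    std::string column;
    int64_t timestamp;
};
using fragment = std::vector<cell>;

// Classifier: fixed-size windows starting at the epoch. Floor division keeps
// negative timestamps in the correct window (C++ '/' truncates toward zero).
int64_t window_of(int64_t ts, int64_t window_size) {
    int64_t w = ts / window_size;
    if (ts % window_size != 0 && ts < 0) {
        --w;
    }
    return w;
}

// Split one fragment into per-window fragments: every output holds only cells
// from a single window, and would be handed to that window's consumer.
std::map<int64_t, fragment> split_by_window(const fragment& f, int64_t window_size) {
    std::map<int64_t, fragment> out;
    for (const auto& c : f) {
        out[window_of(c.timestamp, window_size)].push_back(c);
    }
    return out;
}
```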
Botond Dénes	2693f1838a	Introduce mutation_writer namespace Currently there is a single mutation_writer: `multishard_writer`, however in the next patch we are going to add another one. This is the right moment to move these into a common namespace (and folder), we have way too much stuff scattered already in the top-level namespace (and folder). Also rename `tests/multishard_writer_test.cc` to `tests/mutation_writer_test.cc`, this test-suite will be the home of all the different mutation writers' unit test cases.	2019-06-26 15:45:59 +03:00
Avi Kivity	adcc95dddc	Merge "sstable: mc: reader: Optimize multi-partition scans for data sets with small partitions" from Tomasz " Currently, the parser and the consumer save their state and return control to the caller, which then figures out that it needs to enter a new partition, and that it doesn't need to skip. We do it twice, after row end, and after row start. All this work could be avoided if the consumer installed by the reader adjusted its state and pushed the fragments on the spot. This patch achieves just that. This results in less CPU overhead. The ka/la reader still stops after row end. Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe): perf_fast_forward -c1 -m1G --run-tests=small-partition-skips: Before: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.952372 4 1000000 1050009 755 1050765 1046585 976.0 971 124256 1 0 0 0 0 0 0 0 99.7% After: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.790178 4 1000000 1265538 1150 1266687 1263684 975.0 971 124256 2 0 0 0 0 0 0 0 99.6% Tests: unit (dev) " * 'sstable-optimize-partition-scans' of https://github.com/tgrabiec/scylla: sstable: mc: reader: Do not stop parsing across partitions sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader sstables: reader: Simplify _single_partition_read checking sstables: reader: Update stats from on_next_partition() sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range() sstables: ka/la: reader make push_ready_fragments() safe to call many times sstables: mc: reader: Move out-of-range check out of push_ready_fragments() sstables: reader: Return void from push_ready_fragments() sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range() sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end	2019-06-26 13:19:12 +03:00
Avi Kivity	06a9596491	tests: cql_test_env: disable commitlog O_DSYNC O_DSYNC causes commitlog to pre-allocate each commitlog segment by writing zeroes into it. In normal operation, this is amortized over the many times the segment will be reused. In tests, this is wasteful, but under the default workstation configuration with /tmp using tmpfs, no actual writes occur. However on a non-default configuration with /tmp mounted on a real disk, this causes huge disk I/O and eventually a crash (observed in schema_change_test). The crash is likely only caused indirectly, as the extra I/O (exacerbated by many tests running in parallel) causes timeouts. I reproduced this problem by running 15 copies of schema_change_test in parallel with /tmp mounted on a real filesystem. Without this change, I usually observe one or two of the copies crashing, with the change they complete (and much more quickly, too).	2019-06-26 12:15:53 +02:00
Asias He	f0f0beba2e	repair: Move the global tracker object into repair_service The tracker object was a static object in repair.cc. At the time we initialize it, we do not know the smp::count, so we have to initialize the _repairs object when it is used on the fly. void init_repair_info() { if (_repairs.size() != smp::count) { _repairs.resize(smp::count); } } This introduces a race if init_repair_info is called on different threads (shards). To fix, put the tracker object inside the newly introduced repair_service object which is created in main.cc. Fixes #4593 Message-Id: <b1adef1c0528354d2f92f8aaddc3c4bee5dc8a0a.1561537841.git.asias@scylladb.com>	2019-06-26 12:53:10 +03:00
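The idea behind the fix is to size the per-shard storage once, when the shard count is known, rather than lazily resizing it from whichever shard gets there first. A sketch with a hypothetical `repair_tracker` class (not the actual repair_service code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-shard repair bookkeeping.
struct shard_repairs {
    int active = 0;
};

// Sized in the constructor, never resized afterwards. The racy
// check-then-resize of a lazily initialized static object disappears,
// and each shard only ever touches its own slot.
class repair_tracker {
    std::vector<shard_repairs> _repairs;
public:
    explicit repair_tracker(std::size_t shard_count) : _repairs(shard_count) {}
    shard_repairs& for_shard(std::size_t shard) { return _repairs.at(shard); }
    std::size_t shard_count() const { return _repairs.size(); }
};
```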
Botond Dénes	572a738777	collection: use chunked_vector to store cells This is a quick fix to the immediate problem of large collections causing large allocations, triggering stalls or OOM. The proper fix is to use IMR for storing the cells, but that is a complex change that will require time, so let's not stall/OOM in the meantime.	2019-06-26 11:40:44 +03:00
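Why chunked storage helps: elements live in fixed-size chunks, so growth never requests one large contiguous block, while index access and reverse traversal (the property chunked_fifo lacks) stay cheap. A minimal illustration; `simple_chunked_vector` is hypothetical and does not mirror the `utils::chunked_vector` API. It requires `T` to be default-constructible and copy-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t ChunkSize = 128>
class simple_chunked_vector {
    // Only this small array of chunk pointers is contiguous; the elements
    // themselves are spread over ChunkSize-sized allocations.
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& v) {
        if (_size == _chunks.size() * ChunkSize) {
            _chunks.push_back(std::make_unique<T[]>(ChunkSize));  // allocate one more chunk
        }
        _chunks[_size / ChunkSize][_size % ChunkSize] = v;
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return _size; }
};
```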
Botond Dénes	c68ffc330e	types: don't copy collection_type_impl::mutation_view Just because it's a view doesn't mean it's cheap to copy.	2019-06-26 11:39:41 +03:00
Asias He	fb3f0125ee	repair: Add default constructor for partition_key_and_mutation_fragments This is useful when we want to add an empty partition_key_and_mutation_fragments.	2019-06-26 09:12:55 +08:00
Asias He	3fc53a6b72	repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm It is used to negotiate if the master can use the rpc stream interface to transfer data.	2019-06-26 09:12:55 +08:00
Asias He	6054a56333	repair: Add repair_row_on_wire_with_cmd It is used to contain both a repair cmd and repair_row_on_wire object.	2019-06-26 09:12:55 +08:00
Asias He	9f36d775dc	repair: Add repair_hash_with_cmd It is a wrapper that contains both a repair cmd and a repair_hash object.	2019-06-26 09:12:55 +08:00
Asias He	6b59279e26	repair: Add repair_stream_cmd It is used by row level repair to add a small protocol on top of the rpc stream interface.	2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola	94d2194c77	dht: token: Simplify operator< While this is a strict weak ordering, it is not obvious and duplicates a bit of logic. This patch simplifies it by using tri_compare. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190621204820.37874-1-espindola@scylladb.com>	2019-06-25 19:06:30 +03:00
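Deriving `operator<` from a three-way compare keeps the ordering logic in one place, so the two cannot drift apart. A sketch on a hypothetical token model (a kind plus a value that only matters for ordinary keys); the real `dht::token` layout may differ.

```cpp
#include <cassert>
#include <cstdint>

enum class token_kind { before_all_keys, key, after_all_keys };

struct token {
    token_kind kind;
    int64_t data;
};

// Returns negative, zero or positive.
int tri_compare(const token& a, const token& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key) {
        return 0;   // sentinel tokens of the same kind compare equal
    }
    return a.data < b.data ? -1 : (a.data > b.data ? 1 : 0);
}

// A strict weak ordering falls out of tri_compare directly.
bool operator<(const token& a, const token& b) {
    return tri_compare(a, b) < 0;
}
```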
Tomasz Grabiec	269e65a8db	Merge "Sync schema before repair" from Asias This series makes sure new schema is propagated to repair master and follower nodes before repair. Fixes #4575 * dev.git asias/repair_pull_schema_v2: migration_manager: Add sync_schema repair: Sync schema from follower nodes before repair	2019-06-25 19:05:29 +03:00
Amos Kong	f0cd589a75	dist: suppress the yaml load warning YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. Fix it by using the new safe interface, yaml.safe_load(). Signed-off-by: Amos Kong <amos@scylladb.com> Cc: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>	2019-06-25 19:05:29 +03:00
Avi Kivity	fc629bb14f	Merge "cql3: lift infinite bound check" from Benny & Piotr " If the database supports infinite bound range deletions, CQL layer will no longer throw an error indicating that both ranges need to be specified. Fixes #432 Update test_range_deletion_scenarios unit test accordingly. " * 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla: cql3: lift infinite bound check if it's supported service: enable infinite bound range deletions with mc database: add flag for infinite bound range deletions	2019-06-25 19:05:29 +03:00
Nadav Har'El	a88c9ca5a5	Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next Piotr Sarna says: Fixes #4540 This series adds proper handling of aggregation for paged indexed queries. Before this series returned results were presented to the user in per-page partial manner, while they should have been returned as a single aggregated value. Tests: unit(dev) Piotr Sarna (8): cql3: split execute_base_query implementation cql3: enable explicit copying of query_options cql3: add a query options constructor with explicit page size cql3: add proper aggregation to paged indexing cql3: make DEFAULT_COUNT_PAGE_SIZE constant public tests: add query_options to cquery_nofail tests: add indexing + paging + aggregation test case tests: add indexing+paging test case for clustering keys	2019-06-25 19:05:29 +03:00
Avi Kivity	7195f75fb2	Update seastar submodule * seastar ded50bd8a4...b629d5ef7a (9): > sharded: no_sharded_instance_exception: fix grammar > core,net: output_stream: remove redundant std::move() > perftune: make sure that ethtool -K has a chance of succeeding > net/dpdk: upgrade to dpdk-19.05 > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones > reactor: provide an uptime function > rpc: add sink::flush() to streaming api > Use a table to document the various build modes > foreign_ptr: Fix compilation error due to unused variable	2019-06-25 19:05:29 +03:00
Avi Kivity	9d21341733	review-checklist.md: add common checks - code style - naming - micro-performance - concurrency - unit-testing - templates and type erasure - singletons	2019-06-25 19:05:29 +03:00
Piotr Sarna	efa7951ea5	main: stop view builder conditionally The view builder is started only if it's enabled in config, via the view_building=true variable. Unfortunately, stopping the builder was unconditional, which may result in failed assertions during shutdown. To remedy this, view building is stopped only if it was previously started. Fixes #4589	2019-06-25 19:05:29 +03:00
Asias He	bb5665331c	repair: Sync schema from follower nodes before repair Since commit "repair: Use the same schema version for repair master and followers", repair master and followers use the same schema version that the master decides to use during the whole repair operation. If the master has an older version of the schema, repair could ignore data that makes use of the new schema, e.g., writes to new columns. To fix, always sync the schema agreement before repair. The master node pulls the schema from followers and applies it locally. The master then uses the "merged" schema. The followers use get_schema_for_write() to pull the "merged" schema. Fixes #4575 Backports: 3.1	2019-06-25 17:13:47 +08:00
Asias He	14c1a71860	migration_manager: Add sync_schema Makes sure this node knows about all schema changes known by "nodes" that were made prior to this call. Refs: #4575 Backports: 3.1	2019-06-25 17:13:47 +08:00
Botond Dénes	d00cb4916c	tests: introduce random_schema random_schema is a utility class that provides methods for generating random schemas as well as generating data (mutations) for them. The aim is to make using random schemas in tests as simple and convenient as is using `simple_schema`. For this reason the interface of `random_schema` follows closely that of `simple_schema` to the extent that it makes sense. An important difference is that `random_schema` relies on `data_model` to actually build mutations. So all its mutation-related operations work with `data_model::mutation_description` instead of actual `mutation` objects. Once the user has arrived at the desired mutation description they can generate an actual mutation via `data_model::mutation_description::build()`. In addition to the `random_schema` class, the `random_schema.hh` header exposes the generic utility classes for generating types and values that it internally uses. random_schema is fully deterministic. Using the same seed and the same set of operations is guaranteed to result in generating the same schema and data.	2019-06-25 12:01:33 +03:00
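The determinism guarantee rests on drawing all randomness from a single engine seeded once. A sketch of that property with a hypothetical `generate` helper (not the random_schema API). Note that the raw output of `std::mt19937` is fixed by the C++ standard, while distributions such as `std::uniform_int_distribution` are implementation-defined, so reproducibility across different standard libraries is only assured for raw engine output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical generator: one engine, seeded once, drives every random choice,
// so the same seed and the same sequence of calls reproduce the same output.
std::vector<std::uint32_t> generate(std::uint32_t seed, std::size_t n) {
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(engine());
    }
    return out;
}
```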
Botond Dénes	070d72ee23	tests/random-utils.hh: add get_real()	2019-06-25 12:01:33 +03:00
Botond Dénes	2d9f6c3b63	tests/random-utils.hh: get_int() add overloads that accept external rand engine	2019-06-25 12:01:33 +03:00
Botond Dénes	2a7710129e	tests/random-utils.hh: add stepped_int_distribution	2019-06-25 12:01:33 +03:00
Botond Dénes	a3f9932a2f	data_value: add ascii constructor To allow a `data_value` with `ascii_type` to be constructed.	2019-06-25 12:01:33 +03:00
Botond Dénes	1bd8b77770	tests/data_model: approximate to the modeled data structures Make the data modelling structures model their "real" counterparts more closely, allowing the user greater control on the produced data. The changes: * Add timestamp to atomic_value (which is now a struct, not just an alias to bytes). * Add tombstone to collection. * Add row_tombstone to row. * Add bound kinds and tombstone to range_tombstone. Great care was taken to preserve backward compatibility, to avoid unnecessary changes in existing code.	2019-06-25 12:01:33 +03:00
Piotr Sarna	add40d4e59	cql3: lift infinite bound check if it's supported If the database supports infinite bound range deletions, CQL layer will no longer throw an error indicating that both ranges need to be specified. [bhalevy] Update test_range_deletion_scenarios unit test accordingly. Fixes #432 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-24 15:58:34 +03:00
Piotr Sarna	c19fdc4c90	service: enable infinite bound range deletions with mc As soon as it's agreed that the cluster supports sstables in mc format, infinite bound range deletions in statements can be safely enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-24 15:58:28 +03:00
Piotr Sarna	e77ef849af	database: add flag for infinite bound range deletions Database can only support infinite bound range deletions if sstable mc format is supported. As a first step to implement these checks, an appropriate flag is added to database.	2019-06-24 15:57:47 +03:00
Piotr Sarna	b668ee2b2d	tests: add indexing+paging test case for clustering keys Indexing a non-prefix part of the clustering key has a separate code path (see issue #3405), so it deserves a separate test case.	2019-06-24 14:51:17 +02:00
Piotr Sarna	3d9a37f28f	tests: add indexing + paging + aggregation test case Indexed queries used to erroneously return partial per-page results for aggregation queries. This test case reproduced the problem and now guards against regressions. Refs #4540	2019-06-24 14:06:42 +02:00
Piotr Sarna	60cafcc39c	tests: add query_options to cquery_nofail The cquery_nofail utility is extended, so it can accept custom query options, just like execute_cql does.	2019-06-24 14:06:41 +02:00
Piotr Sarna	fe18638de3	cql3: make DEFAULT_COUNT_PAGE_SIZE constant public The constant will be later used in test scenarios.	2019-06-24 13:21:37 +02:00
Piotr Sarna	bb08af7e68	cql3: add proper aggregation to paged indexing Aggregated and paged filtering needs to aggregate the results from all pages in order to avoid returning partial per-page results. It's a little bit more complicated than regular aggregation, because each paging state needs to be translated between the base table and the underlying view. The routine keeps fetching pages from the underlying view, which are then used to fetch base rows, which go straight to the result set builder. Fixes #4540	2019-06-24 13:21:32 +02:00
Piotr Sarna	97d476b90f	cql3: add a query options constructor with explicit page size For internal use, there already exists a query_options constructor that copies data from another query_options with overwritten paging state. This commit adds an option to overwrite page size as well.	2019-06-24 13:21:32 +02:00
Piotr Sarna	fa89e220ef	cql3: enable explicit copying of query_options	2019-06-24 12:57:04 +02:00
Piotr Sarna	7a8b243ce4	cql3: split execute_base_query implementation In order to handle aggregation queries correctly, the function that returns base query results is split into two, so it's possible to access raw query results, before they're converted into end-user CQL message.	2019-06-24 12:57:03 +02:00
Benny Halevy	b1e78313fe	log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0) pow2_rank is undefined for 0. bucket_of currently works around that by using a bitmask of 0. To allow asserting that count_{leading,trailing}_zeros are not called with 0, we want to avoid it at all call sites. Fixes #4153 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>	2019-06-23 19:32:51 +03:00
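The shape of the fix: handle zero explicitly at the call site so `pow2_rank()` can assert its precondition. This is a simplified sketch assuming GCC/Clang's `__builtin_clzll` (undefined for 0); the `bucket_of` formula here is hypothetical and much simpler than the real `log_heap_options` bucketing.

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(v)); __builtin_clzll(0) is undefined, so reject 0 up front.
inline unsigned pow2_rank(std::uint64_t v) {
    assert(v != 0);
    return 63 - __builtin_clzll(v);
}

// Hypothetical power-of-two bucketing: zero gets its own bucket via an
// explicit branch instead of a bitmask workaround, so pow2_rank() is
// never called with 0.
inline unsigned bucket_of(std::uint64_t size) {
    if (size == 0) {
        return 0;
    }
    return pow2_rank(size) + 1;
}
```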
Avi Kivity	779b378785	Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny " partitioned_sstable_set is not self sufficient because it relies on compatible_ring_position_view, which in turn relies on lifetime of sstable object. This leads to use-after-free. Fix this problem by introducing compatible_ring_position and using it in p__s__s. Fixes #4572. Test: unit (dev), compaction dtests (dev) " * 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla: tests: Test partitioned sstable set's self-sufficiency sstables: Fix partitioned_sstable_set by making it self sufficient Introduce compatible_ring_position and compatible_ring_position_or_view	2019-06-23 17:17:18 +03:00
Raphael S. Carvalho	14fa7f6c02	tests: Test partitioned sstable set's self-sufficiency Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:13 +03:00
Raphael S. Carvalho	293557a34e	sstables: Fix partitioned_sstable_set by making it self sufficient Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view as key for interval map, which is constructed from a decorated key in sstable object. If sstable object is destroyed, like when compaction releases it early, partitioned set potentially no longer works because c__r__p__v would store information that is already freed, meaning its use implies use-after-free. Therefore, the problem happens when partitioned set tries to access the interval of its interval map and uses freed information from c__r__p__v. Fix is about using the newly introduced compatible_ring_position_or_view which can hold a ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable object. Retire compatible_ring_position_view.hh as it is now unused. Fixes #4572. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:13 +03:00
Raphael S. Carvalho	9a83561700	Introduce compatible_ring_position and compatible_ring_position_or_view The motivation for supporting ring position is that containers using it can be self sufficient. The existing compatible_ring_position_view could lead to use after free when the ring position data it was built from is gone. The motivation for compatible_ring_position_or_view is to allow lookup on containers that don't support different key types using c__r__p, and also to avoid unnecessary copies. If the user is provided only with a ring_position_view, c__r__p__or_v could be built from it and used for lookups. Converting ring_position_view to ring_position is very bug prone because there could be information lost in the process. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:12 +03:00
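The "owning or view" idea can be illustrated with strings. In this sketch `key_or_view` is a hypothetical analogue, not the Scylla type. Keys stored in a container own their data, so the container stays valid after the source is freed. Lookup keys may borrow, avoiding a copy, and must not outlive what they point to. A `std::variant` is used so that copying or moving the key never leaves a view pointing into the moved-from string.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

class key_or_view {
    std::variant<std::string, std::string_view> _data;

    explicit key_or_view(std::string s) : _data(std::in_place_type<std::string>, std::move(s)) {}
    explicit key_or_view(std::string_view v) : _data(std::in_place_type<std::string_view>, v) {}
public:
    // Owning key: safe to store in long-lived containers.
    static key_or_view owning(std::string s) { return key_or_view(std::move(s)); }
    // Borrowing key: cheap, for lookups only.
    static key_or_view borrowing(std::string_view v) { return key_or_view(v); }

    std::string_view view() const {
        if (auto* s = std::get_if<std::string>(&_data)) {
            return *s;
        }
        return std::get<std::string_view>(_data);
    }
    bool is_owning() const { return std::holds_alternative<std::string>(_data); }

    // Both kinds compare through the same view, so they can be mixed in lookups.
    friend bool operator<(const key_or_view& a, const key_or_view& b) {
        return a.view() < b.view();
    }
};
```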
Rafael Ávila de Espíndola	65ac0a831c	Add to_string_impl that takes a data_value Currently to_string takes raw bytes. This means that to print a data_value it has to first be serialized to be passed to to_string, which will then deserialize it. This patch adds a virtual to_string_impl that takes a data_value and implements a now non virtual to_string on top of it. I don't expect this to have a performance impact. It mostly documents how to access a data_value without converting it to bytes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620183449.64779-3-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	3bd5dd7570	Add a few more tests of data_value::to_string I found that no tests covered this code while refactoring it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620183449.64779-2-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Nadav Har'El	6e87bca65d	storage_proxy: fix race and crash in case of MV and other node shutdown Recently, in merge commit `2718c90448`, we added the ability to cancel pending view-update requests when we detect that the target node went down. This is important for view updates because these have a very long timeout (5 minutes), and we wanted to make this timeout even longer. However, the implementation caused a race: Between creating the update's request handler (create_write_response_handler()) and actually starting the request with this handler (mutate_begin()), there is a preemption point and we may end up deleting the request handler before starting the request. So mutate_begin() must gracefully handle the case of a missing request handler, and not crash with a segmentation fault as it did before this patch. Eventually the lifetime management of request handlers could be refactored to avoid this delicate fix (which requires more comments to explain than code), or even better, it would be more correct to cancel individual writes when a node goes down, not drop the entire handler (see issue #4523). However, for now, let's not do such invasive changes and just fix the bug that we set out to fix. Fixes #4386. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190620123949.22123-1-nyh@scylladb.com>	2019-06-23 16:03:06 +03:00
Asias He	b99c75429a	repair: Avoid searching all the rows in to_repair_rows_on_wire The repair_rows in row_list are sorted. It is only possible for the current repair_row to share the same partition key with the last repair_row inserted into repair_row_on_wire. So there is no need to search from the beginning of repair_rows_on_wire, which leads to quadratic complexity. To fix, look only at the last item in repair_rows_on_wire. Fixes #4580 Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>	2019-06-23 16:03:06 +03:00
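Because the rows are sorted by partition key, grouping them only needs a comparison against the most recently appended group, which makes the pass linear instead of quadratic. A sketch on hypothetical simplified `row`/`partition_on_wire` types:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct row {
    std::string partition_key;
    int value;
};

struct partition_on_wire {
    std::string partition_key;
    std::vector<int> values;
};

// Input must be sorted by partition key: a row can then only belong to the
// last group, so checking back() is O(1) per row and O(n) overall.
std::vector<partition_on_wire> group_sorted_rows(const std::vector<row>& rows) {
    std::vector<partition_on_wire> out;
    for (const auto& r : rows) {
        if (out.empty() || out.back().partition_key != r.partition_key) {
            out.push_back(partition_on_wire{r.partition_key, {}});
        }
        out.back().values.push_back(r.value);
    }
    return out;
}
```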
Benny Halevy	883cb4318f	Merge pull request #4583 from bhalevy/init-and-shutdown-logging Init and shutdown logging	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	3660caff77	Reduce memory used by all tests Tests without custom flags were already being run with -m2G. Tests with custom flags have to manually specify it, but some were missing it. This could cause tests to fail with std::bad_alloc when two concurrent tests tried to allocate all the memory. This patch adds -m2G to all tests that were missing it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620002921.101481-1-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Avi Kivity	9229afe64f	Merge "Fix infinite paging for indexed queries" from Piotr " Fixes #4569 This series fixes the infinite paging for indexed queries issue. Before this fix, paging indexes tended to end up in an infinite loop of returning pages with 0 results, but has_more_pages flag set to true, which confused the drivers. Tests: unit(dev) Branches: 3.0, 3.1 " * 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla: tests: add test case for finishing index paging cql3: fix infinite paging for indexed queries	2019-06-23 16:03:06 +03:00
Takuya ASADA	2135d2ae7f	dist/debian: install capabilities.conf on postinst script We still have the "{{^jessie}}" tag on the scylla-server systemd unit file to skip using AmbientCapabilities on Debian 8, but it no longer works since we moved to a single binary .deb package for all Debian variants; we must share the same systemd unit file across all Debian variants. To do so we need a separate file in /etc/systemd to define AmbientCapabilities, created by the postinst script only if the distribution is not Debian 8, just like we do in .rpm. See #3344 See #3486 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190619064224.23035-1-syuu@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	46341bd63f	gdb: Print coordinator stats related to memory usage from 'scylla memory' Example: Coordinator: fg writes: 150 bg writes: 39980, 21429280 B fg reads: 0 bg reads: 0 hints: 0 B view hints: 0 B Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906745-17150-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	f7e79b07d1	lsa: Respect the reclamation step hint from seastar allocator This will allow us to reduce the amount of segment compaction when reclaiming on behalf of a large allocation because we'll evict much more up front. Tests: - unit (dev) Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906584-16770-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	c5184b3dd0	gdb: Print region_impl pointer from scylla lsa Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906684-17019-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Alexys Jacob	98bc9edf6f	thrift/: support version 0.11+ after THRIFT-2221 Thrift 0.11 changed to generate c++ code with std::shared_ptr instead of boost::shared_ptr. - https://issues.apache.org/jira/browse/THRIFT-2221 This was forcing scylla to stick with older versions of thrift. Fixes issue #3097. thrift: add type aliases to build with old and new versions. update to using namespace =	2019-06-23 16:03:06 +03:00
Takuya ASADA	e4320d6537	dist/debian: run 'systemctl daemon-reload' automatically on package install/uninstall Since we cannot use dh --with=systemd (we don't want systemd units to be enabled automatically; our setup scripts manage them), we have to run 'systemctl daemon-reload' manually. (With dh --with=systemd, the systemd helper provides such scripts automatically.) Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190618000210.28972-1-syuu@scylladb.com>	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	8c067c26d9	Add support for the sanitize build mode in scylla Running tests in debug mode takes 25:22.08 on my machine. Using sanitize instead takes that down to 10:46.39. The mode is opt-in, in that it must be explicitly selected with "configure.py --mode=sanitize" or "ninja sanitize". It must also be explicitly passed to test.py. Unfortunately building with asan, optimizations and debug info is very slow and there is nothing like -gline-tables-only in gcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190617170007.44117-1-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Benny Halevy	1fd91eb616	main: add logging for deferred stopping Increase visibility of init messages to help diagnose init and shutdown issues. Ref #4384 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Benny Halevy	cbbe5a519a	main: improve init logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Benny Halevy	e96b1afdbd	supervisor::notify log at info level rather than trace Increase visibility of init messages to help diagnose init and shutdown issues. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Tomasz Grabiec	fa2ed3ecce	sstable: mc: reader: Do not stop parsing across partitions Currently, the parser and the consumer save their state and return control to the caller, which then figures out that it needs to enter a new partition, and that it doesn't need to skip. We do it twice, after row end, and after row start. All this work could be avoided if the consumer installed by the reader adjusted its state and pushed the fragments on the spot. This patch achieves just that. This results in less CPU overhead. The ka/la reader still stops after row end. Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe): perf_fast_forward -c1 -m1G --run-tests=small-partition-skips: Before: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.952372 4 1000000 1050009 755 1050765 1046585 976.0 971 124256 1 0 0 0 0 0 0 0 99.7% After: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.790178 4 1000000 1265538 1150 1266687 1263684 975.0 971 124256 2 0 0 0 0 0 0 0 99.6%	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	386079472a	sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader This state will be needed by the consumer to handle crossing partition boundaries on its own. While at it, document it.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	92cb07debd	sstables: reader: Simplify _single_partition_read checking The old code was making advance_to_next_partition() behave incorrectly when _single_partition_read was set, which was compensated by a check in read_partition(). Cleaner to exit early.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	7f4c041ba0	sstables: reader: Update stats from on_next_partition() After partition_start is emitted directly from the parser's consumer, read_partition() will not always be called for each produced partition.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	0964a8fb38	sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range() out_of_range() cannot change to true when the position falls into the ranges, we only need to check it when it falls outside them.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	556ccf4373	sstables: ka/la: reader: make push_ready_fragments() safe to call many times Not a bug fix, just makes the implementation more robust against changes. Before this patch this might have resulted in partition_end being pushed many times.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	ef6edff673	sstables: mc: reader: Move out-of-range check out of push_ready_fragments() Currently, calling push_ready_fragments() with _mf_filter disengaged or with _mf_filter->out_of_range() causes it to call _reader->on_out_of_clustering_range(), which emits the partition_end fragment. It's incorrect to emit this fragment twice, or zero times, so correctness depends on the fact that push_ready_fragments() is called exactly once when transitioning between partitions. This proved tricky to ensure, especially after partition_end starts to be emitted in a different path as well. Ensuring that push_ready_fragments() is NOT called after partition_end is emitted from consume_partition_end() becomes tricky. After having to fix this problem many times after unrelated changes to the flow, I decided that it's better to refactor. This change moves the call of on_out_of_clustering_range() out of push_ready_fragments(), making the latter safe to call any number of times. The _mf_filter->out_of_range() check is moved to sites which update the filter. It's also good because it gets rid of conditionals.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	552fe21812	sstables: reader: Return void from push_ready_fragments() The result is ignored, which is fine, so make it official to avoid confusion.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	1488b57933	sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range() The old name is confusing, because we're not always ending the stream when we call it.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	9b8ac5ecbc	sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end Currently, if there is a fragment in _ready and _out_of_range was set after row end was consumed, push_ready_fragments() would return without emitting partition_end. This is problematic once we make consume_row_start() emit partition_start directly, because we will want to assume that all fragments for the previous partition are emitted by then. If they're not, then we'd emit partition_start before partition_end for the previous partition. The fix is to make sure that push_ready_fragments() emits everything.	2019-06-19 14:14:38 +02:00
Piotr Sarna	b8cadc928c	tests: add test case for finishing index paging The test case makes sure that paging indexes does not result in an infinite loop. Refs #4569	2019-06-19 14:10:13 +02:00
Piotr Sarna	88f3ade16f	cql3: fix infinite paging for indexed queries Indexed queries need to translate between view table paging state and base table paging state, in order to be able to page the results correctly. One of the stages of this translation is overwriting the paging state obtained from the base query, in order to return view paging state to the user, so it can be used for fetching next pages. Unfortunately, in the original implementation the paging state was overwritten only if more pages were available, while if 'remaining' pages were equal to 0, nothing was done. This is not enough, because the paging state of the base query needs to be overwritten unconditionally - otherwise a guard paging state value of 'remaining == 0' is returned back to the client along with 'has_more_pages = true', which will result in an infinite loop. This patch correctly overwrites the base paging state unconditionally. Fixes #4569	2019-06-19 14:10:13 +02:00
Tomasz Grabiec	cd1ff1fe02	Merge "Use same schema version for repair nodes" from Asias This patch set fixes repair nodes using different schema versions and optimizes the hashing now that all nodes use the same schema version. Fixes: #4549 * seastar-dev.git asias/repair_use_same_schema.v3: repair: Use the same schema version for repair master and followers repair: Hash column kind and id instead of column name and type name	2019-06-18 12:42:53 +02:00
Asias He	4285801af9	repair: Hash column kind and id instead of column name and type name Repair nodes are now guaranteed to use the same schema, so it is faster to hash the column kind and id. Changing how mutation fragments are hashed causes incompatibility in mixed clusters. Let's backport to the 3.1 release, which includes row level repair for the first time and is not released yet. Refs: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Asias He	3db136f81e	repair: Use the same schema version for repair master and followers Before this patch, the repair master and followers each independently used their own schema version at the point repair started. The schemas can differ due to a schema change. Repair uses the schema to serialize mutation_fragments and to deserialize the mutation_fragments received from peer nodes. Using different schema versions to serialize and deserialize causes undefined behaviour. To fix this, all repair nodes involved use the schema chosen by the repair master. On top of this patch, we could take another step to make sure all nodes have the latest schema, but let's do that in a separate patch. Fixes: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Rafael Ávila de Espíndola	8672eddff2	Document the best practices for when to use asserts/exceptions/logs The intention is just to document what is currently done. If someone wants to propose changes, that can be done after the current practices have been documented. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190524135109.29436-1-espindola@scylladb.com>	2019-06-18 12:13:01 +03:00
Rafael Ávila de Espíndola	26c0814a88	Add test for large collection warning This was already working, but we were not testing for it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190617181706.66490-1-espindola@scylladb.com>	2019-06-18 10:27:55 +02:00
Nadav Har'El	6aab1a61be	Fix deciding whether a query uses indexing The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary). Fixes #4539 Closes https://github.com/scylladb/scylla/pull/4544 Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla tests: add case for partition key index and filtering cql3: fix deciding if a query uses indexing	2019-06-18 01:01:14 +03:00
Takuya ASADA	7320c966bc	dist/common/scripts/scylla_setup: don't proceed with empty NIC name Currently, the NIC selection prompt in scylla_setup proceeds with setup when the user just presses Enter at the prompt. The prompt should ask for the NIC name again until the user enters a correct NIC name. Fixes #4517 Message-Id: <20190617124925.11559-1-syuu@scylladb.com>	2019-06-17 15:52:29 +03:00
Avi Kivity	938b74f47a	Merge "Fix gcc9 build" from Paweł " These patches fix remaining issues with gcc9 build, that involve a gcc9 bug, a gcc9 bug, and a stricter warning. Tests: unit(debug, dev, release). " * 'fix-gcc9-build' of https://github.com/pdziepak/scylla: dht/ring_position: silence complaints about uninitialised _token_bound xx_hasher: disable -Warray-bounds api/column_family: work around gcc9 bug in seastar::future<std::any>	2019-06-17 15:23:24 +03:00
Tomasz Grabiec	f798f724c8	frozen_mutation: Guard against unfreezing using wrong schema Currently, calling unfreeze() using the wrong version of the schema results in undefined behavior. That can cause hard-to-debug problems. Better to throw in such cases. Refs #4549. Tests: - unit (dev) Message-Id: <1560459022-23786-1-git-send-email-tgrabiec@scylladb.com>	2019-06-17 15:23:24 +03:00
Asias He	f32371727b	repair: Avoid copying position in to_repair_rows_list No need to make a copy because it is not used to construct repair_row any more since commit `9079790f85` (repair: Avoid writing row with same partition key and clustering key more than once). Use mf->position() instead. Refs: #4510 Backports: 3.1 Message-Id: <7b21edcc3368036b6357b5136314c0edc22ad4d2.1560753672.git.asias@scylladb.com>	2019-06-17 15:23:24 +03:00
Paweł Dziepak	483f66332b	dht/ring_position: silence complaints about uninitialised _token_bound	2019-06-17 13:11:20 +01:00
Paweł Dziepak	82b8450922	xx_hasher: disable -Warray-bounds In release mode gcc9 has a false positive warning about out of bound access in xxhash implementation: ./xxHash/xxhash.c:799:27: error: array subscript -3 is outside array bounds of ‘long unsigned int [1]’ [-Werror=array-bounds] This is solved by disabling -Warray-bounds in the xxhash code.	2019-06-17 13:09:54 +01:00
Paweł Dziepak	8a13d96203	api/column_family: work around gcc9 bug in seastar::future<std::any> There is a gcc9 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 that makes it impossible to pass std::any through a seastar::future<T>. Fortunately, there is only one user of seastar::future<std::any> in Scylla and it is not performance-critical. This patch avoids the gcc9 bug by using seastar::future<std::unique_ptr<std::any>>.	2019-06-17 13:06:28 +01:00
Glauber Costa	91b71a0b1a	do not allow multiple snapshot operations at the same time We saw a node crashing today with nodetool clearsnapshot being called. After investigation, the reason is that nodetool clearsnapshot was called at the same time a new snapshot was created with the same tag. nodetool clearsnapshot can't delete all files in the directory, because new files had by then been created in that directory, and crashes on I/O error. There are many problems with allowing those operations to proceed in parallel. Even if we fix the code not to crash and to return an error when the directory is not empty, the moment they do any amount of work in parallel the result of the operation becomes undefined. Some files in the snapshot may have been deleted by clear, for example, and a user may then not be able to properly restore from the backup if this snapshot was used to generate a backup. Moreover, although we could lock at the granularity of a keyspace or column family, I think we should use a big hammer here and lock the entire snapshot creation/deletion to avoid surprises (for example, if a user requests creation of a snapshot for all keyspaces, and another process requests clear of a single keyspace) Fixes #4554 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190614174438.9002-1-glauber@scylladb.com>	2019-06-16 10:30:13 +03:00
Rafael Ávila de Espíndola	44eb939aa6	Use the sanitizer flags from seastar In practice, we always want to use the same sanitizer flags with seastar and scylla. Seastar was already marking its sanitizer flags public, so what was missing was exporting the link flags via pkgconfig and dropping the duplicates from scylla. I am doing this after wasting some time editing the wrong file. This depends on the seastar patch to export the sanitizer flags in pkgconfig. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-16 09:21:10 +03:00
Takuya ASADA	f582a759ee	dist: merge /usr/lib/scylla to /opt/scylladb We used to use /opt/scylladb just for the Scylla build toolchain and dependency libraries, not for the main Scylla package. But since we merged the relocatable package, the main Scylla binary and dependency libraries are all located under /opt/scylladb, and only the setup scripts remained in /usr/lib/scylla. It is strange to keep using both /usr/lib/<app name> and /opt/<app name>; we should merge them into a single place. Message-Id: <20190614011038.17827-1-syuu@scylladb.com>	2019-06-14 21:03:36 +03:00
Piotr Jastrzebski	a41c9763a9	sstables: distinguish empty and missing cellpath Before this patch, the mc sstables writer was ignoring empty cellpaths. This is wrong behaviour because it is possible to have an empty key in a map. In such a case, our writer creates a wrong sstable that we can't read back. This is because a complex cell expects a cellpath for each simple cell it has. When the writer ignores an empty cellpath, it writes nothing; instead, it should write a length of zero to the file so that we know there's an empty cellpath. Fixes #4533 Tests: unit(release) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>	2019-06-14 20:36:41 +03:00
Asias He	9079790f85	repair: Avoid writing row with same partition key and clustering key more than once Consider master: row(pk=1, ck=1, col=10) follower1: row(pk=1, ck=1, col=20) follower2: row(pk=1, ck=1, col=30) When repair runs, the master fetches row(pk=1, ck=1, col=20) and row(pk=1, ck=1, col=30) from follower1 and follower2. Then the repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1, col=30) to follower1, so follower1 writes the row with the same pk=1, ck=1 twice, which violates uniqueness constraints. To fix this, we apply the row with the same pk and ck into the previous row. We only need this on the repair follower, because there the rows can come from multiple nodes. On the repair master, we have an sstable writer per follower, so the rows fed into an sstable writer can come from only a single node. Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test Fixes: #4510 Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>	2019-06-13 17:19:19 +02:00
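A minimal C++ sketch of "applying a row into the previous row with the same key", under simplifying assumptions that are not taken from the commit: rows arrive sorted by (pk, ck), each row has a single regular column, and "applying" means last-write-wins on a write timestamp. The `row` struct and `merge_same_key()` are hypothetical and do not reflect Scylla's actual `repair_row` or mutation application code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified row: real repair rows carry full mutation fragments.
struct row {
    int pk;             // partition key
    int ck;             // clustering key
    int col;            // a single regular column
    int64_t timestamp;  // write timestamp, used for last-write-wins
};

// Collapses rows that share (pk, ck) into a single row before writing.
// Assumes the input is sorted by (pk, ck), so rows with equal keys are adjacent.
std::vector<row> merge_same_key(const std::vector<row>& in) {
    std::vector<row> out;
    for (const row& r : in) {
        if (!out.empty() && out.back().pk == r.pk && out.back().ck == r.ck) {
            // Same key as the previous row: apply r into it instead of appending
            // a second row. On a timestamp tie this sketch keeps the earlier row.
            if (r.timestamp > out.back().timestamp) {
                out.back().col = r.col;
                out.back().timestamp = r.timestamp;
            }
        } else {
            out.push_back(r);
        }
    }
    return out;
}
```

If follower1 receives col=10 (timestamp 100) and col=30 (timestamp 300) for (pk=1, ck=1), the sketch produces a single row holding col=30, so the writer never sees the key twice.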
Asias He	912ce53fc5	repair: Allow repair_row to initialize partially On repair follower node, only decorated_key_with_hash and the mutation_fragment inside repair_row are used in apply_rows() to apply the rows to disk. Allow repair_row to initialize partially and throw if the uninitialized member is accessed to be safe. Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>	2019-06-13 17:18:53 +02:00
Benny Halevy	2fd2713fda	conf: update conf/scylla.yaml default large data warning thresholds They are currently inconsistent with db/config.cc and missing compaction_large_cell_warning_threshold_mb Fixes #4551 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613133657.15370-1-bhalevy@scylladb.com>	2019-06-13 16:45:27 +03:00
Benny Halevy	4ad06c7eeb	tests/perf: provide random-seed option Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613114307.31038-2-bhalevy@scylladb.com>	2019-06-13 14:45:49 +03:00
Benny Halevy	43e4631e6a	tests: random-utils: use seastar::testing::local_random_engine To provide test reproducibility use the seastar local_random_engine. To reproduce a run, use the --random-seed command line option with the seed printed accordingly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613114307.31038-1-bhalevy@scylladb.com>	2019-06-13 14:45:48 +03:00
Benny Halevy	fe2d629e20	mutation_reader_test: test_multishard_combining_reader_reading_empty_table: fix non-atomic sharing of shards_touched It needs to be a std::vector<std::atomic<bool>>, otherwise threads step on each other in shared memory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613112359.21884-1-bhalevy@scylladb.com>	2019-06-13 14:44:43 +03:00
Piotr Sarna	2c2122e057	tests: add a test case for filtering clustering key The test case makes sure that clustering key restriction columns are fetched for filtering if they form a clustering key prefix, but not a primary key prefix (partition key columns are missing). Ref #4541 Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:56 +03:00
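The commit does not say which container `shards_touched` used before. One common way per-worker flags end up racing is a plain `std::vector<bool>`, which packs several flags into one machine word, so writes to different indices from different threads touch the same memory. The sketch below assumes that scenario and uses plain `std::thread` workers as a stand-in for shards; `all_touched()` is a hypothetical helper, not code from the test.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: each worker thread marks its own slot, then we check that
// every slot was marked. std::vector<std::atomic<bool>> gives every flag its own
// atomic object, so concurrent stores to different indices cannot interfere.
// (A std::vector<bool> would pack flags into shared words and race.)
bool all_touched(std::size_t n_workers) {
    // vector(n) value-initializes each atomic, so every flag starts as false.
    std::vector<std::atomic<bool>> touched(n_workers);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n_workers; ++i) {
        workers.emplace_back([&touched, i] { touched[i].store(true); });
    }
    for (auto& t : workers) {
        t.join();  // join() also makes the workers' stores visible here
    }
    for (const auto& flag : touched) {
        if (!flag.load()) {
            return false;
        }
    }
    return true;
}
```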
Piotr Sarna	c4b935780b	cql3: fix qualifying clustering key restrictions for filtering Clustering key restrictions can sometimes avoid filtering if they form a prefix, but that can happen only if the whole partition key is restricted as well. Ref #4541 Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:47 +03:00
Piotr Sarna	adeea0a022	cql3: fix fetching clustering key columns for filtering When a column is not present in the select clause, but used for filtering, it usually needs to be fetched from replicas. Sometimes it can be avoided, e.g. if primary key columns form a valid prefix - then, they will be optimized out before filtering itself. However, clustering key prefix can only be qualified for this optimization if the whole partition key is restricted - otherwise the clustering columns still need to be present for filtering. This commit also fixes tests in cql_query_test suite, because they now expect more values - columns fetched for filtering will be present as well (only internally, the clients receive only data they asked for). Fixes #4541 Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:37 +03:00
Glauber Costa	8a3fe3ac9b	debian: correctly relocate python scripts Relocation of python scripts mentions scylla-server in paths explicitly. It should use {{product}} instead. The current build is failing when {{product}} is different than scylla-server Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190613012518.28784-1-glauber@scylladb.com>	2019-06-13 09:39:36 +03:00
Takuya ASADA	b1226fb15a	dist/docker/redhat: change user of scylla services to 'scylla' On branch-3.1 / master, we are getting the following errors: ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) It seems the owner verification of the data directory fails because the scylla-server process runs as root while the data directory is owned by scylla, so we should run the services as the scylla user. Fixes #4536 Message-Id: <20190611113142.23599-1-syuu@scylladb.com>	2019-06-12 20:29:06 +03:00
Takuya ASADA	60d8a99f05	dist/common/scripts/scylla_setup: verify system umask is acceptable for scylla-server To avoid a 'Bad permissions' error when the user has changed the default umask, we need to verify that the system umask is acceptable for scylla-server. Fixes #4157 Message-Id: <20190612130343.6043-1-syuu@scylladb.com>	2019-06-12 20:29:06 +03:00
Avi Kivity	cac812661c	Update seastar submodule * seastar 253d6cb...ded50bd (14): > Only export sanitizer flags if used > perftune.py: use pyudev.Devices methods instead of deprecated pyudev.Device ones > Add a Sanitize build mode > Merge "perftune.py : new tuning modes" from Vlad > reactor: clarify how submit_to() destroys the function object > Export the sanitizer flags via pkgconfig > smp: Delete unprocessed work items > iotune: fixed finding mountpoint infinite loop > net: Fix dereferencing moved object > Always enable the exception scalability hack > Merge "Simple cleanups in future.hh" from Rafael > tests: introduce testing::local_random_engine > core/deleter: Fix abort when append() is called twice with a shared deleter > rpc stream: do not crash if a stream is used after eos	2019-06-12 20:28:48 +03:00
Asias He	b463d7039c	repair: Introduce get_combined_row_hash_response Currently, the REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the repair_hash object. In the future, we will use a set reconciliation algorithm to decode the full row hashes in the working row buf. It is useful to return the number of rows inside the working row buf in addition to the combined row hashes to make sure the decode is successful. It is also better to use a wrapper class for the verb response so we can extend the return values later more easily with IDL. Fixes #4526 Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>	2019-06-12 13:51:29 +03:00
Takuya ASADA	30414d9c23	dist/ami: install scylla debug symbols by default On AMI creation, install scylla-debuginfo by default. closes #4542 Message-Id: <20190612102355.21386-1-syuu@scylladb.com>	2019-06-12 13:49:46 +03:00
Eliran Sinvani	2b44d8ed42	cql: Allow user manipulation queries to use cql keywords for a name This commit allows the CREATE/DROP/ALTER USER cql queries to use cql keywords for the user name (for example "empty"). Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190612104301.8322-1-eliransin@scylladb.com>	2019-06-12 13:48:10 +03:00
Dejan Mircevski	a52a56bfc0	utils: Add like_matcher A utility for matching text with LIKE patterns, and a battery of tests. Tests: unit(dev,debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-06-12 13:14:53 +03:00
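For readers unfamiliar with LIKE semantics, here is a minimal C++ sketch of a matcher in which `%` matches any sequence (including an empty one) and `_` matches exactly one character. It is not Scylla's `like_matcher`: it ignores escape characters and compares bytes rather than UTF-8 code points, and `like_match()` is a hypothetical name. It uses the greedy backtracking approach common for wildcard matching, which only needs to remember the most recent `%`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Minimal LIKE matcher: '%' matches any sequence of characters (possibly empty),
// '_' matches exactly one character. No escape handling; compares bytes, so a
// multi-byte UTF-8 character counts as several characters for '_'.
bool like_match(std::string_view text, std::string_view pattern) {
    std::size_t t = 0, p = 0;
    std::size_t last_pct = std::string_view::npos;  // index of the last '%' seen
    std::size_t resume_t = 0;  // text index to retry from after that '%'
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            // Tentatively let '%' match the empty string; remember where it was.
            last_pct = p++;
            resume_t = t;
        } else if (p < pattern.size() && (pattern[p] == '_' || pattern[p] == text[t])) {
            ++t;
            ++p;
        } else if (last_pct != std::string_view::npos) {
            // Mismatch: backtrack and let the last '%' absorb one more character.
            p = last_pct + 1;
            t = ++resume_t;
        } else {
            return false;
        }
    }
    // Remaining pattern characters can only match the empty tail if all are '%'.
    while (p < pattern.size() && pattern[p] == '%') {
        ++p;
    }
    return p == pattern.size();
}
```

Checking `%` before the literal comparison matters: otherwise a `%` in the pattern would be consumed as a literal whenever the text happens to contain `%` at that position.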
Piotr Sarna	7b2de7ac5b	tests: add case for partition key index and filtering The test ensures that partition key index does not influence filtering decisions for regular columns. Ref #4539	2019-06-12 11:53:02 +02:00
Rafael Ávila de Espíndola	bf87b7e1df	logalloc: Use asan to poison free areas With this patch, when using asan, we poison segment memory that has been allocated from the system but should not be accessible to user code. This should help with debugging use-after-free bugs. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190607140313.5988-1-espindola@scylladb.com>	2019-06-12 11:46:45 +02:00
Piotr Sarna	adc51e57c1	cql3: fix deciding if a query uses indexing The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary). Fixes #4539	2019-06-12 11:44:16 +02:00
Raphael S. Carvalho	62aa0ea3fa	sstables: fix log of failure on large data entry deletion by fixing use-after-move Fixes #4532. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>	2019-06-12 10:55:46 +03:00
Juliana Oliveira	43f92ae6d5	cql: functions: add min/max/count for boolean type Explicitly add min/max/count functions and tests for boolean type. Tests: unit (release) Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20190612015215.GA2618@shenzou.localdomain>	2019-06-12 10:11:08 +03:00
Benny Halevy	3ad005ba17	build-ami: fix branch detection failure when not in git tree Introduced in `513d01d53e` The script is trying to determine the branch to shallow clone when an rpm is missing and has to be built. This functionality in the current implementation assumes it is being run inside a git repository, but that need not be the case if the script is triggered after local rpms were placed in the local directory. This happens when putting all necessary rpm files in: dist/ami/files And then running: dist/ami/build_ami.sh --localrpm The dist/ami/ and dist/ami/files directories are the only ones required for this action, so querying the git repository in that situation makes no sense. Fixes #4535 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190611112455.13862-1-bhalevy@scylladb.com>	2019-06-11 19:08:02 +03:00
Piotr Sarna	1a5e5433bf	cql3: make add_restriction helper functions public In order to allow building statement restrictions manually instead of providing WHERE clause from CQL layer, helper functions that add single restrictions are made public. Message-Id: <31fa23a5e5ef927128f23b9fcb3362a2582d86bb.1560237237.git.sarna@scylladb.com>	2019-06-11 16:01:35 +03:00
Tomasz Grabiec	8c4baab81e	Merge "view: ignore duplicated key entries in progress virtual reader" from Piotr S. Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from tgrabiec@scylladb.com	2019-06-11 13:55:25 +02:00
Piotr Sarna	85a3a4b458	view: ignore duplicated key entries in progress virtual reader Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>	2019-06-11 13:01:31 +02:00
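A minimal C++ sketch of the deduplication idea, under assumptions not spelled out in the commit: the Scylla-specific table is modelled as partition key `keyspace_name` and clustering key (`view_name`, `cpu_id`) plus a single progress value, and rows arrive sorted, so rows that collide once `cpu_id` is trimmed are adjacent. The structs and `trim_and_dedup()` are hypothetical and do not reflect the real virtual reader.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified row of the Scylla-specific table: partition key
// keyspace_name, clustering key (view_name, cpu_id), plus some progress value.
struct scylla_row {
    std::string keyspace_name;
    std::string view_name;
    int cpu_id;
    long progress;
};

// The legacy representation: the same row with cpu_id trimmed from the key.
struct legacy_row {
    std::string keyspace_name;
    std::string view_name;
    long progress;
};

// Trims cpu_id and drops any row whose trimmed key equals the previous one.
// Assumes input sorted by (keyspace_name, view_name, cpu_id), so rows that
// collide after trimming are adjacent; the first (lowest cpu_id) one wins.
std::vector<legacy_row> trim_and_dedup(const std::vector<scylla_row>& rows) {
    std::vector<legacy_row> out;
    std::optional<legacy_row> prev;
    for (const auto& r : rows) {
        if (prev && prev->keyspace_name == r.keyspace_name && prev->view_name == r.view_name) {
            continue;  // emitting this would duplicate a clustering key
        }
        prev = legacy_row{r.keyspace_name, r.view_name, r.progress};
        out.push_back(*prev);
    }
    return out;
}
```

As the commit message describes, when several shards are still building the same view, only one of them (here, the one with the lowest `cpu_id`) is reported.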
Nadav Har'El	5ef928a63d	coding-style.md: mention "using namespace seastar" All Scylla code is written with "using namespace seastar", i.e., no "seastar::" prefix for Seastar symbols. Document this in the coding style. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190610203948.18075-1-nyh@scylladb.com>	2019-06-11 10:39:03 +03:00
Calle Wilund	26702612f3	api.hh: Fix bool parsing in req_param Fixes #4525 req_param uses boost::lexical_cast to convert text->var. However, lexical_cast does not handle textual booleans, thus param=true causes not only wrong values, but exceptions. Message-Id: <20190610140511.15478-1-calle@scylladb.com>	2019-06-10 17:11:47 +03:00
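`boost::lexical_cast<bool>` accepts only `"0"` and `"1"`, so a value such as `"true"` makes it throw `boost::bad_lexical_cast`. A minimal sketch of a parser that also accepts the textual forms, assuming case-sensitive `true`/`false` plus `1`/`0` are the values the API should accept; `parse_bool_param()` is hypothetical and is not the actual change made to `req_param`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper: converts a query-string value to bool.
// Accepts "true"/"false" as well as "1"/"0"; anything else yields nullopt so the
// caller can reply with a client error instead of letting an exception escape.
std::optional<bool> parse_bool_param(std::string_view s) {
    if (s == "true" || s == "1") {
        return true;
    }
    if (s == "false" || s == "0") {
        return false;
    }
    return std::nullopt;
}
```

Returning `std::optional` keeps the "unrecognized value" case explicit at the call site rather than relying on an exception type from the conversion library.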
Gleb Natapov	9213d56a06	storage_proxy: align background and foreground repair metric names One is plural another is not, make them all plural. Message-Id: <20190605135940.GI25001@scylladb.com>	2019-06-10 11:34:36 +03:00
Benny Halevy	2017de9387	build-ami: delete extra parenthesis in branch_arg calculation Fixing a typo Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190610062113.5604-1-bhalevy@scylladb.com>	2019-06-10 11:29:44 +03:00
Avi Kivity	591d2968cc	storage_proxy: limit resources consumed in cross-shard operations Currently, each shard protects itself by not reading from rpc and the native transport if in-flight requests consume too much memory for that shard. However, if all shards then forward their requests to some other shard, then that shard can easily run out of memory since its load can be multiplied by the number of shards that send it requests. To protect against this, use the new Seastar smp_service_group infrastructure. We create three groups: read, write, and write ack (the latter is needed to avoid ABBA deadlocks if shard A exhausts all its resources sending writes to shard B, and shard B simultaneously does the same; neither will be able to send acknowledgements, so if the writes are throttled, they will never be unthrottled until a timeout occurs). Range scans are not addressed by this patch since they are handled by multishard_mutation_query, which has its own complex cross-shard communication scheme, but they will need a similar solution. Ref #1105 (missing range scan protection) Tests: unit (dev) Message-Id: <20190512142243.17795-1-avi@scylladb.com>	2019-06-07 10:53:23 +02:00
Vlad Zolotarov	20a610f6bc	fix_system_distributed_tables.py: declare the 'port' argument as 'int' If the port value is passed as a string, cluster.connect() fails with Python 3.4. Let's fix this by explicitly declaring the 'port' argument as 'int'. Fixes #4527 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190606133321.28225-1-vladz@scylladb.com>	2019-06-06 20:19:57 +03:00
Benny Halevy	c188f838bc	build-ami: use ssh git URLs Rather than https, for cert-based passwordless access. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190606133648.15877-2-bhalevy@scylladb.com>	2019-06-06 20:02:13 +03:00
Benny Halevy	513d01d53e	build-ami: use current git branch for shallow-clone of other repos We want to use the same branch on the other repos build-ami needs as the one we're building for. Automatically find the current branch using the `git branch` command. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190606133648.15877-1-bhalevy@scylladb.com>	2019-06-06 20:02:13 +03:00
Juliana Oliveira	fd83f61556	Add a warning for partitions with too many rows This patch adds a warning option to the user for situations where the row count may grow larger than initially designed. Through the warning, users can be aware of possible data modeling problems. The threshold is initially set to '100,000'. Tests: unit (dev) Message-Id: <20190528075612.GA24671@shenzou.localdomain>	2019-06-06 19:48:57 +03:00
Piotr Sarna	74f6ab7599	db: drop unnecessary double computation when feeding hash When feeding hash for schema digest, compact_for_schema_digest is mistakenly called twice, which may result in needless recomputation. Message-Id: <8f52201cf428a55e7057d8438025275023eb9288.1559826555.git.sarna@scylladb.com>	2019-06-06 16:16:47 +03:00
Rafael Ávila de Espíndola	b3adabda2d	Reduce logalloc differences between debug and release A lot of code in scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR is not defined. In particular, refill_emergency_reserve in the default allocator case is empty, but in the seastar allocator case it compacts segments. I am trying to debug a crash that seems to involve memory corruption around the lsa allocator, and being able to use a debug build for that would be awesome. This patch reduces the differences between the two cases by having a common segment_pool that defers only a few operations to different segment_store implementations. Tests: unit (debug, dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190606020937.118205-1-espindola@scylladb.com>	2019-06-06 12:55:56 +03:00
Nadav Har'El	95bab04cf9	docs/metrics.md: "instance" label no longer comes from Scylla Prometheus needs to remember which "instance" (node) each measurement came from. But it doesn't actually need Scylla to tell it the instance name - it knows which node it got each measurement from. After Seastar commit `79281ef287` which fixed Seastar issue https://github.com/scylladb/seastar/issues/477, the "instance" label on measurements no longer comes from Scylla but rather is added by Prometheus. This patch corrects the documentation to explain the current situation, instead of incorrectly saying that Scylla adds the "instance" label itself. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190602074629.14336-1-nyh@scylladb.com>	2019-06-06 12:42:30 +03:00
Piotr Sarna	f50f418066	types: isolate deserializing iterator to separate file In order to be used outside types.cc, listlike deserializing iterator is moved to a separate header. Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>	2019-06-05 17:46:51 +03:00
Pekka Enberg	eb00095bca	relocate_python_scripts.py: Fix node-exporter install on Debian variants The relocatable Python is built from Fedora packages. Unfortunately TLS certificates are in a different location on Debian variants, which causes "node_exporter_install" to fail as follows: Traceback (most recent call last): File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module> data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True) File "/usr/lib/scylla/scylla_util.py", line 40, in curl with urllib.request.urlopen(req) as res: File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen return opener.open(url, data, timeout) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open response = self._open(req, data) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open '_open', req) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain result = func(*args) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open context=self._context, check_hostname=self._check_hostname) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)> Unable to retrieve version information node exporter setup failed. Fix the problem by overriding the SSL_CERT_FILE environment variable to point to the correct location of the TLS bundle. Message-Id: <20190604175434.24534-1-penberg@scylladb.com>	2019-06-04 21:12:21 +03:00
Piotr Sarna	b3396dbb57	types: migrate to_json_string to use bytes view The to_json_string utility implementation was based on const references instead of views, which can be a source of unnecessary memory copying. This patch migrates all to_json_string to use bytes_view and leaves the const reference version as a thin wrapper. Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>	2019-06-04 19:17:46 +03:00
Avi Kivity	06d77aa548	Merge "Introduce queue reader" from Botond " Technically queue_reader already exists, however so far it was a private utility in `multishard_writer.cc`. This mini-series makes it public and generally useful. The interface is made safer and simpler and the implementation is improved so it doesn't have two separate buffers. Also, unit tests are added. Tests: mutation_reader_test:debug/test_queue_reader, multishard_writer_test:debug " * 'queue_reader/v2' of https://github.com/denesb/scylla: queue_reader: use the reader's buffer as the queue Make queue_reader public	2019-06-04 13:46:15 +03:00
Botond Dénes	2ccd8ee47c	queue_reader: use the reader's buffer as the queue The queue reader currently uses two buffers, a `_queue` that the producer pushes fragments into and its internal `_buffer` where these fragments eventually end up being served to the consumer from. This double buffering is not necessary. Change the reader to allow the producer to push fragments directly into the internal `_buffer`. This complicates the code a little bit, as the producer logic of `seastar::queue` has to be folded into the queue reader. On the other hand this introduces proper memory consumption management, as well as reduces the amount of consumed memory and eliminates the possibility of outside code meddling with the queue. Another big advantage of the change is that there is now an explicit way to communicate the EOS condition, no need to push a disengaged `mutation_fragment_opt`. The producer of the queue reader now pushes the fragments into the reader via an opaque `queue_reader_handle` object, which has the producer methods of `seastar::queue`. Existing users of queue readers are refactored to use the new interface. Since the code is more complex now, unit tests are added as well.	2019-06-04 13:39:26 +03:00
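The single-buffer design with an explicit end-of-stream signal can be modelled synchronously as below. `toy_queue_reader` and `toy_queue_reader_handle` are hypothetical types using strings in place of mutation fragments. The real reader is asynchronous (future-based) and enforces memory limits on the producer; both are omitted from this sketch.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical single-buffer reader: the producer writes straight into the
// buffer the consumer reads from, and end-of-stream is an explicit flag
// rather than a pushed empty optional.
class toy_queue_reader {
    std::deque<std::string> _buffer;
    bool _end_of_stream = false;
public:
    void push(std::string fragment) { _buffer.push_back(std::move(fragment)); }
    void push_end_of_stream() { _end_of_stream = true; }

    // Consumer side: next buffered fragment, or nullopt if none is buffered.
    std::optional<std::string> pop() {
        if (_buffer.empty()) {
            return std::nullopt;
        }
        std::string f = std::move(_buffer.front());
        _buffer.pop_front();
        return f;
    }
    // Exhausted only after EOS was signalled and the buffer is drained.
    bool is_end_of_stream() const { return _end_of_stream && _buffer.empty(); }
};

// Opaque producer handle: exposes only the producer-side operations.
class toy_queue_reader_handle {
    toy_queue_reader* _reader;
public:
    explicit toy_queue_reader_handle(toy_queue_reader& r) : _reader(&r) {}
    void push(std::string f) { _reader->push(std::move(f)); }
    void push_end_of_stream() { _reader->push_end_of_stream(); }
};
```

Handing the producer only the handle keeps consumer-side state, such as the buffer, out of the producer's reach.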
Glauber Costa	cbaea172cd	python3: add the cassandra driver to the relocatable package We have a script in tree that fixes the schema for distributed system tables, like tracing, should they change their schema. We use it all the time, but unfortunately it is not distributed with the scylla package, which makes using it harder (we want to do this in the server, but consistent updates will take a while). One of the problems with the script today that makes distributing it harder is that it uses the python3 cassandra driver, which we don't want to have as a server dependency. But now with the relocatable packages in place there is no reason not to just add it. [avi: adjust tools/toolchain/image to point to a new image with python3-cassandra-driver] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190603162447.24215-1-glauber@scylladb.com>	2019-06-03 19:34:55 +03:00
Konstantin Osipov	29c27bfc28	storage_proxy: remove unnecessary lambdas in metrics binding Remove unnecessary lambdas when binding metrics of the storage proxy. Message-Id: <20190603133753.1724-1-kostja@scylladb.com>	2019-06-03 16:55:16 +03:00
Botond Dénes	a597e46792	Make queue_reader public Extract it from `multishard_writer.cc` and move it to `mutation_reader.{hh,cc}` so other code can start using it too.	2019-06-03 12:08:37 +03:00
Takuya ASADA	25112408a7	dist/debian: support relocatable python3 on Debian variants Unlike CentOS, Debian variants have a python3 package in their official repositories, so we don't have to use relocatable python3 on these distributions. However, the official python3 version differs between distributions, which may cause issues. Also, our scripts and packaging implementation increasingly presuppose the existence of relocatable python3, which is causing issues on Debian variants. Switching to relocatable python3 on Debian variants avoids these issues and makes it easier to manage Scylla python3 environments across multiple distributions. Fixes #4495 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190531112707.20082-1-syuu@scylladb.com>	2019-06-02 14:59:43 +03:00
Raphael S. Carvalho	f360d5a936	sstables: export output operator for sstable run It wasn't being exported in any header. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190527182246.19007-1-raphaelsc@scylladb.com>	2019-06-02 10:25:51 +03:00
Avi Kivity	7a0c6cd583	Revert "dist/debian: support relocatable python3 on Debian variants" This reverts commit `4d119cbd6d`. It breaks build_deb.sh: 18:39:56 + seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py 18:39:56 Traceback (most recent call last): 18:39:56 File "./relocate_python_scripts.py", line 116, in <module> 18:39:56 fixup_scripts(archive, args.scripts) 18:39:56 File "./relocate_python_scripts.py", line 104, in fixup_scripts 18:39:56 fixup_script(output, script) 18:39:56 File "./relocate_python_scripts.py", line 79, in fixup_script 18:39:56 orig_stat = os.stat(script) 18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+' 18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1	2019-05-29 13:58:41 +03:00
Konstantin Osipov	fcd52d6187	Update README.md with more recent build instructions on Ubuntu Building on Ubuntu 18 or 19 following the current build instructions doesn't work. Add information about a few pitfalls. Switch README.md to recommending dbuild and move the details to HACKING.md. Message-Id: <20190520152738.GA15198@atlas>	2019-05-29 12:26:12 +03:00
Takuya ASADA	4d119cbd6d	dist/debian: support relocatable python3 on Debian variants Unlike CentOS, Debian variants have a python3 package in their official repositories, so we don't have to use relocatable python3 on these distributions. However, the official python3 version differs between distributions, which may cause issues. Also, our scripts and packaging implementation increasingly presuppose the existence of relocatable python3, which is causing issues on Debian variants. Switching to relocatable python3 on Debian variants avoids these issues and makes it easier to manage Scylla python3 environments across multiple distributions. Fixes #4495 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190526105138.677-1-syuu@scylladb.com>	2019-05-26 13:56:30 +03:00
Glauber Costa	71c4375a66	scylla_io_setup: adjust values for i3en instances Apparently we are having some issues running iotune on the i3en instances, as the values do not always make sense. We believe it is something that XFS is doing, and running fio directly on the device (no filesystem) provides more meaningful results (more in line with the values AWS publishes). For now, let's use fio instead. In this patch I have run fio for our 4 dimensions on each of the three types of disks (large, xlarge, 3xlarge). Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190524111454.27956-1-glauber@scylladb.com>	2019-05-24 19:37:58 +03:00
Avi Kivity	53dfaf9121	Update seastar submodule * seastar 5cb1234b0...253d6cb69 (3): > reactor: disable nowait aio again > Merge "Restructure `timer` implementations to avoid circular dependencies" from Jesse > Fix build command in building-docker.md	2019-05-24 14:33:05 +03:00
Raphael S. Carvalho	cabeb12b4e	sstables: add output operator for sstable run The output looks as follows: Run = { Identifier: 647044fd-d3d4-43c4-b014-b546943ead0d Fragments = { 1471=-9223317893235177836:-7063220874380325121 1478=5924386327138804918:8070482595977135657 1472=-7063202587832032132:-4903425074566642766 1473=-4903298949436784325:-2739716797579745183 1474=-2739703419744073436:-589328117804966275 1477=3734534455848060136:5924372906965333873 1476=1579822226461317527:3734518878340722529 1475=-589322393539097068:1579813857236466583 1479=8070499046054048682:9223317594733741806 } } Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190524043331.5093-1-raphaelsc@scylladb.com>	2019-05-24 08:36:08 +03:00
Paweł Dziepak	899ebe483a	Merge "Fix empty counters handling in MC" from Piotr " Before this patchset empty counters were incorrectly persisted for MC format. No value was written to disk for them. The correct way is to still write a header that informs the counter is empty. We also need to make sure that reading wrongly persisted empty counters works because customers may have sstables with wrongly persisted empty counters. Fixes #4363 " * 'haaawk/4363/v3' of github.com:scylladb/seastar-dev: sstables: add test for empty counters docs: add CorrectEmptyCounters to sstable-scylla-format sstables: Add a feature for empty counters in Scylla.db. sstables: Write header for empty counters sstables: Remove unused variables in make_counter_cell sstables: Handle empty counter value in read path	2019-05-23 13:05:53 +01:00
Piotr Jastrzebski	fdbf4f6f53	sstables: add test for empty counters Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	e91e1a1dde	docs: add CorrectEmptyCounters to sstable-scylla-format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	a962696e44	sstables: Add a feature for empty counters in Scylla.db. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	b35030ae7e	sstables: Write header for empty counters When storing an empty counter we should still write its header that indicates the emptiness. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:08 +02:00
Amnon Heiman	f3b6c5fe2f	API: storage_proxy add CAS and View endpoints Some nodetool commands in 3.0 use the CAS and View metrics. CAS is not implemented and we don't have all the metrics for View, but we still don't want those nodetool commands to fail. After this patch the following would work and will return empty: curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_read/moving_average_histogram' curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/view_write/moving_average_histogram' curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_write/moving_average_histogram' This patch is needed for #4416 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190521141235.20856-1-amnon@scylladb.com>	2019-05-22 14:25:17 +03:00
Avi Kivity	698f52d257	Merge "tests: Replace ad-hoc cql utilities with general ones" from Dejan " One local utility function in cql_query_test.cc duplicates an existing exception_predicate member. Another can be generalized for wider use in the future. This patch accomplishes both, retiring a to-do item. Tests: unit (dev) " * 'use-utils-predicate-in-cql_test' of https://github.com/dekimir/scylla: tests/cql: Replace equery() with cquery_nofail() tests: Add cquery_nofail() utility tests: Drop redundant function	2019-05-22 10:09:12 +03:00
Dejan Mircevski	09acb32d35	tests/cql: Replace equery() with cquery_nofail() Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:38:09 -04:00
Dejan Mircevski	a9849ecba7	tests: Add cquery_nofail() utility Most tests await the result of cql_test_env::execute_cql(). Most would also benefit from reporting errors with top-level location included. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:28:14 -04:00
Dejan Mircevski	1d8bfc4173	tests: Drop redundant function make_predicate_for_exception_message_fragment() is redundant now that exception_utils has landed. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:28:14 -04:00
Avi Kivity	d481521a2e	Update seastar submodule * seastar 3f7a5e1...5cb1234 (5): > build: Help Seastar to find Boost on Fedora 30 > Merge 'Reinstate nowait aio support' from Avi > Fix documentation link in README.md > sharded: add variants to invoke_on() that accept an smp_service_group > improve error message on AIO setup failure	2019-05-21 20:15:09 +03:00
Benny Halevy	fae4ca756c	cql3: select_statement: provide default initializer for parameters::_bypass_cache Fixes #4503 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>	2019-05-21 20:06:40 +03:00
Piotr Jastrzebski	a6484b28a1	sstables: Remove unused variables in make_counter_cell Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-21 12:07:31 +02:00
Piotr Jastrzebski	f711cce024	sstables: Handle empty counter value in read path Due to a bug in an sstable writer, empty counters were stored without a header. The correct way of storing an empty counter is to still write a header that indicates the emptiness. The next patch in this series fixes the write path, but we have to make sure that we handle incorrectly serialized counters in the read path because there may exist sstables with counters stored without a header. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-21 12:07:12 +02:00
Takuya ASADA	a55330a10b	dist/ami: output scylla version information to AMI tags and description Users may want to know which version of packages are used for the AMI, it's good to have it on AMI tags and description. To do this, we need to download .rpm from specified .repo, extract version information from .rpm. Fixes #4499 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190520123924.14060-2-syuu@scylladb.com>	2019-05-20 15:46:06 +03:00
Takuya ASADA	abe44c28c5	dist/ami: build scylla-python3 when specified --localrpm Since we switched to relocatable python3, we need to build it for AMI too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190520123924.14060-1-syuu@scylladb.com>	2019-05-20 15:46:05 +03:00
Konstantin Osipov	25087536bc	main: developer-mode configuration option uses dash, not underscore Message-Id: <20190520115524.101871-1-kostja@scylladb.com>	2019-05-20 15:14:11 +03:00
Calle Wilund	1e37e1d40c	commitlog: Add optional use of O_DSYNC mode Refs #3929 Optionally enables O_DSYNC mode for segment files, and when enabled, skips actual flushing and just barriers any ongoing writes. Iff using O_DSYNC mode, we will not only truncate the file to max size, but also do an actual initial write of zeros to it, since XFS (the intended target) behaves observably worse on non-physical file blocks. Once written (and maybe recycled) we should have rather satisfying throughput on writes. Note that the O_DSYNC behaviour is hidden behind a default-disabled option. While users should seldom need to worry about this, we should add some sort of logic in main/init that, unless specified by the user, evaluates the commitlog disk and sets this to true if it is using XFS and looks ok. This is because using O_DSYNC on things like EXT4 has quite horrible performance. All above statements about performance and O_DSYNC behaviour are based on a sampling of benchmark results (modified fsqual) on a statistically non-significant selection of disks. However, at least there the observed behaviour shows a rather large difference between ::fallocate:ed disk area vs. area actually written using O_DSYNC on XFS, and O_DSYNC on EXT4. Note also that measurements of O_DSYNC vs. no O_DSYNC do not take into account the wall-clock time of doing a manual disk flush. This is intentionally ignored, since in the commitlog case, at least using periodic mode, flushes are relatively rare. Message-Id: <20190520120331.10229-1-calle@scylladb.com>	2019-05-20 15:10:48 +03:00
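Opening a segment with `O_DSYNC` and writing zeros over its full size up front can be sketched with plain POSIX calls. This is not Scylla's commitlog code, which goes through Seastar's file API. `open_dsync_prezeroed`, `file_size`, and the 128 KiB chunk size are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: open a segment file with O_DSYNC and fill it with
// zeros up to `size`, so later overwrites land on physically written blocks
// instead of unwritten (fallocate-style) extents.
int open_dsync_prezeroed(const std::string& path, off_t size) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0) {
        throw std::runtime_error("open failed, errno " + std::to_string(errno));
    }
    const std::vector<char> zeros(128 * 1024, 0); // chunk size is arbitrary
    off_t written = 0;
    while (written < size) {
        const auto chunk = static_cast<std::size_t>(
            std::min<off_t>(size - written, static_cast<off_t>(zeros.size())));
        // With O_DSYNC, each pwrite returns only once the data is durable.
        const ssize_t n = ::pwrite(fd, zeros.data(), chunk, written);
        if (n < 0) {
            if (errno == EINTR) {
                continue; // interrupted by a signal; retry the same chunk
            }
            const int err = errno;
            ::close(fd);
            throw std::runtime_error("pwrite failed, errno " + std::to_string(err));
        }
        written += n;
    }
    return fd;
}

// Returns the current size of an open file, or -1 on error.
off_t file_size(int fd) {
    struct stat st;
    return ::fstat(fd, &st) == 0 ? st.st_size : -1;
}
```

Pre-writing costs one full pass over the segment at creation time. The commit message argues that this cost is amortized when segments are recycled.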
Avi Kivity	d92973ba86	Merge "scylla-gdb.py: scylla_fiber: add fallback mode" from Botond " Add a fallback-mode that can be used when the `scylla ptr` cannot be used, either because the application is not built with the seastar allocator, or due to bugs. The fallback mode relies on a more primitive method for determining how much memory to scan looking for task pointers inside the task object. This mode, being more primitive, is less prone to errors, but is more wasteful and less precise. " * 'scylla-fiber-fallback-mode/v2' of https://github.com/denesb/scylla: scylla-gdb.py: scylla_fiber: add fallback mode scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used() scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers scylla-gdb.py: scylla_fiber: fix misaligned text in docstring	2019-05-19 18:34:55 +03:00
Takuya ASADA	4b08a3f906	reloc/python3: add license files on relocatable python3 package It's better to have license files on our python3 distribution. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190516094329.13273-1-syuu@scylladb.com>	2019-05-19 18:30:19 +03:00
Jesse Haber-Kucharsky	68353a8265	build: Don't build `iotune` unconditionally We compile Seastar unconditionally so that changes to Seastar files are reflected in Scylla when it's built. We don't need to unconditionally build `iotune` in the same way. `iotune` is still listed as a build artifact, so it will be built if `ninja` is invoked without a particular target. However, building a specific target (like `ninja build/dev/scylla`) will not build `iotune`. Fixes #4165 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <9fb96a281580a8743e04d5dd11398be53960cb58.1558100815.git.jhaberku@scylladb.com>	2019-05-19 18:24:05 +03:00
Avi Kivity	5a276d44af	Merge "row_cache: Make invalidate() preemptible" from Tomasz " This patchset fixes reactor stalls caused by cache invalidation not being preemptible. This becomes a problem when there are a lot of partitions in cache inside the invalidated range. This affects high-level operations like nodetool refresh, table truncation, repair and streaming. Fixes #2683 The improvement on stalls was measured using tests/perf_row_cache_update: Before: Small partitions, no overwrites: invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]} Small partition with a few rows: invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]} Large partition, lots of small rows: invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]} After: Small partitions, no overwrites: invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]} Small partition with a few rows: invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]} Large partition, lots of small rows: invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]} The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota). Tests: - unit (dev) " * tag 'cache-preemptible-invalidation-v2' of github.com:tgrabiec/scylla: row_cache: Make invalidate() preemptible row_cache: Switch _prev_snapshot_pos to be a ring_position_ext dht: Introduce ring_position_ext dht: ring_position_view: Take key by const pointer tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion tests: perf_row_cache_update: Report stalls around invalidation	2019-05-19 10:47:46 +03:00
Takuya ASADA	f625284113	dist/debian: apply product name variable on override_dh_auto_install To make product name templatization work correctly, we cannot use "debian/scylla-server" as the package contents directory path; we need to use a template like "debian/{{product}}-server" instead. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190517121946.18248-1-syuu@scylladb.com>	2019-05-19 10:46:08 +03:00
Gleb Natapov	31bf4cfb5e	cache_hitrate_calculator: make cache hitrate calculation preemptable The calculation is done in a non-preemptable loop over all tables, so if the number of tables is very large it may take a while, since we also build a string for gossiper state. Make the loop preemptable and also make the string calculation more efficient by preallocating memory for it. Message-Id: <20190516132748.6469-3-gleb@scylladb.com>	2019-05-16 15:32:36 +02:00
Gleb Natapov	4517c56a57	cache_hitrate_calculator: do not copy stats map for each cpu invoke_on_all() copies provided function for each shard it is executed on, so by moving stats map into the capture we copy it for each shard too. Avoid it by putting it into the top level object which is already captured by reference. Message-Id: <20190516132748.6469-2-gleb@scylladb.com>	2019-05-16 15:32:24 +02:00
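The per-shard copying described above is easy to observe with a type that counts its copies. `invoke_on_all_copies` is a hypothetical stand-in for `sharded<>::invoke_on_all()` that copies the callable once per shard, as the commit message describes. It is not Seastar's implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Counts copies so we can see how often the captured state is duplicated.
struct copy_counter {
    static inline int copies = 0;
    std::map<std::string, double> stats;
    copy_counter() = default;
    copy_counter(const copy_counter& o) : stats(o.stats) { ++copies; }
    copy_counter(copy_counter&&) = default; // moves are not counted
};

// Hypothetical stand-in for invoke_on_all(): each shard runs its own copy
// of the callable, so anything captured by value is copied once per shard.
template <typename Func>
void invoke_on_all_copies(int shards, const Func& func) {
    for (int shard = 0; shard < shards; ++shard) {
        Func copy = func;
        copy(shard);
    }
}
```

Moving the map into the lambda still yields one copy per shard, because the lambda itself is copied. Capturing a reference to a long-lived owner object avoids those copies entirely.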
Dejan Mircevski	8dcb35913a	table: Avoid needless allocation of cell lockers All `table` instances currently unconditionally allocate a cell locker for counter cells, though not all need one. Since the lockers occupy quite a bit of memory (as reported in #4441), it's wasteful to allocate them when unneeded. Fixes #4441. Tests: unit (dev, debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190515190910.87931-1-dejan@scylladb.com>	2019-05-16 11:10:38 +03:00
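Conditional allocation of the counter cell locker might look like the sketch below. `table_like` and `cell_locker` are hypothetical placeholders. The 4 KiB payload only stands in for "occupies quite a bit of memory". Only tables whose schema contains counters pay for the locker.

```cpp
#include <cassert>
#include <memory>

// Hypothetical placeholder for the large per-table counter cell locker.
struct cell_locker {
    char payload[4096];
};

class table_like {
    std::unique_ptr<cell_locker> _counter_locker; // null unless needed
public:
    // Allocate the locker only for tables that actually hold counters.
    explicit table_like(bool has_counters)
        : _counter_locker(has_counters ? std::make_unique<cell_locker>() : nullptr) {}

    bool has_counter_locker() const { return _counter_locker != nullptr; }
};
```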
Avi Kivity	5b2c8847c7	Merge "Pre timestamp based data segregation cleanup" from Botond " This series contains loosely related generic cleanup patches that the timestamp based data segregation series depends on. Most of the patches have to do with making headers self-sustainable, that is compilable on their own. This was needed to be able to ensure that the new headers introduced or touched by that series are self-sustainable too. This series also introduces `schema_fwd.hh` which contains a forward declaration of `schema` and `schema_ptr` classes. No effort was made to find and replace all existing ad-hoc schema forward declarations in the source tree. " * 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla: encoding_stats.hh: add missing include sstables/time_window_compaction_strategy.hh: make self-sufficient sstables/size_tiered_compaction_strategy.hh: make self-sufficient sstables/compaction_strategy_impl.hh: make header self-sufficient compaction_strategy.hh: use schema_fwd.hh db/extensions.hh: use schema_fwd.hh Add schema_fwd.hh	2019-05-15 17:37:06 +03:00
Asias He	51c4f8cc47	repair: Fix use after free in remove_repair_meta for repair_metas We should capture repair_metas so that it will not be freed until the parallel_for_each is finished. Fixes: #4333 Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>	2019-05-15 17:22:51 +03:00
Calle Wilund	e7003f1051	sstable: Make all sstable components subject to file extensions Makes opening all sstable components go through same file open routine, optionally applying extensions to each (except TOC which is special). Also ensures we read Scylla metadata before other non-TOC components, as we might need this for extensions (hint hint). Message-Id: <20190513201821.14417-1-calle@scylladb.com>	2019-05-15 17:14:58 +03:00
Botond Dénes	a0010f52c5	scylla-gdb.py: scylla_fiber: add fallback mode The current implementation of the `scylla fiber` command relies on the `scylla ptr` command to provide metadata on pointers, more specifically the boundaries of the region the object they point to occupies. However, in debug mode, seastar is using the standard allocator and thus the `scylla ptr` command doesn't work. To work around this, provide a fallback mode for debug builds. This mode assumes pointers point to the start of objects and scans a configurable region of memory. While less exact than the variant relying on `scylla ptr` it still works reasonably well. The size of the to-be-scanned memory region can be set using the `--scanned-region-size` command line argument. This defaults to 512. Additionally, add a flag (`--force-fallback-mode`) to force using the fallback mode. This is useful if `scylla ptr` is not working for any reason.	2019-05-15 15:46:42 +03:00
Botond Dénes	c78d667153	scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used() Determines whether the application is using the seastar allocator or not. This is done by attempting to resolve the `seastar::memory::cpu_mem` symbol. To avoid the expensive symbol lookup the result is cached. This means that loading a new inferior will possibly return the wrong value. The cache can be flushed by re-sourcing the `scylla-gdb.py` script.	2019-05-15 15:44:38 +03:00
Botond Dénes	c3a06da8fb	scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers	2019-05-15 15:43:34 +03:00
Botond Dénes	4964671e83	scylla-gdb.py: scylla_fiber: fix misaligned text in docstring	2019-05-15 15:43:29 +03:00
Avi Kivity	8e19121e98	Merge "Implement simple selection alongside aggregation" from Dejan " Although CQL allows SELECT statements with both simple and aggregate selectors, Scylla disallows them. This patch removes that restriction and ensures that mixed simple/aggregate selection works as specified both with and without GROUP BY. Tests: unit (dev) " * 'aggregate-and-simple-select-together' of https://github.com/dekimir/scylla: cql: Fix mixed selection with GROUP BY cql: Allow mixing of aggregate and simple selectors	2019-05-14 20:03:58 +03:00
Dejan Mircevski	f9b00a4318	cql: Fix mixed selection with GROUP BY GROUP BY is currently supported by simple_selection, the class used when all selectors are simple. But when selectors are mixed, we use selection_with_processing, which does not yet support GROUP BY. This patch fixes that. It also adapts one testcase in filtering_test to the new behavior of simple_selector. The test currently expects the last value seen, but simple_selector now outputs the first value seen. (More details: the WHERE clause implicitly selects the columns it references, and unit tests are forced to provide expected values for these columns. The user-visible result is unchanged in the test; users never see the WHERE column values due to filtering in cql::transport, outside unit tests.) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-14 12:50:39 -04:00
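Mixed simple and aggregate selection under GROUP BY, with the simple selector keeping the first value seen, can be illustrated as follows. This is a toy model of `SELECT pk, v, count(*) ... GROUP BY pk` over rows already sorted by `pk`, not Scylla's selection classes. The `row` and `result_row` types are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct row { int pk; int v; };
struct result_row { int pk; int first_v; std::size_t count; };

// Emulates SELECT pk, v, count(*) ... GROUP BY pk over rows sorted by pk:
// the simple selector `v` keeps the first value seen in each group, while
// the aggregate count(*) accumulates over the whole group.
std::vector<result_row> group_by_pk(const std::vector<row>& rows) {
    std::vector<result_row> out;
    for (const auto& r : rows) {
        if (out.empty() || out.back().pk != r.pk) {
            out.push_back({r.pk, r.v, 0}); // new group: remember the first v
        }
        ++out.back().count;
    }
    return out;
}
```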
Dejan Mircevski	06e3b36164	cql: Allow mixing of aggregate and simple selectors Scylla currently rejects SELECT statements with both simple and aggregate selectors, but Cassandra allows them. This patch brings parity to Scylla. Fixes #4447. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-14 10:34:02 -04:00
Botond Dénes	fe3b798b51	scylla-gdb.py: scylla fiber: add seastar::smp_message_queue::async_work_item to the whitelist Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4c49fcf5391e027eae68707c9e6ab2f9188c2ea4.1557838171.git.bdenes@scylladb.com>	2019-05-14 17:09:32 +03:00
Avi Kivity	82b91c1511	Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz " Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. Refs #4485. " * tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla: tests: Add test which verifies that schema digest stays the same tests: Add sstables for the schema digest test schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition db/schema_tables: Move feed_hash_for_schema_digest() to .cc file hashing: Introduce type-erased interface for the hasher hashing: Introduce C++ concept for the hasher hashers: Rename hasher to cryptopp_hasher gc_clock: Fix hashing to be backwards-compatible	2019-05-14 16:59:50 +03:00
Tomasz Grabiec	285ada5035	Merge "config: remove _make_config_values macro" from Avi The _make_config_values macro reduces duplication (both the item name and the types need to be available as C++ identifiers and as runtime strings), but is hard to work with. The macro is huge and editors don't handle it well, errors aren't identified at the correct location, and since the macro doesn't have types, it's hard to refactor. This series replaces the macro with ordinary C++ code. Some repetition is introduced, but IMO the result is easier to maintain than the macro. As a bonus the bulk of the code is moved away from the header file. Tests: unit (dev), manual testing of the config REST API * https://github.com/avikivity/scylla config-no-macro/v2 config: make the named_value type name available without requiring _make_config_values config: remove value_status from named_value template parameter list config: add named_value::value_as_json() api: config: stop using _make_config_values config: auto-add named_values into config_file config: add allowed_values parameter to named_value constructor config: convert _make_config_values to individual named_value member declarations and initializers	2019-05-14 16:00:23 +03:00
Avi Kivity	987739898f	docs: document SSTable Scylla.db component Document the format and meaning of the various bits of the Scylla.db component. Message-Id: <20190513081605.7394-1-avi@scylladb.com>	2019-05-14 16:00:23 +03:00
Avi Kivity	786ce70dfc	doc: mention the Slack workspace as a place to get help Message-Id: <20190514090420.5598-1-avi@scylladb.com>	2019-05-14 16:00:23 +03:00
Botond Dénes	c2ec78358b	encoding_stats.hh: add missing include	2019-05-14 13:27:30 +03:00
Botond Dénes	eeacf45b4a	sstables/time_window_compaction_strategy.hh: make self-sufficient	2019-05-14 13:27:30 +03:00
Botond Dénes	9953cecc83	sstables/size_tiered_compaction_strategy.hh: make self-sufficient	2019-05-14 13:27:30 +03:00
Botond Dénes	d02c2253a5	sstables/compaction_strategy_impl.hh: make header self-sufficient Add missing includes and forward declarations. De-inline some methods.	2019-05-14 13:27:30 +03:00
Botond Dénes	20d9d18ab3	compaction_strategy.hh: use schema_fwd.hh	2019-05-14 13:27:30 +03:00
Botond Dénes	690ef09b8f	db/extensions.hh: use schema_fwd.hh	2019-05-14 13:27:30 +03:00
Botond Dénes	48bf1d5629	Add schema_fwd.hh	2019-05-14 13:27:30 +03:00
Tomasz Grabiec	6159d5522d	tests: Add test which verifies that schema digest stays the same (cherry picked from commit `8019634dba`)	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	815295547d	tests: Add sstables for the schema digest test Generated by running test_schema_digest_does_not_change with regenerate set to true. (cherry picked from commit `1f2995c8c5`)	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	9de071d214	schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. That's not a problem during normal cluster operation because the tombstones will expire at all nodes at the same time, and schema digest, although it changes, will change to the same value on all nodes at about the same time. This fix changes digest calculation to not feed any digest for partitions which are empty after compaction. The digest returned by schema_mutations::digest() is left unchanged by this patch. It affects the table schema version calculation. It's not changed because the version is calculated on boot, where we don't yet know all the cluster features. It's possible to fix this but it's more complicated, so this patch defers that. Refs #4485.	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	3a4a903674	db/schema_tables: Move feed_hash_for_schema_digest() to .cc file	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	b0eecdcb8f	hashing: Introduce type-erased interface for the hasher The motivation is to allow hiding the definition of functions accepting a hasher. For one, this reduces (re)compilation times, because we can put the definition in .cc	2019-05-14 10:43:06 +02:00
Avi Kivity	1cf72b39a5	Merge "Unbreak the Unbreakable Linux" from Glauber " scylla_setup is currently broken for OEL. This happens because the OS detection code checks for RHEL and Fedora. CentOS returns itself as RHEL, but OEL does not. " * 'unbreakable' of github.com:glommer/scylla: scylla_setup: be nicer about unrecognized OS scylla_util: recognize OEL as part of the RHEL family	2019-05-13 21:38:21 +03:00
Glauber Costa	3b64727244	scylla_setup: be nicer about unrecognized OS Right now if the user tries to execute this in an unrecognized OS, the following will be thrown: Traceback (most recent call last): File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module> do_verify_package('scylla-enterprise-jmx') File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package if res != 0: UnboundLocalError: local variable 'res' referenced before assignment It would be a lot nicer to exit gracefully and print a message saying what is going on. This was caught when running on OEL, which the previous patch fixed. Still, there are other unknown OSes out there that users may try to run on. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-05-13 14:31:49 -04:00
Glauber Costa	6c15ae5b36	scylla_util: recognize OEL as part of the RHEL family Oracle Linux is a RHEL-like distribution and we support it just fine, but our new incarnation of scylla_setup is failing to recognize it. os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and only shows an ID string, which is set to 'ol'. So let's recognize this. Fixes: #4493 Branches: 3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-05-13 14:31:38 -04:00
Tomasz Grabiec	77fb34821b	row_cache: Make invalidate() preemptible This change inserts preemption points between removal of partitions. The main complication is in maintaining consistency in the face of concurrent population or eviction. We use the same mechanism which is used by memtable updates. _prev_snapshot_pos is the ring position which partitions the ring into the part which is already updated in cache and the one which is yet to be updated. That position should be set accordingly on preemption. In case of invalidation, updating means removing all entries in the range and marking the range as discontinuous. When resuming invalidation of a range we continue from _prev_snapshot_pos as the lower bound. This affects high-level operations like nodetool refresh, table truncation, repair and streaming. Fixes #2683 The improvement on stalls was measured using tests/perf_row_cache_update: Before: Small partitions, no overwrites: invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]} Small partition with a few rows: invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]} Large partition, lots of small rows: invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]} After: Small partitions, no overwrites: invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]} Small partition with a few rows: invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]} Large partition, lots of small rows: invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]} The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).	2019-05-13 19:32:00 +02:00
Tomasz Grabiec	595e1a540e	row_cache: Switch _prev_snapshot_pos to be a ring_position_ext dht::ring_position cannot represent all ring_position_view instances, in particular those obtained from dht::ring_position_view::for_range_start(). To allow using the latter, switch to views.	2019-05-13 19:30:50 +02:00
Tomasz Grabiec	1530224377	dht: Introduce ring_position_ext It's an owning version of ring_position_view. Note that ring_position has a narrower domain than the ring_position_view for historical reasons, so we cannot use that.	2019-05-13 19:30:50 +02:00
Tomasz Grabiec	b08180c7fa	dht: ring_position_view: Take key by const pointer	2019-05-13 19:30:39 +02:00
Tomasz Grabiec	ed697306be	tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion	2019-05-13 19:18:20 +02:00
Tomasz Grabiec	b516e5fdbf	tests: perf_row_cache_update: Report stalls around invalidation	2019-05-13 10:47:03 +02:00
Avi Kivity	a8b3cb8a28	Update seastar submodule * seastar f73690e...3f7a5e1 (7): > Revert "Make sure all allocations/deallocations are properly byte aligned" > http: fix request content for POST requests > doc: discourage generic lambdas and unconstrained templates > smp: add smp_service_group for smp::submit_to() resource control > Revert "smp: add smp_service_group for smp::submit_to() resource control" > smp: add smp_service_group for smp::submit_to() resource control > Make sure all allocations/deallocations are properly byte aligned	2019-05-12 13:32:41 +03:00
Tomasz Grabiec	fd349a3c65	hashing: Introduce C++ concept for the hasher	2019-05-10 12:54:30 +02:00
Tomasz Grabiec	5c2f5b522d	hashers: Rename hasher to cryptopp_hasher So that we can introduce a truly generic interface named "hasher".	2019-05-10 12:54:08 +02:00
Tomasz Grabiec	b7ece4b884	gc_clock: Fix hashing to be backwards-compatible Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. (cherry picked from commit `549d0eb2f3`)	2019-05-10 12:48:46 +02:00
Avi Kivity	fdace36fa5	Merge "Fixes for GCC9 build" from Paweł " This series contains fixes for GCC9 build, mostly corrections needed after changes in libstdc++. With this series and a workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 (not included) Scylla builds and passes unit tests with GCC9 (tested on Fedora 30, development mode only). Tests: unit(dev with gcc8 and gcc9). " * tag 'gcc9-fixes/v1' of https://github.com/pdziepak/scylla: tests/imr: add missing noexcept counters: bytes_view::pointer is not const pointer imr/fundamental: use bytes_view::const_pointer for const pointer	2019-05-09 21:51:24 +03:00
Paweł Dziepak	96eec203bd	tests/imr: add missing noexcept The concepts require that serialisers passed to the IMR are noexcept. GCC9 started verifying that.	2019-05-09 17:38:24 +01:00
Paweł Dziepak	ae9e083b02	counters: bytes_view::pointer is not const pointer In libstdc++ for gcc9 std::basic_string_view::pointer isn't const any more. As a result the compiler is complaining about reinterpret_cast casting away const. The solution is to use std::conditional<> to choose between const pointer for counter view and non-const pointer for mutable counter view.	2019-05-09 17:31:35 +01:00
Paweł Dziepak	c19576319f	imr/fundamental: use bytes_view::const_pointer for const pointer In libstdc++ shipped with gcc9 std::basic_string_view::pointer is no longer constant, which is causing the compiler to complain about dropping const in reinterpret_cast. The solution is to use std::basic_string_view::const_pointer.	2019-05-09 17:30:15 +01:00
Paweł Dziepak	49b4aeca4d	Merge "hinted handoff: prevent sending attempts" from Vlad " Fix the broken logic that is meant to prevent sending hints when node is in a DOWN NORMAL state. " * 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla: hints_manager: rename the state::ep_state_is_not_normal enum value hinted handoff: fix the logic that detects that the destination node is in DN state hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() types.cc: fix the compilation with fmt v5.3.0	2019-05-09 15:18:57 +01:00
Avi Kivity	db536776d9	tools: toolchain: fix dbuild in interactive mode regression Before `ede1d248af`, running "tools/toolchain/dbuild -it -- bash" was a nice way to play in the toolchain environment, for example to start a debugger. But that commit caused containers to run in detached mode, which is incompatible with interactive mode. To restore the old behavior, detect that the user wants interactive mode, and run the container in non-detached mode instead. Add the --rm flag so the container is removed after execution (as it was before `ede1d248af`). Message-Id: <20190506175942.27361-1-avi@scylladb.com>	2019-05-09 15:01:21 +02:00
Dejan Mircevski	d5f587b83d	Narrow down build dependences of duration_test In 0ea6df, duration_test was made to link against all tests/*.o files. This isn't necessary, as it only needs tests/exception_utils.o. This patch narrows down duration_test's dependences to only exception_utils. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190508211630.108228-1-dejan@scylladb.com>	2019-05-09 15:01:21 +02:00
Dejan Mircevski	e4ec89473e	tests: Cover indexing errors in frozen collections Add new test cases: - disallow creating a non-FULL index on frozen collections - disallow repeated creation of a FULL index on frozen collections - disallow FULL indexes on non-frozen collections - disallow referencing frozen-map entries in the WHERE clause Also add error-message expectations to existing test cases. Fixes #3654. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190509025806.124499-1-dejan@scylladb.com>	2019-05-09 15:25:11 +03:00
Dejan Mircevski	4eeec4a452	tests: drop util.hh The file tests/util.hh was somehow committed despite having been moved with `git mv` to tests/exception_utils.hh. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190508210203.106295-1-dejan@scylladb.com>	2019-05-09 14:45:33 +03:00
Takuya ASADA	19a973cd05	dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE Since the other build_*.sh scripts run inside the extracted relocatable package, they have SCYLLA-PRODUCT-FILE at the top of the directory. build_ami.sh does not run under that condition, so we need to run SCYLLA-VERSION-GEN first and then refer to build/SCYLLA-PRODUCT-FILE. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190509110621.27468-1-syuu@scylladb.com>	2019-05-09 14:45:31 +03:00
Vlad Zolotarov	f07c341efc	hints_manager: rename the state::ep_state_is_not_normal enum value Rename this state value to better reflect the reality: state::ep_state_is_not_normal -> state::ep_state_left_the_ring The manager gets to this state when the destination Node has left the ring. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 15:46:47 -04:00
Vlad Zolotarov	93ba700458	hinted handoff: fix the logic that detects that the destination node is in DN state When node is in a DN state its gossiper state may be NORMAL, SHUTDOWN or "" depending on the use case. In addition to that if node has been removed from the ring its state is also going to be removed from the gossiper_state map. Let's consider the above when deciding if node is in the DN state. Fixes #4461 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 14:53:01 -04:00
Glauber Costa	a23531ebd5	Support AWS i3en instances AWS just released their new instances, the i3en instances. The instance has already been verified to work well with scylla; the only adjustments we need are to advertise that we support it and to pre-fill the disk information according to the performance numbers obtained by running the instance. Fixes #4486 Branches: 3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190508170831.6003-1-glauber@scylladb.com>	2019-05-08 20:09:44 +03:00
Avi Kivity	a86fdeb02b	Merge "Implement GROUP BY" from Dejan " Cassandra has supported GROUP BY in SELECT statements since 2016 (v3.10), while ScyllaDB currently treats it as a syntax error. To achieve parity with Cassandra in this important bit of functionality, this patch adds full support for GROUP BY, from parsing to validation to implementation to testing. " * 'groupby-implPP' of https://github.com/dekimir/scylla: Implement grouping in selection processing Propagate GROUP BY indices to result_set_builder Process GROUP BY columns into select_statement Parse GROUP BY clause, store column identifiers	2019-05-08 18:35:12 +03:00
Dejan Mircevski	d51e4a589d	Implement grouping in selection processing Make result_set_builder obey its _group_by_cell_indices by recognizing group boundaries and resetting the selectors. Also make simple_selectors work correctly when grouping. Fixes #2206. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 11:05:36 -04:00
Dejan Mircevski	c3929aee3a	Propagate GROUP BY indices to result_set_builder Ensure that the indices recorded in select_statement are passed to result_set_builder when one is created for processing the cell values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:10:10 -04:00
Dejan Mircevski	274a77f45e	Process GROUP BY columns into select_statement Validate raw GROUP BY identifiers and translate them into a select_statement member. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:10:10 -04:00
Dejan Mircevski	e1fb414805	Parse GROUP BY clause, store column identifiers Extend the grammar file with GROUP BY, collect the column identifiers, and store them in raw::select_statement. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:09:22 -04:00
Avi Kivity	ab3f044daa	Revert "Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz" This reverts commit `dcb263b36b`, reversing changes made to `a6759dc6aa`. schema_change_test fails consistently on master with it.	2019-05-08 16:19:38 +03:00
JP-Reddy	56420dc650	scylla_io_setup: TypeError in iotune_args array from scylla_io_setup script Whenever the iotune_args array uses "--smp", it needs cpudata.smp() which returns an integer instead of a string. So when iotune_args is passed to subprocess.check_call(), it actually throws "TypeError: expected str, bytes or os.PathLike object, not int" but "%s did not pass validation tests, it may not be on XFS..." is shown as the exception. Even though the user passes correct arguments, the script may still throw an error and mislead the user into thinking they have not passed the right arguments. One simple fix is to use str(cpudata.smp()) instead of cpudata.smp(). Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com> Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>	2019-05-07 20:13:54 +03:00
Paweł Dziepak	8a16cbc50d	Merge "treewide: adjust for gcc 9" from Avi " gcc 9 complains a lot about pessimizing moves, narrowing conversions, and has tighter deduction rules, plus other nice warnings. Fix problems found by it, and make some non-problems compile without warnings. " * tag 'gcc9/v1' of https://github.com/avikivity/scylla: types: fix pessimizing moves thrift: fix pessimizing moves tests: fix pessimizing moves tests: cql_query_test: silence narrowing conversion warning test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T> table: fix potentially wrong schema when reading from zero sstables storage_proxy: fix pessimizing moves memtable: fix pessimizing moves IDL: silence narrowing conversion in bool serializer compaction: fix pessimizing moves cache: fix pessimizing moves locator: fix pessimizing moves database: fix pessimizing moves cql: fix pessimizing moves cql parser: fix conversion from uninitalized<T> to optional<T> with gcc 9	2019-05-07 12:19:29 +01:00
Avi Kivity	43867fe618	types: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:01:36 +03:00
Avi Kivity	1b760297f5	thrift: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:01:15 +03:00
Avi Kivity	0ff6e48e77	tests: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:00:58 +03:00
Avi Kivity	b60d58d6bd	tests: cql_query_test: silence narrowing conversion warning Make it explicit to gcc 9 that the conversion to bool is intended.	2019-05-07 09:59:44 +03:00
Avi Kivity	5636b621a7	test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T> gcc 9 is unable to decide whether to call role_name's copy or move constructor. Help it by casting.	2019-05-07 09:58:21 +03:00
Avi Kivity	add20eb9a6	table: fix potentially wrong schema when reading from zero sstables We use the schema during creation of the mutation_source rather than during the query itself. Likely they're the same, and since no rows are returned from a zero-sstable query, harmless. But gcc 9 complains. Fix by using the query's schema.	2019-05-07 09:56:30 +03:00
Avi Kivity	985a30a01c	storage_proxy: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:56:09 +03:00
Avi Kivity	fd3c493961	memtable: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:55:53 +03:00
Avi Kivity	17c268cd55	IDL: silence narrowing conversion in bool serializer bool serializers are now aliases to int8_t serializers, but gcc 9 complains about narrowing conversions, due to the path int8_t -> int -> bool. A bad narrowing conversion here cannot happen in practice, but massage the code a little to silence it.	2019-05-07 09:28:24 +03:00
Avi Kivity	d7cbd3dc61	compaction: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:28:12 +03:00
Avi Kivity	9c7eb95f78	cache: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:27:50 +03:00
Avi Kivity	c42d59d805	locator: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:27:27 +03:00
Avi Kivity	96a0073929	database: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:58 +03:00
Avi Kivity	03e9cdbfb0	cql: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:20 +03:00
Avi Kivity	c26ec176dd	cql parser: fix conversion from uninitalized<T> to optional<T> with gcc 9 We use uninitialized<T> (wrapping an optional<T>) to adjust to the parser's way of laying out the code, but this fails with gcc 9 (presumably for the correct reasons) when converting from uninitialized<T> back to optional<T>. Add a conversion operator to make it build.	2019-05-07 09:21:22 +03:00
Dejan Mircevski	0ea6df2cd1	tests: Add predicates for checking exception messages Many tests verify exception messages. Currently, they do so via verbose lambdas or inner functions that hide test-failure locations. This patch adds utilities for quick creation of message-checking tests and replaces existing ad-hoc methods with these new utilities. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190506210006.124645-1-dejan@scylladb.com>	2019-05-07 07:11:07 +03:00
Avi Kivity	dcb263b36b	Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz " Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. Branches: 3.1 " * tag 'fix-gc_clock-digest-v1' of github.com:tgrabiec/scylla: tests: Add test which verifies that schema digest stays the same tests: Add sstables for the schema digest test gc_clock: Fix hashing to be backwards-compatible	2019-05-07 07:04:40 +03:00
Tomasz Grabiec	8019634dba	tests: Add test which verifies that schema digest stays the same	2019-05-06 18:43:43 +02:00
Tomasz Grabiec	1f2995c8c5	tests: Add sstables for the schema digest test Generated by running test_schema_digest_does_not_change with regenerate set to true.	2019-05-06 18:43:43 +02:00
Tomasz Grabiec	549d0eb2f3	gc_clock: Fix hashing to be backwards-compatible Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460.	2019-05-06 18:43:43 +02:00
Avi Kivity	a6759dc6aa	Update seastar submodule * seastar 4cdccae...f73690e (16): > sstring: silence technically correct but unhelpful warning in sstring move ctor > cmake: add a seastar_supports_flag function > future: Fix build with libc++'s non-trivially-constructible std::tuple<> > Revert "Make sure all allocations are properly bytes aligned" > Merge "future: simplify future_state management" from Rafael > Make sure all allocations are properly bytes aligned > util/log: use correct clock type > core/reactor: don't assume system_clock::duration is in nanoseconds > Merge "Optimize the future_state move constructor" from Rafael > rpc: don't use boost/variant.hpp directly > core/memory: Omit [[gnu::leaf]] attribute on clang > Fix build with std::filesystem > Merge "Fix clang build and tests" from Rafael > cmake: Move ) out of quotes > Merge "Fix some bugs found by (or perhaps in) gcc 9" by Avi > Deduplicate Seastar dependencies management in CMake scripts	2019-05-06 19:17:37 +03:00
Gleb Natapov	1d851a3892	messaging: catch an error that sending of CLIENT_ID may return Avoid a warning about unhandled exception. Message-Id: <20190506122718.GL21208@scylladb.com>	2019-05-06 18:13:51 +03:00
Glauber Costa	79a5351651	scylla-housekeeping: timeout eventually scylla-housekeeping always wants to run in the installation to check if we are running the latest version. This happens regardless of whether or not we said yes or no to the housekeeping scylla_setup question - as that question only deals with whether or not we want to do this through a timer. It is fine to try to run scylla-housekeeping, as long as we time it out. The current code doesn't. The naive solution is to add a timeout parameter to urllib.request.urlopen. However, that timeout is not respected and in my tests I saw real timeouts up to four times higher than the timeout we set. For a reasonable 5s timeout, this means a 20s real timeout, which can lead to a very bad user experience. This seems to be a known problem with this module according to a quick Google search. This patch then takes a slightly more complex solution and uses multiprocess to enforce a well-defined user-visible timeout. Fixes #3980 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190506122335.5707-1-glauber@scylladb.com>	2019-05-06 17:37:59 +03:00
Gleb Natapov	b8188e1e2f	storage_proxy: avoid copying of a topology and endpoint array in batchlog code The batchlog code makes copies of the topology and endpoint array when choosing batchlog endpoints. There is a remark that at least the endpoint copy is deliberate because the Cassandra code has it. We do not have to follow suit. Our endpoint calculation code is atomic, so we can use a reference. Message-Id: <20190506115815.GK21208@scylladb.com>	2019-05-06 17:36:50 +03:00
Raphael S. Carvalho	ef5681486f	compaction: do not unconditionally delete a new sstable in interrupted compaction With incremental compaction, new sstables may have already replaced old sstables at any point, meaning that a new sstable may be in use by the table and an old sstable already deleted while the compaction itself is UNFINISHED. Therefore, we should NEVER delete a new sstable unconditionally for an interrupted compaction, or data loss could happen. To fix it, we'll only delete new sstables that didn't replace anything in the table, meaning they are unused. Found the problem while auditing the code. Fixes #4479. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>	2019-05-06 16:55:36 +03:00
Avi Kivity	1c65ba6e66	Use correct scylla_tables schema for removing version column Mutations carry their schema, so use that instead of bringing in a global schema, which may change as features are added. Message-Id: <20190505132542.6472-1-avi@scylladb.com>	2019-05-06 13:51:08 +02:00
Paweł Dziepak	51e98e0e11	tests/perf_fast_forward: report average number of aio operations perf_fast_forward is used to detect performance regressions. The two main metrics used for this are fragments per second and the number of IO operations. The former is the median of several runs, but the latter is just the actual number of asynchronous IO operations performed in the run that happened to be picked as the median frag/s-wise. There isn't always a direct correlation between frag/s and aio, and the latter can vary, which makes it hard to compare. To make this easier, a new metric was introduced: "average aio", which reports the average number of asynchronous IO operations performed in a run. This should produce much more stable results and therefore make the comparison more meaningful. Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>	2019-05-06 11:47:31 +02:00
Piotr Sarna	cf8d2a5141	Revert "view: cache is_index for view pointer" This reverts commit `dbe8491655`. Caching the value was not done in a correct manner, which resulted in longevity tests failures. Fixes #4478 Branches: 3.1 Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>	2019-05-06 11:45:46 +03:00
Benny Halevy	d9136f96f3	commitlog: descriptor: skip leading path from filename std::regex_match of the leading path may run out of stack with long paths in debug build. Use rfind instead to look up the last '/' in the pathname and skip the leading path if found. Fixes #4464 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>	2019-05-05 17:51:56 +03:00
Benny Halevy	3a2fa82d6e	time_window_backlog_tracker: fix use after free Fixes #4465 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>	2019-05-05 12:47:51 +03:00
Glauber Costa	47d04e49e8	scylla_setup: respect user's decision not to call housekeeping The setup script asks the user whether or not housekeeping should be called, and the first time the script is executed this decision is respected. However, if the script is invoked again, that decision is not respected. This is because the check has the form: if (housekeeping_cfg_file_exists) { version_check = ask_user(); } if (version_check) { do_version_check() } else { dont_do_it() } When it should have the form: if (housekeeping_cfg_file_exists) { version_check = ask_user(); if (version_check) { do_version_check() } else { dont_do_it() } } (Thanks python) This is problematic in systems that are not connected to the internet, since housekeeping will fail to run and crash the setup script. Fixes #4462 Branches: master, branch-3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190502034211.18435-1-glauber@scylladb.com>	2019-05-02 18:46:41 +03:00
Glauber Costa	99c00547ad	make scylla_util OS detection robust against empty lines Newer versions of RHEL ship the os-release file with empty lines at the end, which our script was not prepared to handle. As a result, scylla_setup would fail. This patch makes our OS detection robust against that. Fixes #4473 Branches: master, branch-3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190502152224.31307-1-glauber@scylladb.com>	2019-05-02 18:33:35 +03:00
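The actual fix lives in the Python scylla_util script; the same idea is sketched here in C++ for consistency with the rest of the codebase. `parse_os_release()` is a hypothetical function, and the sample file contents are illustrative. Blank lines, comments and lines without `=` are skipped instead of being treated as malformed entries.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for /etc/os-release style KEY=VALUE lines.
inline std::map<std::string, std::string> parse_os_release(std::istream& in) {
    std::map<std::string, std::string> fields;
    std::string line;
    while (std::getline(in, line)) {
        auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') {
            continue;  // blank line (e.g. trailing empty lines) or comment
        }
        auto eq = line.find('=', first);
        if (eq == std::string::npos) {
            continue;  // not a KEY=VALUE line
        }
        std::string key = line.substr(first, eq - first);
        std::string value = line.substr(eq + 1);
        // Drop trailing CR/spaces, then optional surrounding double quotes.
        while (!value.empty() && (value.back() == '\r' || value.back() == ' ')) {
            value.pop_back();
        }
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
            value = value.substr(1, value.size() - 2);
        }
        fields[key] = value;
    }
    return fields;
}
```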
Paweł Dziepak	cf451f0e62	Merge "gdb: Fixes and improvements to memory analysis" from Tomasz " One of the fixes is for incorrect recognition of memory pages as belonging or not belonging to small allocation pools in some cases. Also, compensates for https://github.com/scylladb/seastar/issues/608 in "scylla memory", which improves accuracy of the small allocation pool report. Fixes "scylla task_histogram" to not look into pages which do not belong to live small allocation pool spans. Fixes #4367 Fixes #4368 " * tag 'gdb-fix-span-qualification-v2' of github.com:tgrabiec/scylla: gdb: Print size of large allocations in 'scylla ptr' gdb: Fix 'scylla ptr' for free pages gdb: Set is_live and offset for large allocations properly in 'scylla ptr' gdb: Fix 'scylla ptr' misqualifying pointers gdb: Make 'scylla memory' show unused memory in small pools gdb: Fix small pool memory usage reporting in 'scylla memory' gdb: Switch 'scylla memory' to use the span_checker to find large spans gdb: Switch task_histogram to use the span_checker gdb: Introduce span_checker	2019-05-02 14:25:30 +01:00
Gleb Natapov	95c6d19f6c	batchlog_manager: fix array out of bound access endpoint_filter() function assumes that each bucket of std::unordered_multimap contains elements with the same key only, so that the bucket size can be used to determine how many elements with a particular key there are. But this is not the case: elements with different keys may share a bucket. Fix it by counting keys in another way. Fixes #3229 Message-Id: <20190501133127.GE21208@scylladb.com>	2019-05-01 17:30:11 +03:00
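The pitfall can be illustrated with the standard container alone. This is a sketch, not endpoint_filter()'s code: the helper names are hypothetical, and keying the map by rack name is an assumption made for the example. `bucket_size(bucket(key))` counts every element hashed into that bucket, while `equal_range()` only visits elements whose key compares equal.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_map>

// Pitfall: counts all elements in the key's bucket, possibly including other keys.
inline std::size_t count_by_bucket(const std::unordered_multimap<std::string, int>& m,
                                   const std::string& key) {
    return m.bucket_size(m.bucket(key));  // may over-count
}

// Correct: only elements with an equal key are in [first, last).
inline std::size_t count_by_key(const std::unordered_multimap<std::string, int>& m,
                                const std::string& key) {
    auto [first, last] = m.equal_range(key);
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Using the over-counted value as the number of elements for a key is what can lead to indexing past the real data; `m.count(key)` would be an equivalent correct choice.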
Nadav Har'El	2710f382de	secondary index: expand test of secondary-index and UPDATE requests The existing unit test test_secondary_index_contains_virtual_columns reproduced a bug (issue #4144) with indexing of primary-key columns, but we only actually tested clustering columns. In issue #4471 there was a question whether we may still have a bug when indexing partition-key columns. This patch adds a test that verifies that we don't, and that this case works well too. Refs #4144 Refs #4471 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190501113500.25900-1-nyh@scylladb.com>	2019-05-01 12:53:23 +01:00
Nadav Har'El	a45b6e41a0	materialized views and secondary index: sometimes allow dropping base columns Until this patch, dropping columns from a table was completely forbidden if this table has any materialized views or secondary indexes. However, this is excessively harsh, and not compatible with Cassandra, which does allow dropping columns from a base table which has a secondary index on other columns. This incompatibility was raised in the following Stackoverflow question: https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490 In this patch, we allow dropping a base table column if none of its materialized views needs this column. Columns selected by a view (as regular or key columns) are needed by it, of course, but when virtual columns are used (namely, there is a view with the same key columns as the base), all columns are needed by the view, so unfortunately none of the columns may be dropped. After this patch, when a base-table column cannot be dropped because one of the materialized views needs it, the error message will look like: exceptions::invalid_request_exception: Cannot drop column a from base table ks.cf: a materialized view cf_a_idx_index needs this column. This patch also includes extensive testing for the cases where dropping columns is now allowed, and not allowed. The secondary-index tests are especially interesting, because they demonstrate that now usually (when a non-key column is being indexed) dropping columns will be allowed, which is what originally bothered the Stackoverflow user. Fixes #4448. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190429214805.2972-1-nyh@scylladb.com>	2019-04-30 12:13:10 +01:00
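The rule described above can be sketched as a simple check. `view_desc` and `check_drop_column()` are hypothetical, heavily simplified stand-ins for Scylla's view metadata; only the error message text is taken from the commit. A column is droppable unless some view selects it, and a view using virtual columns needs every column.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view description.
struct view_desc {
    std::string name;
    std::set<std::string> selected_columns;  // regular and key columns the view selects
    bool uses_virtual_columns;               // view has the same key columns as the base
};

// Returns an error message if some view needs `column`, std::nullopt otherwise.
inline std::optional<std::string> check_drop_column(const std::string& column,
                                                    const std::string& base_table,
                                                    const std::vector<view_desc>& views) {
    for (const auto& v : views) {
        // With virtual columns, every base column is needed by the view.
        if (v.uses_virtual_columns || v.selected_columns.count(column)) {
            return "Cannot drop column " + column + " from base table " + base_table +
                   ": a materialized view " + v.name + " needs this column.";
        }
    }
    return std::nullopt;
}
```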
Nadav Har'El	92d5f61ba5	cql: support single-value IN restriction wherever EQ restriction is supported There are several places where IN restrictions are not currently supported, especially in queries involving a secondary index. However, when the IN restriction has just a single value, it is nothing more than an equality restriction and can be converted into one and be supported. So this patch does exactly this. Note that Cassandra has done this conversion since August 2016, and therefore supports the special case of single-value IN even where general IN is not supported. So it's important for Cassandra compatibility that we do this conversion too. This patch also includes a test with two queries involving a secondary index that were previously disallowed because of the "IN" on the primary key or the indexed column, and are now allowed when the IN restriction has just a single value. A third query tested is not related to secondary indexes, but confirms we don't break multi-column single-value IN queries. Fixes #4455. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190428160317.23328-1-nyh@scylladb.com>	2019-04-30 12:13:06 +01:00
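The conversion itself is a small rewrite step. The restriction types below are hypothetical and not Scylla's cql3 classes; the sketch only shows that `col IN (v)` becomes `col = v`, while multi-value IN is left for the existing code paths.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical restriction types, for illustration only.
struct eq_restriction { std::string column; int value; };
struct in_restriction { std::string column; std::vector<int> values; };
using restriction = std::variant<eq_restriction, in_restriction>;

// Rewrite a single-value IN as EQ; leave everything else unchanged.
inline restriction normalize(restriction r) {
    if (auto* in = std::get_if<in_restriction>(&r); in && in->values.size() == 1) {
        return eq_restriction{in->column, in->values.front()};
    }
    return r;
}
```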
Tomasz Grabiec	1adcb3637e	Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond The shard readers of the multishard reader assumed that the positions in the data stream are strictly monotonous. This assumption is invalid. Range tombstones can share their position with other range tombstones and/or a clustering row. The effect of this false assumption was that when the shard reader was evicted such that the last seen fragment was a range tombstone, when recreated it would skip any unseen fragments that have the same position as that of the last seen range tombstone. Fixes: #4418 Branches: master, 3.0, 2019.1 Tests: unit(dev) * https://github.com/denesb/scylla.git multishard_reader_handle_non_strictly_monotonous_positions/v4: multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer() mutlishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code position_in_partition_view: add region() accessor multishard_combining_reader: fix handling of non-strictly monotonous positions flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice tests: add unit test for multishard reader correctly handling non-strictly monotonous positions	2019-04-30 12:35:28 +02:00
Tomasz Grabiec	077c639e42	Merge "Simplify the result_set_row API" from Rafael Currently null and missing values are treated differently. Missing values throw no_such_column. Null values return nullptr, std::nullopt or throw null_column_value. The API is a bit confusing, since a function returning a std::optional either returns std::nullopt or throws, depending on why there is no value. With this patch series only get_nonnull throws and there is only one exception type. * https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2: query-result-set: merge handling of null and missing values Remove result_set_row::has Return a reference from get_nonnull	2019-04-30 11:06:29 +02:00
Rafael Ávila de Espíndola	63c47117b5	Return a reference from get_nonnull No reason to copy if we don't have to. Now that get_nonnull doesn't copy, replace a raw use of get_data_value with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola	0474458872	Remove result_set_row::has Now that the various get methods return nullptr or std::nullopt on missing values, we don't need to do double lookups. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola	2770b29036	query-result-set: merge handling of null and missing values Nothing seems to differentiate a missing and a null value. This patch then merges the two exception types and now the only method that throws is get_nonnull. The other methods return nullptr or std::nullopt as appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:20 -07:00
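The merged semantics can be sketched with a toy row type. `row_sketch` is hypothetical and much simpler than `result_set_row`; `std::out_of_range` stands in for the single exception type, whose real name is not given here. Missing and null columns are indistinguishable to callers, `get()` never throws, and only `get_nonnull()` throws, returning a reference rather than a copy.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical row: a missing column and a null column look the same.
class row_sketch {
    std::map<std::string, std::optional<int>> _cells;  // nullopt == null value
public:
    void set(const std::string& name, std::optional<int> v) { _cells[name] = v; }

    // nullptr for both missing and null values; never throws.
    const int* get(const std::string& name) const {
        auto it = _cells.find(name);
        if (it == _cells.end() || !it->second) {
            return nullptr;
        }
        return &*it->second;
    }

    // The only throwing accessor, with a single exception type.
    const int& get_nonnull(const std::string& name) const {
        if (const int* v = get(name)) {
            return *v;  // reference into the row, no copy
        }
        throw std::out_of_range("column " + name + " is null or missing");
    }
};
```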
Avi Kivity	3726a4fbd9	Merge "Fix schema disagreement during rolling upgrade" from Tomasz " After `7c87405`, schema sync includes system_schema.view_virtual_columns in the schema digest. Old nodes don't know about this table and will not include it in the digest calculation. As a result, there will be schema disagreement until the whole cluster is upgraded. Also, the order in which tables were hashed changed in `7c87405`, which causes digests to differ in some schemas. Fixes #4457. " * tag 'fix-disagreement-during-upgrade-v2' of github.com:tgrabiec/scylla: db/schema_tables: Include view_virtual_columns in the digest only when all nodes do storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature db/schema_tables: Hash schema tables in the same order as on 3.0 db/schema_tables: Remove table name caching from all_tables() treewide: Propagate schema_features to db::schema::all_tables() enum_set: Introduce full() service/storage_service: Introduce cluster_schema_features() schema: Introduce schema_features schema_tables: Propagate storage_service& to merge_schema() gms/feature: Introduce a more convenient when_enabled() gms/feature: Mark all when_enabled() overloads as const	2019-04-29 14:23:53 +03:00
Avi Kivity	ede1d248af	tools: toolchain: improve dbuild signal handing Currently, we use --sig-proxy to forward signals to the container. However, this requires the container's co-operation, which usually doesn't exist. For example, docker run --sig-proxy fedora:29 bash -c "sleep 5" Does not respond to ctrl-C. This is a problem for continuous integration. If a build is aborted, Jenkins will first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give up and use SIGKILL. If the graceful termination doesn't work, we end up with an orphan container running on the node, which can then consume enough memory and CPU to harm the following jobs. To fix this, trap signals and handle them by killing the container. Also trap shell exit, and even kill the container unconditionally, since if Jenkins happens to kill the "docker wait" process the regular paths will not be taken. We lose a lot by running the container asynchronously with the dbuild shell script, so we need to add it back: - log display: via the "docker logs" command - auto-removal of the container: add a "docker rm -f" command on signal or normal exit Message-Id: <20190424130112.794-1-avi@scylladb.com>	2019-04-29 10:05:21 +02:00
Botond Dénes	aa18bb33b9	tests: add unit test for multishard reader correctly handling non-strictly monotonous positions	2019-04-29 10:24:14 +03:00
Botond Dénes	51e81cf027	flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice To be able to support this new overload, the reader is made partition-range aware. It will now correctly only return fragments that fall into the partition-range it was created with. For completeness' sake and to be able to test it, also implement `fast_forward_to(const dht::partition_range)`. Slicing is done by filtering out non-overlapping fragments from the initial list of fragments. Also add a unit test that runs it through the mutation_source test suite.	2019-04-29 10:24:14 +03:00
Tomasz Grabiec	c96ee9882b	db/schema_tables: Include view_virtual_columns in the digest only when all nodes do After `7c87405`, schema sync includes system_schema.view_virtual_columns in the schema digest. Old nodes don't know about this table and will not include it in the digest calculation. As a result, there will be schema disagreement until the whole cluster is upgraded. Fix this by taking the new table into account only when the whole cluster is upgraded. The table should not be used for anything before this happens. This is not currently enforced, but should be. Fixes #4457.	2019-04-28 15:50:13 +02:00
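Gating the digest input on a cluster feature can be sketched as below. `schema_features_sketch` and `digest_tables()` are hypothetical, and the table list is illustrative rather than Scylla's real list or hashing order. Until every node advertises the feature, old and new nodes hash the same set of tables and therefore agree on the digest.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for schema_features / the VIEW_VIRTUAL_COLUMNS cluster feature.
struct schema_features_sketch {
    bool view_virtual_columns = false;  // true only once every node supports it
};

// Tables fed into the schema digest (illustrative list).
inline std::vector<std::string> digest_tables(const schema_features_sketch& f) {
    std::vector<std::string> tables{"keyspaces", "tables", "columns", "views"};
    if (f.view_virtual_columns) {
        tables.push_back("view_virtual_columns");  // new table hashed only cluster-wide
    }
    return tables;
}
```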
Tomasz Grabiec	a108df09f9	storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature Needed for determining if all nodes in the cluster are aware of the new schema table. Only when all nodes are aware of it can we take it into account when calculating the schema digest; otherwise there would be permanent schema disagreement during a rolling upgrade.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	73b859005c	db/schema_tables: Hash schema tables in the same order as on 3.0 The commit `7c87405` also indirectly changed the order of schema tables during hash calculation (index table should be taken after all other tables). This shows up when there is an index created and any of {user defined type, function, or aggregate}. Refs #4457.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	394a684a99	db/schema_tables: Remove table name caching from all_tables() The set of table names will depend on the features and thus will be dynamic.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	3cb7b2d72e	treewide: Propagate schema_features to db::schema::all_tables()	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	f33f0d759d	enum_set: Introduce full()	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	1d9b88dceb	service/storage_service: Introduce cluster_schema_features()	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	0633fcde10	schema: Introduce schema_features	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	6e2c190b5f	schema_tables: Propagate storage_service& to merge_schema() We will need to calculate cluster schema features at the time we calculate the schema digest.	2019-04-28 12:33:10 +02:00
Tomasz Grabiec	6db002163f	gms/feature: Introduce a more convenient when_enabled() It can be invoked with a lambda without the ceremony of creating a class deriving from gms::feature::listener. The returned registration object controls the listener's scope.	2019-04-28 12:33:10 +02:00
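A lambda-accepting `when_enabled()` with an RAII registration might look roughly like this. `feature_sketch` is hypothetical and is not the real gms::feature API; in particular, it does not handle registering after the feature is already enabled, and the feature object must outlive its registrations. Dropping the registration unregisters the callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch: callables instead of listener subclasses, plus an RAII handle.
class feature_sketch {
    std::map<int, std::function<void()>> _listeners;
    int _next_id = 0;
    bool _enabled = false;
public:
    class registration {
        feature_sketch* _f;
        int _id;
    public:
        registration(feature_sketch* f, int id) : _f(f), _id(id) {}
        registration(registration&& o) noexcept : _f(std::exchange(o._f, nullptr)), _id(o._id) {}
        registration(const registration&) = delete;
        registration& operator=(const registration&) = delete;
        registration& operator=(registration&&) = delete;
        ~registration() {
            if (_f) {
                _f->_listeners.erase(_id);  // unregister when the handle goes away
            }
        }
    };

    [[nodiscard]] registration when_enabled(std::function<void()> fn) {
        int id = _next_id++;
        _listeners.emplace(id, std::move(fn));
        return registration(this, id);
    }

    void enable() {
        if (_enabled) {
            return;
        }
        _enabled = true;
        for (auto& entry : _listeners) {
            entry.second();  // notify every still-registered callback
        }
    }
};
```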
Tomasz Grabiec	22c07b9183	gms/feature: Mark all when_enabled() overloads as const	2019-04-28 12:33:10 +02:00
Rafael Ávila de Espíndola	ee9f3388f6	cql_query_test: Fix a use after return There was nothing keeping the verify lambda alive after the return. It worked most of the time since the only state kept by the lambda was a pointer to cql_test_env. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190426203823.15562-1-espindola@scylladb.com>	2019-04-27 08:06:35 +03:00
Avi Kivity	07d06aee43	Update seastar submodule * seastar e84d2647c...4cdccae53 (4): > Merge "future: Move some code out of line" from Rafael > tests: socket_test: Add missing virtual and override > build: Don't pass -Wno-maybe-uninitialized to clang > Merge "expose file_permssions for creating files and dirs in API" from Benny	2019-04-26 22:58:48 +03:00
Tomasz Grabiec	c6274fdef3	keys: Avoid implicit conversion to partition_key in the hasher of partition_key_view Message-Id: <1556230107-13557-1-git-send-email-tgrabiec@scylladb.com>	2019-04-26 20:02:35 +03:00
Botond Dénes	bc08f8fd07	flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice To be able to run the mutation-source test suite with this reader. In the next patch, this reader will be used in testing another reader, so it is important to make sure it works correctly first.	2019-04-26 12:43:45 +03:00
Botond Dénes	eba310163d	multishard_combining_reader: fix handling of non-strictly monotonous positions The shard readers under a multishard reader are paused after every operation executed on them. When paused they can be evicted at any time. When this happens, they will be re-created lazily on the next operation, with a start position such that they continue reading from where the evicted reader left off. This start position is determined from the last fragment seen by the previous reader. When this position is clustering position, the reader will be recreated such that it reads the clustering range (from the half-read partition): (last-ckey, +inf). This can cause problems if the last fragment seen by the evicted reader was a range-tombstone. Range tombstones can share the same clustering position with other range tombstones and potentially one clustering row. This means that when the reader is recreated, it will start from the next clustering position, ignoring any unread fragments that share the same position as the last seen range tombstone. To fix, ensure that on each fill-buffer call, the buffer contains all fragments for the last position. To this end, when the last fragment in the buffer is a range tombstone (with pos x), we continue reading until we see a fragment with a position y that is greater. This way it is ensured that we have seen all fragments for pos x and it is safe to resume the read, starting from after position x.	2019-04-26 11:38:12 +03:00
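The fill-buffer rule can be sketched on a toy fragment stream. `fragment`, `kind` and `fill_buffer()` are hypothetical and model positions as plain ints, which is a large simplification of position_in_partition. If the buffer ends on a range tombstone at position x, reading continues until the next fragment's position is strictly greater than x, so resuming "after x" cannot skip anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fragment stream; positions may repeat.
enum class kind { range_tombstone, clustering_row };
struct fragment { int pos; kind k; };

// Fill at least `min_size` fragments starting at `next`, then, if the last one
// is a range tombstone at position x, drain every remaining fragment at x.
inline std::vector<fragment> fill_buffer(const std::vector<fragment>& stream,
                                         std::size_t& next, std::size_t min_size) {
    std::vector<fragment> buf;
    while (next < stream.size() && buf.size() < min_size) {
        buf.push_back(stream[next++]);
    }
    if (!buf.empty() && buf.back().k == kind::range_tombstone) {
        const int x = buf.back().pos;
        while (next < stream.size() && stream[next].pos <= x) {
            buf.push_back(stream[next++]);
        }
    }
    return buf;
}
```

With the stream used in the check below and a minimum of two fragments, a buffer without the extra loop would stop at the first tombstone at position 2, and a reader recreated at (2, +inf) would then skip the second tombstone and the row at position 2.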
Botond Dénes	b30af48c83	position_in_partition_view: add region() accessor	2019-04-26 11:38:12 +03:00
Vlad Zolotarov	274b9d8069	hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check gossiper::is_alive() performs many checks (e.g. is_me(ep)) that are irrelevant for the HH use case, so we may safely skip them. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:16:07 -04:00
Vlad Zolotarov	74b4076ceb	hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() sender has its own reference to the local gossiper - use it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Vlad Zolotarov	fe82437dea	types.cc: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using the "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Fix this by explicitly using the to_hex() converter. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Piotr Sarna	037b517c85	service: initialize system distributed keyspace after schema agreement In order to avoid schema disagreements during upgrades (which may lead to deadlocks), system distributed keyspace initialization is moved right before starting the bootstrapping process, after the schema agreement checks already succeeded. Fixes #3976 Message-Id: <932e642659df1d00a2953df988f939a81275774a.1556204185.git.sarna@scylladb.com>	2019-04-25 18:44:08 +02:00
Raphael S. Carvalho	ccb29c6c20	sstables: make partitioned sstable set available to custom compaction strategies To make it available, we need to make the use of level metadata optional (it is used to deal with the interval map's fragmentation issue when level 0 falls behind), and also introduce an interface for compaction strategies to implement make_sstable_set(), which instantiates a partitioned sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190424232948.668-1-raphaelsc@scylladb.com>	2019-04-25 12:59:04 +03:00
Botond Dénes	a3f79bfe5e	mutlishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code Reduce the number of indentations - use early return for the short path.	2019-04-24 10:55:16 +03:00
Botond Dénes	bbd3f0acc3	multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer()	2019-04-24 10:55:16 +03:00
Avi Kivity	b19792405f	main: RAII-ify shutdown Instead of app-template::run_deprecated() and at_exit() hooks, use app_template::run() and RAII (via defer()) to stop services. This makes it easier to add services that do support shutdown correctly. Ref #2737 Message-Id: <20190420175733.29454-1-avi@scylladb.com>	2019-04-23 16:13:39 +02:00
Avi Kivity	9a6c86e2a7	config: convert _make_config_values to individual named_value member declarations and initializers While causing some duplication (names are explicitly instead of implicitly stringified, and names are repeated in the member declaration and initializer), it is overall more maintainable than the huge macro. It is easier to overload named_value constructors when you can get error reporting on the line where the error occurs, for example.	2019-04-23 16:29:03 +03:00
Avi Kivity	4b3c2f6514	config: add allowed_values parameter to named_value constructor The _make_config_values() macro supplies an optional list of allowed values for a config item, so support that, even though no one uses it yet.	2019-04-23 16:29:03 +03:00
Avi Kivity	d959fbfc16	config: auto-add named_values into config_file By passing a config_file into named_value, we remove another call to the _make_config_values() macro.	2019-04-23 16:29:03 +03:00
Avi Kivity	b663cd1765	api: config: stop using _make_config_values Now that named_value::value_as_json() exists, make use of it to report the current value of a configuration variable via the REST API, instead of _make_config_values().	2019-04-23 16:29:03 +03:00
Avi Kivity	6033b6a079	config: add named_value::value_as_json() Currently, the REST API does its own conversion of named_value into json. This requires it to use the _make_config_values macro to iterate over all config items, since it needs to preserve the concrete type of each item while iterating so it can select the correct json conversion. Since we want to remove that macro, we need to provide a different way to convert a config item to json. So this patch adds a value_as_json(). To hide json_return_value from the rest of the system, we extend config_type with a conversion function to handle the details. This usually calls the json_return_type constructor directly, but when there is no default translation, it interposes a conversion into a type that json recognizes. I didn't bother maintaining the existing type names, since they're C++ names which don't make sense for the UI.	2019-04-23 16:28:19 +03:00
Avi Kivity	db3f61776f	config: remove value_status from named_value template parameter list The value_status is only needed at run-time, and removing it from the template parameter list reduces type proliferation (which leads to code bloat) and simplifies the code.	2019-04-23 16:15:28 +03:00
Avi Kivity	daf5744daa	config: make the named_value type name available without requiring _make_config_values I want to remove the _make_config_values macro, but it is needed now in api/config.cc to make the type names available. So as a first step, copy the type names to config_src. Further changes can extract it from there. Because we want to add more type information in following patches, place the type name in a new config_type object, instead of allocating a string_view in config_src.	2019-04-23 16:13:54 +03:00
Tomasz Grabiec	21fbf59fa8	lsa: Fix compact_and_evict() being called with a too low step compact_and_evict gets memory_to_release in bytes while reclamation step is in segments. Broken in `f092decd90`. It doesn't make much difference with the current default step of 1 segment, since we cannot reclaim less than that, so it shouldn't cause problems in practice. Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>	2019-04-23 13:14:43 +03:00
Gleb Natapov	c6b3b9ff13	cache_hitrate_calculator: wait for ongoing calculation to complete during stop Currently stop returns a ready future immediately. This is not a problem, since the calculation loop holds a shared pointer to the local service, so the service will not be destroyed until the calculation completes, and the global database object db, which is also used by the calculation, is never destroyed. But the latter is just a workaround for a shutdown sequence that cannot handle it and will be changed one day. Make the cache hitrate calculation service ready for it. Message-Id: <20190422113538.GR21208@scylladb.com>	2019-04-22 14:44:42 +03:00
Takuya ASADA	64c2aa8f9b	reloc/python3: add missing SCYLLA-PRODUCT-FILE to python3 relocatable package Since `214c74a`, we need SCYLLA-PRODUCT-FILE in the relocatable package, so add it to the python3 package as well. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190422085620.22486-1-syuu@scylladb.com>	2019-04-22 13:56:38 +03:00
Gleb Natapov	306f5b99b5	cache_hitrate_calculator: fix use after free in non_system_filter lambda non_system_filter lambda is defined static, which means it is initialized only once, so the 'this' that it captures will belong to the shard where the function runs first. During service destruction the function may run on a different shard and access another shard's service, which may already be freed. Fixed #4425 Message-Id: <20190421152139.GN21208@scylladb.com>	2019-04-21 18:22:31 +03:00
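The bug pattern can be reproduced in plain C++. `shard_service` is hypothetical and unrelated to the real calculator class. A function-local `static` lambda is initialized only on the first call, so its captured `this` keeps pointing at the first caller's object. If that object is later freed, later calls use a dangling pointer.

```cpp
#include <cassert>

// Hypothetical per-shard service.
struct shard_service {
    int shard_id;

    bool is_mine_buggy(int owner) const {
        // BUG: initialized once; `this` stays the first caller's object forever.
        static auto filter = [this](int o) { return o == shard_id; };
        return filter(owner);
    }

    bool is_mine_fixed(int owner) const {
        // Captures the current object on every call.
        auto filter = [this](int o) { return o == shard_id; };
        return filter(owner);
    }
};
```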
Amnon Heiman	9ad63efcfe	Adding node_exporter to docker This patch adds node_exporter to the Docker image. It installs it, and creates and runs a service for it. After this patch, node_exporter will run and will be part of the Scylla Docker image. Fixes #4300 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190421130643.6837-1-amnon@scylladb.com>	2019-04-21 18:12:58 +03:00
Benny Halevy	0c9aaef673	sstables: make lamdas that std:move mutable As noticed by Rafael Ávila de Espíndola <espindola@scylladb.com> regarding commit `5a99023d4a`: Without the lambda being mutable, the second std::move actually doesn't move anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421150422.19304-1-bhalevy@scylladb.com>	2019-04-21 18:11:42 +03:00
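The effect of `mutable` on `std::move` inside a lambda can be observed with a type that counts its copies and moves. `counted`, `run_non_mutable()` and `run_mutable()` are hypothetical and only illustrate the language rule. In a non-mutable lambda the captured object is const, so `std::move(c)` yields `const counted&&`, which only binds to the copy constructor.

```cpp
#include <cassert>
#include <utility>

// Counts copies versus moves (C++17 inline static members).
struct counted {
    static inline int copies = 0;
    static inline int moves = 0;
    counted() = default;
    counted(const counted&) { ++copies; }
    counted(counted&&) noexcept { ++moves; }
    static void reset() { copies = 0; moves = 0; }
};

inline void consume(counted) {}

// Captured `c` is const in the body: std::move(c) silently copies.
inline void run_non_mutable() {
    counted original;
    auto f = [c = std::move(original)] { consume(std::move(c)); };
    f();
}

// With `mutable`, std::move(c) is a real move.
inline void run_mutable() {
    counted original;
    auto f = [c = std::move(original)]() mutable { consume(std::move(c)); };
    f();
}
```

In both cases the init-capture performs one move. Only the mutable version turns the second `std::move` into a move as well.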
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check, since the compiler can't deduce the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Tomasz Grabiec	f092decd90	lsa: Fix potential bad_alloc even though evictable memory exists When we start the LSA reclamation it can be that segment_pool::_free_segments is 0 under some conditions and segment_pool::_current_emergency_reserve_goal is set to 1. The reclamation step is 1 segment, and compact_and_evict_locked() frees 1 segment back into the segment_pool. However, segment_pool::reclaim_segments() doesn't free anything to the standard allocator because the condition _free_segments > _current_emergency_reserve_goal is false. As a result, tracker::impl::reclaim() returns 0 as the amount of released memory, tracker::reclaim() returns memory::reclaiming_result::reclaimed_nothing and the seastar allocator thinks it's a real OOM and throws std::bad_alloc. The fix is to change compact_and_evict() to make sure that reserves are met, by releasing more if they're not met at entry. This change also allows us to drop the variant of allocate_segment() which accepts the reclamation step as a means to refill reserves faster. This is now not needed, because compact_and_evict() will look at the reserve deficit to increase the amount of memory to reclaim. Fixes #4445 Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>	2019-04-20 09:17:49 +03:00
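The failure and the fix can be reproduced with a segment-count model. `segment_pool_model`, `reclaim_old()` and `reclaim_fixed()` are hypothetical, count only segments, and assume eviction always succeeds. This is not the LSA code. With no free segments and a reserve goal of 1, evicting exactly one step leaves nothing above the reserve to release. Covering the reserve deficit first lets the step actually reach the standard allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the segment pool.
struct segment_pool_model {
    std::size_t free_segments = 0;
    std::size_t emergency_reserve_goal = 0;

    // Release free segments above the reserve goal, at most `wanted`.
    std::size_t reclaim_segments(std::size_t wanted) {
        std::size_t surplus = free_segments > emergency_reserve_goal
                            ? free_segments - emergency_reserve_goal : 0;
        std::size_t released = std::min(wanted, surplus);
        free_segments -= released;
        return released;
    }
};

// Old behaviour: compact/evict exactly `step` segments into the pool.
inline std::size_t reclaim_old(segment_pool_model& pool, std::size_t step) {
    pool.free_segments += step;
    return pool.reclaim_segments(step);
}

// Fixed behaviour: also make up the reserve deficit found at entry.
inline std::size_t reclaim_fixed(segment_pool_model& pool, std::size_t step) {
    std::size_t deficit = pool.emergency_reserve_goal > pool.free_segments
                        ? pool.emergency_reserve_goal - pool.free_segments : 0;
    pool.free_segments += step + deficit;
    return pool.reclaim_segments(step);
}
```

A return value of 0 from the old variant corresponds to the "reclaimed_nothing" result that the seastar allocator mistook for a real OOM.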
Avi Kivity	704600f829	Update seastar submodule * seastar eb03ba5cd...e84d2647c (14): > Fix hardcoded python paths in shebang line > Disable -Wmaybe-uninitialized everywhere > app_template: allow opting out of automatic SIGINT/SIGTERM handling > build: Restore DPDK machine inference from cflags > http: capture request content for POST requests > Merge "Simplify future_state and promise" from Rafael > temporary_buffer: fix memleak on fast path > perftune.py: allow explicitly giving a CPU mask to be used for binding IRQs > perftune.py: fix the sanity check for args.tune > perftune.py: identify fast-path hardware queues IRQs of Mellanox NICs > memory: malloc_allocator should be always available > Merge "Using custom allocator in the posix network stack" from Elazar > memory: Tell reclaimers how much should be reclaimed > net/ipv4_addr: add std::hash & operator== overloads	2019-04-20 09:16:53 +03:00
Avi Kivity	d485facea2	Revert "tools: toolchain: improve dbuild signal handing" This reverts commit `6c672e674b`. It loses build logs, and the patch that restores logs causes build failures, so the whole thing needs to be revisited.	2019-04-19 15:16:42 +03:00
Takuya ASADA	0a874f1897	dist/docker/redhat: prioritize /opt/scylladb/python3/bin on $PATH To prevent the entrypoint script from running with another python3 package, such as python36 from EPEL, move /opt/scylladb/python3/bin to the front of $PATH. It won't happen on this container image, but it may occur when a user tries to extend the image. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417165806.12212-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Takuya ASADA	c3dae6673f	dist/common/scripts: use out() to run perftune.py perftune.py executes hwloc-calc, which is now provided as a relocatable binary placed under /opt/scylladb/bin. So we need to add that directory to PATH when calling subprocess.check_output(). Our utility function already does that, so switch to it. Fixes #4443 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190418124345.24973-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Benny Halevy	9785754e0d	distributed_loader: do not follow symlinks when verifying mode and owner We allow only regular files and directories, so to detect symlinks we must not follow them. Fixes #4375 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190418051627.9298-1-bhalevy@scylladb.com>	2019-04-19 11:47:40 +03:00
Takuya ASADA	214c74a71d	dist: merge product name parameter on single place When we added product name customization, we mistakenly defined the parameter in each package build script. The number of scripts is increasing, since we recently added the relocatable python3 package, so we should merge it into a single place. Also, we should save the parameter in the relocatable package, just like the version-release parameters. So move the definition to SCYLLA-VERSION-GEN, save it to build/SCYLLA-PRODUCT-FILE, then archive it into the relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417163335.10191-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Paweł Dziepak	d47ea66ec6	messaging_service: add lz4_fragmented RPC compressor Seastar now supports two RPC compression algorithms: the original LZ4 one and LZ4_FRAGMENTED. The latter uses the lz4 stream interface, which allows it to process large messages without fully linearising them. Since RPC requests used by Scylla often contain user-provided data that could potentially be very large, LZ4_FRAGMENTED is a better choice for the default compression algorithm. Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>	2019-04-18 19:07:14 +03:00
Takuya ASADA	592fec32a0	dist/common/scripts: use /etc/os-release to detect distributions Since we moved relocatable .rpm now Scylla able to run on Amazon Linux 2. However, is_redhat_variant() on scylla_util.py does not works on Amazon Linux 2, since it does not have /etc/redhat-release. So we need to switch to /etc/os-release, use ID_LIKE to detect Redhat variants/Debian variants. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417115634.9635-1-syuu@scylladb.com>	2019-04-18 19:07:14 +03:00
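The real change lives in Python (`scylla_util.py`). The idea can be sketched in C++ as follows: parse the `KEY=value` lines of `/etc/os-release` and look for a family name in `ID` or the space-separated `ID_LIKE` list. The helper names and the choice of `rhel`/`fedora` as Red Hat markers are assumptions for illustration. Amazon Linux 2 is known to ship `ID="amzn"` with `ID_LIKE="centos rhel fedora"`.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Minimal parser for /etc/os-release: KEY=value or KEY="value".
// Escape sequences inside quoted values are not handled.
std::map<std::string, std::string> parse_os_release(const std::string& text) {
    std::map<std::string, std::string> kv;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') {
            continue;
        }
        auto eq = line.find('=');
        if (eq == std::string::npos) {
            continue;
        }
        std::string value = line.substr(eq + 1);
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
            value = value.substr(1, value.size() - 2);
        }
        kv[line.substr(0, eq)] = value;
    }
    return kv;
}

// True if `word` is the ID or appears in the space-separated ID_LIKE list.
bool os_is_like(const std::map<std::string, std::string>& kv, const std::string& word) {
    auto id = kv.find("ID");
    if (id != kv.end() && id->second == word) {
        return true;
    }
    auto like = kv.find("ID_LIKE");
    if (like == kv.end()) {
        return false;
    }
    std::istringstream words(like->second);
    std::string w;
    while (words >> w) {
        if (w == word) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper mirroring the intent of is_redhat_variant().
bool is_redhat_variant(const std::map<std::string, std::string>& kv) {
    return os_is_like(kv, "rhel") || os_is_like(kv, "fedora");
}
```

Unlike probing for `/etc/redhat-release`, this approach classifies a distribution by its declared ancestry, so derivatives are detected without a per-distribution file check.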
Takuya ASADA	3cf7cf015a	dist/docker/redhat: use relocatable python3 on docker-entrypoint.py Switch to relocatable python3 instead of EPEL's python3 on docker-entrypoint.py. Also drop uneeded dependencies, since we switched to relocatable scylla image. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417111024.6604-1-syuu@scylladb.com>	2019-04-18 19:07:14 +03:00
Paweł Dziepak	85409c1a16	Merge "Validate elements of collections" from Piotr " Previously we weren't validating elements of collections so it was possible to add non-UTF-8 string to a column with type list<text>. Tests: unit(release) Fixes #4009 " * 'haaawk/4009/v5' of github.com:scylladb/seastar-dev: types: Test correct map validation types: Test correct in clause validation types: Test correct tuple validation types: Test correct set validation types: Test correct list validation types: Add test_tuple_elements_validation types: Add test_in_clause_validation types: Add test_map_elements_validation types: Add test_set_elements_validation types: Add test_list_elements_validation types: Validate input when tuples types: Validate input when parsing a set types: Validate input when parsing a map types: Validate input when parsing a list types: Implement validation for tuple types: Implement validation for set types: Implement validation for map types: Implement validation for list types: Add cql_serialization_format parameter to validate	2019-04-18 19:07:14 +03:00
Botond Dénes	6e85d1e8c1	date_type_impl: add notice explaining why its not used And why it is still in the code. The note has been copied from Origin. Refs: #4419 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>	2019-04-18 19:07:14 +03:00
Piotr Jastrzebski	134b59a425	table_helper: take insert function arguments by value Previous version wasn't working correctly with r-values. Fixes #4438 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <5017b04901c47bd826b2e411e603ce01e42a83a5.1555424512.git.piotr@scylladb.com>	2019-04-16 17:34:35 +03:00
Tomasz Grabiec	5dc3f5ea33	Merge "Properly enable MC format on the cluster" from Piotr 1. All nodes in the cluster have to support MC_SSTABLE_FEATURE 2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE then it should start using MC format. 3. Once all shards start to use MC then a node should broadcast that unbounded range tombstones are now supported by the cluster. 4. Once whole cluster supports unbounded range tombstones we can start accepting them on CQL level. tests: unit(release) Fixes #4205 Fixes #4113 * seastar-dev.git dev/haaawk/enable_mc/v11: system_keyspace: Add scylla_local system_keyspace: add accessors for SCYLLA_LOCAL storage_service: add _sstables_format field feature: add when_enabled callbacks system_keyspace: add storage_service param to setup Add sstable format helper methods Register feature listeners in storage_service Add service::read_sstables_format Use read_sstables_format in main.cc Use _sstables_format to determine current format Add _unbounded_range_tombstones_feature Update supported features on format change	2019-04-16 14:07:05 +02:00
Avi Kivity	6c672e674b	tools: toolchain: improve dbuild signal handing Currently, we use --sig-proxy to forward signals to the container. However, this requires the container's co-operation, which usually doesn't exist. For example, docker run --sig-proxy fedora:29 bash -c "sleep 5" does not respond to Ctrl-C. This is a problem for continuous integration. If a build is aborted, Jenkins will first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give up and use SIGKILL. If the graceful termination doesn't work, we end up with an orphan container running on the node, which can then consume enough memory and CPU to harm the following jobs. To fix this, trap signals and handle them by killing the container. Also trap shell exit and kill the container unconditionally there, since if Jenkins happens to kill the "docker wait" process, the regular paths will not be taken. Message-Id: <20190415084040.12352-1-avi@scylladb.com>	2019-04-16 14:07:05 +02:00
Tomasz Grabiec	ac0d435c3e	Merge "hinted handoff: don't reuse_segments and discard corrupted segments" from Vlad This series addresses two issues in the hinted handoff that should complete fixing the infamous #4231. In particular the second patch removes the requirement to manually delete hints files after upgrading to 3.0.4. Tested with manual unit testing. * https://github.com/vladzcloudius/scylla.git hinted_handoff_drop_broken_segments-v3: hinted handoff: disable "reuse_segments" commitlog: introduce a segment_error hinted handoff: discard corrupted segments	2019-04-16 14:07:05 +02:00
Avi Kivity	643bddbecc	Update seastar submodule * seastar 6f73675...eb03ba5 (11): > tests: tests C++14 dialect in continuous integration > rpc/compressor/lz4: fix std:variant related compiler errors > tests: futures_test: allow project to compile with C++14 > Merge "io_queue: make io_priority_class namespace global" from Benny > future::then_wrapped: use std::terminate instead of abort > reactor: make metric about task quota violations less sensitive > Merge "Add LZ4_FRAGMENTED compressor for RPC" from Paweł > Fix build issues with Clang 7 > Merge "file_stat follow_symlink option and related fixes" from Benny > doc/tutorial.md: reword mention of seastar::thread premption on get() > tests: semaphore_test: relax timeouts Fixes #4272.	2019-04-16 14:34:32 +03:00
Raphael S. Carvalho	52e1125b52	sstables: do not destroy sstable runs after resharding Resharding wasn't preserving the sstable run structure, which depends on all fragments sharing the same run identifier. So let's make resharding run-aware, meaning that a run will be created for each shard involved. tests: release mode. Fixes #4428. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190415193556.16435-1-raphaelsc@scylladb.com>	2019-04-16 10:34:49 +03:00
Tomasz Grabiec	ff66b27754	gdb: heapprof: Coalesce parents in the flamegraph mode This change drops the hit count from the name of the node, because it prevents coalescing of nodes which are shared parents for paths with different counts. This lack of coalescing makes the flamegraph a lot less useful. Message-Id: <1555348576-26382-1-git-send-email-tgrabiec@scylladb.com>	2019-04-15 21:05:08 +03:00
Tomasz Grabiec	3fd82021b1	schema_tables: Serialize schema merges fairly All schema changes made to the node locally are serialized on a semaphore which lives on shard 0. For historical reasons, they don't queue but rather try to take the lock without blocking and retry on failure with a random delay from the range [0, 100 us]. Contenders which do not originate on shard 0 are at an extra disadvantage, as each lock attempt is longer by the cross-shard round-trip latency. If there is constant contention on shard 0, contenders originating from other shards may keep losing the race for the lock. A schema merge executed on behalf of a DDL statement may originate on any shard. The same holds for a schema merge coming from a push notification. A schema merge executed as part of the background schema pull originates on shard 0 only, where the application state change listeners run. So if there are constant schema pulls, DDL statements may take a long time to get through. The fix is to serialize merge requests fairly by using the blocking semaphore::wait(), which is fair. We don't have to back off any more, since submit_to() no longer has a global concurrency limit. Fixes #4436. Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>	2019-04-15 20:40:38 +03:00
Botond Dénes	c6314e422f	tests/mutation_source_test: use a single random seed Currently, each instantiation of `random_mutation_generator::impl` generates a new random seed for itself. Although these are printed, mapping all the printed seeds back to the exact source location where each has to be substituted in is non-trivial. This makes reproducing random test failures very hard. To solve this problem, use `tests::random::get_int()` to produce the random seed of the `random_mutation_generator::impl` instances. This way the seeds of all mutation generators will be derived from a single "master" seed that is easily replaced after a test failure, hopefully leading to easily reproducible random test failures. I checked that after substituting in a previously generated master random seed, all derived seeds were exactly the same. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <0471415938fc27485975ef9213d37d94bff20fd5.1555329062.git.bdenes@scylladb.com>	2019-04-15 17:37:31 +03:00
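The master-seed scheme in c6314e422f can be illustrated with a small sketch. This is not the actual `tests::random` code: `seed_source`, `generator_impl` and `sample_run` are hypothetical names, and `std::mt19937_64` stands in for whatever engine the test library uses. The point is that every generator draws its seed from one master engine, so replaying a single printed seed reproduces every derived seed in order.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical master seed source: one seed drives a master engine, and
// each generator instance takes its seed from it in a deterministic order.
class seed_source {
    std::mt19937_64 _master;
public:
    explicit seed_source(std::uint64_t master_seed) : _master(master_seed) {}
    std::uint64_t next_seed() { return _master(); }
};

// Stand-in for random_mutation_generator::impl, seeded from the source
// instead of from a fresh random device.
struct generator_impl {
    std::mt19937 engine;
    explicit generator_impl(seed_source& src)
        : engine(static_cast<std::mt19937::result_type>(src.next_seed())) {}
};

// Creates `generators` instances and records the first value of each.
std::vector<std::mt19937::result_type> sample_run(std::uint64_t master_seed, int generators) {
    seed_source src(master_seed);
    std::vector<std::mt19937::result_type> first_values;
    for (int i = 0; i < generators; ++i) {
        generator_impl g(src);
        first_values.push_back(g.engine());
    }
    return first_values;
}
```

Running `sample_run` twice with the same master seed yields identical sequences, which is the reproducibility property the commit relies on.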
Avi Kivity	3afbe219cd	Merge "UDF/UDA related cleanups and refactoring" from Rafael " These are patches I wrote while working on UDF/UDA, but IMHO they are independent improvements and are ready for review. Tests: unit (debug) dtest (release) I checked that all tests in nosetests -v user_types_test.py sstabledump_test.py cqlsh_tests/cqlsh_tests.py now pass. " * 'espindola/udf-uda-refactoring-v3' of https://github.com/espindola/scylla: Refactor user type merging cql_type_parser::raw_builder: Allow building types incrementally cql3: delete dead code Include missing header return a const reference from return_type delete unused var Add a test on nested user types.	2019-04-15 16:52:13 +03:00
Glauber Costa	c01ed239a3	fix typo in create table statement error message specifed -> specified Fixes #4434 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190415125206.2993-1-glauber@scylladb.com>	2019-04-15 16:51:13 +03:00
Benny Halevy	b543ab4c76	sstables: remove_temp_dir: do not return then_wrapped future f.get_exception makes the future invalid so it must not be returned. Instead, make_exception_future<> with the exception ptr. Fixes #4435. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190415111909.30499-1-bhalevy@scylladb.com>	2019-04-15 16:42:49 +03:00
Glauber Costa	b9327f81cf	conf: stop telling people to run auto_bootstrap: false auto_bootstrap: false provides negligible gains for new clusters and is extremely dangerous everywhere else. We have seen a couple of cases in which users, confused by this, added this flag by mistake and added nodes with it. While they were pleased by the extremely fast times to add nodes, they were later displeased to find their data missing. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190414012028.20767-1-glauber@scylladb.com>	2019-04-14 10:42:25 +03:00
Piotr Jastrzebski	2c599122e1	Update supported features on format change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:38:31 +02:00
Piotr Jastrzebski	9c7e3dd470	Add _unbounded_range_tombstones_feature This requires introduction of storage_service::get_known_features and using it with check_knows_remote_features. Otherwise a node joining the existing cluster won't be able to join because it does not support unbounded range tombstones yet. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	96ad8f7df9	Use _sstables_format to determine current format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	da1eba5bdb	Use read_sstables_format in main.cc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	7339e9de30	Add service::read_sstables_format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	9934740c39	Register feature listeners in storage_service Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:36:58 +02:00
Piotr Jastrzebski	7a62235259	Add sstable format helper methods Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	caa6798f2c	system_keyspace: add storage_service param to setup Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	460fb260cb	feature: add when_enabled callbacks Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	081542cf00	storage_service: add _sstables_format field Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	0211541d84	system_keyspace: add accessors for SCYLLA_LOCAL Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	4c205b733a	system_keyspace: Add scylla_local Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Benny Halevy	adf539fb2c	tests: sstable_test_env::do_with_async: wait_for_background_jobs To solve the memory leak seen in sstable_datafile_test -t test_old_format_non_compound_range_tombstone_is_read Refs #4376 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190411154621.9716-1-bhalevy@scylladb.com>	2019-04-11 18:50:42 +03:00
Takuya ASADA	4636284856	dist/ami: drop EPEL, convert scylla_install_ami script to python2 We have to run this script in python2, since we dropped EPEL from dependencies, and the script is an installer for rpms, so we cannot use relocatable python3 for it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190411151858.2292-1-syuu@scylladb.com>	2019-04-11 18:21:48 +03:00
Glauber Costa	f3a24b6c22	dist: remove curl dependency to simplify dependency list further Although curl is widely available, there is no reason to depend on it. There are mainly three users, as indicated by grep: 1) scylla-housekeeping 2) scripts within the AMI 3) docker image The AMI has its own RPM and it already depends on curl. While we could get rid of the curl dependency there too, we can do that later. Docker is its own thing and only needs it at build time anyway. For the main scylla repo, this patch changes scylla-housekeeping so as not to depend on the curl binary and to use urllib directly instead. We can then remove curl from our dependency list. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190411125642.9754-1-glauber@scylladb.com>	2019-04-11 16:12:36 +03:00
Benny Halevy	8181acd83b	test.py: fail if given test name not found Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190411092041.24712-1-bhalevy@scylladb.com>	2019-04-11 12:31:23 +03:00
Tzach Livyatan	f444c949bd	Fix the Dockerhub documentation for listen-address Fix listen-address documentation: it is used for internal communication, not for external clients Signed-off-by: Tzach Livyatan <tzach@scylladb.com> Message-Id: <20190410181409.16078-1-tzach@scylladb.com>	2019-04-11 11:53:40 +03:00
Botond Dénes	f201f8abab	types: fix date_type_impl::less() (timestamp cql type) date_type_impl::less() invokes `compare_unsigned()` to compare the underlying raw byte values. `compare_unsigned()` is a tri-comparator; however, `date_type_impl::less()` implicitly converted the returned value to bool. In effect, `date_type_impl::less()` would always return `true` when the two compared values were not equal. Found while working on a unit test which employs a randomly generated schema to test a component. Fixes #4419. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>	2019-04-10 21:01:25 +03:00
Botond Dénes	90721468f0	tests/mutation_diff: remove false-positive diff of the partition header Currently the partition header will always be reported as different when comparing two mutations. This is because they are prepended with the "expected: " and "... but got: " texts. This generates unnecessary noise. Inject a new line between the prefix and the partition-header proper. This way the partition header will only show up in the diff when there is an actual difference. The "expected: " and "... but got: " phrases are still shown as different at the top of the diff, but this is fine, as one can immediately see that they are not part of the data, and additionally they help the reader determine which part of the diff is the expected one and which is the actual one. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <29e0f413d248048d7db032224a3fd4180bf1b319.1554909144.git.bdenes@scylladb.com>	2019-04-10 18:05:36 +02:00
Raphael S. Carvalho	8a117c338a	compaction: fix use-after-free when calculating backlog after schema change The problem happens after a schema change because we fail to properly remove an ongoing compaction, which stopped being tracked, from the list that is used to calculate backlog, so a compaction read monitor (which ceases to exist after compaction ends) may be used after it is freed. Fixes #4410. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com>	2019-04-10 15:54:39 +03:00
Vlad Zolotarov	db2ba0df61	hinted handoff: discard corrupted segments If we discover that a current segment is corrupted, there is nothing we can do about it. This patch does the following: 1) Drops the corrupted segment and moves to the next one. 2) Logs such events as ERRORs. 3) Introduces a new metric that counts such events. Fixes #4364 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:54:20 -04:00
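The `date_type_impl::less()` bug in f201f8abab is a general C++ pitfall: a tri-comparator returns a negative, zero or positive `int`, and converting that `int` to `bool` maps both "less" and "greater" to `true`. A self-contained sketch; this `compare_unsigned` is a hypothetical `memcmp`-style stand-in over `std::string`, not Scylla's function:

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in: compares raw bytes as unsigned values (memcmp
// semantics), shorter prefix sorts first. Returns <0, 0 or >0.
int compare_unsigned(const std::string& a, const std::string& b) {
    auto n = std::min(a.size(), b.size());
    int r = std::memcmp(a.data(), b.data(), n);
    if (r != 0) {
        return r;
    }
    return a.size() < b.size() ? -1 : (a.size() > b.size() ? 1 : 0);
}

// Buggy shape: the int is implicitly converted to bool, so any non-zero
// result, including "greater than", becomes true.
bool less_buggy(const std::string& a, const std::string& b) {
    return compare_unsigned(a, b);
}

// Fixed shape: only a negative tri-compare result means "less".
bool less_fixed(const std::string& a, const std::string& b) {
    return compare_unsigned(a, b) < 0;
}
```

With the buggy version both `less_buggy("a", "b")` and `less_buggy("b", "a")` are true, which breaks the strict weak ordering that sorted containers and algorithms rely on.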
Vlad Zolotarov	1cba4a54bb	commitlog: introduce a segment_error Introduce a common base class for all errors that indicate that the current segment has "issues". This allows a laconic "catch" clause for all such errors. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:31:13 -04:00
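A common base class like the `segment_error` introduced in 1cba4a54bb lets a single `catch` clause handle every "this segment has issues" failure, which is what the follow-up patch needs in order to drop a corrupted segment and move on. A minimal sketch, assuming a `std::runtime_error` base; the two derived error names and `replay_or_skip` are hypothetical and not taken from the commitlog code:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Common base for all errors indicating that the current segment is bad.
class segment_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical concrete segment errors.
class segment_truncated_error : public segment_error {
public:
    using segment_error::segment_error;
};

class segment_checksum_error : public segment_error {
public:
    using segment_error::segment_error;
};

// Returns true if the segment replayed cleanly, false if it was skipped
// because of any segment_error. Unrelated exceptions still propagate.
template <typename ReplayFn>
bool replay_or_skip(ReplayFn replay) {
    try {
        replay();
        return true;
    } catch (const segment_error&) {
        // Nothing can be done about a broken segment: skip it.
        return false;
    }
}
```

Adding a new kind of segment failure later only requires deriving from `segment_error`; the catch site does not change.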
Vlad Zolotarov	00fe2acb35	hinted handoff: disable "reuse_segments" Hinted handoff doesn't utilize this feature (which was developed with a commitlog in mind). Since it's enabled by default we need to explicitly disable it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 11:13:41 -04:00
Piotr Jastrzebski	dee64c30b3	types: Test correct map validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:23 +02:00
Piotr Jastrzebski	3d94f0aaf0	types: Test correct in clause validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:23 +02:00
Piotr Jastrzebski	36853a7a5c	types: Test correct tuple validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	94bdc1c868	types: Test correct set validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	429a8e082a	types: Test correct list validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	910d81e03e	types: Add test_tuple_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	e2fe9ca5d0	types: Add test_in_clause_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	cd11959a8e	types: Add test_map_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	22f541af1d	types: Add test_set_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	be405e24e9	types: Add test_list_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	47e242efc5	types: Validate input when tuples Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	c4df3014ac	types: Validate input when parsing a set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	8a7b05ae26	types: Validate input when parsing a map Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	16596ec045	types: Validate input when parsing a list Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	8482764003	types: Implement validation for tuple Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	bd2823b623	types: Implement validation for set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	086d8abf89	types: Implement validation for map Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	4a51ee6e34	types: Implement validation for list Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	f5f6367674	types: Add cql_serialization_format parameter to validate Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Takuya ASADA	e3a5ac2945	reloc: run fix_sharedlib() only on application/x-sharedlib and application/x-pie-executable We need to avoid running fix_sharedlib() on non-ELF files. Fixes #4415 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190409114941.28276-1-syuu@scylladb.com>	2019-04-09 14:54:54 +03:00
Tomasz Grabiec	1b1f241c94	gdb: Print size of large allocations in 'scylla ptr'	2019-04-09 13:44:15 +02:00
Tomasz Grabiec	cda1781a77	gdb: Fix 'scylla ptr' for free pages Fixes runtime error which happens because the setter is expected to take an argument, but our definition doesn't take one. We're not really expecting the setter to be called with False, so don't use setter semantics.	2019-04-09 13:44:15 +02:00
Tomasz Grabiec	13efabe74c	gdb: Set is_live and offset for large allocations properly in 'scylla ptr' Before: (gdb) scylla ptr 0x601000860003 thread 1, large, free After: (gdb) scylla ptr 0x601000860003 thread 1, large, live (0x601000860000 +3) Omission from `e1ea4db7ca`.	2019-04-09 13:22:06 +02:00
Tomasz Grabiec	4002d8db7c	gdb: Fix 'scylla ptr' misqualifying pointers It can be that page::pool is != nullptr and page::offset_in_span is 0 for a page which is inside a large allocation span (live or dead). This may lead to misqualification of a pointer as belonging to a small allocation pool. Only the first page of a span contains reliable information. This patch changes the code to use the span_checker, which knows the real boundaries of spans and exposes reliable information via the span object. Fixes #4368	2019-04-09 13:22:06 +02:00
Tomasz Grabiec	4d3399ee1f	gdb: Make 'scylla memory' show unused memory in small pools Example output: Small pools: objsz spansz usedobj memory unused wst% 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 2 4096 0 0 0 0.0 2 4096 0 0 0 0.0 3 4096 0 0 0 0.0 3 4096 0 0 0 0.0 4 4096 0 0 0 0.0 5 4096 0 0 0 0.0 6 4096 0 0 0 0.0 7 4096 0 0 0 0.0 8 4096 241 8192 6264 76.5 10 4096 0 8192 8192 99.9 12 4096 35943 454656 23340 1.4 14 4096 0 8192 8192 99.8 16 4096 1171 24576 5840 23.8 20 4096 1007 24576 4436 17.7 24 4096 59380 1437696 12576 0.5 28 4096 548 16384 1040 6.2 32 4096 69433 2314240 92384 0.3 40 4096 36447 1564672 106792 0.4 48 4096 34099 1748992 112240 0.4	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	ac7a393be5	gdb: Fix small pool memory usage reporting in 'scylla memory' Uses span_checker to work around corrupted _pages_in_use. Refs https://github.com/scylladb/seastar/issues/608 As a bonus, calculates use_count correctly for fallback spans.	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	d0567476e5	gdb: Switch 'scylla memory' to use the span_checker to find large spans Simplifies code.	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	4b748e601c	gdb: Switch task_histogram to use the span_checker It can be that page::pool is != nullptr and page::offset_in_span is 0 for a page which is inside a large allocation span (live or dead). This may lead to misqualification of that span as belonging to a small allocation pool and interpreting its contents as if it contained small objects. Only the first page of a span contains reliable information. This patch changes the code to use the span_checker, which knows the real boundaries of spans and exposes reliable information via the span object. Another problem was that the command scanned dead spans as well. This is no longer the case after this patch. I've seen this command report thousands of no longer live sstable writers and various continuations because of those problems. Fixes #4367	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	c7215a2f67	gdb: Introduce span_checker The purpose is to encapsulate iteration and lookup of seastar allocator memory spans.	2019-04-09 13:22:05 +02:00
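The `span_checker` itself is a gdb Python helper. The lookup it encapsulates can be modelled in C++ under stated assumptions: spans are recorded once by their start address and size (taken from the reliable first page), and any pointer is resolved to the span that contains it, instead of trusting per-page metadata inside a span. The `span` struct and its fields below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical span record: only span-level metadata is trusted.
struct span {
    std::uintptr_t start;
    std::size_t size;  // bytes
    bool free;
    bool large;        // large allocation vs. small-object pool
};

// Sketch of span lookup: spans keyed by start address, and any address
// inside a span resolves to that span.
class span_checker {
    std::map<std::uintptr_t, span> _spans;
public:
    void add(const span& s) { _spans[s.start] = s; }

    std::optional<span> span_for(std::uintptr_t addr) const {
        auto it = _spans.upper_bound(addr);  // first span starting after addr
        if (it == _spans.begin()) {
            return std::nullopt;
        }
        --it;                                // last span starting at or before addr
        if (addr < it->second.start + it->second.size) {
            return it->second;
        }
        return std::nullopt;
    }
};
```

Resolving through span boundaries means a pointer into the middle of a large allocation is classified from the span's first page, avoiding the misqualification described in 4002d8db7c and 4b748e601c.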
Rafael Ávila de Espíndola	89b2c4ddc5	Refactor user type merging The comparison of tables before and after mutation is now done by a generic diff_rows function. The same function will be used for user defined functions and user defined aggregates. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 14:16:40 -07:00
Rafael Ávila de Espíndola	4f1260f3e3	cql_type_parser::raw_builder: Allow building types incrementally Before this patch raw_builder would always start with an empty list of user types. This means that every time a type is added to a keyspace, every type in that keyspace needs to be recreated. With this patch we pass a keyspace_metadata instead of just the keyspace name and can construct new user types on top of previous ones. This will be used in the followup patch, where only new types are created. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 14:06:51 -07:00
Rafael Ávila de Espíndola	c037b266b4	cql3: delete dead code In c++ TOKEN_FUNCTION_NAME is only needed in the .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	1db0b83711	Include missing header abstract_function.hh uses function, which is defined in function.hh, so it should include it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	4551691b5d	return a const reference from return_type We define data_type as using data_type = shared_ptr<const abstract_type>; Since it is a shared_ptr, it cannot be copied into another thread, since that would create a race condition when incrementing the reference counter. In particular, before this patch it was not legal to call return_type from another thread. With this patch, read-only access from another thread is possible. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
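The difference between returning a `shared_ptr` by value and by `const&` can be observed through the reference count. This sketch uses `std::shared_ptr` only because it exposes `use_count()`. Note that `std::shared_ptr`'s counter is atomic, whereas the commit's concern is a non-atomic counter such as Seastar's `shared_ptr`. `function_like` and `return_type_by_value` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct abstract_type {};  // stand-in for the real type hierarchy
using data_type = std::shared_ptr<const abstract_type>;

// Hypothetical holder of a return type, in the spirit of a CQL function.
class function_like {
    data_type _return_type;
public:
    explicit function_like(data_type t) : _return_type(std::move(t)) {}

    // By value: every call copies the pointer and increments the count,
    // i.e. it writes to shared state.
    data_type return_type_by_value() const { return _return_type; }

    // By const reference: the count is not touched, so the call is a
    // pure read of the owning object.
    const data_type& return_type() const { return _return_type; }
};
```

With a non-atomic counter, only the by-reference accessor is safe to call from another thread, provided the owner keeps the object alive.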
Rafael Ávila de Espíndola	35f1b1055d	delete unused var Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	b577082c64	Add a test on nested user types. This would have found a bug in a previous version of this series. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 10:54:33 -07:00
Takuya ASADA	1f009b5e9b	dist/redhat/python3: drop SCYLLA-*-FILE files in rpm Related to #4409. These are more files that are not needed at runtime, so drop them too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190405074030.3990-1-syuu@scylladb.com>	2019-04-08 11:52:48 +03:00
Rafael Ávila de Espíndola	6191fd7701	Avoid duplicated read_keyspace_mutation calls There were many calls to read_keyspace_mutation. One in each function that prepares a mutation for some other schema change. With this patch they are all moved to a single location. Tests: unit (dev, debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190328024440.26201-1-espindola@scylladb.com>	2019-04-07 09:26:56 +03:00
Takuya ASADA	d180caea89	dist/redhat/python3: drop dist/ files in rpm These files are not needed at runtime, so drop them. Fixes #4409 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190405071445.18678-1-syuu@scylladb.com>	2019-04-07 09:26:56 +03:00
Amos Kong	db9a721d02	scylla_kernel_check: update kb_fs_not_qualified_aio doc link The doc has been moved to https://docs.scylladb.com/troubleshooting/error_messages/kb_fs_not_qualified_aio/ Fixes #4398 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <75fdc97d222667f4402cadc7a46e52d6f38a32a8.1554375560.git.amos@scylladb.com>	2019-04-07 09:26:56 +03:00
Glauber Costa	2305cc88f3	relocatable python: Be more permissive with mime type checking Fedora28 python magic used to return a x-sharedlib mime type for .so files. Fedora29 changed that to x-pie-executable, so the libraries are no longer relocated. Let's be more permissive and relocate everything that starts with application/. Fixes #4396 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190404140929.7119-1-glauber@scylladb.com>	2019-04-07 09:26:56 +03:00
Piotr Jastrzebski	882ea9caf0	tests: Fix use after free in check_multi_schema Refs #4376 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7d7b4cf69cea1e4d31058d8f1fd2c01f1dd11c58.1554387442.git.piotr@scylladb.com>	2019-04-07 09:26:56 +03:00
Piotr Jastrzebski	4485868d27	tests: Fix use after free in check_read_indexes Refs #4376 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0dc76b2a55bebc49558f30e8d2894973ce817577.1554386770.git.piotr@scylladb.com>	2019-04-07 09:26:56 +03:00
Tomasz Grabiec	a717e11026	Merge "row level repair shutdown fixes" from Asias This series fixes row level repair shutdown related issues we saw with dtests, e.g., use after free of the repair meta object, fail to stop a table during shutdown. Fixes: #4044 Fixes: #4314 Fixes: #4333 Fixes: #4380 Tests: repair_additional_test.py:RepairAdditionalTest.repair_abort_test repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test * sestar-dev.git asias/repair.fix.shutdown.v1: repair: Wait for pending repair_meta operation before removing it repair: Check shutdown in row level repair repair: Remove repair meta when node is dead repair: Remove all row level repair during shtudown	2019-04-05 15:47:25 +03:00
Avi Kivity	e63bc6b1e3	Update seastar submodule * seastar 63d8607...6f73675 (5): > Merge "seastar-addr2line: improve the context of backtraces" from Botond > log: fix std::system_error ostream operator to print full error message > Revert "threads: yield on get if we had run for too long." > core/queue: Document concurrency constraints > core/memory: Make small pools use the full span size Fixes #4407. Fixes #4316.	2019-04-05 15:47:25 +03:00
Avi Kivity	b1c4c371fa	Merge "fix I/O calculation for i3.metal instances" from Glauber " Calculation of IO properties is slightly wrong for i3.metal, because we get the number of disks wrong. The reason for that is our check for ephemeral nvme disks, that pre-date the time in which root devices were exposed as nvme devices (nitro and metal instances). " toolchain updated with python3-psutil * 'ec2fixes' of github.com:glommer/scylla: scylla_util.py: do not include root disks in ephemeral list scylla-python3: include the psutil module fix typo in scylla_ec2_check	2019-04-05 15:46:59 +03:00
Asias He	f212dfb887	streaming: Reject stream if the _sys_dist_ks or _view_update_generator are not ready They are of type db::system_distributed_keyspace and db::view::view_update_generator. n1 is in normal status n2 boots up and _sys_dist_ks or _view_update_generator are not initialized n1 runs stream, n2 is the follower. n2 uses the _sys_dist_ks or _view_update_generator "Assertion `local_is_initialized()' failed" is observed Fixes #4360 Message-Id: <4ae13e1640ac8707a9ba0503a2744f6faf89ecf4.1554330030.git.asias@scylladb.com>	2019-04-04 10:48:00 +03:00
Avi Kivity	8abba6f6a6	Merge "Avoid copying data_type" from Rafael " With these changes we avoid a std::vector<data_value> copy, which is nice in itself, but also makes it possible to call get_list from other shards. " * 'espindola/result-set-v3' of https://github.com/espindola/scylla: Avoid copying a std::vector in get_list query-result-set: add and use a get_ptr method	2019-04-03 21:29:22 +03:00
Asias He	99da196e6f	repair: Reject repair if the _sys_dist_ks or _view_update_generator are not ready They are of type db::system_distributed_keyspace and db::view::view_update_generator. Scenario: n1 is in normal status; n2 boots up and _sys_dist_ks or _view_update_generator are not initialized yet; n1 runs a repair with n2 as the follower; n2 uses _sys_dist_ks or _view_update_generator, and "Assertion `local_is_initialized()' failed" is observed. Fixes #4360 Message-Id: <6616c21078c47137a99ba71baf82594ba709597c.1553742487.git.asias@scylladb.com>	2019-04-03 21:29:22 +03:00
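The two rejection fixes above (stream and repair) both amount to checking readiness before touching the services. A minimal sketch follows; `service_holder`, `check_repair_dependencies` and the placeholder structs are hypothetical stand-ins, assuming the real holder (seastar's `sharded<T>`) asserts in `local()` the way the quoted assertion message suggests.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical per-shard service holder; assumed to behave like
// seastar::sharded<T>::local(), which asserts local_is_initialized().
template <typename T>
class service_holder {
    std::optional<T> _instance;
public:
    void start(T value) { _instance.emplace(std::move(value)); }
    bool local_is_initialized() const { return _instance.has_value(); }
    T& local() {
        assert(local_is_initialized());  // the assertion seen in #4360
        return *_instance;
    }
};

// Placeholders for db::system_distributed_keyspace and
// db::view::view_update_generator.
struct system_distributed_keyspace {};
struct view_update_generator {};

// Reject the incoming stream/repair with an exception the master can
// handle, instead of letting the follower hit the assertion later.
void check_repair_dependencies(const service_holder<system_distributed_keyspace>& sys_dist_ks,
                               const service_holder<view_update_generator>& view_gen) {
    if (!sys_dist_ks.local_is_initialized() || !view_gen.local_is_initialized()) {
        throw std::runtime_error("node is not ready: _sys_dist_ks or _view_update_generator is not initialized");
    }
}
```

The check runs before any `local()` call, so a booting follower turns the request away instead of aborting.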
Rafael Ávila de Espíndola	74f956e5a8	Avoid copying a std::vector in get_list For now this is just an optimization. But it also avoids copying data_type, which will allow this to be used across shards. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-03 09:20:12 -07:00
Rafael Ávila de Espíndola	c2a8807c35	query-result-set: add and use a get_ptr method This moves a copy up the call stack and makes it possible to avoid it completely by passing a reference type to get_nonnull. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-03 09:19:52 -07:00
Tomasz Grabiec	3356a085d2	lsa: Cover more bad_alloc cases with abort When --abort-on-lsa-bad-alloc is enabled we want to abort whenever we think we can be out of memory. We covered failures due to bad_alloc thrown from inside of the allocation section, but did not cover failures from reservations done at the beginning of with_reserve(). Fix by moving the trap into reserve(). Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>	2019-04-03 16:39:40 +03:00
Glauber Costa	0e9a50ab57	scylla_util.py: do not include root disks in ephemeral list Nitro instances (and metal ones) put their root device in nvme (as a protocol; it is still EBS). Our algorithm so far has relied on parsing the nvme devices to figure out which ones are ephemeral, but it breaks for those instances. Out of our supported instances so far, the i3.metal is the only one in which this breaks. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-03 07:57:00 -04:00
Glauber Costa	6d7ac87136	scylla-python3: include the psutil module Using a new python3 module has never been that easy! So we'll unapologetically use psutil and don't even worry about whether or not CentOS supports it (it doesn't) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-02 17:24:25 -04:00
Glauber Costa	027eee5f13	fix typo in scylla_ec2_check enahanced -> enhanced Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-02 17:24:00 -04:00
Dejan Mircevski	a66a5d423a	query_processor: Add query-count metrics ... with labels for each consistency level. Fixes https://github.com/scylladb/scylla/issues/4309 ("add counters breaking up cql requests based on consistency_level"). Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <1554127055-17705-1-git-send-email-dejan@scylladb.com>	2019-04-02 19:08:25 +03:00
Avi Kivity	be6905da84	Update seastar submodule * seastar 5572de7...63d8607 (6): > test: verify that negative sleep time doesn't cause infinite sleep > httpd: Change address handling to use socket_address > dns: Change "unspecififed" address search type to retrive first avail > Allow when_all and when_all_succeed to take function arguments > when_all: abort if memory allocation fails > inet_address: Add missing constructor impl.	2019-04-02 16:56:56 +03:00
Asias He	b98d95ebf0	repair: Remove all row level repair during shutdown We saw a dtest fail to stop a node: ``` ERROR: repair_one_missing_row_test (repair_additional_test.RepairAdditionalTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 2521, in repair_one_missing_row_test return RepairAdditionalBase._repair_one_missing_row_test(self) File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 1842, in _repair_one_missing_row_test self.check_rows_on_node(node2, nr_rows) File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 34, in check_rows_on_node node.stop(wait_other_notice=True) File "/home/asias/src/cloudius-systems/scylla-ccm/ccmlib/scylla_node.py", line 496, in stop raise NodeError("Problem stopping node %s" % self.name) NodeError: Problem stopping node node1 ``` (attachment: [2019.1.3.node1.repair.zip](https://github.com/scylladb/scylla/files/2723244/2019.1.3.node1.repair.zip)) The problem is: 1) repair_meta is created: repair_meta -> repair_writer::create_writer() -> t.stream_in_progress() and repair_meta -> repair_reader::repair_reader -> cf.read_in_progress() 2) repair_meta is stored in the _repair_metas map. 3) Repair is shut down, but repair_meta is not removed from the _repair_metas map. 4) The database is shut down, which waits for the utils::phased_barrier. To fix, we should stop and remove all the repair_meta objects from the _repair_metas map. Tests: 30 successful runs of the repair_kill_2_test Fixes: #4044	2019-04-02 19:28:53 +08:00
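The shutdown hang can be modelled as a counter of in-progress operations that the database waits on. The sketch below is a simplified, single-threaded model under that assumption: `in_progress_tracker`, `repair_meta` and `repair_service` are hypothetical and synchronous, whereas the real utils::phased_barrier and the repair_meta stop path are future-based.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-in for utils::phased_barrier: shutting the table down
// must wait until every started operation has finished.
class in_progress_tracker {
    std::size_t _count = 0;
public:
    class operation {
        in_progress_tracker* _t;
    public:
        explicit operation(in_progress_tracker& t) : _t(&t) { ++_t->_count; }
        ~operation() { --_t->_count; }
        operation(const operation&) = delete;
        operation& operator=(const operation&) = delete;
    };
    operation start() { return operation(*this); }
    bool drained() const { return _count == 0; }
};

// Like the real repair_meta, this one pins the table for as long as it lives
// (cf. stream_in_progress() and read_in_progress() above).
struct repair_meta {
    in_progress_tracker::operation write_op;
    in_progress_tracker::operation read_op;
    explicit repair_meta(in_progress_tracker& t) : write_op(t.start()), read_op(t.start()) {}
};

class repair_service {
    std::map<int, std::unique_ptr<repair_meta>> _repair_metas;  // keyed by repair id
public:
    void add(int repair_id, in_progress_tracker& table) {
        _repair_metas.emplace(repair_id, std::make_unique<repair_meta>(table));
    }
    // The fix: on repair shutdown, drop every repair_meta so their
    // operations end and the database shutdown can make progress.
    void remove_all_repair_meta() { _repair_metas.clear(); }
};
```

As long as any `repair_meta` stays in the map, `drained()` stays false, which corresponds to the node that never finishes stopping.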
Asias He	344d0ee37d	repair: Remove repair meta when node is dead Repair follower nodes create a repair meta object when the repair master node starts a repair. Normally, the repair meta object is removed when the repair master finishes the repair and sends the verb REPAIR_ROW_LEVEL_STOP to all the followers to remove it. If the repair master is killed suddenly, no one removes the repair meta object. To avoid keeping this repair meta object forever, we should remove such objects when the gossip listener detects that a node is dead. Fixes: #4380 Reviewed-by: Botond Dénes <bdenes@scylladb.com>	2019-04-02 19:28:53 +08:00
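A sketch of the follower-side bookkeeping, assuming repair_meta objects are keyed by the master's address and a repair id. `repair_meta_registry`, `node_address` and the method names are hypothetical; `on_dead` is named after the gossip subscriber callback, but its signature here is simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

using node_address = std::string;  // hypothetical stand-in for gms::inet_address

struct repair_meta_stub {};

// Follower-side registry, keyed by (repair master, repair id).
class repair_meta_registry {
    std::map<std::pair<node_address, int>, repair_meta_stub> _metas;
public:
    void on_row_level_start(const node_address& master, int repair_id) {
        _metas.emplace(std::make_pair(master, repair_id), repair_meta_stub{});
    }
    // Normal path: the master sends REPAIR_ROW_LEVEL_STOP.
    void on_row_level_stop(const node_address& master, int repair_id) {
        _metas.erase(std::make_pair(master, repair_id));
    }
    // New path: the gossip listener reports the master as dead, so every
    // repair_meta created on its behalf is dropped.
    void on_dead(const node_address& endpoint) {
        for (auto it = _metas.begin(); it != _metas.end();) {
            if (it->first.first == endpoint) {
                it = _metas.erase(it);
            } else {
                ++it;
            }
        }
    }
    std::size_t size() const { return _metas.size(); }
};
```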
Asias He	b061157b21	repair: Check shutdown in row level repair During node shutdown, we should abort the repair as soon as possible. Check whether we are shutting down at each row level repair step. Refs: #4044	2019-04-02 19:28:53 +08:00
Asias He	e3e489328e	repair: Wait for pending repair_meta operation before removing it We remove the repair_meta object in remove_repair_meta upon receiving the stop row level repair RPC verb. There may still be a pending operation on that repair_meta. To avoid a use-after-free, we should not remove the repair_meta object until all the pending operations are done. Use a gate to protect it. Fixes: #4333 Fixes: #4314 Tests: 50 successful runs of repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test	2019-04-02 19:28:53 +08:00
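A gate counts operations in flight; once closed, it rejects new ones and signals when the last one leaves. The single-threaded `simple_gate` below is a hypothetical model of that idea. It takes a callback, whereas seastar's gate returns a future from `close()`. In this model, the callback is where the repair_meta would be removed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical single-threaded gate: operations call enter()/leave();
// close() refuses new entries and runs the callback once the count is zero.
class simple_gate {
    std::size_t _count = 0;
    bool _closed = false;
    std::function<void()> _on_drained;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() {
        assert(_count > 0);
        if (--_count == 0 && _closed && _on_drained) {
            auto cb = std::move(_on_drained);
            _on_drained = nullptr;
            cb();  // last pending operation finished: safe to destroy now
        }
    }
    void close(std::function<void()> on_drained) {
        _closed = true;
        if (_count == 0) {
            on_drained();
        } else {
            _on_drained = std::move(on_drained);
        }
    }
};
```

Handling the stop verb then becomes "close the gate and remove the object in the callback", so a pending operation never sees a freed repair_meta.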
Vlad Zolotarov	0dc0a6025d	query_pager::fetch_page: cosmetics: fix code alignment Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190401214030.5570-2-vladz@scylladb.com>	2019-04-02 11:53:10 +03:00
Asias He	70fbe85b3e	main: Add shutdown database log It is useful to know which step we are at during the shutdown process. Refs: #4044 Message-Id: <f7c94c60d039560bfacd6d473f7d828940cc55b7.1554172140.git.asias@scylladb.com>	2019-04-02 11:49:00 +03:00
Benny Halevy	3749148339	storage_service: fix handling of load_new_sstables exception ignore_ready_future in load_new_ss_tables broke migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests. The java.io.NotSerializableException in nodetool was caused by exceptions that were too long. This fix prints the problematic file names to the node system log and includes the cause in the resulting exception so as to provide the user with information about the nature of the error. Fixes #4375 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>	2019-04-02 11:46:19 +03:00
Avi Kivity	988dfd7209	Merge "add relocatable CLI tools required for scylla setup scripts" from Takuya " To make the offline installer easier, we need to minimize dependencies as much as possible. Python dependencies were already dropped by adding relocatable python3 (by Glauber); now it's time to drop the rest of the command line tools used by the scylla setup tools. (Even though the scripts are converted to python3, they still execute some external commands, so these commands should be distributed with the offline installer.) Note that some CLI tools, such as the NTP and RAID stuff, haven't been added, since these tools have daemons, not just CLI. To use them in offline mode, users have to install them manually. But both NTP setup and RAID setup are optional; users can still run Scylla w/o them. " Toolchain updated to docker.io/scylladb/scylla-toolchain:fedora-29-20190401 for changes in install-dependencies.sh; also updates to gnutls 3.6.7 security release. * 'reloc_clitools_v5' of https://github.com/syuu1228/scylla: reloc: add relocatable CLI tools for scylla setup scripts dist/redhat: drop systemd-libs from dependency dist/redhat: drop file from dependency since it seems unused dist/redhat: drop pciutils from dependency since it only used in DPDK mode	2019-04-01 14:23:04 +03:00
Raphael S. Carvalho	d59f716e1c	table: fix wild disk usage stat after sstables are discarded by truncate Truncate would make disk usage stat go wild because it isn't updated when sstables are removed in table::discard_sstables(). Let's update the stat after sstables are removed from the sstable set. Fixes #3624. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190328154918.25404-1-raphaelsc@scylladb.com>	2019-04-01 13:55:11 +03:00
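A minimal model of the accounting fix, assuming a per-table counter that is incremented when an sstable is added and that must be decremented when sstables are discarded. `table_stats_model`, `sstable_stub` and the method names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct sstable_stub {                      // hypothetical: only its size matters here
    std::uint64_t bytes_on_disk;
};

class table_stats_model {
    std::vector<sstable_stub> _sstables;
    std::uint64_t _disk_space_used = 0;    // the stat reported for the table
public:
    void add_sstable(const sstable_stub& sst) {
        _sstables.push_back(sst);
        _disk_space_used += sst.bytes_on_disk;
    }
    // Truncate path: drop the sstables from the set and subtract their
    // sizes, so the stat stops counting data that no longer exists.
    void discard_sstables() {
        for (const auto& sst : _sstables) {
            assert(_disk_space_used >= sst.bytes_on_disk);
            _disk_space_used -= sst.bytes_on_disk;
        }
        _sstables.clear();
    }
    std::uint64_t disk_space_used() const { return _disk_space_used; }
};
```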
Duarte Nunes	b2dd8ce065	database: Make exception message more accurate It's the sstable read queue that's overloaded, not the inactive one (which can be considered empty when we can't admit newer reads). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190328003533.6162-1-duarte@scylladb.com>	2019-04-01 13:53:50 +03:00
Takuya ASADA	75a7859019	reloc: add relocatable CLI tools for scylla setup scripts To minimize dependencies of Scylla, add relocatable image of CLI tools required for scylla setup scripts.	2019-04-01 02:59:01 +09:00
Takuya ASADA	a3c1b9fcf3	dist/redhat: drop systemd-libs from dependency Since we switched to relocatable package, we don't need distribution native libraries, so the package is not needed anymore.	2019-04-01 02:58:22 +09:00
Takuya ASADA	a3741b4052	dist/redhat: drop file from dependency since it seems unused The package is not used in our scripts anymore, so drop it.	2019-04-01 02:57:43 +09:00
Takuya ASADA	7d78515d5b	dist/redhat: drop pciutils from dependency since it only used in DPDK mode Since we don't use DPDK mode by default, and the mode is not officially supported, drop pciutils from package dependency. Users who want to use DPDK mode need to install the package manually.	2019-04-01 02:56:31 +09:00
Avi Kivity	77a0d5c5da	Update seastar submodule * seastar 05efbce...5572de7 (5): > posix_file_impl::list_directory: do not ignore symbolic link file type > prometheus: yield explicitly after each metric is processed > thread: add maybe_yield function > metrics: add vector overload of add_group() > memory: tone down message for memory allocator	2019-03-31 15:26:21 +03:00
Tomasz Grabiec	4c0584289b	tests: cql_test_env: Fix _feature_service not being initialized We moved from the uninitialized field instead of from the constructor parameter. No known issues. Message-Id: <1553854544-26719-1-git-send-email-tgrabiec@scylladb.com>	2019-03-31 13:05:35 +03:00
Takuya ASADA	b1bba0c1b0	dist/redhat/python3: product name customization support Currently the scylla-python3 package name is hardcoded; we need to support package name renaming just like on other scylla packages. This is required to release the enterprise version. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190329003941.12289-1-syuu@scylladb.com>	2019-03-29 19:22:24 +02:00
Amos Kong	98cb7d145b	scylla_setup: don't repeatedly select disks if it's assigned Currently scylla_setup gets stuck repeatedly selecting disks in non-interactive mode. Fixes #4370 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <8fb445708a6ac0d2130f8a8d041b1d8d71f1cf14.1553745961.git.amos@scylladb.com>	2019-03-28 15:21:36 +02:00
Avi Kivity	65dd45d9cf	Merge "sstable: validate file ownership and mode." from Benny " A file must either be owned by the process uid or have both read and write access, so that it can be (hard) linked when sysctl fs.protected_hardlinks is enabled. Fixes #3117 " * 'projects/valid_owner_and_mode/v3-rebased' of https://github.com/bhalevy/scylla: storage_service: handle load_new_sstables exception init: validate file ownership and mode. treewide: use std::filesystem	2019-03-28 14:58:14 +02:00
Benny Halevy	956cb2e61c	storage_service: handle load_new_sstables exception Refs #3117 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:54:56 +02:00
Benny Halevy	e3f7fe44c0	init: validate file ownership and mode. Files and directories must be owned by the process uid. Files must have read access and directories must have read, write, and execute access. Refs #3117 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:40:12 +02:00
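One way to express the ownership and mode check with POSIX `stat()`, as a sketch only. `has_valid_owner_and_mode` is a hypothetical name, and comparing against the effective uid (`geteuid()`) is an assumption, since the message only says "the process uid". The sketch inspects the owner permission bits directly rather than calling `access()`, so its answer does not change when running as root.

```cpp
#include <cassert>
#include <filesystem>
#include <sys/stat.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Files: owned by us and owner-readable. Directories: owned by us and
// owner rwx. Anything else (missing path, sockets, ...) is rejected.
bool has_valid_owner_and_mode(const fs::path& p) {
    assert(!p.empty());
    struct stat st;
    if (::stat(p.c_str(), &st) != 0) {
        return false;                     // missing or not stat-able
    }
    if (st.st_uid != ::geteuid()) {
        return false;                     // assumption: effective uid
    }
    if (S_ISDIR(st.st_mode)) {
        const mode_t rwx = S_IRUSR | S_IWUSR | S_IXUSR;
        return (st.st_mode & rwx) == rwx;
    }
    if (S_ISREG(st.st_mode)) {
        return (st.st_mode & S_IRUSR) != 0;
    }
    return false;
}
```

For example, a data directory changed to mode 0500 (no owner write) would be reported as invalid.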
Benny Halevy	ff4d8b6e85	treewide: use std::filesystem Rather than {std::experimental,boost,seastar::compat}::filesystem On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote: > The intent for seastar::compat was to allow the application to choose > the C++ dialect and have seastar follow, rather than have seastar choose > the types and have the application follow (as in your patch). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:21:10 +02:00
Dejan Mircevski	aa11f5f35e	Drop unused #include v2: fix "From" field in email Tests: unit/cql_query_test (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <1553099087-11621-1-git-send-email-dejan@scylladb.com>	2019-03-28 01:48:19 +00:00
Duarte Nunes	d8fcdefe4a	tests/view_schema_test: Remove debug output A stray std::cout remained. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 21:58:10 +00:00
Tomasz Grabiec	2b8bf0dbf8	Merge "db/view: Apply tracked tombstones for new updates" from Duarte When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) * https://github.com/duarten/scylla materialized-views/4321/v1.1: db/view: Apply tracked tombstones for new updates tests/view_schema_test: Add reproducer for #4321	2019-03-27 13:24:28 +01:00
Duarte Nunes	f609848b69	tests/view_schema_test: Add reproducer for #4321 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 12:01:39 +00:00
Duarte Nunes	ded9221187	db/view: Apply tracked tombstones for new updates When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 12:01:39 +00:00
Glauber Costa	043d102ab6	commitlog: fix typo in error message maxiumum -> maximum Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190326191108.7573-1-glauber@scylladb.com>	2019-03-26 21:32:56 +02:00
Avi Kivity	a77762b02a	Merge "Optimise vint deserialisation" from Paweł " Variable length integers are used extensively by SSTables mc format. The current deserialisation routine is quite naive in that it reads each byte separately. Since those vints usually appear inside much larger buffers, we optimise for such cases: read 8 bytes at once and then mask out the unneeded parts (as well as fix their byte order, since vints are big-endian). Tests: unit(dev). perf_vint (average time per element when deserializing 1000 vints): before: vint.deserialize 69442000 14.400ns 0.000ns 14.399ns 14.400ns after: vint.deserialize 241502000 4.140ns 0.000ns 4.140ns 4.140ns perf_fast_forward (data on /tmp): large-partition-single-key-slice on dataset large-part-ds1: before: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000278 8792 2 7190 119 7367 1960 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000344 96 99 288100 4335 307689 193809 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000339 13254 100 295263 2824 301734 222725 2 108 2 0 0 1 1 0 0 1 100.0% after: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000236 10001 2 8461 59 8718 2261 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000285 89 99 347500 2441 355826 215745 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000293 14369 100 341302 1512 350123 222049 2 108 2 0 0 1 1 0 0 1 100.0% " * tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla: sstable: pass full length of buffer to vint deserialiser vint: optimise deserialisation routine vint: drop deserialize_type structure tests/vint: reduce test dependencies tests/perf: add performance test for vint serialisation	2019-03-26 16:41:44 +02:00
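A sketch of the two deserialisation strategies follows. It assumes the Cassandra-style unsigned vint layout used by the mc format: the number of leading 1-bits in the first byte gives the number of extra bytes, and the value is stored big-endian. The function names are hypothetical, and the byte swap relies on the GCC/Clang builtin `__builtin_bswap64`. The fast path falls back to the byte-by-byte loop when fewer than 8 bytes remain in the buffer, which is why the deserialiser needs the full buffer length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Number of extra bytes = number of leading 1-bits in the first byte (0..8).
// A portable loop; an optimised version would use a count-leading-zeros builtin.
inline unsigned extra_bytes(std::uint8_t first) {
    unsigned n = 0;
    while (n < 8 && (first & (0x80u >> n)) != 0) {
        ++n;
    }
    return n;
}

// Encoder, included so the decoders below can be exercised.
inline std::size_t serialize(std::uint64_t value, std::uint8_t* out) {
    unsigned n = 0;
    while (n < 8 && (value >> (7 * n + 7)) != 0) {
        ++n;  // with n < 8 extra bytes there is room for 7n + 7 value bits
    }
    if (n == 8) {
        out[0] = 0xff;
        for (unsigned i = 0; i < 8; ++i) {
            out[8 - i] = static_cast<std::uint8_t>(value >> (8 * i));
        }
        return 9;
    }
    for (unsigned i = 0; i <= n; ++i) {
        out[n - i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    out[0] |= static_cast<std::uint8_t>(~(0xffu >> n));  // n leading 1-bits
    return n + 1;
}

// Byte-at-a-time decoder (the "before" strategy).
inline std::uint64_t deserialize_naive(const std::uint8_t* p) {
    const unsigned n = extra_bytes(p[0]);
    std::uint64_t value = p[0] & (0xffu >> n);
    for (unsigned i = 1; i <= n; ++i) {
        value = (value << 8) | p[i];
    }
    return value;
}

// One unaligned 8-byte load, converted from big-endian (GCC/Clang builtin).
inline std::uint64_t load_be64(const std::uint8_t* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}

// Word-at-a-time decoder (the "after" strategy). `len` is the number of
// bytes readable from p; the wide load is used only when it cannot overrun.
inline std::uint64_t deserialize_fast(const std::uint8_t* p, std::size_t len) {
    const unsigned n = extra_bytes(p[0]);
    assert(len >= n + 1u);
    if (n == 8) {
        return load_be64(p + 1);                    // 0xff prefix: next 8 bytes
    }
    if (len < 8) {
        return deserialize_naive(p);                // too close to the buffer end
    }
    const unsigned total_bits = 8 * (n + 1);        // prefix + value, at most 64
    const unsigned value_bits = total_bits - n;     // drop the n leading 1-bits
    const std::uint64_t word = load_be64(p) >> (64 - total_bits);
    return word & ((std::uint64_t{1} << value_bits) - 1);
}
```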
Avi Kivity	4b330b3911	Merge "introduce sstables manager" from Benny " This series introduces a rudimentary sstables manager that will be used for making and deleting sstables, and tracking thereof. The motivation for having a sstables manager is detailed in https://github.com/scylladb/scylla/issues/4149. The gist of it is that we need a proper way to manage the life cycle of sstables to solve potential races between compaction and various consumers of sstables, so they don't get deleted by compaction while being used. In addition, we plan to add global statistics methods like returning the total capacity used by all sstables. This patchset changes the way class sstable gets the large_data_handler. Rather than passing it separately for writing the sstable and when deleting sstables, we provide the large_data_handler when the sstable object is constructed and then use it when needed. Refs #4149 " * 'projects/sstables_manager/v3' of https://github.com/bhalevy/scylla: sstables: provide large_data_handler to constructor sstables_manager: default_sstable_buffer_size need not be a function sstables: introduce sstables_manager sstables: move shareable_components def to its own header tests: use global nop_lp_handler in test_services sstables: compress.hh: add missing include sstables: reorder entry_descriptor constructor params sstables: entry_descriptor: get rid of unused ctor sstables: make load_shared_components a method of sstable sstables: remove default params from sstable constructor database: add table::make_sstable helper distributed_loader: pass column_family to load_sstables_with_open_info distributed_loader: no need for forward declaration of load_sstables_with_open_info distributed_loader: reshard: use default params for make_sstable	2019-03-26 16:31:40 +02:00
Benny Halevy	223e1af521	sstables: provide large_data_handler to constructor And use it for writing the sstable and/or when deleting it. Refs #4198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:24:19 +02:00
Benny Halevy	c23f658d0e	sstables_manager: default_sstable_buffer_size need not be a function Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	eebc3701a5	sstables: introduce sstables_manager The goal of the sstables manager is to track and manage sstables life-cycle. There is a sstable manager instance per database and it is passed to each column-family (and test environment) on construction. All sstables created, loaded, and deleted pass through the sstables manager. The manager will make sure consumers of sstables are in sync so that sstables will not be deleted while in use. Refs #4149 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	b50c041aa2	sstables: move shareable_components def to its own header To be used by sstables_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	2cd11208a1	tests: use global nop_lp_handler in test_services Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	0e3f9c25e4	sstables: compress.hh: add missing include Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	33cbfe81f2	sstables: reorder entry_descriptor constructor params To match make_sstable's in preparation of moving to sstables_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	ac5f9c1eae	sstables: entry_descriptor: get rid of unused ctor Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	adf8428321	sstables: make load_shared_components a method of sstable and open code its static part in the caller (distributed_loader) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	ff7b7910f1	sstables: remove default params from sstable constructor The goal is to construct sstables only via make_sstables that will be moved to class sstables_manager in a later patch. Defining the default values in both interfaces is unneeded and may lead to them going out of sync. Therefore, have only make_sstables provide the default parameter values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	3a17053cb8	database: add table::make_sstable helper In most cases we make a sstable based on the table schema and soon - large_data_handler. Encapsulate that in a make_sstable method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	67f705ae04	distributed_loader: pass column_family to load_sstables_with_open_info Rather than just its schema. In preparation for adding table::make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	99875ba966	distributed_loader: no need for forward declaration of load_sstables_with_open_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	7a8ab1d6f1	distributed_loader: reshard: use default params for make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Avi Kivity	5e39b62fcc	Merge "configure: Optionally don't compress debug in executables" from Rafael " Most of the binaries we link in a debug build are linked with -s, so the only impact is build/debug/scylla, which grows by 583 MiB when using --compress-exec-debuginfo=0. On the other hand, not having to recompress all the debug info from all the used object files is a pretty big win when debugging an issue. For example, linking build/debug/scylla goes from 56.01s user 15.86s system 220% cpu 32.592 total to 27.39s user 19.51s system 991% cpu 4.731 total Note how the cpu time is "only" 2x better, but given that compressing debug info is a long serial task, the wall time is 6.8x better. Tests: unit (debug) " * 'espindola/dont-compress-debug-v5' of https://github.com/espindola/scylla: configure: Add a --compress-exec-debuginfo option configure: Move some flags from cxx_ld_flags to cxxflags configure: rename per mode opt to cxx_ld_flags configure: remove per mode libs configure: remove sanitize_libs and merge sanitize into opt configure: split a ld_flags_{mode} out of cxxflags_{mode}	2019-03-26 15:25:07 +02:00
Avi Kivity	fad1be0ddc	Update seastar submodule * seastar caa98f8...05efbce (2): > fix use after free in rpc server handler > rpc: wait for send_negotiation_frame Fixes #4336.	2019-03-26 14:33:37 +02:00
Gleb Natapov	1abc50ad8a	messaging_service: make sure a client is unique for a destination Function messaging_service::get_rpc_client() is supposed to either return an existing client or create one and return it. The function is supposed to be atomic, so after checking that the requested client does not exist it is safe to assume emplace() will succeed. But we saw bugs that made the function not atomic. Let's add an assert that will make such bugs easier to catch if they happen in the future. Message-Id: <20190326115741.GX26144@scylladb.com>	2019-03-26 14:19:08 +02:00
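The invariant can be sketched with a plain map. `client_registry` and `rpc_client` are hypothetical. In Seastar's cooperative model, "atomic" here means there is no preemption point between `find()` and `emplace()`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct rpc_client {  // hypothetical placeholder for a connection
    std::string destination;
};

class client_registry {
    std::map<std::string, std::shared_ptr<rpc_client>> _clients;
public:
    std::shared_ptr<rpc_client> get_rpc_client(const std::string& dst) {
        auto it = _clients.find(dst);
        if (it != _clients.end()) {
            return it->second;  // reuse the existing client
        }
        auto client = std::make_shared<rpc_client>(rpc_client{dst});
        // If anything had inserted a client for dst since find(), emplace()
        // would silently keep the old one; assert instead of hiding the bug.
        auto [pos, inserted] = _clients.emplace(dst, client);
        assert(inserted);
        (void)inserted;  // unused when NDEBUG is defined
        return pos->second;
    }
    std::size_t size() const { return _clients.size(); }
};
```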
Avi Kivity	a696a3daf2	Merge "Fix decimal and varint serialization" from Piotr " Fixes #4348 v2 changes: * added a unit test This miniseries fixes decimal/varint serialization: it did not update the output iterator in all cases, which may lead to overwriting decimal data if any other value follows them directly in the same buffer (e.g. in a tuple). It also comes with a reproducing unit test covering both decimals and varints. Tests: unit (dev) dtest: json_test.FromJsonUpdateTests.complex_data_types_test json_test.FromJsonInsertTests.complex_data_types_test json_test.ToJsonSelectTests.complex_data_types_test " * 'fix_varint_serialization_2' of https://github.com/psarna/scylla: tests: add test for unpacking decimals types: fix varint and decimal serialization	2019-03-26 13:00:19 +02:00
Piotr Sarna	e538163a29	tests: add test for unpacking decimals Refs #4348	2019-03-26 11:52:44 +01:00
Piotr Sarna	287a02dc05	types: fix varint and decimal serialization Varint and decimal types serialization did not update the output iterator after generating a value, which may lead to corrupted sstables: variable-length integers were properly serialized, but if anything followed them directly in the buffer (e.g. in a tuple), their value would be overwritten. Fixes #4348 Tests: unit (dev) dtest: json_test.FromJsonUpdateTests.complex_data_types_test json_test.FromJsonInsertTests.complex_data_types_test json_test.ToJsonSelectTests.complex_data_types_test Note that dtests still do not succeed 100% due to formatting differences in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query correctness issue.	2019-03-26 11:02:43 +01:00
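A common way for this class of bug to arise is to discard the iterator returned by `std::copy`. The sketch below is hypothetical and is not the actual varint/decimal serializer. It shows that a second value written to the same buffer overwrites the first unless the caller's iterator is advanced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;
using out_iterator = bytes::iterator;

// Buggy: std::copy returns the advanced iterator, but the result is
// discarded, so `out` still points at the start of the bytes just written.
void serialize_value_buggy(out_iterator& out, const bytes& value) {
    std::copy(value.begin(), value.end(), out);
}

// Fixed: store the iterator std::copy hands back.
void serialize_value_fixed(out_iterator& out, const bytes& value) {
    out = std::copy(value.begin(), value.end(), out);
}

// Serialize two values back to back into one buffer, as a tuple would.
bytes serialize_pair(void (*serialize)(out_iterator&, const bytes&), const bytes& a, const bytes& b) {
    bytes buf(a.size() + b.size());
    auto out = buf.begin();
    serialize(out, a);
    serialize(out, b);
    return buf;
}
```

With the buggy variant, serializing {0x01, 0x02} followed by {0xAA, 0xBB} leaves {0xAA, 0xBB, 0x00, 0x00} in the buffer: the second value lands on top of the first.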
Rafael Ávila de Espíndola	ddac002fd4	Make atomic_cell comparison symmetrical I noticed a test failure with "Mutation inequality is not symmetric for ...", and the difference between the two mutations was that one atomic_cell was live and the other wasn't. Looking at the code I found a few cases where the comparison was not symmetrical. This patch fixes them. This patch will not fix the test, as it will now fail with a "Mutations differ" error, but that is probably an independent issue. Ref #3975. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190325194647.54950-1-espindola@scylladb.com>	2019-03-26 11:14:22 +02:00
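A symmetric comparator examines both operands in every branch, so `compare(a, b) == -compare(b, a)` always holds. `cell_model` is a hypothetical, much-simplified cell. Ordering by timestamp first and letting a dead cell win a timestamp tie is an assumption modelled on Cassandra-style reconciliation, not a copy of Scylla's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct cell_model {                 // hypothetical, much simpler than atomic_cell
    std::int64_t timestamp;
    bool live;
    std::string value;              // meaningful only when live
    std::int64_t deletion_time;     // meaningful only when dead
};

// Returns -1, 0 or 1. Swapping the arguments always flips the sign.
int compare(const cell_model& a, const cell_model& b) {
    if (a.timestamp != b.timestamp) {
        return a.timestamp < b.timestamp ? -1 : 1;
    }
    if (a.live != b.live) {
        return a.live ? -1 : 1;     // assumption: on a timestamp tie the dead cell wins
    }
    if (a.live) {                   // both live: compare values
        const int c = a.value.compare(b.value);
        return (c > 0) - (c < 0);
    }
    // both dead: compare deletion times
    return (a.deletion_time > b.deletion_time) - (a.deletion_time < b.deletion_time);
}
```

Checking every ordered pair of a small set of cells is a cheap way to catch asymmetric branches like the ones this patch fixes.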
Vlad Zolotarov	c798563cb0	scylla_util.py: ignore perftune.py's error messages when calling it in order to get mode's CPU mask When we call perftune.py in order to get a particular mode's cpu set (e.g. mode=sq_split) it may fail and print an error message to stderr because there are too few CPUs for a particular configuration mode (e.g. when there are only 2 CPUs and the mode is sq_split). We already treat these situations correctly however we let the corresponding perftune.py error message get out into the syslog. This is definitely confusing, stressful and annoying. Let's not let these messages out. Fixes #4211 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190325220018.22824-1-vladz@scylladb.com>	2019-03-26 11:08:31 +02:00
Vlad Zolotarov	afa176851b	transport: result_message: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using the "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Resolve this by explicitly using the operator<<() across the whole operator<<(std::ostream& os, const result_message::rows& msg) function. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190325203628.5902-1-vladz@scylladb.com>	2019-03-26 11:06:18 +02:00
Benny Halevy	af7f2a07f4	table::open_sstable: test has_scylla_component after load has_scylla_component is always false before loading the sstable. Also, return exception future rather than throwing. Hit with the following dtests: counter_tests.TestCounters.upgrade_test counter_tests.TestCountersOnMultipleNodes.counter_consistency_node__test resharding_test.ReshardingTest_nodes?_with_CompactionStrategy.resharding_counter_test update_cluster_layout_tests.TestUpdateClusterLayout.increment_decrement_counters_in_threads_nodes_restarted_test Fixes #4306 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190326084151.18848-1-bhalevy@scylladb.com>	2019-03-26 10:58:52 +02:00
Avi Kivity	f259a4c3b4	Merge "Remove usage of static gossiper object in init.cc and storage_service" from Asias " This series removes the usage of the static gossiper object in init.cc and storage_service. Follow up series will remove more in other components. This is the effort to clean up the component dependencies and have better shutdown procedure. Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test, dtests. " * tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev: storage_service: Do not use the global gms::get_local_gossiper() storage_service: Pass gossiper object to storage_service gms: Remove i_failure_detector.hh gossip: Get rid of the gms::get_local_failure_detector static object dht: Do not use failure_detector::is_alive in failure_detector_source_filter tests: Fix stop snitch in gossip_test.cc gossiper: Do not use value_factory from storage_service object gossiper: Use cfg options from _cfg instead of get_local_storage_service gossiper: Pass db::config object to gossiper class init: Pass gossiper object to init_ms_fd_gossiper	2019-03-26 08:54:46 +02:00
Avi Kivity	1d9699d833	Update seastar submodule * seastar 33baf62...caa98f8 (8): > Merge "Add file_accessible and file_stat methods" from Benny > future::then: use std::terminate instead of abort > build: Allow cooked dependencies with configure.py > tests: Show a test's output when it fails > posix_file_impl: Bypass flush() call iff opened with O_DSYNC > posix_file_impl: Propagate and keep open_flags > open_flags: Add O_DSYNC value > build: Forward variables to CMake correctly	2019-03-25 15:45:52 +02:00
Avi Kivity	a7520c0ba9	Merge "Turn cql3_type into a trivial wrapper over data_type" from Rafael " Both cql3_type and abstract_type are normally used inside shared_ptr. This creates a problem when an abstract_type needs to refer to a cql3_type as that creates a cycle. To avoid warnings from asan, we were using a std::unordered_map to store one of the edges of the cycle. This avoids the warning, but wastes even more memory. Even before this series cql3_type was a fairly light weight structure. This patch pushes in that direction and now cql3_type is a struct with a single member variable, a data_type. This avoids the reference cycle and is easier to understand IMHO. The one corner case is varchar. In the old system cql3_type::varchar and cql3_type::text don't compare equal, but they both map to the same data_type. In the new system they would compare equal, so we avoid the confusion by just removing the cql3_type::varchar variable. Tests: unit (dev) " * 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla: Turn cql3_type into a trivial wrapper over data_type Delete cql3_type::varchar Simplify db::cql_type_parser::parse Add a test for the varchar column representation	2019-03-25 15:03:16 +02:00
Tomasz Grabiec	80020118d0	Merge "Fix a couple of bugs related to large entry deletion" from Rafael The crash observed in issue #4335 happens because delete_large_data_entries is passed a deleted name. Normally we don't get a crash but a garbage name, and we fail to delete entries from system.large_. Adding a test for the fix found another issue, which the second patch in this series fixes. Tests: unit (dev) Fixes #4335. https://github.com/espindola/scylla guthub/fix-use-after-free-v4: large_data_handler: Fix a use after destruction large_data_handler: Make a variable non static Allow large_data_handler to be stopped twice Allow table to be stopped twice Test that large data entries are deleted	2019-03-25 10:37:36 +01:00
Avi Kivity	8c6306897d	Merge "load_new_sstables: validate new_tables before calling row_cache::invalidate" from Benny " Validate the to-be-loaded sstables in the open_sstable phase and handle any exceptions before calling cf.get_row_cache().invalidate. Currently if exception is thrown from distributed_loader::open_sstable cf._sstables_opened_but_not_loaded may be left partially populated. Fixes #4306 Tests: unit (dev) - next-gating dtests (dev) - migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test_expect_fail - with bypassing exception in distributed_loader::flush_upload_dir to trigger the exception in table::open_sstable " * 'issues/4306/v3' of https://github.com/bhalevy/scylla: table: move sstable counters validation from load_sstable to open_sstable distributed_loader::load_new_sstables: handle exceptions in open_sstable	2019-03-24 20:30:44 +02:00
Avi Kivity	bd3a836e6c	Merge "fixes for relocatable python3 packaging" from Takuya " Aligned way to build relocatable rpm with existing relocatable packages. " * 'relocatable-python3-fix-v3' of https://github.com/syuu1228/scylla: reloc: allow specify rpmbuild dir reloc/python3: archive package version number on build_reloc.sh reloc/python3: archive rpm build script in the relocatable package, build rpm using the script relloc/python3: fix PyYAML package name reloc: rename python3 relocatable package filename to align same style with other packages reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3	2019-03-24 20:29:56 +02:00
Duarte Nunes	93a1c27b31	service/storage_proxy: Don't consider view hints for MV backpressure When a view replica becomes unavailable, updates to it are stored as hints at the paired base replica. This on-disk queue of pending view updates grows as long as there are view updates and the view replica remains unavailable. Currently, we take that relative queue size into account when calculating the delay for new base writes, in the context of the backpressure algorithm for materialized views. However, the way we're calculating that on-disk backlog is wrong, since we calculate it per-device and then feed it to all the hints managers for that device. This means that normal hints will show up as backlog for the view hints manager, which in turn introduces delays. This can make the view backpressure mechanism kick in even if the cluster uses no materialized views. There's yet another way in which considering the view hints backlog is wrong: a view replica that is unavailable for some period of time can cause the backlog to grow to a point where the maximum delay of 1 second is applied to all base writes. This turns a single-node failure into cluster unavailability. The fix to both issues is to simply not take this on-disk backlog into account for the backpressure algorithm. Fixes #4351 Fixes #4352 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190321170418.25953-1-duarte@scylladb.com>	2019-03-24 20:29:56 +02:00
Benny Halevy	32bf0f36ef	table: move sstable counters validation from load_sstable to open_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-24 18:25:09 +02:00
Benny Halevy	564be8b720	distributed_loader::load_new_sstables: handle exceptions in open_sstable Propagate exception to caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-24 18:25:09 +02:00
Takuya ASADA	efb3865840	reloc: allow specify rpmbuild dir Added the same option, --builddir, to python3/build_rpm.sh to specify the rpmbuild dir.	2019-03-24 00:34:09 +09:00
Takuya ASADA	dc5cec4194	reloc/python3: archive package version number on build_reloc.sh Instead of getting python3 version number on build_rpm.sh, archive version number when generating python3 relocatable package.	2019-03-24 00:27:24 +09:00
Takuya ASADA	4fed4fecf6	reloc/python3: archive rpm build script in the relocatable package, build rpm using the script Since we archive the rpm/deb build scripts in the other relocatable packages and build rpm/deb using those scripts, align the python relocatable package with them. Also added SCYLLA-RELOCATABLE-FILE, SCYLLA-RELEASE-FILE and SCYLLA-VERSION-FILE, since these files are required for a relocatable package.	2019-03-24 00:27:16 +09:00
Takuya ASADA	b1283b23bb	relloc/python3: fix PyYAML package name On Fedora 29 (Scylla official toolchain uses it), PyYAML package name is "python3-pyyaml", no uppercase character.	2019-03-24 00:27:02 +09:00
Takuya ASADA	3762c4447a	reloc: rename python3 relocatable package filename to align same style with other packages	2019-03-24 00:26:48 +09:00
Takuya ASADA	a515324732	reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3 To make easier to find build scripts and keep script filename simpler, move them to python3 directory.	2019-03-24 00:25:50 +09:00
Tomasz Grabiec	bc4a614e17	Merge "Add scylla fiber gdb command" from Botond Debugging continuations is challenging. There is no support from gdb for finding out which continuation was this continuation called from, nor what other continuations are attached to it. GDB's `bt` command is of limited use, at best a handful of continuations will appear in the backtrace, those that were ready. This series attempts to fill part of this void and provides a command that answers the latter question: what continuations are attached to this one? `scylla fiber` allows for walking a continuation chain, printing each continuation. It is supposed to be the seastar equivalent of `bt`. The continuation chain is walked starting from an arbitrary task, specified by the user. The command will print all continuations attached to the specified task. This series also contains some loosely related cleanup of existing commands and code in `scylla-gdb.py`. * https://github.com/denesb/scylla.git scylla-fiber-gdb-command/v4: scylla-gdb.py: fix static_vector scylla-gdb.py: std_unique_ptr: add get() method scylla-gdb.py: fix existing documentation scylla-gdb.py: fix tasks and task-stats commands scylla-gdb.py: resolve(): add cache parameter scylla-gdb.py: scylla_ptr: move actual logic into analyze() scylla-gdb.py: scylla_ptr: make analyze() usable for outside code scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input scylla-gdb.py: add scylla fiber command	2019-03-23 10:20:20 +02:00
Asias He	7447c92d63	storage_service: Do not use the global gms::get_local_gossiper() Use the gossiper object stored in _gossiper member from storage_service.	2019-03-22 09:11:26 +08:00
Asias He	b91452ed4c	storage_service: Pass gossiper object to storage_service Pass the gossiper object to storage_service class in order to avoid the usage of the static object returned from get_local_gossiper().	2019-03-22 09:11:26 +08:00
Asias He	b2c110699e	gms: Remove i_failure_detector.hh It is not used any more.	2019-03-22 09:08:51 +08:00
Asias He	af579a055b	gossip: Get rid of the gms::get_local_failure_detector static object Store the failure_detector object inside the gossiper object. - The global sharded<failure_detector> object is gone. - There is no need to initialize sharded<failure_detector> manually, which simplifies the code in tests/cql_test_env.cc and init.cc.	2019-03-22 09:08:51 +08:00
Asias He	2b6a4050c2	dht: Do not use failure_detector::is_alive in failure_detector_source_filter Switch failure_detector_source_filter to use get_local_gossiper::is_alive directly, since we are going to remove the static gms::get_local_failure_detector object soon. Pass the nodes that are down to the filter directly, so that range_streamer does not depend on the gossiper at all.	2019-03-22 08:26:47 +08:00
Asias He	9dbc4af1dd	tests: Fix stop snitch in gossip_test.cc It should stop snitch not failure detector. Fix it up. We are going to remove the static failure_detector object soon.	2019-03-22 08:26:47 +08:00
Asias He	967794798a	gossiper: Do not use value_factory from storage_service object Avoid using value_factory from storage_service inside gossiper.	2019-03-22 08:26:47 +08:00
Asias He	4a55617c6c	gossiper: Use cfg options from _cfg instead of get_local_storage_service Gossiper has db::config _cfg now, avoid using the get_local_storage_service() to get config options.	2019-03-22 08:26:44 +08:00
Asias He	ee1227b3ae	gossiper: Pass db::config object to gossiper class Gossiper calls service::get_local_storage_service() to get cfg options. To avoid cyclic dependency, pass the cfg object to gossiper directly.	2019-03-22 08:25:16 +08:00
Asias He	1652ee512a	init: Pass gossiper object to init_ms_fd_gossiper In order to avoid the usage of the static gossiper object returned from get_local_gossiper().	2019-03-22 08:25:16 +08:00
Rafael Ávila de Espíndola	51754ab068	Test that large data entries are deleted This area is hard to test since we only issue deletes during compaction and we wait for deletes only during shutdown. That is probably worth it, seeing that two independent bugs would have been found by this test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:48:20 -07:00
Rafael Ávila de Espíndola	bd1593c12a	Allow table to be stopped twice This will be used in a testcase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:47:59 -07:00
Rafael Ávila de Espíndola	c8da28a3eb	Allow large_data_handler to be stopped twice This will be used in a testcase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:47:23 -07:00
Rafael Ávila de Espíndola	c0b0a6baeb	configure: Add a --compress-exec-debuginfo option The default is the old behavior, but it is now possible to configure with --compress-exec-debuginfo=0 to get faster links but larger binaries. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:55:54 -07:00
Rafael Ávila de Espíndola	ab53055640	configure: Move some flags from cxx_ld_flags to cxxflags They are moved because they are not relevant for linking. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:55:39 -07:00
Rafael Ávila de Espíndola	e11cefab9c	configure: rename per mode opt to cxx_ld_flags It is the same name used in the build.ninja file. A followup patch will add cxxflags and move compiler only flags there. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:46:58 -07:00
Rafael Ávila de Espíndola	443a85a68c	configure: remove per mode libs It was always empty. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:46:32 -07:00
Rafael Ávila de Espíndola	35c7ec6777	configure: remove sanitize_libs and merge sanitize into opt These are flags we want to pass to both compilation and linking. There is nothing special about the fact that they are sanitizer related. With {sanitize} being passed to the link, we don't need {sanitize_libs}. We do need to make sure -fno-sanitize=vptr is the last one in the command line. Before we were implicitly getting it from seastar, but it is bad practice to get some sanitizer flags from seastar but not others. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:43:02 -07:00
Duarte Nunes	5752174762	Merge 'Use staging directory for uploaded sstables awaiting view updates' from Piotr " This series adds moving sstables uploaded via `nodetool refresh` to staging/ directory if they require generating view updates from them. Previous behavior (leaving these sstables in upload/ directory until view updates are generated) might have caused sstables with conflicting names to be mistakenly overwritten by the user. Fixes #4047 Tests: unit (dev) dtest: backup_restore_tests.py + backup_restore_tests.py modified with having materialized view definitions " * 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla: sstables: simplify requires_view_building loader: move uploaded view pending sstables to staging	2019-03-21 12:46:02 -03:00
Gleb Natapov	bb93d990ad	messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream Current code captures a reference to rpc::client in a continuation, but there is no guarantee that the reference will still be valid when the continuation runs. Capture a shared pointer to rpc::client instead. Fixes #4350. Message-Id: <20190314135538.GC21521@scylladb.com>	2019-03-21 12:46:01 -03:00
Tomasz Grabiec	69775c5721	row_cache: Fix abort in cache populating read concurrent with memtable flush When we're populating a partition range and the population range ends with a partition key (not a token) which is present in sstables and there was a concurrent memtable flush, we would abort on the following assert in cache::autoupdating_underlying_reader: utils::phased_barrier::phase_type creation_phase() const { assert(_reader); return _reader_creation_phase; } That's because autoupdating_underlying_reader::move_to_next_partition() clears the _reader field when it tries to recreate a reader but finds the new range to be empty: if (!_reader || _reader_creation_phase != phase) { if (_last_key) { auto cmp = dht::ring_position_comparator(_cache._schema); auto&& new_range = _range.split_after(_last_key, cmp); if (!new_range) { _reader = {}; return make_ready_future<mutation_fragment_opt>(); } Fix by not asserting on _reader. creation_phase() will now be meaningful even after we clear the _reader. The meaning of creation_phase() is now "the phase in which the reader was last created or 0", which makes it valid in more cases than before. If the reader was never created we will return 0, which is smaller than any phase returned by cache::phase_of(), since cache starts from phase 1. This shouldn't affect current behavior, since we'd abort() if called for this case, it just makes the value more appropriate for the new semantics. Tests: - unit.row_cache_test (debug) Fixes #4236 Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>	2019-03-21 12:46:00 -03:00
Asias He	c0f744b407	storage_service: Wait for gossip to settle only if do_bind is set In commit `71bf757b2c`, we call wait_for_gossip_to_settle(), which takes some time to complete, in storage_service::prepare_to_join(). tests/cql_query_test calls init_server with do_bind == false, which in turn calls storage_service::prepare_to_join(). Since there is only one node in the test, there is no point in waiting for gossip to settle. To make cql_query_test fast again, do not call wait_for_gossip_to_settle if do_bind is false. Before this patch, cql_query_test takes forever to complete. After it, it takes 10s. Tests: tests/cql_query_test Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>	2019-03-21 12:46:00 -03:00
Avi Kivity	a9cf07369f	Merge "Add local indexes" from Piotr " This series adds support for local indexing, i.e. when the index table resides on the same partition as base data. It addresses the performance issue of having an indexed query that also specifies a partition key - index will be queried locally. " * 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits) tests: add cases for local index prefix optimization tests: add create/drop local index test case tests: add non-standard names cases to local index tests tests: add multi pk case for local index tests tests: add test for malformed local index definitions tests: add local index paging test tests: add local indexing test cql3: add CREATE INDEX syntax for local indexes cql3: use serialization function to create index target string index: add serialization function for index targets index: use proper local index target when adding index index: add parsing target column name from local index targets db: add checking for local index in schema tables index: add checking if serialized target implies local index index: enable parsing multi-key targets index: move target parser code to .cc file json: add non-throwing overload for to_json_value cql3: add checking for local indexes in has_supporting_index() cql3: move finding index restrictions to prepare stage cql3: add picking an index by score ...	2019-03-21 12:46:00 -03:00
Nadav Har'El	561c640ed1	materialized views: allow view without clustering columns When a materialized view was created, the verification code artificially forbade creating a view without a clustering key column. However, there is no real reason to forbid this. In the trivial case, the original base table might not have had a clustering key, and the view might want to use the exact same key. In a more complex case, a view may want to have all the primary key columns as partition key columns, and that should be fine. The patch also includes a regression test, which failed before this patch, and succeeds with it (we test that we can create materialized views in both aforementioned scenarios, and these materialized views work as expected). Duarte raised the opinion that the "trivial" case of a view table with a key identical to that of the base should be disallowed. However, this should be done, if at all (I think it shouldn't), in a follow-up patch, which will implement the non-triviality requirement consistently (e.g., require view primary key to be different from base's, regardless of the existence or non-existence of clustering columns). Fixes #4340. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190320122925.10108-1-nyh@scylladb.com>	2019-03-21 12:45:52 -03:00
Glauber Costa	34b640993f	storage proxy: add tracepoints about delays When we are tracing requests, we would like to know everything that happened to a query that can contribute to it having increased latencies. We insert some of those latencies explicitly due to throttling, but we do not log that into tracing. In the case of storage proxy, we do have a log message at trace level but that is rarely used: trace messages are too heavy of a hammer, there is no way to specify specific queries, etc. The correct place for that is CQL tracing. This patch moves that message to CQL tracing. We also add a matching tracepoint assuring us that no delay happened if that's the case. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190320163350.15075-1-glauber@scylladb.com>	2019-03-21 12:45:52 -03:00
Avi Kivity	eddb98e8c6	Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz " Static compact tables are tables with compact storage and no clustering columns. Before this patch, Scylla was writing rows of static compact tables as clustered rows instead of as static rows. That's because in our in-memory model such tables have regular rows and no static row. In Cassandra's schema (since 3.x), those tables have columns which are marked as static and there are no regular columns. This worked fine as long as Scylla was writing and reading those sstables. But when importing sstables from Cassandra, our reader was skipping the static row, since it's not present in our schema, and returning no rows as a result. Also, Cassandra, and Scylla tools, would have problems reading those sstables. Fix this by writing rows for such tables the same way as Cassandra does. In order to support rolling downgrade, we do that only when all nodes are upgraded. Fixes #4139. Tests: - unit (dev) " * tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla: tests: sstables: Test reading of static compact sstable generated by Cassandra tests: sstables: Add test for writing and reading of static compact tables sstables: mc: Write static compact tables the same way as Cassandra sstable: mc: writer: Set _static_row_written inside write_static_row() sstables: Add sstable::features() sstables: mc: writer: Prepare write_static_row() for working with any column_kind storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag sstables: mc: writer: Build indexed_columns together with serialization_header sstables: mc: writer: De-optimize make_serialization_header() sstable: mc: writer: Move attaching of mc-specific components out of generic code	2019-03-21 12:45:51 -03:00
Rafael Ávila de Espíndola	53ab298957	Turn cql3_type into a trivial wrapper over data_type Both cql3_type and abstract_type are normally used inside shared_ptr. This creates a problem when an abstract_type needs to refer to a cql3_type as that creates a cycle. To avoid warnings from asan, we were using a std::unordered_map to store one of the edges of the cycle. This avoids the warning, but wastes even more memory. Even before this patch cql3_type was a fairly light weight structure. This patch pushes in that direction and now cql3_type is a struct with a single member variable, a data_type. This avoids the reference cycle and is easier to understand IMHO. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 14:10:28 -07:00
Rafael Ávila de Espíndola	c76148b6ce	Delete cql3_type::varchar varchar is just an alias for text. Handle that conversion directly in the parser and delete the cql3_type::varchar variable. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 14:07:46 -07:00
Rafael Ávila de Espíndola	7f64a6ec4b	Simplify db::cql_type_parser::parse Since its first version, db::cql_type_parser::parse had special cases for native and user defined types. Those are not necessary, as the general parser has no problem handling them. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola	088d59aced	Add a test for the varchar column representation We map varchar to text, and so does cassandra. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola	8d9baf9843	large_data_handler: Make a variable non static The value computed is not static since `f254664fe6`, but unfortunately that was missed in that commit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 09:31:21 -07:00
Rafael Ávila de Espíndola	e7749e7aee	large_data_handler: Fix a use after destruction The path leading to the issue was: The sstable name is allocated and passed to maybe_delete_large_data_entries by reference auto name = sst->get_filename(); return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size()); A future is created with a reference to it large_partitions = with_sem([&s, &filename, this] { return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS); }); The semaphore blocks. The filename is destroyed. delete_large_data_entries is called with a destroyed filename. The reason this did not reproduce trivially in a debug build was that the sstable itself was in the stack and the destructed value was read as an internal value, and so asan had nothing to complain about. Unfortunately we also had no tests that the entry in system.large_rows was actually deleted. This patch passes the name by value. It might create up to 3 copies of it. If that is too inefficient it can probably be avoided with a do_with in maybe_delete_large_data_entries. Fixes #4335 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 09:30:42 -07:00
Rafael Ávila de Espíndola	c250a26e68	configure: split a ld_flags_{mode} out of cxxflags_{mode} Flags that we want to pass to gcc during compilation and linking are in cxx_ld_flags_{mode}. With this patch, we no longer pass -I. -I build/{mode}/gen to the link, which should have no impact. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 08:33:23 -07:00
Piotr Sarna	9695a47e96	sstables: simplify requires_view_building Since sstables uploaded via upload/ directory are no longer left there awaiting view updates, the only remaining valid directory is staging/.	2019-03-20 13:47:21 +01:00
Botond Dénes	0c381572fd	repair::row_level: pin table for local reads The repair reader depends on the table object being alive while it is reading. However, for local reads, there was no synchronization between the lifecycle of the repair reader and that of the table. In some cases this can result in use-after-free. Solve by using the table's existing mechanism for lifecycle extension: `read_in_progress()`. For the non-local reader, when the local node's shard configuration is different from the remote one's, this problem is already solved, as the multishard streaming reader already pins table objects on the used shards. This creates an inconsistency that might be surprising (in a bad way). One reader takes care of pinning needed resources while the other one doesn't. I was torn on how to reconcile this, and decided to go with the simplest solution, explicitly pinning the table for local reads, that is, to preserve the inconsistency. It was suggested that this inconsistency be remedied by building resource pinning into the local reader as well [1] but there is opposition to this [2]. Adding a wrapper reader which does just the resource pinning seems excessive, both in code and runtime overhead. Spotted while investigating repair-related crashes which occurred during interrupted repairs. Fixes: #4342 [1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050 [2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657 Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>	2019-03-20 14:45:22 +02:00
Piotr Sarna	986004a959	loader: move uploaded view pending sstables to staging When loading tables uploaded via `nodetool refresh`, they used to be left in the upload/ directory if view updates would need to be generated from them. Since view update generation is asynchronous, sstables left in the directory could erroneously get overwritten by a user who uploads another batch of sstables with colliding names. To remedy this, uploaded sstables that need view updates are moved to the staging/ directory with a unique generation number, where they await view update generation. Fixes #4047	2019-03-20 13:44:29 +01:00
Juliana Oliveira	8cd6028d0d	Dockerfile: remove cgroup volume mount Mounting /sys/fs/cgroup inside the image causes docker cgroup to not be mounted internally. Therefore, hosts cannot limit resources on Scylla. This patch removes the cgroup volume mount, allowing folders under /sys/fs/cgroup to be created inside docker. Message-Id: <20190320122053.GA20256@shenzou.localdomain>	2019-03-20 14:30:27 +02:00
Nadav Har'El	7c874057f5	materialized_views: propagate "view virtual columns" between nodes db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed to list the same schema tables - the former is the list of their names, and the latter is the list of their schemas. This code duplication makes it easy to forget to update one of them, and indeed recently the new "view_virtual_columns" was added to all_tables() but not to ALL. What this patch does is to make ALL a function instead of constant vector. The newly named all_table_names() function uses all_tables() so the list of schema tables only appears once. So that nobody worries about the performance impact, all_table_names() caches the list in a per-thread vector that is only prepared once per thread. Because after this patch all_table_names() has the "view_virtual_columns" that was previously missing, this patch also fixes #4339, which was about virtual columns in materialized views not being propagated to other nodes. Unfortunately, to test the fix for #4339 we need a test with multiple nodes, so we cannot test it here in a unit test, and will instead use the dtest framework, in a separate patch. Fixes #4339 Branches: 3.0 Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190320063437.32731-1-nyh@scylladb.com>	2019-03-20 09:14:59 -03:00
Nadav Har'El	ccf731a820	Materialized views: add metric for current flow-control delay The materialized views flow control mechanism works by adding a certain delay to each client request, designed to slow down the client to the rate at which we can complete the background view work. Until now we could observe this mechanism only indirectly, in whether or not it succeeded to keep the view backlog bounded; But we had no way to directly observe the delay that we decided to add. In fact, we had a bug where this delay was constantly zero, and we didn't even notice :-) So in this patch we add a new metric, scylla_storage_proxy_coordinator_last_mv_flow_control_delay The metric is a floating point number, in units of seconds. This metric is somewhat peculiar in that it always contains the last delay used for some request - unlike other metrics it doesn't measure the "current" value of something. Moreover, it can jump wildly because there is no guarantee that each request's delay will be identical (in particular, different requests may involve different base replicas which have different view backlogs, so decide on different delays). In the future we may want to supplement this metric with some sort of delay histogram. But even this simple metric is already useful to debug certain scenarios and understand if the materialized-views flow control is working or not. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190227133630.26328-1-nyh@scylladb.com>	2019-03-20 09:14:59 -03:00
Tomasz Grabiec	fbeae4ffeb	toolchain: Install gdb in the image Scylla built using the frozen toolchain needs to be debugged on a system with matching libraries. It's easiest if it's also done on the same image. Install gdb in the image so that it's always out there when we need it. Fixes #4329 Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>	2019-03-20 13:35:26 +02:00
Piotr Sarna	41679de13e	tests: add cases for local index prefix optimization The cases check if incorporating clustering key prefix into the indexed query works fine (i.e. does not require filtering and returns proper rows).	2019-03-20 10:51:27 +01:00
Piotr Sarna	56a0e6d992	tests: add create/drop local index test case	2019-03-20 10:51:27 +01:00
Piotr Sarna	3c61c8e18a	tests: add non-standard names cases to local index tests New test cases cover case-sensitive column/table names and names with non-alphanumeric characters like commas and parentheses.	2019-03-20 10:51:27 +01:00
Piotr Sarna	d664e0e522	tests: add multi pk case for local index tests	2019-03-20 10:51:27 +01:00
Piotr Sarna	3b39029924	tests: add test for malformed local index definitions	2019-03-20 10:51:27 +01:00
Piotr Sarna	4b82011cd3	tests: add local index paging test	2019-03-20 10:51:27 +01:00
Piotr Sarna	8836500fcd	tests: add local indexing test A test case for local indexing is added to the SI suite.	2019-03-20 10:51:27 +01:00
Piotr Sarna	cedec95f8d	cql3: add CREATE INDEX syntax for local indexes In order to create a local index, the syntax used is: CREATE INDEX t ON ((p1, p2, p3), v); where (p1, p2, p3) are partition key columns (all of them), and v is the indexed column.	2019-03-20 10:51:27 +01:00
Piotr Sarna	1fd61c5ac4	cql3: use serialization function to create index target string Instead of building the string manually, a serialization function is called to create a string out of index target list.	2019-03-20 10:51:27 +01:00
Piotr Sarna	757419b524	index: add serialization function for index targets Since target_parser is responsible for deserializing target strings, the function that serializes them belongs in the same class.	2019-03-20 10:51:26 +01:00
Piotr Sarna	074ed2c8a5	index: use proper local index target when adding index With global indexes, target column name is always the same as the string kept in 'options[target]' field. It's not the case for local indexes, and so a proper extracting function is used to get the value.	2019-03-20 10:20:24 +01:00
Piotr Sarna	2fcae3d0ec	index: add parsing target column name from local index targets When (re)creating a local index, the target string needs to be used to parse out the actual indexed column: "(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column". This column is later used to determine if an index should be applied to a SELECT statement.	2019-03-20 10:20:24 +01:00
Piotr Sarna	e0d7807eed	db: add checking for local index in schema tables Based on which targets the index has, it will be either local or global - local indexes have their full base partition key embedded in their targets.	2019-03-20 10:20:24 +01:00
Piotr Sarna	de5e5ee1a5	index: add checking if serialized target implies local index This utility enables checking whether the specified target indicates a local index, even before the base table schema is known.	2019-03-20 10:20:24 +01:00
Piotr Sarna	5672edc149	index: enable parsing multi-key targets Parsing index targets that consist of partition key columns followed by clustering key columns is enabled.	2019-03-20 10:20:24 +01:00
Piotr Sarna	9782381dd4	index: move target parser code to .cc file It will be useful later when expanding the implementation.	2019-03-20 10:20:24 +01:00
Piotr Sarna	25264d61ee	json: add non-throwing overload for to_json_value It will be needed later to avoid unnecessary try-catch blocks.	2019-03-20 10:20:24 +01:00
Piotr Sarna	b46ab76d4b	cql3: add checking for local indexes in has_supporting_index() With local indexes it's not sufficient to check whether a single restriction is supported by an index in order to decide that it can be used, because local indexes can be leveraged only when the full partition key is properly restricted. (It also serves as a great example of why the restrictions code would greatly benefit from a facelift! :) )	2019-03-20 10:20:24 +01:00
Piotr Sarna	87f6e37caa	cql3: move finding index restrictions to prepare stage Index restrictions that match a given index were recomputed during the execution stage, which is redundant and error-prone. Now, the used index restrictions are cached in the prepared statement.	2019-03-20 10:20:22 +01:00
Piotr Sarna	9823898b27	cql3: add picking an index by score Instead of choosing the first index that we find (in column definition order), the index with the highest score is picked. Currently local indexes score higher than global ones if the restrictions allow local indexing to be applied.	2019-03-20 10:20:02 +01:00
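Index selection by score, combined with the rule that a local index needs the whole partition key restricted, can be sketched as below. The `index_candidate` type and the concrete scores (0, 1, 2) are hypothetical illustrations. The commit states only that local indexes outrank global ones when the restrictions allow local indexing.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an index that could serve a query.
struct index_candidate {
    std::string name;
    std::string indexed_column;
    bool is_local;
};

// 0 = not applicable, 1 = usable global index, 2 = usable local index.
// A local index is usable only when every partition key column is restricted.
int score(const index_candidate& idx,
          const std::set<std::string>& restricted_columns,
          const std::vector<std::string>& partition_key) {
    if (!restricted_columns.count(idx.indexed_column)) {
        return 0;
    }
    if (!idx.is_local) {
        return 1;
    }
    for (const auto& pk : partition_key) {
        if (!restricted_columns.count(pk)) {
            return 0;
        }
    }
    return 2;
}

// Pick the highest-scoring index instead of the first applicable one.
std::optional<index_candidate> pick_index(const std::vector<index_candidate>& indexes,
                                          const std::set<std::string>& restricted_columns,
                                          const std::vector<std::string>& partition_key) {
    std::optional<index_candidate> best;
    int best_score = 0;
    for (const auto& idx : indexes) {
        int s = score(idx, restricted_columns, partition_key);
        if (s > best_score) {
            best_score = s;
            best = idx;
        }
    }
    return best;
}
```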
Piotr Sarna	2f173f7ed8	cql3: add handling paging state for local indexes When computing paging state for local indexes, the partition and clustering keys differ from those of global ones: - partition key is the same as base's - clustering key starts with the indexed column	2019-03-20 10:20:02 +01:00
Piotr Sarna	75dd964751	cql3: add handling partition slices for local indexes For local indexes, a slice will consist of the indexed column followed by base clustering columns.	2019-03-20 10:20:01 +01:00
Piotr Sarna	b12162c8f5	cql3: add returning correct partition ranges for local indexes Local indexes always share the partition range with their base.	2019-03-20 09:51:46 +01:00
Piotr Sarna	da8e8f18b3	cql3: make read_posting_list a member function It already accepts several arguments that can be extracted from 'this', and more will be added in the future. New parameters include lambdas prepared during the prepare stage that define how to extract partition/clustering key ranges depending on which index is used, so keeping it a static function would result in an unbounded number of parameters with complex types, which would in turn make the function header almost illegible for a reader. Hence, read_posting_list becomes a member function with easy access to any data prepared during the prepare stage.	2019-03-20 09:51:46 +01:00
Piotr Sarna	85017c5ad4	cql3: look for indexed column definition only once There's no need to look for the column definition inside a loop.	2019-03-20 09:51:46 +01:00
Piotr Sarna	8002471c81	cql3: allow index target to keep multiple columns Instead of having just one column definition, index target is now a variant of either single column definition or a vector of them. The vector is expected to be used when part of a target definition is enclosed in parentheses: $ CREATE INDEX ON t((p),v); or $ CREATE INDEX ON t((p1,p2), v); etc. This feature will allow providing (possibly composite) base partition key to CREATE INDEX statement, which will result in creating a local index.	2019-03-20 09:51:46 +01:00
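The change from a single column definition to a variant can be pictured with `std::variant`. Below, `column_def` is a hypothetical stand-in for the real column definition type, and `target_to_string` is an illustrative helper, not the actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a column definition.
struct column_def {
    std::string name;
};

// An index target holds either a single column or a parenthesised group of columns.
struct index_target {
    std::variant<column_def, std::vector<column_def>> value;

    // A parenthesised group, e.g. the base partition key.
    bool is_multi_column() const {
        return std::holds_alternative<std::vector<column_def>>(value);
    }
};

// Render a target as it is written in CREATE INDEX: "v" or "(p1,p2)".
std::string target_to_string(const index_target& t) {
    if (const auto* single = std::get_if<column_def>(&t.value)) {
        return single->name;
    }
    const auto& columns = std::get<std::vector<column_def>>(t.value);
    std::string out = "(";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) {
            out += ',';
        }
        out += columns[i].name;
    }
    return out + ")";
}
```

For `CREATE INDEX ON t((p1,p2), v);` this yields two targets: a multi-column one for `(p1,p2)` and a single-column one for `v`.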
Piotr Sarna	a45022dbc7	docs: document index target serialization Index target serialization format is extended for the purpose of local indexing. Both new and old formats are described in docs.	2019-03-20 09:51:46 +01:00
Piotr Sarna	9c984f9da9	index: fix indentation	2019-03-20 09:51:46 +01:00
Piotr Sarna	3b908b7b5d	index: add base partition keys to local index schema When the index is local, the partition key of its underlying materialized view is the same as the base's, and the indexed column is the first clustering key column. This ensures that view and base rows reside on the same partition, while querying by the indexed column remains possible because it is the first clustering key component.	2019-03-20 09:51:46 +01:00
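The key layout of a local index view can be sketched with column names only. This follows the commit above (base partition key reused, indexed column first in the clustering key) and the earlier note that slices consist of the indexed column followed by the base clustering columns. The `key_layout` type and the function name are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a table's key columns.
struct key_layout {
    std::vector<std::string> partition_key;
    std::vector<std::string> clustering_key;
};

// Local index view: same partition key as the base (so view rows live on the
// base partition), clustering key = indexed column, then base clustering columns.
key_layout local_index_view_layout(const key_layout& base, const std::string& indexed_column) {
    key_layout view;
    view.partition_key = base.partition_key;
    view.clustering_key.push_back(indexed_column);
    view.clustering_key.insert(view.clustering_key.end(),
                               base.clustering_key.begin(), base.clustering_key.end());
    return view;
}
```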
Piotr Sarna	90d47ca183	schema: add is_local_index cached value to index metadata In order to quickly distinguish global indexes from local ones, a cached boolean value is introduced.	2019-03-20 09:51:46 +01:00
Botond Dénes	ddf795d2f9	configure.py: add check header targets Our guidelines dictate that each header is self-sufficient, i.e. after including it into an empty .cc file, the .cc file can be compiled without having to include any other header file. Currently we don't have any tool to check that a header is self-sufficient. This patch aims to remedy that by adding a target to check each header, as well as a target to check all the headers. For each header a target is generated that does the equivalent of including the header into an empty .cc file, then compiling the resulting .cc file. This target is called {header_name}.o, so for the header `myheader.hh` it will be `build/dev/myheader.hh.o` (if the dev build mode is used). Also a target, `checkheaders`, is added which validates all headers in the project. This currently fails as we have many headers that are not self-sufficient. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <fdf550dc71203417252f1d8144e7a540eec074a1.1552636812.git.bdenes@scylladb.com>	2019-03-19 17:35:18 +02:00
Botond Dénes	721dd70d93	scylla-gdb.py: add scylla fiber command The scylla fiber command traverses a continuation chain, given an arbitrary task pointer. Example (cropped for brevity): (gdb) scylla fiber this #0 (task) 0x0000600000550360 0x000000000468ac40 vtable for seastar... #1 (task) 0x0000600000550300 0x00000000046c3778 vtable for seastar... #2 (task) 0x00006000018af600 0x00000000046c37a0 vtable for seastar... #3 (task) 0x00006000005502a0 0x00000000046c37f0 vtable for seastar... #4 (task*) 0x0000600001a65e10 0x00000000046c6b10 vtable for seastar... scylla fiber can be passed any expression that evaluates to a task pointer. C++ variables, raw addresses and GDB variables (e.g. $1) all work. The command works by scanning the task object for pointers. If a pointer is found, it is dereferenced. If that succeeds, it checks that the pointer dereferences to a vtable whose class is a known task. If this succeeds, the found task is saved and the scan recursively proceeds to the newly found task, until a task with no further attached continuations is found.	2019-03-19 17:06:41 +02:00
Botond Dénes	697fc5cefe	scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input	2019-03-19 17:06:41 +02:00
Botond Dénes	e1ea4db7ca	scylla-gdb.py: scylla_ptr: make analyze() usable for outside code Instead of a formatted message intended for humans, return a `pointer_metadata` object, suitable for use by code. The formatting of the pointer metadata into the human-readable message is now done by the `pointer_metadata.__str__()` method, on the call site. Also make `analyze()` a class method, making it possible to call it without having to create a `scylla_ptr` command instance, which could confuse GDB.	2019-03-19 17:06:41 +02:00
Botond Dénes	e77b6d12d1	scylla-gdb.py: scylla_ptr: move actual logic into analyze() In preparation for this method being made usable for outside code.	2019-03-19 17:06:41 +02:00
Botond Dénes	7d5c0ff666	scylla-gdb.py: resolve(): add cache parameter Allow callers to prevent the resolved name from being saved. Useful when one is just probing addresses but doesn't want to flood the cache with useless symbols.	2019-03-19 17:06:41 +02:00
Botond Dénes	48b96d25b3	scylla-gdb.py: fix tasks and task-stats commands These two commands have been broken for some time, roughly since the CPU scheduler was merged. Fix them and move the task queue parsing code into a common method, which is now used by both commands.	2019-03-19 17:06:41 +02:00
Botond Dénes	87c28df429	scylla-gdb.py: fix existing documentation Some commands are documented, but not in the python way. Refactor these commands so they use the standard python way for self documenting. In addition to being more "python", this makes these documentation strings discoverable by GDB so they appear in the `help scylla` output.	2019-03-19 17:06:41 +02:00
Botond Dénes	e1dffc3850	scylla-gdb.py: std_unique_ptr: add get() method Add a `get()` method that retrieves the wrapped pointer without dereferencing it. All existing methods are refactored to use this new method to obtain the pointer instead of directly accessing the members. This way only a single method has to be fixed if the object implementation changes.	2019-03-19 17:06:41 +02:00
Botond Dénes	c51b11c0ed	scylla-gdb.py: fix static_vector Apparently a new 'dummy' level was added.	2019-03-19 17:06:41 +02:00
Glauber Costa	7119440cbc	tests: make sure that commitlog replay works after truncate. Tomek and I recently had a discussion about whether or not a commitlog replay would be safe after we dropped or truncated a table that is not flushed (durable, but auto_snapshots being false). While we agreed that it would be safe, we would both feel better with a unit test covering it. This patch adds such a test (btw, it passes) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190318223811.6862-1-glauber@scylladb.com>	2019-03-19 11:30:51 +01:00
Avi Kivity	0441b59a70	Update seastar submodule * seastar 463d24e...33baf62 (3): > reactor: improve detection of io_pgetevents() > rpc: fix stack use after free in frame reading functions > core/thread: enable move-only functions	2019-03-19 11:44:35 +02:00
Takuya ASADA	32cee92d56	dist/debian: don't strip ld.so In some environments dh_strip fails on libreloc/ld.so, so it's better to skip it too, just like libprotobuf.so.15. The error message is: dh_strip -Xlibprotobuf.so.15 --dbg-package=scylla-server-dbg strip:debian/scylla-server/opt/scylladb/libreloc/ld.so[.gnu.build.attributes]: corrupt GNU build attribute note: bad description size: Bad value dh_strip: strip --remove-section=.comment --remove-section=.note --strip-unneeded debian/scylla-server/opt/scylladb/libreloc/ld.so returned exit code 1 0 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190319005153.26506-1-syuu@scylladb.com>	2019-03-19 11:06:44 +02:00
Asias He	71bf757b2c	gossiper: Enable features only after gossip is settled With n1, n2, n3 in the cluster, shut down n1, n2, n3, then start n1 and n2, then start n3. We saw that features were enabled using the system table while n1 and n2 were already up and running in the cluster. INFO 2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, 
STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} The problem is that we enable the features too early in the startup process. We should enable features only after gossip has settled. Fixes #4289 Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>	2019-03-18 18:25:29 +01:00
Dejan Mircevski	c7d05b88a6	Update GCC version check in configure.py This brings the version check up-to-date with README.md and HACKING.md, which were updated by commit fa2b03 ("Replace std::experimental types with C++17 std version.") to say that minimum GCC 8.1.1 is required. Tests: manually run configure.py with various `--compiler` values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190318130543.24982-1-dejan@scylladb.com>	2019-03-18 15:24:25 +02:00
Tomasz Grabiec	33f15aa1b5	tests: sstables: Test reading of static compact sstable generated by Cassandra	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	c78568daef	tests: sstables: Add test for writing and reading of static compact tables	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	47ca280e57	sstables: mc: Write static compact tables the same way as Cassandra Static compact tables are tables with compact storage and no clustering columns. Before this patch, Scylla was writing rows of static compact tables as clustered rows instead of static rows. That's because in our in-memory model such tables have regular rows and no static row. In Cassandra's schema (since 3.x), those tables have columns which are marked as static and there are no regular columns. This worked fine as long as Scylla was both writing and reading those sstables. But when importing sstables from Cassandra, our reader was skipping the static row, since it's not present in the schema, and returning no rows as a result. Also, Cassandra and Scylla tools would have problems reading those sstables. Fix this by writing rows for such tables the same way as Cassandra does. In order to support rolling downgrade, we do that only when all nodes are upgraded. Fixes #4139.	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	b0ff68d8d9	sstable: mc: writer: Set _static_row_written inside write_static_row()	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	b68df143a1	sstables: Add sstable::features()	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	cf9721e855	sstables: mc: writer: Prepare write_static_row() for working with any column_kind	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	fefef7b9eb	storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag When enabled on all nodes, sstable writers will start to produce correct MC-format sstables for compact storage tables by writing rows into the static row (like C*) rather than into the regular row. We only do that when all nodes are upgraded to support rolling downgrade. After all nodes are upgraded, regular rolling downgrade will not be possible. Refs #4139	2019-03-18 11:18:33 +01:00
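The writer-side decision from this and the previous commits can be sketched as a small predicate. A static compact table has compact storage and no clustering columns. Its rows go into the static row, as Cassandra writes them, only once CORRECT_STATIC_COMPACT is enabled cluster-wide. The enum and struct below are hypothetical stand-ins, not the actual writer code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the schema properties the writer looks at.
enum class column_kind { regular_column, static_column };

struct table_info {
    bool compact_storage;
    bool has_clustering_columns;
};

// Static compact table: compact storage and no clustering columns.
bool is_static_compact(const table_info& t) {
    return t.compact_storage && !t.has_clustering_columns;
}

// Write such rows as the static row only when the feature is enabled on all
// nodes; until then, keep the old layout so rolling downgrade remains possible.
column_kind row_kind_for_write(const table_info& t, bool correct_static_compact_enabled) {
    if (is_static_compact(t) && correct_static_compact_enabled) {
        return column_kind::static_column;
    }
    return column_kind::regular_column;
}
```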
Tomasz Grabiec	52d634025d	sstables: mc: writer: Build indexed_columns together with serialization_header The set of columns in both must match, so it's better to build them together. Later the logic for choosing columns will become more complicated, and this patch will allow for avoiding duplication.	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	701ac53b80	sstables: mc: writer: De-optimize make_serialization_header() So that it's easier to make it use schema_v3 conditionally in later patches. It's not on the hot path, so it shouldn't matter that we don't reserve the vectors.	2019-03-18 11:15:18 +01:00
Tomasz Grabiec	8bb8d67a93	sstable: mc: writer: Move attaching of mc-specific components out of generic code	2019-03-18 11:15:18 +01:00
Tomasz Grabiec	b0e6f17a22	Merge "Fix empty remote common_features in check_knows_remote_features" from Asias Three nodes in the cluster: node1, node2, node3. Shut down the whole cluster. Start node1. Start node2; node2 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.2 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is that node3 hasn't started yet, so node1 sees that node3 has empty features. In get_supported_features(), empty common features are returned if any node is seen with empty features. To fix this, we should fall back to the features saved in the system table. Start node3; node3 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is that node3 hasn't inserted its own features into the gossip endpoint_state_map. get_supported_features() returns the common features of all nodes in endpoint_state_map. To fix this, we should fall back to the features stored in the system table for such a node. 
Fixes #4225 Fixes #4341 * dev asias/fix_check_knows_remote_features.upstream.v4.1: gossiper: Remove unused register_feature and unregister_feature gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node gossiper: Log feature is enabled only if the feature is not enabled previously gossiper: Fix empty remote common_features in check_knows_remote_features	2019-03-18 10:56:10 +01:00
Asias He	1d59f26c11	gossiper: Fix empty remote common_features in check_knows_remote_features Three nodes in the cluster: node1, node2, node3. Shut down the whole cluster. Start node1. Start node2; node2 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.2 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is that node3 hasn't started yet, so node1 sees that node3 has empty features. In get_supported_features(), empty common features are returned if any node is seen with empty features. To fix this, we should fall back to the features saved in the system table. Start node3; node3 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is that node3 hasn't inserted its own features into the gossip endpoint_state_map. get_supported_features() returns the common features of all nodes in endpoint_state_map. To fix this, we should fall back to the features stored in the system table for such a node. Fixes #4225	2019-03-18 10:56:10 +01:00
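The fallback can be sketched as an intersection over all known nodes, where a node with no features in gossip contributes the set saved for it in the system table. Plain maps stand in for gossip's `endpoint_state_map` and for the system table. The `common_features` function is a hypothetical illustration of the idea, not the real `get_supported_features()`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using feature_set = std::set<std::string>;

// Intersect the features of every node known from gossip or from the saved
// (system table) state. An empty gossip entry falls back to the saved set, so a
// node that has not published its features yet does not empty the result.
feature_set common_features(const std::map<std::string, feature_set>& gossip_features,
                            const std::map<std::string, feature_set>& saved_features) {
    std::set<std::string> nodes;
    for (const auto& entry : gossip_features) {
        nodes.insert(entry.first);
    }
    for (const auto& entry : saved_features) {
        nodes.insert(entry.first);
    }

    feature_set result;
    bool first = true;
    for (const auto& node : nodes) {
        feature_set features;
        auto g = gossip_features.find(node);
        if (g != gossip_features.end() && !g->second.empty()) {
            features = g->second;        // published in gossip
        } else {
            auto s = saved_features.find(node);
            if (s != saved_features.end()) {
                features = s->second;    // fallback: saved in the system table
            }
        }
        if (first) {
            result = features;
            first = false;
            continue;
        }
        // Keep only the features this node supports as well.
        for (auto it = result.begin(); it != result.end();) {
            if (features.count(*it)) {
                ++it;
            } else {
                it = result.erase(it);
            }
        }
    }
    return result;
}
```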
Asias He	acb4badbc3	gossiper: Log feature is enabled only if the feature is not enabled previously We saw the log "Feature FOO is enabled" more than once like below. It is better to log it only when the feature is not enabled previously. gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - InetAddress 127.0.0.2 is now UP, status = NORMAL	2019-03-18 10:56:10 +01:00
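Logging only the first enablement amounts to remembering which features are already on and reporting only the transition. Here is a minimal sketch with a hypothetical `feature_registry` class, relying on the `bool` returned by `std::set::insert`.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <string>

// Hypothetical registry: enabling an already-enabled feature is a silent no-op.
class feature_registry {
    std::set<std::string> _enabled;
public:
    // Returns true (and logs) only when the feature was not enabled before.
    bool enable(const std::string& name) {
        bool newly_enabled = _enabled.insert(name).second;
        if (newly_enabled) {
            std::cout << "Feature " << name << " is enabled\n";
        }
        return newly_enabled;
    }
};
```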
Asias He	f32f08c91e	gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node Remove unused check_features helper as well.	2019-03-18 10:56:09 +01:00
Asias He	6dbcb2e0c9	gossiper: Remove unused register_feature and unregister_feature They are not used any more.	2019-03-18 10:56:09 +01:00
Benny Halevy	ecf88d8e2e	compaction: fix sstable_window_size calculation if only unit/size is set A user who changes the default UNIT from DAYS to HOURS and does not set compaction_window_size will end up with a window of 24H instead of 1H. According to the docs https://docs.scylladb.com/getting-started/compaction/#twcs-options compaction_window_size should default to a value of 1. Fixes #4310 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190307131318.13998-1-bhalevy@scylladb.com>	2019-03-18 11:19:18 +02:00
Takuya ASADA	02be95365f	reloc/build_rpm.sh: don't use '*' for tar xf argument It only works by accident: the glob is expanded by bash to the matching files in the current directory rather than being interpreted by tar. We need to use the full file name instead. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190312172243.5482-2-syuu@scylladb.com>	2019-03-18 11:09:55 +02:00
Takuya ASADA	5b10b6a0ce	reloc/build_reloc.sh: enable DPDK We get the following link errors when running reloc/build_reloc.sh in dbuild, so we need to enable DPDK in Seastar: g++: error: /usr/lib64/librte_cfgfile.so: No such file or directory g++: error: /usr/lib64/librte_cmdline.so: No such file or directory g++: error: /usr/lib64/librte_ethdev.so: No such file or directory g++: error: /usr/lib64/librte_hash.so: No such file or directory g++: error: /usr/lib64/librte_kvargs.so: No such file or directory g++: error: /usr/lib64/librte_mbuf.so: No such file or directory g++: error: /usr/lib64/librte_eal.so: No such file or directory g++: error: /usr/lib64/librte_mempool.so: No such file or directory g++: error: /usr/lib64/librte_mempool_ring.so: No such file or directory g++: error: /usr/lib64/librte_pmd_bnxt.so: No such file or directory g++: error: /usr/lib64/librte_pmd_e1000.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ena.so: No such file or directory g++: error: /usr/lib64/librte_pmd_enic.so: No such file or directory g++: error: /usr/lib64/librte_pmd_fm10k.so: No such file or directory g++: error: /usr/lib64/librte_pmd_qede.so: No such file or directory g++: error: /usr/lib64/librte_pmd_i40e.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ixgbe.so: No such file or directory g++: error: /usr/lib64/librte_pmd_nfp.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ring.so: No such file or directory g++: error: /usr/lib64/librte_pmd_sfc_efx.so: No such file or directory g++: error: /usr/lib64/librte_pmd_vmxnet3_uio.so: No such file or directory g++: error: /usr/lib64/librte_ring.so: No such file or directory Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190312172243.5482-1-syuu@scylladb.com>	2019-03-18 11:09:55 +02:00
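The intended behaviour is that each TWCS option falls back to its own default independently: the unit to DAYS and the size to 1. Under that behaviour, setting only the unit to HOURS yields a one-hour window. Below is a hypothetical sketch of that computation. The unit names follow the TWCS options MINUTES, HOURS and DAYS. The function is illustrative and does not reproduce Scylla's option parsing.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical: compute the window length from possibly-unset options.
// compaction_window_unit defaults to DAYS, compaction_window_size to 1.
std::chrono::seconds twcs_window(std::optional<std::string> unit, std::optional<int> size) {
    static const std::map<std::string, std::chrono::seconds> unit_lengths = {
        {"MINUTES", std::chrono::minutes(1)},
        {"HOURS", std::chrono::hours(1)},
        {"DAYS", std::chrono::hours(24)},
    };
    auto it = unit_lengths.find(unit.value_or("DAYS"));
    if (it == unit_lengths.end()) {
        throw std::invalid_argument("unknown compaction_window_unit");
    }
    // The size default does not depend on which unit was chosen.
    return it->second * size.value_or(1);
}
```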
Piotr Sarna	2e05d86cf3	service: reduce number of spawned threads when notifying Commit `9c544df217` introduced running up/down/join/leave notifications in threaded context, but spawned a thread for every notification, while it could be done once for all notifiees. Reported-by: Avi Kivity <avi@scylladb.com> Message-Id: <34815d5aa11902c4a052cff38f4c45c45ff919d8.1552897848.git.sarna@scylladb.com>	2019-03-18 10:45:47 +02:00
Avi Kivity	64fa2dd1d2	Merge "gdb: Introduce 'scylla sstables'" from Tomasz " Finds all sstables on current shard and prints useful information, like on-disk and in-memory usage. Example: (gdb) scylla sstables (sstables::sstable) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096) ... total (shard-local): count=144, data_file=782839677, in_memory=59774408 Because of the way it finds sstables (bag_sstable_set), it doesn't yet support tables using LeveledCompactionStrategy. " * 'gdb-scylla-sstables' of github.com:tgrabiec/scylla: gdb: Introduce 'scylla sstables' gdb: Introduce find_instances() gdb: Extract std_unique_ptr.get() gdb: Add chunked_vector wrapper gdb: Add small_vector wrapper gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint() gdb: Add wrapper for seastar::lw_shared_ptr gdb: Add std_vector.external_memory_footprint() gdb: Add wrapper for boost::variant gdb: Add wrapper for std::optional	2019-03-17 19:37:44 +02:00
Takuya ASADA	270f9cf9e6	dist/debian: fix installing scyllatop Since we removed dist/common/bin/scyllatop, we get a build error when building the .deb package (`1bb65a0888`). To fix the error we need to create a symlink for /usr/bin/scyllatop. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190316162105.28855-1-syuu@scylladb.com>	2019-03-17 19:37:44 +02:00
Tomasz Grabiec	05e2c87936	gdb: Introduce 'scylla sstables' Finds all sstables on current shard and prints useful information, like on-disk and in-memory usage. Example: (gdb) scylla sstables (sstables::sstable) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	929653f51d	gdb: Introduce find_instances()	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	fc4952c579	gdb: Extract std_unique_ptr.get()	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	e47a5019f2	gdb: Add chunked_vector wrapper	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	a6da71e4da	gdb: Add small_vector wrapper	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	0e8589cfdf	gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint()	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	380c6fbdfe	gdb: Add wrapper for seastar::lw_shared_ptr	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	93e5e0d644	gdb: Add std_vector.external_memory_footprint()	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	8866b1320a	gdb: Add wrapper for boost::variant	2019-03-15 15:12:46 +01:00
Tomasz Grabiec	dd237c32af	gdb: Add wrapper for std::optional	2019-03-15 15:12:46 +01:00
Paweł Dziepak	f4f56027bf	Merge "Detect partitioner mismatch" from Piotr " Refuse to accept SSTables that were created with a partitioner different from the one used by the Scylla server. Fixes #4331 " * 'haaawk/4331/v4' of github.com:scylladb/seastar-dev: sstables: Add test for sstable::validate_partitioner sstables: Add sstable::validate_partitioner and use it	2019-03-15 11:45:10 +00:00
Piotr Jastrzebski	2b0437a147	sstables: Add test for sstable::validate_partitioner Make sure the exception is thrown when Scylla tries to load an SSTable created with an incompatible partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-03-15 10:47:47 +01:00
Piotr Jastrzebski	4aea97f120	sstables: Add sstable::validate_partitioner and use it Scylla server can't read sstables that were created with a different partitioner than the one being used by Scylla. We should make sure that Scylla identifies such a mismatch and refuses to use such SSTables. We can use the partitioner information stored in the validation metadata (Statistics.db file) of each SSTable and compare it against the partitioner used by Scylla. Fixes #4331 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-03-15 10:14:37 +01:00
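The check boils down to comparing two partitioner names and refusing the sstable on mismatch. In this sketch the free function and its string parameters are hypothetical. In the real code the recorded name comes from the Statistics.db validation metadata. The class names used in the usage note are the standard Cassandra partitioner names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical: refuse an sstable whose recorded partitioner differs from the
// partitioner the server is running with.
void validate_partitioner(const std::string& sstable_partitioner,
                          const std::string& server_partitioner) {
    if (sstable_partitioner != server_partitioner) {
        throw std::runtime_error("SSTable was created with partitioner " + sstable_partitioner +
                                 ", but the server uses " + server_partitioner);
    }
}
```

For example, an sstable recording `org.apache.cassandra.dht.RandomPartitioner` is rejected by a server running `org.apache.cassandra.dht.Murmur3Partitioner`.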
Rafael Ávila de Espíndola	94c28cfb16	sstable: Wait for future returned by maybe_record_large_cells. A previous version of the patch that introduced these calls had no limit on how far behind the large data recording could get, and maybe_record_large_cells returned null. The final version switched to a semaphore, but unfortunately these calls were not updated. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190314195856.66387-1-espindola@scylladb.com>	2019-03-14 21:01:37 +01:00
Paweł Dziepak	349601ac32	sstable: pass full length of buffer to vint deserialiser The vint deserialiser can be more performant if it is allowed to overread (i.e. read more memory than the value it is deserialising). In the case of sstable reads, those vints are usually in the middle of a much larger buffer, so let's pass the whole length of the buffer and enable this optimisation.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	552fc0c6b9	vint: optimise deserialisation routine At the moment, vint deserialisation uses a naive approach, reading each byte separately. In practice, vints most often appear inside larger buffers. That means we can read 8 bytes at a time and then figure out the unneeded parts and mask them out. This way we avoid a loop and do fewer memory loads, which are much more expensive than arithmetic operations (even if they hit the cache).	2019-03-14 13:37:06 +00:00
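The load-and-mask idea from this commit, together with passing the full buffer length from the previous one, can be sketched as follows. The sketch assumes the unsigned vint encoding used by Cassandra's mc format as we understand it. The number of leading 1 bits in the first byte gives the count of extra bytes. The remaining bits of the first byte are the most significant value bits, and the extra bytes follow in big-endian order. The function names are hypothetical. The code relies on the GCC/Clang builtins `__builtin_clz` and `__builtin_bswap64`. It is not Scylla's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Leading one bits of b (0..8). After inversion the low 24 bits are all ones,
// so __builtin_clz never receives 0.
inline unsigned count_leading_ones(uint8_t b) {
    return __builtin_clz(~(static_cast<uint32_t>(b) << 24));
}

// Load 8 bytes as a big-endian integer.
inline uint64_t load_be64(const uint8_t* p) {
    uint64_t v;
    std::memcpy(&v, p, sizeof(v));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}

// Naive decoder: one memory load per byte.
uint64_t read_vint_naive(const uint8_t* p) {
    unsigned extra = count_leading_ones(p[0]);
    uint64_t value = p[0] & (0xffu >> extra);   // drop the length prefix
    for (unsigned i = 1; i <= extra; ++i) {
        value = (value << 8) | p[i];
    }
    return value;
}

// Optimised decoder: one 8-byte load, then shift and mask. It may read past the
// end of the vint, so the caller passes how many bytes are readable from p.
uint64_t read_vint(const uint8_t* p, std::size_t available) {
    unsigned extra = count_leading_ones(p[0]);
    if (extra == 0) {
        return p[0];                    // single-byte value
    }
    if (extra == 8) {
        return load_be64(p + 1);        // 9-byte form: value is the next 8 bytes
    }
    if (available < 8) {
        return read_vint_naive(p);      // overread not allowed here
    }
    uint64_t word = load_be64(p);
    word >>= 8 * (7 - extra);                        // drop bytes after the vint
    unsigned value_bits = 7 * extra + 8;             // (extra + 1) * 8 minus the prefix ones
    return word & ((uint64_t(1) << value_bits) - 1); // clear the length prefix
}
```

For example, the two bytes `0x81 0x2C` decode to 300. The prefix bit says there is one extra byte, and the remaining bits form `0x012C`.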
Paweł Dziepak	57de2c26b3	vint: drop deserialize_type structure Deserialisation function returns a structure containing both the value and its length in the input buffer. In the vast majority of the cases the caller will already know the length and having this structure will make it harder for the compiler to emit good code, especially if the function is not inlined. In practice I've seen the structure causing register pressure problems that lead to spilling variables to memory.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	6110278439	tests/vint: reduce test dependencies The vint serialisation test doesn't need the whole of Scylla, so let's reduce its dependencies to improve build times.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	54a079cdb5	tests/perf: add performance test for vint serialisation	2019-03-14 13:37:06 +00:00
Piotr Sarna	9c544df217	service: run notifying code in threaded context In order to allow yielding when handling endpoint lifecycle changes, notifiers now run in threaded context. Implementations which used this assumption before are supplemented with assertions that they indeed run in seastar::async mode. Fixes #4317 Message-Id: <45bbaf2d25dac314e4f322a91350705fad8b81ed.1552567666.git.sarna@scylladb.com>	2019-03-14 12:56:53 +00:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Paweł Dziepak	d4d2eb2ed5	Update seastar submodule * seastar e640314...463d24e (3): > Merge 'Handle IOV_MAX limit in posix_file_impl' from Paweł > core: remove unneeded 'exceptional future ignored' report > tests/perf: support multiple iterations in a single test run	2019-03-13 14:24:58 +00:00
Tomasz Grabiec	2ef9d9c12e	Merge "Record large cells to system.large_cells" from Rafael Issue #4234 asks for a large collection detector. Discussing the issue, Benny pointed out that it is probably better to have a generic large cell detector, as it is a natural progression from what we already warn on (large partitions and large rows). This patch series implements that. It is on top of shutdown-order-patches-v7 which is currently on next. With the changes to use a semaphore this patch series might be getting a bit big. Let me know if I should split it. * https://github.com/espindola/scylla espindola/large-cells-on-top-of-shutdown-v5: db: refactor large data deletion code db: Rename (maybe_)?update_large_partitions db: refactor a try_record helper large_data_handler: assert it is not used after stop() db: don't use _stopped directly sstables: delete dead error handling code. large_data_handler: Remove const from a few functions large_data_handler: propagate a future out of stop() large_data_handler: Run large data recording in parallel Create a system.large_cells table db: Record large cells Add a test for large cells	2019-03-13 09:44:57 +01:00
Rafael Ávila de Espíndola	f983570ac8	Add a test for large cells Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	63251b66c1	db: Record large cells Fixes #4234. Large cells are now recorded in system.large_cells. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	d17083b483	Create a system.large_cells table This is analogous to the system.large_rows table, but holds individual cells, so it also needs the column name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	8b4ae95168	large_data_handler: Run large data recording in parallel With these changes, the futures returned by large_data_handler will not normally wait for entries to be written to system.large_rows or system.large_partitions. We use a semaphore to bound how far behind system.large_* table updates can get. This should avoid delaying sstable writes in the common case, which is more relevant once we warn of large cells, since the default threshold will be just 1MB. Note that there is no ordering between the various maybe_record_* and maybe_delete_large_data_entries requests. This means that we can end up with a stale entry that is only removed once the TTL expires. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	54b856e5e4	large_data_handler: propagate a future out of stop() stop() will close a semaphore in a followup patch, so it needs to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	989ab33507	large_data_handler: Remove const from a few functions These will use a member semaphore variable in a followup patch, so they cannot be const. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	0b763ec19b	sstables: delete dead error handling code. maybe_delete_large_data_entries handles exceptions internally, so the code this patch deletes would never run. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	5fcb3ff2d7	db: don't use _stopped directly This gives flexibility in how it is implemented. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	a17a936882	large_data_handler: assert it is not used after stop() This should have been changed in the patch db: stop the commit log after the tables during shutdown But unfortunately I missed it then. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	f3089bf3d1	db: refactor a try_record helper We had almost identical error handling for large_partitions and large_rows. Refactor in preparation for large_cells. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola	d7f263d334	db: Rename (maybe_)?update_large_partitions This renames it to record_large_partitions, which matches record_large_rows. It also changes the signature to be closer to record_large_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola	f254664fe6	db: refactor large data deletion code The code for deleting entries from system.large_partitions was almost a duplicate of the code for deleting entries from system.large_rows. This patch unifies the two, which also improves the error message when we fail to delete entries from system.large_partitions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:16:04 -07:00
Asias He	b8158dd65d	streaming: Get rid of the keep alive timer in streaming There is no guarantee that rpc streaming makes progress within any given time period. Remove the keep alive timer in streaming to avoid killing the session when the rpc streaming is just slow. The keep alive timer is used to close the session in the following case: n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver), then n2 is killed with kill -9. We need this because we do not kill the session when gossip thinks a node is down: the node being down might only be temporary, and it is a waste to drop the work that has already been done, especially when the stream session takes a long time. Since range_streamer does not stream all data in a single stream session (it streams 10% of the data at a time) and has retry logic, I think it is fine to kill a stream session when gossip thinks a node is down. This patch closes all stream sessions with a node that gossip thinks is down. Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com>	2019-03-12 12:20:28 +01:00
Duarte Nunes	2718c90448	Merge 'Add canceling long-standing view update requests' from Piotr " This series allows canceling view update requests when a node is discovered DOWN. View updates are sent in the background with long timeout (5 minutes), and in case we discover that the node is unavailable, there's no point in waiting that long for the request to finish. What's more, waiting for these requests occurs on shutdown, which may result in waiting 5 minutes until Scylla properly shuts down, which is bad for both users and dtests. This series implements storage_proxy as a lifecycle subscriber, so it can react to membership changes. It also keeps track of all "interruptible" writes per endpoint, so once a node is detected as DOWN, an artificial timeout can be triggered for all aforementioned write requests. Fixes #3826 Fixes #3966 Fixes #4028 " * 'write_hints_for_view_updates_on_shutdown_4' of https://github.com/psarna/scylla: service: remove unused stop_hints_manager storage_proxy: add drain_on_shutdown implementation main: register storage proxy as lifecycle subscriber storage_proxy: add endpoint_lifecycle_subscriber interface storage_proxy: register view update handlers for view write type storage_proxy: add intrusive list of view write handlers storage_proxy: add view_update_write_response_handler	2019-03-08 13:34:46 -03:00
Piotr Sarna	ae52b3baa7	tests: fix complex timestamp test flakiness Complex timestamp tests were ported from dtest and contained a potential race - rows were updated with TTL 1 and then checked if the row exists in both base and view replicas in an eventually() loop. During this loop however, TTL of 1 second might have already passed and the row could have been deleted from base. This patch changes the mentioned TTL to 30 seconds, making the tests extremely unlikely to be flaky. Message-Id: <6b43fe31850babeaa43465eb771c0af45ee6e80d.1552041571.git.sarna@scylladb.com>	2019-03-08 13:34:27 -03:00
Tomasz Grabiec	eb5506275b	Merge "Further enhancements to perf_fast_forward" from Paweł This series contains several improvements to perf_fast_forward that either address some of the problems seen in the automated runs or help in understanding the results. The main problem was that the small-partition-slicing test had a preparation stage disproportionately long compared to the actual testing phase. While the fragments-per-second result wasn't affected by that, it restricted the number of test iterations we were able to run, and this test, whose single iterations are short (and more prone to noise), was executed only four times. This was solved by sharing the preparation stage among all iterations, enabling the test to be run many times and improving the stability of the results. Another improvement is the ability to dump all test results and process them to produce histograms. This lets us see what the distribution of particular statistics looks like and whether there are any complications. Refs #4278. * https://github.com/pdziepak/scylla.git more-perf_fast_forward/v1: tests/perf_fast_forward: print number of iterations of each test tests/perf_fast_forward: reuse keys in small partition slicing test tests/perf_fast_forward: extract json result file writing logic tests/perf_fast_forward: add an option to dump all results tests/perf_fast_forward: add script for analysing full results	2019-03-07 12:22:13 -03:00
Piotr Sarna	aea4b7ea78	service: remove unused stop_hints_manager Stopping hints manager now occurs when draining storage proxy and it shouldn't be executed independently, so it's removed from external API.	2019-03-07 13:44:06 +01:00
Piotr Sarna	cc806909d7	storage_proxy: add drain_on_shutdown implementation When storage proxy is shutting down, all interruptible writes can be timed out in order not to wait for them. Instead, the mechanism will fall back to storing hints and/or not progressing with view building.	2019-03-07 13:44:05 +01:00
Piotr Sarna	c61d0ee8aa	main: register storage proxy as lifecycle subscriber In order to be able to act when node joins/leaves, storage proxy is registered as an endpoint lifecycle subscriber. Fixes #3826 Fixes #4028	2019-03-07 12:10:40 +01:00
Piotr Sarna	92df1d5a6b	storage_proxy: add endpoint_lifecycle_subscriber interface Storage proxy is able to react to membership changes in order to cancel long-standing operations for an endpoint.	2019-03-07 12:10:40 +01:00
Piotr Sarna	f9ff97511f	storage_proxy: register view update handlers for view write type View update handlers have a specialized class, so all writes of type write_type::VIEW are now registered as such.	2019-03-07 12:10:40 +01:00
Piotr Sarna	75ec5fa876	storage_proxy: add intrusive list of view write handlers In order to be able to iterate over view update write response handlers, an intrusive list of them is added to storage proxy. This way iteration can easily yield without invalidating iterators, and all logic is moved to the slow path.	2019-03-07 12:10:40 +01:00
Piotr Sarna	c2048a0758	storage_proxy: add view_update_write_response_handler View update write response handler inherits from a regular write response handler, but it's also possible to link it intrusively in order to be able to induce timeouts on them later.	2019-03-07 12:10:40 +01:00
Paweł Dziepak	0ba7a3c55a	tests/perf_fast_forward: add script for analysing full results perf_fast_forward with flag --dump-all-results reports the results of every test iteration that was executed. This patch introduces a python script that can analyse those results (in json format) and present them in a more human-friendly way. For now, the only option is to plot histograms of selected statistics.	2019-03-06 15:48:49 +00:00
Paweł Dziepak	4220b90b22	tests/perf_fast_forward: add an option to dump all results perf_fast_forward runs each test case multiple times and reports a summary of those results (median, min, max, and median absolute deviation). While very convenient the summary may hide some important information (e.g. the distribution of the results). This patch adds an option to report results of every single executed iteration.	2019-03-06 15:48:48 +00:00
Paweł Dziepak	55ed8b2472	tests/perf_fast_forward: extract json result file writing logic We are about to report, depending on flags, both full results as well as the results summary written now. Most of the logic is going to be identical.	2019-03-06 15:48:45 +00:00
Paweł Dziepak	daafde21c5	tests/perf_fast_forward: reuse keys in small partition slicing test	2019-03-06 15:48:42 +00:00
Paweł Dziepak	0eb1e570aa	tests/perf_fast_forward: print number of iterations of each test	2019-03-06 15:48:38 +00:00
Avi Kivity	0beeb2f721	Merge "implement upgradesstables + scrub" from Calle " Fixes #4245 Breaks up "perform_cleanup" into a parameterized "rewrite_sstables" and implements upgrade + scrub in terms of it. Both run as a "regular" compaction, but ignore the normal criteria for compaction and select obsolete/all tables. We also ensure all previous compactions are done so we can guarantee all tables are rewritten after the command is invoked. " * 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev: api::storage_service: Implement "scrub" api/storage_service: Implement "upgradesstables" api::storage_service: Add keyspace + tables helper compaction_manager: Add perform_sstable_scrub compaction_manager: Add perform_sstable_upgrade compaction_manager: break out rewrite_sstables from cleanup table: parameterize cleanup_sstables	2019-03-06 15:47:26 +02:00
Duarte Nunes	a29ec4be76	Merge 'Update system.large_partitions during shutdown' from Rafael " Currently any large partitions found during shutdown are not recorded. The reason is that the database commit log is already off, so there is nowhere to record it to. One possible solution is to have an independent system database. With that the regular db is shutdown first and writes can continue to the system db. That is a pretty big change. It would also not allow us to record large partitions in any system tables. This patch series instead tries to stop the commit log later. With that any large partitions are recorded to the log and moved to a sstable on the next startup. " * 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla: db: stop the commit log after the tables during shutdown db: stop the compaction manager earlier db: Add a stop_database helper db: Don't record large partitions in system tables	2019-03-06 10:36:38 -03:00
Calle Wilund	ef1bdebd0a	api::storage_service: Implement "scrub"	2019-03-06 13:13:21 +00:00
Calle Wilund	23f4c982ea	api/storage_service: Implement "upgradesstables" Fixes #4245 Implemented as a compaction barrier (forcing previous compactions to finish) + parameterized "cleanup", with the sstable list based on parameters.	2019-03-06 13:13:21 +00:00
Calle Wilund	3b5588dddd	api::storage_service: Add keyspace + tables helper To avoid repeating code to get keyspace + tables	2019-03-06 13:13:21 +00:00
Calle Wilund	c0bb6a4bef	compaction_manager: Add perform_sstable_scrub Suspiciously similar to an unconditional upgrade	2019-03-06 13:13:21 +00:00
Calle Wilund	7585b8c310	compaction_manager: Add perform_sstable_upgrade Rewrites obsolete/all sstables via compaction	2019-03-06 13:13:21 +00:00
Tomasz Grabiec	889f31fabe	Merge "fix slow truncation under flush pressure" from Glauber Truncating a table is very slow if the system is under pressure. Since in that case we mostly just want to get rid of the existing data, it shouldn't take this long. The problem happens because truncate has to wait for memtable flushes to end, twice. This is regardless of whether or not the table being truncated has any data. 1. The first time is when we call truncate itself: if auto_snapshot is enabled, we will flush the contents of this table first and we are expected to be slow. However, even if auto_snapshot is disabled we will still do it -- which is a bug -- if the table is marked as durable. We should just not flush in this case; it is a silly bug. 2. The second time is when we call cf->stop(). Stopping a table will wait for a flush to finish. At this point, regardless of which path (durable or non-durable) we took in the previous step, we will have no more data in the table. However, calling `flush()` still needs to acquire a flush_permit, which means we will wait for whichever memtable is flushing at that very moment to finish. If the system is under pressure and a memtable flush takes many seconds, so will truncate. Even if auto_snapshots are enabled, we shouldn't have to flush twice. The first flush should already put us in a state in which the next one is immediate (maybe holding on to the permit, maybe destroying the memtable_list already at that point -> since no other memtables should be created). If auto_snapshots are not enabled, the whole thing should just be instantaneous. This patchset fixes that by removing the need to flush when !auto_snapshot, and special-casing the flush of an empty table. Fixes #4294 * git@github.com:glommer/scylla.git slowtruncate-v2: database: immediately flush tables with no memtables. truncate: do not flush memtables if auto_snapshot is false.	2019-03-06 13:54:58 +01:00
Eliran Sinvani	479131259e	auth: prevent failure due to race in tables creation This commit rewrites the logic of table creation at startup of the auth mechanism to be race-proof. This is done by simply ignoring the already_exists exception, as is done in system_distributed_keyspace. The old creation logic tested for the existence of the column family and right after that called announce_new_column_family with the newly created table schema. The problem was that this does not prevent a race, since the announcement itself is a fiber and the created table can still be gossiped from another node, causing the announce function to throw an already_exists exception that in turn crashes scylla. Message-Id: <20190306075016.28131-1-eliransin@scylladb.com>	2019-03-06 13:09:09 +01:00
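The race-proof pattern described in the auth commit above (attempt the creation and tolerate "already exists", instead of check-then-create) can be sketched like this. The `catalog` type and `announce_new_table` function are hypothetical stand-ins, not Scylla's migration API; the sketch is synchronous and only illustrates the control flow.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical exception mirroring the role of already_exists.
struct already_exists : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for the schema catalog.
struct catalog {
    std::set<std::string> tables;

    void announce_new_table(const std::string& name) {
        if (!tables.insert(name).second) {
            throw already_exists(name); // e.g. created concurrently by another node
        }
    }
};

// No existence check beforehand: a check would be stale by the time the
// announcement runs. Instead, a concurrent creation is treated as success.
void create_table_if_missing(catalog& c, const std::string& name) {
    try {
        c.announce_new_table(name);
    } catch (const already_exists&) {
        // Another node won the race; the table exists, which is all we need.
    }
}
```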
Rafael Ávila de Espíndola	16ed9a2574	db: stop the commit log after the tables during shutdown This allows for system.large_partitions to be updated if a large partition is found while writing the last sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	a3e1f14134	db: stop the compaction manager earlier We want to finish all large data logging in stop_system, so stopping the compaction manager should be the first thing stop_system does. The make_ready_future<>() will be removed in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	765d8535f1	db: Add a stop_database helper This reduces code duplication. A followup patch will add more code to stop_database. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:45 -08:00
Rafael Ávila de Espíndola	0b86a99592	db: Don't record large partitions in system tables This will allow us to delay shutdown of all system tables in a uniform way. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 17:52:00 -08:00
Tomasz Grabiec	c584f48c32	Merge "transport: sort bound ranges in read request in order to conform to cql definitions" from Eliran According to the cql definitions, if no ORDER BY clause is present, records should be returned ordered by the clustering keys. Since the backend returns the ranges according to their order of appearance in the request, the bounds should be sorted before sending them to the backend. This kind of sorting is needed in queries that generate more than one bound to be read; examples of such queries are: 1. a SELECT query with an IN clause. 2. a SELECT query on a mixed-order tuple of columns (see #2050). The assumption this commit makes is the correctness of the bounds list, that is, the bounds are non-overlapping. If this weren't true, multiple occurrences of the same record could have been returned for certain queries. Tests: 1. Unit tests release 2. All dtests that require #2050 and #2029 Fixes #2029	2019-03-05 21:07:15 +01:00
Avi Kivity	3cfbd682ec	Merge "Add JSON support to tuples and UDT" from Piotr " Fixes #3708 This series adds JSON serialization and deserialization procedures to tuples and user defined types. Tests: unit (dev) " * 'add_tuple_and_udt_json_support_2' of https://github.com/psarna/scylla: tests: add test cases for JSON and UDT types: add JSON support to UDT tests: add JSON tuple tests types: add JSON support for tuples	2019-03-05 20:06:15 +02:00
Glauber Costa	c2c6c71398	truncate: do not flush memtables if auto_snapshot is false. Right now we flush memtables if the table is durable (which in practice it almost always is). We are truncating, so we don't want the data. We should only flush if auto_snapshot is true. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-03-05 11:22:48 -05:00
Glauber Costa	ed8261a0fe	database: immediately flush tables with no memtables. If a table has no data, it may still take a long time to flush. This is because before we even try to flush, we need to acquire a permit, and that can take a while if there is a long-running flush already queued. We can special-case the situation in which there is no data in any of the memtables owned by the table and return immediately. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-03-05 11:22:48 -05:00
Piotr Sarna	a5c66d5ce1	tests: add test cases for JSON and UDT	2019-03-05 16:25:18 +01:00
Piotr Sarna	ebf0eb92bb	types: add JSON support to UDT User defined types can now be serialized to and deserialized from JSON. Fixes #3708	2019-03-05 16:08:05 +01:00
Piotr Sarna	c2064d152d	tests: add JSON tuple tests	2019-03-05 16:08:05 +01:00
Piotr Sarna	aa0cc8a8a2	types: add JSON support for tuples Tuples can now be serialized to and deserialized from JSON. Refs #3708	2019-03-05 16:08:04 +01:00
Piotr Sarna	e9bc2a7912	cql3: fix error message for lack of primary keys in JSON When any primary key part is not present in INSERT JSON statement, proper error message will be presented to the client. Tests: unit (dev) Message-Id: <3aa99703523c45056396a0b6d97091da30206dab.1551797502.git.sarna@scylladb.com>	2019-03-05 16:54:46 +02:00
Avi Kivity	256b7d34e2	Update seastar submodule * seastar ab54765...e640314 (10): > net: enable IP_BIND_ADDRESS_NO_PORT before binding a socket during connection > core: show address in error message for posix_listen failures > fmt: remove submodule > tests: fix loopback socket close() to not fail when the peer's side is already closed > Merge "Add suffixes to target names" from Jesse > temporary_buffer: improve documentation for alignment param requirements > docs: Fix dependencies for split tutorial target > deleter: prevent early memory free caused by deleter append. > doc/tutorial.md: introduce memory allocation foreign_ptr > Fix CLI help message (network & DPDK options) Toolchain and configure.py updated for fmt submodule removal.	2019-03-05 15:51:38 +02:00
Botond Dénes	817490cda1	tests/multishard_mutation_query_test: fuzzy_test: replace BOOST_WARN_* with logger::debug() fuzzy_test performs some checks that are expected to fail and whose failure does not influence the outcome of the test. For this it uses the `BOOST_WARN_*` family of macros. These will just log a warning when their predicate fails. This can, however, confuse someone looking at the logs trying to determine the cause of a failure. Since these checks are performed primarily as an aid in debugging failures, replace them with a conditional debug-level log message. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f550a9d9ab1b5b4aeb4f81860cbd3d924fc86898.1551792035.git.bdenes@scylladb.com>	2019-03-05 15:24:53 +02:00
Botond Dénes	0ed0d3297a	tests/multishard_mutation_query_test: test_abandoned_read: reduce querier TTL The `test_abandoned_read` verifies that an abandoned read does a proper cleanup. One of the things checked is that after the querier TTL expires, the saved queriers are cleaned-up. This check however had a very tight timing. The TTL was 2s and the test waited 2s before it did the check, which is wrapped in an `eventually_true()` (max +1s). The TTL timer scans the queriers with a period of TTL/2 so a querier can live 1.5*TTL time. This means that the 2s + 1s wait time is just on the limit and with some bad luck (and a slow machine) it can fail. Reduce the TTL in this test to 1s to relax the dependence on timing. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <ed0d45b5a07960b83b391d289cade9b9f60c7785.1551787638.git.bdenes@scylladb.com>	2019-03-05 14:10:04 +02:00
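The timing argument in the commit above can be checked with a small model. This sketch assumes, hypothetically, that the TTL timer fires at fixed ticks 0, TTL/2, TTL, ... and evicts a querier at the first tick at or after its expiry; the function name is illustrative and not part of the querier cache API.

```cpp
#include <cstdint>

// Hypothetical model of the querier cache TTL timer: it fires every ttl/2 ms
// (at t = 0, ttl/2, ttl, ...) and evicts queriers whose age is >= ttl.
// Returns the time (ms) at which a querier inserted at inserted_at_ms is evicted.
int64_t eviction_time_ms(int64_t inserted_at_ms, int64_t ttl_ms) {
    const int64_t period = ttl_ms / 2;
    const int64_t expires_at = inserted_at_ms + ttl_ms;
    // First timer tick at or after expiry (integer ceiling division).
    return (expires_at + period - 1) / period * period;
}
```

Under this model, a querier inserted just after a tick survives almost 1.5 * TTL: with TTL = 2s it is evicted at about 3s, exactly the 2s + 1s the test allowed, while with TTL = 1s it is gone by about 1.5s.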
Eliran Sinvani	eeb0845be0	unit test: validate order instead of just content in the mixed order token test This change amends the functionality of the result generation: it now returns the expected results vector sorted in the expected order of appearance in the result set. The result set is then validated for both content and order. Tests: unit tests (Release) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
Eliran Sinvani	13284d9272	unit test: change IN clause tests to validate with ordering_spec Whenever a query with an IN clause on clustering keys is executed, assuming only one partition, the rows are ordered according to the clustering keys. This commit adds the order validation to the content validation whenever possible (which means removing the ignore order part). Tests: unit tests (Release) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
Eliran Sinvani	7df0c873aa	transport: sort bound ranges in read request in order to conform to cql definitions According to the cql definitions, if no ORDER BY clause is present, records should be returned ordered by the clustering keys. Since the backend returns the ranges according to their order of appearance in the request, the bounds should be sorted before sending them to the backend. This kind of sorting is needed in queries that generate more than one bound to be read; examples of such queries are: 1. a SELECT query with an IN clause. 2. a SELECT query on a mixed-order tuple of columns (see #2050). The assumption this commit makes is the correctness of the bounds list, that is, the bounds are non-overlapping. If this weren't true, multiple occurrences of the same record could have been returned for certain queries. Tests: 1. Unit tests release 2. All dtests that require #2050 and #2029 Fixes #2029 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
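A simplified sketch of the sorting step described above. The `clustering_range` struct over an `int` key is a hypothetical stand-in for Scylla's clustering key ranges; the point is that, because the bounds are assumed non-overlapping, ordering them by their start bound is enough to make the backend return rows in clustering order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical simplified clustering range over an int key: [start, end], inclusive.
struct clustering_range {
    int start;
    int end;
};

// Sort the bounds of a read request into clustering order before sending them
// to the backend, which returns ranges in the order they appear in the request.
// Relies on the ranges being non-overlapping, so ordering by start also orders ends.
void sort_bounds(std::vector<clustering_range>& ranges, bool reversed = false) {
    std::sort(ranges.begin(), ranges.end(),
              [reversed](const clustering_range& a, const clustering_range& b) {
                  return reversed ? a.start > b.start : a.start < b.start;
              });
}
```

For example, `WHERE ck IN (3, 1, 2)` yields the singular ranges [3,3], [1,1], [2,2]; after sorting they are read as [1,1], [2,2], [3,3].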
Avi Kivity	5993c05a1b	Merge "partitioner: Futurize split_range_to_single_shard" from Asias " Futurize split_range_to_single_shard to fix reactor stall. Fixes: #3846 " * tag 'asias/split_range_to_single_shard/v4' of github.com:scylladb/seastar-dev: partitioner: Futurize split_range_to_single_shard tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc	2019-03-05 11:25:36 +02:00
Asias He	58fae5f4c1	partitioner: Futurize split_range_to_single_shard We saw reactor stalls when closing SSTables. The backtrace looks like: Oct 12 19:00:51 dell-1 scylla[435045]: Backtrace:[Backtrace #0] void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/sylla/scylla/seastar/util/backtrace.hh:56 seastar::backtrace_buffer::append_backtrace() at /home/sylla/scylla/seastar/core/reactor.cc:410 (inlined by) print_with_backtrace at /home/sylla/scylla/seastar/core/reactor.cc:431 seastar::reactor::block_notifier(int) at /home/sylla/scylla/seastar/core/reactor.cc:749 _L_unlock_13 at funlockfile.c:? std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>::_Optional_base(std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>&&) at /opt/scylladb/include/c++/7/experimental/optional:247 (inlined by) std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >::optional(std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >&&) at /opt/scylladb/include/c++/7/experimental/optional:493 (inlined by) wrapping_range<dht::ring_position>::wrapping_range(wrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:61 (inlined by) nonwrapping_range<dht::ring_position>::nonwrapping_range(nonwrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:430 (inlined by) void __gnu_cxx::new_allocator<nonwrapping_range<dht::ring_position> >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/ext/new_allocator.h:136 (inlined by) void std::allocator_traits<std::allocator<nonwrapping_range<dht::ring_position> > >::construct<nonwrapping_range<dht::ring_position>, 
nonwrapping_range<dht::ring_position> >(std::allocator<nonwrapping_range<dht::ring_position> >&, nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/alloc_traits.h:475 (inlined by) nonwrapping_range<dht::ring_position>& std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::emplace_back<nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/deque.tcc:167 (inlined by) std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::push_back(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/stl_deque.h:1558 (inlined by) dht::split_range_to_single_shard(dht::i_partitioner const&, schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:454 dht::split_range_to_single_shard(schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:464 create_sharding_metadata at /home/sylla/scylla/sstables/sstables.cc:2075 (inlined by) sstables::sstable::write_scylla_metadata(seastar::io_priority_class const&, unsigned int, sstables::sstable_enabled_features) at /home/sylla/scylla/sstables/sstables.cc:2435 sstables::sstable_writer_m::consume_end_of_stream() at /home/sylla/scylla/sstables/sstables.cc:3483 sstables::compaction::finish_new_sstable(std::experimental::fundamentals_v1::optional<sstables::sstable_writer>&, seastar::lw_shared_ptr<sstables::sstable>&) at /home/sylla/scylla/sstables/compaction.cc:338 (inlined by) sstables::regular_compaction::stop_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:579 (inlined by) sstables::regular_compaction::finish_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:585 sstables::compacting_sstable_writer::consume_end_of_stream() at 
/home/sylla/scylla/sstables/compaction.cc:494 (inlined by) auto compact_mutation_state<(emit_only_live_rows)0, (compact_for_sstables)1>::consume_end_of_stream<sstables::compacting_sstable_writer>(sstables::compacting_sstable_writer&) at /home/sylla/scylla/./mutation_compactor.hh:292 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer>::consume_end_of_stream() at /home/sylla/scylla/./mutation_compactor.hh:397 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >::consume_end_of_stream() at /home/sylla/scylla/./mutation_reader.hh:366 (inlined by) auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:288 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:370 (inlined by) operator() at /home/sylla/scylla/sstables/compaction.cc:757 (inlined by) apply at /home/sylla/scylla/seastar/core/apply.hh:35 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/apply.hh:43 (inlined by) 
apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/future.hh:1309 (inlined by) operator() at /home/sylla/scylla/./seastar/core/thread.hh:315 (inlined by) _M_invoke at /opt/scylladb/include/c++/7/bits/std_function.h:316 std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706 (inlined by) seastar::thread_context::main() at /home/sylla/scylla/seastar/core/thread.cc:313 The call chain is: sstable_writer_k_l::consume_end_of_stream and mc::writer::consume_end_of_stream -> sstable::write_scylla_metadata -> create_sharding_metadata -> dht::split_range_to_single_shard. Since the sstable writer assumes a thread context, we can futurize dht::split_range_to_single_shard. Fixes: #3846 Tests: dtest + build/dev/tests/partitioner_test	2019-03-05 17:21:27 +08:00
Benny Halevy	1021eb29c9	distributed_loader: fix old format counters exception table::load_sstable: fix missing arg in old format counters exception Properly catch and log the exception in load_new_sstables. Abort when the exception is caught to keep current behavior. Seen with migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test without enable_dangerous_direct_import_of_cassandra_counters. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190301091235.2914-1-bhalevy@scylladb.com>	2019-03-04 17:36:09 +01:00
Avi Kivity	026821fb59	Merge "Record large rows in the system.large_rows table" from Rafael " This fixes #3988. We already have a system.large_partitions, but only a warning for large rows. These patches close the gap by also recording large rows into a new system.large_rows. " * 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla: Add a testcase for large rows Populate system.large_rows. Create a system.large_rows table Extract a key_to_str helper Don't call record_large_rows if stopped Add a delete_large_rows_entries method to large_data_handler db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void Rename maybe_delete_large_partitions_entry Rename log_large_row to record_large_rows Rename maybe_log_large_row to maybe_record_large_rows	2019-03-04 18:31:10 +02:00
Avi Kivity	da0a25859b	Merge "Improvements to commitlog logs" from Paweł " This series contains minor improvements to commitlog log messages that have helped investigating #4231, but are not specific to that bug. " * tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla: commitlog: use consistent chunk offsets in logs commitlog: provide more information in logs commitlog: remove unnecessary comment	2019-03-04 14:52:46 +02:00
Paweł Dziepak	00b33de25c	commitlog: use consistent chunk offsets in logs Logs in the commitlog writer identify chunks by the file offset of the chunk header. However, the replayer uses the offset after the header for the same purpose. This causes unnecessary confusion, suggesting that the replayer is reading at the wrong position. This patch changes the replayer so that it reports chunk header offsets.	2019-03-04 12:15:50 +00:00
Paweł Dziepak	813b00a1a6	commitlog: provide more information in logs This commit adds some more information to the logs, motivated by experience investigating #4231. * size of each write * position of each write * log message for final write	2019-03-04 12:15:50 +00:00
Paweł Dziepak	1a657e9c5f	commitlog: remove unnecessary comment	2019-03-04 12:15:50 +00:00
Avi Kivity	d95dec22d9	Merge "Fix commitlog chunks overwriting each other" from Paweł " This series fixes a problem in the commitlog cycle() function, which confused the in-memory and on-disk sizes of the chunks it wrote to disk. The former was used to decide how much data actually needs to be written, and the latter was used to compute the offset of the next chunk. If two chunk writes happened concurrently, the one positioned earlier in the file could corrupt the header of the next one. Fixes #4231. Tests: unit(dev), dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup,test_commitlog_replay_with_alter_table) " * tag 'fix-commitlog-cycle/v1' of https://github.com/pdziepak/scylla: commitlog: write the correct buffer size utils/fragmented_temporary_buffer_view: add remove suffix	2019-03-04 14:14:32 +02:00
Tomasz Grabiec	58e7ad20eb	sstable/compaction: Use correct schema in the writing consumer Introduced in `2a437ab427`. regular_compaction::select_sstable_writer() creates the sstable writer when the first partition is consumed from the combined mutation fragment stream. It gets the schema directly from the table object. That may be a different schema than the one used by the readers if there was a concurrent schema alter during that small time window. As a result, the writing consumer attached to the readers will interpret fragments using the wrong version of the schema. One effect of this is that values of some columns are stored under a different column. This patch replaces all column_family::schema() accesses with accesses to the _schema member, which is obtained once per compaction and is the same schema the readers use. Fixes #4304. Tests: - manual tests with hard-coded schema change injection to reproduce the bug - build/dev/scylla boot - tests/sstable_mutation_test Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>	2019-03-04 13:27:19 +02:00
Paweł Dziepak	434023425d	commitlog: write the correct buffer size Commitlog files contain multiple chunks. Each chunk starts as a single (possibly fragmented) buffer. The size of that buffer in memory may be larger than its size in the file. cycle() was incorrectly using the in-memory size to write the whole buffer to the file. That sometimes caused data corruption, since the smaller on-file size was used to compute the offset of the next chunk and there could be multiple chunk writes happening at the same time. This patch solves the issue by ensuring that only the actual on-file size of the chunk is written.	2019-03-04 10:25:48 +00:00
Paweł Dziepak	ca8d1025c0	utils/fragmented_temporary_buffer_view: add remove suffix This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is also necessary to adjust remove_prefix() since now the total size of all fragments may be larger than the size of the view if both those operations are performed.	2019-03-04 10:23:45 +00:00
Asias He	3861f538dc	tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc We are going to convert split_range_to_single_shard to return a future.	2019-03-04 09:41:09 +08:00
Avi Kivity	8f71e7ffd4	Merge "auth: Prevent disallowed roles from logging in" from Jesse " This series heavily refactors `auth_test` in anticipation of the last patch, which fixes a bug and which should be backported. Branches: branch-3.0, branch-2.3 " Fixes #4284 * 'jhk/check_can_login/v2' of https://github.com/hakuch/scylla: auth: Reject logins from disallowed roles tests: Restrict the scope of a variable tests: Simplify boolean assertions in `auth_test` tests: Abstract out repeated assertion checking tests: Do not use the `auth` namespace tests: Validate authentication correctly tests: Ensure test roles are created and dropped tests: Use `static` variables in `auth_test` tests: Remove non-useful test	2019-03-02 17:13:06 +02:00
Asias He	a949ccee82	repair: Reject combination of -dc and -hosts options Consider 4 nodes in the cluster: n1, n2 in dc1; n3, n4 in dc2; dc1 RF=2, dc2 RF=2. If we run nodetool repair -hosts 127.0.0.1,127.0.0.3 -dc "dc1,dc2" multi on n1, the -hosts option will be ignored and only the -dc option will be used to choose which hosts to repair. In this case, n1 to n4 will be repaired. If the user wants to select specific hosts to repair with, there is no need to specify the -dc option; using the -hosts option is enough. Reject the combination so as not to surprise the user. The same logic was also introduced in https://issues.apache.org/jira/browse/CASSANDRA-9876. Refs #3836 Message-Id: <e95ac1099f98dd53bb9d6534316005ea3577e639.1551406529.git.asias@scylladb.com>	2019-03-02 16:42:29 +02:00
Juliana Oliveira	6322293263	dist/docker: add ssh server Scylla Manager communicates through SSH, so this patch adds an SSH server to Scylla's docker image so that the image can be configured by Scylla Manager. Message-Id: <20190301161428.GA12148@shenzou.localdomain>	2019-03-01 19:11:35 +02:00
Avi Kivity	41078de096	tools: toolchain: update image for gcc-8.3.1-2.fc29.x86_64 tests: unit (debug, dev, release)	2019-03-01 16:42:18 +02:00
Duarte Nunes	44966d0a66	Merge 'Fix view update generation optimizations' from Piotr " This series aims to fix inconsistencies in recent view update generation series (`435447998`). First of all, it checks view row marker liveness instead of that of a base row marker when deciding if optimizations can be applied or not. Secondly, tests based on creating mutations directly are removed. Instead: - dtest case which detected inconsistencies in previous series is ported to be a unit test - the above case is also expanded to cover views with regular base column in their key - additional test for TTL and timestamps is added and it's based on CQL Tests: unit (dev) dtest: materialized_views_test.TestMaterializedViews.test_no_base_column_in_view_pk_complex_timestamp_without_flush Fixes: #4271 " * 'fix_virtual_columns_liveness_checks_in_update_optimization_5' of https://github.com/psarna/scylla: tests: add view update optimization case for TTL database: add view_stats getter tests: port complex timestamp view test from dtest db,view: fix virtual columns liveness checks tests: remove update generating test case	2019-03-01 10:58:39 -03:00
Jesse Haber-Kucharsky	a139afc30c	auth: Reject logins from disallowed roles When the `LOGIN` option for a role is set to `false`, Scylla should not permit the role to log in. Fixes #4284 Tests: unit (debug)	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	320b4a7b99	tests: Restrict the scope of a variable	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	f8764a12e6	tests: Simplify boolean assertions in `auth_test`	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	879217ccaf	tests: Abstract out repeated assertion checking	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	3c8eeb0e86	tests: Do not use the `auth` namespace	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	afed9c7bee	tests: Validate authentication correctly There are additional validation steps that the server executes in addition to simply invoking the authenticator, so we adapt the tests to also perform that validation. We also eliminate lots of code duplication.	2019-02-28 15:01:14 -05:00
Jesse Haber-Kucharsky	baefde0f6c	tests: Ensure test roles are created and dropped Since the role manager and authenticator work in tandem, the test cases should use the wrapper for `auth::service` to create and drop users instead of just doing it through the authenticator.	2019-02-28 15:00:20 -05:00
Jesse Haber-Kucharsky	fd88d59ad9	tests: Use `static` variables in `auth_test` This way, we avoid copies and alleviate resource-management concerns.	2019-02-28 14:59:38 -05:00
Jesse Haber-Kucharsky	f274982522	tests: Remove non-useful test Password handling is verified in its own test suite, and this test not only makes a number of assumptions about implementation details, but also tries to verify a hashing scheme (bcrypt) which is not supported on most Linux distributions.	2019-02-28 14:58:27 -05:00
Avi Kivity	7c968f4a9e	build: move XXH_PRIVATE_API and SEASTAR_TESTING_MAIN non-mode-specific These defines are global, so they can be in the mode-agnostic cxxflags rather than the mode-specific cxxflags_{mode}. Message-Id: <20190228081247.20116-1-avi@scylladb.com>	2019-02-28 09:51:02 +00:00
Piotr Sarna	032f8e2893	tests: add view update optimization case for TTL This test case checks whether redundant updates are omitted and the essential ones are still generated.	2019-02-28 10:47:20 +01:00
Piotr Sarna	67e63d4dd7	database: add view_stats getter It will be used for testing purposes	2019-02-28 10:47:20 +01:00
Piotr Sarna	09b8d2e9d6	tests: port complex timestamp view test from dtest This test was useful in discovering corner cases for TTLs of virtual columns, so it's ported to unit test suite from dtest. The test is also extended with a mirrored case for base regular column that is included in view pk.	2019-02-28 10:47:20 +01:00
Piotr Sarna	5f85a7a821	db,view: fix virtual columns liveness checks When looking for optimization paths, columns selected in a view are checked against multiple conditions; unfortunately, virtual columns were erroneously skipped in that check, which resulted in their TTLs being ignored. That can lead to overoptimizing and not including vital liveness info in view rows, which can then result in rows disappearing too early.	2019-02-28 10:47:19 +01:00
Piotr Sarna	b963543762	tests: remove update generating test case This test case should have been based on CQL instead of creating artificial update scenarios. It also contains invalid cases regarding base and view row marker, so it's removed here and replaced with CQL-based test in this same series.	2019-02-28 10:40:47 +01:00
Avi Kivity	20eadb2c39	relocatable-package: package and redirect gnutls configuration gnutls requires a configuration file, and the configuration file must match the one used by the library. Since we ship our own version of the library with the relocatable package, we must also ship the configuration file. Luckily, it is possible to override the location of the configuration file via an environment variable, so all we need to do is to copy the file to the archive and provide the environment variable in the thunk that adjusts the library path. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190227110529.14146-1-avi@scylladb.com>	2019-02-28 10:57:32 +02:00
Avi Kivity	4022a919f6	test: allocate at least one logical core per unit test Currently, we only allocate memory for concurrent unit test runs. This can cause CPU overcommit when running test.py on machines with a lot of memory but few cores. This overcommit can cause timeouts in tests that are time-sensitive (bad practice, but it can happen) and makes the desktop sluggish. Improve this by allocating at least one logical core per running test. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190227132516.22147-1-avi@scylladb.com>	2019-02-28 10:34:33 +02:00
Dan Yasny	6dbb48a12a	node_health_check: collect scylla.d contents with node_health_check We are missing data for CPU conf files and potentially other information when collecting node data. Fixes #4094 Message-Id: <20190225204727.20805-5-dyasny@scylladb.com>	2019-02-28 10:23:19 +02:00
Dan Yasny	9055e7a49e	node_health_check: Add redhat-release to health check if present Collect /etc/redhat-release as well as os-release from relevant hosts. The problem with os-release is that it doesn't contain the minor version of the EL OS family. Since this is only present in Red Hat distributions and derivatives, it will not be collected in Debian derivatives. Another approach is to use lsb_release -a but it will not provide anything more useful than os-release on Debian and lsb needs to be installed on EL derivatives first. Fixes #4093 Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>	2019-02-28 10:23:12 +02:00
Dan Yasny	2f26390f52	node_health_check: Use clear hostname instead of -i for filenames and report names hostname -i produces garbled output on new systems with IPv6 enabled, so it is better to use the clean hostname for the file names instead. Message-Id: <20190225204727.20805-3-dyasny@scylladb.com>	2019-02-28 10:23:06 +02:00
Dan Yasny	f483c594ee	node_health_check: Detect the address for the CQL (port 9042) listener and use it The script relies on hostname -i for host address, which can be wrong in some systems. This patch checks for where the defined CQL_PORT is listening, and uses the correct IP address instead. Message-Id: <20190225204727.20805-2-dyasny@scylladb.com>	2019-02-28 10:22:58 +02:00
Avi Kivity	632c7c303a	Merge "auth: Restructure SASL code" from Jesse " This series restructures the SASL code that was previously internal to the `password_authenticator` so that it can be used in other contexts. " * 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla: auth: Rename SASL challenge class for "PLAIN" auth: Make a ctor `explicit` auth: Move `sasl_challenge` to its own file auth: Decouple SASL code from its parent class	2019-02-28 10:19:41 +02:00
Jesse Haber-Kucharsky	f2d92f81e8	auth: Report a more specific error with bad creds Without this change, the resulting error message for an invalid password is "authentication failed". With this change, we report "Username and/or password are incorrect". Fixes #4285 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <32d00be8af5075ee10d2c14f85b76843a9adac10.1551306914.git.jhaberku@scylladb.com>	2019-02-28 09:53:57 +02:00
Jesse Haber-Kucharsky	3d883e8cf2	auth: Rename SASL challenge class for "PLAIN"	2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky	0c955b7992	auth: Make a ctor `explicit`	2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky	dc41f1098b	auth: Move `sasl_challenge` to its own file This will allow authenticators other than `password_authenticator` to make use of the PLAIN SASL authentication code.	2019-02-27 18:36:52 -05:00
Jesse Haber-Kucharsky	2d59fa6be9	auth: Decouple SASL code from its parent class This way, we can (in the future) use this implementation of the SASL "PLAIN" mechanism in contexts other than `password_authenticator`.	2019-02-27 18:11:31 -05:00
Avi Kivity	88322086cb	Merge "Add fuzzer-type unit test for range scans" from Botond " This series adds a fuzzer-type unit test for range scans, which generates a semi-random dataset and executes semi-random range scans against it, validating the result. This test aims to cover a wide range of corner cases with the help of randomness. Data and queries against it are generated in such a way that various corner cases and their combinations are likely to be covered. The infrastructure underlying range scans has undergone massive changes in the last year, growing in complexity and scope. The correctness of range scans is critical for the correct functioning of any Scylla cluster, and while the current unit tests served well in detecting any major problems (mostly while developing), they are too simplistic and can only be relied on to check the correctness of the basic functionality. This test aims to extend coverage drastically, testing cases that the author of the range-scan code or of the existing unit tests didn't even think existed, by relying on some randomness. Fixes: #3954 (deprecates really) " * 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla: tests/multishard_mutation_query_test: add fuzzy test tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan() tests/test_table: add advanced `create_test_table()` overload tests/test_table: make `create_test_table()` customizable query: add trim_clustering_row_ranges_to() tests/test_table: add keyspace and table name params tests/test_table: s/create_test_cf/create_test_table/ tests: move create_test_cf() to tests/test_table.{hh,cc} tests/multishard_mutation_query_test: drop many partition test tests/multishard_mutation_query_test: drop range tombstone test	2019-02-27 17:26:53 +02:00
Avi Kivity	cc2f9841c4	Merge "Simplify -g and -gz checks in configure.py" from Rafael * 'simplify-g-gz-check-v2' of https://github.com/espindola/scylla: Assume -gz is always available Assume -g is always available	2019-02-27 17:19:37 +02:00
Duarte Nunes	871790a340	Merge 'Hide virtual columns write time and ttl from the user' from Piotr " This miniseries hides virtual columns' writetime and ttl from the user. Tests: unit (dev) Fixes #4288 " * 'hide_virtual_columns_writetime_and_ttl_2' of https://github.com/psarna/scylla: tests: add test for hiding virtual columns from WRITETIME cql3: hide virtual columns from WRITETIME() and TTL() schema: add column_definition::is_hidden_from_cql	2019-02-27 14:36:08 +00:00
Calle Wilund	93602ecee3	compaction_manager: break out rewrite_sstables from cleanup This allows additional control over behaviour, such as which tables to rewrite and whether to actually lock ourselves out as a "cleanup".	2019-02-27 14:25:31 +00:00
Calle Wilund	7fb6bbe68c	table: parameterize cleanup_sstables To allow using the logic for one-sstable-at-a-time compaction (i.e. rewrite) of sstables without the "normal" cleanup logic and partition selection.	2019-02-27 14:25:31 +00:00
Piotr Sarna	09eb0429ce	tests: add test for hiding virtual columns from WRITETIME Visibility checks for virtual columns' WRITETIME and TTL are added.	2019-02-27 15:08:16 +01:00
Piotr Sarna	af39787bf0	cql3: hide virtual columns from WRITETIME() and TTL() Virtual columns should not be visible to the user, so they are now hidden not only from directly selecting them, but also via WRITETIME() and TTL() keywords. Fixes #4288	2019-02-27 15:08:15 +01:00
Piotr Sarna	b0ab4c28cf	schema: add column_definition::is_hidden_from_cql Right now the only columns hidden from CQL are view virtual columns, but in case of expanding this set, a helper function is provided.	2019-02-27 15:07:54 +01:00
Avi Kivity	d189e12438	tests: database_test: fix misaligned dma write test_distributed_loader_with_pending_delete issues a dma write, but violates the unwritten contract of temporary_buffer::aligned(), which requires that the size be a multiple of the alignment. As a result, the test fails spuriously. Instead of playing with the alignment, rewrite that snippet to use the easier-to-use make_file_output_stream(). Introduced in `1ba88b709f`. Branches: master. Message-Id: <20190226181850.3074-1-avi@scylladb.com>	2019-02-27 09:00:31 +01:00
Rafael Ávila de Espíndola	d9e0b47d53	Add a testcase for large rows Tests: unit (release) Fixes #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:56:50 -08:00
Rafael Ávila de Espíndola	25f81cf3e3	Populate system.large_rows. It now records large rows when they are first written to an sstable and removes them when the sstable is deleted. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola	66d8a0cf93	Create a system.large_rows table This is analogous to the system.large_partitions table, but holds individual rows, so it also needs the clustering key of the large rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	da4c0da78a	Extract a key_to_str helper It will be used in more places in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	b7fd03d0fd	Don't call record_large_rows if stopped The implementations of large_data_handler's methods should only be called if large_data_handler hasn't been stopped yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	0c401f56f8	Add a delete_large_rows_entries method to large_data_handler This will be responsible for removing large rows from system.large_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	81a21ea425	db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void These functions will record into tables in a followup patch, so they will need to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	d4c001cba8	Rename maybe_delete_large_partitions_entry It will also delete large rows, so rename it to maybe_delete_large_data_entries. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	e9a13aff90	Rename log_large_row to record_large_rows It will also record into a table in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	6fb7066755	Rename maybe_log_large_row to maybe_record_large_rows It will also record into a table in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	a586ac209a	Assume -gz is always available It is available since clang 5 and gcc 5. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola	054078b6af	Assume -g is always available From the log it looks like these checks were added in 2014 because of a broken clang. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola	87106ea5e2	Improve the build mode documentation With this patch, HACKING suggests using just ./configure.py and passing the mode to ninja. It also expands on the characteristics of each mode and mentions the dev mode. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190208020444.19145-1-espindola@scylladb.com>	2019-02-26 19:54:50 +02:00
Nadav Har'El	da54d0fc7d	Materialized views: fix accidental zeroing of flow-control delay The materialized-views flow control carefully calculates an amount of microseconds to delay a client to slow it down to the desired rate - but then a typo (std::min instead of std::max) causes this delay to be zeroed, which in effect completely nullifies the flow control algorithm. Before this fix, experiments suggested that view flow control was not having any effect and view backlog not bounded at all. After this fix, we can see the flow control having its desired effect, and the view backlog converging. Fixes #4143. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190226161452.498-1-nyh@scylladb.com>	2019-02-26 18:22:18 +02:00
Tomasz Grabiec	1a63a313c8	Merge "repair: Rename names to be consistent with rpc verb " from Asias Some of the function names were not updated after we changed the rpc verb names. Rename them to make them consistent with the rpc verb names. * seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1: repair: Rename request_sync_boundary to get_sync_boundary repair: Rename request_full_row_hashes to get_full_row_hashes repair: Rename request_combined_row_hash to get_combined_row_hash repair: Rename request_row_diff to get_row_diff repair: Rename send_row_diff to put_row_diff repair: Update function name in docs/row_level_repair.md	2019-02-26 13:01:36 +01:00
Tomasz Grabiec	b06aac4fdb	Merge "Fix temporary spurious schema version mismatch when nodes are restarted" from Asias Fixes: #4148 Fixes: #4258 Tests: resharding_test.py:reshardingtest_nodes4_with_sizetieredcompactionstrategy.resharding_by_smp_increase_test * seastar-dev.git asias/fix_schema_mismatch_when_nodes_restarts/v1: database: Add update_schema_version and announce_schema_version storage_service: Add application_state::SCHEMA when gossip starts	2019-02-26 12:55:52 +01:00
Avi Kivity	5f94bc902a	transport: add option to disable shard-aware drivers The shard-aware drivers can cause a huge amount of connections to be created when there are tens of thousands of clients. While normally the shard-aware drivers are beneficial, in those cases they can consume too much memory. Provide an option to disable shard awareness from the server (it is likely to be easier to do this on the server than to reprovision those thousands of clients). Tests: manual test with wireshark. Message-Id: <20190223173331.24424-1-avi@scylladb.com>	2019-02-26 12:44:11 +01:00
Asias He	459836079c	storage_service: Add application_state::SCHEMA when gossip starts In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw the following: the 4 nodes n1, n2, n3, n4 are started; n1 is stopped; n1 is changed to use a different shard config; n1 is restarted (2019-01-27 04:56:00,377). The backtrace happened on n2 right after n1 restarted: 0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled 1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled 2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled 3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed) 4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status = 5 Segmentation fault on shard 0. 6 Backtrace: 7 0x00000000041c0782 8 0x00000000040d9a8c 9 0x00000000040d9d35 10 0x00000000040d9d83 11 /lib64/libpthread.so.0+0x00000000000121af 12 0x0000000001a8ac0e 13 0x00000000040ba39e 14 0x00000000040ba561 15 0x000000000418c247 16 0x0000000004265437 17 0x000000000054766e 18 /lib64/libc.so.6+0x0000000000020f29 19 0x00000000005b17d9 The theory is: migration_manager::maybe_schedule_schema_pull is scheduled while n1 still has the SCHEMA application_state. When n1 restarts, n2 gets a new application state from n1 that does not have SCHEMA yet. When migration_manager::maybe_schedule wakes up from the 60-second sleep, n1 has a non-empty endpoint_state but an empty application_state for SCHEMA. We dereference the nullptr application_state and abort. In commit `da80f27f44`, we fixed the problem by checking the pointer before dereferencing it. To prevent this from happening in the first place, it is better to add application_state::SCHEMA when gossip starts. This way, peer nodes always see the application_state::SCHEMA when a node restarts.
Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test Fixes #4148 Fixes #4258	2019-02-26 19:30:22 +08:00
Asias He	75edbe939d	database: Add update_schema_version and announce_schema_version Split update_schema_version_and_announce() into update_schema_version() and announce_schema_version(). This is going to be used in storage_service::prepare_to_join(), where we want to first update the schema version, then start gossip, and then announce the schema version.	2019-02-26 19:10:02 +08:00
Amnon Heiman	b8a838c66c	node_exporter_install: Add a force install option It is sometimes useful to force reinstallation of node_exporter, for example during an upgrade or if something is wrong with the current installation. This patch adds a --force command line option. If --force is given to node_exporter_install, it will reinstall node_exporter to the latest version, regardless of whether it was already installed. The symbolic link in /usr/bin/node_exporter will be set to the installed version, so any other installed versions will remain. Examples: $ sudo ./dist/common/scripts/node_exporter_install node_exporter already installed, you can use `--force` to force reinstallation $ sudo ./dist/common/scripts/node_exporter_install --force node_exporter already installed, reinstalling Fixes #4201 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190225151120.21919-1-amnon@scylladb.com>	2019-02-25 20:16:58 +02:00
Pekka Enberg	ca288189a9	dist/ami: Support different products for the AMI Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so that we can override package names for enterprise AMIs. Message-Id: <20190225063319.19516-1-penberg@scylladb.com>	2019-02-25 11:17:44 +02:00
Asias He	3e615c3a15	repair: Update function name in docs/row_level_repair.md The repair rpc request_* functions are renamed to get_*. The send_row_diff is renamed to put_row_diff.	2019-02-25 15:13:39 +08:00
Asias He	62104902db	repair: Rename send_row_diff to put_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6e4ea1b3c4	repair: Rename request_row_diff to get_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	5b29fb30ac	repair: Rename request_combined_row_hash to get_combined_row_hash Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6f6c4878d5	repair: Rename request_full_row_hashes to get_full_row_hashes Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	02ddfa393e	repair: Rename request_sync_boundary to get_sync_boundary Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Avi Kivity	a0b0db7915	Merge "Fix regression in perf_fast_forward results" from Paweł " After `adcb3ec20c` ("row_cache: read is not single-partition if inter-partition forwarding is enabled") we have noticed a regression in the results of some perf_fast_forward tests. This was caused by those tests not disabling partition-level fast-forwarding even though it was not needed and the commit in question fixed an incorrect optimisation in such cases. However, after solving that issue it has also become apparent that mutation_reader_merger performs worse when the fast-forwarding is disabled. This was attributed to logic responsible for dropping readers as soon as they have reached the end of stream (which cannot be done if fast-forwarding is enabled). This problem was mitigated with avoiding a scan of the list and removing readers in small batches. Fixes #4246. Fixes #4254. Tests: unit(dev) " * tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla: mutation_reader_merger: drop unneded readers in small batches mutation_reader_merger: track readers by iterators and not pointers tests/perf_fast_forward: disable partition-level fast-forwarding if not needed	2019-02-24 19:24:00 +02:00
Avi Kivity	e3c53ff3ff	Update seastar submodule * seastar 2313dec...ab54765 (10): > Fix C++-17-only uses of static_assert() with a single parameter. > README.md: fix out-of-date explanation of C++ dialect > net: fix tcp load balancer accounting leak while moving socket to other shard > Revert "deleter: prevent early memory free caused by deleter append." > deleter: prevent early memory free caused by deleter append. > Solve seastar.unit.thread failure in debug mode > Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb > build: Fix the required version of `fmt` > app_template: fix use after move in app constructor > build: Rename CMake variable for private flags Fixes #4269.	2019-02-24 16:06:23 +02:00
Avi Kivity	a3a7bea12f	Merge "Clean up preprocessor definitions" from Jesse * 'jhk/define_debug/v1' of https://github.com/hakuch/scylla: build: Remove the `DEBUG_SHARED_PTR` pp variable build: Prefer the Seastar version of a pp variable	2019-02-23 14:04:08 +02:00
Jesse Haber-Kucharsky	f9297895c1	auth: Change the log level for async. retries The log message is benign, but it has caused some users of Scylla to think that an error has occurred. Fixes #3850 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <ba49c38266c0e77c3ed23cfca3c1a082b3060f17.1550777586.git.jhaberku@scylladb.com>	2019-02-23 14:03:16 +02:00
Tomasz Grabiec	3f698701c2	gdb: Drop incorrect throw of StopIteration It is converted into a RuntimeError by python3: https://docs.python.org/3/library/exceptions.html#StopIteration We should just return. Message-Id: <20190221144321.18093-1-tgrabiec@scylladb.com>	2019-02-23 14:02:47 +02:00
Nadav Har'El	0eddf19432	main: add INFO log messages at start, initialization end, and end. Scylla currently prints a welcome message when it starts, with the Scylla version, but this is not printed to the regular log so in some cases (e.g., Jenkins runs) we do not see it in the log. So let's add a regular INFO-level log message with the same information. Also, Scylla currently doesn't print any specific log message when it normally completes its shutdown. In some cases, users may end up wondering whether Scylla hung in the middle of the shutdown, or in fact exited normally. Refs #4238. So in this patch we add a "shutdown complete" message as the very last message in a successful shutdown. We print Scylla's version also in the shutdown message, which may be useful to see in the logs when shutting down one version of Scylla and starting a different version. Finally, we also add a log message when initialization is complete, which may also be useful to understand whether Scylla hung during initialization. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217140659.19512-1-nyh@scylladb.com>	2019-02-22 16:52:31 +01:00
Tomasz Grabiec	b90cb91468	gdb: Introduce 'scylla cache' Prints contents of the row cache for each table on current shard. Message-Id: <20190222144420.19677-1-tgrabiec@scylladb.com>	2019-02-22 14:58:58 +00:00
Paweł Dziepak	b524f96a74	mutation_reader_merger: drop unneded readers in small batches It was observed that destroying readers as soon as they are not needed negatively affects performance of relatively small reads. We don't want to keep them alive for too long either, since they may own a lot of memory, but deferring the destruction slightly and removing them in batches of 4 seems to solve the problem for the small reads.	2019-02-22 14:43:38 +00:00
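The deferred, batched destruction can be sketched as follows. This is a minimal C++ illustration, not the real `mutation_reader_merger`: `reader`, `merger`, `retire()` and `flush()` are hypothetical names, and only the batch size of 4 comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for a mutation_reader.
struct reader {
    int id;
};

class merger {
public:
    using handle = std::list<reader>::iterator;
    static constexpr std::size_t batch_size = 4; // batch size taken from the commit message

    handle add(int id) { return _readers.insert(_readers.end(), reader{id}); }

    // Called when a reader reaches end-of-stream. Instead of destroying it
    // immediately, queue it and destroy the queued readers once the batch is full.
    void retire(handle h) {
        _to_drop.push_back(h);
        if (_to_drop.size() >= batch_size) {
            flush();
        }
    }

    // Destroy all queued readers; each erase is O(1) via the stored iterator.
    void flush() {
        for (auto h : _to_drop) {
            _readers.erase(h);
        }
        _to_drop.clear();
    }

    std::size_t live() const { return _readers.size(); }
    std::size_t pending() const { return _to_drop.size(); }

private:
    std::list<reader> _readers;   // owns the readers
    std::vector<handle> _to_drop; // readers waiting to be destroyed
};
```

The batch keeps finished readers alive only briefly, so memory they own is still released soon, while small reads avoid paying for a destruction after every single end-of-stream.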
Paweł Dziepak	435e24f509	mutation_reader_merger: track readers by iterators and not pointers mutation_reader_merger uses a std::list of mutation_reader to keep them alive while the rest of the logic operates on non-owning pointers. This means that when it is time to drop some of the readers that are no longer needed, the merger needs to scan the list looking for them. That's not ideal. The solution is to make the logic use iterators to elements in that list, which allows for O(1) removal of an unneeded reader. Iterators into a std::list are just pointers to the node and are not invalidated by unrelated additions and removals.	2019-02-22 14:33:10 +00:00
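A minimal C++ sketch of tracking list elements by iterator, assuming a simplified `reader` struct; `reader_set`, `add()` and `drop()` are hypothetical names rather than the actual merger interface:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mutation_reader; only a name is tracked.
struct reader {
    std::string name;
};

// Owns readers in a std::list and hands out iterators instead of raw pointers.
// std::list iterators stay valid across insertions and erasures of *other*
// elements, so a stored iterator can later erase its own node in O(1).
class reader_set {
public:
    using handle = std::list<reader>::iterator;

    handle add(std::string name) {
        return _readers.insert(_readers.end(), reader{std::move(name)});
    }

    // O(1): no scan of the list is needed to find the node.
    void drop(handle h) { _readers.erase(h); }

    std::size_t size() const { return _readers.size(); }

    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& r : _readers) {
            out.push_back(r.name);
        }
        return out;
    }

private:
    std::list<reader> _readers;
};
```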
Paweł Dziepak	5d5777f85e	tests/perf_fast_forward: disable partition-level fast-forwarding if not needed Several of the test cases in perf_fast_forward do not need partition-level fast-forwarding. However, since the defaults are used to construct most of the readers, the fast-forwarding is enabled regardless. This showed an apparent regression in the perf_fast_forward results after `adcb3ec20c` ("row_cache: read is not single-partition if inter-partition forwarding is enabled") which disabled an optimisation that was invalid when partition-level fast-forwarding was requested. This patch ensures that all single-partition reads that do not need partition-level fast-forwarding keep it disabled.	2019-02-22 14:28:02 +00:00
Avi Kivity	fdefee696e	Merge "sstables: mc: writer: Avoid large allocations for keeping promoted index entries" from Tomasz " Currently we keep the entries in a circular_buffer, which uses a contiguous storage. For large partitions with many promoted index entries this can cause OOM and sstable compaction failure. A similar problem exists for the offset vector built in write_promoted_index(). This change solves the problem by serializing promoted index entries and the offset vector on the fly directly into a bytes_ostream, which uses fragmented storage. The serialization of the first entry is deferred, so that serialization is avoided if there will be less than 2 entries. Promoted index is not added for such partitions. There still remains a problem that large-enough promoted index can cause OOM. Refs #4217 Tests: - unit (release) - scylla-bench write Branches: 3.0 " * tag 'fix-large-alloc-for-promoted-index-v3' of github.com:tgrabiec/scylla: sstables: mc: writer: Avoid large allocations for maintaining promoted index sstables: mc: writer: Avoid double-serialization of the promoted index	2019-02-22 15:44:51 +02:00
Avi Kivity	177159da75	Merge "delete_atomically recovery" from Benny " The delete_atomically function is required to delete a set of sstables atomically, i.e. either delete all or none of them. Deleting only some sstables in the set might result in data resurrection in case sstable A, holding a tombstone that covers a mutation in sstable B, is deleted while sstable B remains. This patchset introduces a log file holding a list of SSTable TOC files to delete for recovering a partial delete_atomically operation. A new subdirectory called `pending_delete` is created in the sstables dir, holding in-flight logs. The logs are created with a temporary name (using a .tmp suffix) and renamed to the final .log name once ready. This indicates the commit point for the operation. When populating the column family, all files in the pending_delete sub-directory are examined. Temporary log files are just removed, and committed log files are read, replayed, and deleted. Fixes #4082 Tests: unit (dev), database_test (debug) " * 'projects/delete_atomically_recovery/v5' of https://github.com/bhalevy/scylla: tests: database_test: add test_distributed_loader_with_pending_delete distributed_loader: replay and cleanup pending_delete log files distributed_loader: populated_column_family: separate temp sst dirs cleanup phase docs: add sstables-directory-structure.md sstables: commit sstables to delete_atomically into a pending_delete log file sstables: delete_atomically: delete sstables in a thread sstables: component_basename: reuse with sstring component sstables: introduce component_basename database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions sstables: add delete_sstable_and_maybe_large_data_entries sstables: call remove_by_toc_name in dtor if marked_for_deletion	2019-02-22 15:37:17 +02:00
Benny Halevy	1ba88b709f	tests: database_test: add test_distributed_loader_with_pending_delete Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	043673b236	distributed_loader: replay and cleanup pending_delete log files Scan the table's pending_delete sub-directory if it exists. Remove any temporary pending_delete log files to roll back the respective delete_atomically operation. Replay completed pending_delete log files to roll forward the respective delete_atomically operation, and finally delete the log files. Cleanup of temporary sstable directories and pending_delete sstables is done in a preliminary scan phase when populating the column family so that we won't attempt to load the to-be-deleted sstables. Fixes #4082 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	ee3ad75492	distributed_loader: populated_column_family: separate temp sst dirs cleanup phase In preparation for replaying pending_delete log files, we would like to first remove any temporary sst dirs and later handle pending_delete log files, and only then populate the column family. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	f35e4cbac7	docs: add sstables-directory-structure.md Refs #4184 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	024d0a6d49	sstables: commit sstables to delete_atomically into a pending_delete log file To facilitate recovery of a delete_atomically operation that crashed midway, add a replayable log file holding the committed sstables to delete. It will be used by populate_column_family to replay the atomic deletion. 1. Write the toc names of sstables to be deleted into a temporary file. 2. Once flushed and closed, rename the temp log file into the final name and flush the pending_delete directory. 3. Delete the sstables. 4. Remove the pending_delete log file and flush the pending_delete directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:37 +02:00
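The four steps above, plus the startup replay described for distributed_loader, can be sketched with `std::filesystem`. This is a simplified synchronous illustration under stated assumptions: `delete_with_pending_log()` and `replay_pending_deletes()` are hypothetical names, "deleting an sstable" is reduced to removing its TOC file, and the fsync of the log file and of the `pending_delete` directory is omitted because `std::filesystem` has no portable way to do it.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical layout: <sstdir>/pending_delete/<name>.tmp while writing,
// renamed to <name>.log once complete. Each line holds one TOC file name.
// NOTE: a real implementation must also fsync the file and the directory.
void delete_with_pending_log(const fs::path& sstdir,
                             const std::vector<std::string>& toc_names,
                             const std::string& log_name) {
    const fs::path pending = sstdir / "pending_delete";
    fs::create_directories(pending);
    const fs::path tmp = pending / (log_name + ".tmp");
    const fs::path committed = pending / (log_name + ".log");

    // 1. Write the TOC names into a temporary log file and close it.
    {
        std::ofstream out(tmp);
        for (const auto& toc : toc_names) {
            out << toc << '\n';
        }
    }
    // 2. Rename to the final name: this is the commit point.
    fs::rename(tmp, committed);
    // 3. Delete the sstables (here, just their TOC files).
    for (const auto& toc : toc_names) {
        fs::remove(sstdir / toc);
    }
    // 4. Remove the log: the operation is complete.
    fs::remove(committed);
}

// Startup recovery: roll back uncommitted (.tmp) logs and roll forward
// committed (.log) logs, then remove them.
void replay_pending_deletes(const fs::path& sstdir) {
    const fs::path pending = sstdir / "pending_delete";
    if (!fs::exists(pending)) {
        return;
    }
    // Collect entries first so we do not modify the directory while iterating it.
    std::vector<fs::path> entries;
    for (const auto& e : fs::directory_iterator(pending)) {
        entries.push_back(e.path());
    }
    for (const auto& p : entries) {
        const std::string ext = p.extension().string();
        if (ext == ".tmp") {
            fs::remove(p); // never committed: the sstables stay
        } else if (ext == ".log") {
            std::ifstream in(p);
            std::string toc;
            while (std::getline(in, toc)) {
                if (!toc.empty()) {
                    fs::remove(sstdir / toc); // no-op if already gone
                }
            }
            in.close();
            fs::remove(p);
        }
    }
}
```

Because the rename is the commit point, a crash before it leaves only a `.tmp` file, which replay discards, and a crash after it leaves a `.log` file, which replay finishes; the set of sstables is therefore deleted all-or-nothing.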
Benny Halevy	70fda0eda0	sstables: delete_atomically: delete sstables in a thread In preparation for implementing a pending_delete log file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:37 +02:00
Benny Halevy	9ac04850a0	sstables: component_basename: reuse with sstring component Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:10 +02:00
Benny Halevy	a2a9750074	sstables: introduce component_basename component_basename returns just the basename for the component filename without the leading sstdir path. To be used for delete_atomically's pending_delete log file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	13ffda5c31	database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions 1. We would like to be able to call maybe_delete_large_partitions_entry from the sstable destructor path in the future so the sstable might go away while the large data entries are being deleted. 2. We would like the caller to handle any exception on this path, especially in the preparation part, before calling delete_large_partitions_entry(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	ae29db8db6	sstables: add delete_sstable_and_maybe_large_data_entries To be called by delete_atomically, rather than passing a vector to delete_sstables. This way, there is no need to build the `sstables_to_delete_atomically` vector. To be replaced in the future with an sstable method once we provide the large_data_handler upon construction. Handle exceptions from remove_by_toc_name or maybe_delete_large_partitions_entry by merely logging an error. There is nothing else we can do at this point. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	387f14a874	sstables: call remove_by_toc_name in dtor if marked_for_deletion No need to call delete_sstables, which works on a list of sstables (by toc name). Also, add a FIXME comment about not calling large_data_handler.maybe_delete_large_partitions_entry on this path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Avi Kivity	34b254381f	sstables: checksummed_file_writer: fix dma alignment checksummed_file_writer does not override allocate_buffer(), so it inherits data_source_impl's default allocate_buffer, which does not care about alignment. The buffer is then passed to the real file_data_sink_impl, and thence to the file itself, which cannot complete the write since it is not properly aligned. This doesn't fail in release mode, since the Seastar allocator will supply a properly aligned buffer even if not asked to do so. The ASAN allocator usually does supply an aligned buffer, but not always, which causes the test to fail. Fix by forwarding the allocate_buffer() function to the underlying data_source. Fixes #4262. Branches: branch-3.0 Message-Id: <20190221184115.6695-1-avi@scylladb.com>	2019-02-21 21:26:56 +01:00
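A sketch of the fix, forwarding buffer allocation to the wrapped sink so that its alignment requirement is honored. The `sink`, `dma_sink` and `checksummed_sink` classes are hypothetical simplifications, not Seastar's actual stream API, and the 4096-byte alignment is an assumed value:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

struct buffer_deleter {
    void operator()(char* p) const { std::free(p); }
};
using buffer = std::unique_ptr<char[], buffer_deleter>;

bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical base sink: the default allocation gives no DMA alignment guarantee.
struct sink {
    virtual ~sink() = default;
    virtual buffer allocate_buffer(std::size_t size) {
        return buffer(static_cast<char*>(std::malloc(size)));
    }
};

// Hypothetical file sink that needs DMA-aligned buffers.
struct dma_sink : sink {
    static constexpr std::size_t alignment = 4096; // assumed value
    buffer allocate_buffer(std::size_t size) override {
        // std::aligned_alloc requires size to be a multiple of the alignment.
        std::size_t rounded = (size + alignment - 1) / alignment * alignment;
        return buffer(static_cast<char*>(std::aligned_alloc(alignment, rounded)));
    }
};

// Wrapper analogous to the checksumming writer. Without this override it would
// inherit sink::allocate_buffer and hand unaligned buffers to the inner sink.
struct checksummed_sink : sink {
    explicit checksummed_sink(std::unique_ptr<sink> inner) : _inner(std::move(inner)) {
        assert(_inner);
    }
    // The fix: forward allocation to the underlying sink.
    buffer allocate_buffer(std::size_t size) override { return _inner->allocate_buffer(size); }

private:
    std::unique_ptr<sink> _inner;
};
```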
Jesse Haber-Kucharsky	b7b50392ed	build: Remove the `DEBUG_SHARED_PTR` pp variable This definition is exported by Seastar as `SEASTAR_DEBUG_SHARED_PTR` and no code in Scylla uses this definition either way.	2019-02-21 10:45:09 -05:00
Jesse Haber-Kucharsky	f4883a1aea	build: Prefer the Seastar version of a pp variable Seastar defines `SEASTAR_DEFAULT_ALLOCATOR`, and everywhere else in Scylla we use this variable too.	2019-02-21 10:41:42 -05:00
Piotr Sarna	c743617236	cql3: unify max value for row limit and per-partition limit Limits are stored as uint32_t everywhere, but in some places int32_t was used, which created inconsistencies when comparing the value to std::numeric_limits<Type>::max(). In order to solve inconsistencies, the types are unified to uint32_t, and instead of explicitly calling numeric limit max, an already existing constant value query::max_rows is utilized. Fixes #4253 Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>	2019-02-21 13:56:02 +02:00
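A compile-time illustration of the mismatch, assuming a hypothetical `max_rows` constant standing in for `query::max_rows`: a limit defaulted from `int32_t`'s maximum never compares equal to the `uint32_t` maximum, so a check for "no limit" silently fails.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical shared "unlimited" constant, standing in for query::max_rows.
constexpr uint32_t max_rows = std::numeric_limits<uint32_t>::max();

// Inconsistent: one place defaults the limit using int32_t's maximum...
constexpr uint32_t default_limit_int32 = std::numeric_limits<int32_t>::max();

// ...while another checks for "no limit" against uint32_t's maximum.
constexpr bool is_unlimited(uint32_t limit) { return limit == max_rows; }

// Consistent: every default comes from the same constant.
constexpr uint32_t default_limit = max_rows;

static_assert(!is_unlimited(default_limit_int32), "int32 max is not uint32 max");
static_assert(is_unlimited(default_limit), "the shared constant round-trips");
```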
Tomasz Grabiec	ecff716f40	query-result-set: Give more context on failure We've seen schema application failing with marshal_exception here. That's not enough information to figure out what is the problem. Knowing which table and column is affected would make diagnosis much easier in certain cases. This patch wraps errors in query::deserialization_error with more information. Example output: query::deserialization_error (failed on column system_schema.tables#bloom_filter_fp_chance \ (version: c179c1d7-9503-3f66-a5b3-70e72af3392a, id: 0, index: 0, type: org.apache.cassandra.db.marshal.DoubleType):\ seastar::internal::backtraced<marshal_exception> (marshaling error: read_simple - not enough bytes (expected 8, got 3) Message-Id: <20190221113219.13018-1-tgrabiec@scylladb.com>	2019-02-21 11:35:27 +00:00
Nadav Har'El	f55bdea364	compaction manager: avoid spurious "asked to stop" message at the end of the log This patch removes the log message about "compaction_manager - Asked to stop" at the very end of Scylla runs. This log message is confusing because it only has the "asked to stop" part, without a final "stopped", and may lead a user to incorrectly fear that the shutdown hung - when it in fact finished just fine. The database object holds a compaction_manager and stop()s it when the database is stop()ed - and that is the very last thing our shutdown does. However, much earlier, as the first shutdown operation (i.e., the last at_exit() in main.cc), we already stop() the compaction manager. The second stop() call does nothing, but unfortunately prints the log message just before checking if it has anything to stop. So this patch just moves the log message to after the check. Fixes #4238. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217142657.19963-1-nyh@scylladb.com>	2019-02-21 12:32:47 +01:00
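A minimal sketch of the reordering, using a hypothetical `manager` class that appends its log lines to a vector instead of using the real logger:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for compaction_manager.
class manager {
public:
    explicit manager(std::vector<std::string>& lines) : _lines(lines) {}

    void stop() {
        if (_stopped) {
            return; // second call: nothing to stop, so nothing is logged
        }
        _lines.push_back("Asked to stop"); // logged only after the check
        _stopped = true;
    }

private:
    std::vector<std::string>& _lines;
    bool _stopped = false;
};
```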
Rafael Ávila de Espíndola	5a7bff36ca	Simplify sstable::filename No functionality change, but avoids a std::unordered_map. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190221014630.15476-1-espindola@scylladb.com>	2019-02-21 12:40:01 +02:00
Avi Kivity	5520fc37ba	Merge " Fix INSERT JSON with null values" from Piotr " Fixes #4256 This miniseries fixes a problem with inserting NULL values through INSERT JSON interface. Tests: unit (dev) " * 'fix_insert_json_with_null' of https://github.com/psarna/scylla: tests: add test for INSERT JSON with null values cql3: add missing value erasing to json parser	2019-02-21 12:36:09 +02:00
Piotr Sarna	4d211690f9	tests: add test for INSERT JSON with null values	2019-02-21 11:25:14 +01:00
Piotr Sarna	6618191e49	cql3: add missing value erasing to json parser When inserting a null value through INSERT JSON, the column was erroneously not removed from the 'not used' list of columns. Fixes #4256	2019-02-21 11:23:44 +01:00
Tomasz Grabiec	8687666169	schema_tables: Add trace-level logging of schema mutations Can be useful in diagnosing problems with application of schema mutations. do_merge_schema() is called on every change of schema of the local node. create_table_from_mutations() is called on schema merge when a table was altered or created using mutations read from local schema tables after applying the change, or when loading schema on boot. Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>	2019-02-21 12:16:38 +02:00
Tomasz Grabiec	f65d1e649d	schema_mutations: Make printable Message-Id: <20190221093929.8929-1-tgrabiec@scylladb.com>	2019-02-21 12:16:32 +02:00
Avi Kivity	9adfd11374	Merge "Avoid including cryptopp headers" from Rafael " cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. This patch series introduces a single .cc file that has to include cryptopp headers. " * 'avoid-cryptopp-v3' of https://github.com/espindola/scylla: Avoid including cryptopp headers Delete dead code	2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola	fd5ea2df5a	Avoid including cryptopp headers cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. The issue has been reported as https://github.com/weidai11/cryptopp/issues/793 To work around it, this patch uses a pimpl to have a single .cc file that has to include cryptopp headers. While at it, it also reduces the differences and code duplication between the md5 and sha1 hashers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Rafael Ávila de Espíndola	a309f952d2	Delete dead code This code would have had to be refactored by the next patch. Since it is commented out, just delete it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Duarte Nunes	4354479985	Merge 'Minimize generated view updates for unselected column updates' from Piotr " This series addresses the issue of redundant view updates, generated for columns that were not selected for given materialized view. Cases covered (quote:) * If a base row has a live row marker, then we can avoid generating view updates if only unselected columns change; * If a base row has no live row marker, then we can avoid generating view updates if unselected columns are updated, unless they are newly created, deleted, or they have a TTL. Additionally, this series includes caching selected columns and is_index information to avoid unnecessary CPU cycles spent on recomputing these two. Fixes #3819 " * 'send_less_view_updates_if_not_necessary_4' of https://github.com/psarna/scylla: tests: add cases for view update generation optimizations view: minimize generated view updates for unselected columns view: cache is_index for view pointer index: make non-pointer overload of is_index function index: avoid copying when checking for is_index	2019-02-20 13:24:44 +00:00
Piotr Sarna	563456e3ac	tests: add cases for view update generation optimizations Test cases that cover avoiding generating view updates when not necessary (e.g. when a column not selected by the view is modified) are added.	2019-02-20 14:05:29 +01:00
Piotr Sarna	bd52e05ae2	view: minimize generated view updates for unselected columns In some cases generating view updates for columns that were not selected in CREATE VIEW statement is redundant - it is the case when the update will not influence row liveness in any way. Currently, these cases are optimized out: - row marker is live and only unselected columns were updated; - row marker is not live and only unselected columns were updated, and in the process nothing was created or deleted and there was no TTL involved;	2019-02-20 14:05:27 +01:00
Piotr Sarna	dbe8491655	view: cache is_index for view pointer It's detrimental to keep querying index manager whether a view is backing a secondary index every time, so this value is cached at construct time. At the same time, this value is not simply passed to view_info when being created in secondary index manager, in order to decouple materialized view logic from secondary indexes as much as possible (the mere existence of is_index() is bad enough).	2019-02-20 12:52:32 +01:00
Piotr Sarna	cb20fc2e4f	index: make non-pointer overload of is_index function Previous interface enforced passing a shared pointer, which might result in calling unneeded shared_from_this().	2019-02-20 12:52:32 +01:00
Piotr Sarna	94db098d39	index: avoid copying when checking for is_index Previously is_index implementation used list_indexes() helper function, which copies data.	2019-02-20 12:52:32 +01:00
Tomasz Grabiec	a8c74bc7ab	gdb: Print LSA/Cache/Memtable memory usage from "scylla memory" Example output: LSA: allocated: 181010432 used: 177209344 free: 3801088 Cache: total: 97255424 used: 60700600 free: 36554824 Memtables: total: 83755008 Regular: real dirty: 79429632 virt dirty: 35168426 System: real dirty: 524288 virt dirty: 466764 Streaming: real dirty: 0 virt dirty: 0 Message-Id: <1550598424-23428-1-git-send-email-tgrabiec@scylladb.com>	2019-02-20 12:53:53 +02:00
Tomasz Grabiec	dafe22dd83	lsa: Fix spurios abort with --enable-abort-on-lsa-bad-alloc allocate_segment() can fail even though we're not out of memory, when it's invoked inside an allocating section with the cache region locked. That section may later succeed when retried after memory reclamation. We should ignore bad_alloc thrown inside allocating section body and fail only when the whole section fails. Fixes #2924 Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>	2019-02-20 12:53:49 +02:00
Avi Kivity	84465c23c4	Merge "Add multi-column restrictions filtering" from Piotr " Fixes #3574 This series adds missing multi-column restrictions filtering to CQL. The underlying infrastructure already allows checking multi-column restrictions in a reasonable way, so this series consists of mostly adding simple interfaces and parameters. Also, unit test cases for multi-column restrictions are provided. Tests: unit (dev) " * 'add_multi_column_restrictions_filtering_3' of https://github.com/psarna/scylla: tests: add multi-column filtering tests cql3: add multi-column restrictions filtering cql3: add specified is_satisfied_by to multi-column restriction cql3: rewrite raw loop in is_satisfied_by to boost::any_of cql3: fix is_satisfied_by for multi-column restrictions cql3: add missing include to multi-column restriction	2019-02-19 14:42:14 +02:00
Piotr Sarna	9432937816	tests: add multi-column filtering tests Refs #3574	2019-02-19 13:24:25 +01:00
Piotr Sarna	4dc0b0672c	cql3: add multi-column restrictions filtering It's now possible to pass multi-column restrictions to queries that require filtering. Fixes #3574	2019-02-19 13:24:25 +01:00
Piotr Sarna	3db526ffe2	cql3: add specified is_satisfied_by to multi-column restriction Multi-column restrictions need only schema, clustering key and query options in order to decide if they are satisfied, so an overloaded function that takes a reduced number of parameters is added.	2019-02-19 13:24:25 +01:00
Piotr Sarna	16dbc917a4	cql3: rewrite raw loop in is_satisfied_by to boost::any_of	2019-02-19 13:24:12 +01:00
Piotr Sarna	0d675e4419	cql3: fix is_satisfied_by for multi-column restrictions Multi-column restriction should be satisfied by the value if any of the ranges contains it, not all of them. Example: SELECT * FROM t WHERE (a,b) IN ((1,2),(1,3)) will operate on two singular ranges: [(1,2),(1,2)] and [(1,3),(1,3)]. It's sufficient for a value to be inside any of these two in order to satisfy the restriction.	2019-02-19 13:10:58 +01:00
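A sketch of the corrected check with `std::any_of`, assuming a hypothetical two-column clustering key modeled as `std::tuple<int, int>` (compared lexicographically) and inclusive ranges:

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

using ck = std::tuple<int, int>; // hypothetical clustering key (a, b)

// Hypothetical inclusive clustering range.
struct ck_range {
    ck start;
    ck end;
    bool contains(const ck& v) const { return start <= v && v <= end; }
};

// A value satisfies the multi-column restriction if ANY range contains it.
// Requiring ALL ranges (the old behavior) rejects every value as soon as the
// IN list produces two disjoint singular ranges.
bool is_satisfied_by(const std::vector<ck_range>& ranges, const ck& value) {
    return std::any_of(ranges.begin(), ranges.end(),
                       [&](const ck_range& r) { return r.contains(value); });
}
```

For `(a,b) IN ((1,2),(1,3))` the ranges are `[(1,2),(1,2)]` and `[(1,3),(1,3)]`; `(1,2)` and `(1,3)` each fall into one of them, while `(1,4)` falls into neither.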
Avi Kivity	934ba7ccb2	Merge "tests: introduce test environment and cleanup sstable tests" from Benny " As part of implementing an sstables manager and fixing issues related to updating large_data_handler on all delete paths, we want to funnel all sstable creations, loading, and deletions through a manager. The patchset lays out test infrastructure to funnel these operations through class sstables::test_env. In the process, it cleans up numerous call sites in the existing unit tests that evolved over time. Refs #4198 Refs #4149 Tests: unit (dev) " * 'projects/test_env/v3' of https://github.com/bhalevy/scylla: tests: introduce sstables::test_env tests: perf_sstable: rename test_env tests: sstable_datafile_test: use useable_sst tests: sstable_test: add write_and_validate_sst helper tests: sstable_test: add test_using_reusable_sst helper tests: sstable_test: use reusable_sst where possible tests: sstable_test: add test_using_working_sst helper tests: sstable_3_x_test: make_test_sstable tests: run_sstable_resharding_test: use default parameters to make_sstable tests: sstables::test::make_test_sstable: reorder params tests: test_setup: do_with_test_directory is unused tests: move sstable_resharding_strategy_tests to sstable_reharding_test tests: move create_token_from_key helpers to test_services tests: move column_family_for_tests to test_services dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh	2019-02-19 11:26:42 +02:00
Piotr Sarna	4eecb57a0b	cql3: add missing include to multi-column restriction	2019-02-19 10:24:31 +01:00
Tomasz Grabiec	9c6f897731	tools/toolchain/README: Add the "Troubleshooting" section Message-Id: <1550567863-29404-1-git-send-email-tgrabiec@scylladb.com>	2019-02-19 11:21:02 +02:00
Tzach Livyatan	622361bf1a	docs/docker-hub.md: Docker Compose cluster example This adds a simple example of launching a 3-node Scylla cluster with Docker Compose. Signed-off-by: Tzach Livyatan <tzach@scylladb.com> [ penberg: minor edits ] Message-Id: <20190213081003.6401-1-tzach@scylladb.com>	2019-02-19 09:52:20 +02:00
Avi Kivity	e37e095432	build: allow configuring and testing multiple modes Allow the --mode argument to ./configure.py and ./test.py to be repeated. This is to allow continuous integration to configure only debug and release, leaving dev to developers. Message-Id: <20190214162736.16443-1-avi@scylladb.com>	2019-02-18 15:52:25 +00:00
Tomasz Grabiec	08f4a3664e	sstables: mc: writer: Avoid large allocations for maintaining promoted index Currently, we keep the entries in a circular_buffer, which uses a contiguous storage. For large partitions with many promoted index entries this can cause OOM and sstable compaction failure. A similar problem exists for the offset vector built in write_promoted_index(). This change solves the problem by serializing promoted index entries and the offset vector on the fly directly into a bytes_ostream, which uses fragmented storage. The serialization of the first entry is deferred, so that serialization is avoided if there will be less than 2 entries. Promoted index is not added for such partitions. There still remains a problem that large-enough promoted index can cause OOM. Refs #4217	2019-02-18 16:03:07 +01:00
Tomasz Grabiec	4e093bc3a4	sstables: mc: writer: Avoid double-serialization of the promoted index	2019-02-18 16:03:07 +01:00
Duarte Nunes	6e83457b1b	Merge 'Add PER PARTITION LIMIT' from Piotr " This series introduces PER PARTITION LIMIT to CQL. Protocol and storage is already capable of applying per-partition limits, so for nonpaged queries the changes are superficial - a variable is parsed and passed down. For paged queries and filtering the situation is a little bit more complicated due to corner cases: results for one partition can be split over 2 or more pages, filtering may drop rows, etc. To solve these, another variable is added to paging state - the number of rows already returned from last served partition. Note that "last" partition may be stretched over any number of pages, not just the last one, which is a case especially when considering filtering. As a result, per-partition-limiting queries are not eligible for page generator optimization, because they may need to have their results locally filtered for extraneous rows (e.g. when the next page asks for per-partition limit 5, but we already received 4 rows from the last partition, so need just 1 more from last partition key, but 5 from all next ones). Tests: unit (dev) Fixes #2202 " * 'add_per_partition_limit_3' of https://github.com/psarna/scylla: tests: remove superficial ignore_order from filtering tests tests: add filtering with per partition key limit test tests: publish extract_paging_state and count_rows_fetched tests: fix order of parameters in with_rows_ignore_order cql3,grammar: add PER PARTITION LIMIT idl,service: add persistent last partition row count cql3: prevent page generator usage for per-partition limit cql3: add checking for previous partition count to filtering pager: add adjusting per-partition row limit cql3: obey per partition limit for filtering cql3: clean up unneeded limit variables cql3: obey per partition limit for select statement cql3: add get_per_partition_limit cql3: add per_partition_limit to CQL statement	2019-02-18 14:47:11 +00:00
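The per-page trimming described above can be sketched as follows. `paging_state`, `row` and `apply_per_partition_limit()` are hypothetical simplifications; only the idea of carrying the last partition's row count across pages comes from the series:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: (partition key, clustering value).
using row = std::pair<std::string, int>;

// Hypothetical paging state: the last partition seen and how many of its rows
// have already been returned, possibly across several earlier pages.
struct paging_state {
    std::optional<std::string> last_pkey;
    uint32_t last_partition_row_count = 0;
};

// Trim one page to at most `per_partition_limit` rows per partition,
// counting rows returned for the same partition on earlier pages.
std::vector<row> apply_per_partition_limit(const std::vector<row>& page,
                                           uint32_t per_partition_limit,
                                           paging_state& state) {
    std::vector<row> out;
    for (const auto& r : page) {
        if (state.last_pkey != r.first) {
            // A new partition starts: reset the per-partition counter.
            state.last_pkey = r.first;
            state.last_partition_row_count = 0;
        }
        if (state.last_partition_row_count < per_partition_limit) {
            out.push_back(r);
            ++state.last_partition_row_count;
        }
    }
    return out;
}
```

With a per-partition limit of 5, a page that ends after 4 rows of partition `k1` lets exactly one more `k1` row through on the next page while later partitions again get up to 5 rows, matching the example in the merge message.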
Amnon Heiman	750b76b1de	scylla-housekeeping: Read JSON as UTF-8 string for older Python 3 compatibility Python 3.6 is the first version to accept bytes to the json.loads(), which causes the following error on older Python 3 versions: Traceback (most recent call last): File "/usr/lib/scylla/scylla-housekeeping", line 175, in <module> args.func(args) File "/usr/lib/scylla/scylla-housekeeping", line 121, in check_version raise e File "/usr/lib/scylla/scylla-housekeeping", line 116, in check_version versions = get_json_from_url(version_url + params) File "/usr/lib/scylla/scylla-housekeeping", line 55, in get_json_from_url return json.loads(data) File "/usr/lib64/python3.4/json/__init__.py", line 312, in loads s.__class__.__name__)) TypeError: the JSON object must be str, not 'bytes' To support those older Python versions, convert the bytes read to utf8 strings before calling the json.loads(). Fixes #4239 Branches: master, 3.0 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190218112312.24455-1-amnon@scylladb.com>	2019-02-18 14:52:32 +02:00
Piotr Sarna	5ad5221ce1	tests: remove superficial ignore_order from filtering tests Testing filtering with LIMIT used with_rows_ignore_order function, while it's better to use simpler with_rows.	2019-02-18 11:06:44 +01:00
Piotr Sarna	5f67a501ec	tests: add filtering with per partition key limit test	2019-02-18 11:06:44 +01:00
Piotr Sarna	a84e237177	tests: publish extract_paging_state and count_rows_fetched These local lambda functions will be reused, so they are promoted to static functions.	2019-02-18 11:06:44 +01:00
Piotr Sarna	824e9dc352	tests: fix order of parameters in with_rows_ignore_order When reporting a failure, expected rows were mixed up with received rows. Also, the message assumed it received more rows, but it can as well be less, so now it reports a "different number" of rows.	2019-02-18 11:06:44 +01:00
Piotr Sarna	3e4f065847	cql3,grammar: add PER PARTITION LIMIT Select statements now allow passing PER PARTITION LIMIT (?) directive which will trim results for each partition accordingly.	2019-02-18 11:06:44 +01:00
Piotr Sarna	acf7bedad4	idl,service: add persistent last partition row count In order to process paged queries with per-partition limits properly, paging state needs to keep additional information: the row count of the last partition returned in the previous run. That's necessary because the end of the previous page and the beginning of the current one might consist of rows with the same partition key, and we need to be able to trim the results to the number indicated by the per-partition limit.	2019-02-18 11:06:44 +01:00
Piotr Sarna	3a2b004f02	cql3: prevent page generator usage for per-partition limit Paged queries that induce per-partition limits cannot use page generator optimization, as sometimes the results need to be filtered for extraneous rows on page breaks.	2019-02-18 11:06:44 +01:00
Piotr Sarna	1dadae212a	cql3: add checking for previous partition count to filtering Filtering now needs to take into account per partition limits as well, and for that it's essential to be able to compare partition keys and decide which rows should be dropped - if previous page(s) contained rows with the same partition key, these need to be taken into consideration too.	2019-02-18 11:06:43 +01:00
Piotr Sarna	82a3883575	pager: add adjusting per-partition row limit For filtering pagers, per partition limit should be set to page size every time a query is executed, because some rows may potentially get dropped from results.	2019-02-18 10:55:52 +01:00
Piotr Sarna	b965c3778f	cql3: obey per partition limit for filtering Filtering queries now take into account the limit of rows per single partition provided by the user.	2019-02-18 10:29:34 +01:00
Piotr Sarna	b3aa939cde	cql3: clean up unneeded limit variables Some places extracted a `limit` variable to be captured by lambdas, but they were not used inside them.	2019-02-18 10:29:34 +01:00
Piotr Sarna	cfb6e9c79c	cql3: obey per partition limit for select statement Select statement now takes into account the limit of rows per single partition provided by the user.	2019-02-18 10:29:34 +01:00
Piotr Sarna	41b466246e	cql3: add get_per_partition_limit	2019-02-18 10:29:34 +01:00
Piotr Sarna	93786a9148	cql3: add per_partition_limit to CQL statement Select statements can now accept per_partition_limit variable.	2019-02-18 10:29:34 +01:00
Gleb Natapov	b01a659014	storage_proxy: remove old Cassandra code Part of the code is already implemented (counters and hinted-handoff). Part of it will probably never be implemented (triggers). And the rest is code that estimates the number of rows per range to determine query parallelism, but we implemented exponential growth algorithms instead. Message-Id: <20190214112226.GE19055@scylladb.com>	2019-02-18 10:34:55 +02:00
Avi Kivity	a1567b0997	Merge "replace get_restricted_ranges() function with generator interface" from Gleb " get_restricted_ranges() is inefficient since it calculates all vnodes that cover the requested key ranges in advance, but callers often use only the first one. Replace the function with a generator interface that generates the requested number of vnodes on demand. " * 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev: storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator storage_proxy: remove old get_restricted_ranges() interface cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface storage_proxy: introduce new query_ranges_to_vnode_generator interface	2019-02-18 10:33:54 +02:00
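A Python generator shows the difference between precomputing every vnode and producing vnodes on demand. This is a simplified model, not the C++ `query_ranges_to_vnodes_generator`: tokens are plain integers and ranges are non-wrapping, half-open `(start, end]` intervals. The function names are hypothetical.

```python
import bisect
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

TokenRange = Tuple[int, int]  # half-open (start, end], start < end


def ranges_to_vnodes(ring: List[int],
                     ranges: Iterable[TokenRange]) -> Iterator[TokenRange]:
    """Lazily split query ranges at vnode boundaries.

    `ring` is the sorted list of tokens owned by the cluster; each token ends
    one vnode. Nothing is computed until the caller asks for the next piece.
    """
    for start, end in ranges:
        lo = start
        i = bisect.bisect_right(ring, start)  # first ring token > start
        while i < len(ring) and ring[i] < end:
            yield (lo, ring[i])
            lo = ring[i]
            i += 1
        yield (lo, end)


def first_vnodes(ring: List[int], ranges: Iterable[TokenRange],
                 n: int) -> List[TokenRange]:
    # Callers that only need the first n vnodes pay only for those n.
    return list(islice(ranges_to_vnodes(ring, ranges), n))
```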
Avi Kivity	497367f9f7	Revert "build: switch debug mode from -O0 to -Og" This reverts commit `e988521b89`. It triggers a bug in gcc's variable tracking, and there are reports it significantly slows down compilation.	2019-02-17 18:32:28 +02:00
Nadav Har'El	05db7d8957	Materialized views: name the "batch_memory_max" constant Give the constant 1024*1024 introduced in an earlier commit a name, "batch_memory_max", and move it from view.cc to view_builder.hh. It now resides next to the pre-existing constant that controlled how many rows were read in each build step, "batch_size". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217100222.15673-1-nyh@scylladb.com>	2019-02-17 13:28:16 +00:00
Avi Kivity	7b411e30a9	Update seastar submodule * seastar 11546d4...2313dec (6): > Deprecate thread_scheduling_group in favor of scheduling_group > Merge "Fixes for Doxygen documentation" from Jesse > future: optionally type-erase future::then() and future::then_wrapped > build: Allow deprecated declarations internally > rpc: fix insertion of server connections into server's container > rpc: split BOOST_REQUIRE with long conditions into multiple	2019-02-16 22:27:34 +02:00
Avi Kivity	03531c2443	fragmented_temporary_buffer: fix read_exactly() during premature end-of-stream read_exactly(), when given a stream that does not contain the amount of data requested, will loop endlessly, allocating more and more memory as it does, until it fails with an exception (at which point it will release the memory). Fix by returning an empty result, like input_stream::read_exactly() (which it replaces). Add a test case that fails without a fix. Affected callers are the native transport, commitlog replay, and internal deserialization. Fixes #4233. Branches: master, branch-3.0 Tests: unit(dev) Message-Id: <20190216150825.14841-1-avi@scylladb.com>	2019-02-16 17:06:19 +00:00
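The corrected end-of-stream behaviour can be modelled in Python. The reader keeps pulling chunks until `n` bytes are buffered, and returns an empty result as soon as the source is exhausted instead of looping. `ChunkedStream` is a hypothetical stand-in, not the C++ `fragmented_temporary_buffer` reader. An empty chunk is assumed to signal end-of-stream.

```python
from typing import Iterable, Iterator


class ChunkedStream:
    """Hypothetical input stream yielding byte chunks; b"" means end-of-stream."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks: Iterator[bytes] = iter(chunks)
        self._buf = b""

    def read_exactly(self, n: int) -> bytes:
        """Return exactly n bytes, or b"" if the stream ends first."""
        while len(self._buf) < n:
            chunk = next(self._chunks, b"")
            if not chunk:
                # Premature end-of-stream: give up instead of looping forever.
                self._buf = b""
                return b""
            self._buf += chunk
        out, self._buf = self._buf[:n], self._buf[n:]
        return out
```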
Takuya ASADA	af988a5360	install-dependencies.sh: show description when 'yum-utils' package is installed on Fedora When yum-utils is already installed on Fedora, 'yum install dnf-utils' fails due to a package conflict. We should show a descriptive message instead of just failing with a dnf error message. Fixes #4215 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190215221103.2379-1-syuu@scylladb.com>	2019-02-16 17:16:18 +02:00
Pekka Enberg	f7cf04ac4b	tools/toolchain: Clean up DNF cache from Docker image Make sure we call "dnf clean all" to remove the DNF cache, which reduces Docker image size as per the following guidelines: https://github.com/fedora-cloud/Fedora-Dockerfiles/wiki/Guidelines-for-Creating-Dockerfiles A freshly built image is 250 MB smaller than the one on Docker Hub: <none> <none> b8cafc8ff557 16 seconds ago 1.2 GB docker.io/scylladb/scylla-toolchain fedora-29-20190212 d253d45a964c 3 days ago 1.45 GB Message-Id: <20190215142322.12466-1-penberg@scylladb.com>	2019-02-16 17:12:10 +02:00
Botond Dénes	2125e99531	service/storage_service: fix pre-bootstrap wait for schema agreement When bootstrapping, a node should wait to reach schema agreement with its peers before it can join the ring. This is to ensure it can immediately accept writes. Failing to reach schema agreement before joining is not fatal, as the node can pull unknown schemas on writes on-demand. However, if such a schema contains references to UDFs, the node will reject writes using it, due to #3760. To ensure that schema agreement is reached before joining the ring, `storage_service::join_token_ring()` has two checks. First it checks that at least one peer was connected previously. For this it compares `database::get_version()` with `database::empty_version`. The (implied) assumption is that this will become something other than `database::empty_version` only after having connected (and pulled schemas from) at least one peer. This assumption doesn't hold anymore, as we now set the version earlier in the boot process. The second check verifies that we have the same schema version as all known, live peers. This check assumes (since `3e415e2`) that we have already "met" all (or at least some) of our peers, and if there is just one known node (us) it concludes that this is a single-node cluster, which automatically has schema agreement. It's easy to see how these two checks will fail. The first fails to ensure that we have met our peers, and the second wrongly concludes that we are a one-node cluster, and hence have schema agreement. To fix this, modify the first check. Instead of relying on the presence of a non-empty database version, supposedly implying that we already talked to our peers, explicitly make sure that we have really talked to at least one other node before proceeding to the second check, which will now do the correct thing, actually checking the schema versions. 
Fixes: #4196 Branches: 3.0, 2.3 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>	2019-02-15 15:56:46 +01:00
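The fixed ordering of the two checks can be sketched as a predicate. This is a simplified Python model under stated assumptions, not the code of `storage_service::join_token_ring()`. It only covers a node bootstrapping into an existing cluster, and peers are identified by name. A peer whose schema version is `None` is one we have not heard from yet. The function name is hypothetical.

```python
from typing import Mapping, Optional


def schema_agreement_reached(local_version: str,
                             live_peers: Mapping[str, Optional[str]]) -> bool:
    """Return True only once it is safe to join the ring.

    `live_peers` maps each known live peer to the schema version it announced,
    or None if no information from it has been received yet.
    """
    # Check 1 (fixed): require evidence that we really talked to another node,
    # instead of inferring it from a non-empty local schema version.
    if not any(version is not None for version in live_peers.values()):
        return False
    # Check 2: every known live peer must announce our schema version.
    return all(version == local_version for version in live_peers.values())
```

The old second check would have taken an empty peer map for a single-node cluster. With the fixed ordering, an empty map makes the predicate return False.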
Rafael Ávila de Espíndola	9cd14f2602	Don't write to system.large_partition during shutdown The included testcase used to crash because during database::stop() we would try to update system.large_partition. There doesn't seem to be an order in which we can stop the existing services in cql_test_env that avoids this. This patch therefore adds another step when shutting down a database: first stop updating system.large_partition. This means that during shutdown any memtable flush, compaction or sstable deletion will not be reflected in system.large_partition. This is hopefully not too bad since the data in the table is TTLed. This seems to impact only tests, since main.cc calls _exit directly. Tests: unit (release,debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190213194851.117692-1-espindola@scylladb.com>	2019-02-15 10:49:10 +01:00
Avi Kivity	e988521b89	build: switch debug mode from -O0 to -Og -Og is advertised as debug-friendly optimization, both in compile time and debug experience. It also cuts sstable_mutation_test run time in half: Changing -O0 to -Og Before: real 16m49.441s user 16m34.641s sys 0m10.490s After: real 8m38.696s user 8m26.073s sys 0m10.575s Message-Id: <20190214205521.19341-1-avi@scylladb.com>	2019-02-15 08:19:48 +02:00
Benny Halevy	c8f239ff2b	tests: introduce sstables::test_env In preparation for adding sstables_manager we want to establish an environment for testing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:37:41 +02:00
Benny Halevy	f9546b23b7	tests: perf_sstable: rename test_env test_env is going to be a class in sstables namespace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:15 +02:00
Benny Halevy	d6cfc1fae5	tests: sstable_datafile_test: use useable_sst Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	2a6b5a7622	tests: sstable_test: add write_and_validate_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	255f05e6c8	tests: sstable_test: add test_using_reusable_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	e11e29a1fc	tests: sstable_test: use reusable_sst where possible Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	9d4989f2e8	tests: sstable_test: add test_using_working_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	55aac22b37	tests: sstable_3_x_test: make_test_sstable Reused for making sstables for test cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	3bc1b8b9ff	tests: run_sstable_resharding_test: use default parameters to make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	b0f3f8d766	tests: sstables::test::make_test_sstable: reorder params In preparation for providing a default large_data_handler in a test-standard way. buffer_size parameter reordered and now has a default value same as make_sstable()'s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:36 +02:00
Benny Halevy	bcd3f36a8a	tests: test_setup: do_with_test_directory is unused Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	b39c7bc4ae	tests: move sstable_resharding_strategy_tests to sstable_resharding_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	8801a6da1f	tests: move create_token_from_key helpers to test_services Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	815fd76c25	tests: move column_family_for_tests to test_services And unify multiple copies of column_family_test_config(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:10 +02:00
Benny Halevy	b6ad61d2e5	dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh So it can be used by other tests Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:16:52 +02:00
Nadav Har'El	43c42d608d	materialized views: forbid using "virtual" columns in restrictions For fixing issue #3362 we added in materialized views, in some cases, "virtual columns" for columns which were not selected into the view. Although these columns nominally exist in the view's schema, they must not be visible to the user, and in commit `3f3a76aa8f` we prevented a user from being able to SELECT these columns. In this patch we also prevent the user from being able to use these column names (which shouldn't exist in the view) in WHERE restrictions. Fixes #4216 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212162014.18778-1-nyh@scylladb.com>	2019-02-14 16:08:41 +02:00
Gleb Natapov	0b84b04f97	consistency_level: make it more const correct Message-Id: <20190214122631.GF19055@scylladb.com>	2019-02-14 14:52:51 +02:00
Nadav Har'El	fec562ec8f	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building process (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but still, 128 rows may grow pretty large when we have rows with large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side-effect, this patch also solves another bug: in the worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them would. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com>	2019-02-14 12:04:40 +02:00
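The batching rule can be sketched in Python: flush after `batch_size` rows or once the buffered rows reach about 1MB, whichever comes first. Two parts of the sketch are assumptions for illustration. Row sizes are approximated by the `len()` of a byte string, and `flush` stands in for generating view updates. The two constants mirror the values given in these commits.

```python
from typing import Callable, Iterable, List

BATCH_SIZE = 128                # rows per build step (view_builder::batch_size)
BATCH_MEMORY_MAX = 1024 * 1024  # bytes (batch_memory_max)


def build_in_batches(rows: Iterable[bytes],
                     flush: Callable[[List[bytes]], None],
                     batch_size: int = BATCH_SIZE,
                     memory_max: int = BATCH_MEMORY_MAX) -> None:
    """Hand rows to `flush` in batches bounded by both count and size."""
    batch: List[bytes] = []
    batch_bytes = 0
    for row in rows:
        batch.append(row)
        batch_bytes += len(row)
        # Either limit ends the batch, so large rows no longer wait for 128 peers.
        if len(batch) >= batch_size or batch_bytes >= memory_max:
            flush(batch)
            batch, batch_bytes = [], 0
    if batch:
        flush(batch)  # trailing partial batch
```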
Calle Wilund	e70286a849	db/extensions: Allow schema extensions to turn themselves off Fixes #4222 Iff an extension creation callback returns null (not exception) we treat this as "I'm not needed" and simply ignore it. Message-Id: <20190213124311.23238-1-calle@scylladb.com>	2019-02-13 14:50:51 +02:00
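Under the new contract, a factory that returns null means "not needed", while an exception still signals an error. A Python sketch, with `None` playing the role of null, is below. The registry shape and names are hypothetical and do not reproduce `db/extensions`.

```python
from typing import Any, Callable, Dict, Mapping, Optional

Factory = Callable[[Any], Optional[Any]]


def create_schema_extensions(factories: Mapping[str, Factory],
                             options: Mapping[str, Any]) -> Dict[str, Any]:
    """Instantiate every registered extension that wants to be active."""
    active: Dict[str, Any] = {}
    for name, factory in factories.items():
        ext = factory(options.get(name))  # exceptions propagate unchanged
        if ext is None:
            continue  # the extension turned itself off: ignore it
        active[name] = ext
    return active
```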
Jesse Haber-Kucharsky	74ac1deee1	build: Fix the build on Ubuntu The way the `pkg-config` executable works on Fedora and Ubuntu is different, since on Fedora `pkg-config` is provided by the `pkgconf` project. In the build directory of Seastar, `seastar.pc` and `seastar-testing.pc` are generated. `seastar` is a requirement of `seastar-testing`. When pkg-config is invoked like this: pkg-config --libs build/release/seastar-testing.pc the version of `pkg-config` on Fedora resolves the reference to `seastar` in `Requires` to the `seastar.pc` in the same directory. However, the version of `pkg-config` on Ubuntu 18.04 does not: Package seastar was not found in the pkg-config search path. Perhaps you should add the directory containing `seastar.pc' to the PKG_CONFIG_PATH environment variable Package 'seastar', required by '/seastar-testing', not found To address the divergent behavior, we set the `PKG_CONFIG_PATH` variable to point to the directory containing `seastar.pc`. With this change, I was able to configure Scylla on both Fedora 29 and Ubuntu 18.04. Fixes #4218 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <d7164bde2790708425ac6761154d517404818ecd.1550002959.git.jhaberku@scylladb.com>	2019-02-13 13:33:50 +02:00
Avi Kivity	2915baeff4	Merge "Move truncation records to separate table" from Calle " Fixes #4083 Instead of a sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. This makes queries and updates simpler, and lighter on query memory. The code also migrates any existing truncation positions on startup and clears the old data. " * 'calle/truncation' of github.com:scylladb/seastar-dev: truncation_migration_test: Add rudimentary test system_keyspace: Add waitable for trunc. migration cql_test_env: Add separate config w. feature disable cql_test_env: Add truncation migration to init cql_assertions: Add null/non-null tests storage_service: Add features disabling for tests Add system.truncated documentation in docs commitlog_replay: Use dedicated table for truncation storage_service: Add "truncation_table" feature	2019-02-13 11:16:30 +02:00
Calle Wilund	2e320a456c	truncation_migration_test: Add rudimentary test	2019-02-13 09:08:12 +00:00
Calle Wilund	4e657c0633	system_keyspace: Add waitable for trunc. migration For tests. Hooray for separation of concern.	2019-02-13 09:08:12 +00:00
Calle Wilund	b253757b17	cql_test_env: Add separate config w. feature disable	2019-02-13 09:08:12 +00:00
Calle Wilund	859a1d8f36	cql_test_env: Add truncation migration to init	2019-02-13 09:08:12 +00:00
Calle Wilund	fbcbe529ad	cql_assertions: Add null/non-null tests	2019-02-13 09:08:12 +00:00
Calle Wilund	64e8c6f31d	storage_service: Add features disabling for tests	2019-02-13 09:08:12 +00:00
Calle Wilund	7d3867e153	Add system.truncated documentation in docs	2019-02-13 09:08:12 +00:00
Calle Wilund	12ebcf1ec7	commitlog_replay: Use dedicated table for truncation Fixes #4083 Instead of a sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. This makes queries and updates simpler, and lighter on query memory. The code also migrates any existing truncation positions on startup and clears the old data.	2019-02-13 09:08:12 +00:00
Calle Wilund	ff5e541335	storage_service: Add "truncation_table" feature	2019-02-13 09:08:12 +00:00
Avi Kivity	a3de5581ce	Update seastar submodule * seastar 428f4ac...11546d4 (9): > reactor: Fix an infinite loop caused by the high resolution timer not being monitored > build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE` > build: Unify dependency versions > future-util: optimize parallel_for_each() with single element > core/sharded.hh: fix doxygen for "Multicore" group > build: switch from travis-ci to circleci > perftune.py: fix irqbalance tuning on Ubuntu 18 > build: Make the use of sanitizers transitive > net: ipv6: fix ipv6 detection and tests by binding to loopback	2019-02-12 18:42:07 +02:00
Avi Kivity	c7aa73af51	Merge "Automatically pause shard readers when not used" from Botond " Recently, there has been a series of incidents of the multishard combining reader deadlocking, when the concurrency of reads was severely restricted and there was no timeout for the read. Several fixes have been merged (`414b14a6b`, `21b4b2b9a`, `ee193f1ab`, `170fa382f`) but eliminating all occurrences of deadlocks proved to be a whack-a-mole game. After the last bug report I have decided that instead of trying to plug new holes as we find them, I'll try to make it impossible for holes to appear in the first place. To translate this into the multishard reader, instead of sprinkling new `reader.pause()` calls all over the place in the multishard reader to solve the newly found deadlocks, make the pausing of readers fully automatic on the shard reader level. Readers are now always kept in a paused state, except when actually used. This eliminates the entire class of deadlock bugs. This patch-set also aims at simplifying the multishard reader code, as well as the code of the existing `lifecycle_policy` implementations. This effort resulted in: * mutation_reader.cc: no change in SLOC, although it now also contains logic that used to be duplicated in every `lifecycle_policy` implementation; * multishard_mutation_query.cc: 150 SLOC removed; * database.cc: 30 SLOC removed; Also the code is now (hopefully) simpler, safer and has a clearer structure. 
Fixes #4050 (main issue) Fixes #3970 Fixes #3998 (deprecates really) " * 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla: query_mutations_on_all_shards(): make states light-weight query_mutations_on_all_shards(): get rid of read_context::paused_reader query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state query_mutations_on_all_shards(): pause looked-up readers query_mutation_on_all_shards(): remove unnecessary indirection shard_reader: auto pause readers after being used reader_concurrency_semaphore::inactive_read_handle: fix handle semantics shard_reader: make reader creation sync shard_reader: use semaphore directly to pause-resume shard_reader: recreate_reader(): fix empty range case foreign_reader: rip out the now unused private API shard_reader: move away from foreign_reader multishard_combining_reader: make shard_reader a shared pointer multishard_combining_reader: move the shard reader definition out multishard_combining_reader: disentangle shard_reader	2019-02-12 16:22:52 +02:00
Botond Dénes	db106a32c8	query_mutations_on_all_shards(): make states light-weight Previously the different states a reader can be in were all separate structs, and were joined together by a variant. When this was designed this made sense as states were numerous and quite different. By this point however the number of states has been reduced to 4, with 3 of them being almost the same. Thus it makes sense to merge these states into a single struct and keep track of the current state with an enum field. This can theoretically increase the chances of mistakes, but in practice I expect the opposite, due to the simpler (and smaller) code. Also, all the important checks that verify that a reader is in the state expected by the code are left in place. A byproduct of this change is that the amount of cross-shard writes is greatly reduced. Whereas previously the whole state object had to be rewritten on state change, now a single enum value has to be updated. Cross-shard reads are reduced as well, to reading a few foreign pointers; all state-related data is now kept on the shard where the associated reader lives.	2019-02-12 16:20:51 +02:00
Botond Dénes	65b2eb0939	query_mutations_on_all_shards(): get rid of read_context::paused_reader	2019-02-12 16:20:51 +02:00
Botond Dénes	ec44a4dbb1	query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state These two states are now the same, with the artificial distinction that all readers are promoted to ready_to_save state after the compaction state and the combined buffer are dismantled. From a practical perspective this distinction is meaningless, so merge the two states into a single `saving` state.	2019-02-12 16:20:51 +02:00
Botond Dénes	9a1bd24d82	query_mutations_on_all_shards(): pause looked-up readers At the beginning of each page, all saved readers from the previous pages (if any) are looked up, so they can be reused. Some of these saved readers can end up not being used at all for the current page, in which case they will needlessly sit on their permit for the duration of filling the page. Avoid this by immediately pausing all looked-up readers. This also allows nicely unifying the reader saving logic, as now all readers will be in a paused state when `save_reader()` is called. Previously, looked-up but unused readers were an exception to this, requiring extra logic to handle both cases. This logic can now be removed.	2019-02-12 16:20:51 +02:00
Botond Dénes	61b9ed7faf	query_mutation_on_all_shards(): remove unnecessary indirection	2019-02-12 16:20:51 +02:00
Botond Dénes	9000626647	shard_reader: auto pause readers after being used Previously it was the responsibility of the layer above (multishard combining reader) to pause readers, which happened via an explicit `pause()` call. This proved to be a very bad design as we kept finding spots where the multishard reader should have paused the reader to avoid potential deadlocks (due to starved reader concurrency semaphores), but didn't. This commit moves the responsibility of pausing the reader into the shard reader. The reader is now kept in a paused state, except when it is actually used (a `fill_buffer()` or `fast_forward_to()` call is executing). This is fully transparent to the layer above. As a side note, the shard reader now also hides when the reader is created. This also used to be the responsibility of the multishard reader, and although it caused no problems so far, it can be considered a leak of internal details. The shard reader now automatically creates the remote reader the first time it is used. The code has been reorganized, such that there is now a clear separation of responsibilities. The multishard combining reader handles the combining of the output of the shard readers, as well as issuing read-aheads. The shard reader handles read-ahead and creating the remote reader when needed, as well as transferring the results of remote reads to the "home" shard. The remote reader (`shard_reader::remote_reader`, new in this patch) handles pausing-resuming as well as recreating the reader after it was evicted. Layers no longer access each other's internals, as they used to. After this commit, the reader passed to `destroy_reader()` will always be in paused state.	2019-02-12 16:20:51 +02:00
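A context manager expresses the "paused except while in use" discipline naturally. It resumes the reader on entry, re-registers it as inactive on exit, and creates or recreates it lazily. The Python model below is a sketch under assumptions, not Scylla's code. `InactiveReads` is a hypothetical stand-in for the semaphore's inactive-read registry. Readers are plain iterators over integer positions, and a recreated reader resumes from the next position.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


class InactiveReads:
    """Hypothetical registry of paused readers that may evict them at will."""

    def __init__(self) -> None:
        self._next_handle = 0
        self.reads: Dict[int, Iterator[int]] = {}

    def register(self, reader: Iterator[int]) -> int:
        self._next_handle += 1
        self.reads[self._next_handle] = reader
        return self._next_handle

    def unregister(self, handle: int) -> Optional[Iterator[int]]:
        return self.reads.pop(handle, None)  # None: evicted while paused

    def evict_all(self) -> None:
        self.reads.clear()


class ShardReader:
    def __init__(self, registry: InactiveReads,
                 create_reader: Callable[[int], Iterator[int]]) -> None:
        self._registry = registry
        self._create_reader = create_reader
        self._handle: Optional[int] = None  # no reader exists until first use
        self._next_pos = 0

    @contextmanager
    def _resumed(self):
        reader = None
        if self._handle is not None:
            reader = self._registry.unregister(self._handle)
        if reader is None:
            # First use, or evicted while paused: (re)create from the last position.
            reader = self._create_reader(self._next_pos)
        try:
            yield reader
        finally:
            # Always return to the paused state, even if the read failed.
            self._handle = self._registry.register(reader)

    def fill_buffer(self) -> Optional[int]:
        with self._resumed() as reader:
            item = next(reader, None)
            if item is not None:
                self._next_pos = item + 1
            return item
```

Callers never pause or resume explicitly. Every `fill_buffer()` leaves the reader registered as inactive, so an eviction is absorbed by recreating the reader on the next use.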
Botond Dénes	ab5d717052	reader_concurrency_semaphore::inactive_read_handle: fix handle semantics That is: * make it move only; * make moved-from handles null handles; * add (public) default constructor, which constructs a null handle;	2019-02-12 16:20:51 +02:00
Botond Dénes	37006135dc	shard_reader: make reader creation sync Reader creation happens through the `reader_lifecycle_policy` interface, which offers a `create_reader()` method. This method accepts a shard parameter (among others) and returns a future. Its implementation is expected to go to the specified shard and then return with the created reader. The method is expected to be called from the shard where the shard reader (and consequently the multishard reader) lives. This API, while reasonable enough, has a serious flaw. It doesn't make batching possible. For example, if the shard reader issues a call to the remote shard to fill the remote reader's buffer, but finds that it was evicted while paused, it has to come back to the local shard just to issue the recreate call. This makes the code both convoluted and slow. Change the reader creation API to be synchronous, that is, callable from the shard where the reader has to be created, allowing for simple call sites and batching. This change requires that implementations of the lifecycle policy update any per-reader data-structure they have from the remote shard. This is not a problem however, as these data-structures are usually partitioned, such that they can be accessed safely from a remote shard. Another, very pleasant, consequence of this change is that now all methods of the lifecycle interface are sync and thus calls to them cannot overlap anymore. This patch also removes the `test_multishard_combining_reader_destroyed_with_pending_create_reader` unit test, which is not useful anymore. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	57d1f6589c	shard_reader: use semaphore directly to pause-resume The shard reader relies on the `reader_lifecycle_policy` for pausing and resuming the remote reader. The lifecycle policy's API was designed to be as general as possible, allowing for any implementation of pause/resume. However, in practice, we have a single implementation of pause/resume: registering/unregistering the reader with the relevant `reader_concurrency_semaphore`, and we don't expect any new implementations to appear in the future. Thus, the generic API of the lifecycle policy is needlessly abstract, making its implementations needlessly complex. We can instead make this very concrete and have the lifecycle policy just return the relevant semaphore, removing the need for every implementor of the lifecycle policy interface to have a duplicate implementation of the very same logic. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	fae5a2a8c8	shard_reader: recreate_reader(): fix empty range case If the shard reader is created for a singular range (has a single partition), and then it is evicted after reaching EOS, when recreated we would have to create a reader that reads an empty range, since the only partition the range has was already read. Since it is not possible to create a reader with an empty range, we just didn't recreate the reader in this case. This is incorrect however, as the code might still attempt to read from this reader, if only due to a bug, and would trigger a crash. The correct fix is to create an empty reader that will immediately be at EOS.	2019-02-12 16:20:51 +02:00
Botond Dénes	cd807586f6	foreign_reader: rip out the now unused private API Drop all the glue code, needed in the past so the shard reader can be implemented on top of foreign reader. As the shard reader moved away from foreign reader, this glue code is not needed anymore.	2019-02-12 16:20:51 +02:00
Botond Dénes	d80bc3c0a5	shard_reader: move away from foreign_reader In the past, shard reader wrapped a foreign reader instance, adding functionality required by the multishard reader on top. This has worked well to a certain degree, but after the addition of pause-resume of shard reader, the cooperation with foreign reader became more-and-more a struggle. It has now gotten to a point, where it feels like shard reader is fighting foreign reader as much as it reuses it. This manifested itself in the ever growing amount of glue code, and hacks baked into foreign reader (which is supposed to be of general use), specific to the usage in the multishard reader. It is time we don't force this code-reuse anymore and instead implement all the required functionality in shard reader directly.	2019-02-12 16:20:51 +02:00
Botond Dénes	da0c01c68b	multishard_combining_reader: make shard_reader a shared pointer Some members of shard reader have to be accessed even after it is destroyed. This is required by background work that might still be pending when the reader is destroyed. This was solved by creating a special `state` struct, which contained all the members of the shard readers that had to be accessed even after it was destroyed. This state struct was managed through a shared pointer, a copy of which was held by each continuation expected to outlive the reader. This, however, created a minefield, where each line of the code had to be carefully audited to access only fields that are guaranteed to remain valid. Fix this mess by making the whole class a shared pointer, with `enable_shared_from_this`. Now each continuation just has to make sure to keep `this` alive and code can now access all members freely (well, almost).	2019-02-12 16:20:51 +02:00
Botond Dénes	f1c3421eb4	multishard_combining_reader: move the shard reader definition out Shard reader started its life as a very thin layer above foreign reader, with just some convenience methods added. As usual, by now it has grown into a hairy monster, its class definition out-growing even that of the multishard reader itself. It is time shard reader is moved into the top-level scope, improving the readability of both classes.	2019-02-12 16:20:51 +02:00
Botond Dénes	7114b59309	multishard_combining_reader: disentangle shard_reader Currently shard reader has a reference to the owning multishard reader and it freely accesses its members. This resulted in a mess, where it's not clear what exactly shard reader depends on. Disentangle this mess by making the shard reader self-sufficient, passing everything it depends on into its constructor.	2019-02-12 16:20:51 +02:00
Nadav Har'El	85e5791710	tests/view_schema_test: fix flakiness caused by missing eventually() All tests that involve writing to a base table and then reading from the view table must use the eventually() function to account for the fact that the view update is asynchronous, and may be visible only some time after writing the base table. Forgetting an eventually() can cause the test to become flaky and sometimes fail because the expected data is not yet in the view. Botond noticed these failures in practice in two subtests (test_partition_key_filtering_with_slice and test_clustering_key_in_restrictions). This patch fixes both tests, and I also reviewed the entire source file view_schema_test.cc and found additional places missing an eventually() (and also places that unnecessarily used eventually() to read from the base table), and fixed those as well. Fixes #4212 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212121140.14679-1-nyh@scylladb.com>	2019-02-12 16:10:30 +02:00
Paweł Dziepak	eb03cf00f5	sstable: write_components: drop default for encoding stats There is no value in having a default value for the encoding_stats parameter of write_components(). If anything, it weakens the tests by encouraging not using the real encoding stats, which is not what the actual sstable write path in Scylla does. This patch removes the default value and makes most of the tests provide real encoding statistics. The ones that do not are those that have no easy way of obtaining those (and those stats are not that important for the test itself) or there is a reason for not using those (sstable_3_x_test::test_sstable_write_large_row uses row size thresholds based on size with default-constructed encoding_stats). Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>	2019-02-12 16:08:24 +02:00
Calle Wilund	4a52ed7884	commitlog: Accept recycled (not yet re-used) segments in replay Refs #4085 Changes commitlog descriptor to both accept "Recycled-Commitlog..." file names, and preserve said name in the descriptor. This ensures we pick up the not-yet-used recycled segments left from a crash for replay. The replay in turn will simply ignore the recycled files, and post actual replay they will be deleted as needed. Message-Id: <20190129123311.16050-1-calle@scylladb.com>	2019-02-12 12:23:55 +02:00
Nadav Har'El	93baa334ea	create-relocatable-package.py: speed up slow compression create-relocatable-package.py currently (refs #4194) builds a compressed tar file, but does so using a painfully slow Python implementation of gzip, which is a problem considering the huge size (around 2 gigabytes) of Scylla's executable. On my machine, running it for a release build of Scylla takes a whopping 6 minutes. Just replacing the Python compression with a pipe to an external "gzip" process speeds up the run to just 2 minutes. But gzip is still not optimal, using only one thread even when on a many-core machine. If we switch to "pigz", a parallel implementation of "gzip", all cores are used and on my machine the compression speeds up to just 23 seconds - that's 15 times faster than before this patch. So this patch has create-relocatable-package.py use an external pigz process. "pigz" is now required on the build system (if you want to create packages), so is added to install-dependencies.sh. [avi: update toolchain] Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212090333.3970-1-nyh@scylladb.com>	2019-02-12 11:19:04 +02:00
Nadav Har'El	1cf1af1502	scylla_setup: fix non-interactive behavior In commit `ec66dd6562`, in non-interactive runs of scylla_setup all options were unintentionally set to "false", regardless of the options passed on the scylla_setup command line. This can lead to all sorts of wrong behaviors, and in particular one test setup assumed it was enabling the Scylla service (which was previously the default) but after this commit, it no longer did. This patch restores the previous behavior: Non-interactive invocations of scylla_setup adhere to the defaults and the command-line options, rather than blindly choosing "false". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190211214105.32613-1-nyh@scylladb.com>	2019-02-12 10:50:00 +02:00
Gleb Natapov	26e5700819	storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator Do not precalculate too many ranges in advance; it requires a large allocation and usually means that a consumer of the interface is going to do too much work in parallel. Fixes: #3767	2019-02-12 10:45:25 +02:00
Avi Kivity	da9628c6dc	auth: password_authenticator: protect against NULL salted_hash In case salted_hash was NULL, we'd access uninitialized memory when dereferencing the optional in get_as<>(). Protect against that by using get_opt() and failing authentication if we see a NULL. Fixes #4168. Tests: unit (release) Branches: 3.0, 2.3 Message-Id: <20190211173820.8053-1-avi@scylladb.com>	2019-02-11 18:54:03 +01:00
Botond Dénes	c9e00172e9	tests/multishard_mutation_query_test: add fuzzy test "Fuzzy test" executes semi-random range-scans against semi-random data. By doing so we hope to achieve a coverage of edge cases that would be very hard to achieve by "conventional" unit tests. Fuzzy test generates a table with a population of partitions that are a combination of all of: * Size of static row: none, tiny, small and large; * Number of clustering rows: none, few, several, and lots; * Size of clustering rows: tiny, small and large; * Number of range deletions: few, several and lots; * Number of rows covered by a range deletion: few, several; As well as a partition with an extremely large static row, an extreme number of rows and rows of extreme size. To avoid writing an excess amount of data, the size limit of pages is reduced to 1KB (from the default 1MB) and the row count limit of pages is reduced to 1000 (from the default of 10000). The test then executes range-scans against this population. For each range scan, a random partition range is generated that is guaranteed to contain at least one partition (to avoid executing mostly empty scans), as well as a random partition-slice (row ranges). The data returned by the query is then thoroughly validated against the population description returned by the `create_test_table()` function. As this test has a large degree of randomness to it, covering a quasi-infinite input-space, it can (theoretically) fail at any time. As such I took great care in making such failures deterministically reproducible, based on a single random seed, which is logged to the output in case of a failure, together with instructions on how to repeat the particular run. The test also uses extensive logging to aid investigations. For logging, seastar's logging mechanism is used, as `BOOST_TEST_MESSAGE` produces unintelligible output when running with -c > 1.
Log messages are carefully tagged, so that the test produces the least amount of noise by default, while being very explicit about what's happening when run with `debug` or especially `trace` log levels.	2019-02-11 17:14:47 +02:00
Botond Dénes	4b2cac6f40	tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan() The existing `read_all_partitions_with_paged_scan()` implementation was tailored to the existing, simplistic test cases. Refactor it so that it can be used in much more complex test cases: * Allow specifying the page's `max_size`. * Allow specifying the query range. * Allow specifying the partition slice's ck ranges. * Fix minor bugs in the paging logic. To avoid churn, a backward-compatible overload is added, that retains the old parameter set.	2019-02-11 17:14:47 +02:00
Botond Dénes	542301fdc9	tests/test_table: add advanced `create_test_table()` overload This overload provides a middle ground between the very generic, but hard-to-use "expert version" and the very restrictive and simplistic "beginner version". It allows the user to declaratively describe the to-be-generated population in terms of a bunch of `std::uniform_int_distribution` objects (e.g. number of rows, size of rows, etc.). This allows for generating a random population in a controlled way, with a minimum amount of boiler-plate code on the user side.	2019-02-11 17:14:47 +02:00
Botond Dénes	7e1c1c2e8c	tests/test_table: make `create_test_table()` customizable Allow the user to specify the population of the table in a generic and flexible way. This patch essentially rewrites the `create_test_table()` implementation from scratch, so that it populates the table using the partition generator passed in by the user. Backward compatibility is kept, by providing a `create_test_table()` overload that is identical to the previous API. This overload is now implemented on top of the generic overload.	2019-02-11 17:14:47 +02:00
Gleb Natapov	ecc5230de5	storage_proxy: remove old get_restricted_ranges() interface It is not used any more.	2019-02-11 14:45:43 +02:00
Gleb Natapov	0cd9bbb71d	cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	e6208b1cde	tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	2735a85c8e	storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	692a0bd000	storage_proxy: introduce new query_ranges_to_vnode_generator interface The get_restricted_ranges() function takes the key ranges provided by the query and divides them on vnode boundaries. It iterates over all ranges and calculates all vnodes, but its users are usually interested in only one vnode, since most likely it will be enough to populate a page. If it is not enough, they will ask for more. This patch replaces the function with a new interface that generates vnode ranges on demand instead of precalculating all of them.	2019-02-11 14:45:43 +02:00
Avi Kivity	cb51fcab9d	README: improve dbuild instructions Add a quick start, document more options, and link from the main README. Message-Id: <20190210154606.21739-1-avi@scylladb.com>	2019-02-11 09:25:25 +01:00
Avi Kivity	2724a66a12	docker: don't send .git during "docker build" It's huge and useless during "docker build" operations. Message-Id: <20190208161848.21125-1-avi@scylladb.com>	2019-02-11 09:17:14 +01:00
Glauber Costa	e0bfd1c40a	allow Cassandra SSTables with counters to be imported if they are new enough Right now Cassandra SSTables with counters cannot be imported into Scylla. The reason for that is that Cassandra changed their counter representation in their 2.1 version and kept transparently supporting both representations. We do not support their old representation, nor is there a sane way to figure out by looking at the data which one is in use. For safety, we had made the decision long ago to not import any tables with counters: if a counter was generated in older Cassandra, we would misrepresent it. In this patch, I propose we offer a non-default way to import SSTables with counters: we can gate it with a flag, and trust that the user knows what they are doing when flipping it (at their own peril). Cassandra 2.1 is by now pretty old. Many users can safely say they've never used anything older. While there are tools like sstableloader that can be used to import those counters, there are often situations in which directly importing SSTables is either better, faster, or, worse, the only option left. I argue that having a flag that allows us to import them when we are sure it is safe is better than having no option at all. With this patch I was able to successfully import Cassandra tables with counters that were generated in Cassandra 2.1, reshard and compact their SSTables, and read the data back to get the same values in Scylla as in Cassandra. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190210154028.12472-1-glauber@scylladb.com>	2019-02-10 17:50:48 +02:00
Glauber Costa	61ea54eff6	tools: toolchain: dbuild: use host networking This is convenient for testing scylla directly by invoking build/dev/scylla. This needs to be done under docker because the shared objects scylla looks for may not exist in the host system. During quick development we may not want to go through the trouble of packaging relocatable scylla every time to test changes. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190209021033.8400-1-glauber@scylladb.com>	2019-02-10 12:16:47 +02:00
Duarte Nunes	d2d885fb93	Merge 'Fix misdetection of remote counter shards' from Paweł " The code reading counter cells from sstables verifies that there are no unsupported local or remote shards. The latter are detected by checking if all shards are present in the counter cell header (only remote shards do not have entries there). However, the logic responsible for doing that was incorrectly computing the total number of counter shards in a cell if the header was larger than a single counter shard. This resulted in incorrect complaints that remote shards are present. Fixes #4206 Tests: unit(release) " * tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla: tests/sstables: test counter cell header with large number of shards sstables/counters: fix remote counter shard detection	2019-02-10 12:16:31 +02:00
Paweł Dziepak	4eeb8eeed5	tests/sstables: test counter cell header with large number of shards The logic responsible for reading counters from sstables was getting confused by large headers. The size of the header depends directly on the number of shards. This test checks that we can handle cells with a large number of counter shards properly.	2019-02-08 17:06:31 +00:00
Paweł Dziepak	df1ac03154	sstables/counters: fix remote counter shard detection Each counter cell has a header with an entry for each local and global shards. The detection of remote shards is done by checking if there are any counter shards that do not have an entry in the header. This is done by computing the number of counter shards in a cell and comparing it to the number of header entries. However, the computation was wrong and included the size taken by the header itself. As a result, if the header was as big or larger than a single counter shard Scylla incorrectly complained about remote shards.	2019-02-08 17:04:22 +00:00
Glauber Costa	8ba6b569b1	relocatable python: make sure all shared objects are relocated The interpreter as it is right now has a bug: I incorrectly assumed that all the shared libraries that python dynamically links would be in lib-dynload. That is not true, and at least some of them are in site-packages. With that, we were loading system libraries for some shared objects. The approach taken to fix this is to just check if we're seeing a shared library and relocate everything we see: we will end up relocating the ones in lib64 too, but that not only should be okay, it is probably even more fool-proof. While doing that I noticed that I had forgotten to incorporate a piece of earlier feedback from Avi (that we're leaving temporary files behind). So I'm fixing that as well. [avi: update toolchain] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190208115501.7234-1-glauber@scylladb.com>	2019-02-08 18:42:24 +02:00
Glauber Costa	fb742473e2	replace /usr/local as a source of packages in the python relocatable interpreter I was playing with the python3 interpreter trying to get pip to work, just to see how far we can go. We don't really need pip, but I figured it would be a good stress test to make sure that the process is working and robust. And it didn't really work, because although pip will correctly install things into $relocatable_root/local/lib, sys.path will still refer to a hardcoded /usr/local. While this should not affect Scylla, since we expect to have all our modules in our path anyway -- and that path is searched before /usr/local, it is still dangerous to make an absolute reference like this. Unfortunately, /usr/local is included unconditionally by site.py, which is executed when the interpreter is started, and there is no environment variable I found to change that (the help string refers to PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever). There is a way to tell site.py not to bother to add user sites, by passing the -s flag, which this patch does. Aside from doing that, we also enhance PYTHONPATH to include a reference to ./local/{lib,lib64}/python<version>/site-packages. After applying this patch, I was able to build an interpreter containing only python3-pip and python3-setuptools, and build the relocatable environment from there. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190206052104.25927-1-glauber@scylladb.com>	2019-02-08 18:41:52 +02:00
Botond Dénes	181bf64858	query: add trim_clustering_row_ranges_to() This algorithm was already duplicated in two places (service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be used in a third place. Instead of triplicating, move it into a function that everybody can use.	2019-02-08 16:30:17 +02:00
Botond Dénes	bc31d8cbcc	tests/test_table: add keyspace and table name params Allow the keyspace and table names to be customizable by the caller.	2019-02-08 16:30:17 +02:00
Botond Dénes	2d885c6453	tests/test_table: s/create_test_cf/create_test_table/ Also move it to the `test` namespace.	2019-02-08 16:30:17 +02:00
Botond Dénes	c2a6ac307f	tests: move create_test_cf() to tests/test_table.{hh,cc} In the next patches `create_test_cf()` will be made much more powerful and as such generally useful. Move it into its own files so other tests can start using it as well.	2019-02-08 16:30:17 +02:00
Botond Dénes	2d3c4f9009	tests/multishard_mutation_query_test: drop many partition test Soon a much better test will be added that will cover many partitions as well and much more.	2019-02-08 16:30:17 +02:00
Botond Dénes	ced0e7ecb3	tests/multishard_mutation_query_test: drop range tombstone test Soon a much better test will be added that will also cover range tombstones and much more.	2019-02-08 16:30:17 +02:00
Paweł Dziepak	64b1a2caf9	tests: modernise tmpdir tmpdir is a helper class representing a temporary directory. Unfortunately, it suffers from some problems such as lack of proper encapsulation and weak typing. This has caused bugs in the past when the user code accidentally modified the member variable with the path to the directory. This patch modernises tmpdir and updates its users. The path is stored in a std::filesystem::path and available read-only to the class users. mkdtemp and boost are replaced by a standard solution. The users are updated to use path more (when it didn't involve too many changes to their code) and stop using lw_shared_ptr to store the tmpdir when it wasn't necessary. tmpdir intentionally doesn't provide any helpers for getting the path as a string in order to discourage weak types. Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>	2019-02-07 20:18:14 +02:00
Avi Kivity	e2e25720c1	Update seastar submodule * seastar c3be06d...428f4ac (13): > build: make the "dist" test respect the build type > Merge 'Add support for docker --cpuset-cpus' from Juliana > Merge "Add support for Coroutines TS" from Paweł > Merge "Modernize dependency management" from Avi > future: propagate broken_promise exception to abandoned continuations > net/inet_address: avoid clang Wmissing-braces > build: Default to the "Release" type if unspecified > rpc: log an exception that may happen while processing an RPC message > Add a --split-dwarf option to configure.py > build: Fix the `StdFilesystem` module > Compress debug info by default > Add an option for building with split dwarf > Dockerfile: install stow	2019-02-07 20:08:15 +02:00
Paweł Dziepak	de2a447576	utils/extremum_tracking: drop default constructor Default constructed extremum_tracker has uninitialised _default_value which basically makes it never correct to do that. Since this class is a mechanism and not a value it doesn't really need to be a regular type, so let's drop the default constructor. Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>	2019-02-07 18:31:25 +02:00
Tomasz Grabiec	7184289015	Merge "Various fixes and improvements for sstables statistics" from Paweł This series contains several fixes and improvements as well as new tests for sstable code dealing with statistics. * https://github.com/pdziepak/scylla.git sstable-stats-fixes/v1-rebased: sstables: compaction: don't access moved-from vector of sstables memtable: move encoding_stats_collector implementation out of header sstables: seal_statistics(): pass encoding_stats by constant reference sstables/mc/writer: don't assume all schema columns are present tests/sstable3: improvements to file compare tests: extract mutation data model tests/data_model: add support for expiring atomic cells tests/data_model: allow specifying timestamp for row markers tests/memtable: test column tracking for encoding stats sstables: use correct source of statistics in get_encoding_stats_for_compaction() utils/extremum_tracking: preserve "not-set" status on merge sstables/metadata_collector: move the default values to the global tracker tests/sstables: test for reading serialisation header tests/sstables: pass encoding stats to write_components() tests/sstable: test merging encoding_stats Fixes #4202.	2019-02-07 12:35:29 +01:00
Paweł Dziepak	67252de195	tests/sstable: test merging encoding_stats	2019-02-07 10:17:06 +00:00
Paweł Dziepak	e25603fbf7	tests/sstables: pass encoding stats to write_components() By default write_components() uses a safe default for encoding_stats which indicates that all columns are present. This may hide some bugs, so let's pass the real thing in the tests where this may matter.	2019-02-07 10:17:06 +00:00
Paweł Dziepak	d44d5ebf86	tests/sstables: test for reading serialisation header	2019-02-07 10:17:06 +00:00
Paweł Dziepak	ebf667fb9c	sstables/metadata_collector: move the default values to the global tracker column_stats is a per-partition tracker, while metadata_collector is the global one. The statistics gathered by column_stats are merged into the metadata_collector. In order to ensure that we get proper default values in case no value of particular kind (e.g. no TTLs) was seen they need to be set on the global tracker, not the per-partition one.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	2680022df0	utils/extremum_tracking: preserve "not-set" status on merge extremum_tracker allows choosing a default value that's going to be used only if no "real" values were provided. Since it is never compared with the actual input values it can be anything. For instance, if the minimum tracker default value is 0 and there was one update with the value 1, the detected minimum is going to be 1 (the default is ignored). However, this doesn't work when the trackers are merged, since that process always leaves the destination tracker in the "set" state regardless of whether any of the merged trackers has ever seen any value. This patch fixes that by properly preserving the _is_set state on merge.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	84d8ee35d4	sstables: use correct source of statistics in get_encoding_stats_for_compaction() The sstable class is responsible for many more things than it should be. In particular, it takes care of both writing and reading sstables. The problem this causes is that it is very easy to confuse those two. This is what has happened in get_encoding_stats_for_compaction(). Originally, it was using _c_stats as a source of the statistics, which is used only during the write and per-partition. Needless to say, the returned encoding_stats were bogus. The correct source of those statistics is get_stats_metadata().	2019-02-07 10:16:50 +00:00
Paweł Dziepak	e315448d0a	tests/memtable: test column tracking for encoding stats	2019-02-07 10:16:50 +00:00
Paweł Dziepak	591d5195a9	tests/data_model: allow specifying timestamp for row markers	2019-02-07 10:16:50 +00:00
Paweł Dziepak	b07cba6a89	tests/data_model: add support for expiring atomic cells	2019-02-07 10:16:50 +00:00
Paweł Dziepak	aab0b7360f	tests: extract mutation data model	2019-02-07 10:16:50 +00:00
Paweł Dziepak	fa216be260	tests/sstable3: improvements to file compare This patch introduces some improvements to file comparison: - exception flags are set so that any error triggers an exception, guaranteeing that errors are not silently ignored - std::ios_base::binary flag is passed to open() - istreambuf_iterator is used instead of istream_iterator. It is better suited for comparing binary data.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	bc61471132	sstables/mc/writer: don't assume all schema columns are present The writer constructor prepares lists of present static and regular columns, those should be used for any further checks.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	0132bcc035	sstables: seal_statistics(): pass encoding_stats by constant reference	2019-02-07 10:16:50 +00:00
Paweł Dziepak	341f186933	memtable: move encoding_stats_collector implementation out of header	2019-02-07 10:16:50 +00:00
Paweł Dziepak	6d5c1a9813	sstables: compaction: don't access moved-from vector of sstables	2019-02-07 10:16:50 +00:00
Paweł Dziepak	a8a45a243b	tests/cql_test_env: don't override tmpdir::path The interface tmpdir::path isn't properly encapsulated and its users can modify the path even though they really shouldn't. This can happen accidentally: in cql_test_env, a reference to tmpdir::path was created and later assigned to in one of the code paths. This caused the tmpdir destructor to remove the wrong directory at program exit. This patch solves the problem by avoiding referencing tmpdir::path; a copy is perfectly acceptable considering that this is tests-only code. Message-Id: <20190206173046.26801-1-pdziepak@scylladb.com>	2019-02-06 20:55:40 +02:00
Takuya ASADA	96b1cb97ba	dist/ami: don't cleanup build dir rm -rf build/* was meant to start rpm building from a clean state, but it also deletes the built scylla binaries, so it was not a good idea. Instead of rm -rf build/*, we can check for file existence in the cloned directory, and if it looks good we can reuse it. We also need to run git pull on each package repo, since it may not include the latest commit. Fixes #4189 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190206101755.2056-1-syuu@scylladb.com>	2019-02-06 15:33:09 +02:00
Nadav Har'El	3e7dc7230d	build_deb.sh: fix error message The error message was apparently copied from the RPM script. Fix it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190205162148.20698-1-nyh@scylladb.com>	2019-02-05 18:22:36 +02:00
Avi Kivity	54748ad15b	Merge "Allow non-key IN restrictions" from Piotr " Fixes #4193 Fixes #3795 This series enables handling IN restrictions for regular columns, which is needed by both filtering and indexing mechanisms. Tests: unit (release) " * 'allow_non_key_in_restrictions' of https://github.com/psarna/scylla: tests: add filtering with IN restriction test cql3: remove unused can_have_only_one_value function cql3: allow non-key IN restrictions	2019-02-05 17:30:35 +02:00
Piotr Sarna	45db5da51b	tests: add filtering with IN restriction test Test case for filtering regular columns with IN restriction is added.	2019-02-05 16:04:17 +01:00
Piotr Sarna	36609d1376	cql3: remove unused can_have_only_one_value function	2019-02-05 16:04:17 +01:00
Piotr Sarna	c178ed8b16	cql3: allow non-key IN restrictions Restricting a regular column with IN restriction is a perfectly valid case for filtering and indexing, so it should be allowed. Fixes #4193 Fixes #3795	2019-02-05 15:50:17 +01:00
Rafael Ávila de Espíndola	84542dadfa	sstables: delete_atomically: don't drop futures We still allow the delete of rows from system.large_partition to run in parallel with the sstable deletion, but now we return a future that waits for both. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190205001526.68774-1-espindola@scylladb.com>	2019-02-05 16:47:58 +02:00
Calle Wilund	ba6a8ef35b	tls: Use a default prio string disabling TLS1.0 forcing min 128bits Fixes #4010 Unless the user sets this explicitly, we should try to avoid deprecated protocol versions. While gnutls should do this for connections initiated thusly, clients such as drivers etc. might use obsolete versions. Message-Id: <20190107131513.30197-1-calle@scylladb.com>	2019-02-05 15:34:18 +02:00
Avi Kivity	6c71eae63f	Merge "API: Stream compaction history records" from Amnon " get_compaction_history can return a lot of records which will add up to a big http reply. This series makes sure it will not create large allocations when returning the results. It adds an api to the query_processor to use paged queries with a consumer function that returns a future, this way we can use the http stream after each record. This implementation will prevent large allocations and stalls. Fixes #4152 " * 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev: tests/query_processor_test: add query_with_consumer_test system_keyspace, api: stream get_compaction_history query_processor: query and for_each_cql_result with future	2019-02-05 14:16:36 +02:00
Avi Kivity	ebf179318c	Merge "SI: Add virtual columns to underlying MV" from Duarte " Virtual columns are MV-specific columns that contribute to the liveness of view rows. However, we were not adding those columns when creating an index's underlying MV, causing indexes to miss base rows. Fixes #4144 Branches: master, branch-3.0 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'sec-index/virtual-columns/v1' of https://github.com/duarten/scylla: tests/secondary_index_test: Add reproducer for #4144 index/secondary_index_manager: Add virtual columns to MV	2019-02-05 13:26:45 +02:00
Avi Kivity	367ef8d318	Merge "provide our own, relocatable, python3 interpreter" from Glauber " We would like to deploy Scylla in constrained environments where internet access is not permitted. In those environments it is not possible to acquire the dependencies of Scylla from external repos, and the packages have to be shipped alongside their dependencies. In older distributions, like CentOS7, there isn't a python3 interpreter available. And while we can package one from EPEL, this tends to break in practice when installing the software in older patchlevels (for instance, installing into RHEL7.3 when the latest is RHEL7.5). The reason for that, as we saw in practice, is that EPEL may not respect RHEL patchlevels and have the python interpreter depending on newer versions of some system libraries. virtualenv can be used to create isolated python environments, but it is not designed for full isolation and I hit at least two roadblocks in practice: 1) It doesn't copy the files, linking some instead. There is an --always-copy option but it has been broken for years in some distributions. 2) Even when the above works, it still doesn't copy some files, relying on the system files instead (one sad example was the subprocess module that was just kept in the system and not moved to the virtualenv). This patch solves that problem by creating a python3 environment in a directory with the modules that Scylla uses, and nothing else. It is essentially doing what virtualenv should do but doesn't. Once this environment is assembled the binaries are then made relocatable the same way the Scylla binary is. One difference (for now) between the Scylla binary relocation process and ours is that we steer away from LD_LIBRARY_PATH: the environment variable is inherited by any child process stemming from the caller, which means that we are unable to use the subprocess module to call system binaries like mkfs (which our scripts do a lot).
Instead, we rely on RUNPATH to tell the binary where to search for its libraries. Once we generate an archive with the python3 interpreter, we then package it as an rpm with barely any dependencies. The dependencies listed are: $ rpm -qpR scylla-relocatable-python3-3.6.7-1.el7.x86_64.rpm rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(PayloadIsXz) <= 5.2-1 And the total size of that rpm, with all modules scylla needs, is 20MB. The Scylla rpm now has a much more modest dependency list: $ rpm -qpR scylla-server-666.development-0.20190121.80b7c7953.el7.x86_64.rpm | sort | uniq /bin/sh curl file hwloc kernel >= 3.10.0-514 mdadm pciutils rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(PayloadIsXz) <= 5.2-1 scylla-conf scylla-relocatable-python3 <== our python3 package. systemd-libs util-linux xfsprogs I have tested this end to end by generating RPMs from our master branch, then installing them in a clean CentOS7.3 installation without even using yum, just rpm -Uhv <package_list> Then I called scylla_setup to make sure all python scripts were working and started Scylla successfully. " * 'scylla-python3-v5' of github.com:glommer/scylla: Create a relocatable python3 interpreter spec file: fix python3 dependency list. fixup scripts before installing them to their final location automatically relocate python scripts make scyllatop relocatable use relative paths for installing scylla and iotune binaries	2019-02-05 12:53:34 +02:00
Amnon Heiman	c96c3ce9e8	tests/query_processor_test: add query_with_consumer_test This patch adds a unit test for querying with a consumer function. Query with consumer uses paging; the test covers the scenarios where the number of rows is below and above the page size, and it also tests the option to stop in the middle of reading. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 12:35:53 +02:00
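The interaction the test exercises (rows below or above the page size, and a consumer that stops mid-read) can be sketched with an in-memory table. This is a minimal, hypothetical model: `stop_iteration_model` and `for_each_row_paged` are invented names, not Scylla's query_processor API, and pages are plain slices of a vector rather than real paged queries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the consumer's "stop?" return value.
enum class stop_iteration_model { no, yes };

// Reads `table` page_size rows at a time and hands each row to `consumer`
// until the rows run out or the consumer asks to stop.
// Returns the number of rows consumed; *pages_fetched counts pages read.
std::size_t for_each_row_paged(const std::vector<int>& table, std::size_t page_size,
                               const std::function<stop_iteration_model(int)>& consumer,
                               std::size_t* pages_fetched) {
    std::size_t consumed = 0;
    *pages_fetched = 0;
    if (page_size == 0) {
        return 0;  // guard against an endless loop
    }
    for (std::size_t offset = 0; offset < table.size(); offset += page_size) {
        ++*pages_fetched;  // one "query" per page
        const std::size_t end = std::min(table.size(), offset + page_size);
        for (std::size_t i = offset; i < end; ++i) {
            ++consumed;
            if (consumer(table[i]) == stop_iteration_model::yes) {
                return consumed;  // stop in the middle of a page
            }
        }
    }
    return consumed;
}
```

With five rows and a page size of 2, a consumer that never stops reads three pages, while one that stops at the third row reads only two.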
Amnon Heiman	6c7742d616	system_keyspace, api: stream get_compaction_history get_compaction_history can return a big chunk of data. To prevent large memory allocations, get_compaction_history now reads each compaction_history record and uses the http stream to send it. Fixes #4152 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Amnon Heiman	c0e3b7673d	query_processor: query and for_each_cql_result with future query and for_each_cql_result accept a function that reads a row and returns a stop_iterator. This implementation of those functions accepts a function that returns a future stop_iterator, allowing preemption between calls. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Glauber Costa	afed2cddae	Create a relocatable python3 interpreter We would like to deploy Scylla in constrained environments where internet access is not permitted. In those environments it is not possible to acquire the dependencies of Scylla from external repos and the packages have to be shipped along with their dependencies. In older distributions, like CentOS7, there isn't a python3 interpreter available. And while we can package one from EPEL this tends to break in practice when installing the software in older patchlevels (for instance, installing into RHEL7.3 when the latest is RHEL7.5). The reason for that, as we saw in practice, is that EPEL may not respect RHEL patchlevels and have the python interpreter depending on newer versions of some system libraries. virtualenv can be used to create isolated python environments, but it is not designed for full isolation and I hit at least two roadblocks in practice: 1) It doesn't copy the files, linking some instead. There is an --always-copy option but it is broken (for years) in some distributions. 2) Even when the above works, it still doesn't copy some files, relying on the system files instead (one sad example was the subprocess module that was just kept in the system and not moved to the virtualenv) This patch solves that problem by creating a python3 environment in a directory with the modules that Scylla uses, and nothing else. It is essentially doing what virtualenv should do but doesn't. Once this environment is assembled the binaries are then made relocatable the same way the Scylla binary is. One difference (for now) between the Scylla binary relocation process and ours is that we steer away from LD_LIBRARY_PATH: the environment variable is inherited by any child process stemming from the caller, which means that we are unable to use the subprocess module to call system binaries like mkfs (which our scripts do a lot). Instead, we rely on RUNPATH to tell the binary where to search for its libraries. 
In terms of the python interpreter, PYTHONPATH does not need to be set for this to work as the python interpreter will include the lib directory in its PYTHONPATH. To confirm this, we executed the following code: bin/python3 -c "import sys; print('\n'.join(sys.path))" with the interpreter unpacked to both /home/centos/glaubertmp/test/ and /tmp. It yields respectively: /home/centos/glaubertmp/test/lib64/python36.zip /home/centos/glaubertmp/test/lib64/python3.6 /home/centos/glaubertmp/test/lib64/python3.6/lib-dynload /home/centos/glaubertmp/test/lib64/python3.6/site-packages and /tmp/python/lib64/python36.zip /tmp/python/lib64/python3.6 /tmp/python/lib64/python3.6/lib-dynload /tmp/python/lib64/python3.6/site-packages This was tested by moving the .tar.gz generated on my Fedora28 laptop to a CentOS machine without python3 installed. I could then invoke ./scylla_python_env/python3 and use the interpreter to call 'ls' through the subprocess module. I have also tested that we can successfully import all the modules we listed for installation and that we can read a sample yaml file (since PyYAML depends on the system's libyaml, we know that this works) Time to build: real 0m15.935s user 0m15.198s sys 0m0.382s Final archive size (uncompressed): 81MB Final archive size (compressed) : 25MB Signed-off-by: Glauber Costa <glauber@scylladb.com> -- v3: - rewrite in python3 - do not use temporary directories, add directly to the archive. Only the python binary has to be materialized - Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list v2: - do not use yum, resolve dependencies from installed packages instead - move to scripts as Avi wants this not only for old offline CentOS	2019-02-04 18:02:40 -05:00
Glauber Costa	f757b42ba7	spec file: fix python3 dependency list. The dependency list as it was did not reflect the fact that scyllatop is now written in python3. Some packages, like urwid, should use the python3 version. CentOS doesn't really have an urwid package for python3, not even in EPEL. So this officially marks the point at which we can't build packages that will install in CentOS7 anyway. Luckily, we will soon be providing our own python3 interpreter. But for now, as a first step, simplify the dependency list by removing the CentOS/Fedora conditional and listing the full python3 list Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:40 -05:00
Glauber Costa	7052028752	fixup scripts before installing them to their final location Before installing python files to their final location in install.sh, replace them with a thunk so that they can work with our python3 interpreter. The way the thunk works, they will also work without our python3 interpreter, so unconditionally fixing them up is always safe. In this patch I opt to fix them up only at install time to simplify developers' lives, since they won't have to worry about this at all. Note about the rpm .spec file: since we are relying on a specific format for the shebangs, we shouldn't let rpmbuild mess with them. Therefore, we need to disable a global variable that controls that behavior (by default, Fedora rpmbuild will rewrite all shebangs to /usr/bin/python3) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:40 -05:00
Glauber Costa	3869628429	automatically relocate python scripts Given a python script at $DIR/script.py, this copies the script to $DIR/libexec/script.py.bin, fixes its shebang to use /usr/bin/env instead of an absolute path for the interpreter and replaces the original script with a thunk that calls into that script. PYTHONPATH is adjusted so that the original directory containing the script can also serve as a source of modules, as would be originally intended. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:39 -05:00
Glauber Costa	1bb65a0888	make scyllatop relocatable Right now the binary we distribute with scyllatop calls into /usr/lib/scylla/scyllatop/scyllatop.py unconditionally. Calling that is all that this binary does. This poses a problem for our relocatable process, since we don't want to be referring to absolute paths (And moreover, that is calling python whereas it should be calling python3) The scyllatop.py file includes a python3 shebang and is executable. Therefore, it is best to just create a link to that file and execute it directly Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 16:12:46 -05:00
Glauber Costa	e890b8af09	use relative paths for installing scylla and iotune binaries The answer is yes: if we install them in $root/opt, we should link to $root/opt Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 14:33:51 -05:00
Piotr Jastrzebski	834bec5cc9	Read shard awareness columns as dropped Without this, new versions of Scylla won't be able to start with system tables inherited from an older version that had shard awareness columns. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>	2019-02-04 18:43:11 +02:00
Rafael Ávila de Espíndola	bbd9dfcba7	Add a --split-dwarf option to configure.py It is off by default as it conflicts with distcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190204002706.15540-1-espindola@scylladb.com>	2019-02-04 18:42:16 +02:00
Benny Halevy	a9e1e0233a	Add a dev build mode to test.py Message-Id: <20190204162112.7471-2-espindola@scylladb.com>	2019-02-04 18:38:23 +02:00
Rafael Ávila de Espíndola	6243443591	Add a dev build mode The build times I got with a clean ccache were: ninja dev 10806.89s user 678.29s system 2805% cpu 6:49.33 total ninja release 28906.37s user 1094.53s system 2378% cpu 21:01.27 total ninja debug 18611.17s user 1405.66s system 2310% cpu 14:26.52 total With this version -gz is not passed to seastar's configure. It should probably be the responsibility of seastar's configure to do that, and I will send a separate patch to do it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190204162112.7471-1-espindola@scylladb.com>	2019-02-04 18:38:22 +02:00
Calle Wilund	9cadbaa96f	commitlog_replayer: Bugfix: finding truncation positions uses local var ref "uuid" was captured by reference in a continuation. This works 99.9% of the time because the continuation is not actually delayed (and assuming we begin the checks with non-truncated (system) cf:s it works). But if the continuation is delayed, the resulting cf map will be borked. Fixes #4187. Message-Id: <20190204141831.3387-1-calle@scylladb.com>	2019-02-04 16:51:13 +02:00
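The failure mode can be modeled outside Seastar. In this sketch, continuations are assumed to be plain `std::function` objects that are queued and run only after the scheduling loop finishes, so a lambda that captures by reference sees the variable's value at run time, not at scheduling time. `run_deferred_checks` is a hypothetical name. In the real commitlog_replayer the referenced variable could be dangling rather than merely overwritten; the sketch reuses one live variable so the outcome stays well defined.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: continuations are queued and only run after the loop
// that scheduled them has finished, as happens when a future is not ready.
std::vector<std::string> run_deferred_checks(const std::vector<std::string>& uuids,
                                             bool capture_by_value) {
    std::vector<std::function<void()>> continuations;
    std::vector<std::string> seen;
    std::string uuid;  // one variable reused across iterations
    for (const std::string& id : uuids) {
        uuid = id;
        if (capture_by_value) {
            // Each continuation keeps its own copy of the current uuid.
            continuations.push_back([uuid, &seen] { seen.push_back(uuid); });
        } else {
            // Bug: every continuation refers to the same variable and sees
            // whatever value it holds when the continuation finally runs.
            continuations.push_back([&uuid, &seen] { seen.push_back(uuid); });
        }
    }
    for (auto& c : continuations) {
        c();  // "delayed" execution
    }
    return seen;
}
```

If each continuation instead ran immediately (the common, non-delayed case), both variants would record the same values, which is why the bug only showed up occasionally.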
Rafael Ávila de Espíndola	15a515a39b	build: Don't link utils/gz/gen_crc_combine_table with seastar It doesn't use seastar, so there is no point in linking with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190203214145.43009-1-espindola@scylladb.com>	2019-02-04 15:43:16 +02:00
Botond Dénes	2a67355ded	multishard_combining_reader: better shard selection algorithm The multishard reader has to combine the output of all shards into a single fragment stream. To do that, each time a `partition_start` is read it has to check if there is another partition, from another shard, that has to be emitted before this partition. Currently for this it uses the partitioner. At every partition start fragment it checks if the token falls into the current shard sub-range. The shard sub-range is the continuous range of tokens, where each token belongs to the same shard. If the partition doesn't belong to the current shard sub-range the multishard reader assumes the following shard sub-range of the next shard will have data and moves over to it. This assumption will however only hold for very dense tables, and will fail miserably on less dense tables, resulting in the multishard reader effectively iterating over the shard sub-ranges (4096 in the worst case), only to find data in just a few of them. This resulted in high user-perceived latency when scanning a sparse table. This patch replaces this algorithm with one based on a shard heap. The shards are now organized into a min-heap, by the next token they have data for. When a partition start fragment is read from the current shard, its token is compared to the smallest token in the shard heap. If smaller, we continue to read from the current shard. Otherwise we move to the shard with the smallest token. When constructing the reader, or after fast-forwarding, we don't know what first token each reader will produce. To avoid reading in a partition from each reader, we assume each reader will produce the first token from the first shard sub-range that overlaps with the query range. This algorithm performs much better on sparse tables, while also being slightly better on dense tables. I did only a very rough measurement using CQL tracing. 
I populated a table with four rows on a 64-shard machine, then scanned the entire table. Time to scan the table (microseconds): before 27'846 after 5'248 Fixes: #4125 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>	2019-02-04 14:10:23 +02:00
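The shard-heap selection described above can be sketched in a simplified, synchronous form. Assumptions: tokens are plain `int64_t` values, each shard's partitions are known up front as a sorted vector, and ties are broken in favor of the current shard. `shard_cursor` and `merge_by_token` are hypothetical names; the real reader works on asynchronous fragment streams and, as the message explains, estimates each shard's first token instead of reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model: each shard holds its partitions' tokens in ascending order.
struct shard_cursor {
    std::vector<int64_t> tokens;
    std::size_t pos = 0;
    bool exhausted() const { return pos == tokens.size(); }
    int64_t peek() const { return tokens[pos]; }
};

// Emits (shard, token) pairs in global token order. Shards other than the
// current one sit in a min-heap keyed by the next token they have data for.
std::vector<std::pair<std::size_t, int64_t>> merge_by_token(std::vector<shard_cursor> shards) {
    using entry = std::pair<int64_t, std::size_t>;  // (next token, shard id)
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    for (std::size_t s = 0; s < shards.size(); ++s) {
        if (!shards[s].exhausted()) {
            heap.push({shards[s].peek(), s});
        }
    }
    std::vector<std::pair<std::size_t, int64_t>> out;
    if (heap.empty()) {
        return out;
    }
    std::size_t current = heap.top().second;
    heap.pop();
    while (true) {
        shard_cursor& cur = shards[current];
        if (cur.exhausted()) {
            if (heap.empty()) {
                break;
            }
            current = heap.top().second;  // switch to the shard with the smallest token
            heap.pop();
            continue;
        }
        const int64_t tok = cur.peek();
        if (heap.empty() || tok <= heap.top().first) {
            out.emplace_back(current, tok);  // keep reading from the current shard
            ++cur.pos;
        } else {
            heap.push({tok, current});       // park the current shard...
            current = heap.top().second;     // ...and move to the smallest token
            heap.pop();
        }
    }
    return out;
}
```

On a sparse table this only touches shards that actually have data, instead of walking every shard sub-range in turn.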
Piotr Sarna	11e6d88ca7	tests: supplement filtering collections with more cases Filtering test cases for collections are supplemented with checking whether CONTAINS works correctly for sets and maps. Message-Id: <4a684152cdcdb65e1415ba5859699cb324312c2b.1548837150.git.sarna@scylladb.com>	2019-02-03 17:19:30 +02:00
Avi Kivity	468f8c7ee7	Merge "Print a warning if a row is too large" from Rafael " This is a first step in fixing #3988. " * 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla: Rename large_partition_handler Print a warning if a row is too large Remove default parameter value Rename _threshold_bytes to _partition_threshold_bytes keys: add schema-aware printing for clustering_key_prefix	2019-02-03 13:57:42 +02:00
Nadav Har'El	5a695b8029	Materialized views: fix three error messages Three error messages were supposed to include a column name, but a "{}" was missing in the format so the given column name didn't actually appear in the error message. So this patch adds the missing {}'s. Fixes #4183. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190203112100.13031-1-nyh@scylladb.com>	2019-02-03 12:23:29 +01:00
Tomasz Grabiec	72dd6f54e3	gdb: Print total amount of memory used by small and large allocations Message-Id: <1548956406-7601-2-git-send-email-tgrabiec@scylladb.com>	2019-02-01 13:18:16 +00:00
Tomasz Grabiec	f48fa542fc	gdb: Extend 'scylla memory' to show memory used by large allocations Adds new columns to the "Page spans" table named "large [B]" and "[spans]", which show how much memory is allocated in spans of a given size. Excludes spans used by small pools. Useful for determining the sizes of the large allocations that consume the memory. Example output: Page spans: index size [B] free [B] large [B] [spans] 0 4096 4096 4096 1 1 8192 32768 0 0 2 16384 16384 0 0 3 32768 98304 2785280 85 4 65536 65536 1900544 29 5 131072 524288 471597056 3598 ... 31 8796093022208 0 0 0 Large allocations: 484675584 [B] Message-Id: <1548956406-7601-1-git-send-email-tgrabiec@scylladb.com>	2019-02-01 13:18:01 +00:00
Asias He	28d6d117d2	migration_manager: Fix nullptr dereference in maybe_schedule_schema_pull Commit `976324bbb8` switched to get_application_state_ptr to get a pointer to the application_state. It may return nullptr, which is then dereferenced unconditionally. In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw: 4 nodes in the tests n1, n2, n3, n4 are started n1 is stopped n1 is changed to use different shard config n1 is restarted ( 2019-01-27 04:56:00,377 ) The backtrace happened on n2 right after n1 restarts: 0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled 1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled 2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled 3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed) 4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status = 5 Segmentation fault on shard 0. 6 Backtrace: 7 0x00000000041c0782 8 0x00000000040d9a8c 9 0x00000000040d9d35 10 0x00000000040d9d83 11 /lib64/libpthread.so.0+0x00000000000121af 12 0x0000000001a8ac0e 13 0x00000000040ba39e 14 0x00000000040ba561 15 0x000000000418c247 16 0x0000000004265437 17 0x000000000054766e 18 /lib64/libc.so.6+0x0000000000020f29 19 0x00000000005b17d9 We do not know when this backtrace happened, but according to the logs from n3 and n4: INFO 2019-01-27 04:56:22,154 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL INFO 2019-01-27 04:56:21,594 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL We can be sure the backtrace on n2 happened before 04:56:21 - 19 seconds (the delay before gossip notices that a peer is down), so the abort time is around 04:56:0X. 
The migration_manager::maybe_schedule_schema_pull that triggers the backtrace must have been scheduled before n1 was restarted, because it dereferences the application_state pointer after sleeping 60 seconds, so the time maybe_schedule_schema_pull is called is around 04:55:0X, which is before n1 is restarted. So my theory is: migration_manager::maybe_schedule_schema_pull is scheduled, at this time n1 has SCHEMA application_state, when n1 restarts, n2 gets new application state from n1 which does not have SCHEMA yet, when migration_manager::maybe_schedule wakes up from the 60-second sleep, n1 has non-empty endpoint_state but empty application_state for SCHEMA. We dereference the nullptr application_state and abort. Fixes: #4148 Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test Message-Id: <9ef33277483ae193a49c5f441486ee6e045d766b.1548896554.git.asias@scylladb.com>	2019-02-01 09:01:08 +02:00
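The message describes the crash but not the exact shape of the fix. One plausible defensive pattern, sketched under the assumption that endpoint state is a map from state kind to value, is to re-read the state after the delayed wake-up and treat a missing SCHEMA entry as "nothing to pull yet" instead of dereferencing the pointer. All names below are hypothetical simplifications, not Scylla's gossiper or migration_manager API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for gossip endpoint state.
enum class app_state_model { STATUS, SCHEMA };

struct endpoint_state_model {
    std::map<app_state_model, std::string> states;

    // Like a *_ptr accessor: returns nullptr when the state is absent.
    const std::string* get_state_ptr(app_state_model s) const {
        auto it = states.find(s);
        return it == states.end() ? nullptr : &it->second;
    }
};

// Called after the delayed wake-up: checks the pointer before using it.
std::optional<std::string> schema_version_after_wakeup(const endpoint_state_model& ep) {
    const std::string* schema = ep.get_state_ptr(app_state_model::SCHEMA);
    if (schema == nullptr) {
        return std::nullopt;  // e.g. the peer restarted and has not gossiped SCHEMA yet
    }
    return *schema;
}
```

The key point is that state observed before the 60-second sleep cannot be assumed to still exist after it.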
Piotr Jastrzebski	ad217bbdc7	Revert "system_keyspace: add sharding information to local table" This reverts commit `bdce561ada`. Those columns are not used and cause problems with tools. Refs #4112 Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>	2019-01-31 19:06:55 +01:00
Avi Kivity	9adf46b50e	Update seastar submodule * seastar 2f35731...c3be06d (1): > rpc: support closing streaming when only sink or source was created Ref #4124.	2019-01-31 12:39:02 +02:00
Nadav Har'El	7b9b7f8ebc	docs/metrics.md: document syntax for choosing specific instance/shard As another useful example of Prometheus syntax, show the syntax of plotting a graph for one particular node or shard. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190129221607.11813-1-nyh@scylladb.com>	2019-01-31 12:37:30 +02:00
Asias He	9d9ecda619	repair: Log keyspace and table name in repair_cf_range When a repair failed, we saw logs like: repair - Checksum of range (8235770168569320790, 8235957818553794560] on 127.0.0.1 failed: std::bad_alloc (std::bad_alloc) It is hard to tell which keyspace and table has failed. To fix, log the keyspace and table name. It is useful to know when debugging. Fixes #4166 Message-Id: <8424d314125b88bf5378ea02a703b0f82c2daeda.1548818669.git.asias@scylladb.com>	2019-01-31 12:36:46 +02:00
Gleb Natapov	a70374d982	messaging_service: do not forget to close stream when sending it to another side failed Fixes #4124 Message-Id: <20190131091857.GC3172@scylladb.com>	2019-01-31 12:01:56 +02:00
Piotr Jastrzebski	4b47094f30	Prevent undefined behaviour while writing range tombstones in LA/KA Stop calling .remove_suffix on empty string_view. ck_bview can be empty because this function can be called for a half open range tombstone. It is impossible to write such range tombstones to LA/KA SSTables so we should throw a proper exception instead of allowing an undefined behaviour. Refs #4113 Tests: unit(release) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c3738916953e4b10812aed95e645c739b4c29462.1548777086.git.piotr@scylladb.com>	2019-01-31 10:58:19 +01:00
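A guard of this kind can be sketched with `std::string_view`, whose `remove_suffix(n)` has undefined behavior when `n > size()`. The helper name `strip_last_byte`, the one-byte suffix, and the choice of `std::invalid_argument` are assumptions for illustration; the message does not say how many bytes the real code strips or which exception it throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>

// Hypothetical helper: strips a one-byte trailing marker from a serialized
// clustering-key view. An empty view (a half-open range tombstone bound)
// cannot be written in LA/KA format, so reject it instead of invoking UB.
std::string_view strip_last_byte(std::string_view ck_bview) {
    if (ck_bview.empty()) {
        throw std::invalid_argument("half-open range tombstone cannot be written to LA/KA sstables");
    }
    ck_bview.remove_suffix(1);
    return ck_bview;
}
```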
Glauber Costa	94ead559f7	move scylla-housekeeping to dist/common/scripts All of our python scripts are there and they are all installed automatically into /usr/lib/scylla. By keeping scylla-housekeeping separate we are just complicating our build process. This would be just a minor annoyance but this broke the new relocatable process for python3 that I am trying to put together because I forgot to add the new location as a source for the scripts. Therefore, I propose we start being more diligent with this and keep all scripts together in the future. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190123191732.32126-2-glauber@scylladb.com>	2019-01-31 11:44:34 +02:00
Jesse Haber-Kucharsky	c37aa258c5	build: Fix incremental builds when Seastar changes When a file in the `seastar` directory changes, we want to minimize the amount of Scylla artifacts that are re-built while ensuring that all changes in Seastar are reflected in Scylla correctly. For compiling object files, we change Seastar to be an "order only" dependency so that changes to Seastar don't trigger unnecessary builds. For linking, we add an "implicit" dependency on Seastar so that Scylla is re-linked when Seastar changes. With these changes, modifying a Seastar header file will trigger the recompilation of the affected Scylla object files, and modifying a Seastar source file will trigger linking only. Fixes #4171 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <0ab43d79ce0d41348238465d1819d4c937ac6414.1548906335.git.jhaberku@scylladb.com>	2019-01-31 11:00:40 +02:00
Raphael S. Carvalho	930f8caff9	sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction A fully expired sstable is not added to the compacting set, meaning it's not actually compacted, but it's kept in a list of sstables which incremental compaction uses to check if any sstable can be replaced. Incremental compaction was unconditionally removing the expired sstable from the compacting set, which led to a segfault because the end iterator was passed to erase. The fix changes sstable_set::erase() to follow the standard behavior for erase functions, which works even if the target element is not present. Fixes #4085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com>	2019-01-30 16:32:45 +00:00
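The erase convention the fix adopts can be illustrated with a toy container. `sstable_set_model` and its integer generation keys are hypothetical; the message does not show the real `sstable_set::erase()` signature, so this only demonstrates the standard-library-style contract: erasing an absent element is a harmless no-op rather than an erase of `end()`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for sstable_set, keyed by sstable generation.
class sstable_set_model {
    std::unordered_set<int> _generations;
public:
    void insert(int gen) { _generations.insert(gen); }

    // Returns the number of elements removed (0 or 1), like the key-based
    // std::unordered_set::erase overload.
    std::size_t erase(int gen) {
        auto it = _generations.find(gen);
        if (it == _generations.end()) {
            return 0;  // erasing end() would be undefined behavior, so do nothing
        }
        _generations.erase(it);
        return 1;
    }

    std::size_t size() const { return _generations.size(); }
};
```

Under this contract, the caller can unconditionally erase a fully expired sstable that was never added to the compacting set.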
Avi Kivity	056b6a4439	Update seastar submodule * seastar 07e1ed3...2f35731 (1): > Merge " Initial seastar ipv6 support" from Calle	2019-01-30 17:41:39 +02:00
Avi Kivity	1224cde871	Merge "Make perf_simple_query produce JSON results" from Paweł " This series enhances perf_simple_query error reporting by adding an option to produce a json file containing the results. The format of that file is very similar to the results produced by perf_fast_forward in order to ease integration with any tools that may want to interpret them. In addition to that, perf_simple_query now prints the median, median absolute deviation, minimum and maximum of the partial results to the standard output, so that there is no need for external scripts to compute those values. " * tag 'perf_simple_query-json/v1' of https://github.com/pdziepak/scylla: perf_simple_query: produce json results perf_simple_query: calculate and print statistics perf: time_parallel: return results of each iteration perf_simple_query: take advantage of threads in main()	2019-01-30 17:39:19 +02:00
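The summary statistics listed above are cheap to compute. This is a minimal sketch that assumes the partial results arrive as a non-empty vector of doubles. `summarize` is a hypothetical helper, not perf_simple_query's code. The median absolute deviation is the median of |x_i - median(x)|, so a single outlier run barely moves it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct summary {
    double median;
    double mad;  // median absolute deviation
    double min;
    double max;
};

// Takes a copy so it can sort without disturbing the caller's data.
double median_of(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Assumes `results` is non-empty.
summary summarize(const std::vector<double>& results) {
    const double med = median_of(results);
    std::vector<double> deviations;
    deviations.reserve(results.size());
    for (double r : results) {
        deviations.push_back(std::abs(r - med));
    }
    const auto [lo, hi] = std::minmax_element(results.begin(), results.end());
    return {med, median_of(deviations), *lo, *hi};
}
```

For example, results {10, 12, 11, 50, 9} give a median of 11 and a MAD of 1. The outlier 50 shows up in the maximum, but it does not inflate the spread estimate as it would a standard deviation.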
Paweł Dziepak	6a0ee5dbbf	Merge "Simpler fix for the memtable reader's fragment monotonicity violation" from Botond " Recently it was discovered that the memtable reader (partition_snapshot_reader to be more precise) can violate mutation fragment monotonicity, by re-emitting range tombstones when those overlap with more than one ck range of the partition slice. This was fixed by `7049cd9`, however after that fix was merged a much simpler fix was proposed by Tomek, one that doesn't involve nearly as many changes to the partition snapshot reader and hence poses less risk of breaking it. This mini-series reverts the previous fix, then applies the new, simpler one. Refs: #4104 " * 'partition-snapshot-reader-simpler-fix/v2' of https://github.com/denesb/scylla: partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"	2019-01-30 15:24:31 +00:00
Jesse Haber-Kucharsky	b39eac653d	Switch to the CMake-ified Seastar Committer: Avi Kivity <avi@scylladb.com> Branch: next Switch to the CMake-ified Seastar This change allows Scylla to be compiled against the `master` branch of Seastar. The necessary changes: - Add `-Wno-error` to prevent a Seastar warning from terminating the build - The new Seastar build system generates the pkg-config files (for example, `seastar.pc`) at configure time, so we don't need to invoke Ninja to generate them - The `-march` argument is no longer inherited from Seastar (correctly), so it needs to be provided independently - Define `SEASTAR_TESTING_MAIN` so that the definition of an entry point is included for all unit test compilation units - Independently link Scylla against Seastar's compiled copy of fmt in its build directory - All test files use the (now public) Seastar testing headers - Add some missing Seastar headers to source files [avi: regenerate frozen toolchain, adjust seastar submodule] Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>	2019-01-30 11:17:38 +02:00
Botond Dénes	8d59c36165	partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges When entering a new ck range (of the partition-slice), the partition snapshot reader will apply to its range tombstones stream all the tombstones that are relevant to the new ck range. When the partition has range tombstones that overlap with multiple ck ranges, these will be applied to the range tombstone stream when entering any of the ck ranges they overlap with. This will result in the violation of the monotonicity of the mutation fragments emitted by the reader, as these range tombstones will be re-emitted on each ck range, if the ck range has at least one clustering row they apply to. For example, given the following partition: rt{[1,10]}, cr{1}, cr{2}, cr{3}... And a partition-slice with the following ck ranges: [1,2], [3, 4] The reader will emit the following fragment stream: rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ... Note how the range tombstone is emitted twice. In addition to violating the monotonicity guarantee, this can also result in an explosion of the number of emitted range tombstones. Fix by trimming range tombstones to the start of the current ck range, thus ensuring that they will not violate mutation fragment monotonicity guarantees. Refs: #4104 This is a much simpler fix for the above issue, than the already committed one (7049cd937A). The latter is reverted by the previous patch and this patch applies the simpler fix.	2019-01-30 10:01:13 +02:00
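The effect of trimming can be reproduced on the example above with a small model. Assumptions: clustering keys are ints, ranges are closed intervals, and within one ck range fragments are ordered by position with tombstones before rows on ties. `ck_range`, `fragment`, `emit_fragments` and `is_monotonic` are hypothetical names, and the model emits overlapping tombstones even when a range has no rows, a simplification of the real partition_snapshot_reader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified model: clustering keys are ints, ranges are closed.
struct ck_range { int start; int end; };

struct fragment {
    bool is_tombstone;
    int start;  // position in the clustering order
    int end;    // tombstone end; equal to start for clustering rows
};

// Emits, for each ck range of the slice, the overlapping range tombstones and
// the rows inside the range, ordered by position. With trim == true every
// tombstone is trimmed so it never starts before the current ck range.
std::vector<fragment> emit_fragments(const std::vector<ck_range>& tombstones,
                                     const std::vector<int>& rows,
                                     const std::vector<ck_range>& slice, bool trim) {
    std::vector<fragment> out;
    for (const ck_range& r : slice) {
        std::vector<fragment> batch;
        for (const ck_range& rt : tombstones) {
            if (rt.end < r.start || rt.start > r.end) {
                continue;  // no overlap with this ck range
            }
            batch.push_back({true, trim ? std::max(rt.start, r.start) : rt.start, rt.end});
        }
        for (int ck : rows) {
            if (ck >= r.start && ck <= r.end) {
                batch.push_back({false, ck, ck});
            }
        }
        // Within one ck range: order by position, tombstones before rows on ties.
        std::sort(batch.begin(), batch.end(), [](const fragment& a, const fragment& b) {
            if (a.start != b.start) {
                return a.start < b.start;
            }
            return a.is_tombstone && !b.is_tombstone;
        });
        out.insert(out.end(), batch.begin(), batch.end());
    }
    return out;
}

// Monotonicity: fragment positions never go backwards.
bool is_monotonic(const std::vector<fragment>& frags) {
    for (std::size_t i = 1; i < frags.size(); ++i) {
        if (frags[i].start < frags[i - 1].start) {
            return false;
        }
    }
    return true;
}
```

Without trimming, the model yields rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, which goes backwards. With trimming, the second tombstone becomes rt{[3,10]} and the stream stays monotonic.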
Nadav Har'El	9dd3c59c77	docs/metrics.md: explain Prometheus and Grafana docs/metrics.md so far explained just the REST API for retrieving current metrics from a single Scylla node. In this patch, I add basic explanations on how to use the Prometheus and Grafana tools included in the "scylla-grafana-monitoring" project. It is true that technically, what is being explained here doesn't come with the Scylla project and requires the separate scylla-grafana-monitoring to be installed as well. Nevertheless, most Scylla developers will need this knowledge eventually and surprisingly it appears it was never documented anywhere accessible to newbie developers, and I think metrics.md is the right place to introduce it. In fact, I myself wasn't aware until today that Prometheus actually had its own Web UI on port 9090, and that it is probably more useful for developers than Grafana is. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190129114214.17786-1-nyh@scylladb.com>	2019-01-29 15:46:06 +02:00
Duarte Nunes	35c03f41a4	Merge 'Fix multiple contains for one column' from Piotr " An error in validating CONTAINS restrictions against collections caused only the first restriction to be taken into account due to returning prematurely. This miniseries provides a fix for that as well as a matching test case. Tests: unit (release) Fixes #4161 " * 'fix_multiple_contains_for_one_column' of https://github.com/psarna/scylla: tests: enable CONTAINS tests for filtering cql3: remove premature return from is_satisfied_by cql3: restore indentation	2019-01-29 11:10:13 +00:00
Piotr Sarna	11aae54cca	tests: enable CONTAINS tests for filtering Tests for filtering with CONTAINS restrictions were not enabled, so they are now. Also, another case for having two CONTAINS restrictions for a single column is added. Refs #4161	2019-01-29 11:47:28 +01:00
Piotr Sarna	9595fec2ec	cql3: remove premature return from is_satisfied_by Function which checked whether a CONTAINS restriction is satisfied by a collection erroneously returned prematurely after checking just the first restriction - which works fine for the usual case, but fails if there are multiple CONTAINS restrictions present for a column. Fixes #4161	2019-01-29 11:47:28 +01:00
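The bug pattern is easy to reproduce in isolation. In this hypothetical model, a collection column is a `std::set<int>` and the CONTAINS restrictions on it are a list of values. `contains_first_only_buggy` and `contains_all` are invented names, not the cql3 `is_satisfied_by` code. They only contrast a loop that returns on its first iteration with one that checks every restriction.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Buggy shape: the loop returns after checking only the first CONTAINS
// value, so later restrictions on the same column are ignored.
bool contains_first_only_buggy(const std::set<int>& collection, const std::vector<int>& values) {
    for (int v : values) {
        return collection.count(v) > 0;  // premature return
    }
    return true;
}

// Fixed shape: the row matches only if every CONTAINS value is present.
bool contains_all(const std::set<int>& collection, const std::vector<int>& values) {
    for (int v : values) {
        if (collection.count(v) == 0) {
            return false;
        }
    }
    return true;
}
```

With a single CONTAINS restriction both functions agree, which is why the usual case kept working.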
Piotr Sarna	89af01315d	cql3: restore indentation	2019-01-29 11:47:28 +01:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola	1185138a34	Print a warning if a row is too large Tests: unit (release) Refs #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola	776d5bb9e2	Remove default parameter value The value is already passed by cql_table_large_partition_handler, so the default was just for nop_large_partition_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola	30528fa853	Rename _threshold_bytes to _partition_threshold_bytes A followup patch will add a threshold for rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola	561285488b	keys: add schema-aware printing for clustering_key_prefix For reporting large rows we have to be able to print clustering keys in addition to partition keys. Refs #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:01:54 -08:00
Paweł Dziepak	335dca54a5	perf_simple_query: produce json results	2019-01-28 16:36:06 +00:00
Paweł Dziepak	7d21c9c31f	perf_simple_query: calculate and print statistics	2019-01-28 16:36:06 +00:00
Paweł Dziepak	eb3d80fa2b	perf: time_parallel: return results of each iteration	2019-01-28 16:35:33 +00:00
Pekka Enberg	7bda3abbc6	toolchain/dbuild: Fix permission errors when SELinux is enabled Use the ":z" suffix to tell Docker to relabel file objects on shared volumes. This fixes filesystem access via dbuild when SELinux is enabled. Message-Id: <20190128160557.2066-1-penberg@scylladb.com>	2019-01-28 18:16:53 +02:00
Paweł Dziepak	6a1e1e8454	perf_simple_query: take advantage of threads in main()	2019-01-28 13:21:08 +00:00
Paweł Dziepak	11a1f97307	Merge "Fix cleanup of temporary sstable directories" from Benny " Cleanup of temporary sstable directories in distributed_loader::populate_column_family is completely broken and untested. This code path was never executed since populate_column_family doesn't currently list subdirectories at all. This patchset fixes this code path and scans subdirectories in populate_column_family. Also, a unit test is added for testing the cleanup of incomplete (unsealed) sstables. Fixes: #4129 " * 'projects/sst-temp-dir-cleanup/v3' of https://github.com/bhalevy/scylla: tests: add test_distributed_loader_with_incomplete_sstables tests: single_node_cql_env::do_with: use the provided data_file_directories path if available tests: single_node_cql_env::_data_dir is not used distributed_loader: populate_column_family should scan directories too sstables: fix is_temp_dir distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir distributed_loader: remove temporary sstable directories only on shard 0 distributed_loader: push future returned by rmdir into futures vector	2019-01-28 12:23:00 +00:00
Duarte Nunes	ea34e242de	Merge 'Do not use hints for view building' from Piotr " This series prevents view building from falling back to storing hints. Instead, it will try to send view updates to an endpoint as if it has consistency level ONE, and in case of failure retry the whole building step. Then, view building will never be marked as finished prematurely (because of pending hints), which will help avoid creating inconsistencies when decommissioning a node from the cluster. Tests: unit (release) dtest (materialized_views_test.py.) Fixes #3857 Fixes #4039 " 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla: db,view: add updating view_building_paused statistics database: add view_building_paused metrics table: make populate_views not allow hints db,view: add allow_hints parameter to mutate_MV storage_proxy: add allow_hints parameter to send_to_endpoint	2019-01-28 10:31:14 +00:00
Piotr Sarna	9a6261ca27	db,view: add updating view_building_paused statistics Each time view building is paused because of a connection failure, the view_building_paused metric is bumped.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30b0663d6	database: add view_building_paused metrics This metric exposes how many times the view building process was paused, e.g. because the target node was down or overloaded.	2019-01-28 09:38:42 +01:00
Piotr Sarna	5dec6dc6c6	table: make populate_views not allow hints View building uses populate_views to generate and send view updates. This procedure will now not allow hints to be used to acknowledge the write. Instead, the whole building step will be retried on failure. Fixes #3857 Fixes #4039	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV The mutate_MV function can now accept a parameter specifying whether hints should be allowed when sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default - cl::ONE.	2019-01-28 09:38:41 +01:00
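Here is a hedged sketch of the choice described above. The enum and the function `cl_for_send_to_endpoint` are hypothetical stand-ins, not storage_proxy's real types. They only show how the `allow_hints` flag maps to a consistency level.

```cpp
#include <cassert>

// Hypothetical stand-in for the two consistency levels involved.
enum class consistency_level { ANY, ONE };

// With hints allowed, a write may be acknowledged once a hint is stored
// (CL ANY). Without hints, at least one replica must accept it (CL ONE).
consistency_level cl_for_send_to_endpoint(bool allow_hints) {
    return allow_hints ? consistency_level::ANY : consistency_level::ONE;
}
```

Under CL ONE, a failed send surfaces as an error instead of being masked by a stored hint. That is what lets view building retry the step rather than mark the view as built.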
Rafael Ávila de Espíndola	5332ebd50c	Update the description of compaction_large_partition_warning_threshold_mb Despite the name, this option also controls if a warning is issued during memtable writes. Warning during memtable writes is useful but the option name also exists in cassandra, so probably the best we can do is update the description. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190125020821.72815-1-espindola@scylladb.com>	2019-01-28 09:09:35 +02:00
Takuya ASADA	5c6c008109	dist/ami: follow build script changes on -jmx/-tools/-ami packages We need to follow the changes to the rpm package build procedure in the -jmx/-tools/-ami packages, since it was changed when we merged the relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190127204436.13959-1-syuu@scylladb.com>	2019-01-28 09:08:32 +02:00
Takuya ASADA	7db1b45839	reloc: move relocatable libraries from /opt/scylladb/lib to /opt/scylladb/libreloc For Scylla 3rdparty tools, we add /opt/scylladb/lib to LD_LIBRARY_PATH. We use the same directory for relocatable binaries, including libc.so.6. Once we install both the scylla-env package and the relocatable version of the scylla-server package, the loader tries to load libc from /opt/scylladb/lib and the entire distribution becomes unusable. We may be able to use the Obsoletes or Conflicts tag on .rpm/.deb to avoid installing the new Scylla package together with scylla-env, but it's better & safer not to share the same directory for different purposes. Fixes #3943 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190128023757.25676-1-syuu@scylladb.com>	2019-01-28 09:04:56 +02:00
Avi Kivity	274f553485	tools: toolchain: run dbuild container with same timezone as host Make it easier to work interactively by not reporting surprising times. There are also reports that dtest fails with incorrect timezones, but those are probably bugs in dtest. Message-Id: <20190127134754.1428-1-avi@scylladb.com>	2019-01-27 22:48:42 +00:00
Duarte Nunes	aafaf840a2	tests/secondary_index_test: Add reproducer for #4144 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-01-27 22:30:34 +00:00
Duarte Nunes	aa476cd6c9	index/secondary_index_manager: Add virtual columns to MV Virtual columns are MV-specific columns that contribute to the liveness of view rows. However, we were not adding those columns when creating an index's underlying MV, causing indexes to miss base rows. Fixes #4144 Branches: master, branch-3.0 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-01-27 22:30:12 +00:00
Benny Halevy	36b6a3ebcf	tests: add test_distributed_loader_with_incomplete_sstables Test removal of sstables with temporary TOC file, with and without temporary sstable directory. Temporary sstable directories may be empty or still have leftover components in them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:48:24 +02:00
Benny Halevy	64a23ea3bc	tests: single_node_cql_env::do_with: use the provided data_file_directories path if available Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	441809094a	tests: single_node_cql_env::_data_dir is not used Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	74ef09a3a2	distributed_loader: populate_column_family should scan directories too To detect and clean up leftover temporary sstable directories. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	bd85975277	sstables: fix is_temp_dir 1. fs::canonical requires that the path exist, and there is no need for fs::canonical here. 2. fs::path::extension includes the leading dot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	c2a5f3b842	distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir populate_column_family currently lists only regular files, ignoring all directories. A later patch in this series allows it to also list directories so that it can clean up the temporary sstable directories, yet valid sub-directories, like staging|upload|snapshots, may still exist and need to be ignored. Other kinds of handling, like validating recognized sub-directories and halting on unrecognized sub-directories, are possible, yet out of scope for this patch(set). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
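The two points in the `is_temp_dir` fix can be illustrated with `std::filesystem`. For illustration only, this sketch assumes that temporary sstable directories carry a `.sstable` extension. The real suffix and signature of `sstable::is_temp_dir` may differ.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Assumption for illustration: temporary sstable directories are named
// "<generation>.sstable". The check is purely lexical, so the directory does
// not have to exist; fs::canonical, by contrast, fails on missing paths.
bool is_temp_dir(const fs::path& dirpath) {
    // extension() includes the leading dot: "42.sstable" -> ".sstable".
    return dirpath.extension() == ".sstable";
}
```

Comparing the extension against `"sstable"` without the dot would never match. That is the second bug the commit describes.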
Benny Halevy	9bd7b2f4e6	distributed_loader: remove temporary sstable directories only on shard 0 Similar to calling remove_sstable_with_temp_toc later on in populate_column_family(), we need only one thread to do the cleanup work and the existing convention is that it's shard 0. Since lister::rmdir is checking remove_file of all entries (recursively) and the dir itself, doing that concurrently would fail. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	bcfb2e509b	distributed_loader: push future returned by rmdir into futures vector	2019-01-27 14:14:32 +02:00
Asias He	ee0bb0aa94	tests: Drop the unsupported random_read mode in perf_sstable It is not supported. Remove it. Message-Id: <fe31e090574be96a9620b6902ceb843699d558d0.1548403105.git.asias@scylladb.com>	2019-01-25 14:24:40 +00:00
Avi Kivity	85abb13679	Merge "Fix cross shard cf usage" from Piotr " Lambda passed to distribute_reader_and_consume_on_shards shouldn't capture shard local variables. Fixes #4108 Tests: unit(release), dtest(update_cluster_layout_tests.TestLargeScaleCluster.add_50_nodes_test) " * 'haaawk/4108/v2' of github.com:scylladb/seastar-dev: Fix cross shard cf usage in repair Fix cross shard cf usage in streaming	2019-01-24 19:40:44 +02:00
Avi Kivity	d0f9e00e85	Merge " Support 64-bit gc_clock" (fixes) from Benny " Use int64_t in data::cell for expiry / deletion time. Extend time_overflow unit tests in cql_query_test to use select statements with and without bypass cache to access deeper into the system. Refs #3353 " * 'projects/gc_clock_64_fixes/v1' of https://github.com/bhalevy/scylla: tests: extend time_overflow unit tests data::cell: use int64_t for expiry and deletion time	2019-01-24 19:15:12 +02:00
Piotr Jastrzebski	fab1b7a3a2	Fix cross shard cf usage in repair Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 18:13:49 +01:00
Piotr Jastrzebski	1ac7283550	Fix cross shard cf usage in streaming Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 18:13:30 +01:00
Glauber Costa	ec66dd6562	scylla_setup: tell users about the possibility of a non-interactive session From day 1, scylla_setup can be run either interactively or through command line parameters. Still, one of the requests we get most often from users is whether we can provide them with a version of scylla_setup that they can call from their scripts. This probably happens because once you call a script interactively, it may not be totally obvious that a different mode is available. Even when we do tell users about that possibility, request number two is then "which flags do I pass?" The solution I am proposing is to just tell users the answers to those questions at the end of an interactive session. After this patch, we print the following message to the console: ScyllaDB setup finished. scylla_setup accepts command line arguments as well! For easily provisioning in a similar environmen than this, type: scylla_setup --no-raid-setup --nic eth0 --no-kernel-check \ --no-verify-package --no-enable-service --no-ntp-setup \ --no-node-exporter --no-fstrim-setup Also, to avoid the time-consuming I/O tuning you can add --no-io-setup and copy the contents of /etc/scylla.d/io* Only do that if you are moving the files into machines with the exact same hardware Notes on the implementation: it is unfortunate for these purposes that all our options are negated. Most conditionals are branching on true conditions, so although I could write this: args.no_option = not interactive_ask_service(...) if not args.no_option: ... I opted in this patch to write: option = interactive_ask_service(...) args.no_option = not option if option: ... There is an extra line and we have to update args separately, but it makes it harder to get confused by the double negation in the conditional. Let me know if there are disagreements here. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190124153832.21140-1-glauber@scylladb.com>	2019-01-24 17:41:26 +02:00
Benny Halevy	6efd85ed01	tests: extend time_overflow unit tests Test also cql select queries with and without bypass cache. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-24 15:55:06 +02:00
Benny Halevy	7373825473	data::cell: use int64_t for expiry and deletion time Ttl may still use int32_t to reduce footprint Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-24 15:55:06 +02:00
Takuya ASADA	597059b4b1	dist/debian: skip stripping libprotobuf.so.15 dh_strip is not able to strip libprotobuf.so.15, and we don't actually need to strip dependency libraries, so skip it. Fixes #4135 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190123202213.2117-4-syuu@scylladb.com>	2019-01-24 15:51:56 +02:00
Takuya ASADA	aefc18e70d	dist/debian: install /usr/bin/file for dh_strip dh_strip requires /usr/bin/file, but it is not installed automatically, so install it in build_deb.sh. Fixes #4134 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190123202213.2117-3-syuu@scylladb.com>	2019-01-24 15:51:53 +02:00
Benny Halevy	fbebd0bb1d	thrift: validate_column_name: fix exception format string It's printing uint32_t rather than char*. Refs #4140 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190124104002.32381-1-bhalevy@scylladb.com>	2019-01-24 12:46:23 +02:00
Avi Kivity	b58b82c9a2	Merge "Cut build dependencies around types.hh" from Piotr " I've recently had to work on the types.hh/types.cc files and had a very unpleasant experience with the incremental build on every change to types.hh. It took ~30 min on my machine, which is almost as much as a clean build. I looked around and it turns out that types.hh contains the whole hierarchy of the types. At the same time, many places access the types only through abstract_type, which is the root of the hierarchy. This patchset extracts user_type_impl, tuple_type_impl, map_type_impl, set_type_impl, list_type_impl and collection_type_impl from types.hh and places each of them in a separate header. As a result, a change in user_type_impl now causes an incremental build of ~6 min instead of ~30 min. A change to tuple_type_impl causes an incremental build of ~7.5 min instead of ~30 min, and a change to map_type_impl triggers an incremental build that takes ~20 min instead of ~30 min. Tests: unit(release) " * 'haaawk/types_build_speedup_2/rfc/2' of github.com:scylladb/seastar-dev: Stop including types/list.hh in cql3/tuples.hh Stop including types/set.hh into cql3/sets.hh Move collection_type_impl out of types.hh to types/collection.hh Move set_type_impl out of types.hh to types/set.hh Move list_type_impl out of types.hh to types/list.hh Move map_type_impl out of types.hh to types/map.hh Move tuple_type_impl from types.hh to types/tuple.hh Decouple database.hh from types/user.hh Allow to use shared_ptr with incomplete type other than sstable Move user_type_impl out of types.hh to types/user.hh	2019-01-24 11:21:22 +02:00
Piotr Jastrzebski	a3912a35f5	Stop including types/list.hh in cql3/tuples.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:57:19 +01:00
Piotr Jastrzebski	fe8dfc8fdc	Stop including types/set.hh into cql3/sets.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:57:19 +01:00
Piotr Jastrzebski	5a5201a50b	Move collection_type_impl out of types.hh to types/collection.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	ad016a732b	Move set_type_impl out of types.hh to types/set.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b1e1b66732	Move list_type_impl out of types.hh to types/list.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	147cc031db	Move map_type_impl out of types.hh to types/map.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b6b2fdc5be	Move tuple_type_impl from types.hh to types/tuple.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	7666e81b51	Decouple database.hh from types/user.hh This commit declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is an incomplete type, so it requires "Allow to use shared_ptr with incomplete type other than sstable" to compile correctly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:55:04 +01:00
Piotr Jastrzebski	316be5c6b5	Allow to use shared_ptr with incomplete type other than sstable When seastar/core/shared_ptr_incomplete.hh is included in a header, it causes problems with all declarations of shared_ptr<T> with incomplete type T that end up in the same compilation unit. The problem happens when we have a compilation unit that includes two headers a.hh and b.hh such that a.hh includes seastar/core/shared_ptr_incomplete.hh and b.hh declares shared_ptr<T> with incomplete type T. At the same time, this compilation unit does not use the declared shared_ptr<T>, so it should compile and work, but it does not, because shared_ptr_incomplete.hh is included and it forces instantiation of: template <typename T> T* lw_shared_ptr_accessors<T, void_t<decltype(lw_shared_ptr_deleter<T>{})>>::to_value(lw_shared_ptr_counter_base* counter) { return static_cast<T*>(counter); } for each declared shared_ptr<T> with incomplete type T, even the ones that are never used. The following commit, "Decouple database.hh from types/user.hh", moves the user_types_metadata type out of database.hh and instead declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is incomplete. 
Without this commit the compilation of the following one fails with: In file included from ./sstables/sstables.hh:34, from ./db/size_estimates_virtual_reader.hh:38, from db/system_keyspace.cc:77: seastar/include/seastar/core/shared_ptr_incomplete.hh: In instantiation of ‘static T* seastar::internal::lw_shared_ptr_accessors<T, seastar::internal::void_t<decltype (seastar::lw_shared_ptr_deleter<T>{})> >::to_value(seastar::lw_shared_ptr_counter_base*) [with T = user_types_metadata]’: seastar/include/seastar/core/shared_ptr.hh:243:51: required from ‘static void seastar::internal::lw_shared_ptr_accessors<T, seastar::internal::void_t<decltype (seastar::lw_shared_ptr_deleter<T>{})> >::dispose(seastar::lw_shared_ptr_counter_base*) [with T = user_types_metadata]’ seastar/include/seastar/core/shared_ptr.hh:300:31: required from ‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T = user_types_metadata]’ ./database.hh:1004:7: required from ‘static void seastar::internal::lw_shared_ptr_accessors_no_esft<T>::dispose(seastar::lw_shared_ptr_counter_base*) [with T = keyspace_metadata]’ seastar/include/seastar/core/shared_ptr.hh:300:31: required from ‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T = keyspace_metadata]’ ./db/size_estimates_virtual_reader.hh:233:67: required from here seastar/include/seastar/core/shared_ptr_incomplete.hh:38:12: error: invalid static_cast from type ‘seastar::lw_shared_ptr_counter_base*’ to type ‘user_types_metadata*’ return static_cast<T*>(counter); ^~~~~~~~~~~~~~~~~~~~~~~~ [131/415] CXX build/release/distributed_loader.o Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:45:25 +01:00
Piotr Jastrzebski	e92b4c3dbc	Move user_type_impl out of types.hh to types/user.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola	f7d1dc16d4	database: Use nop_large_partition_handler to avoid self-reporting Currently nop_large_partition_handler is only used in tests, but it can also be used to avoid self-reporting. Tests: unit(Release) I also tested starting scylla with --compaction-large-partition-warning-threshold-mb=0. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190123205059.39573-1-espindola@scylladb.com>	2019-01-23 21:11:21 +00:00
Avi Kivity	4882f29f82	Merge "Detemplatize primary key restrictions" from Piotr " This series is a first small step towards rewriting CQL restrictions layer. Primary key restrictions used to be a template that accepts either partition_key or clustering_key, but the implementation is already based on virtual inheritance, so in multiple cases these templates need specializations. Refs #3815 " * 'detemplatize_primary_key_restrictions_2' of https://github.com/psarna/scylla: cql3: alias single_column_primary_key_restrictions cql3: remove KeyType template from statement_restrictions cql3: remove template from primary_key_restrictions cql3: remove forwarding_primary_key_restrictions	2019-01-23 17:43:03 +02:00
Piotr Sarna	9982587bea	cql3: alias single_column_primary_key_restrictions In preparation for detemplatizing this class, it's aliased with single_column_partition_key restrictions and single_column_clustering_key_restrictions accordingly.	2019-01-23 17:43:03 +02:00
Piotr Sarna	4663094474	cql3: remove KeyType template from statement_restrictions The code is unfolded into serving partition and clustering key cases separately instead of overloading a template.	2019-01-23 17:43:03 +02:00
Piotr Sarna	4bd0cb8dd9	cql3: remove template from primary_key_restrictions Partition key restrictions and clustering key restrictions currently require virtual function specializations and have lots of distinct code, so there's no value in having primary_key_restrictions<KeyType> template.	2019-01-23 17:43:03 +02:00
Piotr Sarna	bdd8566ea3	cql3: remove forwarding_primary_key_restrictions I presume this header was created during code translation from C*, but it's not used or included anywhere.	2019-01-23 17:43:03 +02:00
Avi Kivity	c83ae62aed	build: fix libdeflate object file corruption during parallel build libdeflate's build places some object files in the source directory, which is shared between the debug and release builds. If the same object file (for the two modes) is written concurrently, or if one build reads it while the other writes it, it will be corrupted. Fix by not building the executables at all. They aren't needed, and we already placed the libraries' objects in the build directory (which is unshared). We only need the libraries anyway. Fixes #4130. Branches: master, branch-3.0 Message-Id: <20190123145435.19049-1-avi@scylladb.com>	2019-01-23 15:32:17 +00:00
Nadav Har'El	76f1fcc346	cql3: really ensure retrieval of columns for filtering Commit `fd422c954e` aimed to fix issue #3803. In that issue, if a query SELECTed only certain columns but did filtering (ALLOW FILTERING) over other unselected columns, the filtering didn't work. The fix involved adding the columns being filtered to the set of columns we read from disk, so they can be filtered. But that commit included an optimization: If you have clustering keys c1 and c2, and the query asks for a specific partition key and c1 < 3 and c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already done as a slice (a contiguous read from disk). The committed code erroneously concluded that both c1 and c2 don't need to be filtered, which was wrong (c2 does need to be read and filtered). In this patch, we fix this optimization. Previously, we used the "prefix length", which in the above example was 2 (both c1 and c2 were filtered) but we need a new and more elaborate function, num_prefix_columns_that_need_not_be_filtered(), to determine we can only skip filtering of 1 (c1) and cannot skip the second. Fixes #4121. This patch also adds a unit test to confirm this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190123131212.6269-1-nyh@scylladb.com>	2019-01-23 15:24:30 +02:00
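The rule described above can be shown with a simplified model. It assumes that each clustering column in the restricted prefix carries either an EQ or a slice restriction. The sketch reuses the commit's function name for readability, but its signature and restriction model are assumptions, not Scylla's implementation. A run of EQ restrictions followed by at most one slice can be served by a contiguous read. Every column after the first slice must still be read and filtered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model: the kind of restriction on each clustering column,
// in clustering order (c1, c2, ...).
enum class restriction_kind { eq, slice };

// Counts how many leading clustering columns are fully handled by the
// contiguous read: a run of EQ restrictions plus at most one slice.
std::size_t num_prefix_columns_that_need_not_be_filtered(
        const std::vector<restriction_kind>& prefix) {
    std::size_t n = 0;
    for (restriction_kind r : prefix) {
        ++n;
        if (r == restriction_kind::slice) {
            break;  // columns after the first slice must still be filtered
        }
    }
    return n;
}
```

For the commit's example (`c1 < 3 AND c2 > 3`), the model returns 1. The slice covers c1, and c2 must be read and filtered. The old prefix-length logic would have returned 2.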
Avi Kivity	835ad406de	tools: toolchain: update docker build command to include --no-cache If docker sees the Dockerfile hasn't changed it may reuse an old image, not caring that context files and dependent images have in fact changed. This can happen for us if install-dependencies.sh or the base Fedora image changed. To make sure we always get a correct image, add --no-cache to the build command. Message-Id: <20190122185042.23131-1-avi@scylladb.com>	2019-01-23 10:47:40 +01:00
Glauber Costa	5d754c1d11	install-dependencies.sh: add packages that will be needed by scylla-python3 Done in a separate step so we can update the toolchain first. dnf-utils is used to bring us repoquery, which we will use to derive the list of files in the python packages. patchelf is needed so we can add a DT_RUNPATH section to the interpreter binary. the python modules, as well as the python3 interpreter are taken from the current RPM spec file. Signed-off-by: Glauber Costa <glauber@scylladb.com> [avi: regenerate frozen toolchain image] Message-Id: <20190123011751.14440-1-glauber@scylladb.com>	2019-01-23 10:53:10 +02:00
Avi Kivity	c1dd04986b	Merge "Prepare for the switch to CMake-ified Seastar" from Jesse " This series prepares for the integration of the `master` branch of Seastar back into Scylla. A number of changes to the existing build are necessary to integrate Seastar correctly, and these are detailed in the individual change messages. I tested with and without DPDK, in release and debug mode. The actual switch is a separate patch. " * 'jhk/seastar_cmake/v4' of https://github.com/hakuch/scylla: build: Fix link order for DPDK tests: Split out `sstable_datafile_test` build: Remove unnecessary inclusion tests: Fix use-after-free errors in static vars build: Remove Seastar internals build: Only use Seastar flags from pkg-config build: Query Seastar flags using pkg-config build: Change parameters for `pkg_config` function	2019-01-23 10:33:00 +02:00
Duarte Nunes	88c7c1e851	Merge 'hinted handoff: cache cf mappings' from Vlad " Cache cf mappings when breaking in the middle of sending a segment, so that the sender still has them the next time it resumes sending this segment from where it left off. Also add the "discarded" metric so that we can track hints that are being discarded in the send flow. " Fixes #4122 * 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: cache column family mappings for segments that were not sent out in full hinted handoff: add a "discarded" metric	2019-01-23 00:44:41 +00:00
Jesse Haber-Kucharsky	3d79bd25b2	build: Fix link order for DPDK Without this change, DPDK libraries will not be linked to Scylla correctly when we switch to the new pkg-config support in Seastar.	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	cfb1492a6e	tests: Split out `sstable_datafile_test` Each `*_test.cc` file must be compiled separately so that there is only one definition of `main`. This change correctly defines an independent `sstable_datafile_test` from `sstable_datafile_test.cc` and adds that test to the existing suite.	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	02dd7bcc82	build: Remove unnecessary inclusion	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	2a62550002	tests: Fix use-after-free errors in static vars Without these two variables being declared as TLS, executing these two tests in "debug" mode fails AddressSanitizer's checks.	2019-01-22 18:24:52 -05:00
Jesse Haber-Kucharsky	88cc43d5e0	build: Remove Seastar internals We don't need to re-specify Seastar internals in Scylla's build, since everything private to Seastar is managed via pkg-config. We can eliminate all references to ragel and generated ragel header files from Seastar. We can also simplify the dependence on generated Seastar header files by ensuring that all object files depend on Seastar being built first.	2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky	4f44e143be	build: Only use Seastar flags from pkg-config Some Seastar-specific flags were manually specified as Ninja rules, but we want to rely exclusively on Seastar for its necessary flags. The pkg-config file generated by the latest version of Seastar is correct and allows us to do this, but the version generated by Scylla's current check-out of Seastar does not. Therefore, we have to manually adjust the pkg-config results temporarily until we update Seastar.	2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky	8743cff59b	build: Query Seastar flags using pkg-config Previously, we manually parsed the pkg-config file. We now use pkg-config itself to get the correct build flags. This means that we will get the correct behavior for variable expansion, and fields like `Requires`, `Requires.private`, and `Libs.private`. Previously, these fields were ignored.	2019-01-22 18:24:38 -05:00
Vlad Zolotarov	34829b8f81	hinted handoff: cache column family mappings for segments that were not sent out in full We will try to send a particular segment later (in 1s), from the place where we left off, if it wasn't sent out in full before. However, we may miss some of the column family mappings when we get back to sending this file and start sending from some entry in the middle of it (where we left off) if we didn't save the column family mappings we cached while reading this segment from its beginning. This happens because the commitlog doesn't save column family information in every entry but rather once for each unique column family (version) per "cycle" (see the commitlog::segment description for more info). Therefore we have to assume that a particular column family mapping appears only once in the whole segment (worst case). And therefore, when we decide to resume sending a segment, we need to keep the column family mappings we have accumulated so far and drop them only after we are done with this particular segment (sent it out in full). Fixes #4122 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 15:24:22 -05:00
Vlad Zolotarov	4516a8cfc4	hinted handoff: add a "discarded" metric Account for the number of hints that were discarded in the send path. This may happen, for instance, due to a schema change or because a hint is too old. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 14:11:09 -05:00
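Here is a minimal model of the failure and the fix. It assumes a segment is a list of entries in which the column family mapping appears only on the first entry for each id. The types `entry` and `segment_send_state` and the function `send_some` are hypothetical. The point is that the mapping cache survives a partial send and is cleared only after the segment has been sent in full.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical segment entry: the mapping (schema version) is present only
// on the first entry that references a given column family id.
struct entry {
    int cf_id;
    std::optional<std::string> mapping;
};

// State kept between send attempts for one segment.
struct segment_send_state {
    std::map<int, std::string> cf_mappings;  // mappings seen so far
    std::size_t resume_position = 0;         // first entry not yet sent
};

// Sends at most `budget` entries starting at resume_position. Returns true
// once the whole segment has been sent; throws if an entry refers to a
// column family whose mapping is not in the cache.
bool send_some(segment_send_state& state, const std::vector<entry>& segment,
               std::size_t budget) {
    for (std::size_t sent = 0; sent < budget && state.resume_position < segment.size(); ++sent) {
        const entry& e = segment[state.resume_position];
        if (e.mapping) {
            state.cf_mappings[e.cf_id] = *e.mapping;
        }
        if (state.cf_mappings.count(e.cf_id) == 0) {
            throw std::runtime_error("missing column family mapping");
        }
        ++state.resume_position;  // entry "sent"
    }
    if (state.resume_position == segment.size()) {
        state.cf_mappings.clear();  // drop the cache only after a full send
        return true;
    }
    return false;  // partial send: keep cf_mappings for the next attempt
}
```

Resuming with an empty cache from the middle of a segment throws as soon as it reaches an entry whose mapping appeared earlier. That is the situation the commit guards against.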
Avi Kivity	fa0312d0f2	Merge "Support 64-bit gc_clock" from Benny " wrap around on 2038-01-19 03:14:07 UTC. Such dates are valid deletion times starting 2018-01-19 with the 20 years long maximum ttl. This patchset extends gc_clock::duration::rep to int64_t and adds respective unit tests for the max_ttl cases. Fixes #3353 Tests: unit (release) " * 'projects/gc_clock_64/v2' of https://github.com/bhalevy/scylla: tests: cql_query_test add test_time_overflow gc_clock: make 64 bit sstables: mc: use int64_t for local_deletion_time and ttl sstables: add capped_tombstone_deletion_time stats counter sstables: mc: cap partition tombstone local_deletion_time to max sstables: add capped_local_deletion_time stats counter sstables: mc: metadata collector: cap local_deletion_time at max sstables: mc: use proper gc_clock types for local_deletion_time and ttl db: get default_time_to_live as int32_t rather than gc_clock::rep sstables: safely convert ttl and local_deletion_time to int32_t sstables: mc: move liveness_info initialization to members sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m sstables: mc: define expired_liveness_ttl as signed int32_t sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time	2019-01-22 18:21:55 +02:00
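The following is a worked check of the overflow arithmetic. It assumes the 20-year maximum TTL is computed with 365-day years (630,720,000 seconds, as in Cassandra). A signed 32-bit count of seconds since the epoch ends at 2^31 - 1 = 2,147,483,647, which is 2038-01-19 03:14:07 UTC. The helper `expiry_overflows_int32` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Last second representable by a signed 32-bit seconds counter:
// 2^31 - 1 = 2147483647, i.e. 2038-01-19 03:14:07 UTC.
constexpr std::int64_t int32_limit = std::numeric_limits<std::int32_t>::max();

// 20-year maximum TTL with 365-day years.
constexpr std::int64_t max_ttl_seconds = 20LL * 365 * 24 * 60 * 60;

// Hypothetical helper: true if now + ttl no longer fits in an int32_t rep.
// The sum is computed in 64 bits, so the check itself cannot overflow.
constexpr bool expiry_overflows_int32(std::int64_t now_seconds, std::int64_t ttl_seconds) {
    return now_seconds + ttl_seconds > int32_limit;
}

static_assert(int32_limit == 2147483647);
static_assert(max_ttl_seconds == 630720000);
```

A write in January 2019 with the maximum TTL already expires after the 32-bit limit. A 64-bit `gc_clock::duration::rep` avoids this.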
Glauber Costa	54bc0ce70d	scylla_setup: make sure it works (again) in interactive mode Commit `019a2e3a27` marked some arguments as required, which improved the usability of scylla_setup. The problem is that when we call scylla_setup in interactive mode, no argument should be required. After the aforementioned commit scylla_setup will either complain that the required arguments were not passed if zero arguments are present, or skip interactive mode if one of the mandatory ones is present. This patch fixes that by checking whether or not we were invoked with no command line arguments and lifting the requirements for mandatory arguments in that case. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190122003621.11156-1-glauber@scylladb.com>	2019-01-22 16:54:55 +02:00
Benny Halevy	7d0854a1e5	tests: cql_query_test add test_time_overflow Test 32-bit time overflow scenarios. Fails without "gc_clock: make 64 bit". Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	93270dd8e0	gc_clock: make 64 bit Fixes: #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	1ccd72f115	sstables: mc: use int64_t for local_deletion_time and ttl In preparation for changing gc_clock::duration::rep to int64_t. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	427d6e6090	sstables: add capped_tombstone_deletion_time stats counter Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	0ec46924bf	sstables: mc: cap partition tombstone local_deletion_time to max The deletion_time struct has an int32_t local_deletion_time that cannot hold large time values. Cap local_deletion_time to max_local_deletion_time and log a warning about that. This corresponds to Cassandra's MAX_DELETION_TIME. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	156f9ffa11	sstables: add capped_local_deletion_time stats counter Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	7609a04565	sstables: mc: metadata collector: cap local_deletion_time at max max local_deletion_time_tracker in stats is int32_t so just track the limit of (max int32_t - 1) if time_point is greater than the limit. This corresponds to Cassandra's MAX_DELETION_TIME. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	bd6861989d	sstables: mc: use proper gc_clock types for local_deletion_time and ttl Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	9878b36895	db: get default_time_to_live as int32_t rather than gc_clock::rep Otherwise, value_cast<> throws std::bad_cast exception when gc_clock::rep is defined as int64_t. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	33314cec3f	sstables: safely convert ttl and local_deletion_time to int32_t Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	9a00c5a763	sstables: mc: move liveness_info initialization to members Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	0aba922b6d	sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m To be consistent with other calls to parse_* methods there. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	6465a673f5	sstables: mc: define expired_liveness_ttl as signed int32_t Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	c4c2133e3e	sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time mc format only writes delta local_deletion_time of tombstones. Conventional deletion_time is written only for the partition header. Restructure the code to pass a tombstone to write_delta_deletion_time rather than struct deletion_time to prepare for using 64-bit deletion times. The tombstone uses gc_clock::time_point while struct deletion_time is limited to int32_t local_deletion_time. Note that for "live" tombstones we encode <api::missing_timestamp, no_deletion_time> as was previously evaluated by to_deletion_time(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	820906b794	sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Tomasz Grabiec	dbc1894bd5	lsa: Avoid unnecessary compact_and_evict_locked() When the reclaim request was satisfied from the pool there's no need to call compact_and_evict_locked(). This allows us to avoid calling boost::range::make_heap(), which is a tiny performance difference, as well as some confusing log messages. Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>	2019-01-21 20:19:20 +02:00
Jesse Haber-Kucharsky	72da3283b9	build: Change parameters for `pkg_config` function We can invoke pkg-config with multiple options, and we specify the package name first since this is the "target" of the pkg-config query. Supporting multiple options is necessary for querying Seastar's pkg-config file with `--static`, which we anticipate in a future change.	2019-01-21 11:38:25 -05:00
Glauber Costa	ca997b5f60	scylla_setup: warn users on the severity of answering no to IOTune The system won't work properly if IOTune is not run. While it is fair to skip this step because it takes a long time (indeed, it is common to provision io.conf manually to be able to skip this step), first-time users don't know this and can get the impression that this is just a totally optional step. However, the node won't boot up without it. As a user nicely put recently in our mailing list: "...in this case, it would be even simpler to forbid answering "no" to this not-so-optional step :)" We should not forbid saying no to IOTune, but we should warn the user about the consequences of doing so. Fixes #4120 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190121144506.17121-1-glauber@scylladb.com>	2019-01-21 16:55:50 +02:00
Botond Dénes	4e89dea9ea	database: don't allow access to global semaphores Recently we had a bug (#4096) due to a component (`multishard_mutation_query()`) assuming that all reads used the semaphore obtainable via `database::user_read_concurrency_sem()`. This problem revealed that it is plain wrong to allow access to the shard-global semaphores residing in the database object. Instead all code wishing to access the relevant semaphore for some read, should do so via the relevant `table` object, thus guaranteeing that it will get the correct semaphore, configured for that table. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>	2019-01-21 16:29:02 +02:00
Piotr Sarna	5d76a635ca	distributed_loader: migrate flush_upload_dir to thread Flushing upload dir code suffers from overcomplication, so in order to make it a little bit simpler, it's moved to threaded context. Refs #4118 Message-Id: <232cca077bae7116cfa87de9c9b4ba60efc2a01d.1548077720.git.sarna@scylladb.com>	2019-01-21 15:48:17 +02:00
Gleb Natapov	85cb09294e	storage_service: do not start thrift and cql servers if a node is isolated due to errors Scylla starts doing IO much earlier than it starts the cql/thrift servers. The IO may cause an error that tries to stop all servers; since they are not running yet, this does nothing, and the servers are then started later anyway. Fix it by checking that the node is not isolated before starting the servers. Message-Id: <20190110152830.GE3172@scylladb.com>	2019-01-21 13:04:23 +00:00
Tomasz Grabiec	e02baabd62	tests: perf_fast_forward: Introduce --with-compression option Message-Id: <1547819062-4369-1-git-send-email-tgrabiec@scylladb.com>	2019-01-21 12:18:31 +00:00
Botond Dénes	ff2884f25b	Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges" A much simpler and more complete fix was found. Let's revert this before applying the simpler fix. This reverts commit `7049cd9374`.	2019-01-21 13:56:56 +02:00
Botond Dénes	f229dff210	auth/service: unregister migration listener on stop() Otherwise any event that triggers notification to this listener would trigger a heap-use-after-free. Refs: #4107 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b6bbd609371a2312aed7571b05119d59c7d103d7.1548067626.git.bdenes@scylladb.com>	2019-01-21 13:06:59 +02:00
Tomasz Grabiec	d7c701d2d1	Merge "Type-erase gratuitous templates with functions" from Avi Many areas of the code are splattered with unneeded templates. This patchset replaces some of them, where the template parameter is a function object, with an std::function or noncopyable_function (with a preference towards the latter; but it is not always possible). As the template is compiled for each instantiation (if the function object is a lambda) while a function is compiled only once, there are significant savings in compile time and bloat. text data bss dec hex filename 85160690 42120 284910 85487720 5187068 scylla.before 84824762 42120 284910 85151792 5135030 scylla.after * https://github.com/avikivity/scylla detemplate/v2: api/commitlog: de-template acquire_cl_metric() database: de-template do_parse_schema_tables database: merge for_all_partitions and for_all_partitions_slow hints: de-template scan_for_hints_dirs() schema_tables: partially de-template make_map_mutation() distributed_loader: de-template tests: commitlog_test: de-template tests: cql_auth_query_test: de-template test: de-template eventually() and eventually_true() tests: flush_queue_test: de-template hint_test: de-template tests: mutation_fragment_test: de-template test: mutation_test: de-template	2019-01-21 11:32:22 +01:00
Avi Kivity	826cf90f3f	Merge "Restore mutating uploaded sstables to level 0" from Piotr " This miniseries fixes the behaviour of distributed loader, which now unconditionally mutates new sstables found in /upload dir to LCS level 0 first, and only after that proceeds with either queueing them for update generation or moving them to data directory. " * 'restore_always_mutating_sstables_level_0' of https://github.com/psarna/scylla: distributed_loader: restore indentation distributed_loader: restore always mutating to level 0	2019-01-20 20:32:15 +02:00
Benny Halevy	844a2de263	sstables: mc: prevent signed integer overflow Fix the "runtime error: signed integer overflow" introduced by `2dc3776407`. Delta-encoded values may wrap around if the encoded value is less than the base value. This could happen in two places: in the mc-format serialization header itself, where the base values are implicitly Cassandra epoch time, and in the sstables data files, where the base values are taken from the encoding_stats (later written to the serialization_header). In these cases, when the calculation is done using signed integer/long we may see "runtime error: signed integer overflow" messages in debug mode (with -fsanitize=undefined / -fsanitize=signed-integer-overflow). Overflow here is expected and harmless, since we guarantee neither that the base values in the serialization header are greater than or equal to Cassandra's epoch, nor that the delta-encoded values are always greater than or equal to the respective base values in the serialization header. To prevent these warnings, the subtraction/addition should be done with unsigned (two's complement) arithmetic and the result converted to the signed type. Note that to keep the code simple where possible, we also rely on the implicit conversion of signed integers to unsigned when one of the added values is unsigned and the other is signed. Fixes: #4098 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>	2019-01-20 16:59:46 +02:00
Avi Kivity	1e5c09dbce	test: mutation_test: de-template Replace the with_column_family helper template with an ordinary function, to reduce code bloat.	2019-01-20 15:55:20 +02:00
Avi Kivity	28db56df13	tests: mutation_fragment_test: de-template The for_each_target() template is called four times, so making it a normal function reduces a lot of code generation.	2019-01-20 15:55:20 +02:00
Avi Kivity	401684503d	hint_test: de-template While cl_test is duplicated with commitlog_test, at least deduplicate it internally by converting it to an ordinary function.	2019-01-20 15:55:20 +02:00
Avi Kivity	208b0f80a4	tests: flush_queue_test: de-template The internal test_propagation template is instantiated many times. Replace with an ordinary function to reduce bloat. Call sites adjusted to have a uniform signature.	2019-01-20 15:55:20 +02:00
Avi Kivity	2f36d30572	test: de-template eventually() and eventually_true() These templates are not trivial and called many times. De-template them to reduce code bloat.	2019-01-20 15:55:20 +02:00
Avi Kivity	96a8eacc3c	tests: cql_auth_query_test: de-template Replace the with_user() and verify_unauthorized_then_ok() templates with functions.	2019-01-20 15:55:20 +02:00
Avi Kivity	e0b0e18234	tests: commitlog_test: de-template The cl_test function is called many times, so its contents are bloat. De-template it so it is compiled only once.	2019-01-20 15:55:20 +02:00
Avi Kivity	baf9480c8d	distributed_loader: de-template distributed_loader has several large templates that can be converted to normal functions with the help of noncopyable_function<>, reducing code bloat. One of the lambdas used as an actual argument was adjusted, because the de-templated callee only accepts functions returning a future, while the original accepted both functions returning a future and functions returning void (similar to future::then).	2019-01-20 15:55:20 +02:00
Avi Kivity	e0914a080e	schema_tables: partially de-template make_map_mutation() make_map_mutation() is called several times, hopefully with the same Map type parameter. Replace the Func parameter with a noncopyable_function<>.	2019-01-20 15:55:20 +02:00
Avi Kivity	630f841e5b	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-20 15:55:20 +02:00
Avi Kivity	fae4c6c0b6	database: merge for_all_partitions and for_all_partitions_slow for_all_partitions is only used in the implementation of for_all_partitions_slow, so merge them and get rid of a template.	2019-01-20 15:55:20 +02:00
Avi Kivity	9858395c3e	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win. We use std::function instead of noncopyable_function because the function is copied within the parallel_for_each callback. The original code uses a move, which is incorrect, but did not fail because moving the lambdas that were used as the actual arguments is equivalent to a copy.	2019-01-20 15:55:18 +02:00
Tomasz Grabiec	c422bfc2c5	tests: perf_fast_forward: Store results for each dataset in separate sub-directory Otherwise read test results for subsequent datasets will overwrite each other. Also, rename population test case to not include dataset name, which is now redundant. Message-Id: <1547822942-9690-1-git-send-email-tgrabiec@scylladb.com>	2019-01-20 15:38:46 +02:00
Botond Dénes	7049cd9374	partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges When entering a new ck range (of the partition-slice), the partition snapshot reader will apply to its range tombstones stream all the tombstones that are relevant to the new ck range. When the partition has range tombstones that overlap with multiple ck ranges, these will be applied to the range tombstone stream when entering any of the ck ranges they overlap with. This will result in the violation of the monotonicity of the mutation fragments emitted by the reader, as these range tombstones will be re-emitted on each ck range, if the ck range has at least one clustering row they apply to. For example, given the following partition: rt{[1,10]}, cr{1}, cr{2}, cr{3}... And a partition-slice with the following ck ranges: [1,2], [3, 4] The reader will emit the following fragment stream: rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ... Note how the range tombstone is emitted twice. In addition to violating the monotonicity guarantee, this can also result in an explosion of the number of emitted range tombstones. Fix by applying only those range tombstones to the range tombstone stream, that have a position strictly greater than that of the last emitted clustering row (or range tombstone), when entering a new ck range. Fixes: #4104 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <e047af76df75972acb3c32c7ef9bb5d65d804c82.1547916701.git.bdenes@scylladb.com>	2019-01-20 15:38:04 +02:00
Paweł Dziepak	14757d8a83	types: collection_type: drop tombstone if covered by higher-level one At the moment there are inefficiencies in how collection_type_impl::mutation::compact_and_expire() handles tombstones. If there is a higher-level tombstone that covers the collection one (including cases where there is no collection tombstone) it will be applied to the collection tombstone and present in the compaction output. This also means that the collection tombstone is never dropped if fully covered by a higher-level one. This patch fixes both those problems. After the compaction the collection tombstone is either unchanged or removed if covered by a higher-level one. Fixes #4092. Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>	2019-01-20 15:32:34 +02:00
Avi Kivity	e51ef95868	Update seastar submodule * seastar af6b797...7d620e1 (1): > perftune.py: don't let any exception out when connecting to AWS meta server Fixes #4102.	2019-01-20 13:59:09 +02:00
Avi Kivity	32e79fc23b	api/commitlog: de-template acquire_cl_metric() Use std::function instead of a template parameter. Likely doesn't gain anything, because the template was always instantiated with the same type (the result of std::bind() with the same signatures), but still good practice. std::function was used instead of noncopyable_function because sharded::map_reduce0() copies the input function.	2019-01-20 11:58:39 +02:00
Avi Kivity	6e6372e8d2	Revert "Merge "Type-eaese gratuitous templates with functions" from Avi" This reverts commit `31c6a794e9`, reversing changes made to `4537ec7426`. It causes bad_function_calls in some situations: INFO 2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708 INFO 2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema) INFO 2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables INFO 2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)	2019-01-20 11:32:14 +02:00
Paweł Dziepak	e212d37a8a	utils/small_vector: fix leak in copy assignment slow path Fixes #4105. Message-Id: <20190118153936.5039-1-pdziepak@scylladb.com>	2019-01-18 17:49:46 +02:00
Paweł Dziepak	23cfb29fea	Merge "compaction: mc: re-calculate encoding_stats" from Benny " Use input sstables stats metadata to re-calculate encoding_stats. Fixes #3971. " * 'projects/compaction-encoding-stats/v3' of https://github.com/bhalevy/scylla: compaction: mc: re-calculate encoding_stats based on column stats memtable: extract encoding_stats_collector base class to encoding_stats header file	2019-01-18 14:36:17 +00:00
Tomasz Grabiec	7308effb45	tests: flat_mutation_reader_test: Drop unneeded includes Message-Id: <1547819118-4645-1-git-send-email-tgrabiec@scylladb.com>	2019-01-18 13:58:05 +00:00
Tomasz Grabiec	6461e085fe	managed_bytes: Fix compilation on gcc 8.2 The compilation fails on -Warray-bounds, even though the branch is never taken: inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22, inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77, inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57, inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39: /usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds] __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Work around by disabling the diagnostic locally. Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>	2019-01-18 13:48:05 +00:00
Tomasz Grabiec	31c6a794e9	Merge "Type-eaese gratuitous templates with functions" from Avi Many areas of the code are splattered with unneeded templates. This patchset replaces some of them, where the template parameter is a function object, with an std::function or noncopyable_function (with a preference towards the latter; but it is not always possible). As the template is compiled for each instantiation (if the function object is a lambda) while a function is compiled only once, there are significant savings in compile time and bloat. text data bss dec hex filename 85160690 42120 284910 85487720 5187068 scylla.before 84824762 42120 284910 85151792 5135030 scylla.after * https://github.com/avikivity/scylla detemplate/v1: api/commitlog: de-template acquire_cl_metric() database: de-template do_parse_schema_tables database: merge for_all_partitions and for_all_partitions_slow hints: de-template scan_for_hints_dirs() schema_tables: partially de-template make_map_mutation() distributed_loader: de-template tests: commitlog_test: de-template tests: cql_auth_query_test: de-template test: de-template eventually() and eventually_true() tests: flush_queue_test: de-template hint_test: de-template tests: mutation_fragment_test: de-template test: mutation_test: de-template	2019-01-18 11:42:01 +01:00
Piotr Sarna	3d65eb5d4a	distributed_loader: restore indentation	2019-01-18 10:59:37 +01:00
Piotr Sarna	e50e9b5150	distributed_loader: restore always mutating to level 0 When introducing the view update generation path for sstables in the /upload directory, mutating these sstables was moved to the regular path only. This was wrong, because sstables that need view updates generated from them may still need to be downgraded to LCS level 0, so they won't disrupt LCS assumptions after being loaded. Reported-by: Nadav Har'El <nyh@scylladb.com>	2019-01-18 10:35:20 +01:00
Avi Kivity	089931fb56	test: mutation_test: de-template Replace the with_column_family helper template with an ordinary function, to reduce code bloat.	2019-01-17 19:06:42 +02:00
Avi Kivity	53a3db9446	tests: mutation_fragment_test: de-template The for_each_target() template is called four times, so making it a normal function reduces a lot of code generation.	2019-01-17 19:05:48 +02:00
Avi Kivity	4a21de4592	hint_test: de-template While cl_test is duplicated with commitlog_test, at least deduplicate it internally by converting it to an ordinary function.	2019-01-17 19:03:31 +02:00
Avi Kivity	1f02fd3ff6	tests: flush_queue_test: de-template The internal test_propagation template is instantiated many times. Replace with an ordinary function to reduce bloat. Call sites adjusted to have a uniform signature.	2019-01-17 19:02:26 +02:00
Avi Kivity	63077501ed	test: de-template eventually() and eventually_true() These templates are not trivial and called many times. De-template them to reduce code bloat.	2019-01-17 19:00:55 +02:00
Avi Kivity	a5d3254ed3	tests: cql_auth_query_test: de-template Replace the with_user() and verify_unauthorized_then_ok() templates with functions. Some adjustments made to the call site to unify the signatures.	2019-01-17 18:59:30 +02:00
Avi Kivity	8c05debecb	tests: commitlog_test: de-template The cl_test function is called many times, so its contents are bloat. De-template it so it is compiled only once.	2019-01-17 18:57:35 +02:00
Avi Kivity	b6239134c2	distributed_loader: de-template distributed_loader has several large templates that can be converted to normal functions with the help of noncopyable_function<>, reducing code bloat.	2019-01-17 18:56:22 +02:00
Avi Kivity	2407c35cc1	schema_tables: partially de-template make_map_mutation() make_map_mutation() is called several times, hopefully with the same Map type parameter. Replace the Func parameter with a noncopyable_function<>.	2019-01-17 18:54:43 +02:00
Avi Kivity	81d004b2c0	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-17 18:51:46 +02:00
Avi Kivity	f61dbc9855	database: merge for_all_partitions and for_all_partitions_slow for_all_partitions is only used in the implementation of for_all_partitions_slow, so merge them and get rid of a template.	2019-01-17 18:50:36 +02:00
Avi Kivity	4568a4e4b0	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win.	2019-01-17 18:48:57 +02:00
Avi Kivity	08bd28942b	api/commitlog: de-template acquire_cl_metric() Use noncopyable_function instead of a template parameter. Likely doesn't gain anything, because the template was always instantiated with the same type (the result of std::bind() with the same signatures), but still good practice.	2019-01-17 18:45:14 +02:00
Botond Dénes	4537ec7426	multishard_mutation_query(): use correct reader concurrency semaphore The multishard mutation query used the semaphore obtained from `database::user_read_concurrency_sem()` to pause-resume shard readers. This presented a problem when `multishard_mutation_query()` was reading from system tables. In this case the readers themselves would obtain their permits from the system read concurrency semaphore. Since the pausing of shard readers used the user read semaphore, pausing failed to fulfill its objective of alleviating pressure on the semaphore the reads obtained their permits from. In some cases this led to a deadlock during system reads. To ensure the correct semaphore is used for pausing-resuming readers, obtain the semaphore from the `table` object. To avoid looking up the table on every pause or resume call, cache the semaphores when readers are created. Fixes: #4096 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com>	2019-01-17 15:19:59 +02:00
Avi Kivity	8e9989685d	scyllatop: complete conversion to python3 `d2dbbba139` converted scyllatop's interpreter to Python 3, but neglected to do the actual conversion. This patch does so, by running 2to3 over all files and adding an additional bytes->string decode step in prometheus.py. Superfluous 2to3 changes to print() calls were removed. Message-Id: <20190117124121.7409-1-avi@scylladb.com>	2019-01-17 12:50:25 +00:00
Duarte Nunes	7505815013	Merge 'Fix filtering with LIMIT and paging' from Piotr " Before this series the limit was applied per page instead of globally, which might have resulted in returning too many rows. To fix that: 1. restrictions filter now has a 'remaining' parameter in order to stop accepting rows after enough of them have already been accepted 2. pager passes its row limit to restrictions filter, so no more rows than necessary will be served to the client 3. results no longer need to be trimmed on select_statement level Tests: unit (release) " * 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla: tests: add filtering+limit+paging test case tests: allow null paging state in filtering tests cql3: fix filtering with LIMIT with regard to paging	2019-01-17 12:50:00 +00:00
Piotr Sarna	ed7328613f	tests: add filtering+limit+paging test case A test case that checks whether a combination of paging and LIMIT clause for filtering queries doesn't return with too many rows. Refs #4100	2019-01-17 13:25:10 +01:00
Piotr Sarna	7d4f994e98	tests: allow null paging state in filtering tests Previously the utility to extract paging state asserted that the state exists, but in future tests it would be useful to be able to call this function even if it would return null.	2019-01-17 13:25:10 +01:00
Piotr Sarna	87c23372fb	cql3: fix filtering with LIMIT with regard to paging Previously the limit was erroneously applied per page instead of being accumulated, which might have caused returning too many rows. As of now, LIMIT is handled properly inside restrictions filter. Fixes #4100	2019-01-17 13:25:09 +01:00
Piotr Sarna	02d88de082	db,view: add consuming units in staging table registration View update generator service can accept sstables even before it starts, but it should still acknowledge the number of waiters in the semaphore. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>	2019-01-16 18:05:17 +00:00
Benny Halevy	1d483bc424	compaction: mc: re-calculate encoding_stats based on column stats When compacting several sstables, get and merge their encoding_stats for encoding the result. Introduce sstable::get_encoding_stats_for_compaction to return encoding_stats based on the sstable's column stats. Use encoding_stats_collector to keep track of the minimum encoding_stats values of all input sstables. Fixes #3971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-16 17:59:59 +02:00
Benny Halevy	e2c4d2d60a	memtable: extract encoding_stats_collector base class to encoding_stats header file To be used also by compaction. Refs #3971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-16 17:59:58 +02:00
Asias He	4b9e1a9f1d	repair: Add row level metrics Number of rows sent and received - tx_row_nr - rx_row_nr Bytes of rows sent and received - tx_row_bytes - rx_row_bytes Number of row hashes sent and received - tx_hashes_nr - rx_hashes_nr Number of rows read from disk - row_from_disk_nr Bytes of rows read from disk - row_from_disk_bytes Message-Id: <d1ee6b8ae8370857fe45f88b6c13087ea217d381.1547603905.git.asias@scylladb.com>	2019-01-16 14:04:57 +02:00
Duarte Nunes	04a14b27e4	Merge 'Add handling staging sstables to /upload dir' from Piotr " This series adds generating view updates from sstables added through /upload directory if their tables have accompanying materialized views. Said sstables are left in /upload directory until updates are generated from them and are treated just like staging sstables from /staging dir. If there are no views for a given table, sstables are simply moved from /upload dir to datadir without any changes. Tests: unit (release) " * 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla: all: rename view_update_from_staging_generator distributed_loader: fix indentation service: add generating view updates from uploaded sstables init: pass view update generator to storage service sstables: treat sstables in upload dir as needing view build sstables,table: rename is_staging to requires_view_building distributed_loader: use proper directory for opening SSTable db,view: make throttling optional for view_update_generator	2019-01-15 18:19:27 +00:00
Duarte Nunes	9b79f0f58b	Merge 'Add stream phasing' from Piotr " This series addresses the problem mentioned in issue 4032, which is a race between creating a view and streaming sstables to a node. Before this patch the following scenario is possible: - sstable X arrives from a streaming session - we decide that view updates won't be generated from sstable X by the view builder - new view is created for the table that owns sstable X - view builder doesn't generate updates from sstable X, even though the table has accompanying views - which is an inconsistency This race is fixed by making the view builder wait for all ongoing streams, just like it does for reads and writes. It's implemented with a phaser. Tests: unit (release) dtest(not merged yet: materialized_views_test.TestMaterializedViews.stream_from_repair_during_build_process_test) " * 'add_stream_phasing_2' of https://github.com/psarna/scylla: repair: add stream phasing to row level repair streaming: add phasing incoming streams multishard_writer: add phaser operation parameter view: wait for stream sessions to finish before view building table: wait for pending streams on table::stop database: add pending streams phaser	2019-01-15 18:18:40 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	a5d24e40e0	distributed_loader: fix indentation Bad indentation was introduced in the previous commit.	2019-01-15 17:31:37 +01:00
Piotr Sarna	13c8c84045	service: add generating view updates from uploaded sstables SSTables loaded into the system via the /upload dir may need to have view updates generated from them (if their table has accompanying views). Fixes #4047	2019-01-15 17:31:37 +01:00
Piotr Sarna	46305861c3	init: pass view update generator to storage service Storage service needs to access view update generator in order to register staging sstables from /upload directory.	2019-01-15 17:31:36 +01:00
Piotr Sarna	13f6453350	sstables: treat sstables in upload dir as needing view build In some cases, sstables put in the upload dir should have view updates generated from them. In order to avoid moving them across directories (which then involves handling failure paths), upload dir will also be treated as a valid directory where staging sstables reside. Regular sstables that are not needed for view updates will be immediately moved from upload/ dir as before.	2019-01-15 16:47:01 +01:00
Piotr Sarna	09401e0e71	sstables,table: rename is_staging to requires_view_building A generalized name will be more fitting once we treat uploaded sstables as requiring view building too.	2019-01-15 16:47:01 +01:00
Piotr Sarna	76616f6803	distributed_loader: use proper directory for opening SSTable Previous implementation assumes that each SSTable resides directly in table::datadir directory, while what should actually be used is directory path from SSTable descriptor. This patch prevents a regression when adding staging sstables support for upload/ dir.	2019-01-15 16:47:01 +01:00
Piotr Sarna	beb4836726	db,view: make throttling optional for view_update_generator Currently registering new view updates is throttled by a semaphore, which makes sense during stream sessions in order to avoid overloading the queue. Still, registration also occurs during initialization, where it makes little sense to wait on a semaphore, since view update generator might not have started at all yet.	2019-01-15 16:47:01 +01:00
Paweł Dziepak	635873639b	Merge "Encoding stats enhancements" from Benny " Cleanup various cases related to updating of metadata stats and encoding stats updating in preparation for 64-bit gc_clock (#3353). Fixes #4026 Fixes #4033 Fixes #4035 Fixes #4041 Refs #3353 " * 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla: sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES sstables: mc: use api::timestamp_type in write_liveness_info sstables: mc: sstable_write encoding_stats are const mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time memtable: don't use encoding_stats epochs as default memtable: mc: update min_ttl encoding stats for dead row marker memtable: mc: add comment regarding updating encoding stats of collection tombstones sstables: metadata_collector: add update tombstone stats sstables: assert that delete_time is not live when updating stats sstables: move update_deletion_time_stats to metadata collector sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram sstables: mc: write_liveness_info and write_collection should update tombstone_histogram sstables: update_local_deletion_time for row marker deletion_time and expiration	2019-01-15 16:53:36 +02:00
Tomasz Grabiec	32f711ce56	row_cache: Fix crash on memtable flush with LCS Presence checker is constructed and destroyed in the standard allocator context, but the presence check was invoked in the LSA context. If the presence checker allocates and caches some managed objects, there will be alloc-dealloc mismatch. That is the case with LeveledCompactionStrategy, which uses incremental_selector. Fix by invoking the presence check in the standard allocator context. Fixes #4063. Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>	2019-01-15 16:53:36 +02:00
Piotr Sarna	08a42d47a5	repair: add stream phasing to row level repair In order to allow other services to wait for incoming streams to finish, row level repair uses stream phasing when creating new sstables from incoming data. Fixes scylladb#4032	2019-01-15 10:28:21 +01:00
Piotr Sarna	7e61f02365	streaming: add phasing incoming streams Incoming streams are now phased, which can be leveraged later to wait for all ongoing streams to finish. Refs #4032	2019-01-15 10:28:15 +01:00
Asias He	1cc7e45f44	database: Make log max_vector_size and internal_count debug level It is useful for developers but not useful for users. Make it debug level. Message-Id: <775ce22d6f8088a44d35601509622a7e73ddeb9b.1547524976.git.asias@scylladb.com>	2019-01-15 11:02:30 +02:00
Piotr Sarna	238003b773	multishard_writer: add phaser operation parameter Multishard writer can now accept a phaser operation parameter in order to sustain a phased operation (e.g. a streaming session).	2019-01-15 10:02:22 +01:00
Piotr Sarna	b9203ec4f8	view: wait for stream sessions to finish before view building During streaming, there's a race between streamed sstables and view creation, which might result in some tables not being used to generate view updates, even though they should. That happens when the decision about view update path for a table is done before view creation, but after already receiving some sstables via streaming. These will not be used in view building even though they should. Hence, a phaser is used to make the view builder wait for all ongoing stream sessions for a table to finish before proceeding with build steps. Refs #4032	2019-01-15 09:36:55 +01:00
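The phasing described above can be modeled as a phased barrier: every stream session registers under the current phase, and the view builder opens a new phase and then waits only for sessions registered under earlier phases. The following is a minimal asyncio sketch under that assumption; the `Phaser` class and its `start()`, `finish()` and `advance_and_await()` methods are hypothetical illustrations, not Scylla's actual API.

```python
import asyncio
from collections import Counter


class Phaser:
    """Hypothetical phased barrier: advance_and_await() waits for every
    operation started before it was called, but not for later ones."""

    def __init__(self):
        self._phase = 0
        self._active = Counter()          # phase -> number of running operations
        self._cond = asyncio.Condition()

    def start(self):
        """Register an operation (e.g. a stream session) in the current phase."""
        self._active[self._phase] += 1
        return self._phase                # token the operation passes to finish()

    async def finish(self, token):
        async with self._cond:
            self._active[token] -= 1
            self._cond.notify_all()

    async def advance_and_await(self):
        """Open a new phase, then wait until all older operations are done."""
        old = self._phase
        self._phase += 1
        async with self._cond:
            await self._cond.wait_for(
                lambda: all(n == 0 for p, n in self._active.items() if p <= old))


async def demo():
    """Two stream sessions start before view building; the builder waits."""
    phaser = Phaser()
    log = []

    async def stream(name, delay, token):
        await asyncio.sleep(delay)        # pretend to receive sstables
        log.append(name)
        await phaser.finish(token)

    tasks = [asyncio.create_task(stream("stream-1", 0.05, phaser.start())),
             asyncio.create_task(stream("stream-2", 0.01, phaser.start()))]
    await phaser.advance_and_await()      # the view builder blocks here
    log.append("view-build")
    await asyncio.gather(*tasks)
    return log


print(asyncio.run(demo()))
```

Sessions that call `start()` after `advance_and_await()` has opened the new phase are counted under that new phase, so the view builder in this model does not wait for them.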
Piotr Sarna	d3a8fb378c	table: wait for pending streams on table::stop Stream sessions are now phased, so it's possible to wait for existing streams to finish gently before stopping a table.	2019-01-15 09:36:55 +01:00
Piotr Sarna	8a5aaf2839	database: add pending streams phaser This phaser will be used later to wait for all existing stream sessions to finish before proceeding with view building.	2019-01-15 09:36:55 +01:00
Nadav Har'El	9062750089	scylla_util.py: make view_hints_directory setting optional It is optional to set "view_hints_directory", so we shouldn't insist that it is defined in scylla.yaml on upgrade. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190114125225.10794-1-nyh@scylladb.com>	2019-01-14 14:59:20 +02:00
Benny Halevy	238866228f	memtable: rename get_stats to get_encoding_stats For symmetry reasons to similar sstable and compaction methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190113105155.29118-2-bhalevy@scylladb.com>	2019-01-14 14:58:43 +02:00
Avi Kivity	df090a15ff	Merge "Add counters for inactive reads" from Botond " This mini-series adds counters for the inactive reads registered in the reader concurrency semaphore. " * 'reader-concurrency-semaphore-counters/v6' of https://github.com/denesb/scylla: tests/querier_cache: use stats to get the no. of inactive reads reader_concurrency_semaphore: add counters for inactive reads	2019-01-14 11:56:43 +02:00
Rafael Ávila de Espíndola	acd6999ba9	Don't use SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT in scylla The existence of LZ4_compress_default is a property of the lz4 library, not seastar. With this patch scylla does its own configure check instead of depending on the one done by seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190114013737.5395-1-espindola@scylladb.com>	2019-01-14 11:51:20 +02:00
Rafael Ávila de Espíndola	684fb607c4	sstable: handle missing index entry This patch fixes a crash when the index file is corrupted and we get an empty index entry list. Tests: unit (release) Fixes: 2532 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190110202833.29333-1-espindola@scylladb.com>	2019-01-14 10:47:21 +01:00
Avi Kivity	f5ee466a1c	Merge "Cleanup UDT and tuple names creation" from Piotr " Currently the logic is scattered between types.*, cql3_types.* and sstables/mc/writer.cc. This patchset places all the logic in types.* and makes sure we correctly add "frozen<...>" and "FrozenType(...)" to the names of tuples and UDTs. Fixes #4087 Tests: unit(release) " * 'haaawk/4087_v1' of github.com:scylladb/seastar-dev: Add comment explaining tuple type name creation Add "FrozenType(...)" to UDT name only when it's frozen Move "FrozenType(...)" addition to UDT name to user_type_impl Add "frozen<...>" to tuple CQL name only when it's frozen Move "frozen<...>" addition to tuple CQL name to tuple_type_impl Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type Add "frozen<...>" to UDT CQL name only when it's frozen Move "frozen<...>" addition to UDT CQL name to user_type_impl	2019-01-13 15:34:24 +02:00
Benny Halevy	b243852a70	sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	d9e2aa65fc	sstables: mc: use api::timestamp_type in write_liveness_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	7ea96aa778	sstables: mc: sstable_write encoding_stats are const Encoding stats are immutable once statistics are sealed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	5d2d2bf47a	mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time It is actually the local deletion time rather than the ttl Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	2c99eb28d8	memtable: don't use encoding_stats epochs as default Why default to an artificial minimum when you can do better with zero effort? Track the actual minima in the memtable instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	9b78911379	memtable: mc: update min_ttl encoding stats for dead row marker Update min ttl with expired_liveness_ttl (although its value of max int32 is not expected to affect the minimum). Fixes #4041 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	47964d9ddc	memtable: mc: add comment regarding updating encoding stats of collection tombstones When the row flag has_complex_deletion is set, some collection columns may have deletion tombstones and some may not. We don't strictly need to update stats for them, since doing so will not affect the encoding_stats anyway. Fixes #4035 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	75ccd29b6a	sstables: metadata_collector: add update tombstone stats Conditionally update timestamp and local_deletion_time stats based on tombstone Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	0ae85a126a	sstables: assert that delete_time is not live when updating stats Be compatible with Cassandra Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	12e6b503c9	sstables: move update_deletion_time_stats to metadata collector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	2989b986ef	sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram Refs #4026 Refs #4033 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	bcb1fcd402	sstables: mc: write_liveness_info and write_collection should update tombstone_histogram Fixes #4033 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	0ca4ae658c	sstables: update_local_deletion_time for row marker deletion_time and expiration Fixes #4026 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Tomasz Grabiec	f12a3e2066	sstables: index_reader: Rename _promoted_index_size Message-Id: <1547219234-21182-2-git-send-email-tgrabiec@scylladb.com>	2019-01-13 11:29:13 +02:00
Tomasz Grabiec	6c5f8e0eda	sstables: index_reader: Simplify offset calculations Now that continuous_data_consumer::position() is meaningful (since `36dd660`), we can use our position in the stream to calculate offsets instead of duplicating state machine in offset calculations. The value of position() - data.size() always holds the current offset in the stream. Message-Id: <1547219234-21182-1-git-send-email-tgrabiec@scylladb.com>	2019-01-13 11:29:12 +02:00
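The invariant `position() - data.size()` can be illustrated with a minimal Python model. It assumes `position()` counts every byte delivered to the consumer, including the not-yet-consumed tail of the current buffer; the `StreamConsumer` class is a hypothetical stand-in, not Scylla's `continuous_data_consumer`.

```python
class StreamConsumer:
    """Hypothetical model of a consumer that is fed a byte stream one buffer
    at a time. position() is the number of stream bytes delivered so far,
    i.e. the stream offset just past the end of the current buffer."""

    def __init__(self):
        self._delivered = 0   # total bytes handed to the consumer
        self.data = b""       # unconsumed remainder of the current buffer

    def feed(self, buf):
        assert not self.data, "previous buffer must be fully consumed"
        self._delivered += len(buf)
        self.data = buf

    def position(self):
        return self._delivered

    def consume(self, n):
        chunk, self.data = self.data[:n], self.data[n:]
        return chunk

    def offset(self):
        # Offset of the next unconsumed byte, derived from the stream
        # position alone instead of a separately tracked counter.
        return self.position() - len(self.data)


c = StreamConsumer()
c.feed(b"abcdef")
c.consume(2)
print(c.offset())   # 2
c.consume(4)
c.feed(b"ghij")
c.consume(3)
print(c.offset())   # 9
```

Because the offset is derived from the position, no parser state needs to be mirrored just to keep track of where in the stream the consumer currently is.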
Avi Kivity	0d52bdcbad	install-dependencies.sh: unwrap long lines Put package names one per line. This makes it easier to review changes, and to backport changes to this file. No content changes. Message-Id: <20190112091024.21878-1-avi@scylladb.com>	2019-01-12 14:23:27 +02:00
Avi Kivity	391d1e0fe0	table: const correctness for table::get_sstables() and related Do not allow write access to the sstable list via this accessor. Luckily there are no violations, and now we enforce it. Message-Id: <20190111151049.16953-1-avi@scylladb.com>	2019-01-11 17:39:17 +01:00
Rafael Ávila de Espíndola	cd9ce18874	sstable: rename the is_boundary predicate The new name makes it clear what is on either side of the boundary. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190110221324.33618-1-espindola@scylladb.com>	2019-01-11 14:36:49 +02:00
Piotr Jastrzebski	96b880f81c	Add comment explaining tuple type name creation To keep format compatibility we never wrap the tuple type name in "org.apache.cassandra.db.marshal.FrozenType(...)", even when the tuple is frozen. This patch adds a comment in tuple_type_impl::make_name that explains the situation. For more details see #4087 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:14:26 +01:00
Piotr Jastrzebski	57e655d716	Add "FrozenType(...)" to UDT name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:08:02 +01:00
Piotr Jastrzebski	fc17bd376b	Move "FrozenType(...)" addition to UDT name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:07:47 +01:00
Piotr Jastrzebski	1fdfc461b8	Add "frozen<...>" to tuple CQL name only when it's frozen At the moment Scylla supports only frozen tuples but the code should be able to handle non-frozen tuples as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	749eee2711	Move "frozen<...>" addition to tuple CQL name to tuple_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	7aba17de2c	Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	56060573bb	Add "frozen<...>" to UDT CQL name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	a928c103c2	Move "frozen<...>" addition to UDT CQL name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:09:00 +01:00
Raphael S. Carvalho	1b7cad3531	database: Fix race condition in sstable snapshot A race condition occurs when one of the sstables selected by the snapshot is deleted by compaction. The snapshot fails because it tries to link an sstable that was previously unlinked by compaction's sstable deletion. Fixes #4051. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>	2019-01-11 07:53:14 +02:00
Benny Halevy	2dc3776407	sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base Refs #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>	2019-01-10 16:23:20 +02:00
Gleb Natapov	a29182b447	sstable: fix use after free while applying extensions in sstable::open_file sstable_file_io_extensions() returns an array of pointers to extensions, but do_for_each() may defer and the array will be destroyed. The patch keeps it alive until do_for_each completes. Message-Id: <20190110125656.GC3172@scylladb.com>	2019-01-10 15:10:06 +02:00
Avi Kivity	b247ce01c3	table: restore indentation after changes to table::make_sstable_reader Message-Id: <20190109175804.9352-2-avi@scylladb.com>	2019-01-10 13:00:53 +01:00
Avi Kivity	3d6be2f822	table: reduce duplication in table::make_sstable_reader make_sstable_reader needs to deal with single-key and scanning reads, and with restricting and non-restricting (in terms of read concurrency) readers. Right now it does this combinatorially - there are separate cases for restricting single-key reads, non-restricting single-key reads, restricting scans, and non-restricting scans. This makes further changes more complicated, so separate the two concepts. The patch splits the code into two stages; the first selects between a single-key and a scan, and the second selects between a restricting and non-restricting read. This slightly pessimizes non-restricting reads (a mutation_source is created and immediately destroyed), but that's not the common case. Tests: unit(release) Message-Id: <20190109175804.9352-1-avi@scylladb.com>	2019-01-10 13:00:40 +01:00
Benny Halevy	16dda033a5	sstables: row_marker: initialize _expiry compare_row_marker_for_merge compares deletion_time also for row markers that have missing timestamps. This happened to succeed due to implicit initialization to 0. However, we prefer the initialization to be explicit and allow calling row_marker::deletion_time() in all states. Fixes #4068 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110102949.17896-1-bhalevy@scylladb.com>	2019-01-10 12:45:07 +01:00
Avi Kivity	4a6aeced59	Merge "Fix UDTs representation in serialization header" from Piotr " Tests: unit(release) " Fixes #4073. * commit 'FETCH_HEAD~1': Add test for serialization header with UDT Fix UDT names in serialization header	2019-01-10 12:57:11 +02:00
Piotr Jastrzebski	d4bc5b64cf	Add test for serialization header with UDT Serialization header stores column types for all columns in sstable. If any of them is a UDT then it has to be wrapped into "org.apache.cassandra.db.marshal.FrozenType(...)". This patch adds a test case to verify that. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-10 10:59:01 +01:00
Piotr Jastrzebski	3de85aebc9	Fix UDT names in serialization header Serialization header stores type names of all columns in a table, including partition key columns, clustering key columns, static columns and regular columns. If one of those types is a user defined type then we need to wrap its name into "org.apache.cassandra.db.marshal.FrozenType(...)". Fixes #4073 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-10 10:58:30 +01:00
Benny Halevy	60323b79d1	sstables: mc: sign-extend delta local_deletion_time and delta ttl Follow Cassandra's encoding so that values that are less than the baseline encoding_stats will wrap around in 64 bits rather than 32. Fixes #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>	2019-01-09 21:43:30 +02:00
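Why the width of the wrap-around matters can be shown with a minimal Python sketch. It assumes the reader decodes a value by adding the stored unsigned delta to the baseline in 64-bit arithmetic; the helper names are hypothetical and do not correspond to Scylla or Cassandra functions.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def encode_delta_32(value, base):
    """Delta computed in 32 bits: a negative delta wraps modulo 2**32."""
    return (value - base) & MASK32


def encode_delta_64(value, base):
    """Delta sign-extended to 64 bits: a negative delta wraps modulo 2**64."""
    return (value - base) & MASK64


def decode_delta(base, delta):
    """Add the unsigned delta back to the base in 64-bit arithmetic and
    reinterpret the result as a signed 64-bit integer."""
    raw = (base + delta) & MASK64
    return raw - (1 << 64) if raw >> 63 else raw


base, value = 1000, 900                                   # value below the baseline
print(decode_delta(base, encode_delta_64(value, base)))   # 900
print(decode_delta(base, encode_delta_32(value, base)))   # 4294968196, not 900
```

With a 32-bit delta, a value just below the baseline decodes to a value about 2^32 too large; sign-extending the delta to 64 bits makes the 64-bit addition wrap back to the original value.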
Rafael Ávila de Espíndola	26ac2c23ef	Change _row_ names that refer to partitions This renames some variables and functions to make it clear that they refer to partitions and not rows. Old versions of sstablemetadata used to refer to a row histogram, but current versions now mention a partition histogram instead. This patch doesn't change the exposed API names. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181229223311.4184-2-espindola@scylladb.com>	2019-01-09 14:53:42 +02:00
Takuya ASADA	f00e9051ea	reloc: show error message when relocatable package doesn't exist Both build_rpm.sh and build_deb.sh fail at the beginning of the script when the relocatable package does not exist; we need to catch this and show a user-friendly message. Fixes #4071 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190109094353.16690-1-syuu@scylladb.com>	2019-01-09 12:53:08 +02:00
Raphael S. Carvalho	f5301990fc	compaction: release reference of cleaned sstable in compaction manager The compaction manager holds a reference to all sstables being cleaned until the very end, which is a problem because the disk space of cleaned sstables cannot be reclaimed while their file descriptors remain open. Fixes #3735. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181221000941.15024-1-raphaelsc@scylladb.com>	2019-01-08 14:14:01 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola	51a08c3240	sstable: remove constexpr from run time predicates We never check these predicates at compile time. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190108010055.92042-1-espindola@scylladb.com>	2019-01-08 12:28:42 +02:00
Piotr Sarna	c5346cdf9b	database, table: split table-related code to table.cc All table:: related code is moved to table.cc source file, which cuts database.cc to half its size and thus allows faster compilation on multiple cores. Refs #1 Message-Id: <28e67f7793ff2147ffce18df5e0b077e14d3b8bd.1546940360.git.sarna@scylladb.com>	2019-01-08 12:02:42 +02:00
Avi Kivity	8ecb528d5a	Update seastar submodule * seastar 67fd967...af6b797 (1): > iotune: Initialize io_rates member variables Fixes #4064	2019-01-08 12:02:42 +02:00
Avi Kivity	d8adbeda11	tests: mutation_source_test: generate valid utf-8 data test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized string to populate an sstable, but this can be invalid utf-8 so that sstable cannot be sstabledumped. Make it valid by using make_random_string(). Fixes #4040. Message-Id: <20190107193240.14409-1-avi@scylladb.com>	2019-01-08 12:02:42 +02:00
Asias He	1de24c8495	repair: Use mf.visit() in fragment_hasher When a new fragment type is added, the code will fail to compile instead of producing runtime errors. Message-Id: <cf10200e4185c779aad15da3a776a5b79f5323af.1546930796.git.asias@scylladb.com>	2019-01-08 12:02:42 +02:00
Rafael Ávila de Espíndola	67039e942b	Remove the only use of with_alignment from scylla In c++17 there are standard ways of requesting aligned memory, so seastar doesn't need to provide one. This patch is in preparation for removing with_alignment from seastar. Tests: unit (debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190107191019.22295-1-espindola@scylladb.com>	2019-01-07 21:34:47 +02:00
Rafael Ávila de Espíndola	0d4529a5f1	Change timeout to fix tests in a debug build The current timeout is way too small for debug builds. Currently jenkins runs avoid the problem by increasing the timeout by 100x. This patch increases it by 10x, which seems to be sufficient to run the tests on most desktop machines. Message-Id: <20190107191413.22531-1-espindola@scylladb.com>	2019-01-07 21:34:06 +02:00
Avi Kivity	34251f5ea1	tools: toolchain: update image for all-user sudo	2019-01-07 21:22:42 +02:00
Takuya ASADA	3514b185fd	tools: toolchain: allow sudo for all users A non-privileged user may not belong to the "wheel" group; for example, Debian variants use the "sudo" group instead of "wheel". To make sudo work in all environments we should allow sudo for "ALL" instead of "wheel". Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107173410.23140-1-syuu@scylladb.com>	2019-01-07 20:47:22 +02:00
Benny Halevy	40410465d7	sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE; Fixes #4060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>	2019-01-07 18:41:37 +01:00
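To see why the constant matters, a small Python illustration (using only the standard `struct` module) reinterprets both candidate values as a signed 32-bit integer, which is how a Java `int` such as `Integer.MAX_VALUE` is represented. It assumes the field is read back as a signed 32-bit value; the `as_int32` helper is hypothetical.

```python
import struct

INT32_MAX = 2**31 - 1    # Java Integer.MAX_VALUE, Cassandra's EXPIRED_LIVENESS_TTL
UINT32_MAX = 2**32 - 1   # max uint32_t, the value replaced by this fix


def as_int32(value):
    """Reinterpret the low 32 bits of an unsigned value as a signed int32."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


print(as_int32(INT32_MAX))   # 2147483647
print(as_int32(UINT32_MAX))  # -1, which no longer matches the sentinel
```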
Avi Kivity	20b6d00e56	tools: toolchain: support dbuild from subdirectory or parent directory of scylla.git When building something other than Scylla (like scylla-tools-java or scylla-jmx) it is convenient to run it from some other directory. To do that, allow running dbuild from any directory (so we locate tools/toolchain/image relative to the dbuild script rather than use a fixed path) and mount the current directory since it's likely the user will want to access files there. Message-Id: <20190107165824.25164-1-avi@scylladb.com>	2019-01-07 18:35:51 +01:00
Nadav Har'El	f6e0ce02fa	docs/isolation.md: new document Start a new document with an overview of isolation in Scylla, i.e., scheduling groups, I/O priority classes, controllers, etc. As all documents in docs/, this is a document for developers (not users!) who need to understand how isolation between different pieces of Scylla (e.g., queries, compaction, repair, etc.) works, which scheduling groups and I/O classes we have and why, etc. The document is still very partial and includes a lot of TODOs on places where the explanation needs to be expanded. In particular it needs an accurate explanation (and not just a name) of what kind of work is done under each of the groups and classes, and an explanation of how we set up RPC to use which scheduling groups for the code it executes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190103183232.21348-1-nyh@scylladb.com>	2019-01-07 17:48:35 +02:00
Botond Dénes	80affca5f7	tests/querier_cache: use stats to get the no. of inactive reads Now that we added stats for the inactive reads, the tests don't need the `reader_concurrency_semaphore::inactive_reads()` method, instead they can rely on the stats to check the number of inactive reads.	2019-01-07 17:06:26 +02:00
Botond Dénes	e56c26205f	reader_concurrency_semaphore: add counters for inactive reads Add counters that give insight into inactive read related events. Two counters are added: * permit_based_evictions * population	2019-01-07 16:45:49 +02:00
Nadav Har'El	da090a5458	materialized views: move hints to top-level directory While we keep ordinary hints in a directory parallel to the data directory, we decided to keep the materialized view hints in a subdirectory of the data directory, named "view_pending_updates". But during boot, we expect all subdirectories of data/ to be keyspace names, and when we notice this one, we print a warning: WARN: database - Skipping undefined keyspace: view_pending_updates This spurious warning annoyed users. But moreover, we could have bigger problems if the user actually tries to create a keyspace with that name. So in this patch, we move the view hints to a separate top-level directory, which defaults to /var/lib/scylla/view_hints, but as usual can be configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190107142257.16342-1-nyh@scylladb.com>	2019-01-07 16:43:43 +02:00
Takuya ASADA	eddecdd0b5	dist/redhat: drop unused dependencies wget and yum-builddep are not used anymore, don't install them. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-7-syuu@scylladb.com>	2019-01-07 12:56:18 +00:00
Takuya ASADA	40dc62fa98	dist/debian: don't use sudo to rm debian dir sudo is not allowed in dbuild when running without root privileges, and the debian dir should be owned by the current user anyway, so stop using sudo. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-5-syuu@scylladb.com>	2019-01-07 12:56:18 +00:00
Takuya ASADA	237de20ff9	dist/debian: correct dbuild path /usr/sbin/debuild is a typo; it should be /usr/bin. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-4-syuu@scylladb.com>	2019-01-07 12:56:17 +00:00
Pekka Enberg	2520c8caac	Merge 'Improve frozen toolchain for continuous integration' from Avi "Add features that are useful for continuous integration pipelines (and also ordinary developers): - sudo support, with and without a tty, as our packaging scripts require it - install ccache package to allow reducing incremental build times - dependencies needed to build scylla-jmx and scylla-tools-java" * tag 'toolchain-ci/v1' of https://github.com/avikivity/scylla: tools: toolchain: update image for ant, maven, ccache, sudo tools: toolchain: dbuild: pass-through supplementary groups tools: toolchain: defeat PAM tools: toolchain: improve sudo support tools: toolchain: break long line in dbuild tools: toolchain: prepare sudoers file tools: toolchain: install ccache install-dependencies.sh: add maven and ant	2019-01-07 12:56:17 +00:00
Pekka Enberg	9b27a3035c	Merge 'Reduce inclusions of "database.hh"' from Avi "This patchset reduces inclusions of database.hh, particularly in header files. It reduces the number of objects depending on database.hh from 166 to 116. Tests: unit(release), playing a little with tracing" * tag 'database.hh/v1' of https://github.com/avikivity/scylla: streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh sstables: writer.hh: add some forward declarations table_helper: remove database.hh include table_helper: de-inline insert() and setup_keyspace() table_helper: de-template setup_keyspace() table_helper: simplify template body of table_helper::insert() schema_tables: remove #include of database.hh cql_type_parser: remove dependency on user_types_metadata thrift: add missing include of sleep.hh cql3: ks_prop_defs: remove #include "database.hh"	2019-01-07 12:56:17 +00:00
Benny Halevy	b017d87a43	tests: mc: add back missing sstable_3_x_test Statistics.db files To be able to verify the golden version with sstabledump. These files were generated by running sstable_3_x_test and keeping its generated output files. Refs #4043 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190103112511.23488-2-bhalevy@scylladb.com>	2019-01-07 12:56:16 +00:00
Benny Halevy	517ad58823	tests: mc: delete empty line from write_static_row/mc-1-big-TOC.txt Refs #4043 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190103112511.23488-1-bhalevy@scylladb.com>	2019-01-07 12:56:16 +00:00
Nadav Har'El	b14616b879	docs/logging.md: improvements Various small improvements to docs/logging.md: 1. Describe the options to log to stdout or syslog and their defaults. 2. Mention the possibility of using nodetool instead of REST API. 3. Additional small tweaks to formatting. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190106111851.26700-1-nyh@scylladb.com>	2019-01-06 13:20:53 +02:00
Nadav Har'El	232e97ad06	docs/logging.md: new document Add a new document about logging in Scylla, and how to change the log levels when running Scylla and during the run. It needs more developer-oriented information (e.g., how to create new logger subsystems in the code) but I think it's a good start. Some of the text is based on Glauber's writeup for the Scylla website on changing log levels at runtime. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190106103606.26032-1-nyh@scylladb.com>	2019-01-06 12:40:14 +02:00
Benny Halevy	2daf81e80f	dist: redhat/debian specs: add dependency on 'file' package Needed by seastar-addr2line Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190101203434.14858-1-bhalevy@scylladb.com>	2019-01-06 12:13:08 +02:00
Avi Kivity	f02c64cadf	streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh This header, which is easily replaced with a forward declaration, introduces a dependency on database.hh everywhere. Remove it and scatter includes of database.hh in source files that really need it.	2019-01-05 17:33:25 +02:00
Avi Kivity	ca93b88cfb	sstables: writer.hh: add some forward declarations This makes the header less dependent on previously-included headers.	2019-01-05 17:04:16 +02:00
Avi Kivity	53a21c7787	table_helper: remove database.hh include	2019-01-05 16:39:26 +02:00
Avi Kivity	7534412071	table_helper: de-inline insert() and setup_keyspace() After previous patches de-templated these functions, we can de-inline them. This helps reduce compile time and prepares to reduce header dependencies.	2019-01-05 16:28:46 +02:00
Avi Kivity	cfedf4ab0f	table_helper: de-template setup_keyspace() This setup function has no reason to be a template and is easily converted. We can then later de-inline it to reduce dependencies.	2019-01-05 16:23:10 +02:00
Avi Kivity	659147cd79	table_helper: simplify template body of table_helper::insert() Move most of the body into a non-template overload to reduce dependencies in the header (and template bloat). The function is not on any fast path, and noncopyable_function will likely not even allocate anything.	2019-01-05 16:22:08 +02:00
Avi Kivity	c3ef99f84f	schema_tables: remove #include of database.hh Distribute in source files (and one header - table_helper.hh) that need it.	2019-01-05 15:43:07 +02:00
Avi Kivity	f43f82d1d2	cql_type_parser: remove dependency on user_types_metadata A default parameter of type T (or lw_shared_ptr<T>) requires that T be defined. Remove the dependency by redefining the default parameter as an overload, for T = user_types_metadata.	2019-01-05 15:40:58 +02:00
Avi Kivity	4ba1d4d1dc	thrift: add missing include of sleep.hh Currently obtained indirectly through database.hh.	2019-01-05 15:39:30 +02:00
Avi Kivity	d24962e16c	cql3: ks_prop_defs: remove #include "database.hh" Replace with forward declaration to reduce rebuilds.	2019-01-05 14:26:03 +02:00
Jesse Haber-Kucharsky	17a5f7acab	build: Link against libatomic Since Scylla uses functions from the `atomic` header in its own source code, we need to explicitly link against the stub library that is provided for hardware architectures that do not have native support for atomic operations. Fixes #4053 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <7d62e762130494d73565ce8c031f53aaf866d3aa.1546645041.git.jhaberku@scylladb.com>	2019-01-05 13:38:57 +02:00
Avi Kivity	36e4e9fb54	Update seastar submodule * seastar 6c8c229...67fd967 (1): > perftune.py: tune only active NVMe HW queues on i3 AWS instances	2019-01-04 13:17:29 +02:00
Avi Kivity	b0980ba7c6	compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads The workload in #3844 has these characteristics: - very small data set size (a few gigabytes per shard) - large working set size (all the data, enough for high cache miss rate) - high overwrite rate (so a compaction results in 12X data reduction) As a result, the compaction backlog controller assigns very few shares to compaction (low data set size -> low backlog), so compaction proceeds very slowly. Meanwhile, we have tons of cache misses, and each cache miss needs to read from a large number of sstables (since compaction isn't progressing). The end result is a high read amplification, and in this test, timeouts. While we could declare that the scenario is very artificial, there are other real-world scenarios that could trigger it. Consider a 100% write load (population phase) followed by 100% read. Towards the end of the last compaction, the backlog will drop more and more until compaction slows to a crawl, and until it completes, all the data (for that compaction) will have to be read from its input sstables, resulting in read amplification. We should probably have read amplification affect the backlog, but for now the simpler solution is to increase the minimum shares to 50 so that compaction always makes forward progress. This will result in higher-than-needed compaction bandwidth in some low write rate scenarios so we will see fluctuations in request rate (what the controller was designed to avoid), but these fluctuations will be limited to 5%. Since the base class backlog_controller has a fixed (0, 0) point, remove it and add it to derived classes (setting it to (0, 50) for compaction). Fixes #3844 (or at least improves it). Message-Id: <20181231162710.29410-1-avi@scylladb.com>	2019-01-04 10:58:43 +01:00
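A minimal sketch of why moving the controller's first point from (0, 0) to (0, 50) guarantees forward progress. It assumes shares are linearly interpolated between (backlog, shares) control points and that roughly 1000 shares corresponds to full bandwidth, so 50 shares is about 5%. Only the (0, 50) point comes from the commit; the other control points and the `shares_for_backlog` name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical (backlog, shares) control points. Only (0, 50) is from the commit;
# the remaining points are illustrative assumptions.
COMPACTION_CONTROL_POINTS = [(0.0, 50.0), (1.5, 100.0), (10.0, 1000.0)]


def shares_for_backlog(backlog, points=COMPACTION_CONTROL_POINTS):
    """Piecewise-linear interpolation of shares over sorted (backlog, shares) points."""
    if backlog <= points[0][0]:
        return points[0][1]
    if backlog >= points[-1][0]:
        return points[-1][1]
    # Locate the segment [x0, x1] that contains the backlog.
    i = bisect_right([x for x, _ in points], backlog)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (backlog - x0) / (x1 - x0)
```

With the old (0, 0) first point, a backlog near zero yields close to zero shares, so the last compaction of a population phase crawls; with (0, 50) the controller never drops below the 5% floor.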
Duarte Nunes	b851cb1a9a	distributed_loader: Forbid uploading MV sstables Instead suggest that the views be re-created. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103142933.35354-1-duarte@scylladb.com>	2019-01-03 16:31:20 +02:00
Avi Kivity	7d3562a403	tools: toolchain: update image for ant, maven, ccache, sudo	2019-01-03 16:16:47 +02:00
Avi Kivity	344468e20d	tools: toolchain: dbuild: pass-through supplementary groups Useful for ccache.	2019-01-03 16:16:47 +02:00
Avi Kivity	11889f5ea9	tools: toolchain: defeat PAM Prevent PAM from enforcing security and preventing sudo from working. This is done by replacing the default configuration (designed for workstations) with one that uses pam_permit for everything.	2019-01-03 16:16:47 +02:00
Avi Kivity	9c258923d8	tools: toolchain: improve sudo support Bind-mount /etc/passwd and /etc/group so sudo doesn't complain, and support sudo without password or tty.	2019-01-03 16:16:47 +02:00
Avi Kivity	05f78df7b9	tools: toolchain: break long line in dbuild	2019-01-03 16:16:47 +02:00
Avi Kivity	f79a300081	tools: toolchain: prepare sudoers file Don't require a tty or passwords, since they won't be available in continuous integration environments.	2019-01-03 16:16:47 +02:00
Avi Kivity	25040824cf	tools: toolchain: install ccache Not strictly necessary, but often useful to reduce rebuild times. The user will need to bind-mount a populated cache.	2019-01-03 16:16:47 +02:00
Avi Kivity	527e3a58ff	install-dependencies.sh: add maven and ant Add tools needed to build scylla-jmx and scylla-tools-java. While not requirements of this repository, it's nicer if a single setup can be used to build and run everything. We also install pystache as it's used by packaging scripts.	2019-01-03 16:16:45 +02:00
Duarte Nunes	3235c13125	utils/fragmented_temporary_buffer: Correctly implement remove_suffix() The current implementation breaks the invariant that _size_bytes = reduce(_fragments, &temporary_buffer::size) In particular, this breaks algorithms that check the individual segment size. Correctly implement remove_suffix() by destroying superfluous temporary_buffer's and by trimming the last one, if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103133523.34937-1-duarte@scylladb.com>	2019-01-03 13:37:01 +00:00
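The corrected remove_suffix() can be modelled with a toy buffer that keeps the invariant `size_bytes == sum of fragment sizes`: whole trailing fragments inside the suffix are dropped, and the last surviving fragment is trimmed. The `FragmentedBuffer` class is a hypothetical Python stand-in, not Scylla's C++ `fragmented_temporary_buffer` API.

```python
class FragmentedBuffer:
    """Toy buffer split into byte fragments (hypothetical model)."""

    def __init__(self, fragments):
        self.fragments = [bytes(f) for f in fragments]
        self.size_bytes = sum(len(f) for f in self.fragments)

    def check_invariant(self):
        # The property the buggy implementation broke.
        assert self.size_bytes == sum(len(f) for f in self.fragments)

    def remove_suffix(self, n):
        """Drop the last n bytes while preserving the size invariant."""
        if n > self.size_bytes:
            raise ValueError("cannot remove more bytes than the buffer holds")
        self.size_bytes -= n
        # Destroy trailing fragments that lie entirely inside the suffix.
        while n and n >= len(self.fragments[-1]):
            n -= len(self.fragments.pop())
        # Trim the last remaining fragment if part of the suffix is still left.
        if n:
            self.fragments[-1] = self.fragments[-1][:-n]
```

Only adjusting `size_bytes` (without touching the fragments) would leave callers that iterate over individual fragments seeing bytes that are supposed to be gone.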
Botond Dénes	021feef513	querier_cache: simplify memory eviction use-after-free fix, add tests Simplify the fix for memory based eviction, introduced by `918d255` so there is no need to massage the counters. Also add a check to `test_memory_based_cache_eviction` which checks for the bug fixed. While at it also add a check to `test_time_based_cache_eviction` for the fix to time based eviction (`e5a0ea3`). Tests: tests/querier_cache:debug Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>	2019-01-03 13:44:08 +02:00
Tomasz Grabiec	1613a623e1	Merge "Fix crash on corrupt sstable" from Rafael * https://github.com/espindola/scylla espindola/invalid_boundary4: sstables: Refactor predicates on bound_kind_m Fix crash on corrupt sstable	2019-01-03 12:02:09 +01:00
Duarte Nunes	42d9ca8266	Merge 'Add staging SSTables support to row level repair' from Piotr " This series adds staging SSTables support to row level repair. It was introduced for streaming sessions before, but since row level repair doesn't leverage sessions at all, it's added separately. Tests: unit (release) dtest (repair_additional_test.py:RepairAdditionalTest, excluding repair_abort_test, which fails for me locally on master) " * 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla: repair: add staging sstables support to row level repair main,repair: add params to row level repair init streaming,view: move view update checks to separate file	2019-01-03 09:40:13 +00:00
Piotr Sarna	a73d9ccf31	service: mark existing views as built before bootstrap When a node is bootstrapping, it will receive data from other nodes via streaming, including materialized views. Regardless of whether these views are built on other nodes or not, building them on newly bootstrapped nodes has no effect - updates were either already streamed completely (if view building has finished) or will be propagated via view building, if the process is still ongoing. So, marking all views as 'built' for the bootstrapped node prevents it from spawning superfluous view building processes. Fixes #4001 Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>	2019-01-03 09:39:33 +00:00
Botond Dénes	e5a0ea390a	querier_cache: unregister queriers evicted due to expired TTL Currently queriers evicted due to their TTL expiring are not unregistered from the `reader_concurrency_semaphore`. This can cause a use-after-free when the semaphore tries to evict the same querier at some later point in time, as the querier entry it has a pointer to is now invalid. Fix by unregistering the querier from the semaphore before destroying the entry. Refs: #4018 Refs: #4031 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>	2019-01-03 10:29:26 +02:00
Piotr Sarna	bc74ac6f09	repair: add staging sstables support to row level repair In some cases, sstables created during row level repair should be enqueued as staging in order to generate view updates from them. Fixes #4034	2019-01-03 08:36:45 +01:00
Piotr Sarna	a0003c52cf	main,repair: add params to row level repair init Row level repair needs references to system distributed keyspace and view update generator in order to enqueue some sstables as staging.	2019-01-03 08:31:41 +01:00
Piotr Sarna	9d46715613	streaming,view: move view update checks to separate file Checking if view update path should be used for sstables is going to be reused in row level repair code, so relevant functions are moved to a separate header.	2019-01-03 08:31:40 +01:00
Avi Kivity	918d255168	querier_cache: unregister querier from reader_concurrency_semaphore during eviction In insert_querier(), we may evict older queriers to make room for the new one. However, we forgot to unregister the evicted queriers from reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore eventually wanted to evict something, it saw an inactive_read_handle that was not connected to a querier_cache::entry, and crashed on use-after-free. Fix by evicting through the inactive_read_handle associated with the querier to be evicted. This removes traces of the querier from both reader_concurrency_semaphore and querier_cache. We also have to massage the statistics since querier_inactive_read::evict() updates different counters. Fixes #4018. Tests: unit(release) Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190102175023.26093-1-avi@scylladb.com>	2019-01-03 09:15:07 +02:00
Rafael Ávila de Espíndola	28c014351f	Fix crash on corrupt sstable The check in consume_range_tombstone was too late. Before getting to it we would fail an assert calling to_bound_kind. This moves the check earlier and adds a testcase. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:52:07 -08:00
Rafael Ávila de Espíndola	3c9178d122	sstables: Refactor predicates on bound_kind_m This moves the predicate functions to the start of the file, renames is_in_bound_kind to is_bound_kind for consistency with to_bound_kind and defines all predicates in a similar fashion. It also uses the predicates to reduce code duplication. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:50:44 -08:00
Avi Kivity	2717bdd301	tools: toolchain: allow adjusting "docker run" command line It is useful to adjust the command line when running the docker image, for example to attach a data volume or a ccache directory. Add a mechanism to do that. Message-Id: <20181228163306.19439-1-avi@scylladb.com>	2019-01-01 21:44:50 +00:00
Avi Kivity	d19660ec0a	Merge "commitlog: Use fragmented buffers for reading entries" from Duarte " Instead of allocating a contiguous temporary_buffer when reading mutations from the commitlog - or hint - replaying, use fragmented buffers. Refs #4020 " * 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla: db/commitlog: Use fragmented buffers to read entries db/commitlog: Implement skip in terms of input buffer skipping tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() utils/fragmented_temporary_buffer: Add remove_suffix tests/fragmented_temporary_buffer_test: Add unit test for skip() utils/fragmented_temporary_buffer: Allow skipping in the input stream	2019-01-01 19:08:34 +02:00
Avi Kivity	6641353854	tracing: remove static class_registry Static class_registries hinder librarification by requiring linking with all object files (instead of a library from which objects are linked on demand) and reduce readability by hiding dependencies and by their horrible syntax. Hide them behind a non-static, non-template tracing backend registry. Message-Id: <20181229121000.7885-1-avi@scylladb.com>	2018-12-31 13:24:54 +00:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	0e50a9bc6d	db/commitlog: Implement skip in terms of input buffer skipping This simplifies the code and allows to get rid of the overload of advance() taking a temporary_buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8379ac6189	tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	1a88cd7992	utils/fragmented_temporary_buffer: Add remove_suffix Essentially hide some bytes off the end of the buffer. Needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	50dd8b67b2	tests/fragmented_temporary_buffer_test: Add unit test for skip() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8eab0a3e01	utils/fragmented_temporary_buffer: Allow skipping in the input stream Add fragmented_temporary_buffer::istream::skip(), needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
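Skipping in a fragmented input stream only has to advance a (fragment index, offset) cursor, possibly across fragment boundaries, without copying any bytes. The sketch below is a hypothetical Python model of that idea; `FragmentedInputStream`, `bytes_left()` and `read()` are illustrative names, not the C++ `istream` interface.

```python
class FragmentedInputStream:
    """Toy input stream over a list of byte fragments (hypothetical model)."""

    def __init__(self, fragments):
        self._fragments = [bytes(f) for f in fragments]
        self._index = 0   # current fragment
        self._offset = 0  # position inside the current fragment

    def bytes_left(self):
        if self._index >= len(self._fragments):
            return 0
        return (len(self._fragments[self._index]) - self._offset
                + sum(len(f) for f in self._fragments[self._index + 1:]))

    def skip(self, n):
        """Advance n bytes across fragment boundaries without copying."""
        if n > self.bytes_left():
            raise EOFError("skip past end of stream")
        while n:
            available = len(self._fragments[self._index]) - self._offset
            if n < available:
                self._offset += n
                return
            n -= available
            self._index += 1
            self._offset = 0

    def read(self, n):
        """Read n bytes, gathering them from consecutive fragments."""
        if n > self.bytes_left():
            raise EOFError("read past end of stream")
        out = bytearray()
        while n:
            frag = self._fragments[self._index]
            take = min(n, len(frag) - self._offset)
            out += frag[self._offset:self._offset + take]
            self._offset += take
            n -= take
            if self._offset == len(frag):
                self._index += 1
                self._offset = 0
        return bytes(out)
```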
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Avi Kivity	49958d5836	tools: toolchain: update for lz4 1.8.3 lz4 1.8.3 was released with a fix for data corruption during compression. While the release notes indicate we aren't vulnerable, be cautious and update anyway. Message-Id: <20181230144716.7238-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Hagit Segev	141fad9c14	Update README.md fix a typo	2018-12-31 13:33:04 +02:00
Asias He	d90836a2d3	streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic Currently, they increase and decrease as the stream sessions are created and destroyed. Make them monotonically increasing Prometheus counters for easier monitoring. Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>	2018-12-30 16:52:17 +02:00
Pekka Enberg	96172b7bca	Merge 'Fixes for the view_update_from_staging_generator' from Duarte "This series contains a couple of fixes to the view_update_from_staging_generator, the object responsible for generating view updates from sstables written through streaming. Fixes #4021" * 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla: db/view/view_update_from_staging_generator: Break semaphore on stop() db/view/view_update_from_staging_generator: Restore formatting db/view/view_update_from_staging_generator: Avoid creating more than one fiber	2018-12-29 18:31:40 +02:00
Duarte Nunes	f41d13f38c	db/view/view_update_from_staging_generator: Break semaphore on stop() This avoids having fibers waiting on _registration_sem without ever being notified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:04 +00:00
Duarte Nunes	4974addc5c	db/view/view_update_from_staging_generator: Restore formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:02 +00:00
Duarte Nunes	201196130d	db/view/view_update_from_staging_generator: Avoid creating more than one fiber If view_update_from_staging_generator::maybe_generate_view_updates() is called before view_update_from_staging_generator::start(), as can happen in main.cc, then we can potentially create more than one fiber, which leads to corrupted state and conflicting operations. To avoid this, use just one fiber and be explicit about notifying it that more work is needed, by leveraging a condition-variable. Fixes #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:52:51 +00:00
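The single-fiber design can be sketched with Python's `asyncio.Condition`: callers that arrive before `start()` only enqueue work and notify, and exactly one worker task ever consumes it. This is a hypothetical model of the idea in the commit message; `StagingGenerator` and its methods are illustrative names, and `processed.append()` stands in for generating view updates.

```python
import asyncio


class StagingGenerator:
    """Toy model: one long-lived worker fed through a condition variable."""

    def __init__(self):
        self._pending = []
        self._cond = asyncio.Condition()
        self._stopped = False
        self._worker = None
        self.processed = []

    def start(self):
        # Exactly one worker task, no matter how often work is enqueued.
        self._worker = asyncio.get_running_loop().create_task(self._run())

    async def maybe_generate_view_updates(self, sstable):
        # Safe to call before start(): it only enqueues and notifies.
        async with self._cond:
            self._pending.append(sstable)
            self._cond.notify()

    async def _run(self):
        while True:
            async with self._cond:
                await self._cond.wait_for(lambda: self._pending or self._stopped)
                if self._stopped and not self._pending:
                    return
                batch, self._pending = self._pending, []
            for sst in batch:
                self.processed.append(sst)  # stand-in for view update generation
                await asyncio.sleep(0)

    async def stop(self):
        # Wake the worker so it never waits forever (compare "Break semaphore on stop()").
        async with self._cond:
            self._stopped = True
            self._cond.notify_all()
        if self._worker is not None:
            await self._worker
```

Because enqueueing never spawns a task, calling `maybe_generate_view_updates()` before `start()` (as main.cc can) no longer produces a second consumer.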
Duarte Nunes	66113a2d39	Merge 'Replace query_processor's sharded<database> with plain database' from Avi " A sharded<database> is not very useful for accessing data since data is usually distributed across many nodes, while a sharded<database> contains only a single node's view. So it is really only used for accessing replicated metadata, not data. As such only the local shard is accessed. Use that to simplify query_processor a little by replacing sharded<database> with a plain database. We can probably be more ambitious and make all accesses, data and metadata, go through storage_proxy, but this is a start. " * tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla: query_processor: replace sharded<database> with the local shard commitlog_replayer: don't use query_processor client_state: change set_keyspace() to accept a single database shard legacy_schema_migrator: initialize with database reference	2018-12-29 12:14:19 +00:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Avi Kivity	30745eeb72	query_processor: replace sharded<database> with the local shard query_processor uses storage_proxy to access data, and the local database object to access replicated metadata. While it seems strange that the database object is not used to access data, it is logical when you consider that a sharded<database> contains only this node's data, not the cluster data. Take advantage of this to replace sharded<database> with a single database shard.	2018-12-29 11:02:15 +02:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before the commitlog, so logically replaying the commitlog shouldn't need it. And in fact the dependency on query_processor can be eliminated; all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	7830086317	client_state: change set_keyspace() to accept a single database shard set_keyspace() only needs one shard (it is checking replicated state, not sharded data) so arrange for it to receive only that one shard.	2018-12-29 10:58:39 +02:00
Avi Kivity	e4233262cf	legacy_schema_migrator: initialize with database reference Provide legacy_schema_migrator with a sharded<database> so it doesn't need to use the one from query_processor. We want to replace query_processor's sharded<database> with just a local database reference in order to simplify it, and this is standing in the way.	2018-12-29 10:58:22 +02:00
Duarte Nunes	bab7e6877b	streaming/stream_session: Only stage sstables for tables with views When streaming, sstables for which we need to generate view updates are placed in a special staging directory. However, we only need to do this for tables that actually have views. Refs #4021 Message-Id: <20181227215412.5632-1-duarte@scylladb.com>	2018-12-28 18:32:24 +02:00
Avi Kivity	feddf0b021	tools: toolchain: patch boost for use-after-free in Boost.Test XML output The version of boost in Fedora 29 has a use-after-free bug that is only exposed when ./test.py is run with the --jenkins flag. To patch it, use a fixed version from the copr repository scylladb/toolchain. Message-Id: <20181228150419.29623-1-avi@scylladb.com>	2018-12-28 16:35:28 +01:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write operations over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
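The "count" and "error" fields in the toppartitions output are what a stream summary in the style of the Space-Saving algorithm reports, so the sketch below assumes the top_k structure works that way; the commit itself only says "Top-k structure for handling stream summary statistics". `SpaceSaving` is a hypothetical name, `capacity` mirrors the script's `-s CAPACITY` option and `k` mirrors `-k LIST_SIZE`. Eviction here is a linear scan for clarity, not the efficient bucket list a real implementation would use.

```python
class SpaceSaving:
    """Space-Saving top-k sketch with fixed capacity (illustrative, O(capacity) eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}  # item -> [count, error]

    def append(self, item, inc=1):
        if item in self.counters:
            self.counters[item][0] += inc
        elif len(self.counters) < self.capacity:
            self.counters[item] = [inc, 0]
        else:
            # Replace the item with the smallest count; the newcomer inherits that
            # count, which becomes its maximum possible over-estimation (error).
            victim = min(self.counters, key=lambda key: self.counters[key][0])
            min_count = self.counters.pop(victim)[0]
            self.counters[item] = [min_count + inc, min_count]

    def top(self, k):
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, count, error) for item, (count, error) in ranked[:k]]
```

For every tracked item, `count - error <= true count <= count`, which is why an entry with error 0 (as in the sample REST output) is an exact count.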
Rafi Einstein	7677d2ba2c	nodetool toppartitions: nodetool-toppartitions script A Python script mimicking the nodetool toppartitions utility, utilizing Scylla REST API. Examples: $ ./nodetool-toppartitions --help usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY] keyspace table duration Samples database reads and writes and reports the most active partitions in a specified table positional arguments: keyspace Name of keyspace table Name of column family duration Query duration in milliseconds optional arguments: -h, --help show this help message and exit -k LIST_SIZE The number of the top partitions to list (default: 10) -s CAPACITY The capacity of stream summary (default: 256) $ ./nodetool-toppartitions ks test1 10000 READ Partition Count 30 2 20 2 10 2 WRITE Partition Count 30 1 20 1 10 1 Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:48:03 +02:00
Rafi Einstein	197f38d4ee	nodetool toppartitions: Toppartitions query REST API An HTTP GET operation starts the query (with args: ks/cf name and duration in ms). It executes synchronously; results are returned as JSON: $ curl -s -X GET http://localhost:10000/column_family/toppartitions/ks:cf1?duration=10000 | jq { "read": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ], "write": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ] } Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
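A small client-side sketch for the endpoint shown above. It assumes, as the sample output suggests, that `partition` is the hex encoding of the raw partition-key bytes (true for a single text-column key) and that `count`/`error` arrive as strings. `fetch_toppartitions` and `summarize_toppartitions` are hypothetical helpers; only the URL shape and port come from the commit.

```python
import json
from urllib.request import urlopen


def fetch_toppartitions(ks, cf, duration_ms, host="localhost", port=10000):
    """Blocks for roughly duration_ms, since the query runs synchronously."""
    url = (f"http://{host}:{port}/column_family/toppartitions/"
           f"{ks}:{cf}?duration={duration_ms}")
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def summarize_toppartitions(body):
    """Turn the JSON body into {op: [(key, count, error), ...]} sorted by count."""
    data = json.loads(body)
    summary = {}
    for op in ("read", "write"):
        rows = []
        for entry in data.get(op, []):
            raw = bytes.fromhex(entry["partition"])  # assumed hex-encoded key bytes
            key = raw.decode("utf-8", errors="replace")
            rows.append((key, int(entry["count"]), int(entry["error"])))
        rows.sort(key=lambda r: r[1], reverse=True)
        summary[op] = rows
    return summary
```

Decoding the sample shows that `4b504d39354f37353131` is the key `KPM95O7511`.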
Rafi Einstein	6b2c21f69b	nodetool toppartitions: Toppartitions query implementation toppartitions_query installs toppartitions_data_listener-s on all database shards, waits for the designated period, uninstalls the listeners and collects top-k read/write partition keys. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	404f75def5	nodetool toppartitions: fully_qualified_cf_name Encapsulate keyspace:column_family REST API argument parsing into fully_qualified_cf_name class. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	0bffe5f83e	nodetool toppartitions: add data_listeners to database/table Add data_listeners member to database. Adds data_listeners* to table::config, to be used by table methods to invoke listeners. Install on_read() listener in table::make_reader(). Install on_write() listener in database::apply_in_memory(). Tests: Unit (release) Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	08ba115c16	nodetool toppartitions: data listeners Mechanism that interfaces with mutation readers in database and table classes, to allow tracking most frequent partition keys in read and write operation. Basic design is specified in #2811. Tracking top #rows and #bytes will be supported in the future. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	038f8c7988	nodetool toppartitions: refactor table::config constructor Eliminate extra parameters to the ctor and deduce them from the db param instead. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	eda43b93c9	top_k: support for appending top_k results Allow appending results of one top_k into another. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:56 +02:00
Rafi Einstein	aeebe8e86b	top_k: std::list -> chunked_vector Replaced std::list with chunked_vector. Because chunked_vector requires a noexcept move constructor from its value type, change the bad_boy type in the unit test not to throw in the move constructor. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:07 +02:00
Avi Kivity	8e2f6d0513	Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz " partition_snapshots created in the memtable will keep a reference to the memtable (as region) and to memtable::_cleaner. As long as the reader is alive, the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043` (in >= 3.0-rc1) Fixes #4030. Tests: - mvcc_test (debug) " tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla: tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed tests: mvcc: Introduce mvcc_container::migrate() tests: mvcc: Make mvcc_partition move-constructible tests: mvcc: Introduce mvcc_container::make_not_evictable() tests: mvcc: Allow constructing mvcc_container without a cache_tracker mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup mvcc: partition_snapshot: Introduce migrate() mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-28 12:45:10 +02:00
Tomasz Grabiec	bb1c9cb6f3	tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	4d13dea39a	tests: mvcc: Introduce mvcc_container::migrate()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	676868ed31	tests: mvcc: Make mvcc_partition move-constructible	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	c6798f7872	tests: mvcc: Introduce mvcc_container::make_not_evictable()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	1fa00656ea	tests: mvcc: Allow constructing mvcc_container without a cache_tracker Some test cases will need many containers to simulate memtable -> cache transitions, but there can be only one cache_tracker per shard due to metrics. Allow constructing a container without a cache_tracker (and thus non-evictable).	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	ac49b1def0	mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup partition_snapshots created in the memtable will keep a reference to the memtable (as region*) and to memtable::_cleaner. As long as the reader is alive the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that, nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem, because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043`. Fixes #4030.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	20f5d5d1a1	mvcc: partition_snapshot: Introduce migrate() Snapshots which outlive the memtable will need to have their _region and _cleaner references updated. The snapshot can be destroyed after the memtable when it is queued in the mutation_cleaner.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	67f9afbd1a	mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-27 18:08:50 +01:00
Gleb Natapov	37b4043677	streaming: always read from rpc::source until end-of-stream during mutation sending rpc::source cannot be abandoned until EOS is reached, but the current code does not obey this: if an error code is received, it throws an exception that aborts the reading loop. Fix it by moving exception throwing out of the loop. Fixes: #4025 Message-Id: <20181227135051.GC29458@scylladb.com>	2018-12-27 16:50:53 +02:00
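The shape of the fix, keeping the read loop running to end-of-stream and raising only afterwards, can be sketched in Python with any iterable standing in for `rpc::source`. `consume_until_eos` and the integer status values (0 meaning success) are hypothetical.

```python
def consume_until_eos(source):
    """Drain every message from `source`, even after an error status arrives.

    The error is recorded inside the loop and raised only once iteration has
    reached end-of-stream, so the source is never abandoned mid-stream.
    """
    error = None
    received = 0
    for status in source:  # iteration ends at end-of-stream
        received += 1
        if status != 0 and error is None:
            error = status  # remember the first error, but keep draining
    if error is not None:
        raise RuntimeError(f"peer reported error status {error} after {received} messages")
    return received
```

Raising from inside the `for` loop would leave the remaining messages unread, which is exactly what the original code did.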
Asias He	4d3c463536	storage_service: Stop cql server before gossip We saw failure in dtest concurrent_schema_changes_test.py: TestConcurrentSchemaChanges.changes_while_node_down_test test. ====================================================================== ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test self.make_schema_changes(session, namespace='ns2') File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes session.execute('USE ks_%s' % namespace) File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result() File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result raise self._final_exception ConnectionShutdown: Connection to 127.0.0.1 is closed The test: session = self.patient_cql_connection(node2) self.prepare_for_changes(session, namespace='ns2') node1.stop() self.make_schema_changes(session, namespace='ns2') --> ConnectionShutdown exception is thrown The problem is that, after receiving the DOWN event, the python Cassandra driver will call Cluster:on_down which checks if this client has any connections to the node being shut down. If there are any connections, the Cluster:on_down handler will exit early, so the session to the node being shut down will not be removed. If we shut down the cql server first, the connection count will be zero and the session will be removed. Fixes: #4013 Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>	2018-12-27 14:13:43 +02:00
Duarte Nunes	2f69ba2844	lwt: Remove Paxos-related Cassandra code Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227112526.4180-1-duarte@scylladb.com>	2018-12-27 13:30:10 +02:00
Duarte Nunes	66e45469b2	streaming/stream_session: Don't use table reference across defer points When creating an sstable from which to generate view updates, we held on to a table reference across defer points. If there is a concurrent schema drop, the table object might be destroyed and we would incur a use-after-free. Solve this by holding on to a shared pointer, which pins the table object. Refs #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227105921.3601-1-duarte@scylladb.com>	2018-12-27 13:05:46 +02:00
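The fix above pins a table by holding a shared pointer rather than a reference across a defer point. The idea can be sketched with `std::shared_ptr`. All names here are hypothetical: `database`, `table`, and the `pending` continuation queue only model asynchronous code. They are not Scylla's classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model: a "database" owns tables through shared_ptr.
struct table {
    std::string name;
    int sstables = 0;
};

struct database {
    std::map<std::string, std::shared_ptr<table>> tables;
    void drop(const std::string& name) { tables.erase(name); }  // concurrent schema drop
};

// Continuations that run "later", simulating work resumed after a defer point.
std::vector<std::function<void()>> pending;

// Capturing a shared_ptr copy (not a table&) pins the table: even if the
// database drops it before the continuation runs, the object stays alive.
void add_sstable_later(database& db, const std::string& name) {
    std::shared_ptr<table> pinned = db.tables.at(name);
    pending.push_back([pinned] { pinned->sstables++; });
}
```

Capturing `table&` instead would leave the continuation with a dangling reference after `drop()`. That is the use-after-free the commit describes.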
Avi Kivity	b349e11aba	tools: toolchain: avoid docker-provided /tmp On at least one system, using the container's /tmp as provided by docker results in spurious EINVALs during aio: INFO 2018-12-27 09:54:08,997 [shard 0] gossip - Feature ROW_LEVEL_REPAIR is enabled unknown location(0): fatal error: in "test_write_many_range_tombstones": storage_io_error: Storage I/O error: 22: Invalid argument seastar/tests/test-utils.cc(40): last checkpoint The setup is overlayfs over xfs. To avoid this problem, pass through the host's /tmp to the container. Using --tmpfs would be better, but it's not possible to guess a good size as the amount of temporary space needed depends on build concurrency. Message-Id: <20181227101345.11794-1-avi@scylladb.com>	2018-12-27 10:17:23 +00:00
Avi Kivity	2c4a732735	tools: toolchain: update baseline Fedora packages Image fedora-29-20181219 was broken due to the following chain of events: - we install gnutls, which currently is at version 3.6.5 - gnutls 3.6.5 introduced a dependency on nettle 3.4.1 - the gnutls rpm does not include a version requirement on nettle, so an already-installed nettle will not be upgraded when gnutls is installed - the fedora:29 image which we use as a baseline has nettle installed - docker does not pull the latest tag in FROM statements during "docker build" - my build machine already had a fedora:29 image, with nettle 3.4 installed (the repository's image has 3.4.1, but docker doesn't automatically pull if an image with the required tag exists) As a result, the image ended up having gnutls 3.6.5 and nettle 3.4, which are incompatible. To fix this, update all packages after installation to try to get a self-consistent package set even if dependencies are not perfect, and regenerate the image. Message-Id: <20181226135711.24074-1-avi@scylladb.com>	2018-12-26 14:58:23 +00:00
Avi Kivity	1414837fcc	tools: toolchain: improve dbuild for continuous integration environments The '-t' flag to 'docker run' passes the tty from the caller environment to the container, which is nice for interactive jobs, but fails if there is no tty, such as in a continuous integration environment. Given that, the '-i' flag doesn't make sense either as there isn't any input to pass. Remove both, and replace with --sig-proxy=true which allows SIGTERM to terminate the container instead of leaving it alive. This reduces the chances of the build stopping but leaving random containers around. Message-Id: <20181222105837.22547-1-avi@scylladb.com>	2018-12-26 10:50:34 +00:00
Avi Kivity	bfd8ade914	tools: toolchain: update toolchain for gcc-8.2.1-6 gcc was updated with some important fixes; update the toolchain to include it. Message-Id: <20181219190548.28675-1-avi@scylladb.com>	2018-12-26 10:21:02 +00:00
Benny Halevy	206483e6af	position_in_partition_view: print bound_weight as int Rather than a non-printable char. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181226091115.18530-1-bhalevy@scylladb.com>	2018-12-26 11:19:30 +02:00
Rafael Ávila de Espíndola	f73c60d8cf	sstables: Convert an unreachable throw into an assert in read path The function pending_collection is only called when cdef->is_multi_cell() is true, so the throw is dead. This patch converts it to an assert. Message-Id: <20181207022119.38387-1-espindola@scylladb.com>	2018-12-26 11:10:19 +02:00
Benny Halevy	52188a20fa	HACKING.md: Add details about unit test debug info Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181225133513.20751-1-bhalevy@scylladb.com>	2018-12-25 16:03:24 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub ranges, each containing around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the same 100 partitions. - If the checksums match, the data in this sub range is synced. - If the checksums mismatch, the repair master fetches the data from all the peers and sends the merged data back to the peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes all 100 partitions to be transferred. A single partition can be very large, let alone 100 of them. - Checksum (finding the mismatch) and streaming (fixing the mismatch) read the same data twice. === Row level repair Row level checksum and synchronization: detect row level mismatches and transfer only the mismatched rows. === How the row level repair works - To solve the problem of reading data twice: Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory, find the mismatches, and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data: We need to find the mismatched rows between nodes and only send the delta. This is known as the set reconciliation problem, a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = {row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1) and fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2. Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate the sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Returns the repair_sync_boundary of the last mutation_fragment read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that the repair master contains all the rows. Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, the data is synced; go to Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly which rows are missing. Request the missing rows from peer nodes. Now the local node contains all the rows. - Step C: Send missing rows to the peer nodes. Since the local node also knows what the peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows into each of the 3 nodes. Each node has 241 GiB of data. We tested the 3 cases below. 1) 0% synced: one of the nodes has no data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx = 162 GiB, rx = 90 GiB new: tx = 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, the repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So the repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 1 to repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 2 to repair slave 1. In the results, we saw that the rows on the wire were as expected.
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
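The set reconciliation walk-through above (Steps A to C, with set1/set2/set3) can be checked with a small C++ sketch. The sketch rests on several assumptions. Rows are identified by plain strings, whereas the real protocol compares combined and then full row hashes over RPC. Sync boundaries are modeled as integers rather than `repair_sync_boundary`. `negotiate_sync_boundary`, `reconcile`, and `repair_plan` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using row_set = std::set<std::string>;

// Step A (simplified): each node proposes how far it can read within its
// buffer budget; the smallest proposal becomes the common sync boundary.
int negotiate_sync_boundary(const std::vector<int>& proposals) {
    return *std::min_element(proposals.begin(), proposals.end());
}

// a - b
row_set difference(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

struct repair_plan {
    std::vector<row_set> fetch_from;  // Step B: rows the master pulls from each peer
    std::vector<row_set> send_to;     // Step C: rows the master pushes to each peer
};

// master: rows held by the repair master; peers: rows held by each peer.
repair_plan reconcile(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    row_set all = master;
    for (const auto& p : peers) {
        plan.fetch_from.push_back(difference(p, master));  // set_i - set_master
        all.insert(p.begin(), p.end());
    }
    // After Step B the master holds the union; each peer lacks (union - set_i).
    for (const auto& p : peers) {
        plan.send_to.push_back(difference(all, p));
    }
    return plan;
}
```

With the example sets, `reconcile(set1, {set2, set3})` fetches nothing from Node2 and {row4} from Node3. It sends {row1, row4} to Node2 and {row3} to Node3, which matches the message.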
Takuya ASADA	b9a06ae552	dist/offline_installer/redhat: support building RHEL 7 offline installer We had issues building the offline installer on RHEL because of repository differences. This fix makes it possible to build the offline installer on both CentOS and RHEL. It also introduces --releasever <ver> to build the offline installer for a specific minor version of CentOS or RHEL. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181212032129.29515-1-syuu@scylladb.com>	2018-12-25 12:50:09 +02:00
Botond Dénes	3ae77a2587	configure.py: generate ${mode}-objects targets Sometimes one just wants to compile all the source files in the project, for example because one just moved code or files around and there is no need to link and run anything, only to check that everything still compiles. Since linking takes up a considerable amount of time, it is worthwhile to have a specific target that caters to such needs. This patch introduces a ${mode}-objects target for each mode (e.g. release-objects) that only runs the compilation step for each source file but does not link anything. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <eaad329bf22dfaa3deff43344f3e65916e2c8aaf.1545045775.git.bdenes@scylladb.com>	2018-12-25 12:40:20 +02:00
Benny Halevy	f104951928	sstable_test: read_file should open the file read-only Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181218145156.12716-1-bhalevy@scylladb.com>	2018-12-25 12:02:46 +02:00
Rafael Ávila de Espíndola	f8c81d4d89	tests: sstables: mc: add tests with incompatible schemas In one test the types in the schema don't match the types in the statistics file. In another a column is missing. The patch also updates the exceptions to have more human readable messages. Tests: unit (release) Part of issue #3960. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181219233046.74229-1-espindola@scylladb.com>	2018-12-25 11:11:54 +02:00
Yibo Cai (Arm Technology China)	422987ab04	utils: add fast ascii string validation Validate an ASCII string by ORing all bytes together and checking that bit 7 (the most significant bit) of the result is 0. Compared with the original std::any_of(), which checks the string byte by byte, this new approach processes the input 8 bytes at a time in two independent streams. Performance is much higher for normal cases, though slightly lower when the string is very short. See the table below.
Speed (MB/s) of ASCII string validation
+---------------+-------------+---------+
| String length | std::any_of | u64 x 2 |
+---------------+-------------+---------+
| 9 bytes       |        1691 |    1635 |
| 31 bytes      |        2923 |    3181 |
| 129 bytes     |        3377 |   15110 |
| 1039 bytes    |        3357 |   31815 |
| 16385 bytes   |        3448 |   47983 |
| 1048576 bytes |        3394 |   31391 |
+---------------+-------------+---------+
Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>	2018-12-24 09:58:08 +02:00
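The OR-and-mask technique in the commit above can be sketched as follows. This is a minimal sketch written for illustration, not the code from the patch. `is_ascii` is a hypothetical name. It ORs 8-byte words into two independent accumulators, then tests the high bit of every byte lane once at the end. `memcpy` is used so that unaligned input does not cause undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if every byte in [data, data + len) is 7-bit ASCII (high bit clear).
bool is_ascii(const unsigned char* data, std::size_t len) {
    std::uint64_t acc0 = 0, acc1 = 0;
    std::size_t i = 0;
    // Two independent accumulators let the CPU overlap the OR chains.
    for (; i + 16 <= len; i += 16) {
        std::uint64_t a, b;
        std::memcpy(&a, data + i, 8);      // memcpy avoids unaligned-access UB
        std::memcpy(&b, data + i + 8, 8);
        acc0 |= a;
        acc1 |= b;
    }
    std::uint64_t acc = acc0 | acc1;
    for (; i + 8 <= len; i += 8) {         // one remaining 8-byte word, if any
        std::uint64_t a;
        std::memcpy(&a, data + i, 8);
        acc |= a;
    }
    unsigned char tail = 0;
    for (; i < len; ++i) {                 // fewer than 8 trailing bytes
        tail |= data[i];
    }
    // 0x80 in every byte lane: any set high bit means a non-ASCII byte.
    // The mask is the same in every lane, so byte order does not matter.
    return (acc & 0x8080808080808080ULL) == 0 && (tail & 0x80) == 0;
}
```

The scalar tail loop helps explain the table above. For very short strings most of the work falls outside the wide loops, so the gain over `std::any_of` disappears.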
Tomasz Grabiec	419c771791	sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no data is not moved-from if _trust_pi == trust_promoted_index::no, which triggers the assert on data.empty(). We should make it empty unconditionally. Message-Id: <1545408731-14333-1-git-send-email-tgrabiec@scylladb.com>	2018-12-23 12:09:21 +02:00
Tomasz Grabiec	07d153c769	sstables: mc: reader: Use enum class instead of variant variant is overkill here. Message-Id: <1545409014-16289-1-git-send-email-tgrabiec@scylladb.com>	2018-12-23 12:04:02 +02:00
Duarte Nunes	e6a8883228	service/storage_proxy: Protect against empty mutation when storing hint mutation_holder::get_mutation_for() can return nullptr, so protect against that when storing a hint. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-2-duarte@scylladb.com>	2018-12-23 11:14:44 +02:00
Duarte Nunes	6c4a34f378	service/storage_proxy: Protect against empty mutation in mutation_holder The per_destination_mutation holder can contain empty mutations, so make sure release_mutation() skips over those. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-1-duarte@scylladb.com>	2018-12-23 11:14:43 +02:00
Duarte Nunes	5e7d18380d	Merge 'Reduce dependencies on config.hh for extensions access' from Avi " Some files use db/config.hh just to access extensions. Reduce dependencies on this global and volatile file by providing another path to access extensions. Tests: unit(release) " * tag 'unconfig-2/v1' of https://github.com/avikivity/scylla: hints: reduce dependencies on db/config.hh commitlog: reduce dependencies on db/config.hh cql3: reduce dependencies on db/config.hh database: provide accessor to db::extensions	2018-12-21 20:15:44 +00:00
Avi Kivity	eae030b061	hints: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:44 +00:00
Avi Kivity	cc8312a8b9	commitlog: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:43 +00:00
Avi Kivity	d2dae3af86	cql3: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:43 +00:00
Avi Kivity	74c1afad29	database: provide accessor to db::extensions Rather than forcing callers to go through get_config(), provide a direct accessor. This reduces dependencies on config.hh, and will allow separation of extensions from configuration.	2018-12-21 20:15:43 +00:00
Tomasz Grabiec	d2f96a60f6	sstables: mc: index_reader: Handle CK_SIZE split across buffers properly We incorrectly fell through to the next state instead of returning to read more data. This can manifest in a number of ways: an abort or an incorrect read. Introduced in `917528c` Fixes #4011. Message-Id: <1545402032-4114-1-git-send-email-tgrabiec@scylladb.com>	2018-12-21 16:34:10 +02:00
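The rule behind the fix above is that a state machine must not advance to the next state when a field is split across input buffers. Instead, it returns and waits for more data. The following generic parser illustrates this. It is hypothetical and is NOT Scylla's mc index_reader. Its record format (a 4-byte little-endian size followed by the payload) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical incremental parser for records of the form
// [4-byte little-endian size][payload]. Input arrives in arbitrary chunks,
// so a field may be split across two buffers.
class record_parser {
    enum class state { size, payload };
    state _state = state::size;
    std::vector<std::uint8_t> _partial;  // bytes of the current field seen so far
    std::uint32_t _size = 0;
public:
    std::vector<std::vector<std::uint8_t>> records;

    void feed(const std::uint8_t* data, std::size_t len) {
        std::size_t i = 0;
        while (i < len) {
            std::size_t need = (_state == state::size ? 4 : _size) - _partial.size();
            std::size_t take = std::min(need, len - i);
            _partial.insert(_partial.end(), data + i, data + i + take);
            i += take;
            if (take < need) {
                return;  // field incomplete: wait for the next buffer, do NOT change state
            }
            if (_state == state::size) {
                _size = _partial[0] | _partial[1] << 8 | _partial[2] << 16
                      | std::uint32_t(_partial[3]) << 24;
                _state = state::payload;
            } else {
                records.push_back(_partial);
                _state = state::size;
            }
            _partial.clear();
        }
    }
};
```

The early `return` is the important part. Falling through to the payload state with only part of the size field would decode a bogus size. That is the same class of bug as the one described in the commit.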
Tomasz Grabiec	7afe2bad51	sstables: mc: reader: Avoid unnecessary index reads on fast forwarding When the next pending fragments are after the start of the new range, we know there is no need to skip. Caught by perf_fast_forward --datasets large-part-ds3 \ --run-tests=large-partition-slicing Refs #3984 Message-Id: <1545308006-16389-1-git-send-email-tgrabiec@scylladb.com>	2018-12-20 16:21:07 +00:00
Gleb Natapov	393269d34b	streaming: hold to sink while close() is running and call close on error as well Currently, if something throws in the mutation sending loop while streaming, the sink is not closed. Also, while close() is running, the code does not hold on to the sink object. close() is async, so the sink should be kept alive until it completes. The patch uses do_with() to hold on to the sink while close() is running, and runs close() on the error path too. Fixes #4004. Message-Id: <20181220155931.GL3075@scylladb.com>	2018-12-20 18:03:37 +02:00
Rafi Einstein	533e46ac72	top_k: map template arguments Added Hash and KeyEqual template arguments to allow using unordered_map in the top_k implementation. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:40 +02:00
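The commit above forwards Hash and KeyEqual template arguments to an `std::unordered_map`. The sketch below shows what that plumbing buys. `simple_top_k`, `ci_hash`, and `ci_equal` are hypothetical. The counter is an exact frequency count, NOT Scylla's top_k algorithm. It only shows how custom hashing and equality reach the map, for example for case-insensitive keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical exact counter: Hash and KeyEqual are forwarded to unordered_map.
template <typename T,
          typename Hash = std::hash<T>,
          typename KeyEqual = std::equal_to<T>>
class simple_top_k {
    std::unordered_map<T, std::size_t, Hash, KeyEqual> _counts;
public:
    void append(const T& item) { ++_counts[item]; }

    // Returns up to k (item, count) pairs, highest count first.
    std::vector<std::pair<T, std::size_t>> top(std::size_t k) const {
        std::vector<std::pair<T, std::size_t>> v(_counts.begin(), _counts.end());
        std::sort(v.begin(), v.end(),
                  [](const auto& a, const auto& b) { return a.second > b.second; });
        if (v.size() > k) {
            v.erase(v.begin() + static_cast<std::ptrdiff_t>(k), v.end());
        }
        return v;
    }
};

// Example key policy: case-insensitive strings.
struct ci_hash {
    std::size_t operator()(const std::string& s) const {
        std::string lower;
        for (char c : s) {
            lower.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
        }
        return std::hash<std::string>{}(lower);
    }
};

struct ci_equal {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() == b.size() &&
               std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
                   return std::tolower(static_cast<unsigned char>(x)) ==
                          std::tolower(static_cast<unsigned char>(y));
               });
    }
};
```

Without the extra template parameters, `simple_top_k<std::string>` would count "Key" and "key" separately. With `ci_hash` and `ci_equal` they are counted as one key.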
Rafi Einstein	75f21954d4	top_k: whitespace and minor fixes Style and minor logic changes from code review. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:33 +02:00
Tomasz Grabiec	2b55ab8c8e	Merge "Add more extensive test for mutation reader fast-forwarding" from Paweł Mutation readers allow fast-forwarding the ranges from which the data is being read. The main user of this feature is the cache which, when reading from the underlying reader, may want to skip some data it already has. Unsurprisingly, this adds more complexity to the implementation of the readers and more edge cases the developers need to take care of. While most of the readers were checked in this area at least to some extent, those tests were usually quite isolated (e.g. one test doing inter-partition fast-forwarding, another doing intra-partition fast-forwarding) and, as a consequence, didn't cover many corner cases. This patch adds a generic test for fast-forwarding and slicing that covers more complicated scenarios where those operations are combined. Needless to say, this did uncover some problems, but fortunately none of them is user-visible. Fixes #3963. Fixes #3997. Tests: unit(release, debug) * https://github.com/pdziepak/scylla.git test-fast-forwarding/v4.1: tests/flat_mutation_reader_assertions: accumulate received tombstones tests/flat_mutation_reader_assertions: add more test messages tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check tests/mutation_readers: do not ignore streamed_mutation::forwarding Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests" memtable: it is not a single partition read if partition fast-forwaring is enabled sstables: add more tracing in mp_row_consumer_m row_cache: use make_forwardable() to implement streamed_mutation::forwarding row_cache: read is not single-partition if inter-partition forwarding is enabled row_cache: drop support for streamed_mutation::forwarding::yes entirely sstables/mp_row_consumer: position_range end bound is exclusive mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly tests/mutation_reader: reduce sleeping time 
tests/memtable: fix partition_range use-after-free tests/mutation: fix partition range use-after-free flat_mutation_reader_from_mutations: add overload that accepts a slice and partition range flat_mutation_reader_from_mutations: fix empty range case flat_mutation_reader_from_mutations: destroy all remaining mutations tests/mutation_source: drop dropped column handling test tests/mutation_source: add test for complex fast_forwarding and slicing	2018-12-20 15:05:21 +01:00
Paweł Dziepak	3355d16938	tests/mutation_source: add test for complex fast_forwarding and slicing While we already had tests that verified inter- and intra-partition fast-forwarding as well as slicing, they had quite limited scope and didn't combine those operations. The new test is meant to extensively test these cases.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	26a30375b1	tests/mutation_source: drop dropped column handling test Schema changes are now covered by the for_each_schema_change() function. Having some additional tests in run_mutation_source_tests() is problematic when it is used to test intermediate mutation readers, because schema changes may be irrelevant for them. This makes the test a waste of time (which can be a problem in debug mode) and requires those intermediate readers to use a more complex underlying reader that supports schema changes (again, a problem in the very slow debug mode).	2018-12-20 13:27:25 +00:00
Paweł Dziepak	048ed2e3d3	flat_mutation_reader_from_mutations: destroy all remaining mutations If the reader is fast-forwarded to another partition range, mutation_ may be left with some partial mutations. Make sure that those are properly destroyed.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	d50cd31eee	flat_mutation_reader_from_mutations: fix empty range case An iterator shall not be dereferenced until it is verified to be dereferenceable.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	93488209de	tests/mutation: fix partition range use-after-free	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e91165d929	tests/memtable: fix partition_range use-after-free	2018-12-20 13:27:25 +00:00
Paweł Dziepak	5db8dacd1f	tests/mutation_reader: reduce sleeping time It is very bad taste to sleep anywhere in the code. The test should be fixed to explicitly test various orderings between concurrent operations, but until that happens, let's at least reduce how much those sleeps slow it down by changing them from milliseconds to microseconds.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	243aade3b2	mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly	2018-12-20 13:27:25 +00:00
Paweł Dziepak	dfa5b3d996	sstables/mp_row_consumer: position_range end bound is exclusive	2018-12-20 13:27:25 +00:00
Paweł Dziepak	df1d438fcd	row_cache: drop support for streamed_mutation::forwarding::yes entirely	2018-12-20 13:27:25 +00:00
Paweł Dziepak	adcb3ec20c	row_cache: read is not single-partition if inter-partition forwarding is enabled	2018-12-20 13:27:25 +00:00
Paweł Dziepak	7ecee197c4	row_cache: use make_forwardable() to implement streamed_mutation::forwarding Implementing intra-partition fast-forwarding adds more complexity to already very-much-not-trivial cache readers and isn't really critical in any way since it is not used outside of the tests. Let's use the generic adapter instead of natively implementing it.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e96a5f96d9	sstables: add more tracing in mp_row_consumer_m	2018-12-20 13:27:25 +00:00
Paweł Dziepak	18825af830	memtable: it is not a single partition read if partition fast-forwaring is enabled Single-partition reader is less expensive than the one that accepts any range of partitions, but it doesn't support fast-forwarding to another partition range properly and therefore cannot be used if that option is enabled.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	bcb5aed1ef	Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests" This reverts commit `b36733971b`. That commit made run_mutation_reader_tests() support mutation_sources that do not implement streamed_mutation::forwarding::yes. This is wrong since mutation_sources are not allowed to ignore or otherwise not support that mode. Moreover, there is absolutely no reason for them to do so since there is a make_forwardable() adapter that can make any mutation_reader a forwardable one (at the cost of performance, but that's not always important).	2018-12-20 13:27:25 +00:00
Paweł Dziepak	8706750b9b	tests/mutation_readers: do not ignore streamed_mutation::forwarding It is wrong to silently ignore the streamed_mutation::forwarding option, which completely changes how the reader is supposed to operate. The best solution is to use the make_forwardable() adapter, which turns a non-forwardable reader into a forwardable one.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	edf2c71701	tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check Since `41ede08a1d` "mutation_reader: Allow range tombstones with same position in the fragment stream", mutation readers emit fragments in non-decreasing order (as opposed to strictly increasing order), so has_monotonic_positions() needs to be updated to allow that.	2018-12-20 13:27:25 +00:00
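The relaxed check above replaces "strictly increasing" with "non-decreasing". The two predicates can be contrasted with standard algorithms. In this sketch positions are plain `int`s, whereas the real assertion compares `position_in_partition` values. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Strictly increasing: no adjacent pair may satisfy a >= b.
bool strictly_increasing(const std::vector<int>& pos) {
    return std::adjacent_find(pos.begin(), pos.end(), std::greater_equal<int>()) == pos.end();
}

// Non-decreasing: equal adjacent positions (e.g. range tombstones starting at
// the same position) are allowed; only a step backwards is rejected.
bool non_decreasing(const std::vector<int>& pos) {
    return std::is_sorted(pos.begin(), pos.end());
}
```

A stream such as {1, 3, 3, 5} fails the old, strict check but passes the relaxed one. A stream that moves backwards, such as {1, 4, 2}, fails both.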
Paweł Dziepak	787d1ba7b2	tests/flat_mutation_reader_assertions: add more test messages	2018-12-20 13:27:25 +00:00
Paweł Dziepak	593fb936c2	tests/flat_mutation_reader_assertions: accumulate received tombstones The current data model employed by mutation readers doesn't have a unique representation of range tombstones. This complicates testing, since multiple ways of emitting range tombstones and rows are equally valid. This patch adds an option to verify mutation readers by checking whether the tombstones they emit properly affect the clustered rows, regardless of how exactly the tombstones are emitted. The interface of flat_mutation_reader_assertions is extended by adding may_produce_tombstones(), which accepts any number of tombstones and accumulates them. Then, produces_row_with_key() accepts an additional argument, which is the expected timestamp of the range tombstone that affects that row.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e6d26a528f	Merge "Optimize slicing sstable readers" from Tomasz " Contains several improvements for fast-forwarding and slicing readers. Mainly for the MC format, but not only: - Exiting the parser early when going out of the fast-forwarding window [MC-format-only] - Avoiding reading of the head of the partition when slicing - Avoiding parsing rows which are going to be skipped [MC-format-only] " * 'sstable-mc-optimize-slicing-reads' of github.com:tgrabiec/scylla: sstables: mc: reader: Skip ignored rows before parsing them sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows sstables: mc: parser: Allow the consumer to skip the whole row sstables: continuous_data_consumer: Introduce skip() sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state() sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row sstables: reader: Do not read the head of the partition when index can be used sstables: mc: mutation_fragment_filter: Check the fast-forward window first sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()	2018-12-20 12:48:22 +00:00
Avi Kivity	b66f59aa3d	Merge "materialized views: Apply backpressure from view replicas" from Duarte " As the amount of pending view updates increases we know that there’s a mismatch between the rate at which the base receives writes and the rate at which the view retires them. We react by applying backpressure to decrease the rate of incoming base writes, allowing the slow view replicas to catch up. We want to delay the client’s next writes to a base replica and we use the base’s backlog of view updates to derive this delay. To validate this approach we tested a 3-node Scylla cluster on GCE, using n1-standard-4 instances with NVMEs. A loader running on an n1-standard-8 instance ran cassandra-stress with 100 threads. With the delay function d(x) set to 1s, we see no base write timeouts. With the delay function as defined in the series, we see that backlogs stabilize at some (arbitrary) point, as predicted, but this stabilization co-exists with base write timeouts. However, the system overall behaves better than the current version, with the 100 view update limit, and also better than the version without such a limit or any backpressure. More work is necessary to further stabilize the system. Namely, we want to keep delaying until we see the backlog is decreasing. This will require us to add more delay beyond the stabilization point, which in turn should minimize the base write timeouts, and will also minimize the amount of memory the backlog takes at each base replica. 
Design document: https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo Fixes #2538 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits) service/storage_proxy: Release mutation as early as possible service/storage_proxy: Delay replica writes based on view update backlog service/storage_proxy: Get the backlog of a particular base replica service/storage_proxy: Add counters for delayed base writes main: Start and stop the view_update_backlog_broker service: Distribute a node's view update backlog service: Advertise view update backlog over gossip service/storage_proxy: Send view update backlog from replicas service/storage_proxy: Prepare to receive replica view update backlog service/storage_proxy: Expose local view update backlog tests/view_schema_test: Add simple test for db::view::node_update_backlog db/view: Introduce node_update_backlog class db/hints: Initialize current backlog database: Add counter for current view backlog database: Expose current memory view update backlog idl: Add db::view::update_backlog db/view: Add view_update_backlog database: Wait on view update semaphore for view building service/storage_proxy: Use near-infinite timeouts for view updates database: generate_and_propagate_view_updates no longer needs a timeout ...	2018-12-20 12:44:51 +02:00
Asias He	bcba6b4f4d	streaming: Futurize estimate_partitions The loop can take a long time if the number of sstables and/or ranges is large. To fix this, futurize the loop. Fixes: #4005 Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com>	2018-12-20 12:08:03 +02:00
Amos Kong	385d74db01	redhat/scylla.spec: add python34-setuptools dependency Commit `00476c3946` switched some scripts to python3, it introduced an ImportError: No module named 'pkg_resources'. Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <293c05d9315ec6c9da1f32e8cb3d2fdf8d8d3924.1545272049.git.amos@scylladb.com>	2018-12-20 06:32:36 +02:00
Duarte Nunes	2d7c026d6e	service/storage_proxy: Release mutation as early as possible When delaying a base write, there is no need to hold on to the mutation if all replicas have already replied. We introduce mutation_holder::release_mutation(), which frees the mutations that are no longer needed during the rest of the delay. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	756b601560	service/storage_proxy: Delay replica writes based on view update backlog As the amount of pending view updates increases we know that there’s a mismatch between the rate at which the base receives writes and the rate at which the view retires them. We react by applying backpressure to decrease the rate of incoming base writes, allowing the slow view replicas to catch up. We want to delay the client’s next writes to a base replica. We use the base’s backlog of view updates to derive this delay. If we achieve CL and the backlogs of all replicas involved were last seen to be empty, then we wouldn't delay the client's reply. However, it could be that one of the replicas is actually overloaded, and won't reply to many such new requests. We'll eventually start applying backpressure to the client via the background write queue, but in the meantime we may be dropping view updates. To mitigate this we rely on the backlog being gossiped periodically. Fixes #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	997bdf5d98	service/storage_proxy: Get the backlog of a particular base replica Add a function that returns the view update backlog for a particular replica. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	819b6f3406	service/storage_proxy: Add counters for delayed base writes Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6df32bfb0c	main: Start and stop the view_update_backlog_broker Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	37dfd22619	service: Distribute a node's view update backlog This patch introduces the view_update_backlog_broker class, which is responsible for periodically updating the local gossip state with the current node's view update backlog. It also registers to updates from other nodes, and updates the local coordinator's view of their view update backlogs. We consider the view update backlog received from a peer through the mutation_done verb to be always fresh, but we consider the one received through gossip to be fresh only if it has a higher timestamp than what we currently have recorded. This is because a node only updates its gossip state periodically, and also because a node can transitively receive gossip state about a third node with outdated information. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	8da6a31e75	service: Advertise view update backlog over gossip This lays the groundwork for brokering a node's view update backlog across the whole cluster. This is needed for when a coordinator does not contact a given replica for a long time, and uses a backlog view that is outdated and causes requests to be unnecessarily delayed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	ede5742f9b	service/storage_proxy: Send view update backlog from replicas Change the inter-node protocol so we can propagate the view update backlog from a base replica to the coordinator through the mutation_done and mutation_failed verbs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	34b48e1d98	service/storage_proxy: Prepare to receive replica view update backlog In subsequent patches, replicas will reply to the coordinator with their view update backlog. Before introducing changes to the messaging_service, prepare the storage_proxy to receive and store those backlogs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	776fdd4d1a	service/storage_proxy: Expose local view update backlog The local view update backlog is the max backlog out of the relative memory backlog size and the relative hints backlog size. We leverage the db::view::node_update_backlog class so we can send the max backlog out of the node's shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6662475dd9	tests/view_schema_test: Add simple test for db::view::node_update_backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	2bd76f8fc5	db/view: Introduce node_update_backlog class This class is an atomic view update backlog representation, safe to update from multiple shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6afbec4685	db/hints: Initialize current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	8d6718b6e4	database: Add counter for current view backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	2174eed640	database: Expose current memory view update backlog Expose the base replica's current memory view update backlog, which is defined in terms of units consumed from the semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	d54ac4961d	idl: Add db::view::update_backlog Add db::view::update_backlog to the newly created view.idl.hh. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	12ce517242	db/view: Add view_update_backlog The view update backlog represents the pending view data that a base replica maintains. It is the maximum of the memory backlog - how much memory pending view updates are consuming - and the disk backlog - how much view hints are consuming. The size of a backlog is relative to its maximum size. We will use this class to represent a base replica's view update backlog at the coordinator. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	fc9176e784	database: Wait on view update semaphore for view building View building sends view updates synchronously, which has natural backpressure. However, they 1) contribute to the load on the view replicas, and 2) add memory pressure to the base replica. They should thus count towards the current view update backlog, and consume units from the view update concurrency semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	e33e187096	service/storage_proxy: Use near-infinite timeouts for view updates View updates are sent with a timeout of 5 minutes, unrelated to any user-defined value and meant as a protection mechanism. During normal operation we don’t benefit from timing out view writes and offloading them to the hinted-handoff queue, since they are an internal, non-real time workload that we already spent resources on. This value should be increased further, but that change depends on Refs #2538 Refs #3826 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	86198060e5	database: generate_and_propagate_view_updates no longer needs a timeout We no longer wait on the semaphore and instead over-subscribe it, so there's no reason to pass a timeout. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	39eda68094	database: Don't generate view updates when node is overloaded We arrive at an overloaded state when we fail to acquire semaphore units in the base replica. This can mean clients are working in interactive mode, we fail to throttle them and consequently should start shedding load. We want to avoid impacting base table availability by running out of memory, so we could offload the memory queue to disk by writing the view updates as hints without attempting to send them. However, the disk is also a limited resource and in extreme cases we won’t be able to write hints. A tension exists between forgetting the view updates, thereby opening up a window for inconsistencies between base and view, or failing the base replica write. The latter can fail the whole user write, or if the coordinator was able to achieve CL, can instead cause inconsistencies between base tables (we wouldn't want to store a hint, because if the base replica is still overloaded, we would redo the whole dance). Between the devil and the deep blue sea, we chose to forget view updates. As a further simplification, we don't even write hints, assuming that if clients can’t be throttled (as we'll attempt to do in future patches), it will only be a matter of time before view updates can’t be offloaded. We also start acquiring the semaphore units using consume(), which is non-blocking, but allows for underflow of the available semaphore units. This is okay, and we expect not to underflow by much, as we stop generating new view updates. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	a3d30ea99a	db/view: Propagate acquired semaphore units to mutate_MV() Propagate acquired semaphore units to mutate_MV() to allow the semaphore to be incrementally signalled as view updates are processed by view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	8c1e6fcee8	db/timeout_clock: Define timeout_semaphore_units Defines the type of semaphore_units<> associated with timeout_semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	11c02c51fe	database: Wait for pending view updates to drain before stopping Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	185a4594af	database: Restore formatting of table::stop() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	f286d2ec34	database: Wait for pending operations in table::stop() A table can be stopped while reads and writes are in flight. Those operations rely on table state, so we must prevent its destruction until they complete. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	1f1fc36b72	database: Make view update concurrency semaphore memory-based The semaphore currently limiting the amount of view updates a given base replica emits aims to control the load that is imposed on the cluster, to protect view replicas from being overloaded when there are bursts of traffic (especially for degenerate cases like an index with low selectivity). 100 is, however, an arbitrary number. It might allow too much load on the view replicas, and it might also allow too much memory from the base shard to be consumed. Conversely, it might allow for too few updates to be queued in case of a burst, or to absorb updates while a view replica becomes partitioned. To deal with the load that is inflicted on the cluster, future patches will ensure that the rate of base writes obeys the rate at which the slowest view replica can consume the corresponding view updates. To protect the current shard from using too much memory for this queue, we will limit it to 10% of the shard's memory. The goal is to both protect the shard from being overloaded, but also to allow it to absorb bursts of writes resulting in large view mutations. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	bf4277fd8c	service/storage_proxy: Remove unused send_to_endpoint() overloads The send_to_endpoint() overloads that receive a non-frozen mutation are no longer used. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	2753cfee88	db/view: Generate view updates as frozen_mutations Working in terms of frozen_mutations allows us to account more precisely the memory pending view updates consume at the storage_proxy layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	715da6fd6b	db/view: Reserve vector space in mutate_MV() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	5d011eb61f	db/view: Cleanup mutate_MV() In particular, extract out the logic updating the stats in case of a failed update. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	7cfcd21bbb	database: Make lambda in table::populate_views mutable This allows an std::move() in its body to work as intended. Also, make the lambda's argument type explicit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	122737a8ab	Merge seastar upstream * seastar 132e6cd...6c8c229 (3): > reactor: disable nowait aio due to a kernel bug > core/semaphore: Allow combining semaphore_units() > core/shared_ptr: Allow releasing a lw_shared_ptr to a non-const object Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181217153241.67514-2-duarte@scylladb.com>	2018-12-19 12:57:07 +02:00
Duarte Nunes	bf05e59672	seastar: Change the source repository to scylla-seastar Scylla is at the moment incompatible with the Seastar master branch, so in order to allow Scylla commits that depend on Seastar patches, we change the submodule to point to scylla-seastar and use a branch (master-20181217) to hold these dependent commits. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181217153241.67514-1-duarte@scylladb.com>	2018-12-19 12:57:03 +02:00
Rafael Ávila de Espíndola	ff18c837b7	tests: Add missing include in random-utils.hh This file uses std::cout and so should include <iostream>. Found with a patch to seastar that removes some redundant <iostream> includes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181218183816.34504-1-espindola@scylladb.com>	2018-12-19 10:52:19 +00:00
Avi Kivity	dd51c659f7	config: remove "to be removed before release" notice mc sstable config The "enable_sstables_mc_format" config item help text wants to remove itself before release. Since scylla-3.0 did not get enough mc format mileage, we decided to leave it in, so the notice should be removed. Fixes #4003. Message-Id: <20181219082554.23923-1-avi@scylladb.com>	2018-12-19 09:39:29 +00:00
Duarte Nunes	a7456db687	Merge 'Simplify natural endpoint calculation' from Calle " Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5 Modifies network topology calculation, reducing the amount of maps/sets used by applying the knowledge of how many replicas we expect/need per dc and sharing endpoint and rack set (since we cannot have overlaps). Also includes a transposed origin test to ensure new calculation matches the old one. Fixes #2896 " * 'calle/network_topology' of github.com:scylladb/seastar-dev: network_topology_test: Add test to verify new algorith results equals old network_topology_strategy: Simplify calculate_natural_endpoints token_metadata: Add "get_location" ip to dc+rack accessor sequenced_set: Add "insert" method, following std::set semantics	2018-12-19 09:39:29 +00:00
Rafael Ávila de Espíndola	b93d8d863d	Add a test with mismatched timestamps. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181218035931.3554-1-espindola@scylladb.com>	2018-12-18 11:30:56 +01:00
Tomasz Grabiec	37d9ba68bc	sstables: mc: reader: Skip ignored rows before parsing them Currently filtering happens inside consume_row_end() after the whole row is parsed. It's much faster to skip without parsing. This patch moves filtering and range tombstones splitting to consume_row_start(). _stored_row is no longer needed because in case the filter returns store_and_finish, the consumer exits with retry_later, and the parser will call consume_row_start() again when resumed. Tests: `./build/release/tests/perf/perf_fast_forward_g --sstable-format=mc --datasets large-part-ds1 --run-tests=large-partition-skips` (read 1, skip 4096). Before: time 1.085142 s, 1953 frags, 1800 frag/s (mad 32, max 1803, min 1720), 4990 aio (159604 KiB). After: time 0.694560 s, 1953 frags, 2812 frag/s (mad 11, max 2813, min 2684), 4986 aio (159588 KiB).	2018-12-18 11:13:52 +01:00
Tomasz Grabiec	e3c3ef2f0e	sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts This way we will later avoid calling clear() for ignored rows.	2018-12-18 11:11:48 +01:00
Tomasz Grabiec	fa126106f8	sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row	2018-12-18 11:11:48 +01:00
Tomasz Grabiec	522a75f761	sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows mp_row_consumer_m::consume_row_marker_and_tombstone() is called for both clustering and static rows, but it dereferences and modifies _in_progress_row, which is only set when inside a clustering row. Fixes #3999.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	9498977a34	sstables: mc: parser: Allow the consumer to skip the whole row The MC format contains row size before the row body, which we can use to skip the row without parsing its contents, which will be much faster.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	b4c3b78082	sstables: continuous_data_consumer: Introduce skip()	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	36dd660507	sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state() Will allow state_processor to know its position in the stream. Currently position() is meaningless inside process_state() because in some cases it points to the position after the buffer and in some cases before it. This patch standardizes on the former. This is more useful than the latter because process_state() trims from the front of the buffer as it consumes, so the position inside the stream can be obtained by subtracting the remaining buffer size from position(), without introducing any new variables.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	e950c8b00a	sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row The size of the bitset is the same for a given row kind across the sstable, so we can allocate it once. _columns_selector is moved into the row_schema structure, of which we have one for each row kind, set up in the constructor.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	fb15759934	sstables: reader: Do not read the head of the partition when index can be used read_partition() was always called through read_next_partition(), even if we're at the beginning of the read. read_next_partition() is supposed to skip to the next partition. It still works when we're positioned before a partition (it doesn't advance the consumer), but it clears _index_in_current_partition, because it (correctly) assumes it corresponds to the partition we're about to leave, not the one we're about to enter. This means that index lookups we did in the read initializer will be disregarded when reading starts, and we'll always start by reading partition data from the data file. This is suboptimal for reads which are slicing a large partition and don't need to read the front of the partition. Regression introduced in `4b9a34a854`. The fix is to call read_partition() directly when we're positioned at the beginning of the partition. For that purpose a new flag was introduced. test_no_index_reads_when_rows_fall_into_range_boundaries has to be relaxed, because it assumed that slicing reads will read the head of the partition. Refs #3984 Fixes #3992 Tested using: `./build/release/tests/perf/perf_fast_forward_g --sstable-format=mc --datasets large-part-ds1 --run-tests=large-partition-slicing-clustering-keys` (offset 4000000, read 1). Before (focus on aio): time 0.001378 s, 1 frag, 726 frag/s (mad 5, max 736, min 102), 6 aio (200 KiB), blocked 4, dropped 2, idx hit/miss/blk 0/1/1, c hit/miss/blk 0/0/0, cpu 65.8%. After: time 0.001290 s, 1 frag, 775 frag/s (mad 6, max 788, min 716), 2 aio (136 KiB), blocked 2, dropped 0, idx hit/miss/blk 0/1/1, c hit/miss/blk 0/0/0, cpu 69.1%.	2018-12-18 11:11:37 +01:00
Tomasz Grabiec	385a4c23fd	sstables: mc: mutation_fragment_filter: Check the fast-forward window first Otherwise the parser will keep consuming and dropping fragments needlessly, rather than giving the user a chance to consume the end-of-stream condition, and maybe skip again. Refs #3984	2018-12-18 11:11:37 +01:00
Tomasz Grabiec	62a1afaac9	sstables: mc: writer: Avoid calling unsigned_vint::serialized_size() Rather than adding serialized_size() to the body size before serializing the field, we can serialize the field to _tmp_bufs at the beginning and have the body size automatically account for it.	2018-12-18 11:11:36 +01:00
Duarte Nunes	1f578be187	Merge 'Fix evictable shard reader related issues' from Botond " Recently some additional issues were discovered related to recent changes to the way inactive readers are evicted and making shard readers evictable. One such issue is that the `querier_cache` is not prepared for the querier to be immediately evicted by the reader concurrency semaphore, when registered with it as an inactive read (#3987). The other issue is that the multishard mutation query code was not fully prepared for evicted shard readers being re-created, or failing while being re-created (#3991). This series fixes both of these issues and adds a unit test which covers the first one. I am working on a unit test which would cover the second issue, but it's proving to be a difficult one and I don't want to delay the fixes for these issues any longer as they also affect 3.0. Fixes: #3987 Fixes: #3991 Tests: unit(release, debug) " * 'evictable-reader-related-issues/v2' of https://github.com/denesb/scylla: multishard_mutation_query: reset failed readers to inexistent state multishard_mutation_query: handle missing readers when dismantling multishard_mutation_query: add support for keeping stats for discarded partitions multishard_mutation_query: expect evicted reader state when creating reader multishard_mutation_query: pretty-print the reader state in log messages querier_cache: check that the query wasn't evicted during registering reader_concurrency_semaphore: use the correct types in the constructor reader_concurrency_semaphore: add consume_resources() reader_concurrency_semaphore::inactive_read_handle: add operator bool()	2018-12-17 15:36:23 +00:00
Calle Wilund	e353a8633a	network_topology_test: Add test to verify new algorith results equals old Transposed from origin unit test. Creates a semi-random topology of racks, dcs, tokens and replication factors and verifies endpoint calculation equals old algo.	2018-12-17 13:10:59 +00:00
Calle Wilund	bfc6c89b00	network_topology_strategy: Simplify calculate_natural_endpoints Fixes #2896 (hopefully) Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5 Reduces the amount of maps and sets and general complexity of endpoint calculation by simply mapping DCs to expected node counts, re-using endpoint sets and iterating accordingly. Tested with transposed origin unit test comparing old vs. new algo results. (Next patch)	2018-12-17 13:10:59 +00:00
Botond Dénes	b4c3aab4a7	multishard_mutation_query: reset failed readers to inexistent state When attempting to dismantle readers, some of the to-be-dismantled readers might be in a failed state. The code waiting on the reader to stop expects failures; however, it didn't do anything besides logging the failure and bumping a counter. Code in the lower layers did not know how to deal with a failed reader and would trigger `std::bad_variant_access` when trying to process (save or clean up) it. To prevent this, reset the state of failed readers to `inexistent_state` so code in the lower layers doesn't attempt to further process them.	2018-12-17 13:18:08 +02:00
Botond Dénes	9cef043841	multishard_mutation_query: handle missing readers when dismantling When dismantling the combined buffer and the compaction state we are no longer guaranteed to have the reader each partition originated from. The reader might have been evicted and not resumed, or resuming it might have failed. In any case we can no longer assume the originating reader of each partition will be present. If a reader no longer exists, discard the partitions that it emitted.	2018-12-17 13:18:08 +02:00
Botond Dénes	438bef333b	multishard_mutation_query: add support for keeping stats for discarded partitions In the next patches we will add code that will have to discard some of the dismantled partitions/fragments/bytes. Prepare the `dismantle_buffer_stats` struct for being able to track the discarded partitions/fragments/bytes in addition to those that were successfully dismantled.	2018-12-17 13:18:08 +02:00
Botond Dénes	ce52436af4	multishard_mutation_query: expect evicted reader state when creating reader Previously readers were created only once, so `make_remote_reader()` validated that no reader was created more than once. This validation was done by checking that the reader-state is either `inexistent` or `successful_lookup`. However, with the introduction of pausing shard readers, a reader may now have to be created and then re-created several times, and this validation was not updated to expect this. Update the validation so it also accepts the reader-state `evicted`, the state the reader will be in if it was evicted while paused.	2018-12-17 13:18:08 +02:00
Botond Dénes	1effb1995b	multishard_mutation_query: pretty-print the reader state in log messages	2018-12-17 13:18:08 +02:00
Botond Dénes	5780f2ce7a	querier_cache: check that the query wasn't evicted during registering The reader concurrency semaphore can evict the querier when it is registered as an inactive read. Make the `querier_cache` aware of this so that it doesn't continue to process the inserted querier when this happens. Also add a unit test for this.	2018-12-17 13:18:08 +02:00
Botond Dénes	e1d8237e6b	reader_concurrency_semaphore: use the correct types in the constructor Previously there was a type mismatch for `count` and `memory`, between the actual type used to store them in the class (signed) and the type of the parameters in the constructor (unsigned). Although negative values are completely valid for these members, initializing them to negative values doesn't make sense, which is why the constructor used unsigned types. However, this restriction can backfire when someone intends to give these parameters the maximum possible value, which, when interpreted as a signed value, will be `-1`. What's worse, the caller might not even be aware of this unsigned->signed conversion and be very surprised when they find out. So to prevent surprises, expose the real type of these members, trusting the clients to know what they are doing. Also add a `no_limits` constructor, so clients don't have to make sure they don't overflow internal types.	2018-12-17 13:18:08 +02:00
Botond Dénes	dfd649a6b4	reader_concurrency_semaphore: add consume_resources()	2018-12-17 13:18:08 +02:00
Botond Dénes	21b44adbfe	reader_concurrency_semaphore::inactive_read_handle: add operator bool()	2018-12-17 13:18:08 +02:00
Amnon Heiman	571755e117	node-exporter.service: Update command line to fix service startup The upgrade to node_exporter 0.17 in commit `09c2b8b48a` ("node_exporter_install: switch to node_exporter 0.17") caused the service to no longer start. It turns out node_exporter broke backwards compatibility of the command line between 0.15 and 0.16. Fix it up. While fixing the command line, all the collectors that are enabled by default were removed. Fixes #3989 Signed-off-by: Amnon Heiman <amnon@scylladb.com> [ penberg@scylladb.com: edit commit message ] Message-Id: <20181213114831.27216-1-amnon@scylladb.com>	2018-12-17 10:22:17 +02:00
Rafael Ávila de Espíndola	4de14e6143	Add tests on broken mc range tombstones. This tests that we diagnose both two consecutive range starts and two consecutive range ends. Message-Id: <20181214212608.95452-1-espindola@scylladb.com>	2018-12-15 13:53:25 +01:00
Avi Kivity	b023e8b45d	Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz " The motivation is to keep code related to each format separate, to make it easier to comprehend and reduce incremental compilation times. Also reduces dependency on sstable writer code by removing writer bits from sstables.hh. The ka/la format writers are still left in sstables.cc; they could also be extracted. " * 'extract-sstable-writer-code' of github.com:tgrabiec/scylla: sstables: Make variadic write() not picked on substitution error sstables: Extract MC format writer to mc/writer.cc sstables: Extract maybe_add_summary_entry() out of components_writer sstables: Publish functions used by writers in writer.hh sstables: Move common write functions to writer.hh sstables: Extract sstable_writer_impl to a header sstables: Do not include writer.hh from sstables.hh sstables: mc: Extract bound_kind_m related stuff into mc/types.hh sstables: types: Extract sstable_enabled_features::all() sstables: Move components_writer to .cc tests: sstable_datafile_test: Avoid dependency on components_writer	2018-12-14 15:05:00 +02:00
Duarte Nunes	224821303c	Merge 'Reduce the dependency on database.hh' from Botond " Working on database.hh or any header that is included in database.hh (and there are many) is a major pain, as each change involves recompiling half of our compilation units. Reduce the impact by removing the `#include "database.hh"` directive from as many header files as possible. Many headers can make do with just some forward declarations and don't need to include the entire headers. I also found some headers that included database.hh without actually needing it. Results (`touch database.hh` followed by `ninja build/release/scylla`): Before: [1/154] CXX build/release/gen/cql3/CqlParser.o After: [1/107] CXX build/release/gen/cql3/CqlParser.o " * 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla: treewide: remove include database.hh from headers where possible database_fwd.hh: add keyspace fwd declaration service/client_state: de-inline set_keyspace() Move cache_temperature into its own header	2018-12-14 12:24:48 +00:00
Piotr Sarna	63bd43e57e	cql3: add refusing to create an index on static column Secondary indexes on static columns are not yet supported, so creating such index should return an appropriate error. Fixes #3993 Message-Id: <700b0a71e80da52d2d5250edacc12626b55681fa.1544785127.git.sarna@scylladb.com>	2018-12-14 11:15:28 +00:00
Rafael Ávila de Espíndola	f48d54543f	Use read_rows_flat to test broken sstables. The previous code was using mp_row_consumer_k_l to be as close to the tested code as possible. Given that it is testing for an unhandled exception, there is probably more value in moving it to a higher level, easier to use, API. This patch changes it to use read_rows_flat(). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181210235016.41133-1-espindola@scylladb.com>	2018-12-14 10:14:28 +01:00
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Botond Dénes	efe2b2c75d	database_fwd.hh: add keyspace fwd declaration	2018-12-14 08:03:57 +02:00
Tomasz Grabiec	245a0d953a	tests: cql_test_env: Start the compaction manager Broken in `fee4d2e` Not doing this results in compaction requests being ignored. One effect of this is that perf_fast_forward produces many sstables instead of one. Refs #3984 Refs #3983 Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com>	2018-12-13 18:58:50 +02:00
Piotr Sarna	6743af5dbd	cql3: refuse to create index on COMPACT STORAGE with ck To follow C* compatibility, creating an index on COMPACT STORAGE table should be disallowed not only on base primary keys, but also when the base table contains clustering keys. Message-Id: <ab40c39730aff2e164d11ee5159ff62b8ec9e8e8.1544698186.git.sarna@scylladb.com>	2018-12-13 13:39:12 +00:00
Duarte Nunes	f8878238ed	service/storage_proxy: Embed the expire timer in the response handler Embedding the expire timer for a write response in the abstract_write_response_handler simplifies the code as it allows removing the rh_entry type. It will also make the timeout easily accessible inside the handler, for future patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181213111818.39983-1-duarte@scylladb.com>	2018-12-13 14:25:21 +02:00
Tomasz Grabiec	3889b05d7e	Merge "Tests and small fixes for composite markers" from Rafael * https://github.com/espindola/scylla espindola/add-composite-tests: Remove newline from exception messages. Fix end marker exception message. Add tests for broken start and end composite markers.	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	51fd880892	Add tests for broken start and end composite markers.	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	64439f6477	Fix end marker exception message. The code tested the end marker, but the exception mentioned the start marker. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	cfd07185b7	Remove newline from exception messages. They are inconsistent with other uses of malformed_sstable_exception and incompatible with adding " in sstable ..." to the message. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-13 10:29:44 +01:00
Vlad Zolotarov	7da1ac2c2c	large_partition_handler: fix the message We currently detect large partitions - not rows. So this is what we should be reporting. Fixes #3986 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181212215506.9879-1-vladz@scylladb.com>	2018-12-13 00:11:27 +00:00
Rafael Ávila de Espíndola	894f07f912	Move default case out of two switches. These switches are fully covered; having a default label disables -Wswitch warnings for them. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181212160904.17341-1-espindola@scylladb.com>	2018-12-12 18:20:24 +01:00
Botond Dénes	10336c13fc	service/client_state: de-inline set_keyspace()	2018-12-12 18:14:03 +02:00
Botond Dénes	76fe4ebc18	Move cache_temperature into its own header Some headers need to include database.hh just because of cache_temperature. Move it into its own header so these includes can be removed.	2018-12-12 16:03:45 +02:00
Tomasz Grabiec	0a853b8866	sstables: index_reader: Avoid schema copy in advance_to() Introduced in `7e15e43`. Exposed by perf_fast_forward: running: large-partition-skips on dataset large-part-ds1 Testing scanning large partition with skips. Reads whole range interleaving reads with skips according to read-skip pattern: [read | skip | time (s) | frags | frag/s | (...)] [1 | 0 | 5.268780 | 8000000 | 1518378] [1 | 1 | 31.695985 | 4000000 | 126199] Message-Id: <1544614272-21970-1-git-send-email-tgrabiec@scylladb.com>	2018-12-12 11:33:46 +00:00
Tomasz Grabiec	ff2ad2f6bb	sstables: Make variadic write() not picked on substitution error If write(v, out, x) doesn't match any overload, the variadic write() will be picked, with Rest = {}. The compiler will then print error messages about being unable to find write(v, out), which totally obscures the original cause of the mismatch. Make it eligible only when there are at least two write() parameters, so that debugging compilation errors is actually possible.	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	a14633c6d0	sstables: Extract MC format writer to mc/writer.cc This moves all MC-related writing code to mc/writer.cc: - m_format_write_helpers.hh is dropped - m_format_write_helpers_impl.hh is dropped - sstable_writer_m is moved out of sstables.cc sstable_writer_m is renamed to sstables::mc::writer	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	2636e6b5ab	sstables: Extract maybe_add_summary_entry() out of components_writer So that it can be used from writer implementations, which don't have access to the definition of the components_writer.	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	577e71478d	sstables: Publish functions used by writers in writer.hh	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	faf0ff1843	sstables: Move common write functions to writer.hh They are common for sstable writers of different formats. Note that writer.hh is supposed to be included only by writer implementations, not writer users.	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	3b4ccc85d0	sstables: Extract sstable_writer_impl to a header	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	6e3c9c3e5e	sstables: Do not include writer.hh from sstables.hh It is only needed by writer implementations.	2018-12-12 12:07:05 +01:00
Tomasz Grabiec	bd7e9ad3ab	sstables: mc: Extract bound_kind_m related stuff into mc/types.hh	2018-12-12 12:06:46 +01:00
Tomasz Grabiec	a4721b4d50	sstables: types: Extract sstable_enabled_features::all()	2018-12-12 12:06:45 +01:00
Tomasz Grabiec	90074d0b75	sstables: Move components_writer to .cc	2018-12-12 12:06:45 +01:00
Tomasz Grabiec	eff47a59ee	tests: sstable_datafile_test: Avoid dependency on components_writer It's LA format specific and it's going to become private to sstable.cc	2018-12-12 12:06:22 +01:00
Avi Kivity	fa96e07e6b	build: pass C compiler configuration in relocatable package build Just like we allow customizing the C++ compiler, we should allow customizing the C compiler. Ref #3978 Message-Id: <20181211172821.30830-1-avi@scylladb.com>	2018-12-12 11:45:13 +01:00
Calle Wilund	707bff563e	token_metadata: Add "get_location" ip to dc+rack accessor	2018-12-12 09:32:05 +00:00
Calle Wilund	66472bc52d	sequenced_set: Add "insert" method, following std::set semantics	2018-12-12 09:32:05 +00:00
Asias He	b9e0db801d	repair: Enable row level repair Finally, enable new row level repair if the cluster supports it. If not, fallback to the old partition level repair. Fixes #3033	2018-12-12 16:49:01 +08:00
Asias He	d372317e99	repair: Add row_level_repair === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub ranges, each containing around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the same 100 partitions. - If the checksums match, the data in this sub range is synced. - If the checksums mismatch, the repair master fetches the data from all the peers and sends the merged data back to the peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes all 100 partitions to be transferred. A single partition can be very large, let alone 100 partitions. - Checksumming (finding the mismatch) and streaming (fixing the mismatch) read the same data twice. === Row level repair Row level checksum and synchronization: detect row level mismatches and transfer only the mismatched rows. === How the row level repair works - To solve the problem of reading data twice: read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory, find the mismatch and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data: we need to find the mismatched rows between nodes and only send the delta. This is known as the set reconciliation problem, which is common in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = {row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1) and fetches row4 (set3 - set1) from Node3. Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2. Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
=== How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position; } Read rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that the repair master contains all the rows. Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, the data is synced; go to Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly which rows are missing. Request the missing rows from peer nodes. Now the local node contains all the rows. - Step C: Send missing rows to the peer nodes. Since the local node also knows what the peer nodes own, it sends the missing rows to them. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows into each of the 3 nodes. Each node has 241 GiB of data. We tested the 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min, new = 70 min (rebuild took 50 minutes), improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. Time to repair: old = 43 min, new = 24 min, improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.
Time to repair: old = 211 min, new = 44 min, improvement = 79.15% Bytes sent on wire for repair: old: tx = 162 GiB, rx = 90 GiB; new: tx = 1.15 GiB, rx = 0.57 GiB; improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, the repair master needs to receive 2 million rows and send 4 million rows. Here are the details: each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So the repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 1 to repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 2 to repair slave 1. In the results, the rows on the wire were as expected. tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes #3033	2018-12-12 16:49:01 +08:00
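The set reconciliation example and Steps A to C above can be checked with a small model. This is only a sketch, not Scylla's implementation: rows are plain strings and the master compares row sets directly, whereas the real protocol first exchanges combined hashes and then full row hashes before requesting rows. The function names and the tuple encoding of sync boundaries are hypothetical.

```python
def negotiate_sync_boundary(proposals):
    """Step A: every node proposes the position of the last row it buffered;
    the smallest proposal becomes current_sync_boundary."""
    return min(proposals)


def reconcile(master_rows, peer_rows):
    """Steps B and C for one round, seen from the repair master.

    master_rows: set of rows held by the master.
    peer_rows: dict mapping peer name -> set of rows held by that peer.
    Returns (to_fetch, to_send), both dicts keyed by peer name.
    """
    # Step B: rows a peer has that the master lacks.
    to_fetch = {peer: rows - master_rows for peer, rows in peer_rows.items()}
    # After Step B the master holds the union of all rows.
    union = set(master_rows).union(*peer_rows.values())
    # Step C: each peer receives exactly what it is missing from the union.
    to_send = {peer: union - rows for peer, rows in peer_rows.items()}
    return to_fetch, to_send


# The example from the commit message.
set1 = {"row1", "row2", "row3"}
set2 = {"row2", "row3"}
set3 = {"row1", "row2", "row4"}
to_fetch, to_send = reconcile(set1, {"Node2": set2, "Node3": set3})
```

Only rows that differ ever cross the wire, which is why the 99.9% synced case moves about 1 GiB instead of about 160 GiB.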
Asias He	b2b20cd5c0	repair: Add docs for row level repair	2018-12-12 16:49:01 +08:00
Asias He	fab31efae1	repair: Add repair_init_messaging_service_handler This patch implements all the rpc handlers for row level repair.	2018-12-12 16:49:01 +08:00
Asias He	3c80727d51	repair: Add repair_meta This patch introduces the repair_meta class, the core class for the row level repair. For each range to repair, repair_meta objects are created on both the repair master and the repair slaves. It stores the meta data for the row level repair algorithms, e.g., the current sync boundary, the buffer used to hold the rows the peers are working on, the reader to read data from sstable and the writer to write data to sstable. This patch also implements the RPC verbs for row level repair, for example, REPAIR_ROW_LEVEL_START/REPAIR_ROW_LEVEL_STOP to start/stop row level repair for a range, REPAIR_GET_SYNC_BOUNDARY to get the sync boundary the peers want to work on, REPAIR_GET_ROW_DIFF to get missing rows from repair slaves and REPAIR_PUT_ROW_DIFF to push missing rows to repair slaves.	2018-12-12 16:49:01 +08:00
Asias He	65099bac85	repair: Add repair_writer repair_writer uses multishard_writer to apply the mutation_fragments to sstable. The repair master needs one such writer for each repair slave. The repair slave needs one writer for the repair master.	2018-12-12 16:49:01 +08:00
Asias He	5b75f64e0e	repair: Add repair_reader repair_reader is used to read data from disk. It is simply a local flat_mutation_reader for the repair master. It is more complicated for the repair slave. The repair slaves have to follow what the repair master reads from disk. For example, assume the repair master has 2 shards and the repair slave has 3 shards. Repair master on shard 0 asks repair slave on shard 0 to read range [0,100). Repair master on shard 1 asks repair slave on shard 1 to read range [0,100). Repair master on shard 0 will only read the data that belongs to shard 0 within range [0,100). Since master and slave have different shard counts, repair slave on shard 0 has to use the multi shard reader to collect data on all the shards. It cannot pass range [0, 100) to the multi shard reader, otherwise it will read more data than the repair master. Instead, the repair slave uses a sharder built from the sharding configuration of the repair master to generate the sub ranges belonging to shard 0 of the repair master. If the repair master and slave have the same sharding configuration, a simple local reader is enough for the repair slave.	2018-12-12 16:49:01 +08:00
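To illustrate why the slave cannot simply pass [0, 100) to its multishard reader, here is a toy sharder. It is hypothetical: the token space is cut into fixed-width tiles dealt out to shards round-robin, which only loosely resembles Scylla's real sharding (that also depends on sharding_ignore_msb_bits). The slave evaluates it with the master's shard count to get the sub-ranges that a given master shard reads.

```python
def shard_of(token, shard_count, tile=10):
    # Toy sharder: tokens are grouped into tiles of `tile` tokens and the
    # tiles are assigned to shards round-robin.
    return (token // tile) % shard_count


def subranges_for_shard(start, end, shard, shard_count, tile=10):
    """Return the sub-ranges of [start, end) owned by `shard` under a
    `shard_count`-shard configuration, merging adjacent tiles."""
    out = []
    t = start
    while t < end:
        tile_end = min((t // tile + 1) * tile, end)
        if shard_of(t, shard_count, tile) == shard:
            if out and out[-1][1] == t:
                out[-1] = (out[-1][0], tile_end)  # extend the previous sub-range
            else:
                out.append((t, tile_end))
        t = tile_end
    return out
```

With a 2-shard master, shard 0 owns only every other tile of [0, 100), so a 3-shard slave must read exactly those sub-ranges across its own shards. When both sides use one shard, the result collapses to the whole range, matching the "simple local reader" case.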
Asias He	27128d132d	repair: Add repair_row repair_row is the in-memory representation of a "row" that the row level repair works on. It represents a mutation_fragment that is read from the flat_mutation_reader. The hash of a repair_row is the combination of the mutation_fragment hash and the partition_key hash.	2018-12-12 16:49:01 +08:00
Asias He	3e7b1d2ef4	repair: Add fragment_hasher It is used to calculate the hash of a mutation_fragment.	2018-12-12 16:49:01 +08:00
Asias He	e135871e4a	repair: Add decorated_key_with_hash Represents a decorated_key and its hash, so that the hash does not need to be calculated more than once if the decorated_key is used more than once.	2018-12-12 16:49:01 +08:00
Asias He	16c1b26937	repair: Add get_random_seed Get a random uint64_t number as the seed for the repair row hashing. The seed is passed to xx_hasher. We randomize the row hashing so that the next time we run repair, the same row produces a different hash value.	2018-12-12 16:49:01 +08:00
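A sketch of seeded row hashing, using BLAKE2b from Python's hashlib as a stand-in for xx_hasher (which is not in the standard library). The way the partition-key hash and the fragment hash are combined here is hypothetical; the repair_row commit only says the two are combined. The property being shown is that a row hashes identically within one run (same seed) but differently across runs.

```python
import hashlib
import secrets


def get_random_seed():
    # A fresh random 64-bit seed for each repair run.
    return secrets.randbits(64)


def seeded_hash64(data: bytes, seed: int) -> int:
    # Stand-in for a seeded xx_hasher: BLAKE2b keyed with the seed,
    # truncated to a 64-bit digest.
    h = hashlib.blake2b(data, digest_size=8, key=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def repair_row_hash(partition_key: bytes, fragment: bytes, seed: int) -> int:
    # Hypothetical combination: hash the fragment bytes together with the
    # (already seeded) partition-key hash.
    pk_hash = seeded_hash64(partition_key, seed)
    return seeded_hash64(pk_hash.to_bytes(8, "little") + fragment, seed)
```

Because the seed changes between runs, a hash collision that hides a mismatched row in one repair is very unlikely to recur in the next.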
Asias He	54888ac52c	repair: Add get_common_diff_detect_algorithm It is used to find the common difference detection algorithms supported by repair master and repair slaves. It is up to repair master to choose what algorithm to use.	2018-12-12 16:49:01 +08:00
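Finding the common difference detection algorithms amounts to a set intersection, after which the master picks one. In this sketch the master's list order is treated as its preference order, which is an assumption; "send_full_set" is the algorithm name mentioned in the supported_diff_detect_algorithms commit, and "algo_b" is a hypothetical placeholder.

```python
def common_diff_detect_algorithms(master_algos, slave_algos):
    """Algorithms supported by the master and by every slave,
    kept in the master's order."""
    return [a for a in master_algos
            if all(a in supported for supported in slave_algos)]


def choose_algorithm(master_algos, slave_algos):
    # The repair master decides; here it takes the first common algorithm.
    common = common_diff_detect_algorithms(master_algos, slave_algos)
    if not common:
        raise RuntimeError("no common row level diff detection algorithm")
    return common[0]


master = ["algo_b", "send_full_set"]
# One slave lacks algo_b, so the master falls back to send_full_set.
chosen = choose_algorithm(master, [["send_full_set"], ["send_full_set", "algo_b"]])
```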
Asias He	0b294d5829	repair: Add shard_config It is used to store the shard configuration.	2018-12-12 16:49:01 +08:00
Asias He	a36b0966cf	repair: Add supported_diff_detect_algorithms It returns a vector of row level repair difference detection algorithms supported by this node. We are going to implement the "send_full_set" algorithm in the following patches.	2018-12-12 16:49:01 +08:00
Asias He	42f2cd8dc5	repair: Add repair_stats to repair_info Also add update_statistics() to update current stats.	2018-12-12 16:49:01 +08:00
Asias He	43c04302f3	repair: Introduce repair_stats It is used by row level repair to track repair statistics.	2018-12-12 16:49:01 +08:00
Asias He	0067d32b47	flat_mutation_reader: Add make_generating_reader Move generating_reader from stream_session.cc to flat_mutation_reader.cc. It will be used by repair code soon. Also introduce a helper make_generating_reader to hide the implementation of generating_reader.	2018-12-12 16:49:01 +08:00
Asias He	fe4afb1aa3	storage_service: Introduce ROW_LEVEL_REPAIR feature With this feature enabled, the node supports row level repair.	2018-12-12 16:49:01 +08:00
Asias He	acc9ff8dce	messaging_service: Add RPC verbs for row level repair This patch adds the RPC verbs that are needed by the row level repair. The usage of those verbs is in the following patches. All the verbs for row level repair are sent by the repair master. The repair master asks repair slaves to create repair meta objects (repair_meta) to store the repair meta data needed by the row level repair algorithm. A repair meta object is identified by the IP address of the repair master and a uint32 number, repair_meta_id, chosen by the repair master. When the repair master restarts or leaves the cluster, the repair slaves detect it and remove all existing repair_meta objects for that repair master. When a repair slave restarts, the existing repair_meta objects on the slave are gone. The sync boundary used in the verbs is the position_in_partition of the last mutation_fragment. In each repair round, peers work on (last_sync_boundary, current_sync_boundary].	2018-12-12 16:49:01 +08:00
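The slave-side bookkeeping described above can be modelled as a map keyed by (master IP, repair_meta_id), plus the membership test for a round's half-open interval. The class and method names are hypothetical and the IP addresses are placeholders; only the keying scheme and the (last_sync_boundary, current_sync_boundary] interval come from the commit message.

```python
class RepairMetaRegistry:
    """Slave-side map of repair_meta objects keyed by (master IP, repair_meta_id)."""

    def __init__(self):
        self._metas = {}

    def create(self, master_ip, repair_meta_id, meta):
        self._metas[(master_ip, repair_meta_id)] = meta

    def get(self, master_ip, repair_meta_id):
        return self._metas[(master_ip, repair_meta_id)]

    def remove_all_for(self, master_ip):
        # Called when the master restarts or leaves the cluster.
        stale = [key for key in self._metas if key[0] == master_ip]
        for key in stale:
            del self._metas[key]
        return len(stale)


def in_round(position, last_sync_boundary, current_sync_boundary):
    # A round covers (last_sync_boundary, current_sync_boundary]:
    # exclusive at the start, inclusive at the end.
    return last_sync_boundary < position <= current_sync_boundary
```

Keying by master IP is what makes the cleanup cheap: one scan removes every repair instance a departed master had started, without touching other masters' state.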
Asias He	8cfdcf435e	repair: Export the repair logger It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	e62aeae2db	repair: Export repair_info It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	6be3b35d52	repair: Export estimate_partitions It will be used by row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	48341a2d4d	idl: Add decorated_key support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1db4e3fd0a	idl: Add row_level_diff_detect_algorithm Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	ccc706559f	idl: Add get_sync_boundary_response Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1173d1dd5a	idl: Add repair_sync_boundary Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	dc223e9216	idl: Add partition_key_and_mutation_fragments Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	5fbbc63676	idl: Add position_in_partition Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	e9fbc27740	idl: Add bound_weight It will be used by the row level repair code.	2018-12-12 16:49:01 +08:00
Asias He	3c39462397	idl: Add partition_region Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	e2b9840e24	idl: Add repair_hash Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1a0bc8acf1	repair: Add struct hash<node_repair_meta_id> for node_repair_meta_id	2018-12-12 16:49:01 +08:00
Asias He	28d090ffda	repair: Add struct hash<repair_hash> for repair_hash	2018-12-12 16:49:01 +08:00
Asias He	ce70225b1c	repair: Introduce row_level_diff_detect_algorithm It specifies the algorithm that is used to find the row difference in repair.	2018-12-12 16:49:01 +08:00
Asias He	e9251df478	repair: Introduce partition_key_and_mutation_fragments Represent a partition_key and frozen_mutation_fragments within the partition_key.	2018-12-12 16:49:01 +08:00
Asias He	5d5a1beaec	repair: Introduce node_repair_meta_id It uses an IP address and a repair_meta_id to identify a repair instance started by the row level repair.	2018-12-12 16:49:01 +08:00
Asias He	edd72e10ac	repair: Introduce get_sync_boundary_response The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used in the row level repair code soon.	2018-12-12 16:49:01 +08:00
Asias He	95b9a889cf	repair: Introduce repair_hash It represents the hash value of a repair row.	2018-12-12 16:49:01 +08:00
Asias He	3e86b7a646	repair: Introduce repair_sync_boundary Represent a position of a mutation_fragment read from a flat mutation reader. Repair nodes negotiate a small sub range identified by two repair_sync_boundary to work on in each round.	2018-12-12 16:49:01 +08:00
Asias He	063dfcda26	messaging_service: Add constructor for msg_addr Which takes the ip address and shard id.	2018-12-12 16:49:01 +08:00
Asias He	8cb3ea98d0	xx_hasher: Allow specifying seed It will be used by row level repair.	2018-12-12 16:49:01 +08:00
Asias He	165d3053b1	position_in_partition: Add get_type, get_bound_weight and get_clustering_key_prefix Needed by the RPC serialization code.	2018-12-12 16:49:01 +08:00
Asias He	4e55d22a8f	position_in_partition: Switch _bound_weight to use enum The _bound_weight in position_in_partition will be sent on wire in rpc. Make it enum instead of int.	2018-12-12 16:49:01 +08:00
Asias He	5bc109e1ee	position_in_partition: Add bound_weight It will be used to change _bound_weight to use enum instead of int8_t.	2018-12-12 16:49:01 +08:00
Asias He	05c663b932	position_in_partition: Use std::optional for clustering_key_prefix The new row level repair code will access clustering_key_prefix and it uses std::optional everywhere. Convert position_in_partition to use std::optional.	2018-12-12 16:49:01 +08:00
Asias He	0b31d7059b	position_in_partition: Make partition_region uint8_t It will be sent over rpc. Make the type explicit.	2018-12-12 16:49:01 +08:00
Asias He	dfd206b3a3	serializer: Add std::optional support	2018-12-12 16:49:01 +08:00
Asias He	3eecdc670f	serializer: Add std::list support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	b540df2819	serializer: Add std::unordered_set support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1367c8c47e	dht: Add make_partitioner Given the name and shard count and the sharding_ignore_msb_bits, make a partitioner. It is used by row level repair.	2018-12-12 16:49:01 +08:00
Asias He	f1a914060b	dht: Add constructor for decorated_key which takes token and partition_key decorated_key(const dht::token& t, const partition_key& k)	2018-12-12 16:49:01 +08:00
Asias He	71c1681f6c	storage_service: Notify NEW_NODE only when a node is new node This is a backport of CASSANDRA-11038. Before this, a restarted node was reported as a new node with a NEW_NODE cql notification. To fix this, only send the NEW_NODE notification when the node was not already part of the cluster. Fixes: #3979 Tests: pushed_notifications_test.py:TestPushedNotifications.restart_node_test Message-Id: <453d750b98b5af510c4637db25b629f07dd90140.1544583244.git.asias@scylladb.com>	2018-12-12 07:33:49 +02:00
Juliana Oliveira	5eb76c9bc6	compress: add support for Cassandra's compression parameter This patch adds compatibility with Cassandra's "chunk_size_in_kb" compression parameter, while keeping Scylla's "chunk_size_kb". Fixes #3669 Tests: unit (release) v2: use variable instead of array v3: fix committed files Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20181211215840.GA7379@shenzou.localdomain>	2018-12-11 23:33:27 +00:00
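Accepting two spellings of one option is a small normalization step. The sketch below assumes string-valued options and converts KiB to bytes. Rejecting the case where both names are given is this sketch's own choice, not necessarily what Scylla does. The function name is hypothetical.

```python
def chunk_length_bytes(options):
    """Chunk length in bytes from compression options, accepting both
    Cassandra's "chunk_size_in_kb" and Scylla's "chunk_size_kb"."""
    names = [n for n in ("chunk_size_in_kb", "chunk_size_kb") if n in options]
    if len(names) > 1:
        # Assumption of this sketch: giving both spellings is an error.
        raise ValueError("specify only one of chunk_size_in_kb / chunk_size_kb")
    if not names:
        return None  # caller falls back to its default chunk length
    return int(options[names[0]]) * 1024
```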
Nadav Har'El	a0379209e6	secondary indexes: fail attempts to create a CUSTOM INDEX Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index with a custom implementation. The only custom implementation that Cassandra supports is SASI. But Scylla doesn't support this, or any other custom index implementation. If a CREATE CUSTOM INDEX statement is used, we shouldn't silently ignore the "CUSTOM" tag, we should generate an error. This patch also includes a regression test that "CREATE CUSTOM INDEX" statements with valid syntax fail (before this patch, they succeeded). Fixes #3977 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-2-nyh@scylladb.com>	2018-12-11 23:33:02 +00:00
Nadav Har'El	36db4fba23	Fix typo in error message Interestingly, this typo was copied from the original Cassandra source code :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-1-nyh@scylladb.com>	2018-12-11 23:32:58 +00:00
Avi Kivity	5b08e91bdb	tools: add SYS_PTRACE capability to dbuild LeakSanitizer uses ptrace, and docker disables ptrace by default. Add it back so tests pass. Message-Id: <20181208112524.19229-1-avi@scylladb.com>	2018-12-11 19:09:12 +00:00
Avi Kivity	34a31a807d	build: build libdeflate with user selected C compiler If the user specified a C compiler, use it to build libdeflate. Fixes #3978. Message-Id: <20181211145604.14847-1-avi@scylladb.com>	2018-12-11 14:58:16 +00:00
Duarte Nunes	89ae3fbf11	db/system_distributed_keyspace: Create the schema with min_timestamp Different nodes can concurrently create the distributed system keyspace on boot, before the "if not exists" clause can take effect. However, the resulting schema mutations will be different since different nodes use different timestamps. This patch forces the timestamps to be the same across all nodes, which avoids some schema mismatches. This fixes a bug exposed by `ca5dfdf`, whereby the initialization of the distributed system keyspace is done before waiting for schema agreement. While waiting for schema agreement in storage_service::join_token_ring(), the node still hasn't joined the ring and schemas can't be pulled from it, so nodes can deadlock. A similar situation can happen between a seed node and a non-seed node, where the seed node progresses to a different "wait for schema agreement" barrier, but still can't make progress because it can't pull the schema from the non-seed node still trying to join the ring. Finally, it is assumed that changes to the schema of the current distributed system keyspace tables will be protected by a cluster feature and a subsequent schema synchronization, such that all nodes will be at a point where schemas can be transferred around. Fixes #3976 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181211113407.20075-1-duarte@scylladb.com>	2018-12-11 13:35:48 +01:00
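Why a shared timestamp helps: a schema mutation carries both the definition and its write timestamp, so two nodes that create the same table with their own clocks produce different mutations (and different schema versions). The toy digest below is hypothetical. Scylla computes schema versions differently, and MIN_TIMESTAMP here is only a stand-in for the fixed minimum timestamp, but it shows the effect.

```python
import hashlib

# Hypothetical stand-in for the fixed minimum timestamp all nodes use.
MIN_TIMESTAMP = 0


def schema_mutation_digest(table_definition: str, timestamp: int) -> str:
    # Toy digest over a schema mutation: it depends on the definition and
    # on the write timestamp, so nodes agree only if both match.
    return hashlib.sha256(f"{table_definition}|{timestamp}".encode()).hexdigest()


definition = "CREATE TABLE ks.t (pk int PRIMARY KEY)"
# Each node stamping with its own clock: the digests differ.
node_a_local = schema_mutation_digest(definition, 1544527000000000)
node_b_local = schema_mutation_digest(definition, 1544527000000123)
# Both nodes stamping with the shared timestamp: the digests agree.
node_a_fixed = schema_mutation_digest(definition, MIN_TIMESTAMP)
node_b_fixed = schema_mutation_digest(definition, MIN_TIMESTAMP)
```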
Paweł Dziepak	e3f53542c9	Merge "Optimize sstable writing of large partitions" from Tomasz " This series contains several optimizations of the MC format sstable writer, mainly: - Avoiding output_stream when serializing into memory (e.g. a row) - Faster serialization of primitive types when serializing into memory I measured the improvement in throughput (frag/s) using perf_fast_forward for datasets with a single large partition with many small rows: - 10% for a row with a single cell of 8 bytes - 10% for a row with a single cell of 100 bytes - 9% for a row with a single cell of 1000 bytes - 13% for a row with 6 cells of 100 bytes " * tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla: bytes_ostream: Optimize writing of fixed-size types sstables: mc: Write temporary data to bytes_ostream rather than file_writer sstables: mc: Avoid double-serialization of a range tombstone marker sstables: file_writer: Generalize bytes& writer to accept bytes_view sstables: Templatize write() functions on the writer sstables: Turn m_format_write_helpers.cc into an impl header sstables: De-futurize file_writer bytes_ostream: Implement clear() bytes_ostream: Make initial chunk size configurable	2018-12-11 12:29:24 +00:00
Duarte Nunes	d66bd0100b	Merge 'Simplify db::extensions' from Avi " Carry out simplifications of db::extensions: less magical types, de-inline complex functions, and reduce #include dependencies Tests: unit(release) " * tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla: extensions: remove unneeded includes extensions: deinline extension accessors extensions: return concrete types from the extension accessors extensions: remove dependency on cql layer	2018-12-10 22:00:51 +00:00
Avi Kivity	b251183359	extensions: remove unneeded includes <boost/any.hpp> is not used, and "schema.hh" can be replaced with forward declarations.	2018-12-10 21:34:09 +02:00
Avi Kivity	119a83bf2f	extensions: deinline extension accessors Quite complex code that is not performance sensitive. Move it out of line.	2018-12-10 21:22:56 +02:00
Avi Kivity	e9f5641b64	extensions: return concrete types from the extension accessors Returning "auto" makes it harder to understand what the function is returning, and impossible to de-inline. Return a vector of pointers instead. The caller should iterate immediately, in any case, and since the previous return value was a range of references to const unique_ptrs, nothing else could be done with it anyway.	2018-12-10 21:16:45 +02:00
Tomasz Grabiec	f206ef0038	bytes_ostream: Optimize writing of fixed-size types Inlining write() allows the writing code to be optimized for fixed-size types. In particular, memcpy() calls and loops will be eliminated. Saw 4% improvement in throughput in perf_fast_forward for tiny rows.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	5a35240d47	sstables: mc: Write temporary data to bytes_ostream rather than file_writer Currently temporary data is serialized into a file_writer, because that's what write() functions used to expect, which goes through an output_stream, a data_sink, into an in-memory data sink implementation which collects the temporary_buffers. Going through those abstractions is relatively expensive if we don't write much, because each time we begin to write after a flush() of the file_writer the output stream has to allocate a new buffer, which means a large allocation for a small amount of data. We can avoid that and write into bytes_ostream directly, which will keep its buffer across clear(). write() functions which are used both to write directly into the data file and to a temporary arena were templatized to accept a Writer to which both file_writer and bytes_ostream conform.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	c4003b3e79	sstables: mc: Avoid double-serialization of a range tombstone marker	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	9edb9434e5	sstables: file_writer: Generalize bytes& writer to accept bytes_view Note that bytes is implicitly convertible to bytes_view.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	fad4fba4bc	sstables: Templatize write() functions on the writer Will allow writing to both a file_writer, or an in-memory writer like a bytes_ostream.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	f4016996d3	sstables: Turn m_format_write_helpers.cc into an impl header I need to templatize functions defined in it and want to avoid explicit instantiations. There is only one compilation unit in which this is used (sstables.cc). I think in the long term we should move all those "helpers" into sstables/mc/writer.{cc,hh} together with their only user, the sstable_writer_m class from sstables.cc.	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	13999a4d09	sstables: De-futurize file_writer	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	a1fb441df8	bytes_ostream: Implement clear()	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	7cf5de3d9c	bytes_ostream: Make initial chunk size configurable	2018-12-10 20:07:43 +01:00
Avi Kivity	8e05bcbe71	extensions: remove dependency on cql layer The extensions class reaches into cql's property_definitions class to grab a map<sstring, sstring> type. This generates a few unneeded dependencies. Reduce dependencies by defining the map type ourselves; if cql's property_definitions changes in an incompatible way, it will have to adapt, rather than the extensions class.	2018-12-10 20:55:30 +02:00
Tomasz Grabiec	1dd2bf52ca	Merge "Add a couple of tests of broken sstables" From Rafael These are the current uninteresting cases I found when looking at malformed_sstable_exception. The existing code is working, just not being tested. * https://github.com/espindola/scylla.git espindola/espindola/broken-sst: Add a broken sstable test. Add a test with mismatched schema.	2018-12-10 19:30:58 +01:00
Tomasz Grabiec	538e041f22	Merge "Remove some dependencies on db::config" from Avi db::config is a global class; changes in any module can cause changes in db::config. Therefore, it is a cause of needless recompilation. Remove some of these dependencies by having consumers of db::config declare an intermediate config struct that contains only the configuration of interest to them, and have their caller fill it out (in the case of auth, it already followed this scheme and the patchset only moves the translation function). In addition, some outright pointless inclusions of db/config.hh are removed. The result is somewhat shorter compile times, and fewer needless recompiles. * https://github.com/avikivity/scylla unconfig-1/v1: config: remove inclusions of db/config.hh from header files repair: remove unneeded config.hh inclusion batchlog_manager: remove dependency on db::config auth: remove permissions_cache dependency on db::config auth: remove auth::service dependency on db::config auth: remove unneeded db/config.hh includes	2018-12-10 14:53:14 +01:00
Benny Halevy	ef53ddf3ae	scylla_io_setup: correct units in low space warning GiB -> GB Refs #2676 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>	2018-12-10 13:58:49 +02:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spend most of our time, and while they have a large number of callers, storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing, so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians: read: ./perf_before = 343086.80, ./perf_after = 360720.53, diff = +5.1% Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Duarte Nunes	a42b2895c2	Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias " This is a backport of CASSANDRA-8236. Before this patch, scylla sends the node UP event to the cql client when it sees a new node join the cluster, i.e., when a new node's status becomes NORMAL. The problem is that, at this time, the cql server might not be ready yet. Once the client receives the UP event, it tries to connect to the new node's cql port and fails. To fix this, a new application_state::RPC_READY is introduced: a new node sets RPC_READY to false when it starts gossip at the very beginning and sets RPC_READY to true when the cql server is ready. RPC_READY is a bad name, but I think it is better to follow Cassandra. Nodes with and without this patch are supposed to work together with no problem. Refs #3843 " * 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev: storage_service: Use cql_ready facility storage_service: Handle application_state::RPC_READY storage_service: Add notify_cql_change storage_service: Add debug log in notify_joined storage_service: Add extra check in notify_joined storage_service: Add notify_joined storage_service: Add debug log in notify_up storage_service: Add extra check in notify_up storage_service: Add notify_up storage_service: Make notify_left log debug level storage_service: Introduce notify_left storage_service: Add debug log in notify_down storage_service: Introduce notify_down storage_service: Add set_cql_ready gossip: Add gossiper::is_cql_ready gms: Add endpoint_state::is_cql_ready gms: Add application_state::RPC_READY gms: Introduce cql_ready in versioned_value	2018-12-10 11:37:59 +00:00
Asias He	06dc9b8da0	storage_service: Use cql_ready facility At this point the cql_ready facility is ready. To use it, advertise the RPC_READY application state in the following cases: - When a node boots, set it to false - When cql server is ready, set it to true - When cql server is down, set it to false	2018-12-10 19:20:20 +08:00
Asias He	4761b53035	storage_service: Handle application_state::RPC_READY	2018-12-10 19:20:20 +08:00
Asias He	0e64814206	storage_service: Add notify_cql_change It is called when a RPC_READY gossip application state is received.	2018-12-10 19:20:20 +08:00
Asias He	a1bbd7bcc7	storage_service: Add debug log in notify_joined	2018-12-10 19:20:20 +08:00
Asias He	17d68cb408	storage_service: Add extra check in notify_joined Do not send the node joined event unless the node is in NORMAL status, which means the node has officially joined the cluster.	2018-12-10 19:20:20 +08:00
Asias He	9abb15192f	storage_service: Add notify_joined Add a helper for node joined event.	2018-12-10 19:20:20 +08:00
Asias He	60c74431f7	storage_service: Add debug log in notify_up	2018-12-10 19:20:20 +08:00
Asias He	948d2b6c78	storage_service: Add extra check in notify_up Do not send the up event if is_cql_ready is false, which means the cql server is not ready yet or the node is down.	2018-12-10 19:20:20 +08:00
Asias He	48cd31dc1e	storage_service: Add notify_up Add a helper for node up event.	2018-12-10 19:20:20 +08:00
Asias He	03f9c3e7e5	storage_service: Make notify_left log debug level Be consistent with other notification log.	2018-12-10 19:20:20 +08:00
Asias He	a5ec25f28b	storage_service: Introduce notify_left Add a helper for node left event.	2018-12-10 19:20:20 +08:00
Asias He	15d7fce902	storage_service: Add debug log in notify_down	2018-12-10 19:20:19 +08:00
Asias He	f18cb0654d	storage_service: Introduce notify_down Add a helper for node down event.	2018-12-10 19:20:19 +08:00
Asias He	2f3130b36f	storage_service: Add set_cql_ready It is used to set the status of the RPC_READY of this node so it can be advertised by gossip.	2018-12-10 19:20:17 +08:00
Asias He	e07150166a	gossip: Add gossiper::is_cql_ready - A new scylla node always sends application_state::RPC_READY = false when the node boots and sends application_state::RPC_READY = true when the cql server is up - An old scylla node that does not support application_state::RPC_READY never has application_state::RPC_READY in its endpoint_state, so we can only assume its cql server is up; we therefore return true here if application_state::RPC_READY is not present	2018-12-10 19:16:44 +08:00
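The fallback rule in e07150166a (absent RPC_READY means "assume ready") can be sketched as below. This is a minimal sketch, not Scylla's code: `application_state`, `endpoint_state` and `is_cql_ready` here are simplified hypothetical stand-ins, and encoding the boolean as the strings "1"/"0" is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for gms::application_state and endpoint_state.
enum class application_state { STATUS, RPC_READY };

struct endpoint_state {
    std::map<application_state, std::string> states;
};

// A node that advertises RPC_READY is ready only when the value is "1"
// (assumed encoding). A node that never advertises it (an older version
// without RPC_READY support) is assumed to have its cql server up.
bool is_cql_ready(const endpoint_state& ep) {
    auto it = ep.states.find(application_state::RPC_READY);
    if (it == ep.states.end()) {
        return true;   // old node: no RPC_READY in endpoint_state
    }
    return it->second == "1";
}
```

Defaulting to true keeps mixed-version clusters working: old nodes are treated exactly as they were before the patch.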
Asias He	2737654c75	gms: Add endpoint_state::is_cql_ready Return whether the endpoint_state has the RPC_READY application_state.	2018-12-10 19:16:44 +08:00
Asias He	67093324ad	gms: Add application_state::RPC_READY It is used to tell peer nodes that the cql server is ready and can accept client requests. Follow the same name that Cassandra uses.	2018-12-10 19:16:44 +08:00
Asias He	4ed2ef23e9	gms: Introduce cql_ready in versioned_value	2018-12-10 19:16:43 +08:00
Avi Kivity	7c7da0b462	sstables: fix overflow in clustering key blocks header bit access _ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too. Otherwise, a shift in the range 32-63 will produce wrong results. Fix by using a 64-bit mask. Found by Fedora 29's ubsan. Fixes #3973. Message-Id: <20181209120549.21371-1-avi@scylladb.com>	2018-12-10 11:09:25 +00:00
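The overflow fixed in 7c7da0b462 is the classic narrow-literal shift. Below is a minimal sketch of the buggy and fixed mask forms. The helper name `bit_is_set` is hypothetical; it is not the actual sstables code.

```cpp
#include <cassert>
#include <cstdint>

// Buggy form (not compiled here): the literal 1 is an int (32 bits on common
// platforms), so for n in 32..63 the shift is undefined behaviour and the
// mask is wrong:
//   return header & (1 << n);

// Fixed form: build the mask as a 64-bit value before shifting.
inline bool bit_is_set(std::uint64_t header, unsigned n) {
    assert(n < 64);
    return (header & (std::uint64_t(1) << n)) != 0;
}
```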
Takuya ASADA	a2d0ebf4d9	dist/offline_installer/redhat: fix missing dependencies The offline installer for Scylla 3.0 fails with a dependency error on CentOS; add the missing packages. Fixes #3969 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181207020711.23055-1-syuu@scylladb.com>	2018-12-10 12:47:10 +02:00
Avi Kivity	904db433d9	Merge "Re-use commitlog segments" from Calle " Refs #3929 Enables re-use of commitlog segments. First, ensure we never successfully replay a commitlog segment whose name does not match the IDs in the actual file data, by deriving the expected ID from the file name. This also handles partially written re-used files, as each chunk header's CRC depends on the ID, and replay will fail once it hits any leftovers. The second part renames files and puts them into a recycle list instead of actually deleting them when finished. Allocating a new segment will then prioritize this list before creating a new file. Note that since consumption and release of segments can be somewhat unbalanced, this does not guarantee that we will use recycled files in all cases where it might be possible, simply because of timing. It does, however, give a good chance of it. We limit recycled files based on the max disk size setting; thus, depending on timing, disk usage can potentially grow more than without recycling, but not uncontrollably. While all this theoretically might improve disk writes in some cases, it is far from a magic bullet. No real performance testing has been done yet, only functional testing. " * 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev: commitlog: Recycle used segments instead of delete + new file commitlog: Terminate all segments with a zero chunk commitlog_replay: Enforce file name based id matching	2018-12-10 11:15:02 +02:00
Calle Wilund	55f10ffc43	commitlog: Recycle used segments instead of delete + new file Refs #3929 When deleting a segment, IFF we have not yet filled up all reserves, put the file on a "recycle" list instead of actually deleting it. The next segment allocation will then, instead of creating a new file, simply rename the segment and reuse the file and its allocated space. We rename the file twice: once when adding it to the recycle list, with a special prefix so we don't mix these up with actual replayable segments, and a second time when we actually re-use the file (also to ensure consecutive names). Note that we limit the number of recyclables, so a really stressed application which somehow fills up the replenish queue might still cause us to drop the segments. We could skip this limit, but would then risk getting too many files on disk. Replay should be safe, since all entries are guarded by a CRC based on the file ID (i.e. the file name). Thus replaying a recycled segment will simply cause a CRC error in the main header and be ignored (see previous patch). Segments that are fully synced will have a terminating zero header (see previous patch), so we know when to stop processing a recycled file. If a file is the result of a mid-write crash, we will generate a CRC processing error, as we normally would in this case, when hitting a partially written block or reaching an old/new chunk boundary. v2: * Sync dir on rename * auto -> const sstring& * Allow recycling files as long as we're within disk space limits v3: * Use special names for files waiting for reuse	2018-12-10 09:09:07 +00:00
Calle Wilund	b13b6ef6a0	commitlog: Terminate all segments with a zero chunk Writes a final chunk header of zero to the file on close, to mark end-of-segment. This allows us to gracefully stop replay processing of a segment file even if it was not zeroed from the beginning (maybe recycled - hint hint).	2018-12-10 09:09:07 +00:00
Calle Wilund	b35af84599	commitlog_replay: Enforce file name based id matching When reading the header chunk of a commitlog file, check the stored id value against the id derived from the file name, and ignore if mismatched. This is a prerequisite for re-using renamed commitlog files, as we can then fail-fast should one such be left on disk, instead of trying to replay it. We also check said id via the CRC check for each chunk parsed. If we find a chunk with mismatched id, we will get a CRC error for the chunk, and replay will terminate (albeit not gracefully).	2018-12-10 09:09:07 +00:00
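The name-based ID check from b35af84599 can be sketched as follows. This is a minimal sketch under stated assumptions: the file name pattern `CommitLog-<version>-<id>.log` and the helpers `id_from_file_name` and `should_replay` are hypothetical, and the real segment naming and header layout may differ.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

// Derive the segment ID from a name like "CommitLog-1-42.log" (assumed format).
std::optional<std::uint64_t> id_from_file_name(const std::string& name) {
    const std::string suffix = ".log";
    if (name.size() <= suffix.size() ||
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    const auto end = name.size() - suffix.size();
    const auto dash = name.rfind('-', end);
    if (dash == std::string::npos || dash + 1 >= end) {
        return std::nullopt;
    }
    std::uint64_t id = 0;
    const char* first = name.data() + dash + 1;
    const char* last = name.data() + end;
    auto [ptr, ec] = std::from_chars(first, last, id);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;   // trailing garbage or overflow
    }
    return id;
}

// Replay a segment only if the ID stored in its header matches the ID implied
// by its file name. A renamed (recycled) file whose header still carries the
// old ID fails this check and is skipped.
bool should_replay(const std::string& file_name, std::uint64_t header_id) {
    auto expected = id_from_file_name(file_name);
    return expected && *expected == header_id;
}
```

Because the per-chunk CRC also covers the ID, a stale chunk further into a recycled file fails verification in the same way, even if the header check passes.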
Amnon Heiman	09c2b8b48a	node_exporter_install: switch to node_exporter 0.17 The newer version of node_exporter comes with important bug fixes, which are especially important because I3.metal is not supported by the older version of node_exporter. The dashboards can now support both the new and the old version of node_exporter. Fixes #3927 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181210085251.23312-1-amnon@scylladb.com>	2018-12-10 10:54:50 +02:00
Benny Halevy	bcb486b8b9	scylla_io_setup: io_tune should not run when there is less than 10GB of disk space Fixes #2676 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181209174852.3620-1-bhalevy@scylladb.com>	2018-12-10 10:38:33 +02:00
Yibo Cai (Arm Technology China)	6717816a8d	utils/gz: optimize crc_combine for arm64 Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>	2018-12-10 10:31:08 +02:00
Avi Kivity	40677fae37	Merge "Compaction strategy aware major compaction" from Raphael " Make major compaction aware of compaction strategy, by using an optimal approach which suits the strategy needs. Refs #1431. " * 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla: tests: add test for compaction-strategy-aware major compaction compaction: implement major compaction heuristic for leveled strategy compaction: introduce notion of compaction-strategy-aware major compaction	2018-12-10 10:10:22 +02:00
Avi Kivity	d7c7949d43	auth: remove unneeded db/config.hh includes	2018-12-09 20:11:38 +02:00
Avi Kivity	37a681e46d	auth: remove auth::service dependency on db::config auth::service already has its own configuration and a function to create it from db::config; just move it to the caller. This reduces dependencies on the global db::config class.	2018-12-09 20:11:38 +02:00
Avi Kivity	77e6b7a155	auth: remove permissions_cache dependency on db::config permissions_cache already has its own configuration and a function to create it from db::config; just move it to the caller. This reduces dependencies on the global db::config class.	2018-12-09 20:11:38 +02:00
Avi Kivity	89be47e291	batchlog_manager: remove dependency on db::config Extract configuration into a new struct batchlog_manager_config and have the callers populate it using db::config. This reduces dependencies on global objects.	2018-12-09 20:11:38 +02:00
Avi Kivity	85e9b0d78d	repair: remove unneeded config.hh inclusion	2018-12-09 20:11:38 +02:00
Avi Kivity	864f55e745	config: remove inclusions of db/config.hh from header files Instead, distribute those inclusions to .cc files that require them. This reduces rebuilds when config.hh changes, and makes it easier to locate files that need config disaggregation.	2018-12-09 20:11:38 +02:00
Amos Kong	09a3b11c2f	scylla_setup: only ask for nic in interactive mode Currently, scylla_setup still asks for the NIC even if it is already specified on the command line. Fixes #3908 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com>	2018-12-09 15:29:31 +02:00
Gleb Natapov	9fb79bf379	storage_proxy: fix crash during write timeout callback invocation rh_entry address is captured inside timeout's callback lambda, so the structure should not be moved after it is created. Change the code to create rh_entry in-place instead of moving it into the map. Fixes #3972. Message-Id: <20181206164043.GN25283@scylladb.com>	2018-12-09 10:33:37 +02:00
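The crash in 9fb79bf379 comes from a lambda that captures an object's address before the object is moved. The sketch below shows both the failure and the in-place fix. `rh_entry` and `add_entry` here are simplified hypothetical stand-ins, and using a node-based `std::unordered_map` (whose element addresses survive rehashing) is an assumption about the container.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for rh_entry: the timeout callback captures `this`,
// so it is only correct while the object stays at its original address.
struct rh_entry {
    int id;
    std::function<rh_entry*()> on_timeout;   // returns the entry it believes it belongs to
    explicit rh_entry(int i) : id(i) {
        on_timeout = [this] { return this; };
    }
};

// Buggy pattern (not used): entries.emplace(id, rh_entry(id)) moves the entry,
// and the moved lambda keeps pointing at the destroyed temporary.

// Fixed pattern: construct the entry directly inside the map node.
rh_entry& add_entry(std::unordered_map<int, rh_entry>& entries, int id) {
    auto it = entries.try_emplace(id, id).first;
    return it->second;
}
```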
Vladimir Krivopalov	6a5d8934a6	db: Enable SSTables 'mc' format by default. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>	2018-12-08 11:07:38 +02:00
Tomasz Grabiec	b78d98a358	tests: perf_fast_forward: Fix result_collector::add() for multi-element results The results vector should be populated vertically, not horizontally. Responsible for assertion failure with --cache-enabled: void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed. Introduced in `3fc78a25bf`. Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>	2018-12-07 12:44:32 +00:00
Tomasz Grabiec	10cde9ae50	tests: perf_fast_forward: Fix live_range not being initialized Broken in `470552b7ab` Causes test failure when running with --cache-enabled Message-Id: <1544105835-24530-1-git-send-email-tgrabiec@scylladb.com>	2018-12-07 12:38:01 +00:00
Tomasz Grabiec	bb24d378b2	Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir This patchset fixes several remaining issues found during thorough testing of SSTables 3.x statistics and enriches ~30 unit tests with statistics validation against Cassandra-generated golden copies. * https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1: sstables: Enforce estimated_partitions in generate_summary() to be always positive. sstables: Don't enforce default max_local_deletion_time value for 'mc' files. sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. sstables: Collect statistics when writing RT markers to SSTables 3.x. tests: Return sstable_assertions from validate_read() helper. tests: Introduce helper for validating stats metadata in SSTables 3.x tests. tests: Add stats metadata validation to test_write_static_row. tests: Add stats metadata validation to test_write_composite_partition_key. tests: Add stats metadata validation to test_write_composite_clustering_key. tests: Add stats metadata validation to test_write_wide_partitions. tests: Add stats metadata validation to write_ttled_row tests: Add stats metadata validation to write_ttled_column tests: Add stats metadata validation to write_deleted_column tests: Add stats metadata validation to write_deleted_row tests: Add stats metadata validation to write_collection_wide_update tests: Add stats metadata validation to write_collection_incremental_update tests: Add stats metadata validation to write_multiple_partitions tests: Add stats metadata validation to write_multiple_rows tests: Add stats metadata validation to write_missing_columns_large_set tests: Add stats metadata validation to write_different_types tests: Add stats metadata validation to write_empty_clustering_values tests: Add stats metadata validation to write_large_clustering_key tests: Add stats metadata validation to write_compact_table tests: Add stats metadata validation to write_user_defined_type_table tests: Add stats metadata validation to write_simple_range_tombstone tests: Add stats metadata validation to write_adjacent_range_tombstones tests: Add stats metadata validation to write_non_adjacent_range_tombstones tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows tests: Add stats metadata validation to write_range_tombstone_same_start_with_row tests: Add stats metadata validation to write_range_tombstone_same_end_with_row tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.	2018-12-07 12:05:55 +01:00
Vladimir Krivopalov	98ae39f920	tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	dcd639b4d5	tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	d07ab3b3ef	tests: Add stats metadata validation to write_range_tombstone_same_end_with_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	b856cf837e	tests: Add stats metadata validation to write_range_tombstone_same_start_with_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	ba24572fb6	tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	4167c9e51d	tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	fd1c9b84c6	tests: Add stats metadata validation to write_non_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	1a6d613654	tests: Add stats metadata validation to write_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	57d2d1a1c6	tests: Add stats metadata validation to write_simple_range_tombstone Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	bc5d5633dc	tests: Add stats metadata validation to write_user_defined_type_table Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	d9f2829ca0	tests: Add stats metadata validation to write_compact_table Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	3a1e287c6a	tests: Add stats metadata validation to write_large_clustering_key Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	722fc7222a	tests: Add stats metadata validation to write_empty_clustering_values Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	1367243b7e	tests: Add stats metadata validation to write_different_types Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	12b10c0cca	tests: Add stats metadata validation to write_missing_columns_large_set Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	c990c518fc	tests: Add stats metadata validation to write_multiple_rows Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	9bb46f7cc6	tests: Add stats metadata validation to write_multiple_partitions Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	99d3cbd2fc	tests: Add stats metadata validation to write_collection_incremental_update Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	0118b15c06	tests: Add stats metadata validation to write_collection_wide_update Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	85782ed729	tests: Add stats metadata validation to write_deleted_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	66913adcc6	tests: Add stats metadata validation to write_deleted_column Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	997101f105	tests: Add stats metadata validation to write_ttled_column Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	a018388049	tests: Add stats metadata validation to write_ttled_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	260dfb3492	tests: Add stats metadata validation to test_write_wide_partitions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	349a73c464	tests: Add stats metadata validation to test_write_composite_clustering_key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	4f14e65d70	tests: Add stats metadata validation to test_write_composite_partition_key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	a7b85e8009	tests: Add stats metadata validation to test_write_static_row. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	ccb2dec22b	tests: Introduce helper for validating stats metadata in SSTables 3.x tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	5f6240cd7d	tests: Return sstable_assertions from validate_read() helper. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	cc12449646	sstables: Collect statistics when writing RT markers to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	2e5c221865	sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Rafael Ávila de Espíndola	298873d33b	Add a test with mismatched schema. The sstable in the test is fine, but the schema thinks a static column is regular.	2018-12-06 15:38:01 -08:00
Rafael Ávila de Espíndola	d392bc4924	Add a broken sstable test. This sstable has a static column with clustering information.	2018-12-06 15:23:33 -08:00
Raphael S. Carvalho	1ddbbe51e6	tests: add test for compaction-strategy-aware major compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:37:16 -02:00
Raphael S. Carvalho	525ee18560	compaction: implement major compaction heuristic for leveled strategy Major compaction for leveled strategy will now create a run of non-overlapping sstables at the highest level. Until now, a single sstable would be created at level 0 which was very suboptimal because all data would need to climb up the levels again, making it a very expensive I/O process. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:31 -02:00
Raphael S. Carvalho	3d9566e40d	compaction: introduce notion of compaction-strategy-aware major compaction This is only the very first step, which introduces the machinery for making major compaction aware of all strategies. For the time being, the default implementation, which only suits size-tiered, is used for all of them. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:30 -02:00
Vladimir Krivopalov	d2dfa2e15d	sstables: Don't enforce default max_local_deletion_time value for 'mc' files. Commit `cc6c383249` fixed an issue with incorrectly tracking max_local_deletion_time, and the check in validate_max_local_deletion_time was added to work around old files. This fix relaxes the conditions for enforcing the default max_local_deletion_time so that they don't apply to SSTables in 'mc' format, because the original problem was resolved before the 'mc' format was introduced. This is needed to be able to read correct values from Cassandra-generated SSTables that don't have a Scylla.db component, since its presence or absence is used as an indicator of possibly affected files. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Vladimir Krivopalov	0b1e6427ad	sstables: Enforce estimated_partitions in generate_summary() to be always positive. For tiny index files (< 8 bytes long) it could turn to zero and trigger an assertion in prepare_summary(). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Raphael S. Carvalho	ffb00d2118	storage_service: remove outdated comment on ongoing compaction interrupt After commit `5e953b5e47`, compaction manager will forcefully stop ongoing compactions instead of waiting for them to finish. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181206142600.21354-1-raphaelsc@scylladb.com>	2018-12-06 15:43:42 +01:00
Tomasz Grabiec	6012a63660	Merge "Fix window during init where waiting for a feature can be ignored" from Avi storage_service keeps a bunch of "feature" variables, indicating cluster-wide supported features, and has the ability to wait until the entire cluster supports a given feature. The propagation of features depends on gossip, but gossip is initialized after storage_service, so the current code late-initializes the features. However, that means that whoever waits on a feature between storage_service initialization and gossip initialization loses their wait entry. In #3952, we have proof that this in fact happens. Fix this by removing the circular dependency. We now store features in a new service, feature_service, that is started before both gossip and storage_service. Gossip updates feature_service while storage_service reads from it. Fixes #3953. * https://github.com/avikivity/3953/v4.1: storage_service: deinline enable_all_features() gossiper: keep features registered tests/gossip: switch to seastar::thread storage_service: deinline init/deinit functions gossiper: split feature storage into a new feature_service gossiper: maybe enable features after start_gossiping() storage_service: fix gap when feature::when_enabled() doesn't work	2018-12-06 15:42:26 +01:00
Avi Kivity	33a0366ed8	storage_service: fix gap when feature::when_enabled() doesn't work storage_service::register_features() reassigns to feature variables in storage_service. This means that any call to feature::when_enabled() will be orphaned when the feature is assigned. Now that feature lifetimes are not tied to gossip, we can move the feature initialization to the constructor and eliminate the gap. When gossip is started it will evaluate application_states and enable features that the cluster agrees on.	2018-12-06 16:31:05 +02:00
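The "orphaned wait" problem in 33a0366ed8 follows from reassigning an object that holds waiters. Below is a minimal synchronous sketch. The `feature` struct is a hypothetical stand-in: it uses plain callbacks where the real code waits on futures, and `SOME_FEATURE` is an illustrative name.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, synchronous stand-in for a cluster feature.
struct feature {
    std::string name;
    bool enabled = false;
    std::vector<std::function<void()>> waiters;   // registered via when_enabled()

    void when_enabled(std::function<void()> cb) {
        if (enabled) {
            cb();
        } else {
            waiters.push_back(std::move(cb));
        }
    }

    void enable() {
        enabled = true;
        auto pending = std::move(waiters);   // detach before running callbacks
        waiters.clear();
        for (auto& cb : pending) {
            cb();
        }
    }
};
```

If the variable is reassigned after a waiter registers (as late initialization in register_features() did), the waiter list is replaced and the waiter never fires. Constructing the feature once, before anyone can wait on it, closes the gap.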
Avi Kivity	587fd9b6c0	gossiper: maybe enable features after start_gossiping() Since we may now start with features already registered, we need to enable features immediately after gossip is started. This case happens in a cluster that already is fully upgraded on startup. Before this series, features were only added after this point.	2018-12-06 16:31:04 +02:00
Avi Kivity	4e553b692e	gossiper: split feature storage into a new feature_service Feature lifetime is tied to storage_service lifetime, but features are now managed by gossip. To avoid a circular dependency, add a new feature_service service to manage feature lifetime. To work around the problem, the current code re-initializes features after gossip is initialized. This patch does not fix this problem; it only makes it possible to solve it by untying features from gossip.	2018-12-06 16:31:04 +02:00
Avi Kivity	9b476fc377	storage_service: deinline init/deinit functions Reduces #include dependencies later on.	2018-12-06 16:31:04 +02:00
Avi Kivity	db72a7e8bd	tests/gossip: switch to seastar::thread Much simpler to manage the long initialization chain.	2018-12-06 16:31:04 +02:00
Avi Kivity	1215512e98	gossiper: keep features registered Gossiper unregisters enabled features as an optimization. However that makes decoupling features from gossiper harder. Disable this optimization; since the number of features is small and normal access is to a single feature at a time, there is no significant performance or memory loss.	2018-12-06 16:31:04 +02:00
Paweł Dziepak	9024187222	partition_slice: use small_vector for column_ids	2018-12-06 14:21:04 +00:00
Paweł Dziepak	a014367c5b	mutation_fragment_merger: use small_vector	2018-12-06 14:21:04 +00:00
Paweł Dziepak	142c4a9d84	auth: use small_vector in resource	2018-12-06 14:21:04 +00:00
Paweł Dziepak	edbcac85cb	auth: avoid list-initialisation of vectors List-initialisation forces often completely unnecessary copies of the elements.	2018-12-06 14:21:04 +00:00
Paweł Dziepak	890a5ba8ac	idl: serialiser: add serialiser for utils::small_vector	2018-12-06 14:21:04 +00:00
Paweł Dziepak	abb4953209	idl: serialiser: deduplicate vector serialisers In Scylla we have three implementations of vector-like structures: std::vector, utils::chunked_vector and utils::small_vector. Which one is used is largely an implementation detail, and all should be serialised by the IDL infrastructure in exactly the same way. To make sure that this is indeed the case, let's make them share the serialiser implementation.	2018-12-06 14:21:04 +00:00
Paweł Dziepak	23d19d21bd	utils: introduce small_vector small_vector is a variation of std::vector<> that reserves a configurable amount of storage internally, without the need for memory allocation. This can bring measurable gains if the expected number of elements is small. The drawback is that moving such small_vector is more expensive and invalidates iterators as well as references which disqualifies it in some cases.	2018-12-06 14:21:04 +00:00
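The small-buffer idea behind utils::small_vector can be sketched as follows. This is a minimal sketch, not the real utils::small_vector. To keep it short, it assumes trivially copyable T, and it deletes copy and move instead of implementing them. Deleting them reflects the drawback the commit mentions: a move must copy the inline elements and fix up the data pointer, which invalidates iterators and references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

template <typename T, std::size_t N>
class small_vector {
    static_assert(N > 0, "inline capacity must be positive");
    static_assert(std::is_trivially_copyable_v<T>, "sketch only supports trivially copyable T");

    T _inline[N];                  // internal storage: no allocation up to N elements
    std::unique_ptr<T[]> _heap;    // used once size exceeds N
    T* _data = _inline;            // points at _inline or at _heap
    std::size_t _size = 0;
    std::size_t _capacity = N;

public:
    small_vector() = default;
    small_vector(const small_vector&) = delete;             // would need to re-point _data
    small_vector& operator=(const small_vector&) = delete;

    void push_back(const T& v) {
        T copy = v;                // v may alias our own storage, which grow() frees
        if (_size == _capacity) {
            grow();
        }
        _data[_size++] = copy;
    }
    std::size_t size() const { return _size; }
    bool uses_inline_storage() const { return _data == _inline; }
    T& operator[](std::size_t i) { assert(i < _size); return _data[i]; }
    const T& operator[](std::size_t i) const { assert(i < _size); return _data[i]; }

private:
    void grow() {
        std::size_t new_cap = _capacity * 2;
        auto buf = std::make_unique<T[]>(new_cap);
        std::memcpy(buf.get(), _data, _size * sizeof(T));   // copy before freeing old storage
        _heap = std::move(buf);
        _data = _heap.get();
        _capacity = new_cap;
    }
};
```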
Avi Kivity	21b4b2b9a1	Merge "Fix deadlocking multishard readers" from Botond " Multishard combining readers, running concurrently, with limited concurrency and no timeout may deadlock, due to inactive shard readers sitting on permits. To avoid this, we have to make sure that all shard readers belonging to a multishard combining reader that are not currently active can be evicted to free up their permits, ensuring that all readers can make progress. Making inactive shard readers evictable is the solution for this problem; however, the original series introducing this solution (`414b14a6bd`) did not go all the way and left some loose ends. These loose ends are tied up by this mini-series. Namely, two issues remained: * The last reader to reach EOS was not paused (made evictable). * Readers created/resumed as part of a read-ahead were not paused immediately after finishing the read-ahead. This series fixes both of these. Fixes: #3865 Tests: unit(release, debug) " * 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla: multishard_combining_reader: pause readers after reading ahead multishard_combining_reader: pause all EOS'd readers	2018-12-06 16:08:11 +02:00
Botond Dénes	ee193f1ab4	multishard_combining_reader: pause readers after reading ahead Readers created or resumed just to read ahead should be paused right after, to avoid consuming all available permits on the shards they operate on, causing a deadlock.	2018-12-06 13:20:30 +02:00
Avi Kivity	d4f353d3c8	Merge "normalized python3 compatibility, shebang and encoding" from Alexys " This series of patches ensures that all the Python code base is python3 compliant and consistent by applying the following logic: - python3 classifier on setup.py to explicitly state our python compatibility matrix - add UTF-8 encoding header - correct every shebang to the same /usr/bin/env python3 - shebang is only added on scripts meant to be executed on their own (removed otherwise) - migrate some leftover scripts from python2 to python3 with minimal QA This work is important to prepare for a more drastic change on Python code styling using the black formatter and the setting up of automated QA checks on Python code base. " * 'python3_everywhere' of https://github.com/numberly/scylla: scylla-housekeeping: fix python3 compat and shebang dist/ami/files/scylla_install_ami: python3 shebang dist/docker/redhat/docker-entrypoint.py: add encoding comment fix_system_distributed_tables.py: fix python3 compat and shebang gen_segmented_compress_params.py: add encoding comment idl-compiler.py: python3 shebang scylla-gdb.py: python3 shebang configure.py: python3 shebang tools/scyllatop/: add / normalize python3 shebang scripts/: add / normalize python3 shebang dist/common/scripts: add / normalize python3 shebang test.py: add encoding comment setup.py: add python3 classifiers	2018-12-06 12:16:57 +02:00
Avi Kivity	f073ea5f87	Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir " This patchset extends a number of existing tests to check SSTables statistics for 'mc' format and fixes an issue discovered with the help of one of the tests. Tests: unit {release} " * 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla: tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. tests: Run sstable_tombstone_histogram_test for all SSTables versions. tests: Run min_max_clustering_key_test on all SSTables versions. tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. tests: Run test_sstable_max_local_deletion_time on all SSTables versions. tests: Extend test checking tombstones histogram to cover all SSTables versions. sstables: Properly track row-level tombstones when writing SSTables 3.x. tests: Run min_max_clustering_key_test_2 for all SSTables versions. tests: Make reusable_sst() helper accept SSTables version parameter.	2018-12-06 11:44:33 +02:00
Botond Dénes	170fa382fa	multishard_combining_reader: pause all EOS'd readers Previously the last shard reader to reach EOS wasn't paused. This is a mistake and can contribute to causing deadlocks when the number of concurrently active readers on any shard is limited.	2018-12-06 10:30:43 +02:00
Vladimir Krivopalov	dd769f2b41	tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	a098387e9f	tests: Run sstable_tombstone_histogram_test for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	06a47fc9f9	tests: Run min_max_clustering_key_test on all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	c53afd7bba	tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	cfbde5b89c	tests: Run test_sstable_max_local_deletion_time on all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	9955710cac	tests: Extend test checking tombstones histogram to cover all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	cdae62ec29	sstables: Properly track row-level tombstones when writing SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	0f3fb32028	tests: Run min_max_clustering_key_test_2 for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	c474b0d851	tests: Make reusable_sst() helper accept SSTables version parameter. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Paweł Dziepak	504c586392	intrusive_set_external_comparator: make iterator nothrow move constructible	2018-12-05 20:07:29 +00:00
Paweł Dziepak	402902ac78	mutation_fragment_merger: value-initialise iterator ForwardIterators are default constructible, but they have to be value-initialised to compare equal to other value-initialised instances of that iterator.	2018-12-05 20:07:29 +00:00
Tomasz Grabiec	2c2d202354	tests: perf_fast_forward: Make output directory configurable Message-Id: <1544020034-16340-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	247347058c	tests: perf_fast_forward: Always print to stdout Otherwise errors cannot be made sense of, since errors are always reported to stdout. Without the test output we don't know what they refer to. This change makes the output always go to stdout, in addition to other reporters, if any. Message-Id: <1544020084-16492-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Yibo Cai (Arm Technology China)	6fadba56cc	utils: optimize UTF-8 validation UTF-8 strings are currently validated by boost::locale::conv::utf_to_utf, which actually performs a string conversion, more work than necessary. As observed on an Arm server, UTF-8 validation can become a bottleneck under heavy load. This patch introduces a brand new SIMD implementation supporting both NEON and SSE, as well as a naive approach to handle short strings. The naive approach is 3x faster than boost utf_to_utf, whilst the SIMD method outperforms the naive approach by 3x to 5x on Arm and x86. Details at https://github.com/cyb70289/utf8/. A UTF-8 unit test is added to check various corner cases. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	3e70ae1d06	Merge "Improve times to start / stop the nodes" from Glauber If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog, and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares given to memtable flushes decreases as memory usage decreases, and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times, and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot, flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are, at this point, running alone, we improve the user experience by making sure the startup is as fast as it can be. There is a similar problem at the drain level, which is also fixed in this series. Fixes #3958 * git@github.com:glommer/scylla.git faster-restart compaction_manager: delay initialization of the compaction manager. drain: stop compactions early	2018-12-05 21:51:01 +02:00
Asias He	eeeb2da7bb	gossip: Fix race in real_mark_alive and shutdown msg In dtest, we have self.check_rows_on_node(node1, 2000) self.check_rows_on_node(node2, 2000) which introduce the following cluster operations: 1) Initially: - node1 up - node2 up 2) self.check_rows_on_node(node1, 2000) - node2 down - node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots up to mark node1 up) 3) self.check_rows_on_node(node2, 2000) - node1 down (B: node1 will send shutdown gossip message to node2, node2 will mark node1 down) - node1 up (C: when node1 is up, node2 will call gossiper::real_mark_alive) Since there is no guarantee about the order of Operation A and Operation B, it is possible that node2 will mark node1 as status=shutdown and also mark node1 as UP. In Operation C, node2 will call gossiper::real_mark_alive to mark node1 up, but since node2 might think node1 is already up, node2 will exit early in gossiper::real_mark_alive and will not log "InetAddress 127.0.0.1 is now UP, status={}" As a result, dtest fails to see node2 report that node1 is up when it boots node1, and fails the test. TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP'] In the log we can see node1 marked as DOWN and UP almost at the same time on node2: INFO 2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown INFO 2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown Fixes #3940 Tests: dtest with 20 consecutive successful runs Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	edbef7400b	configure.py: Always add a rule for building gen_crc_combine_table Fixes a build failure when only the scylla binary was selected for building like this: ./configure.py --with scylla In this case the rule for gen_crc_combine_table was missing, but it is needed to build crc_combine_table.o Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Botond Dénes	77dbc7d09a	querier: fix evict_one() and evict_all_for_table() Both of these have the same problem. They remove the to-be-evicted entries from `_entries` but they don't unregister the `entry` from the `reader_concurrency_semaphore`. This leaves the `reader_concurrency_semaphore` with dangling pointers to the entries, which will trigger a segfault when it tries to evict the associated inactive reads. Also add a unit test for `evict_all_for_table()` to check that it works properly (`evict_one()` is only used in tests, so no dedicated test for it). Fixes: #3962 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com>	2018-12-05 21:51:01 +02:00
Avi Kivity	0be554c337	storage_service: deinline enable_all_features() Next commit wants to make it depend on config, which is best done out-of-line.	2018-12-05 17:30:42 +02:00
Asias He	a5d8b66f2c	gossip: Make favor newly added node log debug level It is not very useful for users to know this. Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>	2018-12-05 10:45:03 +02:00
Avi Kivity	b0cb69ec25	Merge "Make sstable reader fail on unknown column names in MC format" from Piotr " Previously the reader just ignored such columns, but this creates a risk of data loss. Refs #2598 " * 'haaawk/2598/v3' of github.com:scylladb/seastar-dev: sstables: Add test_sstable_reader_on_unknown_column sstables: Exception on sstable's column not present in schema sstables: store column name in column_translation::column_info sstables: Make test_dropped_column_handling test dropped columns	2018-12-05 10:43:29 +02:00
Takuya ASADA	9388f3d626	reloc: drop --jobs from build_deb.sh/build_rpm.sh scripts Since we merged the relocatable package, build_deb.sh/build_rpm.sh only do packaging using the prebuilt binary taken from the relocatable package and won't compile anything. So passing the --jobs option to build_deb.sh/build_rpm.sh has become meaningless, and we can drop it. Note that we can still specify the --jobs option on reloc/build_reloc.sh; it runs "ninja-build -jN" to compile Scylla, then generates the relocatable package. See #3956 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181204205652.25138-1-syuu@scylladb.com>	2018-12-04 21:00:51 +00:00
Glauber Costa	0b7818d2b9	drain: stop compactions early drain suffers from the same problem that startup currently suffers from: memtables are flushed as part of the drain routine, and because there are no incoming writes, the shares the controller assigns to flushes go down over time, slowing down the drain process. This patch reorders things so that we stop compactions first and flush later. It guarantees that when flushes do happen they will have the full bandwidth to work with. There is a comment in the code saying we should stop compactions forcefully instead of waiting for them to finish. I consider this orthogonal to this patch, so I am not touching it here. Doing so will make the drain operation even faster, but it can be done later. Even when we do it, having the flushes proceed alone instead of during compactions will make it faster. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:55:59 -05:00
Glauber Costa	fee4d2eb9b	compaction_manager: delay initialization of the compaction manager. If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog, and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares given to memtable flushes decreases as memory usage decreases, and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times, and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot, flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are, at this point, running alone, we improve the user experience by making sure the startup is as fast as it can be. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:48:42 -05:00
Tomasz Grabiec	b8c405c019	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back	2018-12-04 19:47:28 +01:00
Tomasz Grabiec	9a4c00beb7	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>	2018-12-04 18:17:27 +00:00
Piotr Jastrzebski	fed3b51abe	Add test_all_data_is_read_back This tests that a source after being populated with a mutation returns exactly the same mutation when read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 11:42:08 +01:00
Piotr Sarna	7b0a3fbf8a	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>	2018-12-04 10:41:09 +00:00
Piotr Jastrzebski	75b99838fc	Fix use_row_ttl condition Previous condition was wrong and was using row ttl too often. We also have to change test_dead_row_marker to compare resulting sstable with sstable generated by Origin not by sstableupgrade. This is because sstableupgrade transmits information about deleted row marker automatically to cells in that row. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 10:51:36 +01:00
Avi Kivity	c3e664eec2	Merge "Improve corrupt sstable reporting" from Rafael " This is a small step in fixing issue #2347. It is mostly tests and testing infrastructure, but it does include a fix for a case where we were missing the filename in the malformed_sstable_exception. " * 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla: Add a filename to a malformed_sstable_exception. Try to read the full sst in broken_sst. Convert tests to SEASTAR_THREAD_TEST_CASE. Check the exception message. Move some tests to broken_sstable_test.cc	2018-12-04 10:32:10 +02:00
Avi Kivity	414b14a6bd	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in the performance of multi-partition range scans. Namely, that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them are running concurrently (thrashing). * May deadlock if enough of them are running without a timeout. The solution to these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will no longer cling onto their permits for the entire duration of the scan, which might be a long time. * Will be less affected by infinite concurrency (more than the node can handle), as each scan can now make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrency_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini-series and send them separately, if the maintainers so prefer, although considering that this series will have to be backported to 3.0, I think the present form is better. Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ...	2018-12-04 10:22:35 +02:00
Botond Dénes	9de4f3a834	tests/multishard_mutation_query_test: test stateless query too In `test_read_all`, also do a stateless read to ensure that path works correctly too.	2018-12-04 08:51:05 +02:00
Botond Dénes	6676ceba7f	tests/querier_cache: fail resource-based eviction test gracefully Currently, when this test fails, resources are not released in the correct order, which results in ASAN complaining about use-after-free in debug builds. This is due to the BOOST_REQUIRE macro aborting the test when the predicate fails, not allowing the correct destruction order to take place. To avoid this ugly failure, which adds noise and might send a developer investigating the failure down the wrong path, use the milder BOOST_CHECK family of test macros. These will allow the test to run to completion even when the predicate fails, allowing for the correct destruction of the resources.	2018-12-04 08:51:05 +02:00
Botond Dénes	93e41397f7	tests/querier_cache: simplify resource-based eviction test Now that we have an accessor for all concurrency semaphores, we don't need the tricks of creating a dummy keyspace to get them. Use the accessors instead.	2018-12-04 08:51:05 +02:00
Botond Dénes	dcd2d116a3	tests/mutation_reader_test: add test_multishard_combining_reader_next_partition Test the interaction of the multishard reader with the foreign reader w.r.t. next_partition(). next_partition() is a special operation, as its execution is deferred until the next cross-shard operation. Give it some extra stress-testing.	2018-12-04 08:51:05 +02:00
Botond Dénes	20e994e526	tests/mutation_reader_test: restore indentation Left over from the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	a577ff97e9	tests/mutation_reader_test: enrich pause-related multishard reader test Enrich the existing test_multishard_combining_reader_as_mutation_source test case with delaying the pause/resume and eviction of paused readers.	2018-12-04 08:51:05 +02:00
Botond Dénes	22b14d593b	multishard_combining_reader: use pause-resume API Refactor the multishard combining reader to make use of the new pause-resume API to pause inactive shard readers. Make the pause-resume API mandatory to implement, as by now all existing clients have adopted it.	2018-12-04 08:51:05 +02:00
Botond Dénes	77b758707c	query::partition_slice: add clear_ranges() method Allows for clearing any custom partition ranges, effectively resetting them to the default ones. Useful for code that needs to set several different specific partition ranges, one after the other, but doesn't want to remember the last key it set a range for to be able to clear the previous range with `clear_range()`.	2018-12-04 08:51:05 +02:00
Botond Dénes	a594fd39ce	position_in_partition: add region() accessor	2018-12-04 08:51:05 +02:00
Botond Dénes	9601d23e0d	foreign_reader: add pause-resume API Allows pausing the reader and resuming it later. Pausing the reader waits on the ongoing read-ahead (if any), executes any pending `next_partition()` calls and then detaches the shard reader's buffer. The paused shard reader is returned to the client. Resuming the reader consists of getting the previously detached reader back, or one that has the same position as the old reader had. This API allows for making the inactive shard readers of the `multishard_combining_reader` evictable. The API is private; it's only accessible to classes knowing the full definition of the `foreign_reader` (which resides in a .cc file).	2018-12-04 08:51:05 +02:00
Botond Dénes	a12fae366d	tests/mutation_reader_test: implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	f334d3717f	query_mutations_on_all_shards(): implement pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	72ed655ef0	make_multishard_streaming_reader(): implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	bf0d1f4eea	database: add accessors for user and streaming concurrency semaphores These will soon be needed to register inactive user and streaming reads with the respective semaphores.	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the multishard reader to pause inactive shard readers and later resume them when they are needed again. This allows these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementations for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	6f0e0c4ed7	query_mutations_on_all_shards(): restore indentation The previous patch added half-aligned lines to improve readability of that patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	aa6083a75b	query_mutations_on_all_shards(): simplify the state-machine The `read_context`, which handles creating, saving and looking up the shard readers, has to deal with its `destroy_reader()` method being called at any time, even before some other method has finished its work. For example, it is valid for a reader to be requested to be destroyed even before the context finishes creating it. This means that state transitions that take time can be interleaved with another state transition request. To deal with this, the read context uses `future_` states, states that mark an ongoing state transition. This allows state transition requests that arrive in the middle of another state transition to be attached as a continuation to the ongoing transition, and to be executed after it finishes. This, however, resulted in complex code that has to handle readers being in all sorts of different states when the `save_readers()` method is called. To avoid all this complexity, exploit the fact that `destroy_reader()` receives a future<> as its argument, which resolves when all previous state transitions have finished. Use a gate to wait for all these futures to resolve. This way we don't need all those transitional states; instead, in `save_readers()` we only need to wait for the gate to close. Thus the number of states `save_readers()` has to consider drops drastically. This has the theoretical drawback that the process of saving the readers has to wait for each of the readers to stop, but in practice the process finishes when the last reader is saved anyway, so I don't expect this to result in any slowdown.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	0a616c899e	multishard_combining_reader: add reader lifecycle policy Currently `multishard_combining_reader` takes two functors, one for creating the readers and optionally one for destroying them. A bag of functors (`std::function`), however, makes for a terrible interface, and as we are about to add some more customization points, it's time to use something more formal: policy-based design, a well-known design pattern. As well as merging the job of the two functors into a single policy class, also widen the area of responsibility of the policy to include keeping alive any resource the shard readers might need on their home shard. Implementing a proper reader cleanup is now not optional either. This patch only adds the `reader_managing_policy` interface; refactoring the multishard reader to use it will be done in the next patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	301abaca07	multishard_combining_reader: drop unnecessary `reader_promise` member The `reader_promise` member of the `shard_reader` was used to synchronize a foreground request to create the underlying reader with an ongoing background request with the same goal. This is, however, unnecessary. The underlying reader is created in the background only as part of a read-ahead. In this case there is no need for an extra synchronization point; the foreground reader-create request can just wait for the read-ahead to finish, for which a means already exists. Furthermore, foreground reader-create requests are always followed by a `fill_buffer()` request, so by waiting on the read-ahead we ensure that the following `fill_buffer()` call will not block.	2018-12-04 08:51:05 +02:00
Botond Dénes	a73175fdbc	multishard_combining_reader: drop tracking of pending next_partition calls Shard readers used to track pending `next_partition()` calls that they couldn't execute because their underlying reader wasn't created yet. These pending calls were then executed after the reader was created. However, the only situation where a shard reader can receive a `next_partition()` call before its underlying reader is created is when `next_partition()` is called on the multishard reader before a single fragment is read. In this case we know we are at a partition boundary, so the call has no effect and it is safe to ignore it.	2018-12-04 08:51:05 +02:00
Botond Dénes	ab3e639c3b	foreign_reader: use bool for pending_next_partition Foreign reader doesn't execute `next_partition()` calls straight away, when this would require interaction with the remote reader. Instead these calls are "remembered" and executed on the next occasion the foreign reader has to interact with the remote reader. This was implemented with a counter that counts the number of pending `next_partition()` calls. However when `next_partition()` is called multiple times, without interleaving calls to `operator()()` or `fast_forward_to()`, only the first such call has effect. Thus it doesn't make sense to count these calls, it is enough to just set a flag if there was at least one such call.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	b36733971b	mutation_source_test: add option to skip intra-partition fast-forwarding tests To allow for using this test suite for testing mutation sources that don't support intra-partition fast-forwarding.	2018-12-04 08:51:05 +02:00
Botond Dénes	37f0117747	reader_concurrency_semaphore: refactor eviction mechanism As we are about to add multiple sources of evictable readers, we need a more scalable solution than passing a single functor that opaquely evicts a reader when called. Add a generic way to register and unregister evictable (inactive) readers with the semaphore. Readers are expected to be registered when they become evictable and unregistered when they cease to be evictable. The semaphore may evict any reader registered with it whenever it sees fit. This also solves the problem of notifying the semaphore when new readers become evictable. Previously there was no such mechanism, and the semaphore would only evict such new readers when a new permit was requested from it.	2018-12-04 08:51:00 +02:00
Rafael Ávila de Espíndola	21199a7a5c	Add a filename to a malformed_sstable_exception. It is reasonable for parse() to throw when it finds something wrong with the format. This seems to be the best spot to add the filename and rethrow. Also add a testcase to make sure we keep handling this error gracefully. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola	a6e25e4bd0	Try to read the full sst in broken_sst. With this patch we use data_consume_rows to try to read the entire sstable. The patch also adds a test with a particular corruption that would not be found without parsing the file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:47:49 -08:00
Rafael Ávila de Espíndola	b1190c58ec	Convert tests to SEASTAR_THREAD_TEST_CASE. This will simplify future changes to broken_sst. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:26:06 -08:00
Rafael Ávila de Espíndola	e5c5afffc9	Check the exception message. This makes the tests a bit more strict by also checking the message returned by the what() function. This shows that some of the tests are out of sync with which errors they check for. I will hopefully fix this in another pass. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:31:53 -08:00
Rafael Ávila de Espíndola	f9d81bcd43	Move some tests to broken_sstable_test.cc sstable_test.cc was already a bit too big and there is potential for having a lot of tests about broken sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:16:30 -08:00
Rafael Ávila de Espíndola	cf4dc38259	Simplify state machine loop. These loops have the structure: while (true) { switch (state) { case state1: ... break; case state2: if (...) { ... break; } else {... continue; } ... } break; } There are a couple of things I find a bit odd about that structure: * The break refers to the switch, the continue to the loop. * A while (true) loop always hits a break or a continue. This patch uses early returns to simplify the logic to while (true) { switch (state) { case state1: ... return; case state2: if (...) { ... return; } ... } } Now there are no breaks or continues. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181126171726.84629-1-espindola@scylladb.com>	2018-12-03 20:34:03 +01:00
Avi Kivity	b098b5b987	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-checksum the buffer using crc32(), but that is still a substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib's. It also utilizes intrinsics for the carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and a second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz. Performance was also evaluated using the sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded a 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh	2018-12-03 19:02:01 +02:00
Tomasz Grabiec	aa19f98d18	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in an unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>	2018-12-03 16:40:24 +02:00
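A minimal sketch of the ordering fix described above: copy the entries out of the `std::unordered_map` and sort them by key before serializing. The enum name, its enumerators, and their numeric values are hypothetical stand-ins for the Statistics.db metadata types; only the "sort by metadata_type" rule comes from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Statistics.db metadata component types.
// The values are illustrative; what matters is that they define an order.
enum class metadata_type : uint32_t { validation = 0, compaction = 1, stats = 2, serialization = 3 };

// Returns the (type, offset) entries sorted by metadata_type, so the output
// no longer depends on the unordered_map's iteration order.
std::vector<std::pair<metadata_type, uint32_t>>
sorted_offset_map(const std::unordered_map<metadata_type, uint32_t>& offsets) {
    std::vector<std::pair<metadata_type, uint32_t>> entries(offsets.begin(), offsets.end());
    std::sort(entries.begin(), entries.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return entries;
}
```

Sorting at write time keeps the in-memory container unchanged; switching the container to an ordered `std::map` would be an equally valid design.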
Avi Kivity	4dc402b53f	Merge "Create sstable in a sub-directory" from Benny " Due to an XFS heuristic, if all files are in one (or a few) directories, then block allocation can become very slow. This is because XFS divides the disk into a few allocation groups (AGs), and each directory allocates preferentially from a single AG. That AG can become filled long before the disk is full. This patchset works around the problem by: - creating sstable component files in their own temporary, per-sstable sub-directory, - moving the files back into the canonical location right after being created, and finally - removing the temp sub-directory when the sstable is sealed. - In addition, any temporary sub-directories that might have been left over if scylla crashed while creating sstables are looked up and removed when populating the table. Fixes: #3167 Tests: unit (release) " * 'issues/3167/v7' of https://github.com/bhalevy/scylla: distributed_loader::populate_column_family: lookup and remove temp sstable directories database: directly use std::experimental::filesystem::path for lister::path database: use std::experimental::filesystem::path for lister::path sstable: use std::experimental::filesystem rather than boost sstable::seal_sstable: fixup indentation sstable: create sstable component files in a subdirectory sstable::new_sstable_component_file: pass component_type rather than filename sstable: cleanup filename related functions sstable: make write_crc, write_digest, and new_sstable_component_file private methods	2018-12-03 16:26:12 +02:00
Tomasz Grabiec	feefb23232	tests: perf_checksum: Test fast_crc32_combine()	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	dda0f9b6eb	tests: Rename libdeflate_test to checksum_utils_test	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	7febdb5a5c	tests: libdeflate: Add more tests for checksum_combine()	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	b22ed75416	tests: libdeflate: Check both libdeflate and default checksummers	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	1eb03b6ff1	sstables: Use fast_crc_combine() in the default checksummer	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	1fb792c547	utils/gz: Add fast implementation of crc32_combine() zlib's crc32_combine() is not very efficient. It is faster to re-checksum the buffer using crc32(), but that is still a substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib's. It also utilizes intrinsics for the carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and a second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz. Performance was also evaluated using the sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded a 9% improvement in median frag/s (129'055 vs 117'977).	2018-12-03 14:40:35 +01:00
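Any CRC32 combine relies on one identity: crc(A‖B) = (crc(A) · x^(8·len(B)) mod P) ⊕ crc(B), because feeding len(B) zero bytes through the CRC register is multiplication by x^(8·len(B)) modulo the CRC polynomial P. The sketch below is a portable illustration of that identity using square-and-multiply over GF(2), taking O(log len(B)) polynomial multiplications instead of touching B's bytes. It is not Scylla's `fast_crc32_combine()`, which according to the commits above uses build-time polynomial tables, carry-less multiplication, and Barrett reduction. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Bit-reflected CRC-32 polynomial (the zlib / IEEE 802.3 CRC).
constexpr uint32_t kPoly = 0xEDB88320u;

// Reference bitwise CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
uint32_t crc32_bitwise(std::string_view data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : data) {
        crc ^= c;
        for (int i = 0; i < 8; ++i) {
            crc = (crc & 1u) ? (crc >> 1) ^ kPoly : crc >> 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// a(x) * b(x) mod P(x) in reflected form (bit 31 holds the x^0 coefficient).
uint32_t multmodp(uint32_t a, uint32_t b) {
    assert(a != 0);  // the loop below terminates only for non-zero a
    uint32_t m = 1u << 31;
    uint32_t p = 0;
    for (;;) {
        if (a & m) {
            p ^= b;
            if ((a & (m - 1)) == 0) {
                break;  // no higher-degree terms of a remain
            }
        }
        m >>= 1;
        b = (b & 1u) ? (b >> 1) ^ kPoly : b >> 1;  // b = b * x mod P
    }
    return p;
}

// x^(8n) mod P by square-and-multiply; multiplying a CRC register by this
// value is equivalent to feeding it n zero bytes.
uint32_t x8n_mod_p(uint64_t n) {
    uint32_t result = 1u << 31;  // x^0
    uint32_t power = 1u << 23;   // x^8
    while (n != 0) {
        if (n & 1u) {
            result = multmodp(power, result);
        }
        power = multmodp(power, power);
        n >>= 1;
    }
    return result;
}

// crc32(A || B) computed from crc32(A), crc32(B) and len(B) only.
uint32_t crc32_combine_sketch(uint32_t crc_a, uint32_t crc_b, uint64_t len_b) {
    return multmodp(x8n_mod_p(len_b), crc_a) ^ crc_b;
}
```

Usage: `crc32_combine_sketch(crc32_bitwise("hello "), crc32_bitwise("world"), 5)` equals `crc32_bitwise("hello world")`. This sketch recomputes the powers of x on every call; precomputing them is what makes a production version fast.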
Tomasz Grabiec	cd3d9d357b	utils/gz: Add pre-computed polynomials gen_crc_combine_table.cc will be run during build to produce tables with precomputed polynomials (4 x 256 x u32). The definitions will reside in: build/<mode>/gen/utils/gz/crc_combine_table.cc It takes 20ms to generate on my machine. The purpose of those polynomials will be explained in crc_combine.cc	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	63e0da9e58	utils/gz: Import Barett reduction implementation from libdeflate	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	bb7d95d6c3	utils: Extract clmul() from crc.hh	2018-12-03 14:36:08 +01:00
Botond Dénes	0cb7c43fb5	reader_concurrency_semaphore: add dedicated .cc file As we are about to extend the functionality of the reader concurrency semaphore, adding more method implementations that need to go to a .cc file, it's time we create a dedicated file, instead of continuing to shove them into unrelated .cc files (mutation_reader.cc).	2018-12-03 13:37:02 +02:00
Avi Kivity	d6a22c50cb	Update libdeflate submodule * libdeflate 17ec6c9...e7e54ea (1): > build: improve out-of-tree build with multiple output trees	2018-12-03 11:18:02 +02:00
Botond Dénes	34c2d67614	reader_concurrency_semaphore: rearrange members Use the standard convention of the rest of the code base: type definitions first, then data members and finally member functions. As we are about to add more members, it's especially important to give the growing class a familiar member arrangement.	2018-12-03 08:26:10 +02:00
Benny Halevy	9e7125a9de	distributed_loader::populate_column_family: lookup and remove temp sstable directories These may be left over in case we crash while writing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	857ff4f59a	database: directly use std::experimental::filesystem::path for lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	585ac6e641	database: use std::experimental::filesystem::path for lister::path We would like to get rid of boost::filesystem and gradually replace it with std::experimental::filesystem. TODO: namespace fs = std::experimental::filesystem; use fs::path directly, rather than lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	0b74927757	sstable: use std::experimental::filesystem rather than boost Note: Requires linking with -lstdc++fs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	61d116a1f1	sstable::seal_sstable: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	90118fa9ef	sstable: create sstable component files in a subdirectory When writing the sstable, create a temporary directory for creating all components so that each sstable's files will be assigned to a different allocation group on XFS. Files are immediately renamed to their default location after creation. The temp directory is removed when the sstable is sealed. Additional work to be introduced in the following patches: - When populating tables, temp directories need to be looked up and removed. Fixes #3167 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
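The create/rename/seal lifecycle described above (plus the startup cleanup from the populate_column_family commit) can be sketched with `std::filesystem`. This is a minimal illustration under stated assumptions, not Scylla's code: the `.tmpdir` suffix, the `<name>-<component>` file naming, and all function names are hypothetical, and the sketch uses blocking I/O where Scylla uses Seastar's asynchronous file APIs.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical naming: components of sstable `name` are first created in
// "<dir>/<name>.tmpdir/" and then renamed into "<dir>/".
fs::path temp_dir_for(const fs::path& dir, const std::string& name) {
    return dir / (name + ".tmpdir");
}

// Create one component inside the per-sstable temp directory, so that block
// allocation is not tied to the single table directory, then move it to
// its canonical location right away.
fs::path create_component(const fs::path& dir, const std::string& name,
                          const std::string& component, const std::string& contents) {
    fs::path tmp = temp_dir_for(dir, name);
    fs::create_directories(tmp);
    fs::path tmp_file = tmp / (name + "-" + component);
    {
        std::ofstream out(tmp_file, std::ios::binary);
        out << contents;
    }  // close before renaming
    fs::path final_file = dir / (name + "-" + component);
    fs::rename(tmp_file, final_file);  // same filesystem, so this is a cheap rename
    return final_file;
}

// Sealing the sstable removes its (now empty) temp directory.
void seal(const fs::path& dir, const std::string& name) {
    fs::remove(temp_dir_for(dir, name));
}

// On startup, remove temp directories left behind by a crash mid-write.
void remove_leftover_temp_dirs(const fs::path& dir) {
    std::vector<fs::path> leftovers;  // collect first, then delete
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_directory() && entry.path().extension() == ".tmpdir") {
            leftovers.push_back(entry.path());
        }
    }
    for (const auto& p : leftovers) {
        fs::remove_all(p);  // may still contain partially written files
    }
}
```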
Benny Halevy	23d8afb20d	sstable::new_sstable_component_file: pass component_type rather than filename So we can create the file in the sstable directory and then move into the final location Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	7b170eb0dc	sstable: cleanup filename related functions - use const sstring& params rather than sstring - returning const sstring is superfluous Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	ad5f1e4fbb	sstable: make write_crc, write_digest, and new_sstable_component_file private methods Prepare for per-sstable sub directory. Also, these functions get most of their parameters from the sst at hand so they might as well be first class members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Avi Kivity	2a0a36d48b	tools: update toolchain to fedora-29-20181202 Added: git, sudo, python Message-Id: <20181202185608.14141-1-avi@scylladb.com>	2018-12-02 19:00:55 +00:00
Benny Halevy	d257e5c123	sstable: remove unused get_sstable_key_range Since `024c8ef8a1` db: adjust sstable load to use sstable self-reporting of shard ownership Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181202114523.14296-1-bhalevy@scylladb.com>	2018-12-02 18:32:34 +02:00
Avi Kivity	224c4c0b81	tools: add frozen toolchain support Add a reference to a docker image that contains an "official" toolchain for building Scylla. In addition, add a script that allows easy usage of the image, and some documentation. Message-Id: <20181202120829.21218-1-avi@scylladb.com>	2018-12-02 18:32:34 +02:00
Takuya ASADA	0fdf807f51	install-dependencies.sh: add missing packages to run build in Fedora container The git, python and sudo packages are installed by default on a normal Fedora installation but not in the Docker image, so we need to install them with this script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181201020834.24961-1-syuu@scylladb.com>	2018-12-02 12:51:29 +02:00
Avi Kivity	009cbd3dcb	Merge "Fix multiple summary regeneration bugs." from Vladimir " This patchset addresses two recently discovered bugs both triggered by summary regeneration: Tests: unit {release} + Validated with debug build of Scylla (ASAN) that no use-after-free occurs when re-generating Summary.db. " * 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla: tests: Add test reading SSTables in 'mc' format with missing summary. sstables: When loading, read statistics before summary. database: Capture io_priority_class by reference to avoid dangling ref.	2018-12-02 11:56:18 +02:00
Vladimir Krivopalov	d24875b736	tests: Add test reading SSTables in 'mc' format with missing summary. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 11:56:56 -08:00
Vladimir Krivopalov	b0e5404071	sstables: When loading, read statistics before summary. If the summary is missing and we attempt to regenerate it, the statistics must already have been read, because they provide the values stored in the serialization header that are needed to deserialize clustering prefixes. Fixes #3947 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 11:56:56 -08:00
Vladimir Krivopalov	68458148e7	database: Capture io_priority_class by reference to avoid dangling ref. The original reference points to a thread-local storage object that is guaranteed to outlive the continuation, but copying it makes the subsequent calls point to a local object and introduces a use-after-free bug. Fixes #3948 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 10:43:36 -08:00
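A minimal sketch of the lifetime issue described above, with hypothetical types (`io_priority_class` here is a two-line stand-in, and `deferred_read`/`schedule_read` are illustrative names, not Scylla's code). Capturing the reference by reference keeps later uses pointing at the long-lived thread-local object; capturing by value makes a copy inside the lambda, and any reference taken to that copy dangles once the lambda is destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for an I/O priority class.
struct io_priority_class {
    unsigned id;
};

// Thread-local default: it lives as long as the thread, so it outlives any
// continuation that refers to it.
thread_local io_priority_class default_priority{7};

// A deferred operation that keeps a reference and only uses it later.
struct deferred_read {
    const io_priority_class& pc;
    unsigned priority() const { return pc.id; }
};

deferred_read schedule_read(const io_priority_class& pc) {
    // Correct: capture by reference, so the deferred read refers to the
    // long-lived thread-local object.
    auto continuation = [&pc] { return deferred_read{pc}; };
    // Buggy variant (do not use): `[pc] { return deferred_read{pc}; }`
    // copies pc into the lambda, so the returned reference points at the
    // lambda's member and dangles once `continuation` is destroyed.
    return continuation();
}
```

The same lifetime reasoning applies to any lambda that is stored and run later: a by-value capture owns a copy whose lifetime ends with the lambda.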
Piotr Jastrzebski	329303cae7	sstables: Add test_sstable_reader_on_unknown_column This test checks that sstable reader throws an exception when sstable contains a column that's not present in the schema. It also checks that dropped columns do not cause exceptions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 10:29:47 +01:00
Piotr Jastrzebski	5cc3f904ce	sstables: Exception on sstable's column not present in schema Previously such a column was ignored, but it's better to be explicit about this situation. Refs #2598 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:13 +01:00
Piotr Jastrzebski	c0ce94c6f9	sstables: store column name in column_translation::column_info This will be used for better diagnostics. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:00 +01:00
Duarte Nunes	1afda28cf3	Merge 'Fix filtering with LIMIT' from Piotr " This series adds proper handling of filtering queries with LIMIT. Previously the limit was erroneously applied before filtering, which led to truncated results. To avoid that, paged filtering queries now use an enhanced pager, which remembers how many rows were dropped and uses that information to fetch more pages if the limit is not yet reached. For unpaged filtering queries, paging is done internally, as in the case of aggregations, to avoid keeping huge results in memory. Also, previously, all limited queries used the page size counted from max(page size, limit). That's not good for filtering, because with LIMIT 1 we would then query for rows one-by-one. To avoid that, filtered queries ask for the whole page and the results are truncated afterwards if need be. Tests: unit (release) " * 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla: tests: add filtering with LIMIT test tests: split filtering tests from cql_query_test cql3: add proper handling of filtering with LIMIT service/pager: use dropped_rows to adjust how many rows to read service/pager: virtualize max_rows_to_fetch function cql3: add counting dropped rows in filtering pager	2018-11-29 23:07:40 +00:00
Piotr Jastrzebski	654eeb30ac	sstables: Make test_dropped_column_handling test dropped columns Before it was testing missing columns. It's better to test dropped columns because they should be ignored while for missing columns some sources will throw. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-29 16:16:44 +01:00
Avi Kivity	2dba809844	Merge "scylla_io_setup: support multiple devices" from Benny " This patchset adds support to scylla_io_setup for multiple data directories as well as commitlog, hints, and saved_caches directories. Refs #2415 Tests: manual testing with scylla-ccm generated scylla.yaml " * 'projects/multidev/v3' of https://github.com/bhalevy/scylla: scylla_io_setup: assume default directories under /var/lib/scylla scylla_io_setup: add support for commitlog, hints, and saved_caches directory scylla_io_setup: support multiple data directories	2018-11-29 16:44:33 +02:00
Piotr Sarna	7adbdaba0b	tests: add filtering with LIMIT test Refs #3902	2018-11-29 14:53:30 +01:00
Piotr Sarna	5f97c78875	tests: split filtering tests from cql_query_test In order to avoid blowing cql_query_test even more out of proportions, all filtering tests are moved to a separate file.	2018-11-29 14:53:30 +01:00
Piotr Sarna	acf4eadf88	cql3: add proper handling of filtering with LIMIT Previously, limit was erroneously applied before filtering, which might have resulted in truncated results. Now, both paged and unpaged queries are filtered first, and only after that properly trimmed so only X rows are returned for LIMIT X. Fixes #3902	2018-11-29 14:53:30 +01:00
Piotr Sarna	5b052bdae5	service/pager: use dropped_rows to adjust how many rows to read The filtering pager may drop some rows and as a result return fewer rows than were fetched from the replica. To properly adjust how many rows were actually read, a dropped_rows variable is introduced.	2018-11-29 14:53:29 +01:00
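The filter-then-limit behavior from this series can be sketched on an in-memory table. Everything here is hypothetical (`fetch_page`, `filter_page`, `filtered_query`, integer rows); the sketch only mirrors the rules stated in the commits: filter before applying LIMIT, request full pages rather than pages sized by the limit, and count dropped rows so the position in the source advances by the rows actually read, not by the rows returned. With rows 1..20, an "even" filter and LIMIT 3, applying the limit first would return only `{2}` (from rows 1, 2, 3), while filtering first yields `{2, 4, 6}`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical replica: returns up to `page_size` rows starting at `offset`.
std::vector<int> fetch_page(const std::vector<int>& table, std::size_t offset, std::size_t page_size) {
    auto begin = table.begin() + std::min(offset, table.size());
    auto end = table.begin() + std::min(offset + page_size, table.size());
    return std::vector<int>(begin, end);
}

// Applies the filter to one page, returning the surviving rows and adding
// the number of rejected rows to `dropped_rows`.
std::vector<int> filter_page(const std::vector<int>& page, const std::function<bool(int)>& filter,
                             std::size_t& dropped_rows) {
    std::vector<int> kept;
    for (int row : page) {
        if (filter(row)) {
            kept.push_back(row);
        } else {
            ++dropped_rows;
        }
    }
    return kept;
}

std::vector<int> filtered_query(const std::vector<int>& table, const std::function<bool(int)>& filter,
                                std::size_t limit, std::size_t page_size) {
    std::vector<int> result;
    std::size_t offset = 0;
    while (result.size() < limit) {
        // Request a full page, not min(page_size, remaining limit): with a
        // small LIMIT that would fetch rows almost one by one.
        std::vector<int> page = fetch_page(table, offset, page_size);
        if (page.empty()) {
            break;  // source exhausted
        }
        std::size_t dropped_rows = 0;
        std::vector<int> kept = filter_page(page, filter, dropped_rows);
        // Rows read from the replica = rows returned + rows dropped.
        offset += kept.size() + dropped_rows;
        result.insert(result.end(), kept.begin(), kept.end());
    }
    // Apply LIMIT only after filtering.
    if (result.size() > limit) {
        result.resize(limit);
    }
    return result;
}
```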
Piotr Sarna	021caeddf7	service/pager: virtualize max_rows_to_fetch function Regular pagers use max_rows to figure out how many rows to fetch, but filtering pager potentially needs the whole page to be fetched in order to filter the results.	2018-11-29 14:14:37 +01:00
Benny Halevy	5ec191536e	scylla_io_setup: assume default directories under /var/lib/scylla If a specific directory is not configured in scylla.yaml, scylla assumes a default location under /var/lib/scylla. Hard code these locations in scylla_io_setup until we have a better way to probe scylla about it. Be permissive: if the default directories do not exist on disk, silently ignore them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 15:07:29 +02:00
Piotr Sarna	4f5ee3dfcd	cql3: add counting dropped rows in filtering pager A counter for dropped rows is added to the filtering pager. This metric can be used later to implement applying LIMIT to filtering queries properly. Dropped rows are returned on visitor::accept_partition_end.	2018-11-29 14:06:59 +01:00
Benny Halevy	88b85b363a	scylla_io_setup: add support for commitlog, hints, and saved_caches directory Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 10:09:17 +02:00
Benny Halevy	e4382caa4a	scylla_io_setup: support multiple data directories Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 10:09:17 +02:00
Alexys Jacob	00476c3946	scylla-housekeeping: fix python3 compat and shebang	2018-11-29 00:04:02 +01:00
Alexys Jacob	1cf41760a8	dist/ami/files/scylla_install_ami: python3 shebang	2018-11-29 00:00:41 +01:00
Alexys Jacob	a6447f543c	dist/docker/redhat/docker-entrypoint.py: add encoding comment	2018-11-29 00:00:19 +01:00
Alexys Jacob	9f041158df	fix_system_distributed_tables.py: fix python3 compat and shebang	2018-11-28 23:59:51 +01:00
Alexys Jacob	887322daa2	gen_segmented_compress_params.py: add encoding comment	2018-11-28 23:59:18 +01:00
Alexys Jacob	14e65e1089	idl-compiler.py: python3 shebang	2018-11-28 23:58:38 +01:00
Alexys Jacob	170120a391	scylla-gdb.py: python3 shebang	2018-11-28 23:58:14 +01:00
Alexys Jacob	3902922113	configure.py: python3 shebang	2018-11-28 23:57:54 +01:00
Alexys Jacob	d2dbbba139	tools/scyllatop/: add / normalize python3 shebang	2018-11-28 23:57:03 +01:00
Alexys Jacob	e321b839c7	scripts/: add / normalize python3 shebang	2018-11-28 23:56:35 +01:00
Alexys Jacob	02656fb00e	dist/common/scripts: add / normalize python3 shebang	2018-11-28 23:55:26 +01:00
Alexys Jacob	954da947f8	test.py: add encoding comment	2018-11-28 23:54:41 +01:00
Alexys Jacob	cbd72786dd	setup.py: add python3 classifiers	2018-11-28 23:54:03 +01:00
Dan Yasny	019a2e3a27	scylla_setup: Mark required args Fixes #3945 Message-Id: <20181128220549.3083-1-dyasny@gmail.com>	2018-11-28 22:30:02 +00:00
Avi Kivity	de17150cb2	Update seastar submodule * seastar 1fbb633...132e6cd (2): > scripts: json2code: port to Python 3 > docker/dev/Dockerfile: add c-ares-devel to docker setup	2018-11-28 19:05:21 +02:00
Duarte Nunes	a589dade07	Merge 'Fix checking for multi-column restrictions in filtering' from Piotr " This series fixes #3891 by amending the way restrictions are checked for filtering. Previous implementation that returned false from need_filtering() when multi-column restrictions were present was incorrect. Now, the error is going to be returned from restrictions filter layer, and once multi-column support is implemented for filtering, it will require no further changes. Tests: unit (release) " * 'fix_multi_column_filtering_check_3' of https://github.com/psarna/scylla: tests: add multi-column filtering check cql3: remove incorrect multi-column check cql3: check filtering restrictions only if applicable cql3: add pk/ck_restrictions_need_filtering()	2018-11-28 15:36:37 +00:00
Piotr Sarna	ae0ffa6575	tests: add multi-column filtering check Multi-column restrictions filtering is not supported yet, so a simple case to ensure that is added.	2018-11-28 13:58:16 +01:00
Piotr Sarna	0013929782	cql3: remove incorrect multi-column check need_filtering() incorrectly returned false if multi-column restrictions were present. Instead, these restrictions should be allowed to need filtering. Fixes #3891	2018-11-28 13:58:16 +01:00
Piotr Sarna	65f21cc518	cql3: check filtering restrictions only if applicable Primary key restrictions should be checked only when they need filtering - otherwise it's superfluous, since they were already applied on query level.	2018-11-28 13:58:16 +01:00
Piotr Sarna	f59ddcab52	cql3: add pk/ck_restrictions_need_filtering() These functions return true if partition/clustering key restriction parts of statement restrictions require filtering.	2018-11-28 13:58:16 +01:00
Duarte Nunes	d09d4bbd91	Merge 'Fix checking if system tables need view updates' from Piotr " This miniseries ensures that system tables are not checked for having view updates, because they never do. What's more, distributed system table is used in the process, so it's unsafe to query the table while streaming it. Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test) " * 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla: streaming: don't check view building of system tables database: add is_internal_keyspace streaming: remove unused sstable_is_staging bool class	2018-11-28 10:00:34 +00:00
Piotr Sarna	8e6021dfa1	streaming: don't check view building of system tables System tables will never need view building, and, what's more, are actually used in the process of view build checking. So, checking whether system tables need a view update path is simplified to returning 'false'.	2018-11-28 09:21:56 +01:00
Piotr Sarna	1336b9ee31	database: add is_internal_keyspace Similarly to is_system_keyspace, it will allow checking if a keyspace is created for internal use.	2018-11-28 09:21:56 +01:00
Piotr Sarna	6ad2c39f88	streaming: remove unused sstable_is_staging bool class sstable_is_staging bool class is not used anywhere in the code anymore, so it's removed.	2018-11-28 09:21:56 +01:00
Duarte Nunes	9f639edaa2	Merge 'storage_proxy: fix some bugs in early (due to errors) request completion' from Gleb " The series fixed #3565 and #3566 " * 'gleb/write_failure_fixes' of github.com:scylladb/seastar-dev: storage_proxy: store hint for CL=ANY if all nodes replied with failure storage_proxy: complete write request early if all replicas replied with success of failure storage_proxy: check that write failure response comes from recognized replica storage_proxy: move code executed on write timeout into separate function	2018-11-27 21:44:01 +00:00
Takuya ASADA	52f030806f	install-dependencies.sh: fix dependency issues on Debian variants Sync Debian variants dependencies with dist/debian/control.mustache (before merging relocatable), use scylla 3rdparty packages. Since we use 3rdparty repo on seastar/install-dependencies.sh, drop repo setup part from this script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031122800.11802-1-syuu@scylladb.com>	2018-11-27 21:44:01 +00:00
Gleb Natapov	17197fb005	storage_proxy: store hint for CL=ANY if all nodes replied with failure Current code assumes that request failed if all replicas replied with failure, but this is not true for CL=ANY requests. Take it into account. Fixed: #3565	2018-11-27 15:06:37 +02:00
Gleb Natapov	d1d04eae3c	storage_proxy: complete write request early if all replicas replied with success of failure Currently if write request reaches CL and all replicas replied, but some replied with failures, the request will wait for timeout to be retired. Detect this case and retire request immediately instead. Fixes #3566	2018-11-27 14:49:37 +02:00
Gleb Natapov	76ab3d716b	storage_proxy: check that write failure response comes from recognized replica Before accounting failure response we need to make sure it comes from a replica that participates in the request.	2018-11-27 14:44:49 +02:00
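The three storage_proxy fixes above (#3565, #3566, and the recognized-replica check) can be illustrated with a small response tracker. This is a minimal sketch under assumptions: `write_tracker`, `outcome`, and the string replica IDs are hypothetical and do not reflect storage_proxy's actual types. It ignores responses from replicas outside the request, finishes as soon as every replica has answered instead of waiting for the timeout, and treats "all replicas failed" as a stored hint rather than a failure when the consistency level is ANY.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

enum class outcome { pending, success, failure, hint_stored };

// Hypothetical tracker for a single write request.
class write_tracker {
    std::set<std::string> _replicas;   // replicas participating in the request
    std::set<std::string> _responded;  // replicas that already answered
    std::size_t _required;             // acks needed for the consistency level
    bool _cl_any;                      // CL=ANY: a stored hint satisfies the write
    std::size_t _successes = 0;
    std::size_t _failures = 0;
    outcome _outcome = outcome::pending;
public:
    write_tracker(std::set<std::string> replicas, std::size_t required, bool cl_any)
        : _replicas(std::move(replicas)), _required(required), _cl_any(cl_any) {}

    // Records one response and returns the client-visible outcome so far.
    outcome on_response(const std::string& replica, bool ok) {
        // Only count the first answer from a replica that takes part in this request.
        if (_replicas.count(replica) == 0 || !_responded.insert(replica).second) {
            return _outcome;
        }
        if (ok) {
            ++_successes;
        } else {
            ++_failures;
        }
        if (_outcome == outcome::pending) {
            if (_successes >= _required) {
                _outcome = outcome::success;
            } else if (all_replied()) {
                // CL can no longer be reached: answer now instead of at the timeout.
                // With CL=ANY, storing a hint still satisfies the request.
                _outcome = _cl_any ? outcome::hint_stored : outcome::failure;
            }
        }
        return _outcome;
    }

    // Once every replica has answered, the request can be retired immediately.
    bool all_replied() const { return _responded.size() == _replicas.size(); }
};
```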
Rafael Ávila de Espíndola	777ea893e6	Delete data_consume_rows_at_once. As far as I can tell the old sstable reading code required reading the data into a contiguous buffer. The function data_consume_rows_at_once implemented the old behavior and code was incrementally moved away from it. Right now the only use is in two tests. The sstables used in those tests are already used in other tests with data_consume_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181127024319.18732-2-espindola@scylladb.com>	2018-11-27 14:11:50 +02:00
Avi Kivity	1ff6b8fb96	Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr " Compression is not deterministic so instead of binary comparing the sstable files we just read data back and make sure everything that was written down is still present. Tests: unit(release) " * 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev: sstables: Remove compressed parameter from get_write_test_path sstables: Remove unused sstable test files sstables: Ensure compare_sstables isn't used for compressed files sstables: Don't binary compare compressed sstables sstables: Remove debug printout from test_write_many_partitions	2018-11-27 14:01:20 +02:00
Duarte Nunes	098dd90bd2	Merge 'Reduce dependencies around consistency_level.hh' from Avi " consistency_level.hh is rather heavyweight in both its contents and what it includes. Reduce the number of inclusion sites and split the file to reduce dependencies. " * tag 'cl-header/v2' of https://github.com/avikivity/scylla: consistency_level: simplify validation API Split consistency_level.hh header database: remove unneeded consistency_level.hh include cql: remove unneeded includes of consistency_level.hh	2018-11-27 11:59:34 +00:00
Piotr Jastrzebski	4366302c4c	sstables: Extract mp_row_cosumer_m::check_schema_mismatch This method will contain common logic used in multiple places and reduce code duplication. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <bbda2f4ea4f9325055f096dc549f63b1bb03d3b6.1543311990.git.piotr@scylladb.com>	2018-11-27 12:45:12 +01:00
Avi Kivity	4676e07400	consistency_level: simplify validation API Remove unused parameters, replace refcounted pointers by references.	2018-11-27 13:41:49 +02:00
Avi Kivity	2c08bff8d5	Split consistency_level.hh header It has two unrelated users: cql for validation, and storage_proxy for complicated calculations. Split the simple stuff into a new header to reduce dependencies.	2018-11-27 13:32:10 +02:00
Avi Kivity	b015f41344	database: remove unneeded consistency_level.hh include	2018-11-27 13:30:56 +02:00
Gleb Natapov	7bc68aa0eb	storage_proxy: move code executed on write timeout into separate function Currently the callback is in lambda, but we will want to call the code not only during timer expiration.	2018-11-27 13:23:30 +02:00
Avi Kivity	9201d22c06	cql: remove unneeded includes of consistency_level.hh Move the includes to .cc to reduce include pollution.	2018-11-27 13:18:33 +02:00
Raphael S. Carvalho	626afa6973	database: conditionally release sstable references from compaction manager Not all compaction operations submitted through the compaction manager set a callback for releasing references to exhausted sstables in the compaction manager itself. That callback lives in the compaction descriptor, which is passed to table::compaction(). Let's make the call conditional to avoid std::bad_function_call exceptions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>	2018-11-27 12:10:43 +01:00
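Invoking an empty `std::function` throws `std::bad_function_call`, which is the failure this commit guards against. A minimal sketch of the conditional call follows; `compaction_descriptor` here is a hypothetical two-line stand-in (sstables are represented as integer IDs), not Scylla's actual struct.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical compaction descriptor: the release callback is optional,
// because not every caller that submits a compaction sets it.
struct compaction_descriptor {
    std::function<void(const std::vector<int>&)> release_exhausted;
};

// Only invoke the callback when one was provided; calling an empty
// std::function would throw std::bad_function_call.
void on_sstables_exhausted(const compaction_descriptor& desc, const std::vector<int>& exhausted) {
    if (desc.release_exhausted) {
        desc.release_exhausted(exhausted);
    }
}
```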
Avi Kivity	2eaeb3e4eb	Update swagger-ui submodule Updates to version 2.2.10 with a local change (from Amnon) to support our location. Fixes #3942.	2018-11-27 13:01:02 +02:00
Tomasz Grabiec	17a8a9d13d	gdb: Properly parse unique_ptr in 'scylla lsa' There's no _M_t._M_head_impl any more in the standard library. We now have std_unique_ptr wrapper which abstracts this fact away so use that. Message-Id: <20181126174837.11542-1-tgrabiec@scylladb.com>	2018-11-27 12:32:41 +02:00
Tomasz Grabiec	eecda72175	gdb: Adjust 'scylla lsa' for removal of emergency reserve There's no _emergency_reserve any more. Show _free_segments instead. Message-Id: <20181126174837.11542-2-tgrabiec@scylladb.com>	2018-11-27 12:32:37 +02:00
Avi Kivity	5e759b0c07	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). 
SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include	2018-11-26 20:10:46 +02:00
Tomasz Grabiec	f1a35b654a	tests: perf: Introduce perf_checksum	2018-11-26 18:59:43 +01:00
Tomasz Grabiec	5b6e3fb5ed	tests: Add test for libdeflate CRC32 implementation	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	bf0164cdaf	sstables: compress: Use libdeflate for crc32 Improves memtable flush performance by 10% in a CPU-bound case. Unlike the zlib implementation, libdeflate is optimized for modern CPUs. It utilizes the PCLMUL instruction.	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	0ac1905f4f	sstables: compress: Rename crc32_utils to zlib_crc32_checksummer	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	ba141a4852	licenses: Add libdeflate license	2018-11-26 18:59:41 +01:00
Tomasz Grabiec	048d569b45	Integrate libdeflate with the build system	2018-11-26 18:59:09 +01:00
Tomasz Grabiec	f704f7bc19	Add libdeflate submodule	2018-11-26 18:57:51 +01:00
Tomasz Grabiec	743cf43847	sstables: Avoid checksum_combine() for the crc32 checksummer checksum_combine() is much slower than re-feeding the buffer to checksum() for the zlib CRC32 checksummer. Introduce Checksum::prefer_combine() to determine this and select more optimal behavior for given checksummer. Improves performance of memtable flush with compression enabled by 30%.	2018-11-26 18:57:33 +01:00
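The prefer_combine() selection described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Scylla's checksum_utils code. The struct names, the `init` constant and `update_full_checksum()` are invented here, and both checksums are naive byte-at-a-time versions. Adler-32 can merge two chunk checksums in O(1), so it combines. CRC32, assumed here to have no fast combine, simply re-feeds each chunk into the running whole-file checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical Adler-32 checksummer: combining is O(1), so prefer it.
struct adler32_checksummer {
    static constexpr uint32_t init = 1;
    static constexpr bool prefer_combine = true;
    static constexpr uint32_t base = 65521;

    static uint32_t checksum(uint32_t prev, std::string_view data) {
        uint32_t a = prev & 0xffff, b = prev >> 16;
        for (unsigned char c : data) {
            a = (a + c) % base;
            b = (b + a) % base;
        }
        return (b << 16) | a;
    }

    // adler32(X + Y) from adler32(X), adler32(Y) and len(Y):
    // a = a1 + a2 - 1, b = b1 + b2 + len(Y) * (a1 - 1), all mod base.
    static uint32_t checksum_combine(uint32_t first, uint32_t second, std::size_t second_len) {
        uint64_t a1 = first & 0xffff, b1 = first >> 16;
        uint64_t a2 = second & 0xffff, b2 = second >> 16;
        uint64_t n = second_len % base;
        uint64_t a = (a1 + a2 + base - 1) % base;
        uint64_t b = (b1 + b2 + n * ((a1 + base - 1) % base)) % base;
        return static_cast<uint32_t>((b << 16) | a);
    }
};

// Hypothetical bitwise CRC32 (zlib polynomial). Treated as having no fast
// combine, so the running checksum is continued over the chunk bytes instead.
struct crc32_checksummer {
    static constexpr uint32_t init = 0;
    static constexpr bool prefer_combine = false;

    static uint32_t checksum(uint32_t prev, std::string_view data) {
        uint32_t crc = ~prev;
        for (unsigned char c : data) {
            crc ^= c;
            for (int k = 0; k < 8; ++k) {
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
            }
        }
        return ~crc;
    }
};

// Fold one compressed chunk into the whole-file checksum, choosing the
// cheaper strategy at compile time.
template <typename Checksum>
uint32_t update_full_checksum(uint32_t full, uint32_t chunk_checksum, std::string_view chunk) {
    if constexpr (Checksum::prefer_combine) {
        return Checksum::checksum_combine(full, chunk_checksum, chunk.size());
    } else {
        (void)chunk_checksum; // not needed when re-feeding the chunk
        return Checksum::checksum(full, chunk);
    }
}
```

Both paths produce the checksum of the concatenated data. They differ only in cost, and that cost is what a `prefer_combine` trait lets the caller pick.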
Avi Kivity	b351a9fee7	db/repair_decision.hh: add missing #include Message-Id: <20181126154948.2453-1-avi@scylladb.com>	2018-11-26 18:49:08 +01:00
Tomasz Grabiec	88cf1c61ba	sstables: compress: Avoid unnecessary checksum_combine()	2018-11-26 14:31:38 +01:00
Tomasz Grabiec	8372cf7bcc	sstables: checksum_utils: Add missing include	2018-11-26 14:31:38 +01:00
Avi Kivity	c6d700279b	class_registry: introduce a non-static variant of class_registry class_registry's staticness has the usual problem of static classes (loss of dependency information) and prevents us from librarifying Scylla, since all objects that define a registration must be linked in. Take a first step towards removing this staticness by defining a nonstatic variant. The static class_registry is then redefined in terms of the nonstatic class. After all uses have been converted, the static variant can be retired. Message-Id: <20181126130935.12837-1-avi@scylladb.com>	2018-11-26 13:30:21 +00:00
Paweł Dziepak	62ea153629	Merge "Check for schema mismatch after dropping dead cells" from Piotr " Previously we were checking for schema incompatibility between the current schema and the sstable serialization header before reading any data. This isn't the best approach because data in the sstable may already be irrelevant, for example due to a column drop. This patchset moves the check to after the data is read and verified to have a timestamp recent enough to classify it as non-obsolete. Fixes #3924 " * 'haaawk/3924/v3' of github.com:scylladb/seastar-dev: sstables: Enable test_schema_change for MC format sstables3: Throw error on schema mismatch only for live cells sstables: Pass column_info to consume_*_column sstables: Add schema_mismatch to column_info sstables: Store column data type in column_info sstables: Remove code duplication in column_translation	2018-11-26 13:10:18 +00:00
Avi Kivity	9a46ee69d4	doc: fix BYPASS CACHE documentation BYPASS CACHE was mistakenly documenting an earlier version of the patch. Correct it to document the committed version. Message-Id: <20181126125810.9344-1-avi@scylladb.com>	2018-11-26 13:04:52 +00:00
Piotr Jastrzebski	dec48dd1e2	sstables: Remove compressed parameter from get_write_test_path This parameter is no longer used. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:46:23 +01:00
Piotr Jastrzebski	92ffccd636	sstables: Remove unused sstable test files Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:15 +01:00
Piotr Jastrzebski	a29c9189cb	sstables: Ensure compare_sstables isn't used for compressed files Binary comparing compressed sstables is wrong because compression is not deterministic. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:15 +01:00
Piotr Jastrzebski	7e263208f0	sstables: Don't binary compare compressed sstables This family of test_write_many_partitions_* tests writes sstables down from memtable using different compressions. Then it compares the resulting file with a blueprint file and reads the data back to check everything is there. Compression is not deterministic so this patch makes the tests not compare resulting compressed sstable file with blueprint file and instead only read data back. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:03 +01:00
Piotr Jastrzebski	5c86294a56	sstables: Enable test_schema_change for MC format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:25:23 +01:00
Piotr Jastrzebski	4bdb86c712	sstables3: Throw error on schema mismatch only for live cells Previously we were throwing exception during the creation of column_translation. This wasn't always correct because sometimes column for which the mismatch appeared was already dropped and data present in sstable should be ignored anyway. Fixes #3924 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:25:10 +01:00
Piotr Sarna	6ab8235369	main: fix deinitialization order for view update generator View update generator should be stopped only after drain_on_shutdown() is performed on storage service. Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>	2018-11-26 11:21:37 +00:00
Duarte Nunes	2a371c2689	Merge 'Allow bypassing cache on a per-query basis' from Avi " Some queries are very unlikely to hit cache. Usually this includes range queries on large tables, but other patterns are possible. While the database should adapt to the query pattern, sometimes the user has information the database does not have. By passing this information along, the user helps the database manage its resources more optimally. To do this, this patch introduces a BYPASS CACHE clause to the SELECT statement. A query thus marked will not attempt to read from the cache, and instead will read from sstables and memtables only. This reduces CPU time spent to query and populate the cache, and will prevent the cache from being flooded with data that is not likely to be read again soon. The existing cache disabled path is engaged when the option is selected. Tests: unit (release), manual metrics verification with ccm with and without the BYPASS CACHE clause. Ref #3770. " * tag 'cache-bypass/v2' of https://github.com/avikivity/scylla: doc: document SELECT ... BYPASS CACHE tests: add test for SELECT ... BYPASS CACHE cql: add SELECT ... BYPASS CACHE clause db: add query option to bypass cache	2018-11-26 09:59:40 +00:00
Paweł Dziepak	13385778fd	Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz * tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla: tests: perf_fast_forward: Measure performance of dataset population tests: perf_fast_forward: Record the dataset on which test case was run tests: perf_fast_forward: Introduce the concept of a dataset tests: perf_fast_forward: Introduce make_compaction_disabling_guard() tests: perf_fast_forward: Initialize output manager before population tests: perf_fast_forward: Handle empty test parameter set tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() tests: perf_fast_forward: Factor out access to cfg to a single place per function tests: perf_fast_forward: Extract result_collector tests: perf_fast_forward: Take writes into account in AIO statistics tests: perf_fast_forward: Reorder members tests: perf_fast_forward: Add --sstable-format command line option	2018-11-26 09:45:55 +00:00
Avi Kivity	58033ad3a4	doc: document SELECT ... BYPASS CACHE Add a new cql-extensions.md file and document BYPASS CACHE there.	2018-11-26 11:37:52 +02:00
Avi Kivity	f69401c609	tests: add test for SELECT ... BYPASS CACHE The test verifies that cache read metrics are not incremented during a cache bypass read.	2018-11-26 11:37:52 +02:00
Avi Kivity	ecf3f92ec7	cql: add SELECT ... BYPASS CACHE clause The BYPASS CACHE clause instructs the database not to read from or populate the cache for this query. The new keywords (BYPASS and CACHE) are not reserved.	2018-11-26 11:37:49 +02:00
Takuya ASADA	7740cd2142	dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants We do specify the correct repo for both Red Hat and Debian variants on -daily, but mistakenly don't for -restart, so do the same on -restart. Fixes #3906 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181109224509.27380-1-syuu@scylladb.com>	2018-11-26 11:02:25 +02:00
Rafael Ávila de Espíndola	6746907999	Use fully covered switches in continuous_data_consumer do_process_buffer had two unreachable default cases and a long if-else-if chain. This converts the if-else-if chain to a switch and a helper function. This moves the error checking from run time to compile time. If we were to add a 128 bit integer, for example, gcc would complain about it missing from the switch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181125221451.106067-1-espindola@scylladb.com>	2018-11-25 22:52:11 +00:00
Avi Kivity	b4765af790	Merge "Introduce SSTable-run-based compaction" from Raphael " This new compaction approach consists of releasing exhausted fragments[1] of a run[2] as a compaction proceeds, considerably decreasing the space requirement. These changes will immediately benefit the leveled strategy because it already works with the run concept. [1] A fragment is an sstable composing a run; exhausted means the sstable was fully consumed by the compaction procedure. [2] A run is a set of non-overlapping sstables which roughly span the entire token range. Note: The last patch includes an example compaction strategy showing how to work with the interface. unit tests: all modes passing dtests: compaction ones passing " * 'sstable_run_based_compaction_v10' of github.com:raphaelsc/scylla: tests: add example compaction strategy for sstable run based approach sstables/compaction: propagate sstable replacement to all compaction of a CF sstables: store cf pointer in compaction_info tests/sstable_test: add test for compaction replacement of exhausted sstable sstables: add sstable's on closed handling tests/sstables: add test for sstable run based compaction sstables/compaction_manager: prevent partial run from being selected for compaction compaction: use same run identifier for sstables generated by same compaction sstables: introduce sstable run sstables/compaction_manager: release reference to exhausted sstable through callback sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor database: do not keep reference to sstable in selector when done selecting compaction: share sstable set with incremental reader selector sstables/compaction: release space earlier of exhausted input sstables sstables: make partitioned sstable set's incremental selector resilient to changes in the set database: do not store reference to sstable in incremental selector tests/sstables: add run identifier correctness test sstables: use a random uuid for sstables without run identifier 
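A minimal sketch of the compile-time check described above, assuming a hypothetical `int_width` enum rather than continuous_data_consumer's real states. Because the switch has no `default:` case, adding a new enumerator (such as a 128-bit width) makes gcc's -Wswitch flag the switch, and under -Werror that becomes a build error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical widths of the fixed-size integers a consumer may read.
enum class int_width { u8, u16, u32, u64 };

// No default case: if a new enumerator (say u128) is added, gcc's -Wswitch
// (an error under -Werror) points at this switch at compile time.
std::size_t byte_count(int_width w) {
    switch (w) {
    case int_width::u8:  return 1;
    case int_width::u16: return 2;
    case int_width::u32: return 4;
    case int_width::u64: return 8;
    }
    // Only reachable with an out-of-range enum value. std::abort() is
    // [[noreturn]], which also keeps -Wreturn-type quiet.
    std::abort();
}
```

A `default:` case would silently absorb any new enumerator. That is why the unreachable defaults are removed rather than kept "just in case".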
sstables: add run identifier to scylla metadata	2018-11-25 17:20:24 +02:00
Avi Kivity	b835b93ee6	db: add query option to bypass cache With the option enabled, we bypass the cache unconditionally and only read from memtables+sstables. This is useful for analytics queries.	2018-11-25 16:26:08 +02:00
Piotr Jastrzebski	c2561a2796	sstables: Remove debug printout from test_write_many_partitions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-25 13:29:10 +01:00
Raphael S. Carvalho	3fa70d6b5f	tests: add example compaction strategy for sstable run based approach Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 20:16:54 -02:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with the sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, thus preventing compaction B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and each compaction in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
Raphael S. Carvalho	baf89f0df3	tests/sstable_test: add test for compaction replacement of exhausted sstable Make sure that compaction is capable of releasing exhausted sstable space early in the procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:26 -02:00
Raphael S. Carvalho	824c20b76d	sstables: add sstable's on closed handling Motivation is that it will be useful for catching regression on compaction when releasing early exhausted sstables. That's because sstable's space is only released once it's closed. So this will allow us to write a test case and possibly use it for entities holding exhausted sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:25 -02:00
Raphael S. Carvalho	0085e8371d	tests/sstables: add test for sstable run based compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:23 -02:00
Raphael S. Carvalho	e88d1d54b9	sstables/compaction_manager: prevent partial run from being selected for compaction Filter out sstables belonging to a partial run being generated by an ongoing compaction. Otherwise, that could lead to wrong decisions by the compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:22 -02:00
Raphael S. Carvalho	23884fe9f6	compaction: use same run identifier for sstables generated by same compaction SSTables composing the same run will share the same run identifier. Therefore, a new compaction strategy will be able to get all sstables belonging to the same run from sstable_set, which now keeps track of existing runs. The same UUID is passed to all writers of a given compaction. Otherwise, a new UUID would be picked for every sstable created by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:20 -02:00
Raphael S. Carvalho	4f68cb34a6	sstables: introduce sstable run An sstable run is a structure that will hold all sstables that have the same run identifier. All sstables belonging to the same run will not overlap with one another. It can be used by a compaction strategy to work on runs instead of individual sstables. sstable_set, the structure which holds all sstables for a given column family, will be responsible for providing its users an interface to work with runs instead of individual sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:18 -02:00
Raphael S. Carvalho	fc92fb955d	sstables/compaction_manager: release reference to exhausted sstable through callback That's important for the reference to sstable to not be kept throughout the compaction procedure, which would break the goal of releasing space during compaction. Manager passes a callback to compaction which calls it whenever there's sstable replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:16 -02:00
Raphael S. Carvalho	3f309ebba9	sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor Motivation is that we want to release space for exhausted sstable and that will only happen when all references to it are gone and that backlog tracker takes the early replacement into account. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:13 -02:00
Raphael S. Carvalho	3433de3dc0	database: do not keep reference to sstable in selector when done selecting When compacting, we'll create all readers at once and will not select again from incremental selector, meaning the selector will keep all respective sstables in current_sstables, preventing compaction from releasing space as it goes on. The change is about refreshing sstable set's selector such that it will not hold a reference to an exhausted sstable whatsoever. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:12 -02:00
Raphael S. Carvalho	f6df949c1a	compaction: share sstable set with incremental reader selector By doing that, we'll be able to release an exhausted sstable from both simultaneously. That's achieved by sharing the set containing input sstables with the incremental reader selector and removing exhausted sstables from the shared set when the time comes. This is a step towards reducing the disk requirement for compaction by making it delete an sstable whose data is all in a sealed new sstable. For that to happen, all references must be gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:10 -02:00
Raphael S. Carvalho	e5a0b05c15	sstables/compaction: release space earlier of exhausted input sstables Currently, compaction only replaces input sstables at the end of compaction, meaning compaction must be finished for all the space of those sstables to be released. What we can do instead is delete some input sstables earlier, under the following conditions: 1) The sstable's data should be committed to a new, sealed output sstable, meaning it's exhausted. 2) The exhausted sstable mustn't overlap with a non-exhausted sstable, because a tombstone in the exhausted sstable could have been purged and the shadowed data in the non-exhausted one could be resurrected if the system crashes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:07 -02:00
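The two early-release conditions above can be expressed as a small filter. This is a hypothetical sketch, not Scylla's compaction code. It assumes each input is summarized by an inclusive token range `[first, last]` and an `exhausted` flag, and the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input sstable: token range [first, last] and whether all of its
// data is already in a sealed output sstable.
struct input_sstable {
    long first;
    long last;
    bool exhausted;
};

bool overlaps(const input_sstable& a, const input_sstable& b) {
    return a.first <= b.last && b.first <= a.last;
}

// Indexes of inputs that may be deleted early: exhausted (condition 1) and not
// overlapping any non-exhausted input (condition 2), so no purged tombstone can
// leave shadowed data behind to be resurrected after a crash.
std::vector<std::size_t> releasable(const std::vector<input_sstable>& inputs) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (!inputs[i].exhausted) {
            continue;
        }
        bool safe = true;
        for (const auto& other : inputs) {
            if (!other.exhausted && overlaps(inputs[i], other)) {
                safe = false;
                break;
            }
        }
        if (safe) {
            result.push_back(i);
        }
    }
    return result;
}
```

For example, with inputs `[0,10]` (exhausted), `[20,30]` (not exhausted) and `[25,40]` (exhausted), only the first one qualifies. The third overlaps a live input.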
Raphael S. Carvalho	ace070c8fc	sstables: make partitioned sstable set's incremental selector resilient to changes in the set The motivation is that compaction may remove a sstable from the set while the incremental selector is alive, and for that to work, we need to invalidate the iterators stored by the selector. We could have added a method to notify it, but there will be a case where the one keeping the set cannot forward the notification to the selector. So it's better for the selector to take care of itself. Change counter approach is used which allows the selector to know when to invalidate the iterators. After invalidation, selector will move the iterator back into its right place by looking for lower bound for current pos. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:05 -02:00
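The change-counter approach above can be sketched as follows. This is a simplified, hypothetical model, not the partitioned sstable set's real selector. It assumes the set is a `std::map` from an sstable's first token to its generation, and the class names are invented. The selector caches an iterator between calls. It never dereferences that iterator after the set's counter has moved. Instead it re-seeks with `lower_bound()` from the current position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical set keyed by the first token of each sstable; values are
// sstable generations. Every mutation bumps a change counter.
class versioned_set {
    std::map<long, long> _by_first_token;
    uint64_t _changes = 0;
public:
    void insert(long first_token, long generation) {
        _by_first_token.emplace(first_token, generation);
        ++_changes;
    }
    void erase(long first_token) {
        _by_first_token.erase(first_token);
        ++_changes;
    }
    uint64_t changes() const { return _changes; }
    const std::map<long, long>& entries() const { return _by_first_token; }
};

// Hypothetical incremental selector that caches an iterator between calls.
class incremental_selector {
    const versioned_set& _set;
    uint64_t _seen_changes;
    std::map<long, long>::const_iterator _it;
public:
    explicit incremental_selector(const versioned_set& set)
        : _set(set), _seen_changes(set.changes()), _it(set.entries().begin()) {}

    // Generation of the first entry whose first token is >= pos.
    // Positions are expected to be non-decreasing across calls.
    std::optional<long> select(long pos) {
        if (_seen_changes != _set.changes()) {
            // The cached iterator may point at an erased node: overwrite it
            // without dereferencing, by re-seeking from the current position.
            _it = _set.entries().lower_bound(pos);
            _seen_changes = _set.changes();
        }
        while (_it != _set.entries().end() && _it->first < pos) {
            ++_it;
        }
        if (_it == _set.entries().end()) {
            return std::nullopt;
        }
        return _it->second;
    }
};
```

With this design the owner of the set never has to notify the selector. The selector notices the change itself on its next call.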
Raphael S. Carvalho	8d11b0bbb4	database: do not store reference to sstable in incremental selector Use sstable generation instead to keep track of read sstables. The motivation is that we'll not keep references to sstables, allowing their space on disk to be released as soon as they get exhausted. Generation is used because it guarantees uniqueness of the sstable. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:04 -02:00
Raphael S. Carvalho	edc87014c1	tests/sstables: add run identifier correctness test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:02 -02:00
Raphael S. Carvalho	a66b1954cc	sstables: use a random uuid for sstables without run identifier Older sstables must have an identifier for them to be associated with their own run. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:01 -02:00
Raphael S. Carvalho	62025fa52c	sstables: add run identifier to scylla metadata It identifies the run which a particular sstable belongs to. Existing sstables will have a random uuid associated with them in memory. A UUID is the correct choice because it allows sstables to be exported without conflicts between identifiers generated by different nodes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola	d18bbe9d45	Remove unreachable default cases. These switches are fully covered. We can be sure they will stay this way because of -Werror and gcc's -Wswitch warning. We can also be sure that we never have an invalid enum value since the state machine values are not read from disk. The patch also removes a superfluous ';'. Message-Id: <20181124020128.111083-1-espindola@scylladb.com>	2018-11-24 09:31:51 +00:00
Piotr Jastrzebski	569508158c	sstables: Pass column_info to consume_*_column This will allow checking for schema mismatches and better error messages. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	9ca6877cbd	sstables: Add schema_mismatch to column_info This field is true when there's a mismatch between column type in serialization header and current schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	51fa8e0c94	sstables: Store column data type in column_info This will be used to check schema mismatch and provide informative error message. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	99dfb9cc96	sstables: Remove code duplication in column_translation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Raphael S. Carvalho	d29482dce8	sstables: deprecate sstable metadata's ancestors The reason for that is that it's not available in sstable format mc, so we can no longer rely on it in common code for the currently supported formats. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>	2018-11-23 19:38:32 +01:00
Tomasz Grabiec	8e93046abc	tests: perf_fast_forward: Measure performance of dataset population	2018-11-23 19:22:50 +01:00
Tomasz Grabiec	2c95aa4d8d	tests: perf_fast_forward: Record the dataset on which test case was run Now any given test case can potentially run on many different datasets.	2018-11-23 19:22:12 +01:00
Tomasz Grabiec	470552b7ab	tests: perf_fast_forward: Introduce the concept of a dataset A dataset represents a table with data, populated in a certain way, with certain characteristics of the schema and data. Before this change, datasets were implicitly defined, with population hard-coded inside the populate() function. This change gathers logic related to datasets into classes, in order to: - make it easier to define new datasets. - be able to measure performance of dataset population in a standardized way. - be able to express constraints on datasets imposed by different test cases. Test cases are matched with possible datasets based on the abstract interface they accept (e.g. clustered_ds, multipartition_ds), which must be implemented by a compatible dataset. To facilitate this matching, the test function is now wrapped into a dataset_acceptor object, with an automatically-generated can_run() virtual method, deduced by make_test_fn(). - be able to select tests to run based on the dataset name. Only tests which are compatible with that dataset will be run.	2018-11-23 19:22:09 +01:00
Tomasz Grabiec	2746f78a9f	tests: perf_fast_forward: Introduce make_compaction_disabling_guard()	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	b00d360281	tests: perf_fast_forward: Initialize output manager before population	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	25dc481030	tests: perf_fast_forward: Handle empty test parameter set	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	38a1b7e87b	tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	a507ca8159	tests: perf_fast_forward: Factor out access to cfg to a single place per function Preparatory change before making n_rows be determined through a dataset object.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	3fc78a25bf	tests: perf_fast_forward: Extract result_collector Extracts the result collection and reporting logic out of run_test_case(). Will be needed in population tests, for which we don't want the looping logic.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	f4a70283ee	tests: perf_fast_forward: Take writes into account in AIO statistics Relevant for population tests. So far all tests were read tests.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	96f5bd2f46	tests: perf_fast_forward: Reorder members	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	3ac5e8887e	tests: perf_fast_forward: Add --sstable-format command line option	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	564b328b2e	Merge 'Add tests for schema changes' from Paweł This series adds a generic test for schema changes that generates various schemas and data before and after an ALTER TABLE operation. It is then used to check the correctness of mutation::upgrade() and sstable readers, and led to the discovery of #3924 and #3925. Fixes #3925. * https://github.com/pdziepak/scylla.git schema-change-test/v3.1 schema_builder: make member function names less confusing converting_mutation_partition_applier: fix collection type changes converting_mutation_partition_applier: do not emit empty collections sstable: use format() instead of sprint() tests/random-utils: make functions and variables inline tests: add models for schemas and data tests: generate schema changes tests/mutation: add test for schema changes tests/sstable: add test for schema changes	2018-11-23 15:11:31 +01:00
Paweł Dziepak	09439cd809	tests/sstable: add test for schema changes for_each_schema_change() is used for testing reading an sstable that was written with a different schema. Because of #3924, for now the mc format is not verified this way.	2018-11-23 12:14:06 +00:00
Paweł Dziepak	dc7f9fea5b	tests/mutation: add test for schema changes	2018-11-23 12:14:06 +00:00
Paweł Dziepak	35f9f424e9	tests: generate schema changes This patch adds for_each_schema_change() functions which generates schemas and data before and after some modification to the schema (e.g. adding a column, changing its type). It can be used to test schema upgrades.	2018-11-23 12:14:06 +00:00
Paweł Dziepak	daee4bd3b8	tests: add models for schemas and data This patch introduces a model of Scylla schemas and data, implemented using simple standard library primitives. It can be used for testing the actual schemas, mutation_partitions, etc. used by the schema by comparing the results of various actions. The initial use case for this model was to test schema changes, but there is no reason why in the future it cannot be extended to test other things as well.	2018-11-23 12:14:06 +00:00
Takuya ASADA	cf0d00b81a	dist/ami: fix 'unknown configuration key: "enhanced_networking"' error while building AMI packer 1.3.2 no longer supports the enhanced_networking directive; we need to use the new directives ("sriov_support" and "ena_support") to build with the new version. packer provides an automatic configuration file fixing tool, so the new scylla.json is generated by the following command: ./packer/packer fix scylla.json Fixes #3938 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181123053719.32451-1-syuu@scylladb.com>	2018-11-23 08:15:47 +02:00
Paweł Dziepak	91793c0a43	bytes_ostream: drop appending_hash specialisation appending_hash is used for computing hashes that become part of the binary interface. They cannot change between Scylla versions, and the same data must always result in the same hash. At the moment, appending_hash<bytes_ostream> doesn't fulfil those requirements since it leaks information about how the underlying buffer is fragmented. Fortunately, it has no users, so it doesn't cause any compatibility issues. Moreover, bytes_ostream is usually used as an output of some serialisation routine (e.g. frozen_mutation_fragment or CQL response). Those serialisation formats do not guarantee that there is a single representation of given data and therefore are not fit to be hashed by appending_hash. Removing appending_hash<bytes_ostream> may help prevent such incorrect uses. Message-Id: <20181122163823.12759-1-pdziepak@scylladb.com>	2018-11-22 23:53:54 +00:00
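The fragmentation leak described above can be illustrated with a toy hasher. This is a hypothetical sketch that does not use Scylla's appending_hash. It assumes, for illustration, that hashing a fragmented buffer fragment by fragment mixes in each fragment's length, and it uses 64-bit FNV-1a as a stand-in hash. Under that assumption the same bytes split differently hash differently. Hashing only the logical byte sequence does not have this problem.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// 64-bit FNV-1a, used here only as a simple stand-in hasher.
struct fnv1a {
    uint64_t state = 14695981039346656037ull;
    void update(std::string_view bytes) {
        for (unsigned char c : bytes) {
            state ^= c;
            state *= 1099511628211ull;
        }
    }
};

// A hypothetical fragmented buffer, standing in for bytes_ostream.
using fragmented_buffer = std::vector<std::string>;

// Leaky: mixes each fragment's size into the hash, so the result depends on
// how the same bytes happen to be split into fragments.
uint64_t hash_fragments(const fragmented_buffer& buf) {
    fnv1a h;
    for (const auto& frag : buf) {
        h.update(std::to_string(frag.size()));
        h.update(frag);
    }
    return h.state;
}

// Stable: hashes only the logical byte sequence.
uint64_t hash_contents(const fragmented_buffer& buf) {
    fnv1a h;
    for (const auto& frag : buf) {
        h.update(frag);
    }
    return h.state;
}
```

`{"ab", "c"}` and `{"a", "bc"}` hold the same bytes. Only `hash_contents()` gives them the same hash.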
Tomasz Grabiec	fb38f0e9f8	Update seastar submodule * seastar b924495...1fbb633 (3): > rpc: Reduce code duplication > tests: perf: Make do_not_optimize() take the argument by const& > doc: Fix import paths in the tutorial	2018-11-22 23:53:54 +00:00
Paweł Dziepak	2a0e929830	tests/random-utils: make functions and variables inline random-utils.hh is a header which may be included in multiple translation units so all members should be non-static inline to avoid any duplication.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	edb5402a73	sstable: use format() instead of sprint() The format message was using the new style formatting markers ("{}") which are understood by format() but not by sprint() (the latter is basically deprecated).	2018-11-22 11:30:31 +00:00
Paweł Dziepak	1fbe33791d	converting_mutation_partition_applier: do not emit empty collections This patch changes the behaviour of the schema upgrade code so that if all cells and the tombstone of a collection are removed during the upgrade, the collection is not emitted (as opposed to emitting an empty one). Both behaviours are valid, but the new one is more consistent with how atomic cells are upgraded and how schema upgrades work for sstable readers.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	7b12aaa093	converting_mutation_partition_applier: fix collection type changes ALTER TABLE allows changing the type of a collection to a compatible one. This includes changes from a fixed-sized type to a variable-sized one. If that happens, the atomic_cells representing collection elements need to be rewritten so that the value size is included. The logic for rewriting atomic cells already exists (for those that are not collection members) and is reused in this patch. Fixes #3925.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	43e0201ec6	schema_builder: make member function names less confusing Right now, schema_builder member functions have names that very poorly convey the actions they perform. This is made even worse by some overloads which drastically change the semantics. For example: schema_builder() .with_column("v1", /* ... */) .without_column("v1", removal_timestamp); creates a column "v1" and adds the information that there was a column with that name that was removed at 'removal_timestamp'. schema_builder() .with_column("v1") .without_column(utf8_type->decompose("v1")); adds column "v1" and then immediately removes it. In order to clean up this mess, the names were changed so that: * with_/without_ functions only add information to the schema (e.g. info that a column was removed, but without removing a column of that name if one exists) * functions whose names start with a verb actually perform that action, e.g. the new remove_column() removes the column (and adds information that it used to exist) as in the second example.	2018-11-22 11:30:31 +00:00
Benny Halevy	dcd18e2b62	remove exec permission from top_k source files This was introduced by `32525f2694` Cc: Rafi Einstein <rafie@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>	2018-11-21 18:38:50 +02:00
Gleb Natapov	b4a8802edc	hints: make hints manager more resilient to unexpected directory content Currently if hints directory contains unexpected directories Scylla fails to start with unhandled std::invalid_argument exception. Make the manager ignore malformed files instead and try to proceed anyway. Message-Id: <20181121134618.29936-2-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Gleb Natapov	9433d02624	hints: add auxiliary function for scanning high level hints directory We scan the hints directory in two places: to search for files to replay and to search for directories to remove after resharding. The code that translates a directory name to a shard is duplicated. It is simple now, so this is not a big issue, but in case it grows it is better to have it in one place. Message-Id: <20181121134618.29936-1-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Paweł Dziepak	4aa5d83590	Merge "Optimize sstable writing of the MC format" from Tomasz " Tested with perf_fast_forward from: github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1 Using the following command line: build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \ --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \ --datasets small-part The average reported flush throughput was (stdev for the averages is around 4k): - for mc before the series: 367848 frag/s - for lc before the series: 463458 frag/s (= mc.before +25%) - for mc after the series: 429276 frag/s (= mc.before +16%) - for lc after the series: 466495 frag/s (= mc.before +26%) Refs #3874. " * tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla: sstables: mc: Avoid serialization of promoted index when empty sstables: mc: Avoid double serialization of rows tests: sstable 3.x: Do not compare Statistics component utils: Introduce memory_data_sink schema: Optimize column count getters sstables: checksummed_file_data_sink_impl: Bypass output_stream	2018-11-21 13:11:40 +00:00
Tomasz Grabiec	049926bfb8	sstables: mc: Avoid serialization of promoted index when empty calculate_write_size() adds some overhead, even if we're not going to write anything.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	0a9f5b563a	sstables: mc: Avoid double serialization of rows The old code was serializing the row twice. Once to get the size of its block on disk, which is needed to write the block length, and then to actually write the block. This patch avoids this by serializing once into a temporary buffer and then appending that buffer to the data file writer. I measured about 10% improvement in memtable flush throughput with this for the small-part dataset in perf_fast_forward.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	8f686af9af	tests: sstable 3.x: Do not compare Statistics component The Statistics component recorded in the test was generated using a buggy version of Scylla, and is not correct. This was exposed by fixing the bug in the way statistics are generated. Rather than comparing binary content, we should have explicit checks for statistics.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	143fd6e1c2	utils: Introduce memory_data_sink	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	789fac9884	schema: Optimize column count getters	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	8e8b96c6ed	sstables: checksummed_file_data_sink_impl: Bypass output_stream We can avoid the data copying by switching from this: sink -> stream -> sink to this: sink -> sink	2018-11-21 14:04:27 +01:00
Avi Kivity	bb85a21a8f	Merge "compress: Restore lz4 as default compressor" from Duarte " Enables sstable compression with LZ4 by default, which was the long-time behavior until a regression turned off compression by default. Fixes #3926 " * 'restore-default-compression/v2' of https://github.com/duarten/scylla: tests/cql_query_test: Assert default compression options compress: Restore lz4 as default compressor tests: Be explicit about absence of compression	2018-11-21 14:20:39 +02:00
Benny Halevy	76b1c184b7	conf: clean up cassandra references in scylla.yaml Indicate the default scylla directories, rather than Cassandra's. Provide links to Scylla documentation where possible; otherwise, update links to Cassandra documentation. Clean up a few typos. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>	2018-11-21 13:04:24 +02:00
Rafael Ávila de Espíndola	7fa7e9716d	Mention scylla-tools-java and scylla-jmx in HACKING.md I struggled a bit finding out why nodetool was not working, so it might be a good idea to expand the documentation a bit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181120233358.25859-1-espindola@scylladb.com>	2018-11-21 12:55:17 +02:00
Tomasz Grabiec	349c9f7a69	HACKING.md: Add a link to the slides about core dump debugging tools Message-Id: <1542793207-1620-1-git-send-email-tgrabiec@scylladb.com>	2018-11-21 11:45:23 +02:00
Michael Munday	53fdde75f6	dht: use little endian byte order explicitly for token hash This avoids a difference between little and big endian systems. We now also calculate a full murmur hash for tokens with fewer than 8 bytes; however, in practice the token size is always 8. Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>	2018-11-21 11:44:29 +02:00
Michael Munday	360374cfde	tests: fix compilation of partitioner_test with boost 1.68 on IBM Z The boost multiprecision library that I am compiling against seems to be missing an overload for the cast to a string. The easy workaround seems to be to call str() directly instead. This also fixes #3922. Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>	2018-11-21 11:43:42 +02:00
Duarte Nunes	9464fffc8c	tests/cql_query_test: Assert default compression options Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:27 +00:00
Duarte Nunes	36dc9e3280	compress: Restore lz4 as default compressor Fixes a regression introduced in `74758c87cd`, where tables started to be created without compression by default (before they were created with lz4 by default). Fixes #3926 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:27 +00:00
Duarte Nunes	5f64e34fcc	tests: Be explicit about absence of compression Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:26 +00:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Takuya ASADA	42baf6a6f7	dist/ami: update packer Update packer to latest version, 1.3.2. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110441.16284-2-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Takuya ASADA	b9a42e83ad	dist/ami: enable AMI build log To make it easier to debug AMI build errors, enable logging. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110441.16284-1-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Takuya ASADA	72411f95cb	reloc/build_reloc.sh: find ninja-build after executing install-dependencies.sh The build environment may not have ninja-build installed before running install-dependencies.sh, so look for it after running the script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110737.17755-1-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Avi Kivity	183c2369f3	Update seastar submodule * seastar a44cedf...d59fcef (10): > dns: Set tcp output stream buffer size to zero explicitly > tests: add libc-ares to travis dependencies > tests: add dns_test to test suite > build: drop bundled c-ares package > prometheus: replace the instance label with an optional one > build: Refactor C++ dialect detection > build: add libatomic to install-depenencies.sh > core: use std::underlying_type for open_flags > core: introduce open_flags::operator& > core: Fix build for `gnu++14`	2018-11-20 21:29:57 +02:00
Tomasz Grabiec	57e25fa0f8	utils: phased_barrier: Make advance_and_await() have strong exception guarantees Currently, when advance_and_await() fails to allocate the new gate object, it will throw bad_alloc and leave the phased_barrier object in an invalid state. Calling advance_and_await() again on it will result in undefined behavior (typically SIGSEGV) because _gate will be disengaged. One place affected by this is table::seal_active_memtable(), which calls _flush_barrier.advance_and_await(). If this throws, subsequent flush attempts will SIGSEGV. This patch rearranges the code so that advance_and_await() has strong exception guarantees. Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>	2018-11-20 16:15:12 +00:00
Glauber Costa	9f403334c8	remove monitor if sstable write failed In (almost) all SSTable write paths, we need to inform the monitor that the write has failed as well. The monitor will remove the SSTable from the controller's tracking at that point. There is one place, however, where we are not doing that: streaming of big mutations. Streaming of big mutations is an interesting use case, in which it is done in 2 parts: if the writing of the SSTable fails right away, then we do the correct thing. But the SSTables are not committed at that point and the monitors are still kept around with the SSTables until a later time, when they are finally committed. Between those two points in time, it is possible that the streaming code will detect a failure and manually call fail_streaming_mutations(), which marks the SSTable for deletion. At that point we should propagate that information to the monitor as well, but we don't. Fixes #3732 (hopefully) Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181114213618.16789-1-glauber@scylladb.com>	2018-11-20 16:15:12 +00:00
Gleb Natapov	d144e6ceac	messaging_service: enable port load balancing algorithm for RPC server In a homogeneous cluster, this will reduce the number of internal cross-shard hops per request, since RPC calls will arrive at the correct shard. Message-Id: <20181118150817.GF2062@scylladb.com>	2018-11-20 16:15:12 +00:00
Michael Munday	b9a2f4a228	dht: fix byte ordered partitioner midpoint calculation New versions of boost saturate the output of the convert_to method so we need to mask the part we want to extract. Updates #3922. Message-Id: <20181116191441.35000-1-mike.munday@ibm.com>	2018-11-16 21:19:06 +02:00
Glauber Costa	c6811bd877	sstables: correctly parse estimated histograms In commit `a33f0d6`, we changed the way we handle arrays during the write and parse code to avoid reactor stalls. Some potentially big loops were transformed into futurized loops, and also some calls to vector resizes were replaced by a reserve + push_back idiom. The latter broke parsing of the estimated histogram, because the vectors that are used here are already initialized internally by the estimated_histogram object. Therefore, when we push_back, we don't fill the array all the way from index 0, but end up with a zeroed beginning and only push back some of the elements we need. We could revert this array to a resize() call. After all, the reason we are using reserve + push_back is to avoid calling the constructor for each element, but we don't really expect the integer specialization to do any of that. However, to avoid confusing future developers who may feel tempted to convert this as well for the sake of consistency, it is safer to just make sure these arrays are zeroed. Fixes #3918 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181116130853.10473-1-glauber@scylladb.com>	2018-11-16 20:52:44 +02:00
Avi Kivity	d708dabab9	doc: add reference to Linux' submitting-patches document Since our development process is a derivative of Linux, almost everything there is pertinent. Message-Id: <20181115184037.5256-1-avi@scylladb.com>	2018-11-16 20:15:40 +02:00
Vladimir Krivopalov	759fbbd5f6	random_mutation_generator: Add row_marker to rows regardless of whether they're deleted. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <f55b91f1349f0e98def6b7ca9755b5ccf4f48a3e.1542308626.git.vladimir@scylladb.com>	2018-11-16 13:17:07 +01:00
Avi Kivity	6548a404b2	Remove patch file committed by mistake	2018-11-15 19:47:55 +02:00
Duarte Nunes	6fbf792777	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com>	2018-11-15 19:41:43 +02:00
Amnon Heiman	25378916bc	API: column_family.hh: yield in map_reduce_column_families_locally map_reduce_column_families_locally iterates over all tables (column families) in a shard. If the number of tables is large, it can cause latency spikes. This patch replaces the current loop with a do_for_each, allowing preemption inside the loop. Fixes #3886 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181115154825.23430-1-amnon@scylladb.com>	2018-11-15 18:58:23 +02:00
Nadav Har'El	45f05b06d2	view_complex_test: fix another ttl In a previous patch I fixed most TTLs in the view_complex_test.cc tests from low numbers to 100 seconds. I missed one. This one never caused problems in practice, but for good form, let's fix it too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115160234.26478-1-nyh@scylladb.com>	2018-11-15 18:03:28 +02:00
Nadav Har'El	78ed7d6d0c	Materialized Views and Secondary Index: no longer experimental After this patch, the Materialized Views and Secondary Index features are considered generally-available and no longer require passing an explicit "--experimental=on" flag to Scylla. The "--experimental=on" flag and the db::config::check_experimental() function remain unused, as we graduated the only two features which used this flag. However, we leave the support for experimental features in the code, to make it easier to add new experimental features in the future. Another reason to leave the command-line parameter behind is so existing scripts that still use it will not break. Fixes #3917 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115144456.25518-1-nyh@scylladb.com>	2018-11-15 17:59:27 +02:00
Vladimir Krivopalov	51afb1d8bd	tests: Generate deleted rows and shadowable tombstones in random_mutation_generator. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <77e956890264023227e07cc6d295df870d0a5af2.1542295208.git.vladimir@scylladb.com>	2018-11-15 16:26:07 +01:00
Avi Kivity	0216f49bb0	Merge "Add filtering support for CONTAINS" from Piotr " This series enables filtering support for CONTAINS restriction. " * 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla: tests: add CONTAINS test case to filtering tests cql3: enable filtering for CONTAINS restriction cql3: add is_satisfied_by(bytes_view) for CONTAINS	2018-11-15 16:49:29 +02:00
Nadav Har'El	4108458b8e	view_complex_test: increase low ttl which may fail test on busy machine Several of the tests in tests/view_complex_test.cc set a cell with a TTL, and then skip time ahead artificially with forward_jump_clocks(), to go past the TTL time and check the cell disappeared as expected. The TTLs chosen for these tests were arbitrary numbers - some had 3 seconds, some 5 seconds, and some 60 seconds. The actual number doesn't matter - it is completely artificial (we move the clock with forward_jump_clocks() and never really wait for that amount of time) and could very well be a million seconds. But low numbers, like the 3 seconds, present a problem on extremely overcommitted test machines. Our eventually() function already allows for the possibility that things can hang for up to 8 seconds, but with a 3 second TTL, we can find ourselves with data being expired and the test failing just after 3 seconds of wall time have passed - while the test intended that the data will expire only when we explicitly call forward_jump_clocks(). So this patch changes all the TTLs in this test to be the same high number - 100 seconds. This hopefully fixes #3918. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115125607.22647-1-nyh@scylladb.com>	2018-11-15 15:34:08 +02:00
Piotr Jastrzebski	411437f320	Fix format string in mutation_partition::operator<< fmt does not allow bool values for :d and previous format string was resulting in: fmt::v5::format_error: invalid type specifier Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <3980a3cdb903263e29689b1c6cd24e3592826fe0.1542284205.git.piotr@scylladb.com>	2018-11-15 12:22:10 +00:00
Yannis Zarkadas	d292d0c78d	dist/redhat: extend docker entrypoint with more cmd flags With the use of Docker image, some extra options needed to be exposed to provide extended functionality when starting the image. The flags added by this commit are: - cluster-name: name of the Scylla cluster. cluster_name option in scylla.yaml. - rpc-address: IP address for client connections (CQL). rpc_address option in scylla.yaml. - endpoint-snitch: The snitch used to discover the cluster topology. endpoint_snitch option in scylla.yaml. - replace-address-first-boot: Replace a Scylla node by its IP. replace_address_first_boot option in scylla.yaml. Signed-off-by: Yannis Zarkadas <yanniszarkadas@gmail.com> [ penberg@scylladb.com: fix up merge conflicts ] Message-Id: <20181108234212.19969-2-yanniszarkadas@gmail.com>	2018-11-15 09:07:52 +02:00
Alexys Jacob	cd9d01cd7e	test.py: coding style fixes test.py:26:1: F401 'signal' imported but unused test.py:27:1: F401 'shlex' imported but unused test.py:28:1: F401 'threading' imported but unused test.py:173:1: E305 expected 2 blank lines after class or function definition, found 1 test.py:181:34: E241 multiple spaces after ',' test.py:183:34: E241 multiple spaces after ',' test.py:209:24: E222 multiple spaces after operator test.py:240:5: E301 expected 1 blank line, found 0 test.py:249:23: W504 line break after binary operator test.py:254:9: E306 expected 1 blank line before a nested definition, found 0 test.py:263:13: F841 local variable 'out' is assigned to but never used test.py:264:33: E128 continuation line under-indented for visual indent test.py:265:33: E128 continuation line under-indented for visual indent test.py:266:33: E128 continuation line under-indented for visual indent test.py:274:64: F821 undefined name 'e' test.py:278:53: F821 undefined name 'e' Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104115255.22547-1-ultrabug@gentoo.org>	2018-11-14 19:25:14 +02:00
Alexys Jacob	e76a1085d3	scylla-gdb.py: coding style fixes scylla-gdb.py:1:11: E401 multiple imports on one line scylla-gdb.py:5:1: F811 redefinition of unused 're' from line 2 scylla-gdb.py:10:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:19:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:24:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:30:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:39:9: E722 do not use bare 'except' scylla-gdb.py:47:33: E711 comparison to None should be 'if cond is None:' scylla-gdb.py:63:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:90:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:115:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:139:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:161:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:184:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:204:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:210:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:214:5: E301 expected 1 blank line, found 0 scylla-gdb.py:221:5: E301 expected 1 blank line, found 0 scylla-gdb.py:224:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:252:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:267:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:284:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:300:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:314:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:318:5: E301 expected 1 blank line, found 0 scylla-gdb.py:322:5: E301 expected 1 blank line, found 0 scylla-gdb.py:337:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:339:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:342:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:345:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:348:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:352:129: E202 whitespace before ')' scylla-gdb.py:361:1: E302 expected 2 blank lines, found 1 
scylla-gdb.py:363:129: E202 whitespace before ')' scylla-gdb.py:371:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:375:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:378:5: E301 expected 1 blank line, found 0 scylla-gdb.py:383:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:386:5: E301 expected 1 blank line, found 0 scylla-gdb.py:393:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:396:5: E301 expected 1 blank line, found 0 scylla-gdb.py:407:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:410:5: E301 expected 1 blank line, found 0 scylla-gdb.py:412:9: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:439:26: E703 statement ends with a semicolon scylla-gdb.py:462:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:500:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:506:5: E722 do not use bare 'except' scylla-gdb.py:516:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:518:18: E271 multiple spaces after keyword scylla-gdb.py:522:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:530:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:533:5: E301 expected 1 blank line, found 0 scylla-gdb.py:537:13: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:547:9: E722 do not use bare 'except' scylla-gdb.py:550:26: E261 at least two spaces before inline comment scylla-gdb.py:568:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:571:5: E301 expected 1 blank line, found 0 scylla-gdb.py:577:13: E128 continuation line under-indented for visual indent scylla-gdb.py:577:39: E226 missing whitespace around arithmetic operator scylla-gdb.py:583:15: E128 continuation line under-indented for visual indent scylla-gdb.py:596:19: E128 continuation line under-indented for visual indent scylla-gdb.py:609:82: E227 missing whitespace around bitwise or shift operator scylla-gdb.py:609:90: E226 missing whitespace around arithmetic operator scylla-gdb.py:609:113: E226 missing whitespace around 
arithmetic operator scylla-gdb.py:613:1: E303 too many blank lines (3) scylla-gdb.py:645:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:659:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:671:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:678:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:679:9: E128 continuation line under-indented for visual indent scylla-gdb.py:680:9: E128 continuation line under-indented for visual indent scylla-gdb.py:681:9: E128 continuation line under-indented for visual indent scylla-gdb.py:682:9: E128 continuation line under-indented for visual indent scylla-gdb.py:708:12: E111 indentation is not a multiple of four scylla-gdb.py:721:13: E128 continuation line under-indented for visual indent scylla-gdb.py:723:13: E128 continuation line under-indented for visual indent scylla-gdb.py:725:13: E128 continuation line under-indented for visual indent scylla-gdb.py:727:13: E128 continuation line under-indented for visual indent scylla-gdb.py:729:13: E128 continuation line under-indented for visual indent scylla-gdb.py:748:33: E261 at least two spaces before inline comment scylla-gdb.py:770:17: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:795:17: E128 continuation line under-indented for visual indent scylla-gdb.py:796:17: E128 continuation line under-indented for visual indent scylla-gdb.py:797:17: E128 continuation line under-indented for visual indent scylla-gdb.py:798:17: E128 continuation line under-indented for visual indent scylla-gdb.py:800:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:807:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:814:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:820:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:823:5: E301 expected 1 blank line, found 0 scylla-gdb.py:845:35: E703 statement ends with a semicolon scylla-gdb.py:865:91: E703 statement ends with a semicolon scylla-gdb.py:896:9: F841 local variable 'segment_size' is 
assigned to but never used scylla-gdb.py:904:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:907:5: E301 expected 1 blank line, found 0 scylla-gdb.py:915:73: E128 continuation line under-indented for visual indent scylla-gdb.py:916:73: E128 continuation line under-indented for visual indent scylla-gdb.py:917:73: E126 continuation line over-indented for hanging indent scylla-gdb.py:922:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:925:5: E301 expected 1 blank line, found 0 scylla-gdb.py:933:13: E128 continuation line under-indented for visual indent scylla-gdb.py:934:13: E128 continuation line under-indented for visual indent scylla-gdb.py:934:49: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:51: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:74: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:76: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:940:13: E128 continuation line under-indented for visual indent scylla-gdb.py:941:13: E128 continuation line under-indented for visual indent scylla-gdb.py:949:17: E128 continuation line under-indented for visual indent scylla-gdb.py:950:17: E128 continuation line under-indented for visual indent scylla-gdb.py:951:17: E128 continuation line under-indented for visual indent scylla-gdb.py:952:21: E128 continuation line under-indented for visual indent scylla-gdb.py:953:21: E128 continuation line under-indented for visual indent scylla-gdb.py:954:21: E128 continuation line under-indented for visual indent scylla-gdb.py:955:21: E128 continuation line under-indented for visual indent scylla-gdb.py:958:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:958:11: E261 at least two spaces before inline comment scylla-gdb.py:959:1: E302 expected 2 blank lines, found 0 scylla-gdb.py:971:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:989:5: E301 expected 1 blank line, found 0 
scylla-gdb.py:993:5: E301 expected 1 blank line, found 0 scylla-gdb.py:995:5: E301 expected 1 blank line, found 0 scylla-gdb.py:997:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1001:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1005:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1029:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1034:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1037:46: E128 continuation line under-indented for visual indent scylla-gdb.py:1057:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1060:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1071:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1076:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1084:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1093:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1096:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1101:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1104:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1116:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1119:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1123:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1126:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1132:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1135:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1138:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1141:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1147:15: E241 multiple spaces after ':' scylla-gdb.py:1148:15: E241 multiple spaces after ':' scylla-gdb.py:1149:15: E241 multiple spaces after ':' scylla-gdb.py:1150:15: E241 multiple spaces after ':' scylla-gdb.py:1151:15: E241 multiple spaces after ':' scylla-gdb.py:1152:15: E241 multiple spaces after ':' scylla-gdb.py:1153:15: E241 multiple spaces after ':' scylla-gdb.py:1154:15: E241 multiple spaces after ':' scylla-gdb.py:1170:20: E221 multiple spaces before operator scylla-gdb.py:1191:40: E226 
missing whitespace around arithmetic operator scylla-gdb.py:1191:59: E226 missing whitespace around arithmetic operator scylla-gdb.py:1225:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:1227:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1233:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1236:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1240:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1278:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1281:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1284:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1287:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1293:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1296:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1320:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1323:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1355:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1362:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1383:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1386:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1388:9: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:1397:13: F841 local variable 'selector' is assigned to but never used scylla-gdb.py:1446:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1477:5: E301 expected 1 blank line, found 0 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104113603.1111-1-ultrabug@gentoo.org>	2018-11-14 19:25:14 +02:00
Alexys Jacob	e58eb6d6ab	idl-compiler.py: coding style fixes idl-compiler.py:22:1: F401 'json' imported but unused idl-compiler.py:23:1: F401 'sys' imported but unused idl-compiler.py:24:1: F401 're' imported but unused idl-compiler.py:25:1: F401 'glob' imported but unused idl-compiler.py:27:1: F401 'os' imported but unused idl-compiler.py:54:1: F811 redefinition of unused 'reindent' from line 33 idl-compiler.py:57:1: E302 expected 2 blank lines, found 1 idl-compiler.py:61:1: E302 expected 2 blank lines, found 1 idl-compiler.py:66:1: E302 expected 2 blank lines, found 1 idl-compiler.py:96:1: E302 expected 2 blank lines, found 1 idl-compiler.py:160:1: E302 expected 2 blank lines, found 1 idl-compiler.py:163:1: E302 expected 2 blank lines, found 1 idl-compiler.py:166:1: E302 expected 2 blank lines, found 1 idl-compiler.py:172:1: E302 expected 2 blank lines, found 1 idl-compiler.py:176:1: E302 expected 2 blank lines, found 1 idl-compiler.py:176:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:176:49: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:191:24: E203 whitespace before ':' idl-compiler.py:191:43: E203 whitespace before ':' idl-compiler.py:191:67: E203 whitespace before ':' idl-compiler.py:191:84: E202 whitespace before '}' idl-compiler.py:195:1: E302 expected 2 blank lines, found 1 idl-compiler.py:195:45: E203 whitespace before ',' idl-compiler.py:195:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:195:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:198:28: E225 missing whitespace around operator idl-compiler.py:198:40: E225 missing whitespace around operator idl-compiler.py:198:43: E272 multiple spaces before keyword idl-compiler.py:212:25: E203 whitespace before ':' idl-compiler.py:212:45: E203 whitespace before ':' idl-compiler.py:212:100: E203 whitespace before ':' idl-compiler.py:218:1: E302 expected 2 blank lines, found 1 idl-compiler.py:225:1: E302 expected 2 blank lines, found 1 idl-compiler.py:226:11: E271 multiple spaces after keyword idl-compiler.py:228:1: E302 expected 2 blank lines, found 1 idl-compiler.py:235:1: E302 expected 2 blank lines, found 1 idl-compiler.py:238:1: E302 expected 2 blank lines, found 1 idl-compiler.py:241:5: E722 do not use bare 'except' idl-compiler.py:243:1: E305 expected 2 blank lines after class or function definition, found 0 idl-compiler.py:245:1: E302 expected 2 blank lines, found 1 idl-compiler.py:250:25: E231 missing whitespace after ',' idl-compiler.py:253:1: E302 expected 2 blank lines, found 1 idl-compiler.py:256:1: E302 expected 2 blank lines, found 1 idl-compiler.py:263:1: E302 expected 2 blank lines, found 1 idl-compiler.py:266:1: E302 expected 2 blank lines, found 1 idl-compiler.py:267:75: E225 missing whitespace around operator idl-compiler.py:269:1: E302 expected 2 blank lines, found 1 idl-compiler.py:272:1: E302 expected 2 blank lines, found 1 idl-compiler.py:275:1: E302 expected 2 blank lines, found 1 idl-compiler.py:278:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:280:1: E302 expected 2 blank lines, found 1 idl-compiler.py:283:1: E302 expected 2 blank lines, found 1 idl-compiler.py:286:1: E302 expected 2 blank lines, found 1 idl-compiler.py:288:1: E302 expected 2 blank lines, found 0 idl-compiler.py:293:1: E302 expected 2 blank lines, found 1 idl-compiler.py:294:20: E203 whitespace before ':' idl-compiler.py:294:22: E241 multiple spaces after ':' idl-compiler.py:294:51: E203 whitespace before ':' idl-compiler.py:294:55: E202 whitespace before '}' idl-compiler.py:296:1: E302 expected 2 blank lines, found 1 idl-compiler.py:298:23: E203 whitespace before ':' idl-compiler.py:300:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:301:1: E302 expected 2 blank lines, found 0 idl-compiler.py:304:1: E302 expected 2 blank lines, found 1 idl-compiler.py:304:45: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:304:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:311:67: E202 whitespace before '}' idl-compiler.py:314:74: E241 multiple spaces after ':' idl-compiler.py:316:114: E241 multiple spaces after ':' idl-compiler.py:316:129: E203 whitespace before ':' idl-compiler.py:326:1: E302 expected 2 blank lines, found 1 idl-compiler.py:328:27: E231 missing whitespace after ',' idl-compiler.py:328:34: E225 missing whitespace around operator idl-compiler.py:330:1: E302 expected 2 blank lines, found 1 idl-compiler.py:332:5: F841 local variable 'typ' is assigned to but never used idl-compiler.py:348:63: E202 whitespace before '}' idl-compiler.py:352:1: E302 expected 2 blank lines, found 1 idl-compiler.py:353:21: E231 missing whitespace after ',' idl-compiler.py:368:30: E203 whitespace before ':' idl-compiler.py:374:30: E203 whitespace before ':' idl-compiler.py:411:57: E203 whitespace before ':' idl-compiler.py:413:1: E302 expected 2 blank lines, found 1 idl-compiler.py:413:64: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:66: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:80: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:82: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:98: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:100: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:415:51: E225 missing whitespace around operator idl-compiler.py:417:57: E225 missing whitespace around operator idl-compiler.py:448:1: E302 expected 2 blank lines, found 1 idl-compiler.py:448:60: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:62: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:76: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:78: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:94: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:96: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:451:51: E225 missing whitespace around operator idl-compiler.py:453:57: E225 missing whitespace around operator idl-compiler.py:455:30: E231 missing whitespace after ',' idl-compiler.py:477:1: E302 expected 2 blank lines, found 1 idl-compiler.py:477:48: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:50: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:67: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:484:24: E222 multiple spaces after operator idl-compiler.py:488:74: E203 whitespace before ':' idl-compiler.py:498:20: E222 multiple spaces after operator idl-compiler.py:507:68: E203 whitespace before ':' idl-compiler.py:507:88: E203 whitespace before ':' idl-compiler.py:514:87: E231 missing whitespace after ',' idl-compiler.py:520:14: E211 whitespace before '(' idl-compiler.py:521:15: E703 statement ends with a semicolon idl-compiler.py:523:1: E302 expected 2 blank lines, found 1 idl-compiler.py:540:47: E231 missing whitespace after ':' idl-compiler.py:542:1: E302 expected 2 blank lines, found 1 idl-compiler.py:542:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:49: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:547:24: E222 multiple spaces after operator idl-compiler.py:553:47: E231 missing whitespace after ':' idl-compiler.py:558:43: E231 missing whitespace after ':' idl-compiler.py:560:1: E302 expected 2 blank lines, found 1 idl-compiler.py:564:1: E302 expected 2 blank lines, found 1 idl-compiler.py:564:82: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:84: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:105: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:107: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:573:21: E222 multiple spaces after operator idl-compiler.py:576:25: E222 multiple spaces after operator idl-compiler.py:577:13: F841 local variable 'sate' is assigned to but never used idl-compiler.py:584:66: E203 whitespace before ':' idl-compiler.py:589:66: E203 whitespace before ':' idl-compiler.py:589:89: E203 whitespace before ':' idl-compiler.py:589:113: E203 whitespace before ':' idl-compiler.py:600:48: E203 whitespace before ':' idl-compiler.py:600:68: E203 whitespace before ':' idl-compiler.py:602:1: E302 expected 2 blank lines, found 1 idl-compiler.py:602:1: F811 redefinition of unused 'add_vector_node' from line 330 idl-compiler.py:604:38: E231 missing whitespace after ',' idl-compiler.py:604:59: E202 whitespace before ')' idl-compiler.py:607:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:609:1: E302 expected 2 blank lines, found 1 idl-compiler.py:615:39: E231 missing whitespace after ',' idl-compiler.py:622:1: E302 expected 2 blank lines, found 1 idl-compiler.py:630:46: E203 whitespace before ':' idl-compiler.py:637:33: E231 missing whitespace after ':' idl-compiler.py:640:90: E203 whitespace before ':' idl-compiler.py:641:13: F841 local variable 'vr' is assigned to but never used idl-compiler.py:642:1: E305 expected 2 blank lines after class or function definition, found 0 idl-compiler.py:644:1: E302 expected 2 blank lines, found 1 idl-compiler.py:657:1: E302 expected 2 blank lines, found 1 idl-compiler.py:657:51: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:657:53: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:657:67: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:657:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:660:5: E265 block comment should start with '# ' idl-compiler.py:679:16: E272 multiple spaces before keyword idl-compiler.py:692:56: E271 multiple spaces after keyword idl-compiler.py:695:5: F841 local variable 'is_param_vector' is assigned to but never used idl-compiler.py:699:1: E302 expected 2 blank lines, found 1 idl-compiler.py:699:56: E202 whitespace before ')' idl-compiler.py:711:1: E302 expected 2 blank lines, found 1 idl-compiler.py:719:26: E201 whitespace after '{' idl-compiler.py:730:39: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:730:41: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:733:1: E302 expected 2 blank lines, found 1 idl-compiler.py:735:21: E225 missing whitespace around operator idl-compiler.py:738:1: E302 expected 2 blank lines, found 1 idl-compiler.py:747:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:749:1: E302 expected 2 blank lines, found 1 idl-compiler.py:767:17: E211 whitespace before '(' idl-compiler.py:767:26: E203 whitespace before ':' idl-compiler.py:770:5: E303 too many blank lines (2) idl-compiler.py:777:20: E211 whitespace before '(' idl-compiler.py:777:29: E203 whitespace before ':' idl-compiler.py:783:28: E203 whitespace before ':' idl-compiler.py:783:44: E203 whitespace before ':' idl-compiler.py:783:82: E203 whitespace before ':' idl-compiler.py:786:1: E302 expected 2 blank lines, found 1 idl-compiler.py:794:28: E203 whitespace before ':' idl-compiler.py:802:33: E203 whitespace before ':' idl-compiler.py:815:21: E126 continuation line over-indented for hanging indent idl-compiler.py:815:28: E203 whitespace before ':' idl-compiler.py:815:50: E203 whitespace before ':' idl-compiler.py:817:82: E203 whitespace before ':' idl-compiler.py:817:104: E203 whitespace before ':' idl-compiler.py:827:33: E203 whitespace before ':' idl-compiler.py:827:48: E203 whitespace before ':' idl-compiler.py:827:68: E203 whitespace before ':' idl-compiler.py:827:84: E203 whitespace before ':' idl-compiler.py:827:100: E203 whitespace before ':' idl-compiler.py:859:24: E203 whitespace before ':' idl-compiler.py:859:58: E203 whitespace before ':' idl-compiler.py:859:78: E203 whitespace before ':' idl-compiler.py:861:1: E302 expected 2 blank lines, found 1 idl-compiler.py:865:1: E302 expected 2 blank lines, found 1 idl-compiler.py:876:1: E302 expected 2 blank lines, found 1 idl-compiler.py:876:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:876:73: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:883:21: E222 multiple spaces after operator idl-compiler.py:884:28: E225 missing whitespace around operator idl-compiler.py:884:46: E225 missing whitespace around operator idl-compiler.py:884:49: E272 multiple spaces before keyword idl-compiler.py:904:86: E203 whitespace before ':' idl-compiler.py:904:107: E203 whitespace before ':' idl-compiler.py:906:81: E203 whitespace before ':' idl-compiler.py:906:106: E203 whitespace before ':' idl-compiler.py:906:124: E203 whitespace before ':' idl-compiler.py:906:143: E203 whitespace before ':' idl-compiler.py:911:49: E203 whitespace before ':' idl-compiler.py:911:69: E203 whitespace before ':' idl-compiler.py:911:93: E203 whitespace before ':' idl-compiler.py:918:85: E203 whitespace before ':' idl-compiler.py:918:108: E203 whitespace before ':' idl-compiler.py:918:151: E203 whitespace before ':' idl-compiler.py:922:62: E203 whitespace before ':' idl-compiler.py:922:90: E203 whitespace before ':' idl-compiler.py:925:82: E203 whitespace before ':' idl-compiler.py:925:110: E203 whitespace before ':' idl-compiler.py:940:70: E203 whitespace before ':' idl-compiler.py:940:128: E203 whitespace before ':' idl-compiler.py:942:110: E203 whitespace before ':' idl-compiler.py:942:168: E203 whitespace before ':' idl-compiler.py:948:25: E203 whitespace before ':' idl-compiler.py:948:75: E203 whitespace before ':' idl-compiler.py:954:78: E203 whitespace before ':' idl-compiler.py:954:101: E203 whitespace before ':' idl-compiler.py:954:144: E203 whitespace before ':' idl-compiler.py:957:62: E203 whitespace before ':' idl-compiler.py:957:90: E203 whitespace before ':' idl-compiler.py:969:13: E271 multiple spaces after keyword idl-compiler.py:971:13: E271 multiple spaces after keyword idl-compiler.py:976:1: E302 expected 2 blank lines, found 1 idl-compiler.py:987:1: E302 expected 2 blank lines, found 1 idl-compiler.py:1016:1: E302 expected 2 blank lines, found 1 idl-compiler.py:1023:42: E225 missing whitespace around operator idl-compiler.py:1024:79: E225 missing whitespace around operator idl-compiler.py:1027:1: E305 expected 2 blank lines after class or function definition, found 0 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104112308.19409-1-ultrabug@gentoo.org>	2018-11-14 19:25:13 +02:00
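The warning list in the idl-compiler.py commit is flake8's default `path:row:col: CODE text` output flattened onto a single line. A minimal sketch for tallying such a flattened report by error code, assuming only that output format (the helper `tally_flake8_codes` is hypothetical and not part of the Scylla tree):

```python
import re
from collections import Counter

# flake8's default format is "path:row:col: CODE text". In a flattened
# report the entries are separated only by spaces, so match each
# "<file>.py:<row>:<col>: <CODE>" prefix and keep just the code.
_ENTRY = re.compile(r"\S+\.py:\d+:\d+: ([A-Z]\d{3})\b")


def tally_flake8_codes(report: str) -> Counter:
    """Count how often each flake8 code (E302, F401, ...) appears."""
    return Counter(_ENTRY.findall(report))


if __name__ == "__main__":
    sample = ("idl-compiler.py:22:1: F401 'json' imported but unused "
              "idl-compiler.py:57:1: E302 expected 2 blank lines, found 1 "
              "idl-compiler.py:61:1: E302 expected 2 blank lines, found 1")
    for code, n in tally_flake8_codes(sample).most_common():
        print(code, n)
```

Because `\S+` cannot cross a space, each match stays within one warning entry even though the whole report sits on one line.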
Alexys Jacob	0cf480aad0	gen_segmented_compress_params.py: coding style fixes gen_segmented_compress_params.py:52:47: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:56:64: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:60:36: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:60:48: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:70:35: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:70:48: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:99:43: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:106:18: E225 missing whitespace around operator gen_segmented_compress_params.py:120:5: E303 too many blank lines (2) gen_segmented_compress_params.py:200:30: E261 at least two spaces before inline comment gen_segmented_compress_params.py:200:31: E262 inline comment should start with '# ' gen_segmented_compress_params.py:218:76: E261 at least two spaces before inline comment gen_segmented_compress_params.py:219:59: E703 statement ends with a semicolon gen_segmented_compress_params.py:219:60: E261 at least two spaces before inline comment Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104115753.4701-1-ultrabug@gentoo.org>	2018-11-14 19:25:12 +02:00
Alexys Jacob	43a04ad693	fix_system_distributed_tables.py: coding style fixes fix_system_distributed_tables.py:28:20: E203 whitespace before ':' fix_system_distributed_tables.py:29:20: E203 whitespace before ':' fix_system_distributed_tables.py:30:20: E203 whitespace before ':' fix_system_distributed_tables.py:31:20: E203 whitespace before ':' fix_system_distributed_tables.py:33:20: E203 whitespace before ':' fix_system_distributed_tables.py:34:23: E203 whitespace before ':' fix_system_distributed_tables.py:35:23: E203 whitespace before ':' fix_system_distributed_tables.py:39:20: E203 whitespace before ':' fix_system_distributed_tables.py:40:20: E203 whitespace before ':' fix_system_distributed_tables.py:41:20: E203 whitespace before ':' fix_system_distributed_tables.py:42:20: E203 whitespace before ':' fix_system_distributed_tables.py:43:20: E203 whitespace before ':' fix_system_distributed_tables.py:44:20: E203 whitespace before ':' fix_system_distributed_tables.py:45:20: E203 whitespace before ':' fix_system_distributed_tables.py:46:20: E203 whitespace before ':' fix_system_distributed_tables.py:47:20: E203 whitespace before ':' fix_system_distributed_tables.py:48:20: E203 whitespace before ':' fix_system_distributed_tables.py:52:20: E203 whitespace before ':' fix_system_distributed_tables.py:53:20: E203 whitespace before ':' fix_system_distributed_tables.py:54:20: E203 whitespace before ':' fix_system_distributed_tables.py:55:20: E203 whitespace before ':' fix_system_distributed_tables.py:56:20: E203 whitespace before ':' fix_system_distributed_tables.py:57:20: E203 whitespace before ':' fix_system_distributed_tables.py:58:20: E203 whitespace before ':' fix_system_distributed_tables.py:59:20: E203 whitespace before ':' fix_system_distributed_tables.py:60:20: E203 whitespace before ':' fix_system_distributed_tables.py:61:20: E203 whitespace before ':' fix_system_distributed_tables.py:62:20: E203 whitespace before ':' fix_system_distributed_tables.py:66:19: E203 whitespace before ':' fix_system_distributed_tables.py:67:19: E203 whitespace before ':' fix_system_distributed_tables.py:72:19: E203 whitespace before ':' fix_system_distributed_tables.py:73:19: E203 whitespace before ':' fix_system_distributed_tables.py:74:19: E203 whitespace before ':' fix_system_distributed_tables.py:78:19: E203 whitespace before ':' fix_system_distributed_tables.py:79:19: E203 whitespace before ':' fix_system_distributed_tables.py:80:19: E203 whitespace before ':' fix_system_distributed_tables.py:84:19: E203 whitespace before ':' fix_system_distributed_tables.py:85:19: E203 whitespace before ':' fix_system_distributed_tables.py:89:19: E203 whitespace before ':' fix_system_distributed_tables.py:90:19: E203 whitespace before ':' fix_system_distributed_tables.py:91:19: E203 whitespace before ':' fix_system_distributed_tables.py:95:22: E203 whitespace before ':' fix_system_distributed_tables.py:96:22: E203 whitespace before ':' fix_system_distributed_tables.py:99:1: E302 expected 2 blank lines, found 0 fix_system_distributed_tables.py:103:72: E201 whitespace after '[' fix_system_distributed_tables.py:103:82: E202 whitespace before ']' fix_system_distributed_tables.py:105:43: E201 whitespace after '[' fix_system_distributed_tables.py:105:53: E202 whitespace before ']' fix_system_distributed_tables.py:111:16: E713 test for membership should be 'not in' fix_system_distributed_tables.py:118:20: E713 test for membership should be 'not in' fix_system_distributed_tables.py:135:25: E722 do not use bare 'except' fix_system_distributed_tables.py:138:5: E722 do not use bare 'except' fix_system_distributed_tables.py:144:1: E305 expected 2 blank lines after class or function definition, found 0 fix_system_distributed_tables.py:145:47: E251 unexpected spaces around keyword / parameter equals fix_system_distributed_tables.py:145:49: E251 unexpected spaces around keyword / parameter equals fix_system_distributed_tables.py:160:1: W391 blank line at end of file Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104113001.22783-1-ultrabug@gentoo.org>	2018-11-14 19:25:12 +02:00
Alexys Jacob	c9e3b739ae	dist/docker/redhat/scyllasetup.py: coding style fixes dist/docker/redhat/scyllasetup.py:6:1: E302 expected 2 blank lines, found 1 dist/docker/redhat/scyllasetup.py:41:21: E128 continuation line under-indented for visual indent dist/docker/redhat/scyllasetup.py:65:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:65:51: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:67:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:67:45: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:69:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:69:42: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:79:18: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:79:42: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:80:39: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:81:70: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:84:48: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:84:70: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:86:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:86:53: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:86:78: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:89:42: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:89:58: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:92:44: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:92:63: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:95:41: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:95:57: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:98:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:98:42: E202 whitespace before ']' Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104110913.13796-1-ultrabug@gentoo.org>	2018-11-14 19:25:11 +02:00
Alexys Jacob	1585983fc9	dist/docker/redhat: coding style fixes dist/docker/redhat/docker-entrypoint.py:20:1: E722 do not use bare 'except' dist/docker/redhat/commandlineparser.py:13:13: E128 continuation line under-indented for visual indent Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104120134.9598-1-ultrabug@gentoo.org>	2018-11-14 19:25:10 +02:00
Alexys Jacob	c24e0e5599	dist/common/scripts/scylla_util.py: coding style fixes dist/common/scripts/scylla_util.py:388:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:414:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:418:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:453:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:468:5: E722 do not use bare 'except' dist/common/scripts/scylla_util.py:472:1: E302 expected 2 blank lines, found 1 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104120832.11273-1-ultrabug@gentoo.org>	2018-11-14 19:25:09 +02:00
Vladimir Krivopalov	2c21fb4897	Use coloured test results in test.py script output. With the number of unit tests approaching one hundred, the output of test.py becomes more challenging to read. If some test fails, we will only get the details after all the tests complete, but some tests take way longer than others. With the coloured status, it is much simpler to immediately locate failing tests. Developers can cancel the others and rerun the failing ones. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <63a99a2fb70fdc33fd6eeb8e18fee977a47bd278.1541541184.git.vladimir@scylladb.com>	2018-11-14 19:23:39 +02:00
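A common way to implement a coloured pass/fail status is to wrap the label in ANSI SGR escape codes and fall back to plain text when stdout is not a terminal. A minimal sketch, assuming an ANSI-capable terminal; `status_label` and the test names are hypothetical and not taken from test.py:

```python
import sys

# ANSI SGR escape sequences: 32 = green, 31 = red, 0 = reset.
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def status_label(passed: bool, use_colour: bool) -> str:
    """Return PASSED/FAILED, coloured only when use_colour is set."""
    text = "PASSED" if passed else "FAILED"
    if not use_colour:
        return text
    return f"{GREEN if passed else RED}{text}{RESET}"


if __name__ == "__main__":
    # Colour only when writing to a terminal, so redirected logs stay clean.
    colour = sys.stdout.isatty()
    print("boost/example_test", status_label(True, colour))
    print("boost/other_test", status_label(False, colour))
```

Checking `isatty()` keeps escape sequences out of CI logs and files while still giving an interactive developer the at-a-glance status.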
Piotr Sarna	b04508041d	tests: add CONTAINS test case to filtering tests	2018-11-14 16:08:19 +01:00
Piotr Sarna	0fc7d63842	cql3: enable filtering for CONTAINS restriction With contains::is_satisfied_by(bytes_view) implemented, it's possible to enable filtering support for CONTAINS restriction. Fixes #3573	2018-11-14 14:39:21 +01:00
Piotr Sarna	d8a1693d84	cql3: add is_satisfied_by(bytes_view) for CONTAINS is_satisfied_by that takes a bytes_view parameter is needed for filtering, so it's provided for CONTAINS restriction.	2018-11-14 14:39:21 +01:00
Botond Dénes	9e4276669b	flat_mutation_reader: document next_partition() Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <01fa57c7473c00e4dc891527a8628026b6dccc01.1542180913.git.bdenes@scylladb.com>	2018-11-14 13:38:38 +00:00
Avi Kivity	447f953a2c	Merge "Add DEFAULT UNSET support to JSON" from Piotr " This series adds DEFAULT UNSET and DEFAULT NULL keyword support to INSERT JSON statement, as stated in #3909. Tests: unit (release) " * 'add_json_default_unset_2' of https://github.com/psarna/scylla: tests: add DEFAULT UNSET case to JSON cql tests tests: split JSON part of cql query test cql3: add DEFAULT UNSET to INSERT JSON	2018-11-13 09:14:50 -08:00
Piotr Sarna	fc4ecf9be4	tests: add DEFAULT UNSET case to JSON cql tests A case covering DEFAULT UNSET/DEFAULT NULL params is added to json cql query test suite. Refs #3909	2018-11-13 18:06:15 +01:00
Piotr Sarna	cb6fd6a30d	tests: split JSON part of cql query test JSON part of cql query test is split into another file to make cql_query_test.cc less huge.	2018-11-13 18:06:15 +01:00
Piotr Sarna	e153e590c1	cql3: add DEFAULT UNSET to INSERT JSON When inserting a JSON, additional DEFAULT UNSET or DEFAULT NULL keywords can be appended. With DEFAULT UNSET, values omitted in JSON will not be changed at all. With DEFAULT NULL (default), omitted values will be treated as having a 'null' value. Fixes #3909	2018-11-13 18:05:55 +01:00
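The difference between the two modes can be modelled with a toy row stored as a Python dict. This is a hypothetical sketch of the semantics described in the commit, not Scylla's implementation; the function name and the dict-based row are assumptions. In CQL the clause follows the JSON literal, e.g. `INSERT INTO t JSON '{"k": 1}' DEFAULT UNSET;`.

```python
import json
from typing import Any, Dict, List, Optional


def apply_insert_json(existing: Optional[Dict[str, Any]],
                      columns: List[str],
                      json_doc: str,
                      default_unset: bool = False) -> Dict[str, Any]:
    """Return the row after a hypothetical INSERT ... JSON.

    Columns present in the JSON document are always written. Columns
    omitted from it are either left untouched (DEFAULT UNSET) or
    overwritten with null (DEFAULT NULL, the default).
    """
    values = json.loads(json_doc)
    row = dict(existing or {})  # copy, so the caller's row is not mutated
    for col in columns:
        if col in values:
            row[col] = values[col]
        elif not default_unset:
            row[col] = None  # DEFAULT NULL: omitted column becomes null
        # DEFAULT UNSET: omitted column keeps its previous value
    return row


if __name__ == "__main__":
    old = {"k": 1, "a": "x", "b": "y"}
    doc = '{"k": 1, "a": "z"}'
    print(apply_insert_json(old, ["k", "a", "b"], doc))
    print(apply_insert_json(old, ["k", "a", "b"], doc, default_unset=True))
```

With DEFAULT NULL, column `b` ends up null; with DEFAULT UNSET it keeps `"y"`.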
Avi Kivity	a089f66755	Merge "ec2_multi_region_snitch: print a proper error message when a Public IP is not available" from Vlad " Fix for #3897 "Ec2MultiRegionSnitch: prints a cryptic error when a Public IP is not available" Ec2MultiRegionSnitch naturally requires a Public IP to be available and therefore it's expected to refuse to work without it. However the error message that is printed today is a total disaster and has to be fixed ASAP to be something much more human readable. This series adds a human readable preamble that will let a poor user understand what he/she should do. " * 'improve-ec2-multi-region-snitch-error-message-when-pulic-address-is-not-available-v2' of https://github.com/vladzcloudius/scylla: locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread	2018-11-13 09:02:55 -08:00
Duarte Nunes	a38f6078fb	Merge 'Generating view updates during streaming' from Piotr During streaming, there are cases when we should invoke the view write path. In particular, if we're streaming because of repair or if a view has not yet finished building and we're bootstrapping a new node. The design constraints are: 1) The streamed writes should be visible to new writes, but the sstable should not participate in compaction, or we would lose the ability to exclude the streamed writes on a restart; 2) The streamed writes must not be considered when generating view updates for them; 3) Resilient to node restarts; 4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges. We achieve this by writing the streamed writes to an sstable in a different folder, call it "staging". We achieve 1) by publishing the sstable to the column family sstable set, but excluding it from compactions. We do these steps upon boot, by looking at the staging directory, thus achieving 3). Fixes #3275 * 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits) tests: add materialized views test tests: add view update generator to cql test env main: add registering staging sstables read from disk database: add a check if loaded sstable is already staging database: add get_staging_sstable method streaming: stream tables with views through staging sstables streaming: add system distributed keyspace ref to streaming streaming: add view update generator reference to streaming main: add generating missed mv updates from staging sstables storage_service: move initializing sys_dist_ks before bootstrap db/view: add view_update_from_staging_generator service db/view: add view updating consumer table: add stream_view_replica_updates table: split push_view_replica_updates table: add as_mutation_source_excluding table: move push_view_replica_updates to table.cc database: add populating tables with staging sstables database: add creating /staging directory for sstables database: add sstable-excluding reader table: add move_sstable_from_staging_in_thread function ...	2018-11-13 15:16:31 +00:00
Piotr Sarna	1724ee55c7	tests: add materialized views test Right now materialized_views_test.cc contains view updating tests, but the intention is to move mv-related tests from cql_query_test here and use it for all future unit testing of MV.	2018-11-13 15:21:55 +01:00
Piotr Sarna	056a78bbc7	tests: add view update generator to cql test env Keeping view update generator in cql test env enables generating updates from staging sstables in tests.	2018-11-13 15:04:43 +01:00
Piotr Sarna	16c042039c	main: add registering staging sstables read from disk Staging sstables read from disk are registered to the view update generator right after initializing non system keyspaces. Fixes #3275	2018-11-13 15:04:43 +01:00
Piotr Sarna	de43b4f41d	database: add a check if loaded sstable is already staging Staging sstables are loaded before regular ones. If the process fails midway, an sstable can be linked both in the regular directory and in staging directory. In such cases, the sstable remains in staging and will be moved to the regular directory by view update streamer service.	2018-11-13 15:04:43 +01:00
Piotr Sarna	d7849e6ea4	database: add get_staging_sstable method This method can be used to check if sstable is staging, i.e. it shouldn't be compacted and it will not be used for generating view updates from other staging tables, and return proper shared_sstable pointer if it is.	2018-11-13 15:04:43 +01:00
Piotr Sarna	32c0fe8df2	streaming: stream tables with views through staging sstables While streaming to a table with paired views, staging sstables are used. After the table is written to disk, it's used to generate all required view updates. It's also resistant to restarts as it's stored on a hard drive in staging/ directory. Refs #3275	2018-11-13 15:04:42 +01:00
Piotr Sarna	dc74887ff3	streaming: add system distributed keyspace ref to streaming Streaming code needs system distributed keyspace to check if streamed sstables should be staging, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	7ef5e1b685	streaming: add view update generator reference to streaming Streaming code may need view update generator service to generate and send view updates, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	eb0c507a45	main: add generating missed mv updates from staging sstables If any sstables are found in the staging directory, it means that they missed generating view updates, so it's performed now.	2018-11-13 15:01:53 +01:00
Piotr Sarna	ca5dfdffc6	storage_service: move initializing sys_dist_ks before bootstrap Bootstrapping process may need system distributed keyspace to generate view updates, so initializing sys_dist_ks is moved before the bootstrapping process is launched.	2018-11-13 15:01:53 +01:00
Piotr Sarna	fc7267c797	db/view: add view_update_from_staging_generator service A shardable service for generating mv updates after restarts is added.	2018-11-13 15:01:52 +01:00
Piotr Sarna	ed05d91adc	db/view: add view updating consumer This consumer is used to generate and push view replica updates from read mutations.	2018-11-13 14:54:39 +01:00
Piotr Sarna	348fa3b092	table: add stream_view_replica_updates Generating view replica updates during streaming ignores the staging sstable that is used to generate them.	2018-11-13 14:52:22 +01:00
Piotr Sarna	fed9c59eb8	table: split push_view_replica_updates push_view_replica_updates is split in order to allow different mutation source to be provided.	2018-11-13 14:52:22 +01:00
Piotr Sarna	466d780445	table: add as_mutation_source_excluding A variant of table::as_mutation_source that allows excluding a single sstable is added.	2018-11-13 14:52:22 +01:00
Piotr Sarna	c825a17b9d	table: move push_view_replica_updates to table.cc	2018-11-13 14:52:22 +01:00
Piotr Sarna	a17fcb8d94	database: add populating tables with staging sstables After populating tables with regular sstables, the same procedure is performed for staging sstables.	2018-11-13 14:52:22 +01:00
Piotr Sarna	19bf94fa8f	database: add creating /staging directory for sstables staging directory is now created on boot.	2018-11-13 14:52:22 +01:00
Piotr Sarna	e88b85134c	database: add sstable-excluding reader When generating view updates from a staging sstable, this sstable should not be used in the process. Hence, a reader that skips a single sstable is added.	2018-11-13 14:52:22 +01:00
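The idea of a reader that skips one sstable can be illustrated with a toy model in which each sstable is a key-sorted list of (key, payload) pairs and the combined reader is a k-way merge. The `SSTable` class and `make_reader_excluding` are hypothetical; Scylla's real readers are C++ mutation readers that also reconcile overlapping partitions, which this sketch does not attempt.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Toy fragment: (partition key, payload). Real readers stream mutation
# fragments; this only models key-ordered merging across sources.
Fragment = Tuple[str, str]


class SSTable:
    """Hypothetical sstable: a name plus fragments kept sorted by key."""

    def __init__(self, name: str, fragments: List[Fragment]):
        self.name = name
        self.fragments = sorted(fragments)


def make_reader_excluding(sstables: Iterable[SSTable],
                          excluded: SSTable) -> Iterator[Fragment]:
    """K-way merge of every sstable except `excluded`, in key order.

    Identity (`is`) rather than equality is used, so only that exact
    sstable object is skipped.
    """
    sources = [s.fragments for s in sstables if s is not excluded]
    return heapq.merge(*sources)


if __name__ == "__main__":
    regular = SSTable("gen-1", [("k1", "a"), ("k2", "old")])
    staging = SSTable("gen-2", [("k2", "new")])
    print(list(make_reader_excluding([regular, staging], staging)))
```

Reading the base table this way when generating view updates for a staging sstable means the staging sstable's own writes are not considered, matching constraint 2) of the streaming design.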
Avi Kivity	a8203ca799	Update seastar submodule * seastar c02150e...a44cedf (5): > build: link against libatomic > dns.cc: Include name/address in resolver error messages > log: Print full error message for std::system_error > tests: test-utils: Add missing include > fstream: Introduce make_file_data_sink() Fixes #3894.	2018-11-13 03:28:16 -08:00
Piotr Sarna	160a6d58d2	table: add move_sstable_from_staging_in_thread function After materialized view updates are generated, the sstable should be moved from staging/ to a regular directory. It's expected to be called from seastar::async thread context.	2018-11-13 11:45:30 +01:00
Piotr Sarna	ff361ca877	sstables: add move_to_new_dir_in_thread function When moving sstables between directories, this helper function will create links and update generation and dir accordingly. It's expected to be called in thread context.	2018-11-13 11:45:30 +01:00
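Moving an sstable by creating links first and removing the old names afterwards can be sketched with POSIX hard links. This is a hypothetical illustration, not Scylla's `move_to_new_dir_in_thread`: it keeps component names unchanged (the real helper also updates the generation), omits directory fsyncs, and the `mc-1-big-*` file names are examples only.

```python
import os
import tempfile
from typing import List


def move_to_new_dir(components: List[str], old_dir: str,
                    new_dir: str) -> List[str]:
    """Move sstable component files from old_dir to new_dir.

    All components are hard-linked into new_dir before any old name is
    removed, so an interruption leaves a complete copy of the sstable
    in at least one directory. Directory fsyncs are omitted here.
    """
    os.makedirs(new_dir, exist_ok=True)
    new_paths = []
    for name in components:
        dst = os.path.join(new_dir, name)
        os.link(os.path.join(old_dir, name), dst)  # raises if dst exists
        new_paths.append(dst)
    for name in components:
        os.unlink(os.path.join(old_dir, name))
    return new_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        staging = os.path.join(table_dir, "staging")
        os.makedirs(staging)
        names = ["mc-1-big-Data.db", "mc-1-big-Index.db"]  # example names
        for n in names:
            with open(os.path.join(staging, n), "w") as f:
                f.write(n)
        print(move_to_new_dir(names, staging, table_dir))
        print(os.listdir(staging))
```

An interruption between the link and unlink phases leaves the sstable linked in both directories, which is exactly the case the "check if loaded sstable is already staging" commit handles on restart.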
Piotr Sarna	b7977f4790	sstables: add staging directory to regex datadir/staging directory becomes a valid path for an sstable.	2018-11-13 11:45:30 +01:00
Piotr Sarna	e42d97060f	database: provide nonfrozen version of push_view_replica_updates Now it's also possible to pass a mutation to push to view replicas.	2018-11-13 11:45:30 +01:00
Piotr Sarna	642c3ae0e0	database: add subdir param to make_streaming_sstable_for_write This function allows specifying a subfolder to put a newly created sstable in - e.g. staging/ subfolder for streamed base table mutations.	2018-11-13 11:45:30 +01:00
Piotr Sarna	788e03433c	table: init table.cc file This file will be used to move table-related functions to it.	2018-11-13 11:45:30 +01:00
Piotr Sarna	8e053f9efb	database: add staging sstables to a map SSTables that belong to staging/ directory are put in the _sstables_staging map.	2018-11-13 11:45:30 +01:00
Piotr Sarna	3970808294	sstables: add is_staging() method This method returns true if the last part of directory structure is /staging.	2018-11-13 11:45:30 +01:00
Piotr Sarna	3f34312aa6	database: skip staging sstables in compaction Staging sstables are not part of the compaction process to ensure that each sstable can be easily excluded from the view generation process that depends on the mentioned sstable.	2018-11-13 11:45:30 +01:00
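The `is_staging()` check and the compaction exclusion can be sketched together. `SSTableRef`, `compaction_candidates`, and the `/var/lib/scylla/data/...` path are hypothetical stand-ins; only the rule "the last directory component is `staging`" comes from the commit.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SSTableRef:
    """Hypothetical stand-in for an sstable: a generation and its directory."""
    generation: int
    directory: str


def is_staging(sstable_dir: str) -> bool:
    """True if the last component of the directory path is 'staging'."""
    # normpath strips a trailing slash, so ".../staging/" also matches.
    return os.path.basename(os.path.normpath(sstable_dir)) == "staging"


def compaction_candidates(sstables: Iterable[SSTableRef]) -> List[SSTableRef]:
    """Leave staging sstables out of compaction so each one can later be
    excluded, on its own, when generating the view updates it carries."""
    return [s for s in sstables if not is_staging(s.directory)]


if __name__ == "__main__":
    base = "/var/lib/scylla/data/ks/t"  # example path only
    tables = [SSTableRef(1, base), SSTableRef(2, base + "/staging")]
    print(compaction_candidates(tables))
```

If a staging sstable were compacted together with regular ones, its rows would be merged into an output sstable and could no longer be excluded as a unit.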
Piotr Sarna	701d88e39f	database: add staging sstables map In order to keep track of staging sstables (used for mv updates), a map of them is now kept in table class.	2018-11-13 11:45:30 +01:00
Paweł Dziepak	6469a1b451	Merge "Write static rows for all partitions if there are static columns" from Vladimir " It appears that when there are any static columns in the serialization header, Cassandra writes a (possibly empty) static row to every partition in the SSTables file. This patchset aligns Scylla's logic with that of Cassandra. Note that Cassandra optimizes the case when no partition contains a static row because it keeps track of updated columns, which Scylla currently does not do - see #3901 for details. Fixes #3900. " * 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla: tests: Test writing empty static rows for partitions in tables with static columns. sstables: Ignore empty static rows on reading. sstables: Write empty static rows when there are static columns in the table.	2018-11-09 12:01:25 -08:00
Raphael S. Carvalho	1c5934c934	sstables: fix procedure to get fully expired sstables with MC format MC format lacks ancestors metadata, so we need to work around it by using the ancestors in the metadata collector, which are only available for an sstable written during this instance. It works fine here because we only want to know if a recently compacted sstable has an ancestor which wasn't yet deleted. Fixes #3852. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>	2018-11-06 09:28:37 +02:00
Vladimir Krivopalov	69b453fb69	tests: Test writing empty static rows for partitions in tables with static columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:47:30 -08:00
Vladimir Krivopalov	f767dfbb33	sstables: Ignore empty static rows on reading. Fixes #3900. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:47:30 -08:00
Vladimir Krivopalov	89051d37e3	sstables: Write empty static rows when there are static columns in the table. This is consistent with what Cassandra does. Fixes #3900. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:28:50 -08:00
Vladimir Krivopalov	2ebab69ce7	mutation_source_test: Use counter and collection columns in static rows. They are legal and should be covered along with atomic columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <a1c0e0f8c0c0f12b68af6df426370511f4e1253b.1541106233.git.vladimir@scylladb.com> [tgrabiec: fixed the patch title]	2018-11-02 10:33:27 +01:00
Vlad Zolotarov	2636395c65	locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved Public IP is required for Ec2MultiRegionSnitch. If it's not available, a different snitch should be used. With this patch, a readable error message is printed instead of just a cryptic message containing the HTTP response body. Fixes #3897 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-11-01 11:50:58 -04:00
Vlad Zolotarov	c462af5549	locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread Rework ec2_multi_region_snitch::start() on top of seastar::async() in order to simplify the code. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-11-01 10:48:37 -04:00
Paweł Dziepak	1129134a4a	Merge "Convert sprint() calls to fmt" from Avi " The update to libfmt 5.2.1 brought with it a subtle change - calls to sprint("%s", 3) now throw a format_error instead of returning "3". To prevent such hidden (or not so hidden) bugs from lurking, convert all calls to the modern fmt syntax. Such conversion has several benefits: - prevent the bug from biting us - as fmt is being standardized, we can later move to std::format() - commonality with the logger format syntax (indeed, we may move the logger to use libfmt itself) During the conversion, some bugs were caught and fixed. These are presented in individual patches in the patchset. Most of the conversion was scripted, using https://github.com/avikivity/unsprint. Some sprint() calls remain, as they were too complex for the script. They will be converted later. " * tag 'fmt-1/v1' of https://github.com/avikivity/scylla: toplevel: convert sprint() to format() repair: convert sprint() to format() tests: convert sprint() to format() tracing: convert sprint() to format() service: convert sprint() to format() exceptions: convert sprint() to format() index: convert sprint() to format() streaming: convert sprint() to format() streaming: progress_info: fix format string api: convert sprint() to format() dht: convert sprint() to format() thrift: convert sprint() to format() locator: convert sprint() to format() gms: convert sprint() to format() db: convert sprint() to format() transport: convert sprint() to format() utils: convert sprint() to format() sstables: convert sprint() to format() auth: convert sprint() to format() cql3: convert sprint() to format() row_cache: fix bad format string syntax repair: fix bad format string syntax tests: fix bad format string syntax dht: fix bad format string syntax sstables: fix bad format string syntax utils: estimated_histogram: convert generated format strings to fmt tests: perf_fast_forward: rename "format" variable tests: perf_fast_forward: massage result of sprint() into std::string utils: i_filter: rename "format" variable system_keyspace: simplify complicated sprint() cql: convert Cql.g sprint()s to fmt types: get rid of PRId64 formatting	2018-11-01 13:16:17 +00:00
Avi Kivity	a71ab365e3	toplevel: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	51ce53738f	repair: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	f70ece9f88	tests: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	239ecec043	tracing: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	bb0eb9dae8	service: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	71fc5fb738	exceptions: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	7ae23d8f9b	index: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	fd513c42ad	streaming: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	8501e2a45d	streaming: progress_info: fix format string We try to escape % as \%, but the correct escape is %%.	2018-11-01 13:16:17 +00:00
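A minimal illustration of the printf-style escape rule applied by this commit. It uses the C standard `std::snprintf` rather than Scylla's `sprint()`, and the helper name `percent_string` is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render a progress value such as "12.50%".
// In printf-style format strings a literal percent sign is spelled "%%".
// "\%" does not work: it is not a standard C++ escape sequence, and at best
// the compiler passes a bare '%' through, which the formatter then treats
// as the start of a conversion specification.
std::string percent_string(double pct) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.2f%%", pct);
    return std::string(buf);
}
```

For example, `percent_string(12.5)` yields `"12.50%"`.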
Avi Kivity	da17c29bd3	api: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	82818758ca	dht: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	7a125c6634	thrift: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	0c33d13165	locator: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	e096fa2fde	gms: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	d77e044cde	db: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	5f79ff0f54	transport: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	be99101f36	utils: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	455f00e993	sstables: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	eb74fe784d	auth: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	cb7ee5c765	cql3: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	8cca3b2879	row_cache: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	6488b017c3	repair: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	bceff1550c	tests: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	7ff5569ee8	dht: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	738e713edf	sstables: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	3cf434b863	utils: estimated_histogram: convert generated format strings to fmt Convert printf games to format games. Note that fmt supports specifying the field width as an argument, but that is left to a dedicated change.	2018-11-01 13:16:17 +00:00
Avi Kivity	8ca4b7abea	tests: perf_fast_forward: rename "format" variable The format local variable will soon alias with the format function which we intend to use in the same context. Rename it away to avoid a clash.	2018-11-01 13:16:17 +00:00
Avi Kivity	7908f09148	tests: perf_fast_forward: massage result of sprint() into std::string sprint() returns std::string, but the new format() returns an sstring. Usually an sstring is wanted, but in this case an sstring will fail as it is added to an std::string. Fix the failure (after the sprint->format conversion) by converting to an std::string.	2018-11-01 13:16:17 +00:00
Avi Kivity	7726ce23b7	utils: i_filter: rename "format" variable The format variable hides the format function, which we'll soon want to use here. Rename the format variable to unhide the function.	2018-11-01 13:16:17 +00:00
Avi Kivity	04b70a2ff8	system_keyspace: simplify complicated sprint() update_peer_info() uses two sprint()s where one would do, which confuses the sprint-to-fmt translator. Simplify the code by using just one call.	2018-11-01 13:16:17 +00:00
Avi Kivity	23e05a045b	cql: convert Cql.g sprint()s to fmt The only sprint() call had an extra complication due to quoting, which can be removed now.	2018-11-01 13:16:16 +00:00
Avi Kivity	8db8c01fbe	types: get rid of PRId64 formatting It's not needed for our sprint() implementation, and gets in the way of converting all formatting to fmt.	2018-11-01 13:16:16 +00:00
Avi Kivity	f170e3e589	Merge "dist: use perftune.py for disks tuning" from Vlad " Use perftune.py for tuning disks: - Distribute/pin disks' IRQs: - For NVMe drives: evenly among all present CPUs. - For non-NVMe drives: according to the chosen tuning mode. - For all disks used by scylla: - Tune nomerges - Tune I/O scheduler. It's important to tune NIC and disks together in order to keep IRQ pinning in the same mode. Disks are detected and tuned based on the current content of the /etc/scylla/scylla.yaml configuration file. " Fixes #3831. * 'use_perftune_for_disks-v3' of https://github.com/vladzcloudius/scylla: dist: change the sysconfig parameter name to reflect the new semantics scylla_util.py::sysconfig_parser: introduce has_option() dist: scylla_setup and scylla_sysconfig_setup: change parameters names to reflect new semantics dist: don't distribute posix_net_conf.sh any more dist: use perftune.py to tune disks and NIC	2018-11-01 13:13:49 +00:00
Avi Kivity	96173e81e0	Update seastar submodule * seastar c1e0e5d...c02150e (5): > prometheus: pass names as query parameter instead of part of the URL > treewide: convert printf() style formatting to fmt > print: add fmt_print() > build: Remove experimental CMake support > Merge "Correct and clean-up `signal_test`" from Jesse	2018-11-01 13:13:48 +00:00
Yibo Cai (Arm Technology China)	79136e895f	utils/crc: calculate crc in parallel It achieves a 2.0x speedup on Intel E5 and a 1.1x to 2.5x speedup on various arm64 microarchitectures. The algorithm cuts data into blocks of 1024 bytes and calculates the crc for each block, which is further divided into three subblocks of 336 bytes (42 uint64) each and 16 remaining bytes (2 uint64). In each iteration, one uint64 from each subblock is fed into its own independent crc, so three crcs are computed side by side. This significantly increases IPC (instructions per cycle). After the subblocks are done, the three crcs and the remaining two uint64 are combined using carry-less multiplication to reach the final result for the 1024-byte block. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>	2018-11-01 10:19:32 +02:00
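The block-splitting scheme described above relies on CRC linearity: a CRC computed independently over a later piece can be merged with the earlier one once the earlier register has been advanced past that piece's length. The sketch below assumes CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) for concreteness, since the commit does not state the polynomial; all function names are hypothetical. `crc32c_shift()` naively feeds zero bytes where the commit uses carry-less multiplication, so it shows the combining math, not the speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte-step of the reflected CRC-32C register (linear over GF(2)).
static uint32_t step8(uint32_t crc) {
    for (int b = 0; b < 8; ++b)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

// Raw CRC-32C register update: no implicit pre/post inversion.
uint32_t crc32c_raw(uint32_t crc, const uint8_t* p, size_t n) {
    for (size_t i = 0; i < n; ++i)
        crc = step8(crc ^ p[i]);
    return crc;
}

// Advance a register over n zero bytes. Naive O(n) stand-in for the
// carry-less-multiplication shift described in the commit.
uint32_t crc32c_shift(uint32_t crc, size_t n) {
    for (size_t i = 0; i < n; ++i)
        crc = step8(crc);
    return crc;
}

// One 1024-byte block = three 336-byte subblocks + a 16-byte tail.
uint32_t crc32c_block_1024(uint32_t crc, const uint8_t* p) {
    const size_t sub = 336, tail = 16;
    uint32_t c0 = crc, c1 = 0, c2 = 0;
    for (size_t i = 0; i < sub; i += 8) {
        // Three independent dependency chains, one uint64 from each subblock.
        c0 = crc32c_raw(c0, p + i, 8);
        c1 = crc32c_raw(c1, p + sub + i, 8);
        c2 = crc32c_raw(c2, p + 2 * sub + i, 8);
    }
    const uint32_t t = crc32c_raw(0, p + 3 * sub, tail);
    // crc(s, A||B) == shift(crc(s, A), |B|) ^ crc(0, B), applied piecewise.
    return crc32c_shift(c0, 2 * sub + tail) ^ crc32c_shift(c1, sub + tail) ^
           crc32c_shift(c2, tail) ^ t;
}
```

Because the three chains inside the loop do not depend on each other, a CPU can overlap them; that overlap is the IPC gain the commit describes.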
Vlad Zolotarov	84d341a12d	dist: change the sysconfig parameter name to reflect the new semantics We tune NIC and disks together now. Change the sysconfig parameter name to reflect these new semantics. However, if we detect the old parameter name in scylla-server, we still update it, thereby keeping support for old installations. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:28:13 -04:00
Vlad Zolotarov	7950062a82	scylla_util.py::sysconfig_parser: introduce has_option() has_option() returns TRUE if a given configuration option is set. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	9a5373254a	dist: scylla_setup and scylla_sysconfig_setup: change parameters names to reflect new semantics Change the name of the corresponding parameter (--setup-nic) to reflect the fact that we now tune not just the NIC but the NIC and disks together. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	c74e1a9368	dist: don't distribute posix_net_conf.sh any more We don't need it since we use perftune.py directly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	0e47d8bb1d	dist: use perftune.py to tune disks and NIC Tune disks using perftune.py together with the NIC. This is needed because disk(s) and NIC tuning has to be performed using the same mode (for non-NVMe disks). We tune disks based on the current content of /etc/scylla/scylla.yaml. Don't use scylla-blocktune for optimizing disks' performance any more. Unify the decision on NIC and disk tuning: either both are optimized or neither is. Disable disk tuning for DPDK and "virtio" modes for now. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Takuya ASADA	5bf9a03d65	dist/debian: skip running dh_strip_nondeterminism In some Fedora environments, the dh build tries to run dh_strip_nondeterminism and fails, since Fedora does not provide such a command. (see: http://jenkins.cloudius-systems.com/view/master/job/scylla-master/job/unified-deb/3/console) To prevent the build error we need to skip it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181030062935.9930-1-syuu@scylladb.com>	2018-10-31 10:23:54 +02:00
Tomasz Grabiec	62c7685b0d	Merge "Proper support for static rows in SSTables 3.x" from Vladimir This patchset addresses two issues with static rows support in SSTables 3.x. ('mc' format): 1. Since collections are allowed in static rows, we need to check for complex deletion, set corresponding flag and write tombstones, if any. 2. Column indices need to be partitioned for static columns the same way they are partitioned for regular ones. * github.com/argenet/scylla.git projects/sstables-30/columns-proper-order-followup/v1: sstables: Partition static columns by atomicity when reading/writing SSTables 3.x. sstables: Use std::reference_wrapper<> instead of a helper structure. sstables: Check for complex deletion when writing static rows. tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns. tests: Add test covering interleaved atomic and collection cells in static row.	2018-10-30 10:36:46 +01:00
Vladimir Krivopalov	d82ac02fad	tests: Add test covering interleaved atomic and collection cells in static row. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 15:01:34 -07:00
Vladimir Krivopalov	7bd95399ed	tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 15:00:55 -07:00
Vladimir Krivopalov	6bd738ceb1	sstables: Check for complex deletion when writing static rows. It is possible to have collections in a static row so we need to check for collection-wide tombstones like with clustering rows. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 14:59:19 -07:00
Vladimir Krivopalov	6b7003088a	sstables: Use std::reference_wrapper<> instead of a helper structure. No need to store column_id separately as it can be accessed from the column_definition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 14:58:08 -07:00
Vladimir Krivopalov	8592b834d1	sstables: Partition static columns by atomicity when reading/writing SSTables 3.x. Collections are permitted in static rows so same partitioning as for regular columns is required. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 10:32:02 -07:00
Takuya ASADA	2ac14dcf25	dist/redhat: prevent build error on older Fedora/CentOS The current scylla.spec fails to build on Fedora 27, since python2-pystache is a new package name introduced by a rename in Fedora 28. However, Fedora 28's python2-pystache has the tag "Provides: pystache", so we can depend on the old package name; this way scylla.spec builds on both Fedora 27 and 28. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181028175450.31156-1-syuu@scylladb.com>	2018-10-29 11:36:40 +02:00
Yibo Cai (Arm Technology China)	1c48e3fbec	utils/crc: leverage arm64 crc extension It achieves 6.7x to 11x speedup on various arm64 microarchitectures. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>	2018-10-29 10:50:48 +02:00
Nadav Har'El	b8337f8c9d	Materialized views: fix race condition in resharding while view building When a node reshards (i.e., restarts with a different number of CPUs), and is in the middle of building a view for a pre-existing table, the view building needs to find the right token from which to start building on all shards. We ran the same code on all shards, hoping they would all make the same decision on which token to continue from. But in some cases, one shard might make the decision, start building, and make progress - all before a second shard goes to make the decision, which will now be different. This resulted, in some rare cases, in the new materialized view missing a few rows when the build was interrupted with a resharding. The fix is to add the missing synchronization: All shards should make the same decision on whether and how to reshard - and only then should start building the view. Fixes #3890 Fixes #3452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181028140549.21200-1-nyh@scylladb.com>	2018-10-28 17:20:10 +00:00
Avi Kivity	75dbff984c	Merge "Re-order columns when reading/writing SSTables 3.x" from Vladimir " In Cassandra, row columns are stored in a BTree that uses the following ordering on them: - all atomic columns go first, then all multi-cell ones - within each group (atomic and multi-cell), columns are ordered lexicographically by name. Scylla needs to use the same ordering when storing columns and their respective indices, as well as when reading them back. Fixes #3853 Tests: unit {release} + Checked that the following SSTables are dumped fine using Cassandra's sstabledump: cqlsh:sst3> CREATE TABLE atomic_and_collection3 ( pk int, ck int, rc1 text, rc2 list<text>, rc3 text, rc4 list<text>, rc5 text, rc6 list<text>, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''}; cqlsh:sst3> INSERT INTO atomic_and_collection3 (pk, ck, rc1, rc4, rc5) VALUES (0, 0, 'hello', ['beautiful','world'], 'here'); << flush >> sstabledump: [ { "partition" : { "key" : [ "0" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 96, "clustering" : [ 0 ], "liveness_info" : { "tstamp" : "1540599270139464" }, "cells" : [ { "name" : "rc1", "value" : "hello" }, { "name" : "rc5", "value" : "here" }, { "name" : "rc4", "deletion_info" : { "marked_deleted" : "1540599270139463", "local_delete_time" : "1540599270" } }, { "name" : "rc4", "path" : [ "45e22cb0-d97d-11e8-9f07-000000000000" ], "value" : "beautiful" }, { "name" : "rc4", "path" : [ "45e22cb1-d97d-11e8-9f07-000000000000" ], "value" : "world" } ] } ] } ] " * 'projects/sstables-30/columns-proper-order/v1' of https://github.com/argenet/scylla: tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x. sstables: Re-order columns (atomic first, then collections) for SSTables 3.x. sstables: Use a compound structure for storing information used for reading columns.	2018-10-28 10:56:09 +02:00
Rafi Einstein	32525f2694	Space-Saving Top-k algorithm for handling stream summary statistics Based on the following implementation ([2]) for the Space-Saving algorithm from [1]. [1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf [2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java The algorithm keeps a map between keys seen and their counts, keeping a bound on the number of tracked keys. Replacement policy evicts the key with the lowest count while inheriting its count, and recording an estimation of the error which results from that. This error estimation can be later used to prove if the distribution we arrived at corresponds to the real top-K, which we can display alongside the results. Accuracy depends on the number of tracked keys. Introduced as part of 'nodetool toppartition' query implementation. Refs #2811 Message-Id: <20181027220937.58077-1-rafie@scylladb.com>	2018-10-28 10:10:28 +02:00
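The replacement policy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Scylla's implementation: the class `space_saving` and its `offer()`/`top()` members are hypothetical, and the linear scan for the minimum is chosen for brevity where a production version would use a structure that finds the lowest count faster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical Space-Saving summary tracking at most `capacity` keys.
template <typename Key>
class space_saving {
public:
    struct counter {
        Key key;
        uint64_t count;  // estimated frequency; never below the true count
        uint64_t error;  // maximum overestimation, inherited on eviction
    };

    explicit space_saving(size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    void offer(const Key& key) {
        auto it = _counters.find(key);
        if (it != _counters.end()) {  // already tracked: count it
            ++it->second.count;
            return;
        }
        if (_counters.size() < _capacity) {  // free slot: exact count
            _counters.emplace(key, counter{key, 1, 0});
            return;
        }
        // Full: evict the key with the lowest count. The newcomer inherits
        // that count (+1) and records it as its possible error.
        auto victim = std::min_element(_counters.begin(), _counters.end(),
            [](const auto& a, const auto& b) { return a.second.count < b.second.count; });
        const uint64_t min_count = victim->second.count;
        _counters.erase(victim);
        _counters.emplace(key, counter{key, min_count + 1, min_count});
    }

    // Up to k tracked keys, highest estimated count first.
    std::vector<counter> top(size_t k) const {
        std::vector<counter> out;
        for (const auto& entry : _counters) out.push_back(entry.second);
        std::sort(out.begin(), out.end(),
                  [](const counter& a, const counter& b) { return a.count > b.count; });
        if (out.size() > k) out.erase(out.begin() + k, out.end());
        return out;
    }

private:
    size_t _capacity;
    std::unordered_map<Key, counter> _counters;
};
```

A key's true frequency lies between `count - error` and `count`, which is the kind of error bound that can be displayed alongside the top-k results.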
Vladimir Krivopalov	f3dc2a4927	tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 16:58:34 -07:00
Vladimir Krivopalov	7e56e9fca6	sstables: Re-order columns (atomic first, then collections) for SSTables 3.x. In Cassandra, row columns are stored in a BTree that uses the following ordering on them: - all atomic columns go first, then all multi-cell ones - within each group (atomic and multi-cell), columns are ordered lexicographically by name. Since the schema already has all columns lexicographically sorted by name, we only need to stably partition them by atomicity for that. Fixes #3853 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 15:58:33 -07:00
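Because the schema already keeps columns sorted by name, the required order is a stable partition by atomicity. A minimal sketch, assuming a hypothetical `column` struct in place of Scylla's `column_definition`, using the table from the merge message above as input:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a column definition.
struct column {
    std::string name;
    bool is_atomic;  // false for multi-cell columns such as non-frozen collections
};

// Input is assumed to be sorted by name already. std::stable_partition moves
// atomic columns to the front while preserving the relative (name) order
// within each group.
std::vector<column> order_for_mc(std::vector<column> cols) {
    std::stable_partition(cols.begin(), cols.end(),
                          [](const column& c) { return c.is_atomic; });
    return cols;
}
```

For `rc1 text, rc2 list<text>, ..., rc6 list<text>` this yields `rc1, rc3, rc5, rc2, rc4, rc6`, consistent with the sstabledump output above, where `rc5` precedes `rc4`.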
Vladimir Krivopalov	210507b867	sstables: Use a compound structure for storing information used for reading columns. This representation makes it easier to operate with compound structures instead of separate values that were stored in multiple containers. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 11:32:44 -07:00
Tomasz Grabiec	cf2d5c19fb	Merge "Properly write static rows missing columns for SSTables 3.x." from Vladimir Before this fix, write_missing_columns() helper would always deal with regular columns even when writing static rows. This would cause errors on reading those files. Now, the missing columns are written correctly for regular and static rows alike. * github.com/argenet/scylla.git projects/sstables-30/fix-writing-static-missing-columns/v1: schema: Add helper method returning the count of columns of specified kind. sstables: Honour the column kind when writing missing columns in 'mc' format. tests: Add test for a static row with missing columns (SSTables 3.x).	2018-10-26 09:06:01 +02:00
Vladimir Krivopalov	9843343ad8	tests: Add test for a static row with missing columns (SSTables 3.x). This is a test case for #3892. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:16:31 -07:00
Vladimir Krivopalov	44043cfd44	sstables: Honour the column kind when writing missing columns in 'mc' format. Previously, we've been writing the wrong missing columns indices for static rows because write_missing_columns() explicitly used regular columns internally. Now, it takes the proper column kind into account. Fixes #3892 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:09:09 -07:00
Vladimir Krivopalov	399f815a89	schema: Add helper method returning the count of columns of specified kind. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:07:20 -07:00
Tomasz Grabiec	dcac0ac80c	tests: sstables: Verify no index reads during scans which don't need it Reproducer for https://github.com/scylladb/scylla/issues/3868 Message-Id: <1540459849-27612-2-git-send-email-tgrabiec@scylladb.com>	2018-10-25 16:14:45 +03:00
Tomasz Grabiec	46d0c157ae	tests: sstables: Extract make_sstable_mutation_source() Message-Id: <1540459849-27612-1-git-send-email-tgrabiec@scylladb.com>	2018-10-25 16:14:39 +03:00
Tomasz Grabiec	fe0a0bdf1e	utils/loading_shared_values: Add missing stat update call in one of the cases Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>	2018-10-25 15:15:05 +03:00
Duarte Nunes	e46ef6723b	Merge seastar upstream * seastar d152f2d...c1e0e5d (6): > scripts: perftune.py: properly merge parameters from the command line and the configuration file > fmt: update to 5.2.1 > io_queue: only increment statistics when request is admitted > Adds `read_first_line.cc` and `read_first_line.hh` to CMake. > fstream: remove default extent allocation hint > core/semaphore: Change the access of semaphore_units main ctor Due to a compile-time fight between fmt and boost::multiprecision, a lexical_cast was added to mediate. sprint("%s", var) no longer accepts numeric values, so some sprint()s were converted to format() calls. Since more may be lurking we'll need to remove all sprint() calls. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-25 12:53:30 +03:00
Benny Halevy	2a57c454f2	update_compaction_history: handle execute_cql exception Fixes #3774 Tested using view_schema_test with and without injecting an exception in modification_statement::do_execute for "compaction_history". Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>	2018-10-24 18:39:53 +03:00
Benny Halevy	44e5c2643b	compaction_manager::maybe_stop_on_error: add stop_iteration param Some call sites stop in any case, regardless of what maybe_stop_on_error returns. Reflect that in the log messages. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-2-bhalevy@scylladb.com>	2018-10-24 18:39:52 +03:00
Avi Kivity	8210f4c982	Merge "Properly writing/reading shadowable deletions with SSTables 3.x." from Vladimir " This patchset addresses two problems with shadowable deletion handling in SSTables 3.x. ('mc' format). Firstly, we previously did not set a flag indicating the presence of the extended flags byte with the HAS_SHADOWABLE_DELETION bitmask on writing. This would break subsequent reading and cause all kinds of failures, up to a crash. Secondly, when reading rows with this extended flag set, we need to preserve that information and create a shadowable_tombstone for the row. Tests: unit {release} + Verified manually with 'hexdump' and using modified 'sstabledump' that second (shadowable) tombstone is written for MV tables by Scylla. + DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test) that originally failed due to this issue has successfully passed locally. " * 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla: tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x. tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x. sstables: Support Scylla-specific extension for writing shadowable tombstones. sstables: Introduce a feature for shadowable tombstones in Scylla.db. memtable: Track regular and shadowable tombstones separately in encoding_stats_collector. sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion. sstables: Support checking row extension flags for Cassandra shadowable deletion.	2018-10-24 18:20:16 +03:00
Tomasz Grabiec	9e756d3863	sstable_mutation_reader: Do not read partition index when scanning Even when we're using a full clustering range, need_skip() will return true when we start a new partition and advance_context() will be called with position_in_partition::before_all_clustered_rows(). We should detect that there is no need to skip to that position before the call to advance_to(*_current_partition_key), which will read the index page. Fixes #3868. Message-Id: <1539881775-8578-1-git-send-email-tgrabiec@scylladb.com>	2018-10-24 15:55:13 +03:00
Avi Kivity	925ef48fce	Merge "Use relocatable package to generate .rpm/.deb" from Takuya " This patchset adds support generating .rpm/.deb from relocatable package. " * 'reloc_rpmdeb_v5' of https://github.com/syuu1228/scylla: configure.py: run create-relocatable-package.py everytime configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target dist/ami: build relocatable .rpm when --localrpm specified dist/debian: use relocatable package to produce .deb dist/redhat: use relocatable package to produce .rpm install-dependencies.sh: add libsystemd as dependencies install.sh: drop hardcoded distribution name, add --target option to specify distribution build: add script to build relocatable package build: compress relocatable package build: add files on relocatable package to support generating .rpm/.deb	2018-10-24 14:44:09 +03:00
Takuya ASADA	59e4900ca7	configure.py: run create-relocatable-package.py everytime Right now we don't declare dependencies on the files under dist/, so ninja is not able to detect changes in that directory. To update the relocatable package even when the only changes are under dist/, we need to run create-relocatable-package.py every time. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	6e1617d71c	configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets To regenerate the Scylla version files when they are removed, since these files are required for the relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	0cb8a4cb0c	configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target It's better to use {mode} to extract fixed path just like other build targets do. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	929f03533d	dist/ami: build relocatable .rpm when --localrpm specified Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	f3c3b9183c	dist/debian: use relocatable package to produce .deb Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	8e2dc9e4f4	dist/redhat: use relocatable package to produce .rpm Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	5fa7ed52e3	install-dependencies.sh: add libsystemd as dependencies Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	ce4067ca02	install.sh: drop hardcoded distribution name, add --target option to specify distribution To allow users to build .rpm packages for Fedora, we need to support specifying the distribution. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	6319229020	build: add script to build relocatable package To make building the relocatable package easier, add build_reloc.sh, which builds it with a single command. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	a502715b29	build: compress relocatable package Since the Debian packaging system requires the source package to be a compressed tarball, let's use .gz compression. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	85fed12c07	build: add files on relocatable package to support generating .rpm/.deb The relocatable package is missing some files needed to generate .rpm/.deb; add them. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Paweł Dziepak	637b9a7b3b	atomic_cell_or_collection: make operator<< show cell content After the new in-memory representation of cells was introduced, there was a regression in atomic_cell_or_collection::operator<<, which stopped printing the content of the cell. This makes debugging more inconvenient and time-consuming. This patch fixes the problem. The schema is propagated to the atomic_cell_or_collection printer and the full content of the cell is printed. Fixes #3571. Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>	2018-10-24 13:29:51 +03:00
Avi Kivity	a9836ad758	thrift: limit message size Limit message size according to the configuration, to avoid a huge message from allocating all of the server's memory. We also need to limit memory used in aggregate by thrift, but that is left to another patch. Fixes #3878. Message-Id: <20181024081042.13067-1-avi@scylladb.com>	2018-10-24 09:57:58 +01:00
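The thrift fix above boils down to validating the announced size of a message before allocating memory for it. A minimal sketch, assuming Thrift-style framing (a 4-byte big-endian length prefix); `checked_frame_size` and `max_message_size` are hypothetical names, not the actual Scylla configuration option or function:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode a 4-byte big-endian length prefix and reject the frame *before*
// allocating a buffer for it if it exceeds the configured limit.
// `max_message_size` is a hypothetical stand-in for the real config option.
uint32_t checked_frame_size(const unsigned char (&header)[4], uint32_t max_message_size) {
    uint32_t len = (uint32_t(header[0]) << 24) | (uint32_t(header[1]) << 16) |
                   (uint32_t(header[2]) << 8) | uint32_t(header[3]);
    if (len > max_message_size) {
        throw std::length_error("message of " + std::to_string(len) +
                                " bytes exceeds the limit of " +
                                std::to_string(max_message_size) + " bytes");
    }
    return len;  // only now is it reasonable to allocate `len` bytes
}
```

As the commit notes, a per-message check like this does not bound the memory used by many concurrent messages in aggregate; that needs a separate mechanism.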
Raphael S. Carvalho	c958294991	tests/sstable_perf: fix compaction mode for a multi shard instance Compaction mode fails if more than one shard is used, because it doesn't make sure that the sstables used as input for compaction contain only local keys. Therefore, the sstable generated by compaction has fewer keys than expected, because non-local keys are purged. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181022225153.12029-1-raphaelsc@scylladb.com>	2018-10-24 09:58:34 +03:00
Glauber Costa	fc5635100d	install seastar-addr2line and seastar-cpumap into scylla packages They are very useful for investigating Scylla issues, and we have been copying those scripts manually when needed. Make them officially part of the scylla package. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181023184400.23187-1-glauber@scylladb.com>	2018-10-24 09:52:17 +03:00
Amnon Heiman	6bcde841bd	scyllatop: Nicer error message when fail opening a log file or connecting scyllatop uses a log file; if opening the file fails, the user should get a clear error message, not an exception trace. The same is true for connecting to Scylla. After this patch, the output is as follows: $ scyllatop.py -L /usr/lib/scyllatop.log scyllatop failed opening log file: '/usr/lib/scyllatop.log' With an error: [Errno 13] Permission denied: '/usr/lib/scyllatop.log' Fixes #3860 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181021065525.22749-1-amnon@scylladb.com>	2018-10-24 09:50:45 +03:00
Vlad Zolotarov	4d1bb719a4	config: enable hinted handoff by default Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181019180401.12400-1-vladz@scylladb.com>	2018-10-24 09:47:36 +03:00
Vladimir Krivopalov	ad599d4342	tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	3dcf0acfc2	tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	759d36a26e	sstables: Support Scylla-specific extension for writing shadowable tombstones. The original SSTables 'mc' format, as defined in Cassandra, does not provide a way to store shadowable deletion in addition to regular row deletion for materialized views. It is essential to store it because of known corner-case issues that otherwise appear. For this to work, we introduce a Scylla-specific extended flag to be set in SSTables in 'mc' format that indicates a shadowable tombstone is written after the regular row tombstone. This is deemed to be safe because shadowable tombstones are specific to materialized views and MV tables are not supposed to be imported or exported. Note that a shadowable tombstone can be written without a regular tombstone as well as along with it. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
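The write/read handshake described in this commit and in the merge above (announce the extended flags byte in the row flags, then mark the shadowable tombstone in the extended byte) can be sketched as follows. The bit values are assumptions for illustration only; the real constants live in Scylla's sstables code:

```cpp
#include <cstdint>

// Hypothetical bit values, for illustration only.
constexpr uint8_t HAS_DELETION = 0x10;                    // row flags byte
constexpr uint8_t EXTENSION_FLAG = 0x80;                  // "an extended flags byte follows"
constexpr uint8_t HAS_SCYLLA_SHADOWABLE_TOMBSTONE = 0x80; // extended flags byte

struct row_flags_encoding {
    uint8_t flags = 0;
    uint8_t extended_flags = 0;
    bool write_extended_flags = false;
};

// Writer side: a shadowable tombstone may appear with or without a regular one.
row_flags_encoding encode_row_flags(bool has_regular_tombstone, bool has_shadowable_tombstone) {
    row_flags_encoding e;
    if (has_regular_tombstone) {
        e.flags |= HAS_DELETION;
    }
    if (has_shadowable_tombstone) {
        e.extended_flags |= HAS_SCYLLA_SHADOWABLE_TOMBSTONE;
        e.flags |= EXTENSION_FLAG;  // without this bit, readers never look at the extended byte
        e.write_extended_flags = true;
    }
    return e;
}

// Reader side: honour the extended flag only if the extended byte is announced.
bool has_shadowable_tombstone(uint8_t flags, uint8_t extended_flags) {
    return (flags & EXTENSION_FLAG) && (extended_flags & HAS_SCYLLA_SHADOWABLE_TOMBSTONE);
}
```

Forgetting `EXTENSION_FLAG` on the writer side is exactly the first problem described in the merge commit: the extended byte is written but the reader does not expect it, so the rest of the row is misparsed.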
Vladimir Krivopalov	e168433945	sstables: Introduce a feature for shadowable tombstones in Scylla.db. This is used to indicate that the SSTables being read may contain the Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag. If the feature is not enabled, we should not honour this flag. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	a95ba2f38a	memtable: Track regular and shadowable tombstones separately in encoding_stats_collector. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	b7d48c1ccd	sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion. This flag can only be set in MV tables, which are not supported for import into Scylla. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	8f79f76116	sstables: Support checking row extension flags for Cassandra shadowable deletion. This flag can only be used in MV tables, which are not supposed to be imported into Scylla. Since the Scylla representation of shadowable tombstones differs from that of Cassandra, such SSTables are rejected on read, and Scylla never sets this flag on writing. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Avi Kivity	1533487ba8	Merge "hinted handoff: give a sender a low priority" from Vlad " Hinted handoff should not overpower regular flows like READs, WRITEs or background activities like memtable flushes or compactions. In order to achieve this, put hint sending in the STREAMING CPU scheduling group and the hints commitlog object into the STREAMING I/O scheduling group. Fixes #3817 " * 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla: db::hints::manager: use "streaming" I/O scheduling class for reads commitlog::read_log_file(): set the a read I/O priority class explicitly db::hints::manager: add hints sender to the "streaming" CPU scheduling group	2018-10-23 16:55:05 +00:00
Raphael S. Carvalho	65e8853e8d	tests: test that sstable cleanup wont get rid of key which token belongs to node Commit `1ce52d54` fixed sort order of local ranges, which is needed for cleanup to work properly because it relies on that to perform a binary search. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181023031322.22763-1-raphaelsc@scylladb.com>	2018-10-23 16:55:05 +00:00
Avi Kivity	d9e0ea6bb0	config: mark range_request_timeout_in_ms and request_timeout_in_ms as Used This makes them available in scylla --help. Fixes #3884. Message-Id: <20181023101150.29856-1-avi@scylladb.com>	2018-10-23 11:52:03 +01:00
Paweł Dziepak	c94d2b6aa6	cql3: restore original timeout behaviour for aggregate queries Commit `1d34ef38a8` "cql3: make pagers use time_point instead of duration" has unintentionally altered the timeout semantics for aggregate queries. Such requests fetch multiple pages before sending a response to the client. Originally, each of those fetches had a timeout-duration to finish, after the problematic commit the whole request needs to complete in a single timeout-duration. This, unsurprisingly, makes some queries that were successful before fail with a timeout. This patch restores the original behaviour. Fixes #3877. Message-Id: <20181022125318.4384-1-pdziepak@scylladb.com>	2018-10-23 12:52:42 +03:00
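The difference between the two timeout semantics described above can be shown with a small model: an aggregate query that fetches several pages before replying. This is a simplified sketch with hypothetical function names, not the pager code itself:

```cpp
#include <chrono>
#include <vector>

using namespace std::chrono_literals;
using std::chrono::milliseconds;

// Restored behaviour: each page fetch gets its own full timeout.
bool succeeds_with_per_page_timeout(const std::vector<milliseconds>& page_latencies, milliseconds timeout) {
    for (auto latency : page_latencies) {
        if (latency > timeout) {
            return false;
        }
    }
    return true;
}

// Regressed behaviour: one deadline, computed once, shared by all page fetches.
bool succeeds_with_single_deadline(const std::vector<milliseconds>& page_latencies, milliseconds timeout) {
    milliseconds elapsed{0};
    for (auto latency : page_latencies) {
        elapsed += latency;
        if (elapsed > timeout) {
            return false;
        }
    }
    return true;
}
```

With three 400 ms pages and a 1000 ms timeout, the first model succeeds while the second times out, which matches the "queries that were successful before now fail" symptom.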
Takuya ASADA	950dbdb466	dist/common/sysctl.d: add new conf file to set fs.aio-max-nr We need to raise fs.aio-max-nr to a larger value, since Seastar may allocate more than 65535 AIO events (= kernel default value). Fixes #3842 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181023030449.15445-1-syuu@scylladb.com>	2018-10-23 11:01:07 +03:00
Tomasz Grabiec	a34e417874	Merge "Stabilise perf_fast_forward results" from Paweł This series attempts to make the fragments per second results reported by perf_fast_forward more stable. That includes running each test case multiple times and reporting the median, median absolute deviation, maximum and minimum value. That should make it relatively easy to assess how repeatable the presented results are. Moreover, since perf_fast_forward performs IO operations, it is important that they do not introduce excessive noise into the results. The location of the data directory is made configurable so that the user can choose a less noisy disk or a ramdisk. * github.com/pdziepak/scylla.git stabilise-perf_fast_forward/v3: tests/perf_fast_forward: make fragments/s measurements more stable tests/perf_fast_forward: make data directory location configurable	2018-10-22 18:33:25 +02:00
Avi Kivity	d5d831f41b	tests: network_topology_strategy_test: remove quadratic complexity network_topology_strategy test creates a ring with hundreds of tokens (and one token per node). Then, for each token, it calls get_primary_ranges(), which in turn walks the token ring. However, because each datacenter occupies a disjoint token range, this walk practically has to traverse the entire ring until it collects enough endpoints for each datacenter. The whole thing takes 15 minutes. Speed this up by randomizing the token<->dc relationship. This is more realistic, and switches the algorithm to be O(token count); it now completes in less than a minute (still not great, but better). Message-Id: <20181022154026.19618-1-avi@scylladb.com>	2018-10-22 17:06:57 +01:00
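Why randomizing the token-to-DC assignment helps can be seen with a toy model of the walk: each ring position stores the index of the datacenter owning that token, and we count how many tokens a walk visits before it has seen every DC. This is a simplified sketch of the complexity argument, not the test's actual code; all names are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Old layout: each DC owns one contiguous, disjoint block of tokens.
std::vector<int> contiguous_ring(std::size_t tokens, int dcs) {
    std::vector<int> ring(tokens);
    for (std::size_t i = 0; i < tokens; ++i) {
        ring[i] = static_cast<int>(i * dcs / tokens);
    }
    return ring;
}

// New layout: same number of tokens per DC, shuffled around the ring.
std::vector<int> shuffled_ring(std::size_t tokens, int dcs, unsigned seed) {
    auto ring = contiguous_ring(tokens, dcs);
    std::mt19937 gen(seed);
    std::shuffle(ring.begin(), ring.end(), gen);
    return ring;
}

// Walk clockwise from `start` until every DC has been seen; returns tokens visited.
// Precondition: every DC in [0, dcs) appears somewhere in the ring.
std::size_t walk_until_all_dcs(const std::vector<int>& ring, std::size_t start, int dcs) {
    std::set<int> seen;
    std::size_t visited = 0;
    for (std::size_t i = start; seen.size() < static_cast<std::size_t>(dcs); i = (i + 1) % ring.size()) {
        seen.insert(ring[i]);
        ++visited;
    }
    return visited;
}

// Total work when one walk is started at every token, roughly what the test does.
std::size_t total_walk(const std::vector<int>& ring, int dcs) {
    std::size_t total = 0;
    for (std::size_t start = 0; start < ring.size(); ++start) {
        total += walk_until_all_dcs(ring, start, dcs);
    }
    return total;
}
```

With a contiguous layout each walk visits a number of tokens proportional to the ring size, so the total is quadratic; with a shuffled layout each walk stops after a few tokens on average, so the total is roughly linear.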
Paweł Dziepak	63a705dca3	tests/perf_fast_forward: make data directory location configurable perf_fast_forward populates perf_fast_forward_output with some data and then runs performance tests that read it. That makes the disk a significant factor in the final result and may make the results less repeatable. This patch adds a flag that allows setting the location of the data directory so that the user can opt for a less noisy disk or a ramdisk.	2018-10-22 16:52:58 +01:00
Paweł Dziepak	29e872f865	tests/perf_fast_forward: make fragments/s measurements more stable perf_fast_forward performs various operations, many of which involve sstable reads, and uses the metrics to verify that there weren't any unnecessary IO operations. It also provides fragments per second measurements for the tests it runs. However, since some of the tests are very short and involve IO, those values vary a lot, which makes them not very useful. This commit attempts to stabilise those results. Each test case is run multiple times (by default for a second, but at least 3 times), and the median, median absolute deviation, maximum and minimum value are shown. This should allow assessing whether changes in the results are just noise or a real regression or improvement.	2018-10-22 16:52:58 +01:00
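The four statistics reported per test case can be computed as follows. This is a minimal sketch of the definitions (median, median absolute deviation, min, max), not perf_fast_forward's own implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct summary {
    double median;
    double mad;  // median absolute deviation
    double min;
    double max;
};

// Median of a non-empty sample set (takes a copy so it can sort).
double median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const auto n = v.size();
    return n % 2 == 1 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Summarise repeated runs of one test case (samples must be non-empty).
summary summarize(const std::vector<double>& samples) {
    const double med = median(samples);
    std::vector<double> deviations;
    deviations.reserve(samples.size());
    for (double s : samples) {
        deviations.push_back(std::abs(s - med));
    }
    const auto [lo, hi] = std::minmax_element(samples.begin(), samples.end());
    return {med, median(deviations), *lo, *hi};
}
```

For runs of {10, 12, 11, 50, 9} fragments/s, the median is 11 and the MAD is 1: unlike the mean and standard deviation, both barely move because of the single outlier, which is what makes them useful for separating noise from real changes.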
Duarte Nunes	f3a5ec0fd9	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com>	2018-10-22 13:00:00 +02:00
George Kollias	c2343dc841	Make restricting reader fill_buffer more efficient Currently, restricting_mutation_reader::fill_buffer just reads the lower-layer reader's fragments one by one without doing any further transformations. This change swaps the parent and child buffers in a single step, as suggested in #3604, and hence removes any possible per-fragment overhead. I couldn't find any test that exercises restricting_mutation_reader as a mutation source, so I added test_restricted_reader_as_mutation_source in mutation_reader_test. Tests: unit (release), though these 4 tests are failing regardless of my changes (they fail on master for me as well): snitch_reset_test, sstable_mutation_test, sstable_test, sstable_3_x_test. Fixes: #3604 Signed-off-by: George Kollias <georgioskollias@gmail.com> Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>	2018-10-22 11:36:54 +03:00
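The per-fragment versus buffer-swap difference can be sketched with standard containers. Here `std::string` stands in for a mutation fragment, and the "swap only when our buffer is empty" condition is an assumption of this sketch rather than a statement about the actual reader:

```cpp
#include <deque>
#include <string>
#include <utility>

using fragment = std::string;  // stand-in for a mutation fragment

// Old approach: move fragments one at a time from the child's buffer into ours.
void fill_one_by_one(std::deque<fragment>& parent, std::deque<fragment>& child) {
    while (!child.empty()) {
        parent.push_back(std::move(child.front()));
        child.pop_front();
    }
}

// New approach: when our buffer is empty, take over the child's buffer in O(1).
void fill_by_swapping(std::deque<fragment>& parent, std::deque<fragment>& child) {
    if (parent.empty()) {
        parent.swap(child);
    } else {
        fill_one_by_one(parent, child);
    }
}
```

`std::deque::swap` exchanges internal pointers, so its cost does not depend on how many fragments are buffered.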
Duarte Nunes	3fe92663d4	Merge 'Fix for a select statement with filtered columns' from Eliran " This patchset fixes #3803. When a select statement with filtering is executed and the column that is needed for the filtering is not present in the select clause, rows that should have been filtered out according to this column will still be present in the result set. Tests: 1. The testcase from the issue. 2. Unit tests (release) including the newly added test from this patchset. " * 'issues/3803/v10' of https://github.com/eliransin/scylla: unit test: add test for filtering queries without the filtered column cql3 unit test: add assertion for the number of serialized columns cql3: ensure retrieval of columns for filtering cql3: refactor find_idx to be part of statement restrictions object cql3: add prefix size common functionality to all clustering restrictions cql3: rename selection metadata manipulation functions	2018-10-21 09:53:37 +01:00
Eliran Sinvani	145f931ae7	unit test: add test for filtering queries without the filtered column Test the use case where the column that the filtering operates on is not part of the select clause. The expected result is a set containing the columns of the select clause plus the additional columns for filtering, marked as non-serializable. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
Eliran Sinvani	86637a1d0d	cql3 unit test: add assertion for the number of serialized columns The result sets that the assertions are performed against are result sets before serialization to the user, and therefore also contain columns that will not be serialized and sent as the query's final result. The patch adds an assertion on the number of columns that will be present in the final serialized result. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
Eliran Sinvani	fd422c954e	cql3: ensure retrieval of columns for filtering When a query that needs filtering is executed, the columns that the coordinator filters by have to be retrieved. The columns should be retrieved even if they are not used for ordering or named in the actual select clause. If the columns are missing from the result set, then any filtering that restricts the missing column will not take place. Fixes #3803 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
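The idea behind this fix and the two tests above can be modelled as building a selection in which filtered-on columns missing from the SELECT clause are appended but marked as not serialized. This is a hypothetical sketch; the real cql3 selection classes are different:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct selected_column {
    std::string name;
    bool serialized;  // false: fetched only for coordinator-side filtering, not sent to the client
};

// SELECT clause columns first, then any filtered-on column that is missing from it.
std::vector<selected_column> build_selection(const std::vector<std::string>& select_clause,
                                             const std::vector<std::string>& filtered_columns) {
    std::vector<selected_column> selection;
    for (const auto& name : select_clause) {
        selection.push_back({name, true});
    }
    for (const auto& name : filtered_columns) {
        bool present = std::any_of(selection.begin(), selection.end(),
                                   [&](const selected_column& c) { return c.name == name; });
        if (!present) {
            selection.push_back({name, false});
        }
    }
    return selection;
}

// Number of columns that end up in the serialized result sent to the client.
std::size_t serialized_column_count(const std::vector<selected_column>& selection) {
    return static_cast<std::size_t>(std::count_if(selection.begin(), selection.end(),
                                                  [](const selected_column& c) { return c.serialized; }));
}
```

For `SELECT pk, v1 ... WHERE v2 = ? ALLOW FILTERING`, the selection holds three columns but only two are serialized, which is the distinction the added assertion on serialized columns checks.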
Eliran Sinvani	3e036e2c8c	cql3: refactor find_idx to be part of statement restrictions object find_idx calculates the index that will be used in the statement if indexes are to be used. In its static form it requires redundant information (the schema is already contained within the statement restrictions object). In addition, find_idx will need to be used for filtering in order not to include redundant selectors in the selection objects. This change refactors find_idx to run under the statement restrictions object and changes its scope from private to public. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:40:24 +03:00
Eliran Sinvani	4496086bf1	cql3: add prefix size common functionality to all clustering restrictions Up until now, computing the prefix size, which is used to determine whether filtering is needed, was implemented only for single-column clustering restrictions. The patch adds a function to calculate the prefix size for all types of clustering key restrictions given the schema. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:39:57 +03:00
Vlad Zolotarov	a87c11bad2	storage_proxy::query_result_local: create a single tracing span on a replica shard Every use of a tracing::global_trace_state_ptr object instead of a tracing::tracing_state_ptr, or a call to tracing::global_trace_state_ptr::get(), creates a new tracing session (span) object. This should never be done unless query handling moves to a different shard. Fixes #3862 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181018003500.10030-1-vladz@scylladb.com>	2018-10-19 16:47:17 +00:00
Tomasz Grabiec	fc37b80d24	Merge "Correctly handle dropped columns in SSTable 3" from Piotr J. Previously we were making assumptions about missing columns (the size of their values, whether they are collections or counters), but these assumptions did not always hold. Now we use the column type from the serialization header to get the right values. Fixes #3859 * seastar-dev.git haaawk/projects/sstables-30/handling-dropped-columns/v4: sstables 3: Correctly handle dropped columns in column_translation sstables 3: Add test for dropped columns handling	2018-10-19 16:47:17 +00:00
Duarte Nunes	3a53b3cebc	Merge 'hinted handoff: add manager::state and split storing and replaying enablement' from Vlad " Refs #3828 (Probably fixes it) We found a few flaws in the way we enable hints replaying. First of all, it was allowed before manager::start() was complete. Then, since manager::start() is called after messaging_service is initialized, there was a time window when hints were rejected, and this created an issue for MV. Both issues above were found in the context of #3828. This series fixes them both. Tested {release}: dtest: materialized_views_test.py:TestMaterializedViews.write_to_hinted_handoff_for_views_test dtest: hintedhandoff_additional_test.py " * 'hinted_handoff_dont_create_hints_until_started-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: enable storing hints before starting messaging_service db::hints::manager: add a "started" state db::hints::manager: introduce a _state	2018-10-19 16:47:16 +00:00
Avi Kivity	1ce52d5432	locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order get_ranges() is supposed to return ranges in sorted order. However, `a35136533d` broke this and returned the range that was supposed to be last in the second position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). This broke cleanup, which relied on the sort order to perform a binary search. Other users of the get_ranges() family did not rely on the sort order. Fixes #3872. Message-Id: <20181019113613.1895-1-avi@scylladb.com>	2018-10-19 16:47:12 +00:00
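Why cleanup depends on the sort order: a binary search over local ranges is only correct if they are sorted. A minimal sketch with simplified, non-wrapping `(start, end]` integer ranges; the names are hypothetical and real token ranges can wrap around the ring:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, non-wrapping token range (start, end].
struct token_range {
    int64_t start;
    int64_t end;
};

bool sorted_by_start(const std::vector<token_range>& ranges) {
    return std::is_sorted(ranges.begin(), ranges.end(),
                          [](const token_range& a, const token_range& b) { return a.start < b.start; });
}

// Cleanup-style ownership check using binary search; only correct if `ranges`
// is sorted and disjoint.
bool owns_token(const std::vector<token_range>& ranges, int64_t token) {
    assert(sorted_by_start(ranges));
    // First range whose start is >= token; the only candidate is the one before it.
    auto it = std::upper_bound(ranges.begin(), ranges.end(), token,
                               [](int64_t t, const token_range& r) { return t <= r.start; });
    if (it == ranges.begin()) {
        return false;
    }
    --it;
    return token <= it->end;  // token > it->start holds by construction
}
```

With a misordered input like the `[0, 10, 1, ...]` example, the search can land on the wrong candidate, and cleanup would then drop keys the node actually owns. The test added in `65e8853e8d` above covers exactly that case.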
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Vlad Zolotarov	cff4186517	db::hints::manager: add a "started" state Hinting is allowed after "started" and before "stopping". Hints that are attempted to be stored outside this time frame will be dropped. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:36 -04:00
Vlad Zolotarov	fb513a4b23	db::hints::manager: introduce a _state Introduce a multi-bit state field. In this patch it replaces the _stopping boolean. We are going to add more states in the following patches. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:33 -04:00
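Together, these two commits replace a single `_stopping` boolean with a multi-bit state and gate hint storage on it. A hypothetical sketch of that shape; the bit names follow the commit messages, everything else is assumed:

```cpp
#include <cstdint>

// Hypothetical multi-bit manager state replacing a lone `_stopping` boolean.
class manager_state {
public:
    enum class bit : uint8_t {
        started = 1u << 0,
        stopping = 1u << 1,
    };

    void set(bit b) noexcept { _bits |= static_cast<uint8_t>(b); }
    bool test(bit b) const noexcept { return (_bits & static_cast<uint8_t>(b)) != 0; }

    // Storing hints is allowed only after "started" and before "stopping".
    bool can_store_hints() const noexcept {
        return test(bit::started) && !test(bit::stopping);
    }

private:
    uint8_t _bits = 0;
};
```

Using bits instead of one enum value lets further states, such as a separate "replay allowed" flag, be added later without changing the existing checks.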
Piotr Jastrzebski	e94254b563	sstables 3: Add test for dropped columns handling Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-18 19:13:58 +02:00
Piotr Jastrzebski	cafb3dc2ae	sstables 3: Correctly handle dropped columns in column_translation Previously we were making assumptions about missing columns (the size of their values, whether they are collections or counters), but these assumptions did not always hold. Now we use the column type from the serialization header to get the right values. Fixes #3859 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-18 19:13:44 +02:00
Eliran Sinvani	ded3a03356	cql3: rename selection metadata manipulation functions In the past, the addition of non-serializable columns was used only for post-ordering of result sets. The newly added ALLOW FILTERING feature will need to use these functions for other post-processing operations, i.e. filtering. The renaming accounts for both the new and existing uses of the functions. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-18 17:52:04 +03:00
Avi Kivity	472afea6cd	Update seastar submodule * seastar 4669469...d152f2d (5): > build: don't link with libgcc_s explicitly > scheduling: add std::hash<seastar::scheduling_group> > prometheus: Allow preemption between each metric > Merge "improve memory detection in containers" from Juliana > Merge "perf_tests: produce json reports" from Paweł	2018-10-18 14:55:18 +03:00
Duarte Nunes	7610cedc34	Merge "db/hints: Expose current backlog" from Duarte " Hints are stored on disk by a hints::manager, ensuring they are eventually sent. A hints::resource_manager ensures the hints::managers it tracks don't consume more than their allocated resources by monitoring disk space and disabling new hints if needed. This series fixes some bugs related to the backlog calculation, but mainly exposes the backlog through a hints::manager so upper layers can apply flow control. Refs #2538 " * 'hh-manager-backlog/v3' of https://github.com/duarten/scylla: db/hints/manager: Expose current backlog db/hints/manager: Move decision about blocking hints to the manager db/hints/resource_manager: Correctly account resources in space_watchdog db/hints/resource_manager: Replace timer with seastar::thread db/hints/resource_manager: Ensure managers are correctly registered db/hints/resource_manager: Fix formatting db/hints: Disallow moving or copying the managers	2018-10-16 20:35:34 +01:00
Duarte Nunes	624472d16a	db/hints/manager: Expose current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	6dcb7a39d4	db/hints/manager: Move decision about blocking hints to the manager The space_watchdog enables or disables hints for the managers associated with a particular device. We encapsulate this decision inside the hints::managers by introducing the update_backlog() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	207c9c8e38	db/hints/resource_manager: Correctly account resources in space_watchdog A db::hints::resource_manager manages the resources for one or two db::hints::managers. Each of these can be using the same or different devices. The db::hints::space_watchdog periodically checks whether each manager is within its resource allocation, and if not, disables it. The watchdog iterates over the managers and accounts for the total size they are using. This is wrong, since it can accumulate in the same variable the sizes consumed by managers using different devices. We fix this while taking advantage of the fact that on_timer is now called in the context of a seastar::thread, instead of using future combinators. Fixes #3821 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:34:54 +01:00
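The accounting fix amounts to summing usage per device instead of into one shared total. A minimal sketch; `manager_usage` and the single `per_device_limit` are simplifying assumptions, and the real watchdog's limit policy may differ:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-manager usage record; `device` identifies the disk the hints live on.
struct manager_usage {
    std::string device;
    uint64_t bytes_used;
};

// Accumulate usage per device rather than into one shared total.
std::map<std::string, uint64_t> usage_by_device(const std::vector<manager_usage>& managers) {
    std::map<std::string, uint64_t> totals;
    for (const auto& m : managers) {
        totals[m.device] += m.bytes_used;
    }
    return totals;
}

// A manager should stop accepting hints only if *its* device is over the limit.
std::vector<bool> managers_over_limit(const std::vector<manager_usage>& managers, uint64_t per_device_limit) {
    const auto totals = usage_by_device(managers);
    std::vector<bool> over;
    over.reserve(managers.size());
    for (const auto& m : managers) {
        over.push_back(totals.at(m.device) > per_device_limit);
    }
    return over;
}
```

Two managers using 60 bytes each on different devices are both under a 100-byte limit. A single shared total (120) would have wrongly disabled them both.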
Duarte Nunes	25d266bdc1	db/hints/resource_manager: Replace timer with seastar::thread Will make on_timer() much simpler to allow fixing a bug in subsequent patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	278aa13bb0	db/hints/resource_manager: Ensure managers are correctly registered Registering a manager for a new device used std::unordered_map::emplace(), which may not insert the specified value if one with the same key has already been added. This could happen if both managers were using the same device and the fiber deferred in-between adding them. Found during code reading. Could cause hints to not be disabled for an overloaded manager. Fixes #3822 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
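The pitfall here is a documented property of `std::unordered_map::emplace()`: if the key already exists, nothing is inserted and the existing value is left unchanged. A simplified sketch that omits the fiber deferral but shows the effect; the type aliases are hypothetical:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using device_id = std::string;  // hypothetical key type
using manager_id = int;         // hypothetical value type
using registry = std::unordered_map<device_id, std::vector<manager_id>>;

// Buggy pattern: emplace() does nothing if `device` is already a key,
// so a second manager on the same device is silently dropped.
void register_with_emplace(registry& r, const device_id& device, manager_id id) {
    r.emplace(device, std::vector<manager_id>{id});
}

// Fixed pattern: find-or-create the entry, then append to it.
void register_with_append(registry& r, const device_id& device, manager_id id) {
    r[device].push_back(id);
}
```

The `bool` in the pair returned by `emplace()` reports whether an insertion took place; ignoring it is what hides the lost registration.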
Duarte Nunes	9e3b09cf48	db/hints/resource_manager: Fix formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	622ac734da	db/hints: Disallow moving or copying the managers Disable the copy and move ctors and assignment operators for both the hints::manager and the hints::resource_manager. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Glauber Costa	7edae5421d	sstables: print sstable path in case of an exception Without it, we don't know where to look for the problem. Before: compaction failed: sstables::malformed_sstable_exception (Too big ttl: 3163676957) After: compaction_manager - compaction failed: sstables::malformed_sstable_exception (Too big ttl: 4294967295 in sstable /var/lib/scylla/data/system_traces/events-8826e8e9e16a372887533bc1fc713c25/mc-832-big-Data.db) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181016181004.17838-1-glauber@scylladb.com>	2018-10-16 20:31:20 +01:00
Asias He	7f826d3343	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>	2018-10-15 22:03:28 +01:00
Benny Halevy	7eef527769	handle both special token_kinds in dht::tri_compare Handle the before_all_keys and after_all_keys token_kind at the highest layer before calling into the virtual i_partitioner::tri_compare that is not set up to handle these cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181015165612.29356-1-bhalevy@scylladb.com>	2018-10-15 20:00:54 +03:00
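Resolving the special token kinds before delegating to the partitioner-specific comparison might look like this. This is a simplified sketch with hypothetical types; real tokens and the virtual `i_partitioner::tri_compare` are more involved:

```cpp
#include <cstdint>

enum class token_kind { before_all_keys, key, after_all_keys };  // declared in ring order

struct token {
    token_kind kind;
    int64_t value;  // meaningful only when kind == token_kind::key (simplified)
};

// Stand-in for the partitioner-specific comparison, which only understands key tokens.
int compare_key_tokens(const token& a, const token& b) {
    return a.value < b.value ? -1 : (a.value > b.value ? 1 : 0);
}

// Resolve the special kinds first; delegate only key-vs-key comparisons.
int tri_compare(const token& a, const token& b) {
    if (a.kind == token_kind::key && b.kind == token_kind::key) {
        return compare_key_tokens(a, b);
    }
    const int ka = static_cast<int>(a.kind);
    const int kb = static_cast<int>(b.kind);
    return ka < kb ? -1 : (ka > kb ? 1 : 0);
}
```

Because the enumerators are declared in ring order, comparing their underlying values is enough whenever at least one side is a special token.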
Glauber Costa	51906f7144	compactions: log tokens that we decide not to write down to an SSTable May be important when debugging issues related to cleanups Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181015162643.7834-1-glauber@scylladb.com>	2018-10-15 19:28:00 +03:00
Vladimir Krivopalov	092276b13d	sstables: Reset opened range tombstone when moving to another partition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <f6dc6b0bd88ca44f2ef84c2a8bee43fde82c89cc.1539396572.git.vladimir@scylladb.com>	2018-10-14 11:20:11 +03:00
Vladimir Krivopalov	926b6430fd	sstables: Factor out code resetting values for a new partition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <83a3a4ce6942b036be447bcfeb66142828e75293.1539396572.git.vladimir@scylladb.com>	2018-10-14 11:20:10 +03:00
Glauber Costa	98332de268	api: use longs instead of ints for snapshot sizes Int types in json will be serialized to int types in C++. They will then only be able to handle 4GB, and we tend to store more data than that. Without this patch, listsnapshots is broken in all versions. Fixes: #3845 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181012155902.7573-1-glauber@scylladb.com>	2018-10-12 21:17:24 +03:00
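The size limit can be checked with simple arithmetic. This sketch assumes the int-typed field behaves like an unsigned 32-bit integer, which keeps a byte count modulo 2^32 (a signed 32-bit field would overflow even earlier, past 2 GiB):

```cpp
#include <cstdint>

// What an unsigned 32-bit field keeps of a byte count: the value modulo 2^32.
uint32_t stored_in_32_bits(uint64_t bytes) {
    return static_cast<uint32_t>(bytes);
}
```

A 5 GiB snapshot (5 * 2^30 bytes) would be reported as 1 GiB, while a 64-bit ("long") field holds the value exactly.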
Tomasz Grabiec	b89556512a	Merge "Enable sstable_mutation_test with SSTables 3.x." from Vladimir Introduce uppermost_bound() method instead of upper_bound() in mutation_fragment_filter and clustering_ranges_walker. For now, this has been only used to produce the final range tombstone for sliced reads inside consume_partition_end(). Usage of the upper bound of the current range causes problems of two kinds: 1. If not all the slicing ranges have been traversed with the clustering range walker, which is possible when the last read mutation fragment was before some of the ranges and reading was limited to a specific range of positions taken from index, the emitted range tombstone will not cover the untraversed slices. 2. At the same time, if all ranges have been walked past, the end bound is set to after_all_clustered_rows and the emitted RT may span more data than it should. To avoid both situations, the uppermost bound is used instead, which refers to the upper bound of the last range in the sequence. * github.com/scylladb/seastar-dev.git haaawk/projects/sstables-30/enable-mc-with-sstable-mutation-test/v2 sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter. tests: Enable sstable_mutation_test for SSTables 'mc' format. Rebased by Piotr J.	2018-10-12 15:14:17 +02:00
Vladimir Krivopalov	5b03fe7982	tests: Enable sstable_mutation_test for SSTables 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-12 14:18:15 +02:00
Vladimir Krivopalov	199dc9d5a7	sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter. For now, this has been only used to produce the final range tombstone for sliced reads inside consume_partition_end(). Usage of the upper bound of the current range causes problems of two kinds: 1. If not all the slicing ranges have been traversed with the clustering range walker, which is possible when the last read mutation fragment was before some of the ranges and reading was limited to a specific range of positions taken from index, the emitted range tombstone will not cover the untraversed slices. 2. At the same time, if all ranges have been walked past, the end bound is set to after_all_clustered_rows and the emitted RT may span more data than it should. To avoid both situations, the uppermost bound is used instead, which refers to the upper bound of the last range in the sequence. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-12 14:18:15 +02:00
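The two failure modes above can be reproduced with a toy walker over sorted, disjoint integer ranges. This is a hypothetical sketch; the real clustering_ranges_walker works on position_in_partition, and "after all clustered rows" is modelled here as `INT_MAX`:

```cpp
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified clustering range [start, end] over integer positions; sorted and disjoint.
struct clustering_range {
    int start;
    int end;
};

class ranges_walker_sketch {
public:
    explicit ranges_walker_sketch(std::vector<clustering_range> ranges) : _ranges(std::move(ranges)) {}

    // Skip every range that ends before `pos`.
    void advance_to(int pos) {
        while (_current < _ranges.size() && _ranges[_current].end < pos) {
            ++_current;
        }
    }

    // End of the range currently being walked, or "after all clustered rows"
    // (INT_MAX here) once every range has been passed.
    int upper_bound() const {
        return _current < _ranges.size() ? _ranges[_current].end : INT_MAX;
    }

    // End of the last range in the sequence, regardless of how far the walk has progressed.
    int uppermost_bound() const {
        return _ranges.empty() ? INT_MAX : _ranges.back().end;
    }

private:
    std::vector<clustering_range> _ranges;
    std::size_t _current = 0;
};
```

With slices {[0,10], [20,30], [40,50]}, after reading up to position 5 `upper_bound()` is 10 and would leave [20,50] uncovered (problem 1); after walking past everything it becomes `INT_MAX` and overshoots (problem 2). `uppermost_bound()` stays at 50 in both cases.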
Tomasz Grabiec	193efef950	Merge "Make SST3 pass test_clustering_slices test" from Piotr * seastar-dev.git haaawk/sst3/test_clustering_slices/v8: sstables: Extract on_end_of_stream from consume_partition_end sstables: Don't call consume_range_tombstone_end in consume_partition_end sstables: Change the way fragments are returned from consumer	2018-10-12 14:11:51 +02:00
Piotr Jastrzebski	1a6cef80f0	sstables: Change the way fragments are returned from consumer Split range tombstone (if present) on every consume_row_end call and store both range tombstone and row in different fields called _stored_row and _stored_tombstone instead of using single field called _stored. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:51:39 +02:00
Piotr Jastrzebski	3109c94c84	sstables: Don't call consume_range_tombstone_end in consume_partition_end We don't need to check _opened_range_tombstone and _mf_filter again Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:51:28 +02:00
Piotr Jastrzebski	7dcea660e8	sstables: Extract on_end_of_stream from consume_partition_end The new function will be called when the stream of data is finished, while the old consume_partition_end will be called when a partition is finished but the stream is not done yet. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:50:52 +02:00
Piotr Jastrzebski	717cb2a9e7	sstables: Adapt test_clustering_slices test for SST3 Readers for SST3 return slightly more precise range tombstones when the reader is slicing. Namely, SST2 readers return whole range tombstones that overlap with the slicing range, but SST3 readers trim those range tombstones to the slicing range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-11 15:47:47 +02:00
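A minimal sketch of the trimming described above. It assumes integer clustering positions and half-open ranges. `range_tombstone`, `slice` and `trim_to_slice()` are hypothetical names, not the reader's real types.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical simplified range tombstone covering [start, end).
struct range_tombstone {
    int start;
    int end;
    long timestamp;
};

// Hypothetical slicing range covering [start, end).
struct slice {
    int start;
    int end;
};

// SST3-style behaviour: clip the tombstone to the slicing range and return
// nothing if the two do not overlap. An SST2-style reader would instead emit
// the overlapping tombstone whole.
std::optional<range_tombstone> trim_to_slice(const range_tombstone& rt, const slice& s) {
    int start = std::max(rt.start, s.start);
    int end = std::min(rt.end, s.end);
    if (start >= end) {
        return std::nullopt;
    }
    return range_tombstone{start, end, rt.timestamp};
}
```

A test written against SST2 output expects the untrimmed tombstone, which is why the test's expectations had to change for SST3.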
Tomasz Grabiec	a7a14e3af2	Merge "Handle dead row markers when writing to SSTables 3.x" from Vladimir There is a mismatch between row markers used in SSTables 2.x (ka/la) and liveness_info used by SSTables 3.x (mc) in that a row marker can be written as a deleted cell but liveness_info cannot. To handle this, for a dead row marker the corresponding liveness_info is written as expiring liveness_info with a fake TTL set to 1. This approach is adapted from the solution for CASSANDRA-13395, which addressed a similar issue during SSTables upgrades. * github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7: sstables: Introduce TTL limitation and special 'expired TTL' value. sstables: Write dead row marker as expired liveness info. tests: Add test covering dead row marker writing to SSTables 3.x.	2018-10-11 10:58:57 +02:00
Gleb Natapov	ceb361544a	stream_session: remove unused capture The 'Consumer function' parameter of distribute_reader_and_consume_on_shards() captures schema_ptr (which is a seastar::shared_ptr), but the function is later copied on another shard, at which point schema_ptr is also copied and its counter is incremented by the wrong shard. The capture is not even used, so let's just drop it. Fixes #3838 Message-Id: <20181011075500.GN14449@scylladb.com>	2018-10-11 11:10:58 +03:00
Botond Dénes	23f3831aaf	table::make_streaming_reader(): add forwarding parameter The single-range overload, when used by make_multishard_streaming_reader(), has to create a reader that is forwardable. Otherwise the multishard streaming reader will not produce any output, as it cannot fast-forward its shard readers to the ranges produced by the generator. Also add a unit test based on the real-life purpose the multishard streaming reader was designed for: serving partitions from a shard, according to a sharding configuration that is different from the local one. This is also the scenario that found the bug in the first place. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>	2018-10-11 10:59:18 +03:00
Vlad Zolotarov	5b12ec441d	db::hints::manager: use "streaming" I/O scheduling class for reads Make sure that read I/O in the context of HH sending does not overpower I/O in the context of queries, memtable flushes or compactions. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set a read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	629972d586	db::hints::manager: add hints sender to the "streaming" CPU scheduling group Make sure that HH sends do not overpower (CPU-wise) the regular WRITE flow. Fixes #3817 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vladimir Krivopalov	9a04200b03	tests: Add test covering dead row marker writing to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:54 -07:00
Vladimir Krivopalov	9c773fa6cf	sstables: Write dead row marker as expired liveness info. This makes it possible to distinguish expired liveness info from liveness info that has yet to expire, and to convert it into a dead row marker on read. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
Vladimir Krivopalov	e71cc5ab20	sstables: Introduce TTL limitation and special 'expired TTL' value. This makes it possible to store expired liveness info in the SSTables 3.x format without introducing a possible conflict with real TTL values. As per Cassandra, TTL cannot exceed 20 years, so using the maximum value as a special value indicating expired liveness info is safe. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
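A sketch of the sentinel scheme. It assumes TTLs are stored as signed 32-bit seconds and that "the maximum value" means `std::numeric_limits<int32_t>::max()`. The constant and function names are hypothetical. The `static_assert` captures why the sentinel is safe: the 20-year cap (630,720,000 s) is far below it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical constants: real TTLs are capped at 20 years, so the largest
// 32-bit value can never be a legitimate TTL and is free to act as a marker.
constexpr int32_t max_ttl_seconds = 20 * 365 * 24 * 60 * 60; // 630'720'000
constexpr int32_t expired_liveness_ttl = std::numeric_limits<int32_t>::max();
static_assert(max_ttl_seconds < expired_liveness_ttl,
              "sentinel must not collide with a valid TTL");

// Hypothetical simplified liveness info.
struct liveness_info {
    int64_t timestamp;
    std::optional<int32_t> ttl; // empty for non-expiring rows
};

// Write side: a dead row marker becomes expiring liveness info with the sentinel TTL.
liveness_info encode_dead_row_marker(int64_t timestamp) {
    return liveness_info{timestamp, expired_liveness_ttl};
}

// Read side: the sentinel TTL is converted back into a dead row marker.
bool is_dead_row_marker(const liveness_info& info) {
    return info.ttl && *info.ttl == expired_liveness_ttl;
}

// Any TTL a user can actually set lies in (0, max_ttl_seconds].
bool is_valid_ttl(int32_t ttl) {
    return ttl > 0 && ttl <= max_ttl_seconds;
}
```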
Calle Wilund	3cb50c861d	messaging_service: Make rpc streaming sink respect tls connection Fixes #3787 The message service streaming sink was created using a direct call to rpc::client::make_sink. This in turn needs a new socket, which it creates while completely ignoring which underlying transport is active for the client in question. Fix by retaining the tls credential pointer in the client wrapper, and using it in a sink method to determine whether to create a new tls socket or just go ahead with a plain one. Message-Id: <20181010003249.30526-1-calle@scylladb.com>	2018-10-10 12:55:28 +03:00
Avi Kivity	1891779e64	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare()	2018-10-09 17:48:18 +03:00
Piotr Sarna	a93d27960c	tests: add secondary index paging unit test case A simple case for SI paging is added to secondary_index_test suite. This commit should be followed by more complex testing and serves as an example on how to extract paging state and use it across CQL queries. Message-Id: <b22bdb5da1ef8df399849a66ac6a1f377e6a650a.1539090350.git.sarna@scylladb.com>	2018-10-09 15:05:20 +01:00
Avi Kivity	cfab7a2be6	Update seastar submodule * seastar ed44af8...4669469 (2): > prometheus: Fix histogram text representation > reactor: count I/O errors Fixes #3827.	2018-10-09 16:36:47 +03:00
Gleb Natapov	319ece8180	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com>	2018-10-09 16:33:53 +03:00
Botond Dénes	d467b518bc	multishard_mutation_query(): don't attempt to stop broken readers Currently, when stopping a reader fails, no attempt is made to save it, and it is left in the `_readers` array as-is. This can lead to an assertion failure, as the reader state will contain futures that were already waited upon and that the cleanup code will attempt to wait on again. To prevent this, when stopping a reader fails, reset it to the nonexistent state, so that the cleanup code doesn't attempt to do anything with it. Refs: #3830 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>	2018-10-09 15:59:50 +03:00
Gleb Natapov	207b57a892	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. A non-zero value means that writes were lost after CL was reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com>	2018-10-09 15:17:07 +03:00
Piotr Sarna	b3685342a6	service/pager: avoid dereferencing null partition key The pager::state() function returns a valid paging object even if the pager itself is exhausted. It may also not contain the partition key, so using it unconditionally was a bug - now, in case there is no partition key present, paging state will contain an empty partition key. Fixes #3829 Message-Id: <28401eb21ab8f12645c0a33d9e92ada9de83e96b.1539074813.git.sarna@scylladb.com>	2018-10-09 12:13:52 +03:00
Botond Dénes	4bb0bbb9e2	database: add make_multishard_streaming_reader() Creates a streaming reader that reads from all shards. Shard readers are created with `table::make_streaming_reader()`. This is needed for the new row-level repair. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4b74c710bed2ef98adf07555a4c841e5b690dd8c.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:47 +03:00
Botond Dénes	3eeb6fbd23	table::make_streaming_reader(): add single-range overload This will be used by the `make_multishard_streaming_reader()` in the next patch. This method will create a multishard combining reader which needs its shard readers to take a single range, not a vector of ranges like the existing overload. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <cc6f2c9a8cf2c42696ff756ed6cb7949b95fe986.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:46 +03:00
Botond Dénes	a56871fab7	tests/multishard_mutation_query_test: test range-tombstones spanning multiple pages Extend the existing range-tombstone test, such that range tombstones span multiple pages worth of rows. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <583aa826ea12118289b08d483b55b5573d27e1ee.1539002810.git.bdenes@scylladb.com>	2018-10-09 10:18:28 +03:00
Vladimir Krivopalov	e9aba6a9c3	sstables: Add missing 'mc' format into format strings map in sstable::filename(). Fixes #3832. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <269421fb2ac8ab389231cbe9ed501da7e7ff936a.1539048008.git.vladimir@scylladb.com>	2018-10-09 10:07:08 +03:00
Asias He	8edf3defdf	range_streamer: Futurize add_ranges It might take a long time for get_all_ranges_with_sources_for and get_all_ranges_with_strict_sources_for to calculate, which causes reactor stalls. To fix this, run them in a thread and yield. Those functions are used in the slow path, so it is ok to yield more than needed. Fixes #3639 Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>	2018-10-09 09:46:50 +03:00
Nadav Har'El	b8668dc0f8	materialized views: refuse to filter by non-key column A materialized view can provide a filter so as to pick up only a subset of the rows from the base table. Usually, the filter operates on columns from the base table's primary key. If we use a filter on regular (non-key) columns, things get hairy, and as issue #3430 showed, wrong: merely updating this column in the base table may require us to delete, or resurrect, the view row. But normally we need to do the above when the "new view key column" was updated, when there is one. We use shadowable tombstones with one timestamp to do this, so it cannot take into account the two timestamps from those two columns (the filtered column and the new key column). So in the current code, filtering by a non-key column does not work correctly. In this patch we provide two test cases (one involving TTLs, and one involving only normal updates), which demonstrate vividly that it does not work correctly. With normal updates, trying to resurrect a view row that has previously disappeared fails. With TTLs, things are even worse, and the view row fails to disappear when the filtered column is TTLed. In Cassandra, the same thing doesn't work correctly either (see CASSANDRA-13798 and CASSANDRA-13832), so they decided to refuse creating a materialized view filtering a non-key column. In this patch we also do this - fail the creation of such an unsupported view. For this reason, the two tests mentioned above are commented out in an "#if", with, instead, a trivial test verifying a failure to create such a view. Note that as explained above, when the filtered column and new view key column are different we have a problem. But when they are the same - namely we filter by a non-key base column which actually is a key in the view - we are actually fine. This patch includes additional test cases verifying that this case is really fine and provides correct results. 
Accordingly, this case is not forbidden in the view creation code. Fixes #3430. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181008185633.24616-1-nyh@scylladb.com>	2018-10-08 20:37:11 +01:00
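The resulting validation rule can be sketched as follows. It assumes a view is described only by its base and view primary-key column sets and by the set of columns restricted by its filter. `view_definition` and `is_supported_view_filter()` are hypothetical, not Scylla's schema API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical description of a view, reduced to the column sets the rule needs.
struct view_definition {
    std::set<std::string> base_primary_key; // base partition + clustering columns
    std::set<std::string> view_primary_key; // view partition + clustering columns
    std::set<std::string> filtered_columns; // columns restricted by the view's filter
};

// A filter on a regular (non-key) base column is rejected, unless that column
// is itself part of the view's primary key (the "same column" case, which works).
bool is_supported_view_filter(const view_definition& v) {
    for (const auto& col : v.filtered_columns) {
        bool in_base_key = v.base_primary_key.count(col) > 0;
        bool in_view_key = v.view_primary_key.count(col) > 0;
        if (!in_base_key && !in_view_key) {
            return false;
        }
    }
    return true;
}
```

For example, with base key (p, c), filtering on the regular column v is refused unless v is also a column of the view's primary key.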
Avi Kivity	0fa60660b8	Merge "Fix mutation fragments clobbering on fast_forward" from Vladimir " This patchset fixes a bug in SSTables 3.x reading when fast-forwarding is enabled. It is possible that a mutation fragment, row or RT marker, is read and then stored because it falls outside the current fast-forwarding range. If the reader is further fast-forwarded but the row still falls outside of it, the reader would still continue reading and get the next fragment, if any, that would clobber the currently stored one. With this fix, the reader does not attempt to read on after storing the current fragment. Tests: unit {release} " * 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla: tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x. sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment. sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.	2018-10-08 20:18:42 +03:00
Vladimir Krivopalov	07d61683b6	tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-08 09:09:33 -07:00
Botond Dénes	d0eb443913	result_memory_accounter: drop state_for_another_shard() This is not used since range-scans were refactored (`e49a14e30`) as part of making them stateful. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <589f30163e29299e840750457919214a26f0da93.1539005336.git.bdenes@scylladb.com>	2018-10-08 14:29:48 +01:00
Duarte Nunes	48ebe6552c	Merge 'Fix issues with endpoint state replication to other shards' from Tomasz Fixes #3798 Fixes #3694 Tests: unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test) * tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla: gms/gossiper: Replicate endpoint states in add_saved_endpoint() gms/gossiper: Make reset_endpoint_state_map() have effect on all shards gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards gms/gossiper: Always override states from older generations	2018-10-08 14:19:19 +01:00
Avi Kivity	4b16867bd7	cql: relax writetime/ttl selections of collections writetime() or ttl() selections of frozen collections can work, as they are single cells. Relax the check to allow them, and only forbid non-frozen collections. Fixes #3825. Tests: cql_query_test (release). Message-Id: <20181008123920.27575-1-avi@scylladb.com>	2018-10-08 14:07:01 +01:00
Duarte Nunes	56e36ee14b	flat_mutation_reader: Use std::move(range) in move_buffer_content_to() Instead of open coding it. Tests: unit(release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181008104328.13164-1-duarte@scylladb.com>	2018-10-08 13:57:13 +03:00
Avi Kivity	474bb4e44f	cql: functions: implement min/max/count for bytes type Uncomment existing declare() calls and implement tests. Because the data_value(bytes) constructor is explicit, we add explicit conversion to data_value in impl_min_function_for<> and impl_max_function_for<>. Fixes #3824. Message-Id: <20181008084127.11062-1-avi@scylladb.com>	2018-10-08 10:48:30 +01:00
Takuya ASADA	d89114d1fc	dist/debian: install GPG key for cross-building We found that on some Debian environments the Ubuntu .deb build fails with a gpg error because the Ubuntu GPG key is missing, so we need to install it before starting pbuilder. Likewise, on Ubuntu we need to install the Debian GPG key. Fixes #3823 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181008072246.13305-1-syuu@scylladb.com>	2018-10-08 10:43:25 +03:00
Botond Dénes	b01050e28c	HACKING.md: add link to the scylla-dev mailing list Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <9a5d967f791d7a0db584864f68f93bbc68f52372.1538977773.git.bdenes@scylladb.com>	2018-10-08 10:06:50 +03:00
Duarte Nunes	74d809f8be	db/hints/manager: Use frozen_mutation instead of mutation Instead of unfreezing a mutation from the commitlog and then freezing it again to send, just keep the read frozen mutation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	6eec9748fc	db/hints/manager: Use database::find_schema() Instead of using find_column_family() and repeatedly asking for column_family::schema(), use database::find_schema() instead. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	5b3d08defc	db/commitlog/commitlog_entry: Allow moving the contained mutation Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	3b6d2286e9	service/storage_proxy: send_to_endpoint overload accepting frozen_mutation Add an overload to send_to_endpoint() which accepts a frozen_mutation. The motivation is to allow better accounting of pending view updates, but this change also allows some callers to avoid unfreezing already frozen mutations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:37:39 +01:00
Duarte Nunes	c7639f53e0	service/storage_proxy: Build a shared_mutation from a frozen_mutation Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Duarte Nunes	9e14412528	service/storage_proxy: Lift frozen_mutation_and_schema Lift frozen_mutation_and_schema to frozen_mutation.hh, since other subsystems using frozen_mutations will likely want to pass it around together with the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Duarte Nunes	2c739f36cc	service/storage_proxy: Allow non-const ranges in mutate_prepare() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Avi Kivity	1cc81d1492	Update seastar submodule * seastar 71e914e...ed44af8 (4): > Merge "Add semaphore_units<>::split() function" from Duarte > scheduling: introduce destroy_scheduling_group() > tls: include "api.hh" for listen_options > rpc: connection-level resource isolation	2018-10-07 20:45:49 +03:00
Duarte Nunes	4162bff37a	Merge 'cql3: allow adding or dropping multiple columns in ALTER TABLE statement' from Benny " This patchset implements ALTER TABLE ADD/DROP for multiple columns. Fixes: #2907 Fixes: #3691 Tests: schema_change_test " * 'projects/cql3/alter-table-multi/v3' of https://github.com/bhalevy/scylla: cql3: schema_change_test: add test_multiple_columns_add_and_drop cql3: allow adding or dropping multiple columns in ALTER TABLE statement cql3: alter_table_statement: extract add/alter/drop per-column code into functions cql3: testing for MVs for alter_table_statement::type::drop is not per column cql3: schema_change_test: add test_static_column_is_dropped	2018-10-07 17:30:09 +01:00
Benny Halevy	0f350f5d59	cql3: schema_change_test: add test_multiple_columns_add_and_drop Add a unit test for adding or dropping multiple columns. See https://github.com/scylladb/scylla/issues/2907 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 19:14:29 +03:00
Benny Halevy	23fecc7e5e	cql3: allow adding or dropping multiple columns in ALTER TABLE statement Fixes #2907 Fixes #3691 See Cassandra reference: https://apache.googlesource.com/cassandra/+/cassandra-3.6/src/antlr/Parser.g /** * ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>; * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; | ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>,<column1> <newtype1>..... <column n> <newtype n>) * ALTER COLUMN FAMILY <CF> DROP <column>; | ALTER COLUMN FAMILY <CF> DROP ( <column>,<column1>.....<column n>) * ALTER COLUMN FAMILY <CF> WITH <property> = <value>; * ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>; */ alterTableStatement returns [shared_ptr<alter_table_statement> expr] @init { alter_table_statement::type type; auto props = make_shared<cql3::statements::cf_prop_defs>(); std::vector<alter_table_statement::column_change> column_changes; std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames; } : K_ALTER K_COLUMNFAMILY cf=columnFamilyName ( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; column_changes.emplace_back(id, v); } | K_ADD { type = alter_table_statement::type::add; } ( id1=cident v1=comparatorType b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); } | '(' id1=cident v1=comparatorType b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); } (',' idn=cident vn=comparatorType bn=cfisStatic { column_changes.emplace_back(idn, vn, bn); } )* ')' ) | K_DROP id=cident { type = alter_table_statement::type::drop; column_changes.emplace_back(id); } | K_WITH properties[props] { type = alter_table_statement::type::opts; } | K_RENAME { type = alter_table_statement::type::rename; } id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); } ( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )* ) { $expr = ::make_shared<alter_table_statement>(std::move(cf), type, 
std::move(column_changes), std::move(props), std::move(renames)); } ; Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 19:14:26 +03:00
Benny Halevy	3fa6d3d3a8	cql3: alter_table_statement: extract add/alter/drop per-column code into functions In preparation for supporting ALTER TABLE with multiple columns (#3691) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 18:57:06 +03:00
Alexys Jacob	eebbae066a	dist/common/scripts/scylla_setup: fix gentoo linux installed package detection The return code is expected to be 0 when the installed package is found Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123433.4702-1-ultrabug@gentoo.org>	2018-10-07 16:46:02 +03:00
Alexys Jacob	850d046551	dist/common/scripts/scylla_ntp_setup: fix gentoo linux systemd service name Fix a typo: the systemd service of the ntpd package is named ntpd, not sntpd Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123802.5576-1-ultrabug@gentoo.org>	2018-10-07 16:46:01 +03:00
Alexys Jacob	54151d2039	dist/common/scripts/scylla_cpuscaling_setup: fix file open mode for writing The Gentoo Linux part tries to open the configuration file without the write flag, leading to an exception Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123957.6010-1-ultrabug@gentoo.org>	2018-10-07 16:46:00 +03:00
Avi Kivity	700994a4f2	Merge "Add GDB commands for examining gossiper and RPC state" from Tomasz * 'gdb-gms-netw' of github.com:tgrabiec/scylla: gdb: Introduce 'scylla netw' command gdb: Introduce 'scylla gms' command gdb: Add sharded service wrapper gdb: Add unique_ptr wrapper gdb: Add list_unordered_set() gdb: Make std_vector wrapper indexable gdb: Add wrapper for std_map	2018-10-07 16:42:52 +03:00
Vlad Zolotarov	7cbe5f2983	service: priority_manager.hh: add #pragma once Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181005040552.2183-3-vladz@scylladb.com>	2018-10-07 16:04:26 +03:00
Duarte Nunes	30d6ed8f92	service/storage_proxy: Consider target liveness in send_to_endpoint() To avoid attempting to send mutations to unreachable endpoints, and to store a hint for them instead, we now check the endpoint status and populate dead_endpoints accordingly in storage_proxy::send_to_endpoint(). Fixes #3820 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181007100640.2182-1-duarte@scylladb.com>	2018-10-07 16:04:26 +03:00
Benny Halevy	581b9006d4	cql3: testing for MVs for alter_table_statement::type::drop is not per column No column can be dropped from a table with materialized views, so the respective exception can omit the dropped column name. This prepares for refactoring the respective code, which moves the per-column code into member functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 15:16:32 +03:00
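A minimal sketch of the liveness split. It assumes an `is_alive` predicate stands in for the gossiper's endpoint-status check. `split_by_liveness()` and `target_split` are hypothetical names. Endpoints ending up in `dead` would get a hint instead of a send attempt.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using endpoint = std::string; // hypothetical: endpoints modelled as address strings

// Hypothetical result: live targets are sent to, dead ones are hinted.
struct target_split {
    std::vector<endpoint> live;
    std::vector<endpoint> dead;
};

// Partition the targets using the supplied liveness predicate, preserving order.
target_split split_by_liveness(const std::vector<endpoint>& targets,
                               const std::function<bool(const endpoint&)>& is_alive) {
    target_split result;
    for (const auto& ep : targets) {
        (is_alive(ep) ? result.live : result.dead).push_back(ep);
    }
    return result;
}
```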
Benny Halevy	8d298064b1	cql3: schema_change_test: add test_static_column_is_dropped Test dropping of static column defined in CREATE TABLE, and adding and dropping of a static column using ALTER TABLE. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 14:34:28 +03:00
Duarte Nunes	a69d468101	service/storage_proxy: Fix formatting of send_to_endpoint() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181006204756.32232-1-duarte@scylladb.com>	2018-10-07 11:05:32 +03:00
Vladimir Krivopalov	9db124c6e5	sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment. Without it, the reader will attempt to read further and may clobber the stored fragment with the next one read, if any. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-05 19:09:09 -07:00
Vladimir Krivopalov	8e004684e9	sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value. Instead of blindly proceeding, use whatever the call to maybe_push_*() has returned. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-05 19:05:03 -07:00
Duarte Nunes	b839f551cf	cql3/statements/select_statement: Don't double count unpaged queries Unpaged queries are those for which the client didn't enable paging, and we already account for them in indexed_table_select_statement::do_execute(). Remove the second increment in read_posting_list(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181003121811.11750-1-duarte@scylladb.com>	2018-10-05 17:36:39 +02:00
Nadav Har'El	e4ef7fc40a	materialized views: enable two tests in view_schema_test We had two commented-out tests based on Cassandra's MV unit tests, for the case where the view's filter (the "SELECT" clause used to define the view) filtered by a non-primary-key column. These tests used to fail because of problems we had in the filtering code, but they now succeed, so we can enable them. This patch also adds some comments about what the tests do, and adds a few more cases to one of the tests. Refs #3430. However, note that the success of these tests does not really prove that the non-PK-column filtering feature works fully correctly and that there is no issue requiring it to be forbidden, as explained in https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably fix this feature with our "virtual cells" mechanism, but will need to add a test to confirm the possible problem and its (probably needed) fix. We do not add such a test in this patch. In the meantime, issue #3430 should remain open: we still allow users to create MVs with such a filter, and, as the tests in this patch show, this "mostly" works correctly. We just need to prove and/or fix what happens with the complex row liveness issues a la issue #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181004213637.32330-1-nyh@scylladb.com>	2018-10-04 22:43:38 +01:00
Tomasz Grabiec	3c7de9fee9	gms/gossiper: Replicate endpoint states in add_saved_endpoint()	2018-10-04 12:54:00 +02:00
Tomasz Grabiec	ddf3a61bcf	gms/gossiper: Make reset_endpoint_state_map() have effect on all shards	2018-10-04 12:53:56 +02:00
Tomasz Grabiec	9e3f744603	gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards Lack of this may result in non-zero shards on some nodes still seeing STATUS as NORMAL for a node which shut down, in some cases. mark_as_shutdown() is invoked in reaction to an RPC call initiated by the node which is shutting down. Another way a node can learn about another node shutting down is via gossiping with a node which knows this. In that case, the states will be replicated to non-zero shards. The node which learnt via mark_as_shutdown() may also eventually propagate this to non-zero shards, e.g. when it gossips about it with other nodes, and its local version number at the time of mark_as_shutdown() was smaller than the one used to set the STATE by the shutting down node.	2018-10-04 12:51:42 +02:00
Tomasz Grabiec	c4ec81e126	gms/gossiper: Always override states from older generations Application states of each node are versioned per-node with a pair of generation number (more significant) and value version. The generation number uniquely identifies the lifetime of a scylla process and changes after a restart. Value versions start from 0 on each restart. When a node gets updates for application states, it merges them with its view of the given node. Value updates with older versions are ignored. The gossiper processes updates only on shard 0, and replicates value updates to other shards. When it sees a value with a new generation, it correctly forgets all previous values. However, non-zero shards don't forget values from previous generations. As a result, replication will fail to override the values on non-zero shards when the generation number changes, until their value version exceeds the version prior to the restart. This will result in incorrect STATUS for non-seed nodes on non-zero shards. When restarting a non-seed node, it will do a shadow gossip round before setting its STATUS to NORMAL. In the shadow round it will learn from other nodes about itself, and set its STATUS to shutdown on all shards with a high value version. Later, when it sets its status to NORMAL, it will override it only on shard 0, because on other shards the version of STATUS is higher. This will cause CQL truncate to skip the current node if the coordinator runs on non-zero shards. The fix is to override the entries on remote shards in the same way we do on shard 0. All updates to endpoint states should already be serialized on shard 0, and remote shards should see them in the same order. Introduced in `2d5fb9d` Fixes #3798 Fixes #3694	2018-10-04 12:47:27 +02:00
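A simplified sketch of the merge rule this fix applies on every shard. It assumes integer generation and version numbers and a string-keyed map of application states. The types and `apply_update()` are hypothetical, not the gossiper's real classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified gossip state for one endpoint.
struct heart_beat_state {
    int generation; // changes on every process restart
    int version;    // starts again from 0 within each generation
};

struct versioned_value {
    int version;
    std::string value;
};

struct endpoint_state {
    heart_beat_state heart_beat;
    std::map<std::string, versioned_value> app_states;
};

// Apply an update replicated from shard 0 to a remote shard's copy.
void apply_update(endpoint_state& local, const endpoint_state& remote) {
    if (remote.heart_beat.generation > local.heart_beat.generation) {
        // New generation: forget everything from the previous process lifetime,
        // even values whose version is higher than the incoming ones.
        local = remote;
        return;
    }
    if (remote.heart_beat.generation < local.heart_beat.generation) {
        return; // update from an older generation is ignored
    }
    // Same generation: keep whichever value has the higher version.
    for (const auto& [name, v] : remote.app_states) {
        auto it = local.app_states.find(name);
        if (it == local.app_states.end() || v.version > it->second.version) {
            local.app_states[name] = v;
        }
    }
}
```

For example, a shard holding STATUS=shutdown at version 50 from generation 1 adopts STATUS=NORMAL at version 18 once an update from generation 2 arrives. Comparing value versions alone would reject that update.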
Piotr Sarna	a5570cb288	tests: add missing get() calls in threaded context One test case missed a few get() calls in order to wait for continuations, which only accidentally worked, because it was followed by 'eventually()' blocks. Message-Id: <69c145575ac81154c4b5f500d01c6b045a267088.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Piotr Sarna	8a2abd45fb	tests: add collections test for secondary indexing Test case regarding creating indexes on collection columns is added to the suite. Refs #3654 Refs #2962 Message-Id: <1b6844634b6e9a353028545813571647c92fb330.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Piotr Sarna	2d355bdf47	cql3: prevent creation of indexes on non-frozen collections Until indexing of non-frozen collections is implemented, creating such indexes should be disallowed to prevent unnecessary errors on insertions/selections. Fixes #3653 Refs #2962 Message-Id: <218cf96d5e38340806fb9446b8282d2296ba5f43.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Duarte Nunes	959559d568	cql3/statements/select_statement: Remove outdated comment Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181003193033.13862-1-duarte@scylladb.com>	2018-10-04 09:45:17 +03:00
Eliran Sinvani	20f49566a2	cql3 : add workaround to antlr3 null dereference bug The Antlr3 exception class has a null dereference bug that crashes the system when trying to extract the exception message using the ANTLR_Exception<...>::displayRecognitionError(...) function. When a parsing error occurs, the CqlParser throws an exception, which is in turn processed for some special cases in scylla to generate a custom message. The default case, however, creates the message using displayRecognitionError, causing the system to crash. The fix is a simple workaround, making sure the pointer is not null before the call to the function. A "proper" fix can't be implemented because the exception class itself is implemented outside scylla, in antlr headers that reside on the host machine's OS. Tested manually with 2 test cases: a typo causing scylla to crash, and a cql comment without a newline at the end, which also caused scylla to crash. Ran unit tests (release). Fixes #3740 Fixes #3764 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>	2018-10-03 18:30:06 +03:00
Tomasz Grabiec	9c57abcce7	gossiper: Fix shutdown_announce_in_ms not being respected shutdown_announce_in_ms specifies a period of time that a node which is shutting down waits to allow its state to propagate to other nodes. However, we were setting _enabled to false before waiting, which will make the current node ignore gossip messages. Message-Id: <1538576996-26283-1-git-send-email-tgrabiec@scylladb.com>	2018-10-03 15:43:00 +01:00
Tomasz Grabiec	fda8e271e3	gdb: Introduce 'scylla netw' command Prints information about the state of the messaging service layer. Example: (gdb) scylla netw Dropped messages: {0 <repeats 25 times>} Outgoing connections: IP: 127.0.0.2, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x6000051cd220: stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 23, wait_reply = 0, timeout = 0} outstanding: 0 Server: resources={_count = 85899345, _ex = {_M_exception_object = 0x0}, _wait_list = {_list = {_front_chunk = 0x0, _back_chunk = 0x0, _nchunks = 0, _free_chunks = 0x0, _nfree_chunks = 0}, _on_expiry = {<No data fields>}, _size = 0}} Incoming connections: 127.0.0.1:28071: {replied = 0, pending = 0, exception_received = 0, sent_messages = 2, wait_reply = 0, timeout = 0}	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	cf07cda08f	gdb: Introduce 'scylla gms' command Prints gossiper state. Example: (gdb) scylla gms 127.0.0.2: (gms::endpoint_state) 0x6010050c0550 ({_generation = 1538568389, _version = 2147483647}) gms::application_state::STATUS: {version=18, value="NORMAL,968364964011550971"} gms::application_state::LOAD: {version=267, value="494510"} gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"} gms::application_state::DC: {version=10, value="datacenter1"} gms::application_state::RACK: {version=11, value="rack1"} gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"} gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.2"} gms::application_state::NET_VERSION: {version=1, value="0"} gms::application_state::HOST_ID: {version=2, value="ee281b83-1acb-4aa3-927c-985a7d9a7c6f"} 127.0.0.1: (gms::endpoint_state) 0x6010051422b0 ({_generation = 1538557402, _version = 0}) gms::application_state::STATUS: {version=18, value="NORMAL,9176584852507611499"} gms::application_state::LOAD: {version=22521, value="409817"} gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"} gms::application_state::DC: {version=10, value="datacenter1"} gms::application_state::RACK: {version=11, value="rack1"} gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"} gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.1"} gms::application_state::NET_VERSION: {version=1, value="0"} gms::application_state::HOST_ID: {version=2, value="88ff543f-e9b8-42eb-a876-c0f917078a31"}	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	8c6f8b1773	gdb: Add sharded service wrapper	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	4adfed9dba	gdb: Add unique_ptr wrapper	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	e29e302272	gdb: Add list_unordered_set()	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	272bc88699	gdb: Make std_vector wrapper indexable	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	b436759d49	gdb: Add wrapper for std_map	2018-10-03 15:05:22 +02:00
Pekka Enberg	de48966abc	cql3: Move as_json_function class to separate file The as_json_function class is not registered as a function, but we can still keep it cql3/functions, as per its namespace, to reduce the size of select_statement.cc. Message-Id: <20181002132637.30233-1-penberg@scylladb.com>	2018-10-03 13:30:08 +01:00
Piotr Sarna	4a23297117	cql3: add asking for pk/ck in the base query Base query partition and clustering keys are used to generate paging state for an index query, so they always need to be present when a paged base query is processed. Message-Id: <f3bf69453a6fd2bc842c8bdbd602d62c91cf9218.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Piotr Sarna	50d3de0693	cql3: add checking for may_need_paging when executing base query It's not sufficient to check for positive page_size when preparing a base query for indexed select statement - may_need_paging() should be called as well. Message-Id: <d435820019e4082a64ca9807541f0c9ad334e6a8.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Piotr Sarna	11b8831c04	cql3: move base query command creation to a separate function Message-Id: <6b48b8cbd6312da4a17bfd3c85af628b4215e9f4.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Avi Kivity	7c8143c3c4	Revert "compaction: demote compaction start/end messages to DEBUG level" This reverts commit `b443a9b930`. The compaction history table doesn't have enough information to be a replacement for this log message yet.	2018-10-03 13:13:37 +03:00
Avi Kivity	b9702222f8	Merge "Handle simple column type schema changes in SST3" from Piotr " This patchset enables very simple column type conversions. It covers only handling variable and fixed size type differences. Two types still have to be compatible on bits level to be able to convert a field from one to the other. " * 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev: Fix check_multi_schema to actually check the column type change Handle very basic column type conversions in SST3 Enable check_multi_schema for SST3	2018-10-03 13:12:10 +03:00
Piotr Jastrzebski	3a60eac1d5	Fix check_multi_schema to actually check the column type change Field 'e' was supposed to be read as blob but the test had a bug and the read schema was treating that field as int. This patch changes that and makes the test really check column type change. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:40 +02:00
Piotr Jastrzebski	3cecb61ac1	Handle very basic column type conversions in SST3 After this change very simple schema changes of column type will work. This change makes sure that variable size and fixed size types can be converted to each other but only if their bit representation can be automatically converted between those types. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:40 +02:00
Piotr Jastrzebski	c117a6b3c8	Enable check_multi_schema for SST3 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:39 +02:00
Nadav Har'El	bebe5b5df2	materialized views: add view_updates_pending statistic We are already maintaining a statistic of the number of pending view updates sent but not yet completed by view replicas, so let's expose it. Like all per-table statistics, this one will only be exposed if the "--enable-keyspace-column-family-metrics" option is on. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-10-02 20:44:58 +01:00
Nadav Har'El	1d5f8d0015	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view update to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic which counts the number of ongoing writes. In the case that the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com>	2018-10-02 20:44:58 +01:00
Duarte Nunes	40a30d4129	db/schema_tables: Diff tables using ID instead of name Currently we diff schemas based on table/view name, and if the names match, then we detect altered schemas by comparing the schema mutations. This fails to detect transitions which involve dropping and recreating a schema with the same name, if a node receives these notifications simultaneously (for example, if the node was temporarily down or partitioned). Note that because the ID is persisted and created when executing a create_table_statement, then even if a schema is re-created with the exact same structure as before, we will still consider it altered because the mutations will differ. This also stops schema pulling from working, since it relies on schema merging. The solution is to diff schemas using their ID, and not their name. Keyspaces and user types are also susceptible to this, but in their case it's fine: these are values with no identity, and are just metadata. Dropping and recreating a keyspace can be viewed as dropping all tables from the keyspace, altering it, and eventually adding new tables to the keyspace. Note that this solution doesn't apply to tables dropped and created with the same ID (using the `WITH ID = {}` syntax). For that, we would need to detect deltas instead of applying changes and then reading the new state to find differences. However, this solution is enough, because tables are usually created with ID = {} for very specific, peculiar reasons. The original motivation meant for the new table to be treated exactly as the old, so the current behavior is in fact the desired one. Tests: unit(release), dtests(schema_test, schema_management_test) Fixes #3797 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230932.47153-2-duarte@scylladb.com>	2018-10-02 20:15:46 +02:00
Duarte Nunes	e404f09a23	db/schema_tables: Drop tables before creating new ones Doing it in the inverse order doesn't support dropping and creating a schema with the same name. Refs #3797 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230932.47153-1-duarte@scylladb.com>	2018-10-02 20:15:32 +02:00
Avi Kivity	aaab8a3f46	utils: crc32: mark power crc32 assembly as not requiring an executable stack The linker uses an opt-in system for non-executable stack: if all object files opt into a non-executable stack, the binary will have a non-executable stack, which is very desirable for security. The compiler cooperates by opting into a non-executable stack whenever possible (always for our code). However, we also have an assembly file (for fast power crc32 computations). Since it doesn't opt into a non-executable stack, we get a binary with executable stack, which Gentoo's build system rightly complains about. Fix by adding the correct incantation to the file. Fixes #3799. Reported-by: Alexys Jacob <ultrabug@gmail.com> Message-Id: <20181002151251.26383-1-avi@scylladb.com>	2018-10-02 18:48:23 +01:00
Avi Kivity	53a4b8ae86	Update seastar submodule * seastar 5712816...71e914e (12): > Merge "rpc shard to shard connection" from Gleb > Merge "Fix memory leaks when stoppping memcached" from Tomasz > scripts: perftune.py: prioritize I/O schedulers > alien: fix the size of local item[] > seastar-addr2line: don't invoke addr2line multiple times > reactor: use labels for different io_priority_class:s > util/spinlock: fix bad namespacing of <xmmintrin.h> > Merge "scripts: perftune.py: support different I/O schedulers" from Vlad > timer: Do not require callback to be copyable > core/reactor: Fix hang on shutdown with long task quota > build: use 'ppa:scylladb/ppa' instead of URL for sourceline > net/dns: add net::dns::get_srv_records() helper	2018-10-02 18:48:23 +01:00
Avi Kivity	7322ac105c	Merge "sstables_stats" from Benny " This patchset adds sstable partition/row read/write/seek statistics. Tests: dtest sstable_generation_loading_test.py stress_tool_test.py Fixes: #251 " * 'projects/sstables-stats/v5' of https://github.com/bhalevy/scylla: sstables stats: row reads sstables stats: partition seeks sstables stats: partition reads sstables stats: flat mutation reads sstables stats: cell/cell_tombstone writes sstables stats: partition/row/tombstone writes sstables_stats: writer_impl: move common members to base class	2018-10-02 15:05:10 +03:00
Duarte Nunes	7ba944a243	service/migration_manager: Validate duplicate ID in time We allow tables to be created with the ID property, mostly for advanced recovery cases. However, we need to validate that the ID doesn't match an existing one. We currently do this in database::add_column_family(), but this is already too late in the normal workflow: if we allow the schema change to go through, then it is applied to the system tables and loaded the next time the node boots, regardless of us throwing from database::add_column_family(). To fix this, we perform this validation when announcing a new table. Note that the check wasn't removed from database::add_column_family(); it's there since 2015 and there might have been other reasons to add it that are not related to the ID property. Refs #2059 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230142.46743-1-duarte@scylladb.com>	2018-10-02 13:40:40 +03:00
Benny Halevy	bd6533f471	sstables stats: row reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	192c1949a3	sstables stats: partition seeks Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	edb3c23125	sstables stats: partition reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	e9dffa56c8	sstables stats: flat mutation reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	4ccdc1115d	sstables stats: cell/cell_tombstone writes Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:41 +03:00
Benny Halevy	2f48f72d5c	sstables stats: partition/row/tombstone writes Introduce per-thread sstables stats infrastructure Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:01:14 +03:00
Benny Halevy	6853c1677d	sstables_stats: writer_impl: move common members to base class To be used by sstable_writer for stats collection. Note that this patch is factored out so it can be verified with no other change in functionality. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:01:00 +03:00
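The schema-diff change in commit 40a30d4129 matches tables by ID rather than by name. The Python sketch below is a hedged illustration of why that matters. It is not Scylla's implementation. `Table`, its fields, and `diff_schemas` are hypothetical stand-ins, and the `definition` string stands in for the schema mutations. The sketch shows that a table dropped and re-created under the same name is reported only as "altered" when tables are matched by name, but as a drop plus a create when they are matched by ID.

```python
from typing import Dict, List, NamedTuple


class Table(NamedTuple):
    id: str          # hypothetical: UUID fixed when CREATE TABLE runs
    name: str        # keyspace-qualified table name
    definition: str  # hypothetical stand-in for the schema mutations


def diff_schemas(before: List[Table], after: List[Table], key: str) -> Dict[str, List[str]]:
    """Classify tables as created, dropped, or altered, matching them by `key`."""
    old = {getattr(t, key): t for t in before}
    new = {getattr(t, key): t for t in after}
    return {
        "created": sorted(new.keys() - old.keys()),
        "dropped": sorted(old.keys() - new.keys()),
        # Tables present on both sides count as altered if any field differs,
        # including the ID, because the ID is part of the persisted schema.
        "altered": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# ks.t was dropped and re-created with the same structure while this node
# was not receiving schema notifications, so only its ID changed.
before = [Table("id-1", "ks.t", "a int PRIMARY KEY")]
after = [Table("id-2", "ks.t", "a int PRIMARY KEY")]

print(diff_schemas(before, after, "name"))  # {'created': [], 'dropped': [], 'altered': ['ks.t']}
print(diff_schemas(before, after, "id"))    # {'created': ['id-2'], 'dropped': ['id-1'], 'altered': []}
```

When tables are matched by name, the drop and re-create collapses into a single alteration. When they are matched by ID, the drop and the create are both visible. Per the companion commit e404f09a23, the drop should then be applied before the create.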

4789 changed files with 134679 additions and 44474 deletions

4

.dockerignore Normal file


@@ -0,0 +1,4 @@
 .git
 build
 seastar/build
 testlog

									
4

.github/PULL_REQUEST_TEMPLATE.md vendored

				@@ -1,4 +0,0 @@

				Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.

				See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.

				If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

5

.gitignore vendored


@@ -19,3 +19,8 @@ CMakeLists.txt.user
 __pycache__CMakeLists.txt.user
 .gdbinit
 resources
 .pytest_cache
 /expressions.tokens
 tags
 testlog/*
 test/*/*.reject

6

.gitmodules vendored


@@ -12,3 +12,9 @@
 [submodule "libdeflate"]
 	path = libdeflate
 	url = ../libdeflate
 [submodule "zstd"]
 	path = zstd
 	url = ../zstd
 [submodule "abseil"]
 	path = abseil
 	url = ../abseil-cpp

									
33

CMakeLists.txt

				@@ -5,13 +5,25 @@

				cmake_minimum_required(VERSION 3.7)

				project(scylla)

				if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)

				  message(STATUS "Setting build type to 'Release' as none was specified.")

				  set(CMAKE_BUILD_TYPE "Release" CACHE

				      STRING "Choose the type of build." FORCE)

				  # Set the possible values of build type for cmake-gui

				  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS

				    "Debug" "Release" "Dev" "Sanitize")

				endif()

				if(CMAKE_BUILD_TYPE)

				    string(TOLOWER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)

				else()

				    set(BUILD_TYPE "release")

				endif()

				if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})

				    message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")

				endif()

				# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.

				set(SEASTAR_INCLUDE_DIRS "seastar")

				# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while

				# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).

				set(SEASTAR_DPDK_INCLUDE_DIRS

				@@ -22,9 +34,14 @@ set(SEASTAR_DPDK_INCLUDE_DIRS

				find_package(PkgConfig REQUIRED)

				set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/seastar/build/release:$ENV{PKG_CONFIG_PATH}")

				set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/build/${BUILD_TYPE}/seastar:$ENV{PKG_CONFIG_PATH}")

				pkg_check_modules(SEASTAR seastar)

				if(NOT SEASTAR_INCLUDE_DIRS)

				    # Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.

				    set(SEASTAR_INCLUDE_DIRS "seastar/include")

				endif()

				find_package(Boost COMPONENTS filesystem program_options system thread)

				##

				@@ -70,7 +87,7 @@ scan_scylla_source_directories(

				          seastar/json

				          seastar/net

				          seastar/rpc

				          seastar/tests

				          seastar/testing

				          seastar/util)

				scan_scylla_source_directories(

				@@ -97,7 +114,7 @@ scan_scylla_source_directories(

				          service

				          sstables

				          streaming

				          tests

				          test

				          thrift

				          tracing

				          transport

				@@ -106,7 +123,7 @@ scan_scylla_source_directories(

				scan_scylla_source_directories(

				        VAR SCYLLA_GEN_SOURCE_FILES

				        RECURSIVE

				        PATHS build/release/gen)

				        PATHS build/${BUILD_TYPE}/gen)

				set(SCYLLA_SOURCE_FILES

				        ${SCYLLA_ROOT_SOURCE_FILES}

				@@ -139,4 +156,4 @@ target_include_directories(scylla PUBLIC

				        ${Boost_INCLUDE_DIRS}

				        xxhash

				        libdeflate

				        build/release/gen)

				        build/${BUILD_TYPE}/gen)

									
2

CONTRIBUTING.md

				@@ -1,6 +1,6 @@

				# Asking questions or requesting help

				Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.

				Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.

				# Reporting an issue

									
97

HACKING.md

				@@ -22,9 +22,20 @@ Scylla depends on the system package manager for its development dependencies.

				Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.

				On Ubuntu and Debian based Linux distributions, some packages

				required to build Scylla are missing in the official upstream:

				- libthrift-dev and libthrift

				- antlr3-c++-dev

				Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,

				and get the missing packages from it.

				### Build system

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native

				thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1. is

				required.

				Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.

				@@ -43,11 +54,9 @@ The full suite of options for project configuration is available via

				$ ./configure.py --help

				```

				The most important options are:

				The most important option is:

				- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.

				- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.

				- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.

				Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.

				@@ -55,6 +64,30 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also

				```bash

				$ ninja-build build/release/tests/schema_change_test

				$ ninja-build build/release/service/storage_proxy.o

				```

				You can also specify a single mode. For example

				```bash

				$ ninja-build release

				```

				Will build everything in release mode. The valid modes are

				* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)

				  and other sanity checks. It has no optimizations, which allows for debugging with tools like

				  GDB. Debugging builds are generally slower and generate much larger object files than release builds.

				* Release: Fewer checks and more optimizations. It still has debug info.

				* Dev: No optimizations or debug info. The objective is to compile and link as fast as possible.

				  This is useful for the first iterations of a patch.

				Note that by default unit tests binaries are stripped so they can't be used with gdb or seastar-addr2line.

				To include debug information in the unit test binary, build the test binary with a `_g` suffix. For example,

				```bash

				$ ninja-build build/release/tests/schema_change_test_g

				```

				### Unit testing

				@@ -83,7 +116,7 @@ The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread

				### Preparing patches

				All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.

				All changes to Scylla are submitted as patches to the public [mailing list](mailto:scylladb-dev@googlegroups.com). Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.

				Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:

				@@ -108,10 +141,12 @@ In v3:

				"Tests: unit ({mode}), dtest ({smp})"

				```

				The usual is "Tests: unit (release)", although running debug tests is encouraged.

				The usual is "Tests: unit (dev)", although running debug tests is encouraged.

				5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.

				6. The Linux kernel's [Submitting Patches](https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html) document offers excellent advice on how to prepare patches and patchsets for review. Since the Scylla development process is derived from the kernel's, almost all of the advice there is directly applicable.

				### Finding a person to review and merge your patches

				You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:

				@@ -164,6 +199,29 @@ On a development machine, one might run Scylla as

				$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes

				```

				To interact with scylla it is recommended to build our versions of

				cqlsh and nodetool. They are available at

				https://github.com/scylladb/scylla-tools-java and can be built with

				```bash

				$ sudo ./install-dependencies.sh

				$ ant jar

				```

				cqlsh should work out of the box, but nodetool depends on a running

				scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be built

				with

				```bash

				$ mvn package

				```

				and must be started with

				```bash

				$ ./scripts/scylla-jmx

				```

				### Branches and tags

				Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.

				@@ -254,7 +312,7 @@ In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine wil

				When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.

				One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section speeding up this process.

				One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next sections speeding up this process.

				### Using the `gold` linker

				@@ -264,6 +322,24 @@ Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds

				$ sudo alternatives --config ld

				```

				### Using split dwarf

				With debug info enabled, most of the link time is spent copying and

				relocating it. It is possible to leave most of the debug info out of

				the link by writing it to a side .dwo file. This is done by passing

				`-gsplit-dwarf` to gcc.

				Unfortunately just `-gsplit-dwarf` would slow down `gdb` startup. To

				avoid that the gold linker can be told to create an index with

				`--gdb-index`.

				More info at https://gcc.gnu.org/wiki/DebugFission.

				Both options can be enabled by passing `--split-dwarf` to configure.py.

				Note that distcc is *not* compatible with it, but icecream

				(https://github.com/icecc/icecream) is.

				### Testing changes in Seastar with Scylla

				Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.

				@@ -277,3 +353,8 @@ $ git remote add local /home/tsmith/src/seastar

				$ git remote update

				$ git checkout -t local/my_local_seastar_branch

				```

				### Core dump debugging

				Slides:

				2018.11.20: https://www.slideshare.net/tomekgrabiec/scylla-core-dump-debugging-tools

31

MAINTAINERS


@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
 ---
 AUTH
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 R: Vlad Zolotarov <vladz@scylladb.com>
 R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
 CACHE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 R: Piotr Jastrzebski <piotr@scylladb.com>
 F: row_cache*
 F: *mutation*
 F: tests/mvcc*
 COMMITLOG / BATCHLOGa
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 F: db/commitlog/*
 F: db/batch*
 COORDINATOR
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Gleb Natapov <gleb@scylladb.com>
 F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
 F: cql3/*
 COUNTERS
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: counters*
 F: tests/counter_test*
 GOSSIP
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
 LSA
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: utils/logalloc*
 MATERIALIZED VIEWS
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 R: Duarte Nunes <duarte@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 F: db/view/*
 F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
 REPAIR
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: repair/*
 SCHEMA MANAGEMENT
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: db/schema_tables*
 F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
 SECONDARY INDEXES
 M: Pekka Enberg <penberg@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 R: Pekka Enberg <penberg@scylladb.com>
 F: db/index/*
 F: cql3/statements/*index*
 SSTABLES
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
 STREAMING
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: streaming/*
 F: service/storage_service.*
 THRIFT TRANSPORT LAYER
 M: Duarte Nunes <duarte@scylladb.com>
 F: thrift/*
 ALTERNATOR
 M: Nadav Har'El <nyh@scylladb.com>
 F: alternator/*
 F: alternator-test/*
 THE REST
 M: Avi Kivity <avi@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 F: *

									
29

README-DPDK.md

				@@ -1,29 +0,0 @@

				Seastar and DPDK

				================

				Seastar uses the Data Plane Development Kit to drive NIC hardware directly.  This

				provides an enormous performance boost.

				To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a

				run-time parameter.  This will use the DPDK package provided as a git submodule with the

				seastar sources.

				To use your own self-compiled DPDK package, follow this procedure:

				1. Setup host to compile DPDK:

				   - Ubuntu 

				     `sudo apt-get install -y build-essential linux-image-extra-$(uname -r)` 

				2. Prepare a DPDK SDK:

				   - Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`

				   - Untar it.

				   - Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.

				   - For DPDK 1.7.x: edit config/common_linuxapp: 

				     - Set CONFIG_RTE_LIBRTE_PMD_BOND  to 'n'.

				     - Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.

				     - Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.

				   - Start the tools/setup.sh script as root.

				   - Compile a linuxapp target (option 9).

				   - Install IGB_UIO module (option 11).

				   - Bind some physical port to IGB_UIO (option 17).

				   - Configure hugepage mappings (option 14/15).

				3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.

									
66

README.md

				@@ -2,17 +2,23 @@

				## Quick-start

				To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)

				which would build and run Scylla using a pre-configured Docker image.

				Using the frozen toolchain will also isolate all of the installed

				dependencies in a Docker container.

				Assuming you have met the toolchain prerequisites, which is running

				Docker in user mode, building and running is as easy as:

				```bash

				$ git submodule update --init --recursive

				$ sudo ./install-dependencies.sh

				$ ./configure.py --mode=release

				$ ninja-build -j4 # Assuming 4 system threads.

				$ ./build/release/scylla

				$ # Rejoice!

				```

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1

				 ```

				Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.

				**Note**: GCC >= 8.1.1 is required to compile Scylla.

				## Running Scylla

				* Run Scylla

				@@ -21,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev

				```

				* run Scylla with one CPU and ./tmp as data directory

				* run Scylla with one CPU and ./tmp as work directory

				```

				./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1

				./build/release/scylla --workdir tmp --smp 1

				```

				* For more run options:

				@@ -32,31 +38,34 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev

				./build/release/scylla --help

				```

				## Building Fedora RPM

				## Testing

				As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:

				See [test.py manual](docs/testing.md).

				```

				# Install mock:

				sudo yum install mock

				## Scylla APIs and compatibility

				By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and

				Thrift. There is also experimental support for the API of Amazon DynamoDB,

				but being experimental it needs to be explicitly enabled to be used. For more

				information on how to enable the experimental DynamoDB compatibility in Scylla,

				and the current limitations of this feature, see

				[Alternator](docs/alternator/alternator.md) and

				[Getting started with Alternator](docs/alternator/getting-started.md).

				# Add user to the "mock" group:

				usermod -a -G mock $USER && newgrp mock

				```

				## Documentation

				Then, to build an RPM, run:

				Documentation can be found in [./docs](./docs) and on the

				[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear

				definition of what goes where, so when looking for something be sure to check

				both.

				Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).

				User documentation can be found [here](https://docs.scylladb.com/).

				```

				./dist/redhat/build_rpm.sh

				```

				## Training 

				The built RPM is stored in the ``build/mock/<configuration>/result`` directory.

				For example, on Fedora 21 mock reports the following:

				```

				INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds

				INFO: Results and/or logs in: build/mock/fedora-21-x86_64/result

				```

				Training material and online courses can be found at [Scylla University](https://university.scylladb.com/). 

				The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, 

				administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, 

				multi-datacenters and how Scylla integrates with third-party applications.

				## Building Fedora-based Docker image

				@@ -75,4 +84,5 @@ docker run -p $(hostname -i):9042:9042 -i -t <image name>

				## Contributing to Scylla

				[Hacking howto](HACKING.md)

				[Guidelines for contributing](CONTRIBUTING.md)

SCYLLA-VERSION-GEN

@@ -1,6 +1,7 @@
 #!/bin/sh
 VERSION=3.0.11
 PRODUCT=scylla
 VERSION=4.0.11
 if test -f version
 then
@@ -18,7 +19,16 @@ else
 	SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
 fi
 if [ -f build/SCYLLA-RELEASE-FILE ]; then
 	RELEASE_FILE=$(cat build/SCYLLA-RELEASE-FILE)
 	GIT_COMMIT_FILE=$(cat build/SCYLLA-RELEASE-FILE |cut -d . -f 3)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
 fi
 echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
 mkdir -p build
 echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE
 echo "$SCYLLA_RELEASE" > build/SCYLLA-RELEASE-FILE
 echo "$PRODUCT" > build/SCYLLA-PRODUCT-FILE

abseil Submodule

Submodule abseil added at 2069dc796a

									
alternator/auth.cc

				@@ -0,0 +1,147 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "alternator/error.hh"

				#include "log.hh"

				#include <string>

				#include <string_view>

				#include <gnutls/crypto.h>

				#include <seastar/util/defer.hh>

				#include "hashers.hh"

				#include "bytes.hh"

				#include "alternator/auth.hh"

				#include <fmt/format.h>

				#include "auth/common.hh"

				#include "auth/password_authenticator.hh"

				#include "auth/roles-metadata.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				namespace alternator {

				static logging::logger alogger("alternator-auth");

				static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {

				    hmac_sha256_digest digest;

				    int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());

				    if (ret) {

				        throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));

				    }

				    return digest;

				}

				static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {

				    auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);

				    auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);

				    auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);

				    auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");

				    return signing;

				}

				static std::string apply_sha256(std::string_view msg) {

				    sha256_hasher hasher;

				    hasher.update(msg.data(), msg.size());

				    return to_hex(hasher.finalize());

				}

				static std::string format_time_point(db_clock::time_point tp) {

				    time_t time_point_repr = db_clock::to_time_t(tp);

				    std::string time_point_str;

				    time_point_str.resize(17);

				    ::tm time_buf;

				    // strftime prints the terminating null character as well

				    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));

				    time_point_str.resize(16);

				    return time_point_str;

				}

				void check_expiry(std::string_view signature_date) {

				    //FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it

				    std::string expiration_str = format_time_point(db_clock::now() - 15min);

				    std::string validity_str = format_time_point(db_clock::now() + 15min);

				    if (signature_date < expiration_str) {

				        throw api_error("InvalidSignatureException",

				                fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",

				                signature_date, expiration_str));

				    }

				    if (signature_date > validity_str) {

				        throw api_error("InvalidSignatureException",

				                fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",

				                signature_date, validity_str));

				    }

				}
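`check_expiry` above compares the `X-Amz-Date` value with two formatted bounds using plain string comparison. That is sound because the `%Y%m%dT%H%M%SZ` layout is fixed-width and runs from the most significant field (year) to the least (second), so lexicographic order equals chronological order. A minimal standalone sketch of the same idea, assuming `std::chrono::system_clock` in place of `db_clock` and POSIX `gmtime_r`; `format_amz_date` and `within_skew` are hypothetical names:

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <string>
#include <string_view>

// Hypothetical helper: format a UTC time point as "YYYYMMDDTHHMMSSZ" (16 chars).
std::string format_amz_date(std::chrono::system_clock::time_point tp) {
    std::time_t t = std::chrono::system_clock::to_time_t(tp);
    std::tm tm_buf{};
    gmtime_r(&t, &tm_buf);   // POSIX, thread-safe variant of gmtime
    char buf[17];            // 16 characters + terminating NUL written by strftime
    std::strftime(buf, sizeof(buf), "%Y%m%dT%H%M%SZ", &tm_buf);
    return std::string(buf, 16);
}

// Hypothetical helper: true iff amz_date lies within +/- skew of now.
// Plain string comparison is valid only because the format is fixed-width
// and ordered from the most significant field to the least significant one.
bool within_skew(std::string_view amz_date,
                 std::chrono::system_clock::time_point now,
                 std::chrono::minutes skew = std::chrono::minutes(15)) {
    std::string earliest = format_amz_date(now - skew);
    std::string latest = format_amz_date(now + skew);
    return amz_date >= earliest && amz_date <= latest;
}
```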

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {

				    auto amz_date_it = signed_headers_map.find("x-amz-date");

				    if (amz_date_it == signed_headers_map.end()) {

				        throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");

				    }

				    std::string_view amz_date = amz_date_it->second;

				    check_expiry(amz_date);

				    std::string_view datestamp = amz_date.substr(0, 8);

				    if (datestamp != orig_datestamp) {

				        throw api_error("InvalidSignatureException",

				                format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",

				                        orig_datestamp, datestamp));

				    }

				    std::string_view canonical_uri = "/";

				    std::stringstream canonical_headers;

				    for (const auto& header : signed_headers_map) {

				        canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';

				    }

				    std::string payload_hash = apply_sha256(body_content);

				    std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);

				    std::string_view algorithm = "AWS4-HMAC-SHA256";

				    std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);

				    std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope,  apply_sha256(canonical_request));

				    hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);

				    hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);

				    return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));

				}

				future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {

				    static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",

				            auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				    auto timeout = auth::internal_distributed_timeout_config();

				    return qp.execute_internal(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {

				        auto res = f.get0();

				        auto salted_hash = std::optional<sstring>();

				        if (res->empty()) {

				            throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));

				        }

				        salted_hash = res->one().get_opt<sstring>("salted_hash");

				        if (!salted_hash) {

				            throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));

				        }

				        return make_ready_future<std::string>(*salted_hash);

				    });

				}

				}
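The signing flow in `get_signature` builds two intermediate strings before any HMAC is computed: the canonical request and the string to sign. The sketch below reproduces only that string assembly, assuming the SHA-256 hex digests are computed elsewhere; the helper names are hypothetical and `std::ostringstream` stands in for `fmt::format`. Because the headers come from a sorted `std::map`, they appear in lexicographic order, and since every header line already ends in a newline, an empty line separates the header block from the signed-headers list:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the layout used by get_signature():
// METHOD \n URI \n QUERY \n CANONICAL_HEADERS \n SIGNED_HEADERS \n PAYLOAD_HASH
std::string canonical_request(std::string_view method, std::string_view uri,
                              std::string_view query,
                              const std::map<std::string, std::string>& headers,
                              std::string_view signed_headers,
                              std::string_view payload_hash) {
    std::ostringstream out;
    out << method << '\n' << uri << '\n' << query << '\n';
    for (const auto& [name, value] : headers) {  // std::map iterates in sorted key order
        out << name << ':' << value << '\n';
    }
    out << '\n' << signed_headers << '\n' << payload_hash;
    return out.str();
}

// The credential scope binds a signature to a date, region and service.
std::string credential_scope(std::string_view datestamp, std::string_view region,
                             std::string_view service) {
    std::ostringstream out;
    out << datestamp << '/' << region << '/' << service << "/aws4_request";
    return out.str();
}

// The string to sign; canonical_request_hash is the hex SHA-256 of the canonical request.
std::string string_to_sign(std::string_view amz_date, std::string_view scope,
                           std::string_view canonical_request_hash) {
    std::ostringstream out;
    out << "AWS4-HMAC-SHA256\n" << amz_date << '\n' << scope << '\n' << canonical_request_hash;
    return out.str();
}
```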

									
alternator/auth.hh

				@@ -0,0 +1,46 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <string_view>

				#include <array>

				#include "gc_clock.hh"

				#include "utils/loading_cache.hh"

				namespace cql3 {

				class query_processor;

				}

				namespace alternator {

				using hmac_sha256_digest = std::array<char, 32>;

				using key_cache = utils::loading_cache<std::string, std::string>;

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);

				future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);

				}

									
alternator/base64.cc

				@@ -0,0 +1,111 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				// The DynamoDB API dictates that "binary" (a.k.a. "bytes" or "blob") values

				// be encoded in the JSON API as base64-encoded strings. This is code to

				// convert byte arrays to base64-encoded strings, and back.

				#include "base64.hh"

				#include <ctype.h>

				// Arrays for quickly converting to and from an integer between 0 and 63,

				// and the character used in base64 encoding to represent it.

				static class base64_chars {

				public:

				    static constexpr const char* to =

				            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

				    uint8_t from[256]; // one entry per possible byte value

				    base64_chars() {

				        static_assert(strlen(to) == 64);

				        for (int i = 0; i < 256; i++) {

				            from[i] = 255; // signal invalid character

				        }

				        for (int i = 0; i < 64; i++) {

				            from[(unsigned) to[i]] = i;

				        }

				    }

				} base64_chars;

				std::string base64_encode(bytes_view in) {

				    std::string ret;

				    ret.reserve(((4 * in.size() / 3) + 3) & ~3);

				    int i = 0;

				    unsigned char chunk3[3]; // chunk of input

				    for (auto byte : in) {

				        chunk3[i++] = byte;

				        if (i == 3) {

				            ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];

				            ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];

				            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];

				            ret += base64_chars.to[ chunk3[2] & 0x3f ];

				            i = 0;

				        }

				    }

				    if (i) {

				        // i can be 1 or 2.

				        for(int j = i; j < 3; j++)

				            chunk3[j] = '\0';

				        ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];

				        ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];

				        if (i == 2) {

				            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];

				        } else {

				            ret += '=';

				        }

				        ret += '=';

				    }

				    return ret;

				}

				bytes base64_decode(std::string_view in) {

				    int i = 0;

				    int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;

				    std::string ret;

				    ret.reserve(in.size() * 3 / 4);

				    for (unsigned char c : in) {

				        uint8_t dc = base64_chars.from[c];

				        if (dc == 255) {

				            // Any unexpected character, including the "=" character usually

				            // used for padding, signals the end of the decode.

				            break;

				        }

				        chunk4[i++] = dc;

				        if (i == 4) {

				            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);

				            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);

				            ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];

				            i = 0;

				        }

				    }

				    if (i) {

				        // i can be 2 or 3, meaning 1 or 2 more output characters

				        if (i>=2)

				            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);

				        if (i==3)

				            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);

				    }

				    // FIXME: This copy is sad. The problem is that we need to return "bytes",

				    // but "bytes" doesn't have an efficient append like std::string does.

				    // To fix this we need to use bytes' "uninitialized" feature.

				    return bytes(ret.begin(), ret.end());

				}
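For reference, here is a compact standalone sketch of the same base64 scheme (RFC 4648 standard alphabet, `=` padding), operating on `std::string` rather than Scylla's `bytes`. It is an illustration under those assumptions, not a drop-in replacement; the `sketch` namespace and its function names are hypothetical. The reverse table has 256 entries so that every possible `unsigned char` value is a valid index, and decoding stops at the first character outside the alphabet, as `base64_decode` does:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {

constexpr std::string_view alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Reverse lookup covering all 256 byte values; 255 marks an invalid character.
std::array<uint8_t, 256> make_reverse_table() {
    std::array<uint8_t, 256> t;
    t.fill(255);
    for (uint8_t i = 0; i < 64; i++) {
        t[static_cast<unsigned char>(alphabet[i])] = i;
    }
    return t;
}

std::string encode(std::string_view in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    size_t i = 0;
    for (; i + 3 <= in.size(); i += 3) {  // every full 3-byte group becomes 4 characters
        uint32_t n = (uint8_t(in[i]) << 16) | (uint8_t(in[i + 1]) << 8) | uint8_t(in[i + 2]);
        out += alphabet[(n >> 18) & 0x3f];
        out += alphabet[(n >> 12) & 0x3f];
        out += alphabet[(n >> 6) & 0x3f];
        out += alphabet[n & 0x3f];
    }
    size_t rest = in.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest) {
        uint32_t n = uint8_t(in[i]) << 16;
        if (rest == 2) {
            n |= uint8_t(in[i + 1]) << 8;
        }
        out += alphabet[(n >> 18) & 0x3f];
        out += alphabet[(n >> 12) & 0x3f];
        out += (rest == 2) ? alphabet[(n >> 6) & 0x3f] : '=';
        out += '=';
    }
    return out;
}

// Accumulates 6 bits per character and emits a byte whenever 8 bits are available.
std::string decode(std::string_view in) {
    static const std::array<uint8_t, 256> table = make_reverse_table();
    std::string out;
    uint32_t acc = 0;
    int bits = 0;
    for (unsigned char c : in) {
        uint8_t v = table[c];
        if (v == 255) {
            break;  // '=' padding or any other non-alphabet character ends decoding
        }
        acc = (acc << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xff);
        }
    }
    return out;
}

} // namespace sketch
```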

									
alternator/base64.hh

				@@ -0,0 +1,34 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string_view>

				#include "bytes.hh"

				#include "rjson.hh"

				std::string base64_encode(bytes_view);

				bytes base64_decode(std::string_view);

				inline bytes base64_decode(const rjson::value& v) {

				  return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));

				}

									
alternator/conditions.cc

				@@ -0,0 +1,682 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include <list>

				#include <map>

				#include <string_view>

				#include "alternator/conditions.hh"

				#include "alternator/error.hh"

				#include "cql3/constants.hh"

				#include <unordered_map>

				#include "rjson.hh"

				#include "serialization.hh"

				#include "base64.hh"

				#include <stdexcept>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include "utils/overloaded_functor.hh"

				#include "expressions_eval.hh"

				namespace alternator {

				static logging::logger clogger("alternator-conditions");

				comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {

				    static std::unordered_map<std::string, comparison_operator_type> ops = {

				            {"EQ", comparison_operator_type::EQ},

				            {"NE", comparison_operator_type::NE},

				            {"LE", comparison_operator_type::LE},

				            {"LT", comparison_operator_type::LT},

				            {"GE", comparison_operator_type::GE},

				            {"GT", comparison_operator_type::GT},

				            {"IN", comparison_operator_type::IN},

				            {"NULL", comparison_operator_type::IS_NULL},

				            {"NOT_NULL", comparison_operator_type::NOT_NULL},

				            {"BETWEEN", comparison_operator_type::BETWEEN},

				            {"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},

				            {"CONTAINS", comparison_operator_type::CONTAINS},

				            {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},

				    };

				    if (!comparison_operator.IsString()) {

				        throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				    }

				    std::string op = comparison_operator.GetString();

				    auto it = ops.find(op);

				    if (it == ops.end()) {

				        throw api_error("ValidationException", format("Unsupported comparison operator {}", op));

				    }

				    return it->second;

				}
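`get_comparison_operator` maps the `ComparisonOperator` string from a request onto an enum and rejects anything unknown. A minimal sketch of that lookup, assuming a reduced hypothetical enum `cmp_op` and `std::invalid_argument` in place of `api_error`. Keying a `static const` map by `std::string_view` (the keys are string literals with static storage) lets the lookup run without copying the operator name into a `std::string` first:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical subset of the operators; the real enum is comparison_operator_type.
enum class cmp_op { EQ, NE, LT, BETWEEN, BEGINS_WITH };

cmp_op parse_cmp_op(std::string_view name) {
    static const std::unordered_map<std::string_view, cmp_op> ops = {
        {"EQ", cmp_op::EQ},
        {"NE", cmp_op::NE},
        {"LT", cmp_op::LT},
        {"BETWEEN", cmp_op::BETWEEN},
        {"BEGINS_WITH", cmp_op::BEGINS_WITH},
    };
    auto it = ops.find(name);
    if (it == ops.end()) {
        // Matching is case-sensitive, so "eq" is rejected just like an unknown name.
        throw std::invalid_argument("Unsupported comparison operator " + std::string(name));
    }
    return it->second;
}
```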

				static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {

				    bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));

				    auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));

				    bytes raw_value = serialize_item(value);

				    auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));

				}

				static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {

				    bytes raw_value = get_key_from_typed_value(value, cdef);

				    auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));

				}

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {

				    clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));

				    auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);

				    for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {

				        std::string_view column_name(it->name.GetString(), it->name.GetStringLength());

				        const rjson::value& condition = it->value;

				        const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");

				        const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");

				        comparison_operator_type op = get_comparison_operator(comp_definition);

				        if (op != comparison_operator_type::EQ) {

				            throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");

				        }

				        if (attr_list.Size() != 1) {

				            throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));

				        }

				        if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {

				            // Primary key restriction

				            filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);

				        } else {

				            // Regular column restriction

				            filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);

				        }

				    }

				    return filtering_restrictions;

				}

				namespace {

				struct size_check {

				    // True iff size passes this check.

				    virtual bool operator()(rapidjson::SizeType size) const = 0;

				    // Check description, such that format("expected array {}", check.what()) is human-readable.

				    virtual sstring what() const = 0;

				};

				class exact_size : public size_check {

				    rapidjson::SizeType _expected;

				  public:

				    explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}

				    bool operator()(rapidjson::SizeType size) const override { return size == _expected; }

				    sstring what() const override { return format("of size {}", _expected); }

				};

				struct empty : public size_check {

				    bool operator()(rapidjson::SizeType size) const override { return size < 1; }

				    sstring what() const override { return "to be empty"; }

				};

				struct nonempty : public size_check {

				    bool operator()(rapidjson::SizeType size) const override { return size > 0; }

				    sstring what() const override { return "to be non-empty"; }

				};

				} // anonymous namespace

				// Check that array has the expected number of elements

				static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {

				    if (!array || !array->IsArray()) {

				        throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");

				    }

				    if (!expected(array->Size())) {

				        throw api_error("ValidationException",

				                        format("{} operator requires AttributeValueList {}, instead found list size {}",

				                               op, expected.what(), array->Size()));

				    }

				}
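`verify_operand_count` delegates the length rule to a `size_check` object that both decides whether a size is acceptable and describes the rule, so a single routine can produce messages such as "requires AttributeValueList of size 2". A standalone sketch of that pattern, with hypothetical names, `std::size_t` in place of `rapidjson::SizeType`, and `std::runtime_error` in place of `api_error`; it also gives the base class a virtual destructor, a common precaution for polymorphic interfaces:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

struct size_check {
    virtual ~size_check() = default;
    virtual bool operator()(std::size_t size) const = 0;  // true iff size is acceptable
    virtual std::string what() const = 0;                  // human-readable rule, e.g. "of size 2"
};

struct exact_size : size_check {
    std::size_t expected;
    explicit exact_size(std::size_t n) : expected(n) {}
    bool operator()(std::size_t size) const override { return size == expected; }
    std::string what() const override { return "of size " + std::to_string(expected); }
};

struct nonempty : size_check {
    bool operator()(std::size_t size) const override { return size > 0; }
    std::string what() const override { return "to be non-empty"; }
};

// Throws with a message assembled from the operator name and the check's description.
void verify_count(std::size_t actual, const size_check& expected, const std::string& op) {
    if (!expected(actual)) {
        throw std::runtime_error(op + " operator requires AttributeValueList " + expected.what() +
                                 ", instead found list size " + std::to_string(actual));
    }
}
```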

				struct rjson_engaged_ptr_comp {

				    bool operator()(const rjson::value* p1, const rjson::value* p2) const {

				        return rjson::single_value_comp()(*p1, *p2);

				    }

				};

				// It's not enough to compare underlying JSON objects when comparing sets,

				// as internally they're stored in an array, and the order of elements is

				// not important in set equality. See issue #5021

				static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {

				    if (set1.Size() != set2.Size()) {

				        return false;

				    }

				    std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;

				    for (auto it = set1.Begin(); it != set1.End(); ++it) {

				        set1_raw.insert(&*it);

				    }

				    for (const auto& a : set2.GetArray()) {

				        if (set1_raw.count(&a) == 0) {

				            return false;

				        }

				    }

				    return true;

				}
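Because sets arrive as JSON arrays, `check_EQ_for_sets` must ignore element order. One simple way to get the same order-insensitive comparison, sketched here on plain strings (an assumption: the real code compares `rjson::value` elements with `rjson::single_value_comp`), is to sort copies of both arrays and compare them element by element. For duplicate-free inputs, as DynamoDB sets are, this is exactly set equality; `sets_equal` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Takes the arrays by value so they can be sorted without touching the caller's data.
bool sets_equal(std::vector<std::string> a, std::vector<std::string> b) {
    if (a.size() != b.size()) {
        return false;  // cheap early exit, as in check_EQ_for_sets
    }
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```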

				// Check if two JSON-encoded values match with the EQ relation

				static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {

				        auto it1 = v1->MemberBegin();

				        auto it2 = v2.MemberBegin();

				        if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {

				            return check_EQ_for_sets(it1->value, it2->value);

				        }

				    }

				    return *v1 == v2;

				}

				// Check if two JSON-encoded values match with the NE relation

				static bool check_NE(const rjson::value* v1, const rjson::value& v2) {

				    return !v1 || *v1 != v2; // null is unequal to anything.

				}

				// Check if two JSON-encoded values match with the BEGINS_WITH relation

				static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {

				    // BEGINS_WITH requires that its single operand (v2) be a string or

				    // binary - otherwise it's a validation error. However, problems with

				    // the stored attribute (v1) will just return false (no match).

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));

				    }

				    auto it2 = v2.MemberBegin();

				    if (it2->name != "S" && it2->name != "B") {

				        throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));

				    }

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				        return false;

				    }

				    auto it1 = v1->MemberBegin();

				    if (it1->name != it2->name) {

				        return false;

				    }

				    if (it2->name == "S") {

				        std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());

				        std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());

				        return val1.substr(0, val2.size()) == val2;

				    } else /* it2->name == "B" */ {

				        // TODO (optimization): Check the begins_with condition directly on

				        // the base64-encoded string, without making a decoded copy.

				        bytes val1 = base64_decode(it1->value);

				        bytes val2 = base64_decode(it2->value);

				        return val1.substr(0, val2.size()) == val2;

				    }

				}

				static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {

				    return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");

				}

				// Check if two JSON-encoded values match with the CONTAINS relation

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv2.name != "S" && kv2.name != "N" &&  kv2.name != "B") {

				        throw api_error("ValidationException",

				                        format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                               "got {} instead", kv2.name));

				    }

				    if (kv1.name == "S" && kv2.name == "S") {

				        return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;

				    } else if (kv1.name == "B" && kv2.name == "B") {

				        return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;

				    } else if (is_set_of(kv1.name, kv2.name)) {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (*i == kv2.value) {

				                return true;

				            }

				        }

				    } else if (kv1.name == "L") {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (!i->IsObject() || i->MemberCount() != 1) {

				                clogger.error("check_CONTAINS received a list whose element is malformed");

				                return false;

				            }

				            const auto& el = *i->MemberBegin();

				            if (el.name == kv2.name && el.value == kv2.value) {

				                return true;

				            }

				        }

				    }

				    return false;

				}

				// Check if two JSON-encoded values match with the NOT_CONTAINS relation

				static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    return !check_CONTAINS(v1, v2);

				}

				// Check if a JSON-encoded value equals any element of an array, which must have at least one element.

				static bool check_IN(const rjson::value* val, const rjson::value& array) {

				    if (!array[0].IsObject() || array[0].MemberCount() != 1) {

				        throw api_error("ValidationException",

				                        format("IN operator encountered malformed AttributeValue: {}", array[0]));

				    }

				    const auto& type = array[0].MemberBegin()->name;

				    if (type != "S" && type != "N" && type != "B") {

				        throw api_error("ValidationException",

				                        "IN operator requires AttributeValueList elements to be of type String, Number, or Binary");

				    }

				    if (!val) {

				        return false;

				    }

				    bool have_match = false;

				    for (const auto& elem : array.GetArray()) {

				        if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {

				            throw api_error("ValidationException",

				                            "IN operator requires all AttributeValueList elements to have the same type");

				        }

				        if (!have_match && *val == elem) {

				            // Can't return yet: we must still validate the types of all remaining array elements.

				            have_match = true;

				        }

				    }

				    return have_match;

				}

				// Another variant of check_IN, this one for ConditionExpression. It needs to

				// check whether the first element in the given vector is equal to any of the

				// others.

				static bool check_IN(const std::vector<rjson::value>& array) {

				    const rjson::value* first = &array[0];

				    for (unsigned i = 1; i < array.size(); i++) {

				        if (check_EQ(first, array[i])) {

				            return true;

				        }

				    }

				    return false;

				}

				static bool check_NULL(const rjson::value* val) {

				    return val == nullptr;

				}

				static bool check_NOT_NULL(const rjson::value* val) {

				    return val != nullptr;

				}

				// Check if two JSON-encoded values match with cmp.

				template <typename Comparator>

				bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic));

				    }

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic));

				    }

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    if (kv1.name != kv2.name) {

				        return false;

				    }

				    if (kv1.name == "N") {

				        return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));

				    }

				    if (kv1.name == "S") {

				        return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),

				                   std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));

				    }

				    if (kv1.name == "B") {

				        return cmp(base64_decode(kv1.value), base64_decode(kv2.value));

				    }

				    clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");

				    return false;

				}

				struct cmp_lt {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }

				    // We cannot use the normal comparison operators like "<" on the bytes

				    // type, because they treat individual bytes as signed but we need to

				    // compare them as *unsigned*. So we need a specialization for bytes.

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) < 0; }

				    static constexpr const char* diagnostic = "LT operator";

				};

				struct cmp_le {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs <= rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) <= 0; }

				    static constexpr const char* diagnostic = "LE operator";

				};

				struct cmp_ge {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs >= rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) >= 0; }

				    static constexpr const char* diagnostic = "GE operator";

				};

				struct cmp_gt {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs > rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) > 0; }

				    static constexpr const char* diagnostic = "GT operator";

				};

				// True if v is between lb and ub, inclusive.  Throws if lb > ub.

				template <typename T>

				bool check_BETWEEN(const T& v, const T& lb, const T& ub) {

				    if (cmp_lt()(ub, lb)) {

				        throw api_error("ValidationException",

				                        format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));

				    }

				    return cmp_ge()(v, lb) && cmp_le()(v, ub);

				}

				static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {

				    if (!v) {

				        return false;

				    }

				    if (!v->IsObject() || v->MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));

				    }

				    if (!lb.IsObject() || lb.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));

				    }

				    if (!ub.IsObject() || ub.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));

				    }

				    const auto& kv_v = *v->MemberBegin();

				    const auto& kv_lb = *lb.MemberBegin();

				    const auto& kv_ub = *ub.MemberBegin();

				    if (kv_lb.name != kv_ub.name) {

				        throw api_error(

				                "ValidationException",

				                format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",

				                       kv_lb.name, kv_ub.name));

				    }

				    if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.

				        return false;

				    }

				    if (kv_v.name == "N") {

				        const char* diag = "BETWEEN operator";

				        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));

				    }

				    if (kv_v.name == "S") {

				        return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),

				                             std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),

				                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));

				    }

				    if (kv_v.name == "B") {

				        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));

				    }

				    throw api_error("ValidationException",

				        format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",

				               kv_lb.name));

				}

				// Verify one Expect condition on one attribute (whose content is "got")

				// for the verify_expected() below.

				// This function returns true or false depending on whether the condition

				// succeeded - it does not throw ConditionalCheckFailedException.

				// However, it may throw ValidationException on input validation errors.

				static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {

				    const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");

				    const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");

				    const rjson::value* value = rjson::find(condition, "Value");

				    const rjson::value* exists = rjson::find(condition, "Exists");

				    // There are three types of conditions that Expected supports:

				    // A value, not-exists, and a comparison of some kind. Each allows

				    // and requires a different combination of parameters in the request.

				    if (value) {

				        if (exists && (!exists->IsBool() || exists->GetBool() != true)) {

				            throw api_error("ValidationException", "Cannot combine Value with Exists!=true");

				        }

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");

				        }

				        return check_EQ(got, *value);

				    } else if (exists) {

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");

				        }

				        if (!exists->IsBool() || exists->GetBool() != false) {

				            throw api_error("ValidationException", "Exists!=false requires Value");

				        }

				        // Remember Exists=false, so we're checking that the attribute does *not* exist:

				        return !got;

				    } else {

				        if (!comparison_operator) {

				            throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");

				        }

				        comparison_operator_type op = get_comparison_operator(*comparison_operator);

				        switch (op) {

				        case comparison_operator_type::EQ:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_EQ(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::NE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_NE(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::LT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_lt{});

				        case comparison_operator_type::LE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_le{});

				        case comparison_operator_type::GT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_gt{});

				        case comparison_operator_type::GE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_ge{});

				        case comparison_operator_type::BEGINS_WITH:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_BEGINS_WITH(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::IN:

				            verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);

				            return check_IN(got, *attribute_value_list);

				        case comparison_operator_type::IS_NULL:

				            verify_operand_count(attribute_value_list, empty(), *comparison_operator);

				            return check_NULL(got);

				        case comparison_operator_type::NOT_NULL:

				            verify_operand_count(attribute_value_list, empty(), *comparison_operator);

				            return check_NOT_NULL(got);

				        case comparison_operator_type::BETWEEN:

				            verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);

				            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);

				        case comparison_operator_type::CONTAINS:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_CONTAINS(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::NOT_CONTAINS:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);

				        }

				        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));

				    }

				}

				// Check if the existing values of the item (previous_item) match the

				// conditions given by the Expected and ConditionalOperator parameters

				// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).

				// This function can throw a ValidationException API error if there

				// are errors in the format of the condition itself.

				bool verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {

				    const rjson::value* expected = rjson::find(req, "Expected");

				    if (!expected) {

				        return true;

				    }

				    if (!expected->IsObject()) {

				        throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");

				    }

				    // ConditionalOperator can be "AND" for requiring all conditions, or

				    // "OR" for requiring one condition, and defaults to "AND" if missing.

				    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");

				    bool require_all = true;

				    if (conditional_operator) {

				        if (!conditional_operator->IsString()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");

				        }

				        std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());

				        if (s == "AND") {

				            // require_all is already true

				        } else if (s == "OR") {

				            require_all = false;

				        } else {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");

				        }

				        if (expected->GetObject().ObjectEmpty()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");

				        }

				    }

				    for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {

				        const rjson::value* got = nullptr;

				        if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {

				            got = rjson::find((*previous_item)["Item"], rjson::to_string_view(it->name));

				        }

				        bool success = verify_expected_one(it->value, got);

				        if (success && !require_all) {

				            // When !require_all, one success is enough!

				            return true;

				        } else if (!success && require_all) {

				            // When require_all, one failure is enough!

				            return false;

				        }

				    }

				    // If we got here and require_all, none of the checks failed, so succeed.

				    // If we got here and !require_all, all of the checks failed, so fail.

				    return require_all;

				}

				bool calculate_primitive_condition(const parsed::primitive_condition& cond,

				        std::unordered_set<std::string>& used_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        const rjson::value& req,

				        schema_ptr schema,

				        const std::unique_ptr<rjson::value>& previous_item) {

				    std::vector<rjson::value> calculated_values;

				    calculated_values.reserve(cond._values.size());

				    for (const parsed::value& v : cond._values) {

				        calculated_values.push_back(calculate_value(v,

				                cond._op == parsed::primitive_condition::type::VALUE ?

				                        calculate_value_caller::ConditionExpressionAlone :

				                        calculate_value_caller::ConditionExpression,

				                rjson::find(req, "ExpressionAttributeValues"),

				                used_attribute_names, used_attribute_values,

				                req, schema, previous_item));

				    }

				    switch (cond._op) {

				    case parsed::primitive_condition::type::BETWEEN:

				        if (calculated_values.size() != 3) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));

				        }

				        return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2]);

				    case parsed::primitive_condition::type::IN:

				        return check_IN(calculated_values);

				    case parsed::primitive_condition::type::VALUE:

				        if (calculated_values.size() != 1) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error(format("Wrong number of values {} in VALUE primitive_condition", cond._values.size()));

				        }

				        // Unwrap the boolean wrapped as the value (if it is a boolean)

				        if (calculated_values[0].IsObject() && calculated_values[0].MemberCount() == 1) {

				            auto it = calculated_values[0].MemberBegin();

				            if (it->name == "BOOL" && it->value.IsBool()) {

				                return it->value.GetBool();

				            }

				        }

				        throw api_error("ValidationException",

				                format("ConditionExpression: condition results in a non-boolean value: {}",

				                        calculated_values[0]));

				    default:

				        // All the rest of the operators have exactly two parameters (and, unless

				        // we have a bug in the parser, that's what we have in the parsed object):

				        if (calculated_values.size() != 2) {

				            throw std::logic_error(format("Wrong number of values {} in primitive_condition object", cond._values.size()));

				        }

				    }

				    switch (cond._op) {

				    case parsed::primitive_condition::type::EQ:

				        return check_EQ(&calculated_values[0], calculated_values[1]);

				    case parsed::primitive_condition::type::NE:

				        return check_NE(&calculated_values[0], calculated_values[1]);

				    case parsed::primitive_condition::type::GT:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{});

				    case parsed::primitive_condition::type::GE:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{});

				    case parsed::primitive_condition::type::LT:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{});

				    case parsed::primitive_condition::type::LE:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_le{});

				    default:

				        // Shouldn't happen unless we have a bug in the parser

				        throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));

				    }

				}

				// Check if the existing values of the item (previous_item) match the

				// conditions given by the given parsed ConditionExpression.

				bool verify_condition_expression(

				        const parsed::condition_expression& condition_expression,

				        std::unordered_set<std::string>& used_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        const rjson::value& req,

				        schema_ptr schema,

				        const std::unique_ptr<rjson::value>& previous_item) {

				    if (condition_expression.empty()) {

				        return true;

				    }

				    bool ret = std::visit(overloaded_functor {

				        [&] (const parsed::primitive_condition& cond) -> bool {

				            return calculate_primitive_condition(cond, used_attribute_values,

				                    used_attribute_names, req, schema, previous_item);

				        },

				        [&] (const parsed::condition_expression::condition_list& list) -> bool {

				            auto verify_condition = [&] (const parsed::condition_expression& e) {

				                return verify_condition_expression(e, used_attribute_values,

				                        used_attribute_names, req, schema, previous_item);

				            };

				            switch (list.op) {

				            case '&':

				                return boost::algorithm::all_of(list.conditions, verify_condition);

				            case '|':

				                return boost::algorithm::any_of(list.conditions, verify_condition);

				            default:

				                // Shouldn't happen unless we have a bug in the parser

				                throw std::logic_error("bad operator in condition_list");

				            }

				        }

				    }, condition_expression._expression);

				    return condition_expression._negated ? !ret : ret;

				}

				}
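The `cmp_*` comparators and `check_BETWEEN()` in conditions.cc compare `bytes` values as unsigned, because Scylla's `bytes` elements are signed and the built-in `<` would order 0x80..0xFF before 0x00..0x7F. The following is a minimal standalone sketch of that idea, assuming `std::string` stands in for Scylla's `bytes` and `std::invalid_argument` for `api_error`; `compare_unsigned_bytes` and `between_bytes` are hypothetical names, not Scylla's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Scylla's compare_unsigned(): lexicographic
// comparison that treats every byte as an unsigned char.
int compare_unsigned_bytes(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const auto x = static_cast<unsigned char>(a[i]);
        const auto y = static_cast<unsigned char>(b[i]);
        if (x != y) {
            return x < y ? -1 : 1;
        }
    }
    // Common prefix is equal: the shorter value sorts first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}

// Inclusive BETWEEN on raw bytes, following the shape of check_BETWEEN():
// reject an inverted range, then test lb <= v && v <= ub.
bool between_bytes(const std::string& v, const std::string& lb, const std::string& ub) {
    if (compare_unsigned_bytes(ub, lb) < 0) {
        throw std::invalid_argument("BETWEEN requires lower_bound <= upper_bound");
    }
    return compare_unsigned_bytes(v, lb) >= 0 && compare_unsigned_bytes(v, ub) <= 0;
}
```

With this ordering the byte 0x80 sorts after 0x7f, whereas comparing the same bytes as signed `char` values (as on x86-64 Linux) would put 0x80 (that is, -128) first.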

									
alternator/conditions.hh
				@@ -0,0 +1,49 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				/*

				 * This file contains definitions and functions related to placing conditions

				 * on Alternator queries (equivalent of CQL's restrictions).

				 *

				 * With conditions, it's possible to add criteria to selection requests (Scan, Query)

				 * and use them for narrowing down the result set, by means of filtering or indexing.

				 *

				 * Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html

				 */

				#pragma once

				#include "cql3/restrictions/statement_restrictions.hh"

				#include "serialization.hh"

				namespace alternator {

				enum class comparison_operator_type {

				    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH

				};

				comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);

				bool verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);

				}
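`verify_expected()`, declared in conditions.hh, folds the per-attribute results using `ConditionalOperator`: under `AND` the first failure decides, under `OR` the first success decides, and if the loop finishes the answer is `require_all`. A minimal sketch of just that folding rule over precomputed booleans follows; `combine_expected` is a hypothetical helper, not part of Alternator.

```cpp
#include <vector>

// Hypothetical helper mirroring the loop at the end of verify_expected():
// each element of `results` is the outcome of one Expected condition.
bool combine_expected(const std::vector<bool>& results, bool require_all) {
    for (bool success : results) {
        if (success && !require_all) {
            return true;   // OR: one success is enough
        }
        if (!success && require_all) {
            return false;  // AND: one failure is enough
        }
    }
    // AND with no failures succeeds; OR with no successes fails.
    return require_all;
}
```

An empty list therefore succeeds under `AND` and fails under `OR`. `verify_expected()` sidesteps the `OR` case by returning true when `Expected` is absent and by rejecting `ConditionalOperator` when `Expected` is an empty object.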

									
alternator/error.hh
				@@ -0,0 +1,50 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				namespace alternator {

				// DynamoDB's error messages are described in detail in

				// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

				// An error message has a "type", e.g., "ResourceNotFoundException", a coarser

				// HTTP code (almost always, 400), and a human readable message. Eventually these

				// will be wrapped into a JSON object returned to the client.

				class api_error : public std::exception {

				public:

				    using status_type = httpd::reply::status_type;

				    status_type _http_code;

				    std::string _type;

				    std::string _msg;

				    api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)

				        : _http_code(std::move(http_code))

				        , _type(std::move(type))

				        , _msg(std::move(msg))

				    { }

				    api_error() = default;

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				};

				}
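`api_error` bundles a DynamoDB error type, a human-readable message and an HTTP status, and exposes the message through `what()`. Below is a standalone sketch of the same shape, assuming a plain `int` status code in place of Seastar's `httpd::reply::status_type`; the class name `sketch_api_error` is hypothetical.

```cpp
#include <exception>
#include <string>
#include <utility>

// Hypothetical standalone analogue of alternator::api_error.
class sketch_api_error : public std::exception {
public:
    int http_code;      // almost always 400 for client errors
    std::string type;   // e.g. "ValidationException"
    std::string msg;    // human-readable message

    sketch_api_error(std::string t, std::string m, int code = 400)
        : http_code(code), type(std::move(t)), msg(std::move(m)) {}

    const char* what() const noexcept override { return msg.c_str(); }
};
```

Code that catches `const std::exception&` still sees the message through `what()`, while catching the concrete type also gives access to the error type and status code.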

alternator/executor.cc

File diff suppressed because it is too large.

alternator/executor.hh
				@@ -0,0 +1,82 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				#include <seastar/json/json_elements.hh>

				#include <seastar/core/sharded.hh>

				#include "service/storage_proxy.hh"

				#include "service/migration_manager.hh"

				#include "service/client_state.hh"

				#include "alternator/error.hh"

				#include "stats.hh"

				#include "rjson.hh"

				namespace alternator {

				class executor : public peering_sharded_service<executor> {

				    service::storage_proxy& _proxy;

				    service::migration_manager& _mm;

				    // An smp_service_group to be used for limiting the concurrency when

				    // forwarding Alternator requests between shards, if necessary for LWT.

				    smp_service_group _ssg;

				public:

				    using client_state = service::client_state;

				    using request_return_type = std::variant<json::json_return_type, api_error>;

				    stats _stats;

				    static constexpr auto ATTRS_COLUMN_NAME = ":attrs";

				    static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";

				    executor(service::storage_proxy& proxy, service::migration_manager& mm, smp_service_group ssg)

				        : _proxy(proxy), _mm(mm), _ssg(ssg) {}

				    future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header);

				    future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<> start();

				    future<> stop() { return make_ready_future<>(); }

				    future<> create_keyspace(std::string_view keyspace_name);

				    static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);

				};

				}

									
alternator/expressions.cc
									
				@@ -0,0 +1,127 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "expressions.hh"

				#include "alternator/expressionsLexer.hpp"

				#include "alternator/expressionsParser.hpp"

				#include "utils/overloaded_functor.hh"

				#include <seastarx.hh>

				#include <seastar/core/print.hh>

				#include <seastar/util/log.hh>

				#include <functional>

				namespace alternator {

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				Result do_with_parser(std::string input, Func&& f) {

				    expressionsLexer::InputStreamType input_stream{

				        reinterpret_cast<const ANTLR_UINT8*>(input.data()),

				        ANTLR_ENC_UTF8,

				        static_cast<ANTLR_UINT32>(input.size()),

				        nullptr };

				    expressionsLexer lexer(&input_stream);

				    expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());

				    expressionsParser parser(&tstream);

				    auto result = f(parser);

				    return result;

				}

				parsed::update_expression

				parse_update_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::update_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));

				    }

				}

				std::vector<parsed::path>

				parse_projection_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::projection_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));

				    }

				}

				parsed::condition_expression

				parse_condition_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::condition_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing ConditionExpression '{}': {}", query, std::current_exception()));

				    }

				}

				namespace parsed {

				void update_expression::add(update_expression::action a) {

				    std::visit(overloaded_functor {

				        [&] (action::set&)    { seen_set = true; },

				        [&] (action::remove&) { seen_remove = true; },

				        [&] (action::add&)    { seen_add = true; },

				        [&] (action::del&)    { seen_del = true; }

				    }, a._action);

				    _actions.push_back(std::move(a));

				}

				void update_expression::append(update_expression other) {

				    if ((seen_set && other.seen_set) ||

				        (seen_remove && other.seen_remove) ||

				        (seen_add && other.seen_add) ||

				        (seen_del && other.seen_del)) {

				        throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");

				    }

				    std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));

				    seen_set |= other.seen_set;

				    seen_remove |= other.seen_remove;

				    seen_add |= other.seen_add;

				    seen_del |= other.seen_del;

				}

				void condition_expression::append(condition_expression&& a, char op) {

				    std::visit(overloaded_functor {

				        [&] (condition_list& x) {

				            // If 'a' has a single condition, we could, instead of inserting

				            // it, insert its single condition (possibly negated if a._negated).

				            // But considering that we don't evaluate these expressions many

				            // times, this optimization is not worth the extra code complexity.

				            if (!x.conditions.empty() && x.op != op) {

				                // Shouldn't happen unless we have a bug in the parser

				                throw std::logic_error("condition_expression::append called with mixed operators");

				            }

				            x.conditions.push_back(std::move(a));

				            x.op = op;

				        },

				        [&] (primitive_condition& x) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error("condition_expression::append called on primitive_condition");

				        }

				    }, _expression);

				}

				} // namespace parsed

				} // namespace alternator
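The clause bookkeeping in `update_expression::add()` and `append()` above can be illustrated with a minimal standalone sketch. It assumes simplified, hypothetical stand-ins (`set_action`, `remove_action`, `mini_update_expression`) for the real `parsed` types, covers only SET and REMOVE, and throws `std::invalid_argument` where the real code throws `expressions_syntax_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for two of the update_expression::action alternatives.
struct set_action { std::string path; };
struct remove_action { std::string path; };
using action = std::variant<set_action, remove_action>;

// Common C++17 "overloaded lambdas" idiom (assumed to be similar in spirit to
// utils/overloaded_functor.hh): one struct inheriting operator() from each lambda.
template <typename... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs> overloaded(Fs...) -> overloaded<Fs...>;

class mini_update_expression {
    std::vector<action> _actions;
    bool _seen_set = false;
    bool _seen_remove = false;
public:
    // Like update_expression::add(): remember which clause kind was used.
    void add(action a) {
        std::visit(overloaded{
            [this] (const set_action&) { _seen_set = true; },
            [this] (const remove_action&) { _seen_remove = true; },
        }, a);
        _actions.push_back(std::move(a));
    }
    // Like update_expression::append(): merge another clause's actions, but
    // reject a second SET (or a second REMOVE) clause in the same expression.
    void append(mini_update_expression other) {
        if ((_seen_set && other._seen_set) || (_seen_remove && other._seen_remove)) {
            throw std::invalid_argument("Each of SET, REMOVE may only appear once");
        }
        for (auto& a : other._actions) {
            _actions.push_back(std::move(a));
        }
        _seen_set = _seen_set || other._seen_set;
        _seen_remove = _seen_remove || other._seen_remove;
    }
    std::size_t size() const { return _actions.size(); }
};
```

This mirrors why `SET a = :x SET b = :y` (two SET clauses) is rejected, while `SET a = :x, b = :y` (one clause with two actions) is accepted.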

alternator/expressions.g

@@ -0,0 +1,265 @@
 /*
  * Copyright 2019 ScyllaDB
  *
  * This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
  * top-level directory for licensing information.
  */
 /*
  * This file is part of Scylla.
  *
  * Scylla is free software: you can redistribute it and/or modify
  * it under the terms of the GNU Affero General Public License as published by
  * the Free Software Foundation, either version 3 of the License, or
  * (at your option) any later version.
  *
  * Scylla is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU Affero General Public License
  * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
  */
 /*
  * The DynamoDB protocol is based on JSON, and most DynamoDB requests
  * describe the operation and its parameters via JSON objects such as maps
  * and lists. Nevertheless, in some types of requests an "expression" is
  * passed as a single string, and we need to parse this string. These
  * cases include:
  *  1. Attribute paths, such as "a[3].b.c", are used in projection
  *     expressions as well as inside other expressions described below.
  *  2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
  *     used in conditional updates, filters, and other places.
  *  3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
  *
  * All these expression syntaxes are very simple: Most of them could be
  * parsed as regular expressions, and the parenthesized condition expression
  * could be done with a simple hand-written lexical analyzer and recursive-
  * descent parser. Nevertheless, we decided to specify these parsers in the
  * ANTLR3 language already used in the Scylla project, hopefully making these
  * parsers easier to reason about, and easier to change if needed - and
  * reducing the amount of boiler-plate code.
  */
 grammar expressions;
 options {
     language = Cpp;
 }
 @parser::namespace{alternator}
 @lexer::namespace{alternator}
 /* TODO: explain what these traits things are. I haven't seen them explained
  * in any document... Compilation fails without these because definitions
  * of "expressionsLexerTraits" and "expressionsParserTraits" are needed.
  */
 @lexer::traits {
     class expressionsLexer;
     class expressionsParser;
     typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
 }
 @parser::traits {
     typedef expressionsLexerTraits expressionsParserTraits;
 }
 @lexer::header {
 	#include "alternator/expressions.hh"
 	// ANTLR generates a bunch of unused variables and functions. Yuck...
     #pragma GCC diagnostic ignored "-Wunused-variable"
     #pragma GCC diagnostic ignored "-Wunused-function"
 }
 @parser::header {
 	#include "expressionsLexer.hpp"
 }
 /* By default, ANTLR3 composes elaborate syntax-error messages, saying which
  * token was unexpected, where, and so on, but then dutifully writes these
  * error messages to the standard error, and returns from the parser as if
  * everything was fine, with a half-constructed output object! If we define
  * the "displayRecognitionError" method, it will be called upon to build this
  * error message, and we can instead throw an exception to stop the parsing
  * immediately. This is good enough for now, for our simple needs, but if
  * we ever want to show more information about the syntax error, Cql3.g
  * contains an elaborate implementation (it would be nice if we could reuse
  * it, not duplicate it).
  * Unfortunately, we have to repeat the same definition twice - once for the
  * parser, and once for the lexer.
  */
 @parser::context {
     void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
         throw expressions_syntax_error("syntax error");
     }
 }
 @lexer::context {
     void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
         throw expressions_syntax_error("syntax error");
     }
 }
 /*
  * Lexical analysis phase, i.e., splitting the input up to tokens.
  * Lexical analyzer rules have names starting in capital letters.
  * "fragment" rules do not generate tokens, and are just aliases used to
  * make other rules more readable.
  * Characters *not* listed here, e.g., '=', '(', etc., will be handled
  * as individual tokens in their own right.
  * Whitespace spans are skipped, so do not generate tokens.
  */
 WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
 /* shortcuts for case-insensitive keywords */
 fragment A:('a'|'A');
 fragment B:('b'|'B');
 fragment C:('c'|'C');
 fragment D:('d'|'D');
 fragment E:('e'|'E');
 fragment F:('f'|'F');
 fragment G:('g'|'G');
 fragment H:('h'|'H');
 fragment I:('i'|'I');
 fragment J:('j'|'J');
 fragment K:('k'|'K');
 fragment L:('l'|'L');
 fragment M:('m'|'M');
 fragment N:('n'|'N');
 fragment O:('o'|'O');
 fragment P:('p'|'P');
 fragment Q:('q'|'Q');
 fragment R:('r'|'R');
 fragment S:('s'|'S');
 fragment T:('t'|'T');
 fragment U:('u'|'U');
 fragment V:('v'|'V');
 fragment W:('w'|'W');
 fragment X:('x'|'X');
 fragment Y:('y'|'Y');
 fragment Z:('z'|'Z');
 /* These keywords must appear before the generic NAME token below,
  * because NAME matches too, and the first to match wins.
  */
 SET: S E T;
 REMOVE: R E M O V E;
 ADD: A D D;
 DELETE: D E L E T E;
 AND: A N D;
 OR: O R;
 NOT: N O T;
 BETWEEN: B E T W E E N;
 IN: I N;
 fragment ALPHA: 'A'..'Z' | 'a'..'z';
 fragment DIGIT: '0'..'9';
 fragment ALNUM: ALPHA | DIGIT | '_';
 INTEGER: DIGIT+;
 NAME: ALPHA ALNUM*;
 NAMEREF: '#' ALNUM+;
 VALREF: ':' ALNUM+;
 /*
  * Parsing phase - parsing the string of tokens generated by the lexical
  * analyzer defined above.
  */
 path_component: NAME | NAMEREF;
 path returns [parsed::path p]:
     root=path_component           { $p.set_root($root.text); }
     (   '.' name=path_component   { $p.add_dot($name.text); }
       | '[' INTEGER ']'           { $p.add_index(std::stoi($INTEGER.text)); }
     )*;
 value returns [parsed::value v]:
       VALREF       { $v.set_valref($VALREF.text); }
     | path         { $v.set_path($path.p); }
     | NAME         { $v.set_func_name($NAME.text); }
      '(' x=value   { $v.add_func_parameter($x.v); }
      (',' x=value  { $v.add_func_parameter($x.v); })*
      ')'
     ;
 update_expression_set_rhs returns [parsed::set_rhs rhs]:
     v=value  { $rhs.set_value(std::move($v.v)); }
     (   '+' v=value  { $rhs.set_plus(std::move($v.v)); }
       | '-' v=value  { $rhs.set_minus(std::move($v.v)); }
     )?
     ;
 update_expression_set_action returns [parsed::update_expression::action a]:
     path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
 update_expression_remove_action returns [parsed::update_expression::action a]:
     path { $a.assign_remove($path.p); };
 update_expression_add_action returns [parsed::update_expression::action a]:
     path VALREF { $a.assign_add($path.p, $VALREF.text); };
 update_expression_delete_action returns [parsed::update_expression::action a]:
     path VALREF { $a.assign_del($path.p, $VALREF.text); };
 update_expression_clause returns [parsed::update_expression e]:
       SET s=update_expression_set_action { $e.add(s); }
       (',' s=update_expression_set_action { $e.add(s); })*
     | REMOVE r=update_expression_remove_action { $e.add(r); }
       (',' r=update_expression_remove_action { $e.add(r); })*
     | ADD a=update_expression_add_action { $e.add(a); }
       (',' a=update_expression_add_action { $e.add(a); })*
     | DELETE d=update_expression_delete_action { $e.add(d); }
       (',' d=update_expression_delete_action { $e.add(d); })*
     ;
 // Note the "EOF" token at the end of the update expression. We want the
 // parser to match the entire string given to it - not just its beginning!
 update_expression returns [parsed::update_expression e]:
     (update_expression_clause { e.append($update_expression_clause.e); })* EOF;
 projection_expression returns [std::vector<parsed::path> v]:
     p=path      { $v.push_back(std::move($p.p)); }
     (',' p=path { $v.push_back(std::move($p.p)); } )* EOF;
 primitive_condition returns [parsed::primitive_condition c]:
       v=value         { $c.add_value(std::move($v.v));
                         $c.set_operator(parsed::primitive_condition::type::VALUE); }
       (  (  '='       { $c.set_operator(parsed::primitive_condition::type::EQ); }
           | '<' '>'   { $c.set_operator(parsed::primitive_condition::type::NE); }
           | '<'       { $c.set_operator(parsed::primitive_condition::type::LT); }
           | '<' '='   { $c.set_operator(parsed::primitive_condition::type::LE); }
           | '>'       { $c.set_operator(parsed::primitive_condition::type::GT); }
           | '>' '='   { $c.set_operator(parsed::primitive_condition::type::GE); }
          )
          v=value      { $c.add_value(std::move($v.v)); }
        | BETWEEN      { $c.set_operator(parsed::primitive_condition::type::BETWEEN); }
          v=value      { $c.add_value(std::move($v.v)); }
          AND
          v=value      { $c.add_value(std::move($v.v)); }
        | IN '('       { $c.set_operator(parsed::primitive_condition::type::IN); }
          v=value      { $c.add_value(std::move($v.v)); }
          (',' v=value { $c.add_value(std::move($v.v)); })*
          ')'
       )?
     ;
 // The following rules for parsing boolean expressions are verbose and
 // somewhat strange because of Antlr 3's limitations on recursive rules,
 // common rule prefixes, and (lack of) support for operator precedence.
 // These rules could have been written more clearly using a more powerful
 // parser generator - such as Yacc.
 boolean_expression returns [parsed::condition_expression e]:
 	  b=boolean_expression_1       { $e.append(std::move($b.e), '|'); }
 	  (OR b=boolean_expression_1   { $e.append(std::move($b.e), '|'); } )*
 	;
 boolean_expression_1 returns [parsed::condition_expression e]:
 	  b=boolean_expression_2       { $e.append(std::move($b.e), '&'); }
 	  (AND b=boolean_expression_2  { $e.append(std::move($b.e), '&'); } )*
 	;
 boolean_expression_2 returns [parsed::condition_expression e]:
 	  p=primitive_condition        { $e.set_primitive(std::move($p.c)); }
 	| NOT b=boolean_expression_2   { $e = std::move($b.e); $e.apply_not(); }
 	| '(' b=boolean_expression ')' { $e = std::move($b.e); }
     ;
 condition_expression returns [parsed::condition_expression e]:
     boolean_expression { e=std::move($boolean_expression.e); } EOF;
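The `path` rule above (a root `path_component` followed by any number of `.component` or `[INTEGER]` dereferences) can be approximated by a small hand-written recursive-descent parser. This is a sketch, not the ANTLR-generated code: `mini_path` and `path_parser` are hypothetical names, the sketch requires the whole input to be a single path (much as the `EOF` in `projection_expression` does for a list of paths), and it does not reserve keywords such as SET the way the real lexer does.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical result type shaped like parsed::path: a root, then a list of
// operators, each either ".name" (std::string) or "[index]" (unsigned).
struct mini_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> ops;
};

class path_parser {
    std::string _s;
    std::size_t _pos = 0;

    // ASCII-only classes, matching the ALPHA / DIGIT / ALNUM lexer fragments.
    static bool is_alpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static bool is_digit(char c) { return c >= '0' && c <= '9'; }
    static bool is_alnum(char c) { return is_alpha(c) || is_digit(c) || c == '_'; }

    char peek() const { return _pos < _s.size() ? _s[_pos] : '\0'; }
    // The grammar skips whitespace between tokens.
    void skip_ws() {
        while (peek() == ' ' || peek() == '\t' || peek() == '\n' || peek() == '\r') ++_pos;
    }
    // path_component: NAME (ALPHA ALNUM*) or NAMEREF ('#' ALNUM+).
    std::string component() {
        std::size_t start = _pos;
        if (peek() == '#') {
            ++_pos;
            if (!is_alnum(peek())) throw std::invalid_argument("empty #name reference");
        } else if (!is_alpha(peek())) {
            throw std::invalid_argument("expected attribute name");
        }
        while (is_alnum(peek())) ++_pos;
        return _s.substr(start, _pos - start);
    }

public:
    explicit path_parser(std::string s) : _s(std::move(s)) {}

    mini_path parse() {
        mini_path p;
        skip_ws();
        p.root = component();
        for (;;) {
            skip_ws();
            if (peek() == '.') {                  // '.' path_component
                ++_pos;
                skip_ws();
                p.ops.emplace_back(component());
            } else if (peek() == '[') {           // '[' INTEGER ']'
                ++_pos;
                skip_ws();
                std::size_t start = _pos;
                while (is_digit(peek())) ++_pos;
                if (start == _pos) throw std::invalid_argument("expected index");
                unsigned index = static_cast<unsigned>(std::stoul(_s.substr(start, _pos - start)));
                skip_ws();
                if (peek() != ']') throw std::invalid_argument("expected ']'");
                ++_pos;
                p.ops.emplace_back(index);
            } else {
                break;
            }
        }
        // Like EOF in the grammar: the whole input must be consumed.
        if (_pos != _s.size()) throw std::invalid_argument("unexpected trailing input");
        return p;
    }
};
```

For example, `path_parser("a[3].b.#c").parse()` yields root `a` and the operators `3`, `"b"`, `"#c"`; resolving `#c` through ExpressionAttributeNames is left to later stages, as in the real code.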

									
alternator/expressions.hh
									
				@@ -0,0 +1,41 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <stdexcept>

				#include <vector>

				#include "expressions_types.hh"

				namespace alternator {

				class expressions_syntax_error : public std::runtime_error {

				public:

				    using runtime_error::runtime_error;

				};

				parsed::update_expression parse_update_expression(std::string query);

				std::vector<parsed::path> parse_projection_expression(std::string query);

				parsed::condition_expression parse_condition_expression(std::string query);

				} /* namespace alternator */
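Each `parse_*_expression()` function in expressions.cc converts any parser failure into an `expressions_syntax_error` whose message names the expression kind and quotes the offending string. A minimal sketch of that wrapping pattern follows; `syntax_error` and `parse_or_throw()` are hypothetical names, and the sketch formats the nested error with `what()` rather than Seastar's `format()` of `std::current_exception()`.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical analogue of expressions_syntax_error: a runtime_error that
// simply inherits its constructors.
class syntax_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: run `parse` on `query` and turn any failure into a
// syntax_error that names the expression kind and quotes the input.
template <typename Parse>
auto parse_or_throw(const std::string& kind, const std::string& query, Parse&& parse) {
    try {
        return parse(query);
    } catch (const std::exception& e) {
        throw syntax_error("Failed parsing " + kind + " '" + query + "': " + e.what());
    } catch (...) {
        throw syntax_error("Failed parsing " + kind + " '" + query + "'");
    }
}
```

Keeping the original query text in the message lets a client see which of its expressions was rejected, without the caller having to catch lexer and parser exceptions separately.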

									
alternator/expressions_eval.hh
									
				@@ -0,0 +1,78 @@

				/*

				 * Copyright 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <unordered_set>

				#include "rjson.hh"

				#include "schema_fwd.hh"

				#include "expressions_types.hh"

				namespace alternator {

				// calculate_value() behaves slightly differently (in particular, it supports

				// different functions) when used in different types of expressions, as

				// enumerated in this enum:

				enum class calculate_value_caller {

				    UpdateExpression, ConditionExpression, ConditionExpressionAlone

				};

				inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {

				    switch (caller) {

				        case calculate_value_caller::UpdateExpression:

				            out << "UpdateExpression";

				            break;

				        case calculate_value_caller::ConditionExpression:

				            out << "ConditionExpression";

				            break;

				        case calculate_value_caller::ConditionExpressionAlone:

				            out << "ConditionExpression";

				            break;

				        default:

				            out << "unknown type of expression";

				            break;

				    }

				    return out;

				}

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);

				rjson::value calculate_value(const parsed::value& v,

				        calculate_value_caller caller,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values,

				        const rjson::value& update_info,

				        schema_ptr schema,

				        const std::unique_ptr<rjson::value>& previous_item);

				bool verify_condition_expression(

				        const parsed::condition_expression& condition_expression,

				        std::unordered_set<std::string>& used_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        const rjson::value& req,

				        schema_ptr schema,

				        const std::unique_ptr<rjson::value>& previous_item);

				} /* namespace alternator */

									
alternator/expressions_types.hh
									
				@@ -0,0 +1,228 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <vector>

				#include <string>

				#include <variant>

				/*

				 * Parsed representation of expressions and their components.

				 *

				 * Types in the alternator::parsed namespace are used for holding the parse

				 * tree - objects generated by the Antlr rules after parsing an expression.

				 * Because of the way Antlr works, all these objects are default-constructed

				 * first, and then assigned when the rule is completed, so all these types

				 * have only default constructors, plus setter functions to set them later.

				 */

				namespace alternator {

				namespace parsed {

				// "path" is an attribute's path in a document, e.g., a.b[3].c.

				class path {

				    // All paths have a "root", a top-level attribute, and any number of

				    // "dereference operators" - each either an index (e.g., "[2]") or a

				    // dot (e.g., ".xyz").

				    std::string _root;

				    std::vector<std::variant<std::string, unsigned>> _operators;

				public:

				    void set_root(std::string root) {

				        _root = std::move(root);

				    }

				    void add_index(unsigned i) {

				        _operators.emplace_back(i);

				    }

				    void add_dot(std::string name) {

				        _operators.emplace_back(std::move(name));

				    }

				    const std::string& root() const {

				        return _root;

				    }

				    bool has_operators() const {

				        return !_operators.empty();

				    }

				};

				// "value" is a value used in the right hand side of an assignment

				// expression, "SET a = ...". It can be a reference to a value included in

				// the request (":val"), a path to an attribute from the existing item

				// (e.g., "a.b[3].c"), or a function of other such values.

				// Note that the real right-hand-side of an assignment is actually a bit

				// more general - it allows either a value, or a value+value or value-value -

				// see class set_rhs below.

				struct value {

				    struct function_call {

				        std::string _function_name;

				        std::vector<value> _parameters;

				    };

				    std::variant<std::string, path, function_call> _value;

				    void set_valref(std::string s) {

				        _value = std::move(s);

				    }

				    void set_path(path p) {

				        _value = std::move(p);

				    }

				    void set_func_name(std::string s) {

				        _value = function_call {std::move(s), {}};

				    }

				    void add_func_parameter(value v) {

				        std::get<function_call>(_value)._parameters.emplace_back(std::move(v));

				    }

				    bool is_valref() const {

				        return std::holds_alternative<std::string>(_value);

				    }

				    bool is_path() const {

				        return std::holds_alternative<path>(_value);

				    }

				    bool is_func() const {

				        return std::holds_alternative<function_call>(_value);

				    }

				};

				// The right-hand-side of a SET in an update expression can be either a

				// single value (see above), or value+value, or value-value.

				class set_rhs {

				public:

				    char _op;  // '+', '-', or 'v' (a single value)

				    value _v1;

				    value _v2;

				    void set_value(value&& v1) {

				        _op = 'v';

				        _v1 = std::move(v1);

				    }

				    void set_plus(value&& v2) {

				        _op = '+';

				        _v2 = std::move(v2);

				    }

				    void set_minus(value&& v2) {

				        _op = '-';

				        _v2 = std::move(v2);

				    }

				};

				class update_expression {

				public:

				    struct action {

				        path _path;

				        struct set {

				            set_rhs _rhs;

				        };

				        struct remove {

				        };

				        struct add {

				            std::string _valref;

				        };

				        struct del {

				            std::string _valref;

				        };

				        std::variant<set, remove, add, del> _action;

				        void assign_set(path p, set_rhs rhs) {

				            _path = std::move(p);

				            _action = set { std::move(rhs) };

				        }

				        void assign_remove(path p) {

				            _path = std::move(p);

				            _action = remove { };

				        }

				        void assign_add(path p, std::string v) {

				            _path = std::move(p);

				            _action = add { std::move(v) };

				        }

				        void assign_del(path p, std::string v) {

				            _path = std::move(p);

				            _action = del { std::move(v) };

				        }

				    };

				private:

				    std::vector<action> _actions;

				    bool seen_set = false;

				    bool seen_remove = false;

				    bool seen_add = false;

				    bool seen_del = false;

				public:

				    void add(action a);

				    void append(update_expression other);

				    bool empty() const {

				        return _actions.empty();

				    }

				    const std::vector<action>& actions() const {

				        return _actions;

				    }

				};

				// A primitive_condition is a condition expression involving one condition,

				// while the full condition_expression below adds boolean logic over these

				// primitive conditions.

				// The supported primitive conditions are:

				// 1. Binary operators - v1 OP v2, where OP is =, <>, <, <=, >, or >= and

				//    v1 and v2 are values - from the item (an attribute path), the query

				//    (a ":val" reference), or a function of the above (only the size()

				//    function is supported).

				// 2. Ternary operator - v1 BETWEEN v2 and v3 (means v1 >= v2 AND v1 <= v3).

				// 3. N-ary operator - v1 IN ( v2, v3, ... )

				// 4. A single function call (attribute_exists etc.). The parser actually

				//    accepts a more general "value" here but later stages reject a value

				//    which is not a function call (because DynamoDB does it too).

				class primitive_condition {

				public:

				    enum class type {

				        UNDEFINED, VALUE, EQ, NE, LT, LE, GT, GE, BETWEEN, IN

				    };

				    type _op = type::UNDEFINED;

				    std::vector<value> _values;

				    void set_operator(type op) {

				        _op = op;

				    }

				    void add_value(value&& v) {

				        _values.push_back(std::move(v));

				    }

				    bool empty() const {

				        return _op == type::UNDEFINED;

				    }

				};

				class condition_expression {

				public:

				    bool _negated = false; // If true, the entire condition is negated

				    struct condition_list {

				        char op = '|'; // '&' or '|'

				        std::vector<condition_expression> conditions;

				    };

				    std::variant<primitive_condition, condition_list> _expression = condition_list();

				    void set_primitive(primitive_condition&& p) {

				        _expression = std::move(p);

				    }

				    void append(condition_expression&& c, char op);

				    void apply_not() {

				        _negated = !_negated;

				    }

				    bool empty() const {

				        return std::holds_alternative<condition_list>(_expression) &&

				               std::get<condition_list>(_expression).conditions.empty();

				    }

				};

				} // namespace parsed

				} // namespace alternator
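The `condition_expression` type above is a tree: leaves are primitive conditions, inner nodes are `condition_list`s combined with `'&'` or `'|'`, and any node may carry a `_negated` flag. A minimal evaluation sketch follows, assuming each primitive condition has already been reduced to a `bool`; the `cond` type and the `evaluate`, `leaf`, `combine`, and `negate` helpers are hypothetical simplifications, not the real evaluation code (which is reached through `verify_condition_expression()` declared in expressions_eval.hh).

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified mirror of parsed::condition_expression. A leaf holds
// a primitive condition that is assumed to be already evaluated (a bool).
struct cond {
    bool negated = false;
    struct list {
        char op = '|';                 // '&' or '|', as in condition_list
        std::vector<cond> conditions;
    };
    std::variant<bool, list> expr = list{};
};

bool evaluate(const cond& c) {
    bool result;
    if (const bool* value = std::get_if<bool>(&c.expr)) {
        result = *value;
    } else {
        // Note: all_of is true and any_of is false for an empty list; the real
        // type provides empty() so callers can detect that case first.
        const cond::list& l = std::get<cond::list>(c.expr);
        result = l.op == '&'
            ? std::all_of(l.conditions.begin(), l.conditions.end(), evaluate)
            : std::any_of(l.conditions.begin(), l.conditions.end(), evaluate);
    }
    return c.negated ? !result : result;
}

// Small builders for constructing trees by hand.
cond leaf(bool v) { cond c; c.expr = v; return c; }
cond combine(char op, std::vector<cond> parts) {
    cond c;
    c.expr = cond::list{op, std::move(parts)};
    return c;
}
cond negate(cond c) { c.negated = !c.negated; return c; }
```

In the grammar, `boolean_expression_2` handles NOT and parentheses, `boolean_expression_1` joins its items with AND, and `boolean_expression` joins those with OR, so `NOT a OR b AND c` is grouped as `(NOT a) OR (b AND c)`, i.e. `combine('|', {negate(a), combine('&', {b, c})})` in the sketch's terms.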

									
alternator/rjson.cc
									
				@@ -0,0 +1,300 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "rjson.hh"

				#include "error.hh"

				#include <seastar/core/print.hh>

				#include <seastar/core/thread.hh>

				namespace rjson {

				static allocator the_allocator;

				/*

				 * This wrapper class adds nested level checks to rapidjson's handlers.

				 * Each rapidjson handler implements functions for accepting JSON values,

				 * which includes strings, numbers, objects, arrays, etc.

				 * Parsing objects and arrays needs to be performed carefully with regard

				 * to stack overflow - each object/array layer adds another stack frame

				 * to parsing, printing and destroying the parent JSON document.

				 * To prevent stack overflow, a rapidjson handler can be wrapped with

				 * guarded_yieldable_json_handler, which accepts an additional max_nested_level parameter.

				 * If parsing exceeds the max nested level, a proper rjson::error is thrown.

				 */

				template<typename Handler, bool EnableYield>

				struct guarded_yieldable_json_handler : public Handler {

				    size_t _nested_level = 0;

				    size_t _max_nested_level;

				public:

				    using handler_base = Handler;

				    explicit guarded_yieldable_json_handler(size_t max_nested_level) : _max_nested_level(max_nested_level) {}

				    guarded_yieldable_json_handler(string_buffer& buf, size_t max_nested_level)

				            : handler_base(buf), _max_nested_level(max_nested_level) {}

				    void Parse(const char* str, size_t length) {

				        rapidjson::MemoryStream ms(static_cast<const char*>(str), length * sizeof(typename encoding::Ch));

				        rapidjson::EncodedInputStream<encoding, rapidjson::MemoryStream> is(ms);

				        rapidjson::GenericReader<encoding, encoding, allocator> reader(&the_allocator);

				        reader.Parse(is, *this);

				        if (reader.HasParseError()) {

				            throw rjson::error(format("Parsing JSON failed: {}", rapidjson::GetParseError_En(reader.GetParseErrorCode())));

				        }

				        //NOTICE: The handler has parsed the string, but in case of rapidjson::GenericDocument

				        // the data now resides in an internal stack_ variable, which is private instead of

				        // protected... which means we cannot simply access its data. Fortunately, another

				        // function for populating documents from SAX events can be abused to extract the data

				        // from the stack via gadget-oriented programming - we use an empty event generator

				        // which does nothing, and use it to call Populate(), which assumes that the generator

				        // will fill the stack with something. It won't, but our stack is already filled with

				        // data we want to steal, so once Populate() ends, our document will be properly parsed.

				        // A proper solution could be programmed once rapidjson declares this stack_ variable

				        // as protected instead of private, so that this class can access it.

				        auto dummy_generator = [](handler_base&){return true;};

				        handler_base::Populate(dummy_generator);

				    }

				    bool StartObject() {

				        ++_nested_level;

				        check_nested_level();

				        maybe_yield();

				        return handler_base::StartObject();

				    }

				    bool EndObject(rapidjson::SizeType elements_count = 0) {

				        --_nested_level;

				        return handler_base::EndObject(elements_count);

				    }

				    bool StartArray() {

				        ++_nested_level;

				        check_nested_level();

				        maybe_yield();

				        return handler_base::StartArray();

				    }

				    bool EndArray(rapidjson::SizeType elements_count = 0) {

				        --_nested_level;

				        return handler_base::EndArray(elements_count);

				    }

				    bool Null()                 { maybe_yield(); return handler_base::Null(); }

				    bool Bool(bool b)           { maybe_yield(); return handler_base::Bool(b); }

				    bool Int(int i)             { maybe_yield(); return handler_base::Int(i); }

				    bool Uint(unsigned u)       { maybe_yield(); return handler_base::Uint(u); }

				    bool Int64(int64_t i64)     { maybe_yield(); return handler_base::Int64(i64); }

				    bool Uint64(uint64_t u64)   { maybe_yield(); return handler_base::Uint64(u64); }

				    bool Double(double d)       { maybe_yield(); return handler_base::Double(d); }

				    bool String(const value::Ch* str, size_t length, bool copy = false) { maybe_yield(); return handler_base::String(str, length, copy); }

				    bool Key(const value::Ch* str, size_t length, bool copy = false) { maybe_yield(); return handler_base::Key(str, length, copy); }

				protected:

				    static void maybe_yield() {

				        if constexpr (EnableYield) {

				            thread::maybe_yield();

				        }

				    }

				    void check_nested_level() const {

				        if (RAPIDJSON_UNLIKELY(_nested_level > _max_nested_level)) {

				            throw rjson::error(format("Max nested level reached: {}", _max_nested_level));

				        }

				    }

				};

				std::string print(const rjson::value& value) {

				    string_buffer buffer;

				    guarded_yieldable_json_handler<writer, false> writer(buffer, 39);

				    value.Accept(writer);

				    return std::string(buffer.GetString());

				}

				rjson::value copy(const rjson::value& value) {

				    return rjson::value(value, the_allocator);

				}

				rjson::value parse(std::string_view str) {

				    guarded_yieldable_json_handler<document, false> d(39);

				    d.Parse(str.data(), str.size());

				    if (d.HasParseError()) {

				        throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));

				    }

				    rjson::value& v = d;

				    return std::move(v);

				}

				rjson::value parse_yieldable(std::string_view str) {

				    guarded_yieldable_json_handler<document, true> d(39);

				    d.Parse(str.data(), str.size());

				    if (d.HasParseError()) {

				        throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));

				    }

				    rjson::value& v = d;

				    return std::move(v);

				}

				rjson::value& get(rjson::value& value, std::string_view name) {

				    // Although FindMember() has a variant taking a StringRef, it ignores the

				    // given length (see https://github.com/Tencent/rapidjson/issues/1649).

				    // Luckily, the variant taking a GenericValue doesn't share this bug,

				    // and we can create a string GenericValue without copying the string.

				    auto member_it = value.FindMember(rjson::value(name.data(), name.size()));

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				const rjson::value& get(const rjson::value& value, std::string_view name) {

				    auto member_it = value.FindMember(rjson::value(name.data(), name.size()));

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				rjson::value from_string(const std::string& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const sstring& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const char* str, size_t size) {

				    return rjson::value(str, size, the_allocator);

				}

				rjson::value from_string(std::string_view view) {

				    return rjson::value(view.data(), view.size(), the_allocator);

				}

				const rjson::value* find(const rjson::value& value, std::string_view name) {

				    // Although FindMember() has a variant taking a StringRef, it ignores the

				    // given length (see https://github.com/Tencent/rapidjson/issues/1649).

				    // Luckily, the variant taking a GenericValue doesn't share this bug,

				    // and we can create a string GenericValue without copying the string.

				    auto member_it = value.FindMember(rjson::value(name.data(), name.size()));

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				rjson::value* find(rjson::value& value, std::string_view name) {

				    auto member_it = value.FindMember(rjson::value(name.data(), name.size()));

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				bool remove_member(rjson::value& value, std::string_view name) {

				    // Although RemoveMember() has a variant taking a StringRef, it ignores

				    // the given length (see https://github.com/Tencent/rapidjson/issues/1649).

				    // Luckily, the variant taking a GenericValue doesn't share this bug,

				    // and we can create a string GenericValue without copying the string.

				    return value.RemoveMember(rjson::value(name.data(), name.size()));

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);

				}

				void set_with_string_name(rjson::value& base, std::string_view name, rjson::value&& member) {

				    base.AddMember(rjson::value(name.data(), name.size(), the_allocator), std::move(member), the_allocator);

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);

				}

				void set_with_string_name(rjson::value& base, std::string_view name, rjson::string_ref_type member) {

				    base.AddMember(rjson::value(name.data(), name.size(), the_allocator), rjson::value(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {

				    base.AddMember(name, std::move(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {

				    base.AddMember(name, rjson::value(member), the_allocator);

				}

				void push_back(rjson::value& base_array, rjson::value&& item) {

				    base_array.PushBack(std::move(item), the_allocator);

				}

				bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {

				   auto r1_type = r1.GetType();

				   auto r2_type = r2.GetType();

				   // null is the smallest type and compares with every other type; nothing is less than null

				   if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {

				       return r1_type < r2_type;

				   }

				   // only null, true, and false are comparable with each other, other types are not compatible

				   if (r1_type != r2_type) {

				       if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {

				           throw rjson::error(format("Types are not comparable: {} {}", r1, r2));

				       }

				   }

				   switch (r1_type) {

				   case rjson::type::kNullType:

				       // fall-through

				   case rjson::type::kFalseType:

				       // fall-through

				   case rjson::type::kTrueType:

				       return r1_type < r2_type;

				   case rjson::type::kObjectType:

				       throw rjson::error("Object type comparison is not supported");

				   case rjson::type::kArrayType:

				       throw rjson::error("Array type comparison is not supported");

				   case rjson::type::kStringType: {

				       const size_t r1_len = r1.GetStringLength();

				       const size_t r2_len = r2.GetStringLength();

				       size_t len = std::min(r1_len, r2_len);

				       // memcmp() rather than strncmp(): JSON strings may contain embedded NUL bytes.

				       int result = std::memcmp(r1.GetString(), r2.GetString(), len);

				       return result < 0 || (result == 0 && r1_len < r2_len);

				   }

				   case rjson::type::kNumberType: {

				       if (r1.IsInt() && r2.IsInt()) {

				           return r1.GetInt() < r2.GetInt();

				       } else if (r1.IsUint() && r2.IsUint()) {

				           return r1.GetUint() < r2.GetUint();

				       } else if (r1.IsInt64() && r2.IsInt64()) {

				           return r1.GetInt64() < r2.GetInt64();

				       } else if (r1.IsUint64() && r2.IsUint64()) {

				           return r1.GetUint64() < r2.GetUint64();

				       } else {

				           // it's safe to call GetDouble() on any number type

				           return r1.GetDouble() < r2.GetDouble();

				       }

				   }

				   default:

				       return false;

				   }

				}

				} // end namespace rjson

				std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {

				    return os << rjson::print(v);

				}
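The nesting guard in guarded_yieldable_json_handler increments a counter on every StartObject()/StartArray() event, decrements it on the matching End event, and throws as soon as the counter exceeds max_nested_level. The sketch below shows the same counting idea without rapidjson. `checked_nesting_depth` is a hypothetical helper, not part of rjson; it only tracks brackets (skipping the contents of string literals) and does not validate JSON.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not part of rjson): returns the deepest object/array
// nesting found in `json`, throwing as soon as the depth exceeds
// `max_nested_level`, similar in spirit to check_nested_level().
std::size_t checked_nesting_depth(std::string_view json, std::size_t max_nested_level) {
    std::size_t level = 0;
    std::size_t deepest = 0;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (in_string) {
            if (c == '\\') {
                ++i;  // skip the escaped character, e.g. \" inside a string
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        switch (c) {
        case '"':
            in_string = true;
            break;
        case '{':
        case '[':
            ++level;  // corresponds to StartObject()/StartArray()
            if (level > max_nested_level) {
                throw std::runtime_error("Max nested level reached: " + std::to_string(max_nested_level));
            }
            if (level > deepest) {
                deepest = level;
            }
            break;
        case '}':
        case ']':
            if (level > 0) {
                --level;  // corresponds to EndObject()/EndArray()
            }
            break;
        default:
            break;
        }
    }
    return deepest;
}
```

For example, `checked_nesting_depth(R"({"a":[1,{"b":"[{"}]})", 39)` returns 3 because the brackets inside the string `"[{"` are ignored, while `checked_nesting_depth("[[[]]]", 2)` throws on the third `[`. Checking the counter on entry, before descending further, is what bounds the recursion depth of later printing and destruction.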

									
alternator/rjson.hh
				@@ -0,0 +1,177 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				/*

				 * rjson is a wrapper over the rapidjson library, providing fast JSON parsing and generation.

				 *

				 * rapidjson has strict copy-avoidance policies: among other things, it uses

				 * provided char arrays without copying them and copies objects only when asked to explicitly.

				 * As such, one should be careful when passing strings with a limited lifetime

				 * (e.g. the data underneath local std::strings) to rjson functions, because the created JSON objects

				 * may end up relying on dangling char pointers. The rjson functions that build JSON values from strings

				 * offer both a string_ref_type API (cheaper, for strings known to live

				 * at least as long as the object, e.g. a static char array) and a std::string API. The cheaper

				 * variants should be used *only* if the string's lifetime is guaranteed; otherwise the result

				 * is undefined behaviour.

				 * Also, bear in mind that the methods exposed by rjson::value are generic, but some of them

				 * work only for specific types. If the type does not match, an rjson::error will be thrown.

				 * Examples of such mismatched usage are calling MemberCount() on a JSON value that is not an object

				 * or calling Size() on a non-array value.

				 */

				#include <string>

				#include <stdexcept>

				namespace rjson {

				class error : public std::exception {

				    std::string _msg;

				public:

				    error() = default;

				    error(const std::string& msg) : _msg(msg) {}

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				};

				}

				// rapidjson configuration macros

				#define RAPIDJSON_HAS_STDSTRING 1

				// rapidjson's default policy is to use assert(), which is dangerous for two reasons:

				// 1. assert() can be turned off with -DNDEBUG, silently skipping the check

				// 2. a failed assert() aborts the whole program

				// Fortunately, the default policy can be overridden, so rapidjson errors will

				// throw an rjson::error exception instead.

				#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)

				#include <rapidjson/document.h>

				#include <rapidjson/writer.h>

				#include <rapidjson/stringbuffer.h>

				#include <rapidjson/error/en.h>

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				namespace rjson {

				using allocator = rapidjson::CrtAllocator;

				using encoding = rapidjson::UTF8<>;

				using document = rapidjson::GenericDocument<encoding, allocator>;

				using value = rapidjson::GenericValue<encoding, allocator>;

				using string_ref_type = value::StringRefType;

				using string_buffer = rapidjson::GenericStringBuffer<encoding>;

				using writer = rapidjson::Writer<string_buffer, encoding>;

				using type = rapidjson::Type;

				// Returns an object representing JSON's null

				inline rjson::value null_value() {

				    return rjson::value(rapidjson::kNullType);

				}

				// Returns an empty JSON object - {}

				inline rjson::value empty_object() {

				    return rjson::value(rapidjson::kObjectType);

				}

				// Returns an empty JSON array - []

				inline rjson::value empty_array() {

				    return rjson::value(rapidjson::kArrayType);

				}

				// Returns an empty JSON string - ""

				inline rjson::value empty_string() {

				    return rjson::value(rapidjson::kStringType);

				}

				// Convert the JSON value to a string with JSON syntax, the opposite of parse().

				// The representation is dense - without any redundant indentation.

				std::string print(const rjson::value& value);

				// Returns a string_view to the string held in a JSON value (which is

				// assumed to hold a string, i.e., v.IsString() == true). This is a view

				// to the existing data - no copying is done.

				inline std::string_view to_string_view(const rjson::value& v) {

				    return std::string_view(v.GetString(), v.GetStringLength());

				}

				// Copies given JSON value - involves allocation

				rjson::value copy(const rjson::value& value);

				// Parses a JSON value from the given string or raw character array.

				// The string/char array does not need to outlive the returned value,

				// as parse() allocates its own copies of member names and values.

				// Throws rjson::error if parsing fails.

				rjson::value parse(std::string_view str);

				// Like parse(), but may yield to other tasks while parsing.

				// Must be run in a seastar::thread context.

				rjson::value parse_yieldable(std::string_view str);

				// Creates a JSON value (of JSON string type) out of internal string representations.

				// The string value is copied, so str does not need to outlive the returned value.

				rjson::value from_string(const std::string& str);

				rjson::value from_string(const sstring& str);

				rjson::value from_string(const char* str, size_t size);

				rjson::value from_string(std::string_view view);

				// Returns a pointer to the JSON member with the given name if it exists, nullptr otherwise

				rjson::value* find(rjson::value& value, std::string_view name);

				const rjson::value* find(const rjson::value& value, std::string_view name);

				// Returns a reference to the JSON member with the given name if it exists, throws otherwise

				rjson::value& get(rjson::value& value, std::string_view name);

				const rjson::value& get(const rjson::value& value, std::string_view name);

				// Sets a member in the given JSON object by moving the member; allocates a copy of the name.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);

				void set_with_string_name(rjson::value& base, std::string_view name, rjson::value&& member);

				// Sets a string member in the given JSON object by assigning its reference; allocates a copy of the name.

				// NOTICE: the member string must stay alive at least as long as base.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);

				void set_with_string_name(rjson::value& base, std::string_view name, rjson::string_ref_type member);

				// Sets a member in the given JSON object by moving the member.

				// NOTICE: the name string must stay alive at least as long as base.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);

				// Sets a string member in the given JSON object by assigning its reference.

				// NOTICE: the name string must stay alive at least as long as base.

				// NOTICE: the member string must stay alive at least as long as base.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);

				// Appends a value to a JSON array by moving the item to its end.

				// Throws if base_array is not a JSON array.

				void push_back(rjson::value& base_array, rjson::value&& item);

				// Removes a member from a JSON object. Returns true if the member existed

				// and was removed. Throws if value isn't an object.

				bool remove_member(rjson::value& value, std::string_view name);

				struct single_value_comp {

				    bool operator()(const rjson::value& r1, const rjson::value& r2) const;

				};

				} // end namespace rjson

				namespace std {

				std::ostream& operator<<(std::ostream& os, const rjson::value& v);

				}
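The RAPIDJSON_ASSERT override above turns every failed internal rapidjson check into a thrown exception that carries the stringified condition. The sketch below reproduces that pattern outside rapidjson. `json_error`, `CHECK_OR_THROW` and `checked_get` are hypothetical stand-ins for rjson::error, the real macro, and a guarded rapidjson accessor; only the macro's shape matches the header.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for rjson::error (sketch only).
struct json_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Same shape as the RAPIDJSON_ASSERT override: a failed condition throws
// instead of calling assert(), so the check survives -DNDEBUG and the
// caller can recover. The do { } while (0) wrapper makes the macro act as
// a single statement, so it is safe inside an unbraced if/else.
#define CHECK_OR_THROW(x) do { if (!(x)) throw json_error(std::string("JSON error: condition not met: ") + #x); } while (0)

// Hypothetical accessor guarded the way rapidjson guards, e.g., GetInt().
int checked_get(const int* p) {
    CHECK_OR_THROW(p != nullptr);
    return *p;
}
```

Calling `checked_get(nullptr)` throws a `json_error` whose `what()` is `JSON error: condition not met: p != nullptr`, because `#x` stringifies the condition's tokens. This mirrors why, for example, calling MemberCount() on a non-object rjson::value surfaces as an rjson::error rather than a crash.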

									
alternator/rmw_operation.hh
				@@ -0,0 +1,124 @@

				/*

				 * Copyright 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <seastarx.hh>

				#include <service/storage_proxy.hh>

				#include "rjson.hh"

				#include "executor.hh"

				namespace alternator {

				// An rmw_operation encapsulates the common logic of all the item update

				// operations which may involve a read of the item before the write

				// (so-called Read-Modify-Write operations). These operations include PutItem,

				// UpdateItem and DeleteItem: All of these may be conditional operations (the

				// "Expected" parameter) which require a read before the write, and UpdateItem

				// may also have an update expression which refers to the item's old value.

				//

				// The code below supports running the read and the write together as one

				// transaction using LWT (this is why rmw_operation is a subclass of

				// cas_request, as required by storage_proxy::cas()), but also has optional

				// modes not using LWT.

				class rmw_operation : public service::cas_request, public enable_shared_from_this<rmw_operation> {

				public:

				    // The following options choose which mechanism to use for isolating

				    // parallel write operations:

				    // * The FORBID_RMW option forbids RMW (read-modify-write) operations

				    //   such as conditional updates. For the remaining write-only

				    //   operations, ordinary quorum writes are isolated enough.

				    // * The LWT_ALWAYS option always uses LWT (lightweight transactions)

				    //   for any write operation - whether or not it also has a read.

				    // * The LWT_RMW_ONLY option uses LWT only for RMW operations, and uses

				    //   ordinary quorum writes for write-only operations.

				    //   This option is not safe if the user may send both RMW and write-only

				    //   operations on the same item.

				    // * The UNSAFE_RMW option does read-modify-write operations as separate

				    //   read and write. It is unsafe - concurrent RMW operations are not

				    //   isolated at all. This option will likely be removed in the future.

				    enum class write_isolation {

				        FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW

				    };

				    static constexpr auto WRITE_ISOLATION_TAG_KEY = "system:write_isolation";

				    static write_isolation get_write_isolation_for_schema(schema_ptr schema);

				protected:

				    // The full request JSON

				    rjson::value _request;

				    // All RMW operations involve a single item with a specific partition

				    // and optional clustering key, in a single table, so the following

				    // information is common to all of them:

				    schema_ptr _schema;

				    partition_key _pk = partition_key::make_empty();

				    clustering_key _ck = clustering_key::make_empty();

				    write_isolation _write_isolation;

				    // All RMW operations can have a ReturnValues parameter from the following

				    // choices. But note that only UpdateItem actually supports all of them:

				    enum class returnvalues {

				        NONE, ALL_OLD, UPDATED_OLD, ALL_NEW, UPDATED_NEW

				    } _returnvalues;

				    static returnvalues parse_returnvalues(const rjson::value& request);

				    // When _returnvalues != NONE, apply() should store here, in JSON form,

				    // the values which are to be returned in the "Attributes" field.

				    // The default null JSON means do not return an Attributes field at all.

				    // This field is marked "mutable" so that the const apply() can modify

				    // it (see explanation below), but note that because apply() may be

				    // called more than once, if apply() will sometimes set this field it

				    // must set it (even if just to the default empty value) every time.

				    mutable rjson::value _return_attributes;

				public:

				    // The constructor of an rmw_operation subclass should parse the request

				    // and try to discover as many input errors as it can before really

				    // attempting the read or write operations.

				    rmw_operation(service::storage_proxy& proxy, rjson::value&& request);

				    // rmw_operation subclasses (update_item_operation, put_item_operation

				    // and delete_item_operation) shall implement an apply() function which

				    // takes the previous value of the item (if it was read) and creates the

				    // write mutation. If the previous value of the item does not pass the needed

				    // conditional expression, apply() should return an empty optional.

				    // apply() may throw if it encounters input errors not discovered during

				    // the constructor.

				    // apply() may be called more than once in case of contention, so it must

				    // not change the state saved in the object (issue #7218 was caused by

				    // violating this). We mark apply() "const" to let the compiler validate

				    // this for us. The output-only field _return_attributes is marked

				    // "mutable" above so that apply() can still write to it.

				    virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;

				    // Convert the above apply() into the signature needed by cas_request:

				    virtual std::optional<mutation> apply(query::result& qr, const query::partition_slice& slice, api::timestamp_type ts) override;

				    virtual ~rmw_operation() = default;

				    schema_ptr schema() const { return _schema; }

				    const rjson::value& request() const { return _request; }

				    rjson::value&& move_request() && { return std::move(_request); }

				    future<executor::request_return_type> execute(service::storage_proxy& proxy,

				            service::client_state& client_state,

				            tracing::trace_state_ptr trace_state,

				            service_permit permit,

				            bool needs_read_before_write,

				            stats& stats);

				    std::optional<shard_id> shard_for_execute(bool needs_read_before_write);

				};

				} // namespace alternator
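The write_isolation comments describe, per option, whether a write goes through LWT, an ordinary quorum write, or a separate read followed by a write. The sketch below encodes that policy as a small decision function. `write_path` and `choose_write_path` are hypothetical names, and the exception type and message for FORBID_RMW are placeholders; the real execute() and its error reporting are not shown in this header.

```cpp
#include <cassert>
#include <stdexcept>

// Mirrors the option names declared in rmw_operation (sketch only).
enum class write_isolation { FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW };

// Hypothetical classification of how a single write would be carried out.
enum class write_path { plain_write, lwt, read_then_write };

// Policy as described in the comments above write_isolation.
write_path choose_write_path(write_isolation iso, bool needs_read_before_write) {
    switch (iso) {
    case write_isolation::FORBID_RMW:
        // RMW operations (e.g. conditional updates) are rejected; write-only
        // operations use ordinary quorum writes.
        if (needs_read_before_write) {
            throw std::invalid_argument("read-modify-write operations are forbidden");
        }
        return write_path::plain_write;
    case write_isolation::LWT_ALWAYS:
        // Every write goes through LWT, with or without a read.
        return write_path::lwt;
    case write_isolation::LWT_RMW_ONLY:
        // LWT only when a read is needed; unsafe if RMW and write-only
        // operations are mixed on the same item.
        return needs_read_before_write ? write_path::lwt : write_path::plain_write;
    case write_isolation::UNSAFE_RMW:
        // Read and write are done separately, with no isolation between them.
        return needs_read_before_write ? write_path::read_then_write : write_path::plain_write;
    }
    throw std::logic_error("unknown write_isolation value");
}
```

Under LWT_RMW_ONLY, a plain PutItem takes the quorum-write path while a conditional one goes through LWT; this split is why the comments warn that mixing both kinds of operations on one item is not safe.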

									
alternator/serialization.cc
				@@ -0,0 +1,268 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "base64.hh"

				#include "log.hh"

				#include "serialization.hh"

				#include "error.hh"

				#include "rapidjson/writer.h"

				#include "concrete_types.hh"

				#include "cql3/type_json.hh"

				static logging::logger slogger("alternator-serialization");

				namespace alternator {

				type_info type_info_from_string(std::string type) {

				    static thread_local const std::unordered_map<std::string, type_info> type_infos = {

				        {"S", {alternator_type::S, utf8_type}},

				        {"B", {alternator_type::B, bytes_type}},

				        {"BOOL", {alternator_type::BOOL, boolean_type}},

				        {"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented

				    };

				    auto it = type_infos.find(type);

				    if (it == type_infos.end()) {

				        return {alternator_type::NOT_SUPPORTED_YET, utf8_type};

				    }

				    return it->second;

				}

				type_representation represent_type(alternator_type atype) {

				    static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {

				        {alternator_type::S, {"S", utf8_type}},

				        {alternator_type::B, {"B", bytes_type}},

				        {alternator_type::BOOL, {"BOOL", boolean_type}},

				        {alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented

				    };

				    auto it = type_representations.find(atype);

				    if (it == type_representations.end()) {

				        throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));

				    }

				    return it->second;

				}

				struct from_json_visitor {

				    const rjson::value& v;

				    bytes_ostream& bo;

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };

				    void operator()(const string_type_impl& t) {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        bo.write(base64_decode(v));

				    }

				    void operator()(const boolean_type_impl& t) const {

				        bo.write(boolean_type->decompose(v.GetBool()));

				    }

				    void operator()(const decimal_type_impl& t) const {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));

				    }

				};

				bytes serialize_item(const rjson::value& item) {

				    if (!item.IsObject() || item.MemberCount() != 1) {

				        throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));

				    }

				    auto it = item.MemberBegin();

				    type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings

				    if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal serialization of type {}", it->name.GetString());

				        return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));

				    }

				    bytes_ostream bo;

				    bo.write(bytes{int8_t(type_info.atype)});

				    visit(*type_info.dtype, from_json_visitor{it->value, bo});

				    return bytes(bo.linearize());

				}

				struct to_json_visitor {

				    rjson::value& deserialized;

				    const std::string& type_ident;

				    bytes_view bv;

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };

				    void operator()(const decimal_type_impl& t) const {

				        auto s = to_json_string(*decimal_type, bytes(bv));

				        //FIXME(sarna): unnecessary copy

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));

				    }

				    void operator()(const string_type_impl& t) {

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        std::string b64 = base64_encode(bv);

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));

				    }

				};

				rjson::value deserialize_item(bytes_view bv) {

				    rjson::value deserialized(rapidjson::kObjectType);

				    if (bv.empty()) {

				        throw api_error("ValidationException", "Serialized value empty");

				    }

				    alternator_type atype = alternator_type(bv[0]);

				    bv.remove_prefix(1);

				    if (atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));

				        return rjson::parse(std::string_view(reinterpret_cast<const char *>(bv.data()), bv.size()));

				    }

				    type_representation type_representation = represent_type(atype);

				    visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});

				    return deserialized;

				}
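serialize_item and deserialize_item frame every non-key attribute as a single alternator_type byte followed by the payload: deserialize_item reads the tag from bv[0], strips it with remove_prefix(1), and for NOT_SUPPORTED_YET treats the rest as raw JSON text. The standalone sketch below reproduces only that framing, with std::string standing in for bytes. The names encode_tagged, decode_tagged and tag are hypothetical, and the payload is passed through unchanged rather than serialized with Scylla's type machinery.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical tag enum with the same underlying values as
// alternator::alternator_type (S=0, B=1, BOOL=2, N=3, NOT_SUPPORTED_YET=4).
enum class tag : std::int8_t { S, B, BOOL, N, NOT_SUPPORTED_YET };

// Hypothetical helper: prefix the payload with a one-byte type tag.
std::string encode_tagged(tag t, std::string_view payload) {
    std::string out;
    out.reserve(payload.size() + 1);
    out.push_back(static_cast<char>(t));
    out.append(payload);
    return out;
}

// Hypothetical helper: split a serialized cell back into (tag, payload).
// The returned view points into the caller's buffer.
std::pair<tag, std::string_view> decode_tagged(std::string_view bv) {
    if (bv.empty()) {
        // deserialize_item reports the same condition as a ValidationException.
        throw std::invalid_argument("Serialized value empty");
    }
    tag t = static_cast<tag>(bv[0]);
    bv.remove_prefix(1);
    return {t, bv};
}
```

Keeping the tag in-band costs one byte per cell but lets a reader recover the DynamoDB type without consulting the schema, which is what allows schemaless attributes.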

				std::string type_to_string(data_type type) {

				    static thread_local std::unordered_map<data_type, std::string> types = {

				        {utf8_type, "S"},

				        {bytes_type, "B"},

				        {boolean_type, "BOOL"},

				        {decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type

				    };

				    auto it = types.find(type);

				    if (it == types.end()) {

				        throw std::runtime_error(format("Unknown type {}", type->name()));

				    }

				    return it->second;

				}

				bytes get_key_column_value(const rjson::value& item, const column_definition& column) {

				    std::string column_name = column.name_as_text();

				    const rjson::value* key_typed_value = rjson::find(item, column_name);

				    if (!key_typed_value) {

				        throw api_error("ValidationException", format("Key column {} not found", column_name));

				    }

				    return get_key_from_typed_value(*key_typed_value, column);

				}

				// Parses the JSON encoding for a key value, which is a map with a single

				// entry, whose key is the type (expected to match the key column's type)

				// and the value is the encoded value.

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column) {

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||

				            !key_typed_value.MemberBegin()->value.IsString()) {

				        throw api_error("ValidationException",

				                format("Malformed value object for key column {}: {}",

				                        column.name_as_text(), key_typed_value));

				    }

				    auto it = key_typed_value.MemberBegin();

				    if (it->name != type_to_string(column.type)) {

				        throw api_error("ValidationException",

				                format("Type mismatch: expected type {} for key column {}, got type {}",

				                        type_to_string(column.type), column.name_as_text(), it->name.GetString()));

				    }

				    if (column.type == bytes_type) {

				        return base64_decode(it->value);

				    } else {

				        return column.type->from_string(rjson::to_string_view(it->value));

				    }

				}
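get_key_from_typed_value accepts a key only if it is a one-member object whose member name equals the column's DynamoDB type (as produced by type_to_string) and whose value is a string. A minimal sketch of those checks, assuming the JSON object has already been parsed into a std::map from type name to string value; typed_value and key_payload are hypothetical names, and the real function additionally base64-decodes "B" keys and parses the other types with from_string.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed key attribute such as {"S": "abc"}:
// type name -> string-encoded value.
using typed_value = std::map<std::string, std::string>;

// Mirrors the validation in get_key_from_typed_value: exactly one member,
// and its type name must match the key column's expected type.
// Returns a reference into v, so v must outlive the result.
const std::string& key_payload(const typed_value& v, const std::string& expected_type) {
    if (v.size() != 1) {
        throw std::invalid_argument("Malformed value object for key column");
    }
    const auto& [type, value] = *v.begin();
    if (type != expected_type) {
        throw std::invalid_argument("Type mismatch: expected type " + expected_type +
                                    ", got type " + type);
    }
    return value;
}
```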

				rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {

				    if (column.type == bytes_type) {

				        std::string b64 = base64_encode(cell);

				        return rjson::from_string(b64);

				    } else if (column.type == utf8_type) {

				        return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));

				    } else if (column.type == decimal_type) {

				        // FIXME: use specialized Alternator number type, not the more

				        // general "decimal_type". A dedicated type can be more efficient

				        // in storage space and in parsing speed.

				        auto s = to_json_string(*decimal_type, bytes(cell));

				        return rjson::from_string(s);

				    } else {

				        // We shouldn't get here, we shouldn't see such key columns.

				        throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));

				    }

				}

				partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {

				    std::vector<bytes> raw_pk;

				    // FIXME: this is a loop, but we really allow only one partition key column.

				    for (const column_definition& cdef : schema->partition_key_columns()) {

				        bytes raw_value = get_key_column_value(item, cdef);

				        raw_pk.push_back(std::move(raw_value));

				    }

				    return partition_key::from_exploded(raw_pk);

				}

				clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {

				    if (schema->clustering_key_size() == 0) {

				        return clustering_key::make_empty();

				    }

				    std::vector<bytes> raw_ck;

				    // FIXME: this is a loop, but we really allow only one clustering key column.

				    for (const column_definition& cdef : schema->clustering_key_columns()) {

				        bytes raw_value = get_key_column_value(item, cdef);

				        raw_ck.push_back(std::move(raw_value));

				    }

				    return clustering_key::from_exploded(raw_ck);

				}

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        throw api_error("ValidationException", format("{}: invalid number object", diagnostic));

				    }

				    auto it = v.MemberBegin();

				    if (it->name != "N") {

				        throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));

				    }

				    if (it->value.IsNumber()) {

				        // FIXME(sarna): should use big_decimal constructor with numeric values directly.

				        return big_decimal(rjson::print(it->value));

				    }

				    if (!it->value.IsString()) {

				        throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));

				    }

				    return big_decimal(it->value.GetString());

				}

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return {"", nullptr};

				    }

				    auto it = v.MemberBegin();

				    const std::string it_key = it->name.GetString();

				    if (it_key != "SS" && it_key != "BS" && it_key != "NS") {

				        return {"", nullptr};

				    }

				    return std::make_pair(it_key, &(it->value));

				}
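unwrap_set recognizes only single-member objects whose member name is one of the three DynamoDB set types. The sketch below models the same contract with a std::map standing in for the rjson object; attr_value and unwrap_set_sketch are hypothetical, and the simplification that set elements are pre-extracted strings (rather than JSON values) is an assumption made to keep the example dependency-free.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DynamoDB attribute value such as {"SS": ["a", "b"]}.
using attr_value = std::map<std::string, std::vector<std::string>>;

// Same contract as unwrap_set: returns {type, pointer to elements} for a
// single-member "SS", "NS" or "BS" object, otherwise {"", nullptr}.
// The pointer refers into v, so v must outlive it.
std::pair<std::string, const std::vector<std::string>*> unwrap_set_sketch(const attr_value& v) {
    if (v.size() != 1) {
        return {"", nullptr};
    }
    const auto& [type, elems] = *v.begin();
    if (type != "SS" && type != "NS" && type != "BS") {
        return {"", nullptr};
    }
    return {type, &elems};
}
```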

				}

									

alternator/serialization.hh
									
												
				@@ -0,0 +1,72 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <string_view>

				#include "types.hh"

				#include "schema_fwd.hh"

				#include "keys.hh"

				#include "rjson.hh"

				#include "utils/big_decimal.hh"

				namespace alternator {

				enum class alternator_type : int8_t {

				    S, B, BOOL, N, NOT_SUPPORTED_YET

				};

				struct type_info {

				    alternator_type atype;

				    data_type dtype;

				};

				struct type_representation {

				    std::string ident;

				    data_type dtype;

				};

				type_info type_info_from_string(std::string type);

				type_representation represent_type(alternator_type atype);

				bytes serialize_item(const rjson::value& item);

				rjson::value deserialize_item(bytes_view bv);

				std::string type_to_string(data_type type);

				bytes get_key_column_value(const rjson::value& item, const column_definition& column);

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column);

				rjson::value json_key_column_value(bytes_view cell, const column_definition& column);

				partition_key pk_from_json(const rjson::value& item, schema_ptr schema);

				clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);

				// If v encodes a number (i.e., it is of the form {"N": "123"}), returns an object representing it. Otherwise,

				// raises ValidationException with diagnostic.

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);

				// Checks whether a given JSON object encodes a set (i.e., it is of the form {"SS": [...]},

				// {"NS": [...]} or {"BS": [...]}) and, if so, returns the set's type and a pointer to its

				// JSON array. If the object does not encode a set, the returned value is {"", nullptr}.

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);

				}

									

alternator/server.cc
									
												
				@@ -0,0 +1,483 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "alternator/server.hh"

				#include "log.hh"

				#include <seastar/http/function_handlers.hh>

				#include <seastar/json/json_elements.hh>

				#include <seastarx.hh>

				#include "error.hh"

				#include "rjson.hh"

				#include "auth.hh"

				#include <cctype>

				#include "cql3/query_processor.hh"

				#include "service/storage_service.hh"

				#include "utils/overloaded_functor.hh"

				static logging::logger slogger("alternator-server");

				using namespace httpd;

				namespace alternator {

				static constexpr auto TARGET = "X-Amz-Target";

				inline std::vector<std::string_view> split(std::string_view text, char separator) {

				    std::vector<std::string_view> tokens;

				    if (text == "") {

				        return tokens;

				    }

				    while (true) {

				        auto pos = text.find_first_of(separator);

				        if (pos != std::string_view::npos) {

				            tokens.emplace_back(text.data(), pos);

				            text.remove_prefix(pos + 1);

				        } else {

				            tokens.emplace_back(text);

				            break;

				        }

				    }

				    return tokens;

				}
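split returns views into its input, so the tokens are valid only while the original string is alive, and it preserves empty fields, including a trailing one: splitting "Credential=" on '=' yields {"Credential", ""}. A standalone copy makes these edge cases easy to check; it is renamed split_sketch and uses find instead of find_first_of, which is equivalent for a single separator character.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Standalone copy of split(): tokens are views into `text`, and empty
// fields (leading, repeated or trailing separators) are kept.
std::vector<std::string_view> split_sketch(std::string_view text, char separator) {
    std::vector<std::string_view> tokens;
    if (text.empty()) {
        return tokens;
    }
    while (true) {
        auto pos = text.find(separator);
        if (pos != std::string_view::npos) {
            tokens.emplace_back(text.data(), pos);
            text.remove_prefix(pos + 1);
        } else {
            tokens.emplace_back(text);
            break;
        }
    }
    return tokens;
}
```

Callers therefore cannot assume a token is non-empty before calling front() or back() on it.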

				// DynamoDB HTTP error responses are structured as described in

				// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

				// Our handlers throw an exception to report an error. If the exception

				// is of type alternator::api_error, it is unwrapped and properly reported to

				// the user directly. Other exceptions are unexpected, and reported as

				// Internal Server Error.

				class api_handler : public handler_base {

				public:

				    api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle) : _f_handle(

				         [this, _handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {

				         return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([this, rep = std::move(rep)](future<executor::request_return_type> resf) mutable {

				             if (resf.failed()) {

				                 // Exceptions of type api_error are wrapped as JSON and

				                 // returned to the client as expected. Other types of

				                 // exceptions are unexpected, and returned to the user

				                 // as an internal server error:

				                 api_error ret;

				                 try {

				                     resf.get();

				                 } catch (const api_error& ae) {

				                     ret = ae;

				                 } catch (const rjson::error& re) {

				                     ret = api_error("ValidationException", re.what());

				                 } catch (...) {

				                     ret = api_error(

				                             "Internal Server Error",

				                             format("Internal server error: {}", std::current_exception()),

				                             reply::status_type::internal_server_error);

				                 }

				                 generate_error_reply(*rep, ret);

				                 return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				             }

				             auto res = resf.get0();

				             std::visit(overloaded_functor {

				                 [&] (const json::json_return_type& json_return_value) {

				                     slogger.trace("api_handler success case");

				                     if (json_return_value._body_writer) {

				                         rep->write_body("json", std::move(json_return_value._body_writer));

				                     } else {

				                         rep->_content += json_return_value._res;

				                     }

				                 },

				                 [&] (const api_error& err) {

				                     generate_error_reply(*rep, err);

				                 }

				             }, res);

				             return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				         });

				    }), _type("json") { }

				    api_handler(const api_handler&) = default;

				    future<std::unique_ptr<reply>> handle(const sstring& path,

				            std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        return _f_handle(std::move(req), std::move(rep)).then(

				                [this](std::unique_ptr<reply> rep) {

				                    rep->done(_type);

				                    return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				                });

				    }

				protected:

				    void generate_error_reply(reply& rep, const api_error& err) {

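				        // Build the body with rjson so that quotes or backslashes in the

				        // message are escaped and the reply remains valid JSON.

				        rjson::value body = rjson::empty_object();

				        rjson::set(body, "__type", rjson::from_string("com.amazonaws.dynamodb.v20120810#" + err._type));

				        rjson::set(body, "message", rjson::from_string(err._msg));

				        rep._content += rjson::print(body);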

				        rep._status = err._http_code;

				        slogger.trace("api_handler error case: {}", rep._content);

				    }

				    future_handler_function _f_handle;

				    sstring _type;

				};
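The error body follows the format in the DynamoDB documentation cited above: a JSON object with a __type field, prefixed by com.amazonaws.dynamodb.v20120810#, and a message field. Since message may contain user-supplied text, it has to be JSON-escaped; otherwise a single double-quote character produces an unparsable reply. A dependency-free sketch, where json_escape and make_error_body are hypothetical helpers and json_escape handles only the characters JSON requires to be escaped.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Minimal JSON string escaping: quotes, backslashes and control characters.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20) {
                char buf[7]; // "\u00XX" plus terminating NUL
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Builds a DynamoDB-style error body: {"__type":"<prefix>#<type>","message":"<msg>"}.
std::string make_error_body(const std::string& type, const std::string& msg) {
    return "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + json_escape(type) +
           "\",\"message\":\"" + json_escape(msg) + "\"}";
}
```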

				class gated_handler : public handler_base {

				    seastar::gate& _gate;

				public:

				    gated_handler(seastar::gate& gate) : _gate(gate) {}

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) = 0;

				    virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) final override {

				        return with_gate(_gate, [this, &path, req = std::move(req), rep = std::move(rep)] () mutable {

				            return do_handle(path, std::move(req), std::move(rep));

				        });

				    }

				};
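gated_handler wraps each request in with_gate so that shutdown can close _pending_requests and wait for in-flight requests to drain. seastar::gate is per-shard and future-based; the sketch below is a thread-based analogue written only to illustrate the enter/leave/close protocol. simple_gate is a hypothetical class, and it reports a closed gate by returning false where Seastar signals gate_closed_exception.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Thread-based analogue of a gate: requests enter() before running and
// leave() afterwards; close() rejects new entries and blocks until every
// in-flight request has left.
class simple_gate {
    std::mutex _m;
    std::condition_variable _cv;
    std::size_t _count = 0;
    bool _closed = false;
public:
    bool enter() {
        std::lock_guard<std::mutex> lk(_m);
        if (_closed) {
            return false;
        }
        ++_count;
        return true;
    }
    void leave() {
        std::lock_guard<std::mutex> lk(_m);
        if (--_count == 0) {
            _cv.notify_all();
        }
    }
    void close() {
        std::unique_lock<std::mutex> lk(_m);
        _closed = true;
        _cv.wait(lk, [this] { return _count == 0; });
    }
    std::size_t in_flight() {
        std::lock_guard<std::mutex> lk(_m);
        return _count;
    }
};
```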

				class health_handler : public gated_handler {

				public:

				    health_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}

				protected:

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        rep->set_status(reply::status_type::ok);

				        rep->write_body("txt", format("healthy: {}", req->get_header("Host")));

				        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				    }

				};

				class local_nodelist_handler : public gated_handler {

				public:

				    local_nodelist_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}

				protected:

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        rjson::value results = rjson::empty_array();

				        // It's very easy to get a list of all live nodes on the cluster,

				        // using gms::get_local_gossiper().get_live_members(). But getting

				        // just the list of live nodes in this DC needs more elaborate code:

				        sstring local_dc = locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(

				                utils::fb_utilities::get_broadcast_address());

				        std::unordered_set<gms::inet_address> local_dc_nodes =

				                service::get_local_storage_service().get_token_metadata().

				                get_topology().get_datacenter_endpoints().at(local_dc);

				        for (auto& ip : local_dc_nodes) {

				            if (gms::get_local_gossiper().is_alive(ip)) {

				                rjson::push_back(results, rjson::from_string(ip.to_sstring()));

				            }

				        }

				        rep->set_status(reply::status_type::ok);

				        rep->set_content_type("json");

				        rep->_content = rjson::print(results);

				        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				    }

				};

				future<> server::verify_signature(const request& req) {

				    if (!_enforce_authorization) {

				        slogger.debug("Skipping authorization");

				        return make_ready_future<>();

				    }

				    auto host_it = req._headers.find("Host");

				    if (host_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");

				    }

				    auto authorization_it = req._headers.find("Authorization");

				    if (authorization_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");

				    }

				    std::string host = host_it->second;

				    std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');

				    std::string credential;

				    std::string user_signature;

				    std::string signed_headers_str;

				    std::vector<std::string_view> signed_headers;

				    for (std::string_view entry : credentials_raw) {

				        std::vector<std::string_view> entry_split = split(entry, '=');

				        if (entry_split.size() != 2) {

				            if (entry != "AWS4-HMAC-SHA256") {

				                throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));

				            }

				            continue;

				        }

				        std::string_view auth_value = entry_split[1];

				        // Commas appear as an additional (quite redundant) delimiter

				        if (!auth_value.empty() && auth_value.back() == ',') {

				            auth_value.remove_suffix(1);

				        }

				        if (entry_split[0] == "Credential") {

				            credential = std::string(auth_value);

				        } else if (entry_split[0] == "Signature") {

				            user_signature = std::string(auth_value);

				        } else if (entry_split[0] == "SignedHeaders") {

				            signed_headers_str = std::string(auth_value);

				            signed_headers = split(auth_value, ';');

				            std::sort(signed_headers.begin(), signed_headers.end());

				        }

				    }

				    std::vector<std::string_view> credential_split = split(credential, '/');

				    if (credential_split.size() != 5) {

				        throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));

				    }

				    std::string user(credential_split[0]);

				    std::string datestamp(credential_split[1]);

				    std::string region(credential_split[2]);

				    std::string service(credential_split[3]);

				    std::map<std::string_view, std::string_view> signed_headers_map;

				    for (const auto& header : signed_headers) {

				        signed_headers_map.emplace(header, std::string_view());

				    }

				    for (auto& header : req._headers) {

				        std::string header_str;

				        header_str.resize(header.first.size());

				        std::transform(header.first.begin(), header.first.end(), header_str.begin(), [] (unsigned char c) { return std::tolower(c); });

				        auto it = signed_headers_map.find(header_str);

				        if (it != signed_headers_map.end()) {

				            it->second = std::string_view(header.second);

				        }

				    }

				    auto cache_getter = [] (std::string username) {

				        return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));

				    };

				    return _key_cache.get_ptr(user, cache_getter).then([this, &req,

				                                                    user = std::move(user),

				                                                    host = std::move(host),

				                                                    datestamp = std::move(datestamp),

				                                                    signed_headers_str = std::move(signed_headers_str),

				                                                    signed_headers_map = std::move(signed_headers_map),

				                                                    region = std::move(region),

				                                                    service = std::move(service),

				                                                    user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {

				        std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,

				                datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");

				        if (signature != std::string_view(user_signature)) {

				            _key_cache.remove(user);

				            throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");

				        }

				    });

				}
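verify_signature expects an AWS Signature Version 4 Authorization header: the algorithm name AWS4-HMAC-SHA256, followed by comma-separated Credential, SignedHeaders and Signature entries, where Credential is a five-part scope of access key, date, region, service and the literal aws4_request. A standalone sketch of just the parsing step; sigv4_auth, parse_authorization and split_credential are hypothetical names, nothing here computes or checks the HMAC, and unlike the code above it splits each entry at its first '='.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct sigv4_auth {
    std::string credential;     // e.g. "AKID/20200101/us-east-1/dynamodb/aws4_request"
    std::string signed_headers; // e.g. "host;x-amz-date"
    std::string signature;      // hex-encoded HMAC-SHA256
};

// Parses "AWS4-HMAC-SHA256 Credential=..., SignedHeaders=..., Signature=...".
// Returns false if a bare (non key=value) entry is not AWS4-HMAC-SHA256.
bool parse_authorization(std::string_view header, sigv4_auth& out) {
    while (!header.empty()) {
        auto sp = header.find(' ');
        std::string_view entry = header.substr(0, sp);
        header.remove_prefix(sp == std::string_view::npos ? header.size() : sp + 1);
        if (entry.empty()) {
            continue;
        }
        auto eq = entry.find('=');
        if (eq == std::string_view::npos) {
            if (entry != "AWS4-HMAC-SHA256") {
                return false;
            }
            continue;
        }
        std::string_view key = entry.substr(0, eq);
        std::string_view value = entry.substr(eq + 1);
        // Entries are separated by ", " so strip a trailing comma.
        if (!value.empty() && value.back() == ',') {
            value.remove_suffix(1);
        }
        if (key == "Credential") {
            out.credential = std::string(value);
        } else if (key == "SignedHeaders") {
            out.signed_headers = std::string(value);
        } else if (key == "Signature") {
            out.signature = std::string(value);
        }
    }
    return true;
}

// Splits the credential scope into its '/'-separated parts.
std::vector<std::string> split_credential(const std::string& credential) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        auto pos = credential.find('/', start);
        parts.push_back(credential.substr(start, pos - start));
        if (pos == std::string::npos) {
            break;
        }
        start = pos + 1;
    }
    return parts;
}
```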

				future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {

				    _executor._stats.total_operations++;

				    sstring target = req->get_header(TARGET);

				    std::vector<std::string_view> split_target = split(target, '.');

				    //NOTICE(sarna): Target consists of the DynamoDB API version, a dot '.', and the operation type (e.g. DynamoDB_20120810.CreateTable)

				    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());

				    slogger.trace("Request: {} {}", op, req->content);

				    return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {

				        auto callback_it = _callbacks.find(op);

				        if (callback_it == _callbacks.end()) {

				            _executor._stats.unsupported_operations++;

				            throw api_error("UnknownOperationException",

				                    format("Unsupported operation {}", op));

				        }

				        return with_gate(_pending_requests, [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] () mutable {

				            //FIXME: Client state can provide more context, e.g. client's endpoint address

				            // We use unique_ptr because client_state cannot be moved or copied

				            return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()),

				                    [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {

				                tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);

				                tracing::trace(trace_state, op);

				                // JSON parsing can allocate up to roughly 2x the size of the raw document, plus a small fixed overhead for bookkeeping.

				                // FIXME: by this time, the whole HTTP request was already read, so some memory is already occupied.

				                // Once HTTP allows working on streams, we should grab the permit *before* reading the HTTP payload.

				                size_t mem_estimate = req->content.size() * 3 + 8000;

				                auto units_fut = get_units(*_memory_limiter, mem_estimate);

				                if (_memory_limiter->waiters()) {

				                    ++_executor._stats.requests_blocked_memory;

				                }

				                return units_fut.then([this, callback_it = std::move(callback_it), &client_state, trace_state, req = std::move(req)] (semaphore_units<> units) mutable {

				                    return _json_parser.parse(req->content).then([this, callback_it = std::move(callback_it), &client_state, trace_state,

				                            units = std::move(units), req = std::move(req)] (rjson::value json_request) mutable {

				                        return callback_it->second(_executor, *client_state, trace_state, make_service_permit(std::move(units)), std::move(json_request), std::move(req)).finally([trace_state] {});

				                    });

				                });

				            });

				        });

				    });

				}
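Two small pieces of handle_api_request are easy to isolate: the operation name is whatever follows the last '.' in X-Amz-Target (DynamoDB clients send values such as DynamoDB_20120810.PutItem), and the memory reservation is three times the request body size plus 8000 bytes. A sketch of both as free functions; operation_from_target and memory_estimate are hypothetical names, and the estimate merely restates the heuristic in the code, not a measured bound.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the text after the last '.' of an X-Amz-Target value, or the
// whole value if it has no dot (same result as split(...).back()).
std::string operation_from_target(std::string_view target) {
    auto dot = target.rfind('.');
    return std::string(dot == std::string_view::npos ? target : target.substr(dot + 1));
}

// Same heuristic as handle_api_request: reserve ~3x the request body size
// plus a fixed 8000 bytes before parsing the JSON payload.
std::size_t memory_estimate(std::size_t content_size) {
    return content_size * 3 + 8000;
}
```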

				void server::set_routes(routes& r) {

				    api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {

				        return handle_api_request(std::move(req));

				    });

				    r.put(operation_type::POST, "/", req_handler);

				    r.put(operation_type::GET, "/", new health_handler(_pending_requests));

				    // The "/localnodes" request is a new Alternator feature, not supported by

				    // DynamoDB and not required for DynamoDB compatibility. It allows a

				    // client to enquire - using a trivial HTTP request without requiring

				    // authentication - the list of all live nodes in the same data center of

				    // the Alternator cluster. The client can use this list to balance its

				    // request load to all the nodes in the same geographical region.

				    // Note that this API exposes - openly without authentication - the

				    // information on the cluster's members inside one data center. We do not

				    // consider this to be a security risk, because an attacker can already

				    // scan an entire subnet for nodes responding to the health request,

				    // or even just scan for open ports.

				    r.put(operation_type::GET, "/localnodes", new local_nodelist_handler(_pending_requests));

				}
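local_nodelist_handler replies with a JSON array of IP-address strings. As the comment above suggests, a client might spread its requests across those nodes and re-fetch the list periodically. The sketch covers only the rotation, assuming the array has already been parsed into strings; round_robin_nodes is a hypothetical class, not part of Scylla or any AWS SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical client-side helper: cycles through the node addresses
// returned by GET /localnodes so requests are spread across the local DC.
class round_robin_nodes {
    std::vector<std::string> _nodes;
    std::size_t _next = 0;
public:
    explicit round_robin_nodes(std::vector<std::string> nodes) : _nodes(std::move(nodes)) {
        if (_nodes.empty()) {
            throw std::invalid_argument("no live nodes");
        }
    }
    // Returns the next node and advances the cursor, wrapping around.
    const std::string& pick() {
        const std::string& node = _nodes[_next];
        _next = (_next + 1) % _nodes.size();
        return node;
    }
};
```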

				//FIXME: A way to immediately invalidate the cache should be considered,

				// e.g. when the system table which stores the keys is changed.

				// For now, this propagation may take up to 1 minute.

				server::server(executor& exec)

				        : _http_server("http-alternator")

				        , _https_server("https-alternator")

				        , _executor(exec)

				        , _key_cache(1024, 1min, slogger)

				        , _enforce_authorization(false)

				        , _enabled_servers{}

				        , _pending_requests{}

				        , _callbacks{

				        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.list_tables(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"));

				        }},

				        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.tag_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.untag_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				    } {

				}
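The constructor above fills _callbacks, a std::unordered_map from DynamoDB operation name to a lambda. Each lambda adapts one uniform callback signature to the matching executor method and drops arguments it doesn't need (for example, tag_resource ignores trace_state). The following is a minimal standard-C++ sketch of that dispatch-table idea. toy_executor, toy_callback and dispatch() are hypothetical names. The real handlers return futures and take client-state, tracing and permit arguments, all omitted here. In the DynamoDB wire protocol the operation name travels in the X-Amz-Target header; how handle_api_request extracts it is not part of this excerpt.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical executor; the real alternator::executor methods also take
// client state, tracing state and a service_permit, and return futures.
struct toy_executor {
    int scans = 0;
    int queries = 0;
    std::string scan(const std::string& body) { ++scans; return "scan:" + body; }
    std::string query(const std::string& body) { ++queries; return "query:" + body; }
};

// Every entry shares one signature, like server::alternator_callback.
using toy_callback = std::function<std::string(toy_executor&, const std::string&)>;
using toy_callbacks_map = std::unordered_map<std::string_view, toy_callback>;

const toy_callbacks_map& toy_callbacks() {
    // Keys are string literals, so the string_views never dangle.
    static const toy_callbacks_map map{
        {"Scan", [] (toy_executor& e, const std::string& body) { return e.scan(body); }},
        {"Query", [] (toy_executor& e, const std::string& body) { return e.query(body); }},
    };
    return map;
}

// Look up the handler for an operation name; std::nullopt marks an
// operation that has no entry (i.e. is unsupported).
std::optional<std::string> dispatch(toy_executor& e, std::string_view op, const std::string& body) {
    const auto& map = toy_callbacks();
    auto it = map.find(op);
    if (it == map.end()) {
        return std::nullopt;
    }
    return it->second(e, body);
}
```

Keying the map by std::string_view is only safe because every key is a string literal with static storage duration. Lookups may use a short-lived view taken from the request, but a key built from request data would need an owning type.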

				future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				        bool enforce_authorization, semaphore* memory_limiter) {

				    _memory_limiter = memory_limiter;

				    _enforce_authorization = enforce_authorization;

				    if (!port && !https_port) {

				        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"

				                " must be specified in order to init an alternator HTTP server instance"));

				    }

				    return seastar::async([this, addr, port, https_port, creds] {

				        try {

				            _executor.start().get();

				            if (port) {

				                set_routes(_http_server._routes);

				                _http_server.set_content_length_limit(server::content_length_limit);

				                _http_server.listen(socket_address{addr, *port}).get();

				                _enabled_servers.push_back(std::ref(_http_server));

				                slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);

				            }

				            if (https_port) {

				                set_routes(_https_server._routes);

				                _https_server.set_content_length_limit(server::content_length_limit);

				                _https_server.set_tls_credentials(creds->build_server_credentials());

				                _https_server.listen(socket_address{addr, *https_port}).get();

				                _enabled_servers.push_back(std::ref(_https_server));

				                slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);

				            }

				        } catch (...) {

				            slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",

				                    addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());

				            std::throw_with_nested(std::runtime_error(

				                    format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",

				                            addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));

				        }

				    });

				}

				future<> server::stop() {

				    return parallel_for_each(_enabled_servers, [] (http_server& server) {

				        return server.stop();

				    }).then([this] {

				        return _pending_requests.close();

				    }).then([this] {

				        return _json_parser.stop();

				    });

				}
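init() refuses to start when neither port nor https_port is set. It starts only the listeners that were requested and records each one in _enabled_servers, which stop() later iterates over. The synchronous sketch below models only that bookkeeping, and toy_listener and toy_server are hypothetical. It deliberately leaves out the concurrent shutdown done with parallel_for_each. It also leaves out what the real stop() does afterwards: closing the _pending_requests gate so in-flight requests finish, then stopping the JSON parser.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for seastar::httpd::http_server; it only tracks state.
struct toy_listener {
    std::string name;
    bool listening = false;
    void listen(uint16_t) { listening = true; }
    void stop() { listening = false; }
};

// Same "port or OFF" formatting that the error messages in init() use.
std::string port_or_off(std::optional<uint16_t> p) {
    return p ? std::to_string(*p) : "OFF";
}

struct toy_server {
    toy_listener http{"http"};
    toy_listener https{"https"};
    // Only listeners that were started are recorded, so stop() never
    // touches one that init() skipped.
    std::vector<std::reference_wrapper<toy_listener>> enabled;

    void init(std::optional<uint16_t> port, std::optional<uint16_t> https_port) {
        if (!port && !https_port) {
            throw std::runtime_error("Either regular port or TLS port must be specified");
        }
        if (port) {
            http.listen(*port);
            enabled.push_back(std::ref(http));
        }
        if (https_port) {
            https.listen(*https_port);
            enabled.push_back(std::ref(https));
        }
    }

    void stop() {
        for (toy_listener& l : enabled) {
            l.stop();
        }
    }
};
```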

				server::json_parser::json_parser() : _run_parse_json_thread(async([this] {

				        while (true) {

				            _document_waiting.wait().get();

				            if (_as.abort_requested()) {

				                return;

				            }

				            try {

				                _parsed_document = rjson::parse_yieldable(_raw_document);

				                _current_exception = nullptr;

				            } catch (...) {

				                _current_exception = std::current_exception();

				            }

				            _document_parsed.signal();

				        }

				    })) {

				}

				future<rjson::value> server::json_parser::parse(std::string_view content) {

				    if (content.size() < yieldable_parsing_threshold) {

				        return make_ready_future<rjson::value>(rjson::parse(content));

				    }

				    return with_semaphore(_parsing_sem, 1, [this, content] {

				        _raw_document = content;

				        _document_waiting.signal();

				        return _document_parsed.wait().then([this] {

				            if (_current_exception) {

				                return make_exception_future<rjson::value>(_current_exception);

				            }

				            return make_ready_future<rjson::value>(std::move(_parsed_document));

				        });

				    });

				}

				future<> server::json_parser::stop() {

				    _as.request_abort();

				    _document_waiting.signal();

				    _document_parsed.broken();

				    return std::move(_run_parse_json_thread);

				}

				}
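server::json_parser keeps a single seastar::async thread alive and hands it documents through two condition variables. A semaphore of one lets only one document be in flight at a time. Parse errors travel back as an exception_ptr, and stop() breaks the "parsed" condition so no caller waits forever. Documents shorter than yieldable_parsing_threshold skip the handoff and are parsed inline.

The sketch below is a rough analogue built on std::thread, std::mutex and std::condition_variable. toy_parse, the 4-byte threshold and the class name are hypothetical. One real difference: a seastar thread runs on the same reactor and lets rjson::parse_yieldable yield, whereas this worker is an OS thread that simply runs concurrently.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for rjson::parse: decimal string -> long,
// throwing std::invalid_argument on malformed input.
long toy_parse(const std::string& s) { return std::stol(s); }

class toy_json_parser {
    static constexpr std::size_t threshold = 4;  // hypothetical; plays yieldable_parsing_threshold
    std::mutex _one_at_a_time;                   // plays the role of _parsing_sem{1}
    std::mutex _m;
    std::condition_variable _cv;                 // shared by the "waiting" and "parsed" signals
    std::string _raw;
    long _parsed = 0;
    std::exception_ptr _error;
    bool _has_work = false;
    bool _done = false;
    bool _stopping = false;
    std::thread _worker;                         // declared last: starts after the state above exists

    void run() {
        std::unique_lock<std::mutex> lk(_m);
        while (true) {
            _cv.wait(lk, [this] { return _has_work || _stopping; });
            if (_stopping) {
                return;
            }
            try {
                _parsed = toy_parse(_raw);
                _error = nullptr;
            } catch (...) {
                _error = std::current_exception();  // handed back to the caller
            }
            _has_work = false;
            _done = true;
            _cv.notify_all();
        }
    }

public:
    toy_json_parser() : _worker([this] { run(); }) {}
    ~toy_json_parser() { stop(); }

    long parse(const std::string& content) {
        if (content.size() < threshold) {
            return toy_parse(content);  // small input: parse inline on the caller's thread
        }
        std::lock_guard<std::mutex> one(_one_at_a_time);  // one document in flight at a time
        std::unique_lock<std::mutex> lk(_m);
        _raw = content;
        _done = false;
        _has_work = true;
        _cv.notify_all();
        _cv.wait(lk, [this] { return _done || _stopping; });
        if (!_done) {
            throw std::runtime_error("parser stopped");  // analogous to _document_parsed.broken()
        }
        if (_error) {
            std::rethrow_exception(_error);
        }
        return _parsed;
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lk(_m);
            _stopping = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) {
            _worker.join();
        }
    }
};
```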

									
alternator/server.hh
									
				@@ -0,0 +1,83 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "alternator/executor.hh"

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include <seastar/net/tls.hh>

				#include <optional>

				#include <alternator/auth.hh>

				#include <utils/small_vector.hh>

				#include <seastar/core/units.hh>

				namespace alternator {

				class server {

				    static constexpr size_t content_length_limit = 16*MB;

				    using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,

				            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<request>)>;

				    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;

				    http_server _http_server;

				    http_server _https_server;

				    executor& _executor;

				    key_cache _key_cache;

				    bool _enforce_authorization;

				    utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;

				    gate _pending_requests;

				    alternator_callbacks_map _callbacks;

				    semaphore* _memory_limiter;

				    class json_parser {

				        static constexpr size_t yieldable_parsing_threshold = 16*KB;

				        std::string_view _raw_document;

				        rjson::value _parsed_document;

				        std::exception_ptr _current_exception;

				        semaphore _parsing_sem{1};

				        condition_variable _document_waiting;

				        condition_variable _document_parsed;

				        abort_source _as;

				        future<> _run_parse_json_thread;

				    public:

				        json_parser();

				        future<rjson::value> parse(std::string_view content);

				        future<> stop();

				    };

				    json_parser _json_parser;

				public:

				    server(executor& executor);

				    future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				            bool enforce_authorization, semaphore* memory_limiter);

				    future<> stop();

				private:

				    void set_routes(seastar::httpd::routes& r);

				    future<> verify_signature(const seastar::httpd::request& r);

				    future<executor::request_return_type> handle_api_request(std::unique_ptr<request>&& req);

				};

				}

									
alternator/stats.cc
									
				@@ -0,0 +1,104 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "stats.hh"

				#include <seastar/core/metrics.hh>

				namespace alternator {

				const char* ALTERNATOR_METRICS = "alternator";

				stats::stats() : api_operations{} {

				    // Register the per-operation counters and latency histograms.

				    seastar::metrics::label op("op");

				    _metrics.add_group("alternator", {

				#define OPERATION(name, CamelCaseName) \

				                seastar::metrics::make_total_operations("operation", api_operations.name, \

				                        seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),

				#define OPERATION_LATENCY(name, CamelCaseName) \

				                seastar::metrics::make_histogram("op_latency", \

				                        seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),

				            OPERATION(batch_write_item, "BatchWriteItem")

				            OPERATION(create_backup, "CreateBackup")

				            OPERATION(create_global_table, "CreateGlobalTable")

				            OPERATION(create_table, "CreateTable")

				            OPERATION(delete_backup, "DeleteBackup")

				            OPERATION(delete_item, "DeleteItem")

				            OPERATION(delete_table, "DeleteTable")

				            OPERATION(describe_backup, "DescribeBackup")

				            OPERATION(describe_continuous_backups, "DescribeContinuousBackups")

				            OPERATION(describe_endpoints, "DescribeEndpoints")

				            OPERATION(describe_global_table, "DescribeGlobalTable")

				            OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")

				            OPERATION(describe_limits, "DescribeLimits")

				            OPERATION(describe_table, "DescribeTable")

				            OPERATION(describe_time_to_live, "DescribeTimeToLive")

				            OPERATION(get_item, "GetItem")

				            OPERATION(list_backups, "ListBackups")

				            OPERATION(list_global_tables, "ListGlobalTables")

				            OPERATION(list_tables, "ListTables")

				            OPERATION(list_tags_of_resource, "ListTagsOfResource")

				            OPERATION(put_item, "PutItem")

				            OPERATION(query, "Query")

				            OPERATION(restore_table_from_backup, "RestoreTableFromBackup")

				            OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")

				            OPERATION(scan, "Scan")

				            OPERATION(tag_resource, "TagResource")

				            OPERATION(transact_get_items, "TransactGetItems")

				            OPERATION(transact_write_items, "TransactWriteItems")

				            OPERATION(untag_resource, "UntagResource")

				            OPERATION(update_continuous_backups, "UpdateContinuousBackups")

				            OPERATION(update_global_table, "UpdateGlobalTable")

				            OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")

				            OPERATION(update_item, "UpdateItem")

				            OPERATION(update_table, "UpdateTable")

				            OPERATION(update_time_to_live, "UpdateTimeToLive")

				            OPERATION_LATENCY(put_item_latency, "PutItem")

				            OPERATION_LATENCY(get_item_latency, "GetItem")

				            OPERATION_LATENCY(delete_item_latency, "DeleteItem")

				            OPERATION_LATENCY(update_item_latency, "UpdateItem")

				    });

				    _metrics.add_group("alternator", {

				            seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,

				                    seastar::metrics::description("number of unsupported operations via Alternator API")),

				            seastar::metrics::make_total_operations("total_operations", total_operations,

				                    seastar::metrics::description("number of total operations via Alternator API")),

				            seastar::metrics::make_total_operations("reads_before_write", reads_before_write,

				                    seastar::metrics::description("number of performed read-before-write operations")),

				            seastar::metrics::make_total_operations("write_using_lwt", write_using_lwt,

				                    seastar::metrics::description("number of writes that used LWT")),

				            seastar::metrics::make_total_operations("shard_bounce_for_lwt", shard_bounce_for_lwt,

				                    seastar::metrics::description("number of writes that had to be bounced from this shard because of LWT requirements")),

				            seastar::metrics::make_total_operations("requests_blocked_memory", requests_blocked_memory,

				                    seastar::metrics::description("Counts the number of requests blocked due to memory pressure.")),

				            seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,

				                    seastar::metrics::description("number of rows read during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,

				                    seastar::metrics::description("number of rows read and matched during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },

				                    seastar::metrics::description("number of rows read and dropped during filtering operations")),

				    });

				}

				}
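stats.cc registers one metric per API operation with an X-macro. Each OPERATION(name, CamelCaseName) pairs a field of api_operations with the value of the "op" label, and the registered metric reads the live counter rather than a copy. Below is a minimal sketch of the same pattern in plain C++. It assumes a toy registry (a std::map from label to getter) in place of seastar::metrics, and the struct and registry names are hypothetical. The last entry mimics filtered_rows_dropped_total, which is computed on demand as rows read minus rows matched.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical counters, shaped like stats::api_operations plus the two
// cql_stats filtering counters.
struct toy_api_operations {
    uint64_t get_item = 0;
    uint64_t put_item = 0;
    uint64_t filtered_rows_read = 0;
    uint64_t filtered_rows_matched = 0;
};

// Stand-in for a metrics registry: "op" label value -> getter that reads
// the live counter each time it is scraped.
using toy_registry = std::map<std::string, std::function<uint64_t()>>;

// `ops` must outlive the returned registry, since the getters hold a reference.
toy_registry register_metrics(const toy_api_operations& ops) {
    toy_registry r;
// One line per operation keeps the field name and its CamelCase label together.
#define OPERATION(name, CamelCaseName) \
    r[CamelCaseName] = [&ops] { return ops.name; };
    OPERATION(get_item, "GetItem")
    OPERATION(put_item, "PutItem")
#undef OPERATION
    // Derived metric computed at read time, like filtered_rows_dropped_total.
    r["filtered_rows_dropped"] = [&ops] { return ops.filtered_rows_read - ops.filtered_rows_matched; };
    return r;
}
```

One cost of this layout is that the field list (in stats.hh) and the registration list (in stats.cc) are maintained separately and can drift. In this diff, stats.hh declares api_operations.batch_get_item, but the OPERATION list in stats.cc has no entry for it.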

									
alternator/stats.hh
									
				@@ -0,0 +1,98 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <cstdint>

				#include <seastar/core/metrics_registration.hh>

				#include "seastarx.hh"

				#include "utils/estimated_histogram.hh"

				#include "cql3/stats.hh"

				namespace alternator {

				// Object holding per-shard statistics related to Alternator.

				// While this object is alive, these metrics are also registered to be

				// visible by the metrics REST API, with the "alternator" prefix.

				class stats {

				public:

				    stats();

				    // Count of DynamoDB API operations by types

				    struct {

				        uint64_t batch_get_item = 0;

				        uint64_t batch_write_item = 0;

				        uint64_t create_backup = 0;

				        uint64_t create_global_table = 0;

				        uint64_t create_table = 0;

				        uint64_t delete_backup = 0;

				        uint64_t delete_item = 0;

				        uint64_t delete_table = 0;

				        uint64_t describe_backup = 0;

				        uint64_t describe_continuous_backups = 0;

				        uint64_t describe_endpoints = 0;

				        uint64_t describe_global_table = 0;

				        uint64_t describe_global_table_settings = 0;

				        uint64_t describe_limits = 0;

				        uint64_t describe_table = 0;

				        uint64_t describe_time_to_live = 0;

				        uint64_t get_item = 0;

				        uint64_t list_backups = 0;

				        uint64_t list_global_tables = 0;

				        uint64_t list_tables = 0;

				        uint64_t list_tags_of_resource = 0;

				        uint64_t put_item = 0;

				        uint64_t query = 0;

				        uint64_t restore_table_from_backup = 0;

				        uint64_t restore_table_to_point_in_time = 0;

				        uint64_t scan = 0;

				        uint64_t tag_resource = 0;

				        uint64_t transact_get_items = 0;

				        uint64_t transact_write_items = 0;

				        uint64_t untag_resource = 0;

				        uint64_t update_continuous_backups = 0;

				        uint64_t update_global_table = 0;

				        uint64_t update_global_table_settings = 0;

				        uint64_t update_item = 0;

				        uint64_t update_table = 0;

				        uint64_t update_time_to_live = 0;

				        utils::estimated_histogram put_item_latency;

				        utils::estimated_histogram get_item_latency;

				        utils::estimated_histogram delete_item_latency;

				        utils::estimated_histogram update_item_latency;

				    } api_operations;

				    // Miscellaneous event counters

				    uint64_t total_operations = 0;

				    uint64_t unsupported_operations = 0;

				    uint64_t reads_before_write = 0;

				    uint64_t write_using_lwt = 0;

				    uint64_t shard_bounce_for_lwt = 0;

				    uint64_t requests_blocked_memory = 0;

				    // CQL-derived stats

				    cql3::cql_stats cql_stats;

				private:

				    // The metric_groups object holds this stat object's metrics registered

				    // as long as the stats object is alive.

				    seastar::metrics::metric_groups _metrics;

				};

				}

									
alternator/tags_extension.hh
									
				@@ -0,0 +1,53 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "serializer.hh"

				#include "schema.hh"

				#include "db/extensions.hh"

				namespace alternator {

				class tags_extension : public schema_extension {

				public:

				    static constexpr auto NAME = "scylla_tags";

				    tags_extension() = default;

				    explicit tags_extension(const std::map<sstring, sstring>& tags) : _tags(tags) {}

				    explicit tags_extension(bytes b) : _tags(tags_extension::deserialize(b)) {}

				    explicit tags_extension(const sstring& s) {

				        throw std::logic_error("Cannot create tags from string");

				    }

				    bytes serialize() const override {

				        return ser::serialize_to_buffer<bytes>(_tags);

				    }

				    static std::map<sstring, sstring> deserialize(bytes_view buffer) {

				        return ser::deserialize_from_buffer(buffer, boost::type<std::map<sstring, sstring>>());

				    }

				    const std::map<sstring, sstring>& tags() const {

				        return _tags;

				    }

				private:

				    std::map<sstring, sstring> _tags;

				};

				}
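tags_extension stores a table's tags as a std::map<sstring, sstring> and converts them to and from bytes with ser::serialize_to_buffer and ser::deserialize_from_buffer. Construction from an sstring is deliberately rejected with std::logic_error. The actual wire format produced by Scylla's ser:: layer is not shown in this diff. The round-trip sketch below therefore uses a hypothetical length-prefixed, little-endian encoding, with std::string standing in for both sstring and bytes. It illustrates the serialize/deserialize contract only, not Scylla's format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wire format (NOT Scylla's ser:: encoding): a 32-bit entry
// count, then per entry a 32-bit length + key bytes and a 32-bit length +
// value bytes, all little-endian.
namespace toy_tags {

void put_u32(std::string& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
}

uint32_t get_u32(const std::string& in, std::size_t& pos) {
    if (in.size() - pos < 4) {
        throw std::out_of_range("truncated buffer");
    }
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= uint32_t(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    }
    pos += 4;
    return v;
}

std::string get_str(const std::string& in, std::size_t& pos) {
    uint32_t n = get_u32(in, pos);
    if (in.size() - pos < n) {
        throw std::out_of_range("truncated buffer");
    }
    std::string s = in.substr(pos, n);
    pos += n;
    return s;
}

std::string serialize(const std::map<std::string, std::string>& tags) {
    std::string out;
    put_u32(out, static_cast<uint32_t>(tags.size()));
    for (const auto& [k, v] : tags) {
        put_u32(out, static_cast<uint32_t>(k.size()));
        out += k;
        put_u32(out, static_cast<uint32_t>(v.size()));
        out += v;
    }
    return out;
}

std::map<std::string, std::string> deserialize(const std::string& in) {
    std::size_t pos = 0;
    uint32_t count = get_u32(in, pos);
    std::map<std::string, std::string> tags;
    for (uint32_t i = 0; i < count; ++i) {
        std::string k = get_str(in, pos);
        tags.emplace(std::move(k), get_str(in, pos));
    }
    return tags;
}

}  // namespace toy_tags
```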

									
api/api-doc/cache_service.json
									
				@@ -13,7 +13,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_row_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -35,7 +35,7 @@

				                     "description":"row cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_key_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -70,7 +70,7 @@

				                     "description":"key cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -83,7 +83,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_counter_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -105,7 +105,7 @@

				                     "description":"counter cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -118,7 +118,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_row_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -140,7 +140,7 @@

				                     "description":"row cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -153,7 +153,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_key_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -175,7 +175,7 @@

				                     "description":"key cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -188,7 +188,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_counter_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -210,7 +210,7 @@

				                     "description":"counter cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -448,7 +448,7 @@

				        {

				          "method": "GET",

				          "summary": "Get key entries",

				-          "type": "int",

				+          "type": "long",

				          "nickname": "get_key_entries",

				          "produces": [

				            "application/json"

				@@ -568,7 +568,7 @@

				        {

				          "method": "GET",

				          "summary": "Get row entries",

				-          "type": "int",

				+          "type": "long",

				          "nickname": "get_row_entries",

				          "produces": [

				            "application/json"

				@@ -688,7 +688,7 @@

				        {

				          "method": "GET",

				          "summary": "Get counter entries",

				-          "type": "int",

				+          "type": "long",

				          "nickname": "get_counter_entries",

				          "produces": [

				            "application/json"

									
api/api-doc/column_family.json
									
				@@ -70,7 +70,7 @@

				            {

				               "method":"POST",

				               "summary":"Force a major compaction of this column family",

				-               "type":"string",

				+               "type":"void",

				               "nickname":"force_major_compaction",

				               "produces":[

				                  "application/json"

				@@ -121,7 +121,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -172,7 +172,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -223,7 +223,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -231,7 +231,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -544,7 +544,7 @@

				               "summary":"sstable count for each level. empty unless leveled compaction is used",

				               "type":"array",

				               "items":{

				-                  "type":"int"

				+                  "type": "long"

				               },

				               "nickname":"get_sstable_count_per_level",

				               "produces":[

				@@ -611,6 +611,54 @@

				            }

				         ]

				      },

				      {

				         "path":"/column_family/toppartitions/{name}",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Toppartitions query",

				               "type":"toppartitions_query_results",

				               "nickname":"toppartitions",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"name",

				                     "description":"The column family name in keyspace:name format",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  },

				                  {

				                     "name":"duration",

				                     "description":"Duration (in milliseconds) of monitoring operation",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				                    "name":"list_size",

				                    "description":"number of the top partitions to list",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type": "long",

				                    "paramType":"query"

				                 },

				                 {

				                    "name":"capacity",

				                    "description":"capacity of the stream summary: determines the amount of resources used in query processing",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type": "long",

				                    "paramType":"query"

				                 }

				              ]

				            }

				         ]

				      },

				      {

				         "path":"/column_family/metrics/memtable_columns_count/",

				         "operations":[

				@@ -873,7 +921,7 @@

				            {

				               "method":"GET",

				               "summary":"Get memtable switch count",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -897,7 +945,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all memtable switch count",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_all_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -1034,7 +1082,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1187,7 +1235,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all read latency",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_all_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1203,7 +1251,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1227,7 +1275,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1243,7 +1291,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1396,7 +1444,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1412,7 +1460,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1436,7 +1484,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1452,7 +1500,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1476,7 +1524,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1492,7 +1540,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1516,7 +1564,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1532,7 +1580,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1556,7 +1604,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1572,7 +1620,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1596,7 +1644,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_disk_space_used",

				               "produces":[

				                  "application/json"
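The hunks above widen many counters from `int` to `long`. The Python sketch below shows why that matters. It assumes the API code generator maps `int` to a signed 32-bit integer and `long` to a signed 64-bit one. Under that assumption, a byte count such as disk space used overflows `int` once it passes 2 GiB. The 3 GiB figure is a hypothetical value chosen for illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit "int" can hold


def fits_int32(value: int) -> bool:
    """Return True if value can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)  # '<i' = little-endian signed 32-bit, standard size
        return True
    except struct.error:
        return False


def fits_int64(value: int) -> bool:
    """Return True if value can be packed as a signed 64-bit integer."""
    try:
        struct.pack("<q", value)  # '<q' = little-endian signed 64-bit
        return True
    except struct.error:
        return False


# Hypothetical disk usage figure: 3 GiB expressed in bytes.
disk_space_used = 3 * 2**30

print(INT32_MAX)                    # 2147483647
print(fits_int32(disk_space_used))  # False: exceeds the 32-bit range
print(fits_int64(disk_space_used))  # True: well within 64 bits
```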

				@@ -2052,7 +2100,7 @@

				            {

				               "method":"GET",

				               "summary":"Get speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2076,7 +2124,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2156,7 +2204,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2180,7 +2228,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2196,7 +2244,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2220,7 +2268,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2236,7 +2284,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2260,7 +2308,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2276,7 +2324,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas prepare",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_prepare",

				               "produces":[

				                  "application/json"

				@@ -2300,7 +2348,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas propose",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_propose",

				               "produces":[

				                  "application/json"

				@@ -2324,7 +2372,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas commit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_commit",

				               "produces":[

				                  "application/json"

				@@ -2816,6 +2864,44 @@

				               "description":"The column family type"

				            }

				         }

				      },

				      "toppartitions_record":{

				         "id":"toppartitions_record",

				         "description":"nodetool toppartitions query record",

				         "properties":{

				            "partition":{

				               "type":"string",

				               "description":"Partition key"

				            },

				            "count":{

				               "type":"long",

				               "description":"Number of read/write operations"

				            },

				            "error":{

				               "type":"long",

				               "description":"Indication of inaccuracy in counting PKs"

				            }

				         }

				      },

				      "toppartitions_query_results":{

				         "id":"toppartitions_query_results",

				         "description":"nodetool toppartitions query results",

				         "properties":{

				            "read":{

				               "type":"array",

				               "items":{

				                  "type":"toppartitions_record"

				               },

				               "description":"Read results"

				            },

				            "write":{

				               "type":"array",

				               "items":{

				                  "type":"toppartitions_record"

				               },

				               "description":"Write results"

				            }

				         }

				      }

				   }

				}
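The new `toppartitions_query_results` model returns `read` and `write` arrays of `toppartitions_record` objects, each with `partition`, `count` and `error` fields. The Python sketch below shows one way to consume such a payload. The sample data is invented for illustration. Treating `count - error` as a lower bound on the true count is also an assumption: it holds for Space-Saving style top-k counters, but the schema itself only says `error` indicates inaccuracy.

```python
import json
from typing import List, NamedTuple


class TopPartition(NamedTuple):
    partition: str  # partition key, as a string
    count: int      # reported number of read/write operations
    error: int      # reported inaccuracy of the count


def parse_records(results: dict, kind: str) -> List[TopPartition]:
    """Extract the 'read' or 'write' records, hottest partition first."""
    records = [TopPartition(r["partition"], r["count"], r["error"])
               for r in results.get(kind, [])]
    return sorted(records, key=lambda r: r.count, reverse=True)


def guaranteed_count(record: TopPartition) -> int:
    # Assumption: count over-estimates by at most `error` (Space-Saving semantics).
    return max(record.count - record.error, 0)


# Illustrative payload shaped like toppartitions_query_results (not real output).
payload = json.loads("""
{
  "read":  [{"partition": "k1", "count": 40, "error": 0},
            {"partition": "k2", "count": 95, "error": 5}],
  "write": [{"partition": "k3", "count": 12, "error": 2}]
}
""")

reads = parse_records(payload, "read")
for rec in reads:
    print(rec.partition, rec.count, guaranteed_count(rec))
```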

									

api/api-doc/compaction_manager.json

				@@ -118,7 +118,7 @@

				        {

				          "method": "GET",

				          "summary": "Get pending tasks",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_pending_tasks",

				          "produces": [

				            "application/json"

				@@ -127,6 +127,24 @@

				        }

				      ]

				    },

				    {

				      "path": "/compaction_manager/metrics/pending_tasks_by_table",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get pending tasks by table name",

				          "type": "array",

				          "items": {

				              "type": "pending_compaction"

				           },

				          "nickname": "get_pending_tasks_by_table",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/compaction_manager/metrics/completed_tasks",

				      "operations": [

				@@ -163,7 +181,7 @@

				        {

				          "method": "GET",

				          "summary": "Get bytes compacted",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_bytes_compacted",

				          "produces": [

				            "application/json"

				@@ -179,7 +197,7 @@

				         "description":"A row merged information",

				         "properties":{

				            "key":{

				               "type":"int",

				               "type": "long",

				               "description":"The number of sstable"

				            },

				            "value":{

				@@ -244,6 +262,23 @@

				            }

				         }

				      },

				      "pending_compaction": {

				        "id": "pending_compaction",

				        "properties": {

				            "cf": {

				               "type": "string",

				               "description": "The column family name"

				            },

				            "ks": {

				               "type":"string",

				               "description": "The keyspace name"

				            },

				            "task": {

				               "type":"long",

				               "description": "The number of pending tasks"

				            }

				        }

				      },

				      "history": {

				      "id":"history",

				      "description":"Compaction history information",
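The new `/compaction_manager/metrics/pending_tasks_by_table` endpoint returns an array of `pending_compaction` objects with `ks`, `cf` and `task` fields. The Python sketch below rolls such a response up per keyspace. The function name and the sample body are hypothetical and are not captured from a real node.

```python
import json
from collections import defaultdict
from typing import Dict


def pending_by_keyspace(pending: list) -> Dict[str, int]:
    """Sum the 'task' counts of pending_compaction entries per keyspace ('ks')."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in pending:
        totals[entry["ks"]] += entry["task"]
    return dict(totals)


# Illustrative response body (hypothetical data).
body = json.loads("""
[
  {"ks": "ks1", "cf": "users",  "task": 3},
  {"ks": "ks1", "cf": "events", "task": 7},
  {"ks": "ks2", "cf": "logs",   "task": 1}
]
""")

print(pending_by_keyspace(body))  # {'ks1': 10, 'ks2': 1}
```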

									

api/api-doc/error_injection.json

				@@ -0,0 +1,90 @@

				{

				   "apiVersion":"0.0.1",

				   "swaggerVersion":"1.2",

				   "basePath":"{{Protocol}}://{{Host}}",

				   "resourcePath":"/error_injection",

				   "produces":[

				      "application/json"

				   ],

				   "apis":[

				      {

				         "path":"/v2/error_injection/injection/{injection}",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Activate an injection that triggers an error in code",

				               "type":"void",

				               "nickname":"enable_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name, should correspond to an injection added in code",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  },

				                  {

				                     "name":"one_shot",

				                     "description":"boolean flag indicating whether the injection should be enabled to trigger only once",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            },

				            {

				               "method":"DELETE",

				               "summary":"Deactivate an injection previously activated by the API",

				               "type":"void",

				               "nickname":"disable_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/v2/error_injection/injection",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"List all enabled injections on all shards, i.e. injections that will trigger an error in the code",

				               "type":"array",

				               "items":{

				                  "type":"string"

				               },

				               "nickname":"get_enabled_injections_on_all",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            },

				            {

				               "method":"DELETE",

				               "summary":"Deactivate all injections previously activated on all shards by the API",

				               "type":"void",

				               "nickname":"disable_on_all",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      }

				   ]

				}
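The new `error_injection.json` resource declares two paths. `/v2/error_injection/injection/{injection}` supports `POST`, with an optional boolean `one_shot` query parameter, and `DELETE`. `/v2/error_injection/injection` supports `GET` and `DELETE`. The Python sketch below only builds the matching `urllib` requests and does not send them. The base URL `http://localhost:10000` and the injection name are assumptions for illustration. The `true`/`false` spelling for `one_shot` is also an assumption.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:10000"  # assumed REST API address
INJECTIONS = "/v2/error_injection/injection"


def enable_injection(name: str, one_shot: bool = False) -> Request:
    """POST /v2/error_injection/injection/{injection}?one_shot=..."""
    query = urlencode({"one_shot": "true" if one_shot else "false"})
    url = f"{BASE_URL}{INJECTIONS}/{quote(name, safe='')}?{query}"
    return Request(url, method="POST")


def disable_injection(name: str) -> Request:
    """DELETE /v2/error_injection/injection/{injection}"""
    return Request(f"{BASE_URL}{INJECTIONS}/{quote(name, safe='')}", method="DELETE")


def list_enabled() -> Request:
    """GET /v2/error_injection/injection: enabled injections on all shards."""
    return Request(f"{BASE_URL}{INJECTIONS}", method="GET")


def disable_all() -> Request:
    """DELETE /v2/error_injection/injection: deactivate on all shards."""
    return Request(f"{BASE_URL}{INJECTIONS}", method="DELETE")


req = enable_injection("my_injection", one_shot=True)  # hypothetical injection name
print(req.get_method(), req.full_url)
# Sending it against a live node would be: urllib.request.urlopen(req)
```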

									

api/api-doc/failure_detector.json

				@@ -110,7 +110,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count down endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_down_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -126,7 +126,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count up endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_up_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -180,11 +180,11 @@

				                    "description": "The endpoint address"

				                },

				                "generation": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat generation"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat version"

				                },

				                "update_time": {

				@@ -209,7 +209,7 @@

				           "description": "Holds a version value for an application state",

				               "properties": {

				                "application_state": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state enum index"

				                },

				                "value": {

				@@ -217,7 +217,7 @@

				                    "description": "The version value"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state version"

				                }

				            }

									

api/api-doc/gossiper.json

				@@ -75,7 +75,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -99,7 +99,7 @@

				            {

				               "method":"GET",

				               "summary":"Get heart beat version for a node",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_heart_beat_version",

				               "produces":[

				                  "application/json"

									

api/api-doc/hinted_handoff.json

				@@ -99,7 +99,7 @@

				        {

				          "method": "GET",

				          "summary": "Get create hint count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_create_hint_count",

				          "produces": [

				            "application/json"

				@@ -123,7 +123,7 @@

				        {

				          "method": "GET",

				          "summary": "Get not stored hints count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_not_stored_hints_count",

				          "produces": [

				            "application/json"

									

api/api-doc/messaging_service.json

				@@ -191,7 +191,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the version number",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_version",

				               "produces":[

				                  "application/json"

									

api/api-doc/storage_proxy.json

				@@ -105,7 +105,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the max hint window",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hint_window",

				               "produces":[

				                  "application/json"

				@@ -128,7 +128,7 @@

				                     "description":"max hint window in ms",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -141,7 +141,7 @@

				            {

				               "method":"GET",

				               "summary":"Get max hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -164,7 +164,7 @@

				                     "description":"max hints in progress",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -177,7 +177,7 @@

				            {

				               "method":"GET",

				               "summary":"get hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -602,7 +602,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -632,7 +632,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_condition_not_met",

				          "produces": [

				            "application/json"

				@@ -641,13 +641,28 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_write/failed_read_round_optimization",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_failed_read_round_optimization",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/unfinished_commit",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_read_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -671,28 +686,13 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/condition_not_met",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas read metrics",

				          "type": "int",

				          "nickname": "get_cas_read_metrics_condition_not_met",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/read/timeouts",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -707,7 +707,7 @@

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -791,6 +791,36 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get CAS read rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_cas_read_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/view_write/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get view write rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_view_write_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/range/moving_average_histogram",

				      "operations": [

				@@ -812,7 +842,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -827,7 +857,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -872,7 +902,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -887,7 +917,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -956,6 +986,21 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_write/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get CAS write rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_cas_write_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				         "path":"/storage_proxy/metrics/read/estimated_histogram/",

				         "operations":[

				@@ -978,7 +1023,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1010,7 +1055,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1042,7 +1087,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"
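The storage_proxy hunks above add four `GET` endpoints. Three of them return the `#/utils/rate_moving_average_and_histogram` model, and `cas_write/failed_read_round_optimization` returns a `long`. The Python sketch below maps those nicknames to their paths and can fetch one as JSON. The base URL is an assumption. The sketch does not interpret the histogram fields, because that model is defined outside this diff.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:10000"  # assumed REST API address

# Nickname -> path, as declared in the storage_proxy.json hunks.
NEW_ENDPOINTS = {
    "get_cas_write_metrics_failed_read_round_optimization":
        "/storage_proxy/metrics/cas_write/failed_read_round_optimization",
    "get_cas_read_metrics_latency_histogram":
        "/storage_proxy/metrics/cas_read/moving_average_histogram",
    "get_view_write_metrics_latency_histogram":
        "/storage_proxy/metrics/view_write/moving_average_histogram",
    "get_cas_write_metrics_latency_histogram":
        "/storage_proxy/metrics/cas_write/moving_average_histogram",
}


def endpoint_url(nickname: str) -> str:
    """Full URL for one of the endpoints listed above."""
    return BASE_URL + NEW_ENDPOINTS[nickname]


def fetch(nickname: str, timeout: float = 5.0):
    """GET the endpoint and decode the JSON body (requires a running node)."""
    with urlopen(endpoint_url(nickname), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(endpoint_url("get_cas_read_metrics_latency_histogram"))
```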

									

api/api-doc/storage_service.json

				@@ -458,7 +458,7 @@

				            {

				               "method":"GET",

				               "summary":"Return the generation value for this node.",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -582,7 +582,15 @@

				                  },

				                  {

				                     "name":"kn",

				                     "description":"Comma seperated keyspaces name to snapshot",

				                     "description":"Comma seperated keyspaces name that their snapshot will be deleted",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"cf",

				                     "description":"an optional table name that its snapshot will be deleted",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",
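This hunk documents a comma-separated `kn` keyspace list and an optional `cf` table name for deleting snapshots. The Python sketch below encodes both query parameters. The function name, keyspace names and table name are hypothetical. The operation's path lies outside this excerpt, so the sketch builds only the query string.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def snapshot_delete_query(keyspaces: Sequence[str], table: Optional[str] = None) -> str:
    """Build the query string: 'kn' is a comma-separated keyspace list, 'cf' an optional table."""
    params = {}
    if keyspaces:
        params["kn"] = ",".join(keyspaces)
    if table is not None:
        params["cf"] = table
    return urlencode(params)  # commas are percent-encoded as %2C


print(snapshot_delete_query(["ks1", "ks2"], "users"))  # kn=ks1%2Cks2&cf=users
```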

				@@ -646,7 +654,7 @@

				            {

				               "method":"POST",

				               "summary":"Trigger a cleanup of keys on a single keyspace",

				               "type":"int",

				               "type": "long",

				               "nickname":"force_keyspace_cleanup",

				               "produces":[

				                  "application/json"

				@@ -678,7 +686,7 @@

				            {

				               "method":"GET",

				               "summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",

				               "type":"int",

				               "type": "long",

				               "nickname":"scrub",

				               "produces":[

				                  "application/json"

				@@ -726,7 +734,7 @@

				            {

				               "method":"GET",

				               "summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",

				               "type":"int",

				               "type": "long",

				               "nickname":"upgrade_sstables",

				               "produces":[

				                  "application/json"

				@@ -800,7 +808,7 @@

				               "summary":"Return an array with the ids of the currently active repairs",

				               "type":"array",

				               "items":{

				                  "type":"int"

				                  "type": "long"

				               },

				               "nickname":"get_active_repair_async",

				               "produces":[

				@@ -816,7 +824,7 @@

				            {

				               "method":"POST",

				               "summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",

				               "type":"int",

				               "type": "long",

				               "nickname":"repair_async",

				               "produces":[

				                  "application/json"

				@@ -947,7 +955,7 @@

				                     "description":"The repair ID to check for status",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1277,18 +1285,18 @@

				                  },

				                  {

				                     "name":"dynamic_update_interval",

				                     "description":"integer, in ms (default 100)",

				                     "description":"interval in ms (default 100)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"dynamic_reset_interval",

				                     "description":"integer, in ms (default 600,000)",

				                     "description":"interval in ms (default 600,000)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -1493,7 +1501,7 @@

				                     "description":"Stream throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1501,7 +1509,7 @@

				            {

				               "method":"GET",

				               "summary":"Get stream throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_stream_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1517,7 +1525,7 @@

				            {

				               "method":"GET",

				               "summary":"get compaction throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_compaction_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1539,7 +1547,7 @@

				                     "description":"compaction throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1943,7 +1951,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for warning of queries with many tombstones",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_warn_threshold",

				               "produces":[

				                  "application/json"

				@@ -1965,7 +1973,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1978,7 +1986,7 @@

				            {

				               "method":"GET",

				               "summary":"",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2000,7 +2008,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2013,7 +2021,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for rejecting queries due to a large batch size",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_batch_size_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2035,7 +2043,7 @@

				                     "description":"batch size debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2059,7 +2067,7 @@

				                     "description":"throttle in kb",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2072,7 +2080,7 @@

				            {

				               "method":"GET",

				               "summary":"Get load",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_metrics_load",

				               "produces":[

				                  "application/json"

				@@ -2088,7 +2096,7 @@

				            {

				               "method":"GET",

				               "summary":"Get exceptions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_exceptions",

				               "produces":[

				                  "application/json"

				@@ -2104,7 +2112,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -2120,7 +2128,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints",

				               "produces":[

				                  "application/json"

				@@ -2164,7 +2172,42 @@

				               ]

				            }

				         ]

				      }

				      },

				      {

				         "path":"/storage_service/sstable_info",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"SSTable information",

				               "type":"array",

				               "items":{

				                  "type":"table_sstables"

				               },

				               "nickname":"sstable_info",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"keyspace",

				                     "description":"The keyspace",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"cf",

				                     "description":"column family name",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      }      

				   ],

				   "models":{

				      "mapper":{

				@@ -2324,6 +2367,92 @@

				               "description":"The endpoint details"

				            }

				         }

				      },

				      "named_maps":{

				        "id":"named_maps",

				        "properties":{

				            "group":{

				                "type":"string"

				            },

				            "attributes":{

				                "type":"array",

				                "items":{

				                    "type":"mapper"

				                }

				            }

				        }

				      },

				      "sstable":{

				        "id":"sstable",

				        "properties":{

				            "size":{

				               "type":"long",

				               "description":"Total size in bytes of sstable"

				            },

				            "data_size":{

				                "type":"long",

				                "description":"The size in bytes on disk of data"

				            },

				            "index_size":{

				               "type":"long",

				               "description":"The size in bytes on disk of index"

				            },

				            "filter_size":{

				               "type":"long",

				               "description":"The size in bytes on disk of filter"

				            },

				            "timestamp":{

				                "type":"datetime",

				                "description":"File creation time"

				            },

				            "generation":{

				                "type":"long",

				                "description":"SSTable generation"

				            },

				            "level":{

				               "type":"long",

				               "description":"SSTable level"

				            },

				            "version":{

				               "type":"string",

				               "enum":[

				                  "ka", "la", "mc"

				               ],

				               "description":"SSTable version"

				            },

				            "properties":{

				                "type":"array",

				                "description":"SSTable attributes",

				                "items":{

				                    "type":"mapper"

				                }

				            },

				            "extended_properties":{

				                "type":"array",

				                "description":"SSTable extended attributes",

				                "items":{

				                    "type":"named_maps"

				                }

				            }

				        }

				      },

				      "table_sstables":{

				        "id":"table_sstables",

				        "description":"Per-table SSTable info and attributes",

				        "properties":{

				            "keyspace":{

				                "type":"string"

				            },

				            "table":{

				                "type":"string"

				            },

				            "sstables":{

				                "type":"array",

				                "items":{

				                    "$ref":"sstable"

				                }

				            }

				        }

				      }

				   }

				}
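Most hunks in this file change a field's `"type"` from `"int"` to `"long"`. The diff does not give a reason. If the API code generator maps `"int"` to a 32-bit and `"long"` to a 64-bit C++ integer (an assumption), the practical difference is range. The minimal sketch below makes that concrete. The 3 GiB counter is a hypothetical example value, and `fits_int32` is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>

// True if v can be stored in a 32-bit signed integer.
constexpr bool fits_int32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}

// Hypothetical value: a byte counter after 3 GiB of streamed data.
constexpr int64_t three_gib = int64_t(3) * 1024 * 1024 * 1024;

static_assert(!fits_int32(three_gib), "3 GiB exceeds INT32_MAX (2147483647)");
static_assert(fits_int32(int64_t(1) << 30), "1 GiB still fits");
```

Byte totals, such as the stream manager's incoming and outgoing byte counters, can pass this limit on a long-running node.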

									
										16

api/api-doc/stream_manager.json
									
				@@ -32,7 +32,7 @@

				            {

				               "method":"GET",

				               "summary":"Get number of active outbound streams",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_active_streams_outbound",

				               "produces":[

				                  "application/json"

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -72,7 +72,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -88,7 +88,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -112,7 +112,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -154,7 +154,7 @@

				               "description":"The peer"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "connecting":{

				@@ -211,7 +211,7 @@

				               "description":"The ID"

				            },

				            "files":{

				               "type":"int",

				               "type": "long",

				               "description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."

				            },

				            "total_size":{

				@@ -242,7 +242,7 @@

				               "description":"The peer address"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "file_name":{

									
										15

api/api-doc/system.json
									
				@@ -52,6 +52,21 @@

				            }

				         ]

				      },

				      {

				         "path":"/system/uptime_ms",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get system uptime, in milliseconds",

				               "type":"long",

				               "nickname":"get_system_uptime",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/logger/{name}",

				         "operations":[
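The new `/system/uptime_ms` operation returns a `long` described as system uptime in milliseconds. Its handler is not part of this hunk. The sketch below shows only one plausible approach under that assumption: capture a monotonic `std::chrono::steady_clock` start point once, then report the elapsed whole milliseconds. `uptime_clock` is a hypothetical name.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: capture a monotonic start time at construction and
// report the elapsed time in whole milliseconds as a 64-bit value.
class uptime_clock {
    std::chrono::steady_clock::time_point _start = std::chrono::steady_clock::now();
public:
    int64_t uptime_ms() const {
        auto elapsed = std::chrono::steady_clock::now() - _start;
        return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
    }
};
```

A monotonic clock is the natural choice here because wall-clock adjustments cannot make the reported uptime jump backwards.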

									
										26

api/api.cc
									
				@@ -20,9 +20,9 @@

				 */

				#include "api.hh"

				#include "http/file_handler.hh"

				#include "http/transformers.hh"

				#include "http/api_docs.hh"

				#include <seastar/http/file_handler.hh>

				#include <seastar/http/transformers.hh>

				#include <seastar/http/api_docs.hh>

				#include "storage_service.hh"

				#include "commitlog.hh"

				#include "gossiper.hh"

				@@ -36,11 +36,14 @@

				#include "endpoint_snitch.hh"

				#include "compaction_manager.hh"

				#include "hinted_handoff.hh"

				#include "http/exception.hh"

				#include "error_injection.hh"

				#include <seastar/http/exception.hh>

				#include "stream_manager.hh"

				#include "system.hh"

				#include "api/config.hh"

				logging::logger apilog("api");

				namespace api {

				static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {

				@@ -66,13 +69,19 @@ future<> set_server_init(http_context& ctx) {

				        rb->set_api_doc(r);

				        rb02->set_api_doc(r);

				        rb02->register_api_file(r, "swagger20_header");

				        set_config(rb02, ctx, r);

				        rb->register_function(r, "system",

				                "The system related API");

				        set_system(ctx, r);

				    });

				}

				future<> set_server_config(http_context& ctx) {

				    auto rb02 = std::make_shared < api_registry_builder20 > (ctx.api_doc, "/v2");

				    return ctx.http_server.set_routes([&ctx, rb02](routes& r) {

				        set_config(rb02, ctx, r);

				    });

				}

				static future<> register_api(http_context& ctx, const sstring& api_name,

				        const sstring api_desc,

				        std::function<void(http_context& ctx, routes& r)> f) {

				@@ -88,6 +97,10 @@ future<> set_server_storage_service(http_context& ctx) {

				    return register_api(ctx, "storage_service", "The storage service API", set_storage_service);

				}

				future<> set_server_snapshot(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { set_snapshot(ctx, r); });

				}

				future<> set_server_snitch(http_context& ctx) {

				    return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);

				}

				@@ -151,6 +164,9 @@ future<> set_server_done(http_context& ctx) {

				        rb->register_function(r, "collectd",

				                "The collectd API");

				        set_collectd(ctx, r);

				        rb->register_function(r, "error_injection",

				                "The error injection API");

				        set_error_injection(ctx, r);

				    });

				}

									
										44

api/api.hh
									
				@@ -21,13 +21,15 @@

				#pragma once

				#include "json/json_elements.hh"

				#include <seastar/json/json_elements.hh>

				#include <type_traits>

				#include <boost/lexical_cast.hpp>

				#include <boost/algorithm/string/split.hpp>

				#include <boost/algorithm/string/classification.hpp>

				#include <boost/units/detail/utility.hpp>

				#include "api/api-doc/utils.json.hh"

				#include "utils/histogram.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "api_init.hh"

				#include "seastarx.hh"

				@@ -216,4 +218,42 @@ std::vector<T> concat(std::vector<T> a, std::vector<T>&& b) {

				    return a;

				}

				template <class T, class Base = T>

				class req_param {

				public:

				    sstring name;

				    sstring param;

				    T value;

				    req_param(const request& req, sstring name, T default_val) : name(name) {

				        param = req.get_query_param(name);

				        if (param.empty()) {

				            value = default_val;

				            return;

				        }

				        try {

				            // boost::lexical_cast does not use boolalpha. Converting a

				            // true/false throws exceptions. We don't want that.

				            if constexpr (std::is_same_v<Base, bool>) {

				                // Cannot use boolalpha because we (probably) want to

				                // accept 1 and 0 as well as true and false. And True. And fAlse.

				                std::transform(param.begin(), param.end(), param.begin(), ::tolower);

				                if (param == "true" || param == "1") {

				                    value = T(true);

				                } else if (param == "false" || param == "0") {

				                    value = T(false);

				                } else {

				                    throw boost::bad_lexical_cast{};

				                }

				            } else {

				                value = T{boost::lexical_cast<Base>(param)};

				            }

				        } catch (boost::bad_lexical_cast&) {

				            throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				        }

				    }

				    operator T() const { return value; }

				};

				}
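`req_param` reads a query parameter and falls back to a default when the parameter is empty. For booleans it accepts `true`/`false`/`1`/`0` in any letter case; for other types it delegates to `boost::lexical_cast`, turning a failed conversion into a `bad_param_exception`. The sketch below reproduces that logic without Seastar or Boost. `parse_param` is a hypothetical name, and `std::from_chars` stands in for `boost::lexical_cast`, so the sketch supports integral types only.

```cpp
#include <algorithm>
#include <cctype>
#include <charconv>
#include <stdexcept>
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical stand-in for req_param: parse `param` as T, or return
// `default_val` when the parameter is absent (empty).
template <typename T>
T parse_param(std::string param, const T& default_val) {
    static_assert(std::is_integral_v<T>, "sketch supports integral types only");
    if (param.empty()) {
        return default_val;
    }
    if constexpr (std::is_same_v<T, bool>) {
        // Accept true/false in any letter case, plus 1/0.
        std::transform(param.begin(), param.end(), param.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (param == "true" || param == "1") {
            return true;
        }
        if (param == "false" || param == "0") {
            return false;
        }
    } else {
        // Require the whole string to be a number, as lexical_cast does.
        T value{};
        const char* first = param.data();
        const char* last = first + param.size();
        auto [ptr, ec] = std::from_chars(first, last, value);
        if (ec == std::errc() && ptr == last) {
            return value;
        }
    }
    throw std::invalid_argument("'" + param + "': type error");
}
```

For example, `parse_param<bool>("TRUE", false)` yields `true`, and `parse_param<long long>("42x", 0)` throws because of the trailing `x`.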

									
										17

api/api_init.hh
									
				@@ -19,9 +19,12 @@

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "database.hh"

				#include "database_fwd.hh"

				#include "service/storage_proxy.hh"

				#include "http/httpd.hh"

				#include <seastar/http/httpd.hh>

				namespace service { class load_meter; }

				namespace locator { class token_metadata; }

				namespace api {

				@@ -31,15 +34,21 @@ struct http_context {

				    httpd::http_server_control http_server;

				    distributed<database>& db;

				    distributed<service::storage_proxy>& sp;

				    service::load_meter& lmeter;

				    sharded<locator::token_metadata>& token_metadata;

				    http_context(distributed<database>& _db,

				            distributed<service::storage_proxy>& _sp)

				            : db(_db), sp(_sp) {

				            distributed<service::storage_proxy>& _sp,

				            service::load_meter& _lm, sharded<locator::token_metadata>& _tm)

				            : db(_db), sp(_sp), lmeter(_lm), token_metadata(_tm) {

				    }

				};

				future<> set_server_init(http_context& ctx);

				future<> set_server_config(http_context& ctx);

				future<> set_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx);

				future<> set_server_snapshot(http_context& ctx);

				future<> set_server_gossip(http_context& ctx);

				future<> set_server_load_sstable(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx);

									
										6

api/collectd.cc
									
				@@ -21,8 +21,8 @@

				#include "collectd.hh"

				#include "api/api-doc/collectd.json.hh"

				#include "core/scollectd.hh"

				#include "core/scollectd_api.hh"

				#include <seastar/core/scollectd.hh>

				#include <seastar/core/scollectd_api.hh>

				#include "endian.h"

				#include <boost/range/irange.hpp>

				#include <regex>

				@@ -64,7 +64,7 @@ static const char* str_to_regex(const sstring& v) {

				void set_collectd(http_context& ctx, routes& r) {

				    cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto id = make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				        auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				                req->get_query_param("instance"), req->get_query_param("type"),

				                req->get_query_param("type_instance"));
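This hunk only changes `make_shared` to `::make_shared`. The commit message is not shown, so the motivation here is an inference. A qualified name such as `::f` turns off argument-dependent lookup (ADL), which is the usual way to resolve an ambiguity between a global function and a same-named function found in an argument's namespace. The self-contained illustration below uses hypothetical names (`lib::widget`, `describe`).

```cpp
#include <string>

namespace lib {
struct widget {};

// Found by argument-dependent lookup whenever an argument has a lib:: type.
inline std::string describe(const widget&) { return "lib::describe"; }
}  // namespace lib

// Global function with the same name and signature (hypothetical).
inline std::string describe(const lib::widget&) { return "::describe"; }

std::string call_global() {
    // An unqualified describe(lib::widget{}) would not compile here: ordinary
    // lookup finds ::describe, ADL adds lib::describe, and neither is better.
    // The qualified name ::describe suppresses ADL, leaving one candidate.
    return ::describe(lib::widget{});
}
```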

									
										230

api/column_family.cc
									
				@@ -22,10 +22,14 @@

				#include "column_family.hh"

				#include "api/api-doc/column_family.json.hh"

				#include <vector>

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "sstables/sstables.hh"

				#include "utils/estimated_histogram.hh"

				#include <algorithm>

				#include "db/system_keyspace_view_types.hh"

				#include "db/data_listeners.hh"

				extern logging::logger apilog;

				namespace api {

				using namespace httpd;

				@@ -34,7 +38,7 @@ using namespace std;

				using namespace json;

				namespace cf = httpd::column_family_json;

				const utils::UUID& get_uuid(const sstring& name, const database& db) {

				std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {

				    auto pos = name.find("%3A");

				    size_t end;

				    if (pos == sstring::npos) {

				@@ -46,14 +50,22 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {

				    } else {

				        end = pos + 3;

				    }

				    return std::make_tuple(name.substr(0, pos), name.substr(end));

				}

				const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {

				    try {

				        return db.find_uuid(name.substr(0, pos), name.substr(end));

				        return db.find_uuid(ks, cf);

				    } catch (std::out_of_range& e) {

				        throw bad_param_exception("Column family '" + name.substr(0, pos) + ":"

				                + name.substr(end) + "' not found");

				        throw bad_param_exception(format("Column family '{}:{}' not found", ks, cf));

				    }

				}

				const utils::UUID& get_uuid(const sstring& name, const database& db) {

				    auto [ks, cf] = parse_fully_qualified_cf_name(name);

				    return get_uuid(ks, cf, db);

				}
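`parse_fully_qualified_cf_name` splits a `keyspace:table` name in which the colon may arrive URL-encoded as `%3A`. It returns both parts, so the new `get_uuid(ks, cf, db)` overload can look them up and report them separately in its error message. The branch taken when `%3A` is absent falls between hunks and is not shown. The sketch below assumes that branch falls back to a literal `:` and rejects names that contain neither separator. `split_cf_name` is a hypothetical name.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Split "ks%3Atable" (or, by assumption, "ks:table") into {keyspace, table}.
std::pair<std::string, std::string> split_cf_name(const std::string& name) {
    std::size_t pos = name.find("%3A");
    std::size_t end;
    if (pos == std::string::npos) {
        // Assumed fallback: a literal ':' separator.
        pos = name.find(':');
        if (pos == std::string::npos) {
            throw std::invalid_argument("expected keyspace:table, got '" + name + "'");
        }
        end = pos + 1;
    } else {
        end = pos + 3;  // skip the three characters of "%3A"
    }
    return {name.substr(0, pos), name.substr(end)};
}
```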

				future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {

				    auto uuid = get_uuid(name, ctx.db.local());

				@@ -63,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,

				        int64_t column_family::stats::*f) {

				        int64_t column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {

				        return cf.get_stats().*f;

				    }, std::plus<int64_t>());

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx,

				        int64_t column_family::stats::*f) {

				        int64_t column_family_stats::*f) {

				    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {

				        return cf.get_stats().*f;

				    }, std::plus<int64_t>());

				}
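`get_cf_stats` takes a pointer to a data member (`int64_t column_family_stats::*f`). It reads `cf.get_stats().*f` for each table and sums the results with `std::plus<int64_t>`. The sketch below shows the same pointer-to-member pattern on a single thread with `std::accumulate` and leaves out the per-shard `map_reduce_cf`. `table_stats` and `sum_stat` are hypothetical names.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for column_family_stats.
struct table_stats {
    int64_t memtable_switch_count = 0;
    int64_t pending_flushes = 0;
};

// Sum the counter selected by `field` over all tables.
int64_t sum_stat(const std::vector<table_stats>& tables, int64_t table_stats::*field) {
    return std::accumulate(tables.begin(), tables.end(), int64_t(0),
                           [field](int64_t acc, const table_stats& s) { return acc + s.*field; });
}
```

One function serves every counter. The caller picks the counter by passing, for example, `&table_stats::pending_flushes`, which mirrors how each REST handler passes a different `&column_family_stats::...` member.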

				static future<json::json_return_type>  get_cf_stats_count(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {

				        return (cf.get_stats().*f).hist.count;

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([uuid, f](database& db) {

				        // Histograms information is sample of the actual load

				@@ -100,14 +112,14 @@ static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const

				static future<json::json_return_type>  get_cf_stats_count(http_context& ctx,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {

				        return (cf.get_stats().*f).hist.count;

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).hist;},

				@@ -118,7 +130,7 @@ static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const

				    });

				}

				static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    std::function<utils::ihistogram(const database&)> fun = [f] (const database& db)  {

				        utils::ihistogram res;

				        for (auto i : db.get_column_families()) {

				@@ -134,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:

				}

				static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).rate();},

				@@ -145,7 +157,7 @@ static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& c

				    });

				}

				static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db)  {

				        utils::rate_moving_average_and_histogram res;

				        for (auto i : db.get_column_families()) {

				@@ -166,27 +178,27 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct

				    }, std::plus<int64_t>());

				}

				static int64_t min_row_size(column_family& cf) {

				static int64_t min_partition_size(column_family& cf) {

				    int64_t res = INT64_MAX;

				    for (auto i: *cf.get_sstables() ) {

				        res = std::min(res, i->get_stats_metadata().estimated_row_size.min());

				        res = std::min(res, i->get_stats_metadata().estimated_partition_size.min());

				    }

				    return (res == INT64_MAX) ? 0 : res;

				}

				static int64_t max_row_size(column_family& cf) {

				static int64_t max_partition_size(column_family& cf) {

				    int64_t res = 0;

				    for (auto i: *cf.get_sstables() ) {

				        res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);

				        res = std::max(i->get_stats_metadata().estimated_partition_size.max(), res);

				    }

				    return res;

				}

				static integral_ratio_holder mean_row_size(column_family& cf) {

				static integral_ratio_holder mean_partition_size(column_family& cf) {

				    integral_ratio_holder res;

				    for (auto i: *cf.get_sstables() ) {

				        auto c = i->get_stats_metadata().estimated_row_size.count();

				        res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;

				        auto c = i->get_stats_metadata().estimated_partition_size.count();

				        res.sub += i->get_stats_metadata().estimated_partition_size.mean() * c;

				        res.total += c;

				    }

				    return res;
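`mean_partition_size` combines per-sstable histograms. It adds `mean * count` into `sub` and `count` into `total`, so the ratio `sub / total` is the mean over all partitions rather than an unweighted mean of per-sstable means. The sketch below shows that arithmetic with a hypothetical `histogram_summary` type and divides once at the end. How `integral_ratio_holder` performs the division is not shown here, and returning 0 for no sstables is the sketch's own choice.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-sstable summary of an estimated partition-size histogram.
struct histogram_summary {
    int64_t count;  // number of partitions
    double mean;    // mean partition size in bytes
};

// Accumulate mean*count and count, then divide once, so sstables holding
// more partitions carry more weight.
double weighted_mean(const std::vector<histogram_summary>& sstables) {
    double sub = 0;
    int64_t total = 0;
    for (const auto& s : sstables) {
        sub += s.mean * s.count;
        total += s.count;
    }
    return total == 0 ? 0.0 : sub / total;
}
```

Consider two sstables, one with 2 partitions averaging 10 bytes and one with 6 averaging 30 bytes. They give (20 + 180) / 8 = 25, whereas averaging the two means would give 20.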

				@@ -242,12 +254,11 @@ class sum_ratio {

				    uint64_t _n = 0;

				    T _total = 0;

				public:

				    future<> operator()(T value) {

				    void operator()(T value) {

				        if (value > 0) {

				            _total += value;

				            _n++;

				        }

				        return make_ready_future<>();

				    }

				    // Returns average value of all registered ratios.

				    T get() && {
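The `sum_ratio` hunk changes `operator()` from returning `future<>` to `void`, which makes it a plain synchronous accumulator. It ignores non-positive values, and, per its comment, `get()` returns their average. The body of `get()` is cut off here, so the sketch assumes it returns 0 when no positive value was registered. `sum_ratio_sketch` and `average_positive` are hypothetical names. Because the call operator is synchronous, `std::for_each` can drive it directly.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

template <typename T>
class sum_ratio_sketch {
    uint64_t _n = 0;
    T _total = 0;
public:
    // Register one value; only positive values count toward the average.
    void operator()(T value) {
        if (value > 0) {
            _total += value;
            _n++;
        }
    }
    // Average of the registered positive values (assumed 0 when there were none).
    T get() && {
        return _n == 0 ? T(0) : _total / static_cast<T>(_n);
    }
};

double average_positive(const std::vector<double>& values) {
    // std::for_each returns the functor by value, carrying the accumulated state.
    return std::for_each(values.begin(), values.end(), sum_ratio_sketch<double>{}).get();
}
```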

				@@ -396,29 +407,31 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);

				        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);

				    });

				    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);

				        return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto i: *cf.get_sstables() ) {

				                res.merge(i->get_stats_metadata().estimated_row_size);

				                res.merge(i->get_stats_metadata().estimated_partition_size);

				            }

				            return res;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {

				            uint64_t res = 0;

				            for (auto i: *cf.get_sstables() ) {

				                res += i->get_stats_metadata().estimated_row_size.count();

				                res += i->get_stats_metadata().estimated_partition_size.count();

				            }

				            return res;

				        },

				@@ -443,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);

				        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);

				    });

				    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::pending_flushes);

				        return get_cf_stats(ctx, &column_family_stats::pending_flushes);

				    });

				    cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);

				        return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);

				    });

				    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, &column_family::stats::reads);

				        return get_cf_stats_count(ctx, &column_family_stats::reads);

				    });

				    cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);

				        return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);

				    });

				    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, &column_family::stats::writes);

				        return get_cf_stats_count(ctx, &column_family_stats::writes);

				    });

				    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);

				    });

				    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);

				    });

				    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);

				        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);

				    });

				    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);

				        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, &column_family::stats::writes);

				        return get_cf_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, &column_family::stats::writes);

				        return get_cf_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -519,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);

				        return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);

				    });

				    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_sstable_count);

				        return get_cf_stats(ctx, &column_family_stats::live_sstable_count);

				    });

				    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -546,30 +559,36 @@ void set_column_family(http_context& ctx, routes& r) {

				        return sum_sstable(ctx, true);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_row_size, min_int64);

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, INT64_MAX, min_row_size, min_int64);

				        return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);

				        return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());

				        return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -776,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_prepare;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_propose;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_commit;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -805,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);

				    });

				    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);

				    });

				    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {

				@@ -827,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {

				        return true;

				    });

				    cf::get_built_indexes.set(r, [](const_req) {

				        // FIXME

				        // Currently there are no index support

				        return std::vector<sstring>();

				    cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);

				        return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {

				            std::set<sstring> vp;

				            for (auto b : vb) {

				                if (b.view.first == ks) {

				                    vp.insert(b.view.second);

				                }

				            }

				            std::vector<sstring> res;

				            auto uuid = get_uuid(ks, cf_name, ctx.db.local());

				            column_family& cf = ctx.db.local().find_column_family(uuid);

				            res.reserve(cf.get_index_manager().list_indexes().size());

				            for (auto&& i : cf.get_index_manager().list_indexes()) {

				                if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {

				                    res.emplace_back(i.metadata().name());

				                }

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {

				        // FIXME

				        // Currently there are no information on the compression

				@@ -920,5 +954,55 @@ void set_column_family(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(container_to_vec(res));

				        });

				    });

				    cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {

				        auto name_param = req->param["name"];

				        auto [ks, cf] = parse_fully_qualified_cf_name(name_param);

				        api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};

				        api::req_param<unsigned> capacity(*req, "capacity", 256);

				        api::req_param<unsigned> list_size(*req, "list_size", 10);

				        apilog.info("toppartitions query: name={} duration={} list_size={} capacity={}",

				            name_param, duration.param, list_size.param, capacity.param);

				        return seastar::do_with(db::toppartitions_query(ctx.db, ks, cf, duration.value, list_size, capacity), [&ctx](auto& q) {

				            return q.scatter().then([&q] {

				                return sleep(q.duration()).then([&q] {

				                    return q.gather(q.capacity()).then([&q] (auto topk_results) {

				                        apilog.debug("toppartitions query: processing results");

				                        cf::toppartitions_query_results results;

				                        for (auto& d: topk_results.read.top(q.list_size())) {

				                            cf::toppartitions_record r;

				                            r.partition = sstring(d.item);

				                            r.count = d.count;

				                            r.error = d.error;

				                            results.read.push(r);

				                        }

				                        for (auto& d: topk_results.write.top(q.list_size())) {

				                            cf::toppartitions_record r;

				                            r.partition = sstring(d.item);

				                            r.count = d.count;

				                            r.error = d.error;

				                            results.write.push(r);

				                        }

				                        return make_ready_future<json::json_return_type>(results);

				                    });

				                });

				            });

				        });

				    });

				    cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        if (req->get_query_param("split_output") != "") {

				            fail(unimplemented::cause::API);

				        }

				        return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {

				            return cf.compact_all_sstables();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				}
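The reworked get_built_indexes handler treats an index as built once its backing view no longer appears in the keyspace's view-build progress entries. The standalone sketch below shows only that filtering step. The names view_in_progress, index_view_name and built_indexes are hypothetical. The "_index" suffix is an assumption about what secondary_index::index_table_name produces. Like the handler, the sketch filters progress entries by keyspace only.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one view-build progress row:
// (keyspace, view name) of a view that is still being built.
using view_in_progress = std::pair<std::string, std::string>;

// Assumption: an index named X is backed by a view table named X + "_index".
std::string index_view_name(const std::string& index_name) {
    return index_name + "_index";
}

// Returns the indexes of `keyspace` whose backing view is no longer in progress.
std::vector<std::string> built_indexes(const std::string& keyspace,
                                       const std::vector<view_in_progress>& progress,
                                       const std::vector<std::string>& index_names) {
    std::set<std::string> in_progress;
    for (const auto& [ks, view] : progress) {
        if (ks == keyspace) {
            in_progress.insert(view);
        }
    }
    std::vector<std::string> res;
    res.reserve(index_names.size());
    for (const auto& name : index_names) {
        // An index counts as built when its view is absent from the progress set.
        if (in_progress.count(index_view_name(name)) == 0) {
            res.push_back(name);
        }
    }
    return res;
}
```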

				}
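Several handlers above now read estimated_partition_size instead of estimated_row_size. The mean handlers also note that Cassandra 3.x truncates means to integers. Below is a minimal sketch of the weighted mean that the mean_partition_size mapper accumulates: each SSTable's mean is multiplied by its partition count, and the sum is divided by the total count. The sstable_partition_stats and integral_ratio types are hypothetical stand-ins for Scylla's histogram and integral_ratio_holder.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one SSTable's estimated_partition_size histogram.
struct sstable_partition_stats {
    int64_t mean;   // mean partition size
    int64_t count;  // number of partitions
};

// Weighted sum and total, mirroring the sub/total pair the mapper fills in.
struct integral_ratio {
    int64_t sub = 0;
    int64_t total = 0;
    // Integral (truncating) mean; 0 when there is no data.
    int64_t get() const { return total == 0 ? 0 : sub / total; }
};

integral_ratio mean_partition_size(const std::vector<sstable_partition_stats>& sstables) {
    integral_ratio res;
    for (const auto& s : sstables) {
        res.sub += s.mean * s.count;  // weight each mean by its partition count
        res.total += s.count;
    }
    return res;
}

// Combining two partial results is plain addition of both fields.
integral_ratio operator+(integral_ratio a, const integral_ratio& b) {
    a.sub += b.sub;
    a.total += b.total;
    return a;
}
```

The numerator and denominator stay separate until the final division. This lets per-shard results be combined with plain addition (std::plus in the handlers) without averaging averages.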

									
api/column_family.hh
									
				@@ -24,6 +24,7 @@

				#include "api.hh"

				#include "api/api-doc/column_family.json.hh"

				#include "database.hh"

				#include <seastar/core/future-util.hh>

				#include <any>

				namespace api {

				@@ -38,14 +39,14 @@ template<class Mapper, class I, class Reducer>

				future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,

				        Mapper mapper, Reducer reducer) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    using mapper_type = std::function<std::any (database&)>;

				    using reducer_type = std::function<std::any (std::any, std::any)>;

				    using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;

				    using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;

				    return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {

				        return I(mapper(db.find_column_family(uuid)));

				    }), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {

				        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));

				    })).then([] (std::any r) {

				        return std::any_cast<I>(std::move(r));

				        return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));

				    }), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {

				        return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));

				    })).then([] (std::unique_ptr<std::any> r) {

				        return std::any_cast<I>(std::move(*r));

				    });

				}
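The header now carries the type-erased accumulator as std::unique_ptr<std::any> rather than a bare std::any. The concrete type I is recovered by std::any_cast only inside the wrappers. Below is a sequential sketch of the same wrapping, in which plain vectors of ints stand in for per-shard tables. map_reduce_erased is a hypothetical name, and the real code distributes the work with Seastar's map_reduce0 instead of a loop. The hunk does not say why the pointer wrapper was introduced, so the sketch shows only the shape, not the motivation.

```cpp
#include <any>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using any_ptr = std::unique_ptr<std::any>;

// Type-erased map-reduce: the std::function signatures mention only any_ptr,
// and the concrete result type I appears only inside the wrapping lambdas.
template <typename I, typename Mapper, typename Reducer>
I map_reduce_erased(const std::vector<std::vector<int>>& shards, I init,
                    Mapper mapper, Reducer reducer) {
    std::function<any_ptr(const std::vector<int>&)> wrapped_mapper =
        [mapper](const std::vector<int>& shard) {
            return std::make_unique<std::any>(I(mapper(shard)));
        };
    std::function<any_ptr(any_ptr, any_ptr)> wrapped_reducer =
        [reducer](any_ptr a, any_ptr b) {
            // Unwrap both operands, reduce, and re-wrap the result.
            return std::make_unique<std::any>(I(reducer(
                std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
        };
    any_ptr acc = std::make_unique<std::any>(std::move(init));
    for (const auto& shard : shards) {
        acc = wrapped_reducer(std::move(acc), wrapped_mapper(shard));
    }
    return std::any_cast<I>(std::move(*acc));
}
```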

				@@ -69,30 +70,32 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n

				struct map_reduce_column_families_locally {

				    std::any init;

				    std::function<std::any (column_family&)> mapper;

				    std::function<std::any (std::any, std::any)> reducer;

				    std::any operator()(database& db) const {

				        auto res = init;

				        for (auto i : db.get_column_families()) {

				            res = reducer(res, mapper(*i.second.get()));

				        }

				        return res;

				    std::function<std::unique_ptr<std::any>(column_family&)> mapper;

				    std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;

				    future<std::unique_ptr<std::any>> operator()(database& db) const {

				        auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));

				        return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {

				            *res = std::move(reducer(std::move(*res), mapper(*i.second.get())));

				        }).then([res] {

				            return std::move(*res);

				        });

				    }

				};

				template<class Mapper, class I, class Reducer>

				future<I> map_reduce_cf_raw(http_context& ctx, I init,

				        Mapper mapper, Reducer reducer) {

				    using mapper_type = std::function<std::any (column_family&)>;

				    using reducer_type = std::function<std::any (std::any, std::any)>;

				    using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;

				    using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;

				    auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {

				        return I(mapper(cf));

				        return std::make_unique<std::any>(I(mapper(cf)));

				    });

				    auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {

				        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));

				    auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {

				        return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));

				    });

				    return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {

				        return std::any_cast<I>(std::move(res));

				    return ctx.db.map_reduce0(map_reduce_column_families_locally{init,

				            std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {

				        return std::any_cast<I>(std::move(*res));

				    });

				}

				@@ -106,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,

				        int64_t column_family::stats::*f);

				        int64_t column_family_stats::*f);

				future<json::json_return_type>  get_cf_stats(http_context& ctx,

				        int64_t column_family::stats::*f);

				        int64_t column_family_stats::*f);

				}

									
api/commitlog.cc
									
				@@ -22,15 +22,16 @@

				#include "commitlog.hh"

				#include <db/commitlog/commitlog.hh>

				#include "api/api-doc/commitlog.json.hh"

				#include "database.hh"

				#include <vector>

				namespace api {

				template<typename Func>

				static auto acquire_cl_metric(http_context& ctx, Func&& func) {

				    typedef std::result_of_t<Func(db::commitlog *)> ret_type;

				template<typename T>

				static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {

				    typedef T ret_type;

				    return ctx.db.map_reduce0([func = std::forward<Func>(func)](database& db) {

				    return ctx.db.map_reduce0([func = std::move(func)](database& db) {

				        if (db.commitlog() == nullptr) {

				            return make_ready_future<ret_type>();

				        }

				@@ -63,15 +64,15 @@ void set_commitlog(http_context& ctx, routes& r) {

				    });

				    httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				    });

				}
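acquire_cl_metric now takes std::function<T (db::commitlog*)> instead of a forwarded template callable. T cannot be deduced from a std::bind expression, so every call site spells it out, as in acquire_cl_metric<uint64_t>(...). The sketch below reproduces that calling convention with a hypothetical fake_commitlog and acquire_metric. The reduction step lies outside the visible hunk, so summing the per-shard values is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a per-shard commitlog.
struct fake_commitlog {
    uint64_t total_size;
    uint64_t get_total_size() const { return total_size; }
};

// Each shard may lack a commitlog (nullptr). Because the getter parameter is
// std::function<T (const fake_commitlog*)>, callers must name T explicitly.
template <typename T>
T acquire_metric(const std::vector<const fake_commitlog*>& shards,
                 std::function<T (const fake_commitlog*)> func) {
    T result{};
    for (const fake_commitlog* cl : shards) {
        // A shard without a commitlog contributes a value-initialized T.
        T value = cl == nullptr ? T{} : func(cl);
        result = result + value;  // assumption: per-shard values are summed
    }
    return result;
}
```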

									
api/compaction_manager.cc
									
				@@ -24,6 +24,7 @@

				#include "api/api-doc/compaction_manager.json.hh"

				#include "db/system_keyspace.hh"

				#include "column_family.hh"

				#include <utility>

				namespace api {

				@@ -38,6 +39,16 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,

				        const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {

				    for (auto&& i : b) {

				        if (i.second) {

				            a[i.first] += i.second;

				        }

				    }

				    return std::move(a);

				}
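sum_pending_tasks folds one shard's map of (keyspace, table) to pending-compaction count into the accumulator and skips zero counts. Below is a standalone sketch. It assumes a hand-written pair_hash (boost::hash_combine-style mixing) in place of utils::tuple_hash, whose exact mixing is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hash for (keyspace, table) pairs; a stand-in for utils::tuple_hash.
struct pair_hash {
    std::size_t operator()(const std::pair<std::string, std::string>& p) const {
        std::size_t h1 = std::hash<std::string>{}(p.first);
        std::size_t h2 = std::hash<std::string>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

using task_map = std::unordered_map<std::pair<std::string, std::string>, uint64_t, pair_hash>;

// Adds shard b's non-zero per-table counts into a and returns the merged map.
task_map sum_pending(task_map&& a, const task_map& b) {
    for (const auto& [table, count] : b) {
        if (count) {
            a[table] += count;
        }
    }
    return std::move(a);  // a is a named rvalue reference, so move explicitly
}
```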

				void set_compaction_manager(http_context& ctx, routes& r) {

				    cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -47,8 +58,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				            for (const auto& c : cm.get_compactions()) {

				                cm::summary s;

				                s.ks = c->ks;

				                s.cf = c->cf;

				                s.ks = c->ks_name;

				                s.cf = c->cf_name;

				                s.unit = "keys";

				                s.task_type = sstables::compaction_name(c->type);

				                s.completed = c->total_keys_written;

				@@ -61,6 +72,32 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        });

				    });

				    cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.db.map_reduce0([&ctx](database& db) {

				            return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {

				                return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {

				                    table& cf = *i.second.get();

				                    tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);

				                    return make_ready_future<>();

				                }).then([&tasks] {

				                    return std::move(tasks);

				                });

				            });

				        }, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(

				                [](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {

				            std::vector<cm::pending_compaction> res;

				            res.reserve(task_map.size());

				            for (auto i : task_map) {

				                cm::pending_compaction task;

				                task.ks = i.first.first;

				                task.cf = i.first.second;

				                task.task = i.second;

				                res.emplace_back(std::move(task));

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				@@ -103,29 +140,37 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				    });

				    cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {

				        return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {

				            std::vector<cm::history> res;

				            res.reserve(history.size());

				            for (auto& entry : history) {

				                cm::history h;

				                h.id = entry.id.to_sstring();

				                h.ks = std::move(entry.ks);

				                h.cf = std::move(entry.cf);

				                h.compacted_at = entry.compacted_at;

				                h.bytes_in = entry.bytes_in;

				                h.bytes_out =  entry.bytes_out;

				                for (auto it : entry.rows_merged) {

				                    httpd::compaction_manager_json::row_merged e;

				                    e.key = it.first;

				                    e.value = it.second;

				                    h.rows_merged.push(std::move(e));

				                }

				                res.push_back(std::move(h));

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				        std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&s, &first] {

				                    return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {

				                        cm::history h;

				                        h.id = entry.id.to_sstring();

				                        h.ks = std::move(entry.ks);

				                        h.cf = std::move(entry.cf);

				                        h.compacted_at = entry.compacted_at;

				                        h.bytes_in = entry.bytes_in;

				                        h.bytes_out =  entry.bytes_out;

				                        for (auto it : entry.rows_merged) {

				                            httpd::compaction_manager_json::row_merged e;

				                            e.key = it.first;

				                            e.value = it.second;

				                            h.rows_merged.push(std::move(e));

				                        }

				                        auto fut = first ? make_ready_future<>() : s.write(", ");

				                        first = false;

				                        return fut.then([&s, h = std::move(h)] {

				                            return formatter::write(s, h);

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                    });

				                });

				            });

				        };

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });

				    cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {

									
api/config.cc
									
				@@ -22,6 +22,7 @@

				#include "api/config.hh"

				#include "api/api-doc/config.json.hh"

				#include "db/config.hh"

				#include "database.hh"

				#include <sstream>

				#include <boost/algorithm/string/replace.hpp>

				@@ -43,14 +44,14 @@ json::json_return_type get_json_return_type(const db::seed_provider_type& val) {

				    return json::json_return_type(val.class_name);

				}

				std::string format_type(const std::string& type) {

				std::string_view format_type(std::string_view type) {

				    if (type == "int") {

				        return "integer";

				    }

				    return type;

				}
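format_type now takes and returns std::string_view. Returning the literal "integer" is safe because string literals have static storage duration. In every other case the function returns the caller's view, so the result is valid only while the caller's string is alive. Below is a self-contained copy of the function. The checks that accompany it confirm that the non-"int" path returns the caller's own characters.

```cpp
#include <string>
#include <string_view>

// Maps the config type name "int" to the Swagger type name "integer".
// The literal has static storage; any other input is returned as-is, so the
// returned view points into the caller's storage.
std::string_view format_type(std::string_view type) {
    if (type == "int") {
        return "integer";
    }
    return type;
}
```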

				future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {

				future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {

				    std::stringstream ss;

				    if (first) {

				        first=false;

				@@ -87,23 +88,29 @@ future<> get_config_swagger_entry(const std::string& name, const std::string& de

				}

				namespace cs = httpd::config_json;

				#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}

				#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});

				void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {

				    rb->register_function(r, [] (output_stream<char>& os) {

				        return do_with(true, [&os] (bool& first) {

				    rb->register_function(r, [&ctx] (output_stream<char>& os) {

				        return do_with(true, [&os, &ctx] (bool& first) {

				            auto f = make_ready_future();

				            _make_config_values(_get_config_description)

				            for (auto&& cfg_ref : ctx.db.local().get_config().values()) {

				                auto&& cfg = cfg_ref.get();

				                f = f.then([&os, &first, &cfg] {

				                    return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);

				                });

				            }

				            return f;

				        });

				    });

				    cs::find_config_id.set(r, [&ctx] (const_req r) {

				        auto id = r.param["id"];

				        _make_config_values(_get_config_value)

				        for (auto&& cfg_ref : ctx.db.local().get_config().values()) {

				            auto&& cfg = cfg_ref.get();

				            if (id == cfg.name()) {

				                return cfg.value_as_json();

				            }

				        }

				        throw bad_param_exception(sstring("No such config entry: ") + id);

				    });

				}
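The reworked config endpoint above iterates over the runtime config values and emits one swagger entry per option. It uses a shared `first` flag so that only entries after the first are preceded by a comma. The standalone sketch below illustrates that pattern together with the `format_type` mapping from "int" to "integer".

It is a sketch under stated assumptions, not Scylla's code:
- `config_entry`, `append_swagger_entry` and `build_properties` are hypothetical names.
- The JSON layout is assumed, because the body of `get_config_swagger_entry` is elided in this diff.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a config entry; not Scylla's db::config API.
struct config_entry {
    std::string name;
    std::string desc;
    std::string type_name;
};

// Same mapping as format_type() in the diff: "int" becomes swagger's "integer".
std::string_view format_type(std::string_view type) {
    if (type == "int") {
        return "integer";
    }
    return type;
}

// Appends one entry; `first` suppresses the separating comma for the first entry only.
// The JSON layout is an assumption, and descriptions are not JSON-escaped here.
void append_swagger_entry(std::ostream& os, const config_entry& e, bool& first) {
    if (first) {
        first = false;
    } else {
        os << ",";
    }
    os << "\"" << e.name << "\":{\"type\":\"" << format_type(e.type_name)
       << "\",\"description\":\"" << e.desc << "\"}";
}

// Builds a JSON object containing one property per config entry.
std::string build_properties(const std::vector<config_entry>& entries) {
    std::ostringstream os;
    bool first = true;
    os << "{";
    for (const auto& e : entries) {
        append_swagger_entry(os, e, first);
    }
    os << "}";
    return os.str();
}
```

For example, `build_properties({{"num_tokens", "Number of tokens", "int"}})` yields `{"num_tokens":{"type":"integer","description":"Number of tokens"}}`. The flag is passed by reference, as in the diff, because each entry is produced by a separate continuation.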

									

api/error_injection.cc
									
				@@ -0,0 +1,66 @@

				/*

				 * Copyright (C) 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "api/api-doc/error_injection.json.hh"

				#include "api/api.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				#include "utils/error_injection.hh"

				#include "seastar/core/future-util.hh"

				namespace api {

				namespace hf = httpd::error_injection_json;

				void set_error_injection(http_context& ctx, routes& r) {

				    hf::enable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        bool one_shot = req->get_query_param("one_shot") == "True";

				        auto& errinj = utils::get_local_injector();

				        errinj.enable_on_all(injection, one_shot);

				        return make_ready_future<json::json_return_type>(json::json_void());

				    });

				    hf::get_enabled_injections_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        auto ret = errinj.enabled_injections_on_all();

				        return make_ready_future<json::json_return_type>(ret);

				    });

				    hf::disable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        auto& errinj = utils::get_local_injector();

				        errinj.disable_on_all(injection);

				        return make_ready_future<json::json_return_type>(json::json_void());

				    });

				    hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        errinj.disable_on_all();

				        return make_ready_future<json::json_return_type>(json::json_void());

				    });

				}

				} // namespace api
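The error-injection handlers above expose four operations:
- enable an injection, optionally as one-shot;
- list the enabled injections;
- disable a single injection;
- disable all injections.

The single-shard sketch below models those semantics. `simple_injector` is hypothetical and is not `utils::error_injection`. It assumes that "one-shot" means the injection disarms itself after firing once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical single-shard injector illustrating the REST operations above;
// it is not Scylla's utils::error_injection class.
class simple_injector {
    // Maps an injection name to its one_shot flag.
    std::map<std::string, bool> _enabled;
public:
    void enable(const std::string& name, bool one_shot = false) {
        _enabled[name] = one_shot;
    }
    void disable(const std::string& name) {
        _enabled.erase(name);
    }
    void disable_all() {
        _enabled.clear();
    }
    // Returns true if the injection should fire. A one-shot injection
    // removes itself after firing once (assumed semantics).
    bool triggered(const std::string& name) {
        auto it = _enabled.find(name);
        if (it == _enabled.end()) {
            return false;
        }
        if (it->second) {
            _enabled.erase(it);
        }
        return true;
    }
    // Names of the currently enabled injections, in sorted order.
    std::vector<std::string> enabled_injections() const {
        std::vector<std::string> names;
        for (const auto& kv : _enabled) {
            names.push_back(kv.first);
        }
        return names;
    }
};
```

Note that the `enable_injection` handler sets `one_shot` only when the query parameter is exactly the string `True`. Any other value, including lowercase `true`, enables a persistent injection.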

									

stdx.hh → api/error_injection.hh
									
				@@ -1,6 +1,5 @@

				/*

				 * Copyright (C) 2016 ScyllaDB

				 * Copyright (C) 2019 ScyllaDB

				 */

				/*

				@@ -22,6 +21,10 @@

				#pragma once

				namespace std { namespace experimental {} }

				namespace seastar { namespace stdx = std::experimental; }

				using namespace seastar;

				#include "api.hh"

				namespace api {

				void set_error_injection(http_context& ctx, routes& r);

				}

									

api/lsa.cc
									
				@@ -23,7 +23,7 @@

				#include "api/lsa.hh"

				#include "api/api.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "utils/logalloc.hh"

				#include "log.hh"

									

api/messaging_service.cc
									
				@@ -21,7 +21,7 @@

				#include "messaging_service.hh"

				#include "message/messaging_service.hh"

				#include "rpc/rpc_types.hh"

				#include <seastar/rpc/rpc_types.hh>

				#include "api/api-doc/messaging_service.json.hh"

				#include <iostream>

				#include <sstream>

				@@ -76,7 +76,7 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)

				        auto get_shard_map = [f](messaging_service& ms) {

				            std::unordered_map<gms::inet_address, unsigned long> map;

				            ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {

				                map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);

				                map[gms::inet_address(info.addr.addr())] = f(stats);

				            });

				            return map;

				        };

				@@ -139,7 +139,7 @@ void set_messaging_service(http_context& ctx, routes& r) {

				                messaging_verb v = i; // for type safety we use messaging_verb values

				                auto idx = static_cast<uint32_t>(v);

				                if (idx >= map->size()) {

				                    throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));

				                    throw std::runtime_error(format("verb index out of bounds: {:d}, map size: {:d}", idx, map->size()));

				                }

				                if ((*map)[idx] > 0) {

				                    c.count = (*map)[idx];

									

api/storage_proxy.cc
									
				@@ -26,6 +26,8 @@

				#include "service/storage_service.hh"

				#include "db/config.hh"

				#include "utils/histogram.hh"

				#include "database.hh"

				#include "seastar/core/scheduling_specific.hh"

				namespace api {

				@@ -33,12 +35,70 @@ namespace sp = httpd::storage_proxy_json;

				using proxy = service::storage_proxy;

				using namespace json;

				static future<utils::rate_moving_average>  sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				    return d.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average(),

				            std::plus<utils::rate_moving_average>());

				/**

				 * This function implements a two-dimensional map-reduce where

				 * the first level is a distributed storage_proxy class and the

				 * second level is the stats per scheduling group class.

				 * @param d -  a reference to the storage_proxy distributed class.

				 * @param mapper -  the internal mapper that is used to map the internal

				 * stat class into a value of type `V`.

				 * @param reducer - the reducer that is used in both outer and inner

				 * aggregations.

				 * @param initial_value - the initial value to use for both aggregations

				 * @return A future that resolves to the result of the aggregation.

				 */

				template<typename V, typename Reducer, typename InnerMapper>

				future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				        InnerMapper mapper, Reducer reducer, V initial_value) {

				    return d.map_reduce0( [mapper, reducer, initial_value] (const service::storage_proxy& sp) {

				        return map_reduce_scheduling_group_specific<service::storage_proxy_stats::stats>(

				                mapper, reducer, initial_value, sp.get_stats_key());

				    }, initial_value, reducer);

				}

				static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				/**

				 * This function implements a two-dimensional map-reduce where

				 * the first level is a distributed storage_proxy class and the

				 * second level is the stats per scheduling group class.

				 * @param d -  a reference to the storage_proxy distributed class.

				 * @param f - a pointer to a stats field; it acts as the implicit inner mapper.

				 * @param reducer - the reducer that is used in both outer and inner

				 * aggregations.

				 * @param initial_value - the initial value to use for both aggregations.

				 * @return A future that resolves to the result of the aggregation.

				 */

				template<typename V, typename Reducer, typename F>

				future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				        V F::*f, Reducer reducer, V initial_value) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) {

				        return stats.*f;

				    }, reducer, initial_value);

				}

				/**

				 * A variant of sum_stats for the storage proxy case, where

				 * the get-stats function returns a per-scheduling-group

				 * stats object rather than a single stats object with

				 * fields. It has a different name because C++ does not

				 * support partial specialization of function templates.

				 *

				 */

				template<typename V, typename F>

				future<json::json_return_type>  sum_stats_storage_proxy(distributed<proxy>& d, V F::*f) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) { return stats.*f; }, std::plus<V>(), V(0)).then([] (V val) {

				        return make_ready_future<json::json_return_type>(val);

				    });

				}

				static future<utils::rate_moving_average>  sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).rate();

				    }, std::plus<utils::rate_moving_average>(), utils::rate_moving_average());

				}

				static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {

				        httpd::utils_json::rate_moving_average m;

				        m = val;

				@@ -46,29 +106,76 @@ static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>&

				    });

				}

				static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {

				    return timer_to_json(utils::rate_moving_average_and_histogram());

				}

				static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {

				        return make_ready_future<json::json_return_type>(val.count);

				    });

				}

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::estimated_histogram proxy::stats::*f) {

				    return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, utils::estimated_histogram(),

				            utils::estimated_histogram_merge).then([](const utils::estimated_histogram& val) {

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::estimated_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, f, utils::estimated_histogram_merge,

				            utils::estimated_histogram()).then([](const utils::estimated_histogram& val) {

				        utils_json::estimated_histogram res;

				        res = val;

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram proxy::stats::*f) {

				    return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).hist.mean * (p.get_stats().*f).hist.count;}, 0.0,

				            std::plus<double>()).then([](double val) {

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {

				            return (stats.*f).hist.mean * (stats.*f).hist.count;

				        }, std::plus<double>(), 0.0).then([](double val) {

				        int64_t res = val;

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				/**

				 * A variant of sum_histogram_stats for the storage

				 * proxy case, where the get-stats function returns a

				 * per-scheduling-group stats object rather than a single

				 * stats object with fields. It has a different name

				 * because C++ does not support partial specialization

				 * of function templates.

				 */

				template<typename F>

				future<json::json_return_type>

				sum_histogram_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).hist;

				    }, std::plus<utils::ihistogram>(), utils::ihistogram()).

				            then([](const utils::ihistogram& val) {

				        return make_ready_future<json::json_return_type>(to_json(val));

				    });

				}

				/**

				 * A variant of sum_timer_stats for the storage proxy

				 * case, where the get-stats function returns a

				 * per-scheduling-group stats object rather than a single

				 * stats object with fields. It has a different name

				 * because C++ does not support partial specialization

				 * of function templates.

				 */

				template<typename F>

				future<json::json_return_type>

				sum_timer_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).rate();

				    }, std::plus<utils::rate_moving_average_and_histogram>(),

				            utils::rate_moving_average_and_histogram()).then([](const utils::rate_moving_average_and_histogram& val) {

				        return make_ready_future<json::json_return_type>(timer_to_json(val));

				    });

				}

				void set_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_total_hints.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				@@ -76,12 +183,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				        // FIXME

				        // hinted handoff is not supported currently,

				        // so we should return false

				        return make_ready_future<json::json_return_type>(false);

				    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req)  {

				        auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();

				        return make_ready_future<json::json_return_type>(enabled);

				    });

				    sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				@@ -221,15 +325,15 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    });

				    sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_attempts);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_attempts);

				    });

				    sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_blocking);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_blocking);

				    });

				    sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_background);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_background);

				    });

				    sp::get_schema_versions.set(r, [](std::unique_ptr<request> req)  {

				@@ -245,163 +349,154 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				        });

				    });

				    sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);

				    });

				    sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);

				    });

				    sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);

				    });

				    sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);

				    });

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);

				    });

				    sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);

				    });

				    sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);

				    });

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_failed_read_round_optimization);

				    });

				    sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);

				    });

				    sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);

				    });

				    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::range);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::write);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::read);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::range);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::write);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);

				    });

				    sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);

				    });

				    sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // No View metrics are available, so just return empty moving average

				        return make_ready_future<json::json_return_type>(get_empty_moving_average());

				    });

				    sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::read);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_read);

				    });

				    sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::read);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::estimated_write);

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_write);

				    });

				    sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::write);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::range);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::range);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::range);

				    });

				}
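The storage proxy handlers now aggregate statistics in two levels:
- across shards, via `distributed<storage_proxy>::map_reduce0`;
- across each shard's per-scheduling-group stats, via `map_reduce_scheduling_group_specific`.

The sequential sketch below shows the same shape with plain vectors. The outer vector stands for shards and each inner vector for the scheduling groups on a shard. `sg_stats` and its fields are hypothetical stand-ins for `service::storage_proxy_stats::stats`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-scheduling-group stats record.
struct sg_stats {
    long read_timeouts = 0;
    double latency_mean = 0.0;
    long latency_count = 0;
};

// One shard holds the stats of each of its scheduling groups.
using shard_stats = std::vector<sg_stats>;

// Inner level: map and reduce the scheduling groups of one shard.
// Outer level: reduce the per-shard results with the same reducer and seed.
template <typename V, typename Mapper, typename Reducer>
V two_dimensional_map_reduce(const std::vector<shard_stats>& shards,
                             Mapper mapper, Reducer reducer, V initial_value) {
    V total = initial_value;
    for (const auto& shard : shards) {
        V per_shard = initial_value;
        for (const auto& sg : shard) {
            per_shard = reducer(per_shard, mapper(sg));
        }
        total = reducer(total, per_shard);
    }
    return total;
}

// Overload taking a pointer to a stats field, which acts as the mapper.
template <typename V, typename Reducer>
V two_dimensional_map_reduce(const std::vector<shard_stats>& shards,
                             V sg_stats::*field, Reducer reducer, V initial_value) {
    return two_dimensional_map_reduce(shards,
        [field](const sg_stats& s) { return s.*field; }, reducer, initial_value);
}
```

In this sketch, the same seed starts both levels, as `initial_value` does in the diff. It must therefore be an identity for the reducer, such as `0` for `std::plus`; otherwise the seed is counted once per shard plus once more. `total_latency` follows the lambda form: it maps each group to `mean * count` before summing, so the result is a total rather than an average of averages.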

									

api/storage_service.cc
									
				@@ -22,19 +22,25 @@

				#include "storage_service.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "db/config.hh"

				#include <optional>

				#include <time.h>

				#include <boost/range/adaptor/map.hpp>

				#include <boost/range/adaptor/filtered.hpp>

				#include <service/storage_service.hh>

				#include <db/commitlog/commitlog.hh>

				#include <gms/gossiper.hh>

				#include <db/system_keyspace.hh>

				#include "http/exception.hh"

				#include "service/storage_service.hh"

				#include "service/load_meter.hh"

				#include "db/commitlog/commitlog.hh"

				#include "gms/gossiper.hh"

				#include "db/system_keyspace.hh"

				#include "seastar/http/exception.hh"

				#include "repair/repair.hh"

				#include "locator/snitch_base.hh"

				#include "column_family.hh"

				#include "log.hh"

				#include "release.hh"

				#include "sstables/compaction_manager.hh"

				#include "sstables/sstables.hh"

				#include "database.hh"

				#include "db/extensions.hh"

				namespace api {

				@@ -48,27 +54,35 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {

				    throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");

				}

				static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {

				    std::vector<ss::token_range> res;

				    for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {

				        ss::token_range r;

				        r.start_token = d._start_token;

				        r.end_token = d._end_token;

				        r.endpoints = d._endpoints;

				        r.rpc_endpoints = d._rpc_endpoints;

				        for (auto det : d._endpoint_details) {

				            ss::endpoint_detail ed;

				            ed.host = det._host;

				            ed.datacenter = det._datacenter;

				            if (det._rack != "") {

				                ed.rack = det._rack;

				            }

				            r.endpoint_details.push(ed);

				static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {

				    ss::token_range r;

				    r.start_token = d._start_token;

				    r.end_token = d._end_token;

				    r.endpoints = d._endpoints;

				    r.rpc_endpoints = d._rpc_endpoints;

				    for (auto det : d._endpoint_details) {

				        ss::endpoint_detail ed;

				        ed.host = det._host;

				        ed.datacenter = det._datacenter;

				        if (det._rack != "") {

				            ed.rack = det._rack;

				        }

				        res.push_back(r);

				        r.endpoint_details.push(ed);

				    }

				    return res;

				    return r;

				}

				using ks_cf_func = std::function<future<json::json_return_type>(http_context&, std::unique_ptr<request>, sstring, std::vector<sstring>)>;

				static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {

				    return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_families = split_cf(req->get_query_param("cf"));

				        if (column_families.empty()) {

				            column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				        }

				        return f(ctx, std::move(req), std::move(keyspace), std::move(column_families));

				    };

				}

				void set_storage_service(http_context& ctx, routes& r) {

				@@ -78,15 +92,15 @@ void set_storage_service(http_context& ctx, routes& r) {

				        });

				    });

				    ss::get_tokens.set(r, [] (std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().sorted_tokens(), [](const dht::token& i) {

				    ss::get_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().sorted_tokens(), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				        }));

				    });

				    ss::get_node_tokens.set(r, [] (std::unique_ptr<request> req) {

				    ss::get_node_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {

				        gms::inet_address addr(req->param["endpoint"]);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().get_tokens(addr), [](const dht::token& i) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().get_tokens(addr), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				       }));

				    });

				@@ -104,8 +118,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				        }));

				    });

				    ss::get_leaving_nodes.set(r, [](const_req req) {

				        return container_to_vec(service::get_local_storage_service().get_token_metadata().get_leaving_endpoints());

				    ss::get_leaving_nodes.set(r, [&ctx](const_req req) {

				        return container_to_vec(ctx.token_metadata.local().get_leaving_endpoints());

				    });

				    ss::get_moving_nodes.set(r, [](const_req req) {

				@@ -113,8 +127,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return container_to_vec(addr);

				    });

				    ss::get_joining_nodes.set(r, [](const_req req) {

				        auto points = service::get_local_storage_service().get_token_metadata().get_bootstrap_tokens();

				    ss::get_joining_nodes.set(r, [&ctx](const_req req) {

				        auto points = ctx.token_metadata.local().get_bootstrap_tokens();

				        std::unordered_set<sstring> addr;

				        for (auto i: points) {

				            addr.insert(boost::lexical_cast<std::string>(i.second));

				@@ -157,27 +171,26 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				    ss::describe_any_ring.set(r, [&ctx](const_req req) {

				        return describe_ring("");

				    ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));

				    });

				    ss::describe_ring.set(r, [&ctx](const_req req) {

				        auto keyspace = validate_keyspace(ctx, req.param);

				        return describe_ring(keyspace);

				    ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));

				    });

				    ss::get_host_id_map.set(r, [](const_req req) {

				    ss::get_host_id_map.set(r, [&ctx](const_req req) {

				        std::vector<ss::mapper> res;

				        return map_to_key_value(service::get_local_storage_service().

				                get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);

				        return map_to_key_value(ctx.token_metadata.local().get_endpoint_to_host_id_map_for_reading(), res);

				    });

				    ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);

				        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);

				    });

				    ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {

				    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {

				            std::vector<ss::map_string_double> res;

				            for (auto i : load_map) {

				                ss::map_string_double val;

				@@ -202,64 +215,6 @@ void set_storage_service(http_context& ctx, routes& r) {

				                req.get_query_param("key")));

				    });

				    ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().get_snapshot_details().then([] (auto result) {

				            std::vector<ss::snapshots> res;

				            for (auto& map: result) {

				                ss::snapshots all_snapshots;

				                all_snapshots.key = map.first;

				                std::vector<ss::snapshot> snapshot;

				                for (auto& cf: map.second) {

				                    ss::snapshot s;

				                    s.ks = cf.ks;

				                    s.cf = cf.cf;

				                    s.live = cf.live;

				                    s.total = cf.total;

				                    snapshot.push_back(std::move(s));

				                }

				                all_snapshots.value = std::move(snapshot);

				                res.push_back(std::move(all_snapshots));

				            }

				            return make_ready_future<json::json_return_type>(std::move(res));

				        });

				    });

				    ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_family = req->get_query_param("cf");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        auto resp = make_ready_future<>();

				        if (column_family.empty()) {

				            resp = service::get_local_storage_service().take_snapshot(tag, keynames);

				        } else {

				            if (keynames.size() > 1) {

				                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");

				            }

				            resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);

				        }

				        return resp.then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        return service::get_local_storage_service().clear_snapshot(tag, keynames).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {

				            return make_ready_future<json::json_return_type>(size);

				        });

				    });

				    ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_families = split_cf(req->get_query_param("cf"));

				@@ -285,38 +240,40 @@ void set_storage_service(http_context& ctx, routes& r) {

				        if (column_families.empty()) {

				            column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				        }

				        return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {

				            std::vector<column_family*> column_families_vec;

				            auto& cm = db.get_compaction_manager();

				            for (auto cf : column_families) {

				                column_families_vec.push_back(&db.find_column_family(keyspace, cf));

				        return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,

				                column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {

				            if (!is_cleanup_allowed) {

				                return make_exception_future<json::json_return_type>(

				                        std::runtime_error("Can not perform cleanup operation when topology changes"));

				            }

				            return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {

				                return cm.perform_cleanup(cf);

				            return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {

				                std::vector<column_family*> column_families_vec;

				                auto& cm = db.get_compaction_manager();

				                for (auto cf : column_families) {

				                    column_families_vec.push_back(&db.find_column_family(keyspace, cf));

				                }

				                return parallel_for_each(column_families_vec, [&cm, &db] (column_family* cf) {

				                    return cm.perform_cleanup(db, cf);

				                });

				            }).then([]{

				                return make_ready_future<json::json_return_type>(0);

				            });

				        });

				    });

				    ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);

				        return ctx.db.invoke_on_all([=] (database& db) {

				            return do_for_each(column_families, [=, &db](sstring cfname) {

				                auto& cm = db.get_compaction_manager();

				                auto& cf = db.find_column_family(keyspace, cfname);

				                return cm.perform_sstable_upgrade(&cf, exclude_current_version);

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    });

				    ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        auto disable_snapshot = req->get_query_param("disable_snapshot");

				        auto skip_corrupted = req->get_query_param("skip_corrupted");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::upgrade_sstables.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        auto exclude_current_version = req->get_query_param("exclude_current_version");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    }));

				    ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				@@ -454,7 +411,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {

				            return ss.get_drain_progress();

				        }).then([] (auto&& progress) {

				            auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);

				            auto progress_str = format("Drained {}/{} ColumnFamilies", progress.remaining_cfs, progress.total_cfs);

				            return make_ready_future<json::json_return_type>(std::move(progress_str));

				        });

				    });

				@@ -559,9 +516,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::join_ring.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().join_ring().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::is_joined.set(r, [] (std::unique_ptr<request> req) {

				@@ -665,7 +620,11 @@ void set_storage_service(http_context& ctx, routes& r) {

				        auto coordinator = std::hash<sstring>()(cf) % smp::count;

				        return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {

				            return s.load_new_sstables(ks, cf);

				        }).then([] {

				        }).then_wrapped([] (auto&& f) {

				            if (f.failed()) {

				                auto msg = fmt::format("Failed to load new sstables: {}", f.get_exception());

				                return make_exception_future<json::json_return_type>(httpd::server_error_exception(msg));

				            }

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				@@ -699,7 +658,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				            } catch (std::out_of_range& e) {

				                throw httpd::bad_param_exception(e.what());

				            } catch (std::invalid_argument&){

				                throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));

				                throw httpd::bad_param_exception(format("Bad format in a probability value: \"{}\"", probability.c_str()));

				            }

				        });

				    });

				@@ -735,7 +694,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				                return make_ready_future<json::json_return_type>(json_void());

				            });

				        } catch (...) {

				            throw httpd::bad_param_exception(sprint("Bad format value: "));

				            throw httpd::bad_param_exception(format("Bad format value: "));

				        }

				    });

				@@ -817,7 +776,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);

				        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);

				    });

				    ss::get_exceptions.set(r, [](const_req req) {

				@@ -859,6 +818,236 @@ void set_storage_service(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));

				        });

				    });

				    ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {

				        auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;

				        auto cf = api::req_param<sstring>(*req, "cf", {}).value;

				        // The size of this vector is bound by ks::cf. I.e. it is at most Nks + Ncf long

				        // which is not small, but not huge either. 

				        using table_sstables_list = std::vector<ss::table_sstables>;

				        return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {

				            return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {

				                for (auto&& t : res) {

				                    auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {

				                        return t.keyspace() == t2.keyspace() && t.table() == t2.table();

				                    });

				                    if (i == dst.end()) {

				                        dst.emplace_back(std::move(t));

				                        continue;

				                    }

				                    auto& ssd = i->sstables; 

				                    for (auto&& sd : t.sstables._elements) {

				                        auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {

				                            return s.generation() == sd.generation();

				                        });

				                        if (j == ssd._elements.end()) {

				                            i->sstables.push(std::move(sd));

				                        }

				                    }

				                }

				            }, [ks, cf](const database& db) {

				                // see above

				                table_sstables_list res;

				                auto& ext = db.get_config().extensions();

				                for (auto& t : db.get_column_families() | boost::adaptors::map_values) {

				                    auto& schema = t->schema();

				                    if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {

				                        // at most Nsstables long

				                        ss::table_sstables tst;

				                        tst.keyspace = schema->ks_name();

				                        tst.table = schema->cf_name();

				                        for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {

				                            auto ts = db_clock::to_time_t(sstable->data_file_write_time());

				                            ::tm t;

				                            ::gmtime_r(&ts, &t);

				                            ss::sstable info;

				                            info.timestamp = t;

				                            info.generation = sstable->generation();

				                            info.level = sstable->get_sstable_level();

				                            info.size = sstable->bytes_on_disk();

				                            info.data_size = sstable->ondisk_data_size();

				                            info.index_size = sstable->index_size();

				                            info.filter_size = sstable->filter_size();

				                            info.version = sstable->get_version();

				                            if (sstable->has_component(sstables::component_type::CompressionInfo)) {

				                                auto& c = sstable->get_compression();

				                                auto cp = sstables::get_sstable_compressor(c);

				                                ss::named_maps nm;

				                                nm.group = "compression_parameters";

				                                for (auto& p : cp->options()) {

				                                    ss::mapper e;

				                                    e.key = p.first;

				                                    e.value = p.second;

				                                    nm.attributes.push(std::move(e));

				                                }

				                                if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {

				                                    ss::mapper e;

				                                    e.key = compression_parameters::SSTABLE_COMPRESSION;

				                                    e.value = cp->name();

				                                    nm.attributes.push(std::move(e));

				                                }

				                                info.extended_properties.push(std::move(nm));

				                            }

				                            sstables::file_io_extension::attr_value_map map;

				                            for (auto* ep : ext.sstable_file_io_extensions()) {

				                                map.merge(ep->get_attributes(*sstable));

				                            }

				                            for (auto& p : map) {

				                                struct {

				                                    const sstring& key; 

				                                    ss::sstable& info;

				                                    void operator()(const std::map<sstring, sstring>& map) const {

				                                        ss::named_maps nm;

				                                        nm.group = key;

				                                        for (auto& p : map) {

				                                            ss::mapper e;

				                                            e.key = p.first;

				                                            e.value = p.second;

				                                            nm.attributes.push(std::move(e));

				                                        }

				                                        info.extended_properties.push(std::move(nm));

				                                    }

				                                    void operator()(const sstring& value) const {

				                                        ss::mapper e;

				                                        e.key = key;

				                                        e.value = value;

				                                        info.properties.push(std::move(e));                                        

				                                    }

				                                } v{p.first, info};

				                                std::visit(v, p.second);

				                            }

				                            tst.sstables.push(std::move(info));

				                        }

				                        res.emplace_back(std::move(tst));

				                    }

				                }

				                std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {

				                    return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());

				                });

				                return res;

				            }).then([&dst] {

				                return make_ready_future<json::json_return_type>(stream_object(dst));

				            });

				        });

				    });

				}

				void set_snapshot(http_context& ctx, routes& r) {

				    ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&s, &first] {

				                    return service::get_local_storage_service().get_snapshot_details().then([&s, &first] (std::unordered_map<sstring, std::vector<service::storage_service::snapshot_details>>&& result) {

				                        return do_with(std::move(result), [&s, &first](const std::unordered_map<sstring, std::vector<service::storage_service::snapshot_details>>& result) {

				                            return do_for_each(result, [&s, &result,&first](std::tuple<sstring, std::vector<service::storage_service::snapshot_details>>&& map){

				                                return do_with(ss::snapshots(), [&s, &first, &result, &map](ss::snapshots& all_snapshots) {

				                                    all_snapshots.key = std::get<0>(map);

				                                    future<> f = first ? make_ready_future<>() : s.write(", ");

				                                    first = false;

				                                    std::vector<ss::snapshot> snapshot;

				                                    for (auto& cf: std::get<1>(map)) {

				                                        ss::snapshot snp;

				                                        snp.ks = cf.ks;

				                                        snp.cf = cf.cf;

				                                        snp.live = cf.live;

				                                        snp.total = cf.total;

				                                        snapshot.push_back(std::move(snp));

				                                    }

				                                    all_snapshots.value = std::move(snapshot);

				                                    return f.then([&s, &all_snapshots] {

				                                        return all_snapshots.write(s);

				                                    });

				                                });

				                            });

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                    });

				                });

				            });

				        };

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });

				    ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_family = req->get_query_param("cf");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        auto resp = make_ready_future<>();

				        if (column_family.empty()) {

				            resp = service::get_local_storage_service().take_snapshot(tag, keynames);

				        } else {

				            if (keynames.empty()) {

				                throw httpd::bad_param_exception("The keyspace of column families must be specified");

				            }

				            if (keynames.size() > 1) {

				                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");

				            }

				            resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);

				        }

				        return resp.then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_family = req->get_query_param("cf");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        return service::get_local_storage_service().clear_snapshot(tag, keynames, column_family).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {

				            return make_ready_future<json::json_return_type>(size);

				        });

				    });

				    ss::scrub.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        const auto skip_corrupted = req_param<bool>(*req, "skip_corrupted", false);

				        auto f = make_ready_future<>();

				        if (!req_param<bool>(*req, "disable_snapshot", false)) {

				            auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());

				            f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {

				                return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);

				            });

				        }

				        return f.then([&ctx, keyspace, column_families, skip_corrupted] {

				            return ctx.db.invoke_on_all([=] (database& db) {

				                return do_for_each(column_families, [=, &db](sstring cfname) {

				                    auto& cm = db.get_compaction_manager();

				                    auto& cf = db.find_column_family(keyspace, cfname);

				                    return cm.perform_sstable_scrub(&cf, skip_corrupted);

				                });

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    }));

				}

				}
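The `sstable_info` handler above collects per-shard results with `map_reduce`. Its reducer folds each shard's list into `dst`, matching tables by keyspace and table name and appending only sstables whose generation is not already listed, presumably because an sstable shared between shards can be reported more than once. Below is a minimal standalone sketch of that merge step only. It assumes simplified hypothetical types `table_sstables` and `sstable_desc` in place of the generated `ss::` JSON types, and plain `std::vector` in place of their `push()`/`_elements` containers.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated ss::sstable JSON type.
struct sstable_desc {
    int64_t generation;
};

// Hypothetical stand-in for the generated ss::table_sstables JSON type.
struct table_sstables {
    std::string keyspace;
    std::string table;
    std::vector<sstable_desc> sstables;
};

// Fold one shard's result into the accumulated list `dst`:
// tables are matched by (keyspace, table); within a matched table,
// an sstable is appended only if its generation is not present yet.
void merge_shard_result(std::vector<table_sstables>& dst, std::vector<table_sstables>&& res) {
    for (auto&& t : res) {
        auto i = std::find_if(dst.begin(), dst.end(), [&t](const table_sstables& t2) {
            return t.keyspace == t2.keyspace && t.table == t2.table;
        });
        if (i == dst.end()) {
            // First time this table is seen: take the shard's entry as-is.
            dst.push_back(std::move(t));
            continue;
        }
        for (auto&& sd : t.sstables) {
            auto j = std::find_if(i->sstables.begin(), i->sstables.end(), [&sd](const sstable_desc& s) {
                return s.generation == sd.generation;
            });
            if (j == i->sstables.end()) {
                i->sstables.push_back(sd);
            }
        }
    }
}
```

The linear `find_if` scans mirror the handler; they stay cheap as long as the number of tables and sstables per table is modest, as the comment in the handler suggests.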

									

api/storage_service.hh
									
				@@ -26,5 +26,6 @@

				namespace api {

				void set_storage_service(http_context& ctx, routes& r);

				void set_snapshot(http_context& ctx, routes& r);

				}
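`set_snapshot`, declared above, registers the snapshot endpoints. Its `get_snapshot_details` handler streams the response instead of building it as one vector: it writes `[`, then each `{key, value}` object preceded by `, ` except the first (tracked by the `first` flag), then `]`, and closes the stream. Below is a minimal sketch of the same separator logic. It assumes `std::ostream` in place of seastar's `output_stream<char>`, a hypothetical simplified `snapshot_details` struct, and hand-written JSON field names (`key`, `value`, `ks`, `cf`, `live`, `total`) taken from the members the handler assigns, not from the generated schema. It also performs no string escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical simplified stand-in for storage_service::snapshot_details.
struct snapshot_details {
    std::string ks;
    std::string cf;
    int64_t live;
    int64_t total;
};

// Snapshot tag -> per-table details (std::map only for deterministic order).
using snapshot_map = std::map<std::string, std::vector<snapshot_details>>;

// Streams the result as one JSON array of {"key": tag, "value": [...]} objects.
// The `first` flag suppresses the ", " separator before the first element.
// Strings are written verbatim; real code would need JSON escaping.
void write_snapshot_details(std::ostream& out, const snapshot_map& result) {
    bool first = true;
    out << "[";
    for (const auto& [tag, tables] : result) {
        if (!first) {
            out << ", ";
        }
        first = false;
        out << "{\"key\": \"" << tag << "\", \"value\": [";
        for (std::size_t i = 0; i < tables.size(); ++i) {
            const auto& s = tables[i];
            out << (i ? ", " : "")
                << "{\"ks\": \"" << s.ks << "\", \"cf\": \"" << s.cf
                << "\", \"live\": " << s.live << ", \"total\": " << s.total << "}";
        }
        out << "]}";
    }
    out << "]";
}

// Convenience wrapper: render the array into a string.
std::string snapshot_details_json(const snapshot_map& result) {
    std::ostringstream os;
    write_snapshot_details(os, result);
    return os.str();
}
```

The handler uses `std::unordered_map`, so its element order is unspecified; the sketch switches to `std::map` only to make the output reproducible.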

									

api/system.cc
									
				@@ -22,7 +22,7 @@

				#include "api/api-doc/system.json.hh"

				#include "api/api.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				namespace api {

				@@ -30,6 +30,10 @@ namespace api {

				namespace hs = httpd::system_json;

				void set_system(http_context& ctx, routes& r) {

				    hs::get_system_uptime.set(r, [](const_req req) {

				        return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();

				    });

				    hs::get_all_logger_names.set(r, [](const_req req) {

				        return logging::logger_registry().get_all_logger_names();

				    });

									

atomic_cell.cc
									
				@@ -21,6 +21,7 @@

				#include "atomic_cell.hh"

				#include "atomic_cell_or_collection.hh"

				#include "counters.hh"

				#include "types.hh"

				/// LSA migrator for cells with irrelevant type

				@@ -147,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,

				{

				}

				static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)

				{

				    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);

				    auto ti = data::type_info::make_collection();

				    data::cell::context ctx(f, ti);

				    auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);

				    auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());

				    return collection_mutation_view { dv };

				}

				collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {

				    return get_collection_mutation_view(_data.get());

				}

				collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)

				    : _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))

				{

				}

				collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)

				    : _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))

				{

				}

				collection_mutation::operator collection_mutation_view() const

				{

				    return get_collection_mutation_view(_data.get());

				}

				bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const

				{

				    auto ptr_a = _data.get();

				@@ -191,20 +163,20 @@ bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_c

				        if (a.timestamp() != b.timestamp()) {

				            return false;

				        }

				        if (a.is_live() != b.is_live()) {

				            return false;

				        }

				        if (a.is_live()) {

				            if (!b.is_live()) {

				            if (a.is_counter_update() != b.is_counter_update()) {

				                return false;

				            }

				            if (a.is_counter_update()) {

				                if (!b.is_counter_update()) {

				                    return false;

				                }

				                return a.counter_update_value() == b.counter_update_value();

				            }

				            if (a.is_live_and_has_ttl() != b.is_live_and_has_ttl()) {

				                return false;

				            }

				            if (a.is_live_and_has_ttl()) {

				                if (!b.is_live_and_has_ttl()) {

				                    return false;

				                }

				                if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {

				                    return false;

				                }

				@@ -230,7 +202,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)

				    size_t external_value_size = 0;

				    if (flags.get<data::cell::tags::external_data>()) {

				        if (flags.get<data::cell::tags::collection>()) {

				            external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();

				            external_value_size = as_collection_mutation().data.size_bytes();

				        } else {

				            auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);

				            external_value_size = cell_view.value_size();

				@@ -243,16 +215,73 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)

				        + imr_object_type::size_overhead + external_value_size;

				}

				std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {

				    if (!c._data.get()) {

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view& acv) {

				    if (acv.is_live()) {

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            acv.is_counter_update()

				                    ? "counter_update_value=" + to_sstring(acv.counter_update_value())

				                    : to_hex(acv.value().linearize()),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell& ac) {

				    return os << atomic_cell_view(ac);

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {

				    auto& type = acvp._type;

				    auto& acv = acvp._cell;

				    if (acv.is_live()) {

				        std::ostringstream cell_value_string_builder;

				        if (type.is_counter()) {

				            if (acv.is_counter_update()) {

				                cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();

				            } else {

				                cell_value_string_builder << "shards: ";

				                counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {

				                    cell_value_string_builder << ::join(", ", ccv.shards());

				                });

				            }

				        } else {

				            cell_value_string_builder << type.to_string(acv.value().linearize());

				        }

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            cell_value_string_builder.str(),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell::printer& acp) {

				    return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));

				}

				std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {

				    if (!p._cell._data.get()) {

				        return os << "{ null atomic_cell_or_collection }";

				    }

				    using dc = data::cell;

				    os << "{ ";

				    if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {

				        os << "collection";

				    if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {

				        os << "collection ";

				        auto cmv = p._cell.as_collection_mutation();

				        os << collection_mutation_view::printer(*p._cdef.type, cmv);

				    } else {

				        os << "atomic cell";

				        os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));

				    }

				    return os << " @" << static_cast<const void*>(c._data.get()) << " }";

				    return os << " }";

				}

									
atomic_cell.hh
				@@ -26,7 +26,7 @@

				#include "tombstone.hh"

				#include "gc_clock.hh"

				#include "utils/managed_bytes.hh"

				#include "net/byteorder.hh"

				#include <seastar/net//byteorder.hh>

				#include <cstdint>

				#include <iosfwd>

				#include <seastar/util/gcc6-concepts.hh>

				@@ -153,6 +153,14 @@ public:

				    }

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);

				    class printer {

				        const abstract_type& _type;

				        const atomic_cell_view& _cell;

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {

				@@ -219,30 +227,12 @@ public:

				    static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);

				    friend class atomic_cell_or_collection;

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);

				};

				class collection_mutation_view;

				// Represents a mutation of a collection.  Actual format is determined by collection type,

				// and is:

				//   set:  list of atomic_cell

				//   map:  list of pair<atomic_cell, bytes> (for key/value)

				//   list: tbd, probably ugly

				class collection_mutation {

				public:

				    using imr_object_type =  imr::utils::object<data::cell::structure>;

				    imr_object_type _data;

				    collection_mutation() {}

				    collection_mutation(const collection_type_impl&, collection_mutation_view v);

				    collection_mutation(const collection_type_impl&, bytes_view bv);

				    operator collection_mutation_view() const;

				};

				class collection_mutation_view {

				public:

				    atomic_cell_value_view data;

				    class printer : atomic_cell_view::printer {

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class column_definition;
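The `printer` classes added in `atomic_cell.hh` (and in `atomic_cell_or_collection.hh`) pair a cell with the type needed to render its value, in place of `operator<<` overloads that had no type information. A minimal standalone sketch of that pattern follows; `value_type` and `cell` are hypothetical stand-ins for `abstract_type` and `atomic_cell_view`, not Scylla types.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for abstract_type: knows how to render a raw value.
struct value_type {
    std::string name;
    std::string to_string(const std::string& raw) const { return name + ":" + raw; }
};

// Hypothetical stand-in for a cell: raw value plus timestamp, but no type info.
struct cell {
    std::string raw;
    int64_t timestamp;

    // Pairs a cell with the type needed to print it. It holds references only,
    // so a printer must not outlive the objects it was built from.
    class printer {
        const value_type& _type;
        const cell& _cell;
    public:
        printer(const value_type& type, const cell& c) : _type(type), _cell(c) {}
        friend std::ostream& operator<<(std::ostream& os, const printer& p) {
            return os << "cell{" << p._type.to_string(p._cell.raw)
                      << ",ts=" << p._cell.timestamp << "}";
        }
    };
};
```

Because the printer stores references, it is meant to be built and consumed in a single expression such as `os << cell::printer(type, c);`, mirroring `os << atomic_cell_view::printer(*p._cdef.type, ...)` in the diff. The `atomic_cell_or_collection::printer` shown here also deletes its copy and move constructors, which enforces that usage.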

									
atomic_cell_hash.hh
				@@ -24,6 +24,7 @@

				// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh

				#include "types.hh"

				#include "types/collection.hh"

				#include "atomic_cell.hh"

				#include "atomic_cell_or_collection.hh"

				#include "hashing.hh"

				@@ -33,14 +34,12 @@ template<>

				struct appending_hash<collection_mutation_view> {

				    template<typename Hasher>

				    void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {

				      cell.data.with_linearized([&] (bytes_view cell_bv) {

				        auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);

				        auto m_view = ctype->deserialize_mutation_form(cell_bv);

				        ::feed_hash(h, m_view.tomb);

				        for (auto&& key_and_value : m_view.cells) {

				            ::feed_hash(h, key_and_value.first);

				            ::feed_hash(h, key_and_value.second, cdef);

				        }

				        cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {

				            ::feed_hash(h, m_view.tomb);

				            for (auto&& key_and_value : m_view.cells) {

				                ::feed_hash(h, key_and_value.first);

				                ::feed_hash(h, key_and_value.second, cdef);

				            }

				      });

				    }

				};
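`appending_hash<collection_mutation_view>` feeds the collection tombstone first and then every key and value, in iteration order, into the hasher; what changes in this diff is how the mutation is deserialized (`with_deserialized` instead of `with_linearized` plus `deserialize_mutation_form`). A simplified sketch of that feeding order, assuming a hypothetical 64-bit FNV-1a hasher and a flattened collection type in place of Scylla's `Hasher` concept and `collection_mutation_view_description`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 64-bit FNV-1a hasher standing in for Scylla's Hasher concept.
struct fnv1a_hasher {
    uint64_t state = 14695981039346656037ULL;
    void update(const void* data, std::size_t size) {
        auto p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i) {
            state ^= p[i];
            state *= 1099511628211ULL;
        }
    }
};

inline void feed_hash(fnv1a_hasher& h, int64_t v) { h.update(&v, sizeof(v)); }

inline void feed_hash(fnv1a_hasher& h, const std::string& s) {
    // Length prefix keeps ("ab", "c") and ("a", "bc") distinct.
    feed_hash(h, static_cast<int64_t>(s.size()));
    h.update(s.data(), s.size());
}

// Hypothetical simplified collection mutation: tombstone timestamp plus key/value cells.
struct collection_mutation_sketch {
    int64_t tomb;
    std::vector<std::pair<std::string, std::string>> cells;
};

inline void feed_hash(fnv1a_hasher& h, const collection_mutation_sketch& m) {
    feed_hash(h, m.tomb);            // tombstone first
    for (auto&& kv : m.cells) {      // then every cell, in iteration order
        feed_hash(h, kv.first);
        feed_hash(h, kv.second);
    }
}
```

Since cells are fed in order, two mutations holding the same cells in a different order hash differently. The length prefix on strings is a choice of this sketch, not something shown in the diff.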

									
atomic_cell_or_collection.hh
				@@ -22,6 +22,7 @@

				#pragma once

				#include "atomic_cell.hh"

				#include "collection_mutation.hh"

				#include "schema.hh"

				#include "hashing.hh"

				@@ -67,7 +68,19 @@ public:

				    bytes_view serialize() const;

				    bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;

				    size_t external_memory_usage(const abstract_type&) const;

				    friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);

				    class printer {

				        const column_definition& _cdef;

				        const atomic_cell_or_collection& _cell;

				    public:

				        printer(const column_definition& cdef, const atomic_cell_or_collection& cell)

				            : _cdef(cdef), _cell(cell) { }

				        printer(const printer&) = delete;

				        printer(printer&&) = delete;

				        friend std::ostream& operator<<(std::ostream&, const printer&);

				    };

				    friend std::ostream& operator<<(std::ostream&, const printer&);

				};

				namespace std {

									
auth/allow_all_authenticator.hh
				@@ -52,7 +52,7 @@ public:

				        return make_ready_future<>();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return allow_all_authenticator_name();

				    }

				@@ -72,19 +72,19 @@ public:

				        return make_ready_future<authenticated_user>(anonymous_user());

				    }

				    virtual future<> create(stdx::string_view, const authentication_options& options) const override {

				    virtual future<> create(std::string_view, const authentication_options& options) const override {

				        return make_ready_future();

				    }

				    virtual future<> alter(stdx::string_view, const authentication_options& options) const override {

				    virtual future<> alter(std::string_view, const authentication_options& options) const override {

				        return make_ready_future();

				    }

				    virtual future<> drop(stdx::string_view) const override {

				    virtual future<> drop(std::string_view) const override {

				        return make_ready_future();

				    }

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override {

				        return make_ready_future<custom_options>();

				    }

									
auth/allow_all_authorizer.hh
				@@ -23,7 +23,6 @@

				#include "auth/authorizer.hh"

				#include "exceptions/exceptions.hh"

				#include "stdx.hh"

				namespace cql3 {

				class query_processor;

				@@ -50,7 +49,7 @@ public:

				        return make_ready_future<>();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return allow_all_authorizer_name();

				    }

				@@ -58,12 +57,12 @@ public:

				        return make_ready_future<permission_set>(permissions::ALL);

				    }

				    virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {

				    virtual future<> grant(std::string_view, permission_set, const resource&) const override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {

				    virtual future<> revoke(std::string_view, permission_set, const resource&) const override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

				@@ -74,7 +73,7 @@ public:

				                        "LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke_all(stdx::string_view) const override {

				    virtual future<> revoke_all(std::string_view) const override {

				        return make_exception_future(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

									
auth/authenticated_user.cc
				@@ -45,7 +45,7 @@

				namespace auth {

				authenticated_user::authenticated_user(stdx::string_view name)

				authenticated_user::authenticated_user(std::string_view name)

				        : name(sstring(name)) {

				}

									
auth/authenticated_user.hh
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <iosfwd>

				#include <optional>

				@@ -49,7 +49,6 @@

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace auth {

				@@ -67,7 +66,7 @@ public:

				    /// An anonymous user.

				    ///

				    authenticated_user() = default;

				    explicit authenticated_user(stdx::string_view name);

				    explicit authenticated_user(std::string_view name);

				};

				///

									
auth/authentication_options.hh
				@@ -57,7 +57,7 @@ inline bool any_authentication_options(const authentication_options& aos) noexce

				class unsupported_authentication_option : public std::invalid_argument {

				public:

				    explicit unsupported_authentication_option(authentication_option k)

				            : std::invalid_argument(sprint("The %s option is not supported.", k)) {

				            : std::invalid_argument(format("The {} option is not supported.", k)) {

				    }

				};

									
auth/authenticator.cc
				@@ -45,7 +45,6 @@

				#include "auth/common.hh"

				#include "auth/password_authenticator.hh"

				#include "cql3/query_processor.hh"

				#include "db/config.hh"

				#include "utils/class_registrator.hh"

				const sstring auth::authenticator::USERNAME_KEY("username");

									
auth/authenticator.hh
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <memory>

				#include <set>

				#include <stdexcept>

				@@ -55,10 +55,10 @@

				#include "auth/authentication_options.hh"

				#include "auth/resource.hh"

				#include "auth/sasl_challenge.hh"

				#include "bytes.hh"

				#include "enum_set.hh"

				#include "exceptions/exceptions.hh"

				#include "stdx.hh"

				namespace db {

				    class config;

				@@ -96,7 +96,7 @@ public:

				    ///

				    /// A fully-qualified (class with package) Java-like name for this implementation.

				    ///

				    virtual const sstring& qualified_java_name() const = 0;

				    virtual std::string_view qualified_java_name() const = 0;

				    virtual bool require_authentication() const = 0;

				@@ -122,7 +122,7 @@ public:

				    ///

				    /// The options provided must be a subset of `supported_options()`.

				    ///

				    virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;

				    virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;

				    ///

				    /// Alter the authentication record of an existing user.

				@@ -131,39 +131,25 @@ public:

				    ///

				    /// Callers must ensure that the specification of `alterable_options()` is adhered to.

				    ///

				    virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;

				    ///

				    /// Delete the authentication record for a user. This will disallow the user from logging in.

				    ///

				    virtual future<> drop(stdx::string_view role_name) const = 0;

				    virtual future<> drop(std::string_view role_name) const = 0;

				    ///

				    /// Query for custom options (those corresponding to \ref authentication_options::options).

				    ///

				    /// If no options are set the result is an empty container.

				    ///

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;

				    ///

				    /// System resources used internally as part of the implementation. These are made inaccessible to users.

				    ///

				    virtual const resource_set& protected_resources() const = 0;

				    ///

				    /// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).

				    ///

				    class sasl_challenge {

				    public:

				        virtual ~sasl_challenge() = default;

				        virtual bytes evaluate_response(bytes_view client_response) = 0;

				        virtual bool is_complete() const = 0;

				        virtual future<authenticated_user> get_authenticated_user() const = 0;

				    };

				    virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;

				};
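The authenticator and authorizer interfaces switch `qualified_java_name()` from returning `const sstring&` to returning `std::string_view`, and their role-name parameters from `stdx::string_view` to `std::string_view`. A reduced sketch of such an interface; `authenticator_sketch`, `allow_all_sketch` and the `accepts()` method are hypothetical stand-ins for the real classes and their `create`/`alter`/`drop` futures, and the returned name is illustrative.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical, reduced authenticator-style interface.
class authenticator_sketch {
public:
    virtual ~authenticator_sketch() = default;
    // Returning std::string_view means implementations need not keep a
    // string object alive just to hand out a reference to it.
    virtual std::string_view qualified_java_name() const = 0;
    // Accepting std::string_view lets callers pass literals or std::string alike.
    virtual bool accepts(std::string_view role_name) const = 0;
};

class allow_all_sketch final : public authenticator_sketch {
public:
    std::string_view qualified_java_name() const override {
        // A string literal has static storage duration, so the view never dangles.
        return "org.apache.cassandra.auth.AllowAllAuthenticator";
    }
    bool accepts(std::string_view) const override { return true; }
};
```

Returning a view is only safe when the referenced characters outlive every caller's use of it; a string literal or any object with static storage duration satisfies that.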

									
auth/authorizer.hh
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <optional>

				#include <stdexcept>

				@@ -54,7 +54,6 @@

				#include "auth/permission.hh"

				#include "auth/resource.hh"

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace auth {

				@@ -101,7 +100,7 @@ public:

				    ///

				    /// A fully-qualified (class with package) Java-like name for this implementation.

				    ///

				    virtual const sstring& qualified_java_name() const = 0;

				    virtual std::string_view qualified_java_name() const = 0;

				    ///

				    /// Query for the permissions granted directly to a role for a particular \ref resource (and not any of its

				@@ -117,14 +116,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if granting permissions is not supported.

				    ///

				    virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;

				    virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;

				    ///

				    /// Revoke a set of permissions from a role for a particular \ref resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;

				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;

				    ///

				    /// Query for all directly granted permissions.

				@@ -138,7 +137,7 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke_all(stdx::string_view role_name) const = 0;

				    virtual future<> revoke_all(std::string_view role_name) const = 0;

				    ///

				    /// Revoke all permissions granted to any role for a particular resource.

									
auth/common.cc
				@@ -48,9 +48,9 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				    struct empty_state { };

				    return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {

				        return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {

				            return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {

				            return func().then_wrapped([] (auto&& f) -> std::optional<empty_state> {

				                if (f.failed()) {

				                    auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());

				                    auth_log.debug("Auth task failed with error, rescheduling: {}", f.get_exception());

				                    return { };

				                }

				                return { empty_state() };

				@@ -59,17 +59,15 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				    }).discard_result();

				}

				future<> create_metadata_table_if_missing(

				        stdx::string_view table_name,

				static future<> create_metadata_table_if_missing_impl(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        stdx::string_view cql,

				        std::string_view cql,

				        ::service::migration_manager& mm) {

				    auto& db = qp.db().local();

				    if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {

				        return make_ready_future<>();

				    }

				    static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {

				        return futurize_apply(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });

				    };

				    auto& db = qp.db();

				    auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(

				            cql3::query_processor::parse_statement(cql));

				@@ -78,13 +76,23 @@ future<> create_metadata_table_if_missing(

				    auto statement = static_pointer_cast<cql3::statements::create_table_statement>(

				            parsed_statement->prepare(db, qp.get_cql_stats())->statement);

				    const auto schema = statement->get_cf_meta_data(qp.db().local());

				    const auto schema = statement->get_cf_meta_data(qp.db());

				    const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());

				    schema_builder b(schema);

				    b.set_uuid(uuid);

				    schema_ptr table = b.build();

				    return ignore_existing([&mm, table = std::move(table)] () {

				        return mm.announce_new_column_family(table, false);

				    });

				}

				    return mm.announce_new_column_family(b.build(), false);

				future<> create_metadata_table_if_missing(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        std::string_view cql,

				        ::service::migration_manager& mm) noexcept {

				    return futurize_apply(create_metadata_table_if_missing_impl, table_name, qp, cql, mm);

				}

				future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {
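`do_after_system_ready()` retries the auth task through `exponential_backoff_retry::do_until_value(1s, 1min, as, ...)`, treating an empty `std::optional` as "try again". The synchronous sketch below shows the same idea, assuming the delay starts at the first bound, doubles after each failure and is capped at the second; the real helper is asynchronous and also honors the `abort_source`, which this sketch omits. `retry_until_value` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <vector>

using namespace std::chrono_literals;

// Hypothetical synchronous retry helper. `attempt` returns std::optional<T>;
// an empty optional means "failed, try again". Between failures `sleep` is
// called with a delay that starts at `base`, doubles, and is capped at `cap`.
template <typename Attempt, typename Sleep>
auto retry_until_value(std::chrono::milliseconds base,
                       std::chrono::milliseconds cap,
                       Attempt attempt,
                       Sleep sleep) {
    auto delay = base;
    for (;;) {
        if (auto v = attempt()) {
            return *v;
        }
        sleep(delay);
        delay = std::min(delay * 2, cap);
    }
}
```

Injecting `sleep` as a callable keeps the sketch testable; with bounds of 1s and 1min the waits run 1s, 2s, 4s, 8s, 16s, 32s and then stay at 60s.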

									
auth/common.hh
				@@ -22,7 +22,7 @@

				#pragma once

				#include <chrono>

				#include <experimental/string_view>

				#include <string_view>

				#include <seastar/core/future.hh>

				#include <seastar/core/abort_source.hh>

				@@ -76,10 +76,10 @@ inline future<> delay_until_system_ready(seastar::abort_source& as) {

				future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);

				future<> create_metadata_table_if_missing(

				        stdx::string_view table_name,

				        std::string_view table_name,

				        cql3::query_processor&,

				        stdx::string_view cql,

				        ::service::migration_manager&);

				        std::string_view cql,

				        ::service::migration_manager&) noexcept;

				future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
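`create_metadata_table_if_missing()` is now declared `noexcept` and forwards to a static `_impl` function through `futurize_apply`, so an exception thrown before any future exists reaches the caller as a failed future instead of escaping synchronously. A rough analogue using `std::future` and `std::promise` in place of Seastar futures; `create_table_impl` and `create_table` are hypothetical names.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <string_view>

// Hypothetical impl that may throw before it produces a future.
std::future<void> create_table_impl(std::string_view table_name) {
    if (table_name.empty()) {
        throw std::invalid_argument("empty table name");  // synchronous throw
    }
    std::promise<void> p;
    p.set_value();
    return p.get_future();
}

// noexcept wrapper: any exception from the impl is captured into the returned
// future instead of propagating to the caller (roughly what futurize_apply does).
std::future<void> create_table(std::string_view table_name) noexcept {
    try {
        return create_table_impl(table_name);
    } catch (...) {
        std::promise<void> p;
        p.set_exception(std::current_exception());
        return p.get_future();
    }
}
```

Callers then handle both failure paths the same way, by inspecting the returned future.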

									
auth/default_authorizer.cc
				@@ -61,6 +61,7 @@ extern "C" {

				#include "cql3/untyped_result_set.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "database.hh"

				namespace auth {

				@@ -94,13 +95,13 @@ default_authorizer::~default_authorizer() {

				static const sstring legacy_table_name{"permissions"};

				bool default_authorizer::legacy_metadata_exists() const {

				    return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);

				    return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);

				}

				future<bool> default_authorizer::any_granted() const {

				    static const sstring query = sprint("SELECT * FROM %s.%s LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);

				    static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -112,9 +113,9 @@ future<bool> default_authorizer::any_granted() const {

				future<> default_authorizer::migrate_legacy_metadata() const {

				    alogger.info("Starting migration of legacy permissions metadata.");

				    static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				@@ -160,7 +161,7 @@ future<> default_authorizer::start() {

				                _migration_manager).then([this] {

				            _finished = do_after_system_ready(_as, [this] {

				                return async([this] {

				                    wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

				                    wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();

				                    if (legacy_metadata_exists()) {

				                        if (!any_granted().get0()) {

				@@ -187,15 +188,14 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc

				        return make_ready_future<permission_set>(permissions::NONE);

				    }

				    static const sstring query = sprint(

				            "SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?",

				    static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",

				            PERMISSIONS_NAME,

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				            ROLE_NAME,

				            RESOURCE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -210,13 +210,12 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc

				future<>

				default_authorizer::modify(

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set set,

				        const resource& resource,

				        stdx::string_view op) const {

				        std::string_view op) const {

				    return do_with(

				            sprint(

				                    "UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",

				            format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",

				                    meta::AUTH_KS,

				                    PERMISSIONS_CF,

				                    PERMISSIONS_NAME,

				@@ -225,7 +224,7 @@ default_authorizer::modify(

				                    ROLE_NAME,

				                    RESOURCE_NAME),

				            [this, &role_name, set, &resource](const auto& query) {

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                db::consistency_level::ONE,

				                internal_distributed_timeout_config(),

				@@ -234,24 +233,23 @@ default_authorizer::modify(

				}

				future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {

				future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {

				    return modify(role_name, std::move(set), resource, "+");

				}

				future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {

				future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {

				    return modify(role_name, std::move(set), resource, "-");

				}

				future<std::vector<permission_details>> default_authorizer::list_all() const {

				    static const sstring query = sprint(

				            "SELECT %s, %s, %s FROM %s.%s",

				    static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",

				            ROLE_NAME,

				            RESOURCE_NAME,

				            PERMISSIONS_NAME,

				            meta::AUTH_KS,

				            PERMISSIONS_CF);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::ONE,

				            internal_distributed_timeout_config(),

				@@ -272,14 +270,13 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {

				    });

				}

				future<> default_authorizer::revoke_all(stdx::string_view role_name) const {

				    static const sstring query = sprint(

				            "DELETE FROM %s.%s WHERE %s = ?",

				future<> default_authorizer::revoke_all(std::string_view role_name) const {

				    static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				            ROLE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::ONE,

				            internal_distributed_timeout_config(),

				@@ -293,14 +290,13 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {

				}

				future<> default_authorizer::revoke_all(const resource& resource) const {

				    static const sstring query = sprint(

				            "SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",

				    static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",

				            ROLE_NAME,

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				            RESOURCE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -311,14 +307,13 @@ future<> default_authorizer::revoke_all(const resource& resource) const {

				                    res->begin(),

				                    res->end(),

				                    [this, res, resource](const cql3::untyped_result_set::row& r) {

				                static const sstring query = sprint(

				                        "DELETE FROM %s.%s WHERE %s = ? AND %s = ?",

				                static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",

				                        meta::AUTH_KS,

				                        PERMISSIONS_CF,

				                        ROLE_NAME,

				                        RESOURCE_NAME);

				                return _qp.process(

				                return _qp.execute_internal(

				                        query,

				                        db::consistency_level::LOCAL_ONE,

				                        infinite_timeout_config,

									

auth/default_authorizer.hh
									
												
				@@ -71,19 +71,19 @@ public:

				    virtual future<> stop() override;

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return default_authorizer_name();

				    }

				    virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;

				    virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;

				    virtual future<> grant(std::string_view, permission_set, const resource&) const override;

				    virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;

				    virtual future<> revoke( std::string_view, permission_set, const resource&) const override;

				    virtual future<std::vector<permission_details>> list_all() const override;

				    virtual future<> revoke_all(stdx::string_view) const override;

				    virtual future<> revoke_all(std::string_view) const override;

				    virtual future<> revoke_all(const resource&) const override;

				@@ -96,7 +96,7 @@ private:

				    future<> migrate_legacy_metadata() const;

				    future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;

				    future<> modify(std::string_view, permission_set, const resource&, std::string_view) const;

				};

				} /* namespace auth */
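In `default_authorizer.hh`, `qualified_java_name()` now returns `std::string_view` by value instead of `const sstring&`. That is safe as long as the viewed characters outlive every caller, for example a string literal with static storage. A minimal sketch under that assumption; `authorizer_iface`, `sketch_authorizer` and the returned name are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical interface mirroring the new virtual signature.
struct authorizer_iface {
    virtual ~authorizer_iface() = default;
    virtual std::string_view qualified_java_name() const = 0;
};

// The view points at a string literal (static storage duration),
// so returning it by value can never dangle.
inline std::string_view sketch_authorizer_name() {
    return "org.example.SketchAuthorizer";
}

struct sketch_authorizer final : authorizer_iface {
    std::string_view qualified_java_name() const override {
        return sketch_authorizer_name();
    }
};
```

Returning a view into a temporary `sstring` would dangle, so this pattern relies on the name functions returning long-lived storage.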

									

auth/password_authenticator.cc
									
												
				@@ -44,6 +44,8 @@

				#include <algorithm>

				#include <chrono>

				#include <random>

				#include <string_view>

				#include <optional>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <seastar/core/reactor.hh>

				@@ -56,6 +58,7 @@

				#include "log.hh"

				#include "service/migration_manager.hh"

				#include "utils/class_registrator.hh"

				#include "database.hh"

				namespace auth {

				@@ -93,23 +96,25 @@ static bool has_salted_hash(const cql3::untyped_result_set_row& row) {

				    return !row.get_or<sstring>(SALTED_HASH, "").empty();

				}

				static const sstring update_row_query = sprint(

				        "UPDATE %s SET %s = ? WHERE %s = ?",

				        meta::roles_table::qualified_name(),

				        SALTED_HASH,

				        meta::roles_table::role_col_name);

				static const sstring& update_row_query() {

				    static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            SALTED_HASH,

				            meta::roles_table::role_col_name);

				    return update_row_query;

				}

				static const sstring legacy_table_name{"credentials"};

				bool password_authenticator::legacy_metadata_exists() const {

				    return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);

				    return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);

				}

				future<> password_authenticator::migrate_legacy_metadata() const {

				    plogger.info("Starting migration of legacy authentication metadata.");

				    static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
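This hunk turns the namespace-scope `static const sstring update_row_query` into a function `update_row_query()` that returns a function-local static. One plausible motivation (the diff does not say) is initialization order: a namespace-scope static that calls `meta::roles_table::qualified_name()` while the program loads may run before objects it depends on in other translation units are ready, whereas a function-local static is built on first call, and since C++11 that initialization is thread-safe. A minimal sketch; `roles_table_name()` and the column names are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for meta::roles_table::qualified_name().
const std::string& roles_table_name() {
    static const std::string name = "system_auth.roles";
    return name;
}

// Built lazily on the first call, after roles_table_name() is usable,
// instead of during static initialization in unspecified cross-TU order.
const std::string& update_row_query() {
    static const std::string query =
        "UPDATE " + roles_table_name() + " SET salted_hash = ? WHERE role = ?";
    return query;
}
```

Every call returns a reference to the same object, so call sites only change from `update_row_query` to `update_row_query()`.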

				@@ -117,8 +122,8 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				            auto username = row.get_as<sstring>("username");

				            auto salted_hash = row.get_as<sstring>(SALTED_HASH);

				            return _qp.process(

				                    update_row_query,

				            return _qp.execute_internal(

				                    update_row_query(),

				                    consistency_for_user(username),

				                    internal_distributed_timeout_config(),

				                    {std::move(salted_hash), username}).discard_result();

				@@ -134,8 +139,8 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				future<> password_authenticator::create_default_if_missing() const {

				    return default_role_row_satisfies(_qp, &has_salted_hash).then([this](bool exists) {

				        if (!exists) {

				            return _qp.process(

				                    update_row_query,

				            return _qp.execute_internal(

				                    update_row_query(),

				                    db::consistency_level::QUORUM,

				                    internal_distributed_timeout_config(),

				                    {passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {

				@@ -157,7 +162,7 @@ future<> password_authenticator::start() {

				         _stopped = do_after_system_ready(_as, [this] {

				             return async([this] {

				                 wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

				                 wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();

				                 if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {

				                     if (legacy_metadata_exists()) {

				@@ -185,14 +190,14 @@ future<> password_authenticator::stop() {

				    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});

				}

				db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {

				db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {

				    if (role_name == DEFAULT_USER_NAME) {

				        return db::consistency_level::QUORUM;

				    }

				    return db::consistency_level::LOCAL_ONE;

				}

				const sstring& password_authenticator::qualified_java_name() const {

				std::string_view password_authenticator::qualified_java_name() const {

				    return password_authenticator_name();

				}

				@@ -211,10 +216,10 @@ authentication_option_set password_authenticator::alterable_options() const {

				future<authenticated_user> password_authenticator::authenticate(

				                const credentials_map& credentials) const {

				    if (!credentials.count(USERNAME_KEY)) {

				        throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));

				    }

				    if (!credentials.count(PASSWORD_KEY)) {

				        throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));

				    }

				    auto& username = credentials.at(USERNAME_KEY);

				@@ -226,13 +231,12 @@ future<authenticated_user> password_authenticator::authenticate(

				    // Rely on query processing caching statements instead, and lets assume

				    // that a map lookup string->statement is not gonna kill us much.

				    return futurize_apply([this, username, password] {

				        static const sstring query = sprint(

				                "SELECT %s FROM %s WHERE %s = ?",

				        static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",

				                SALTED_HASH,

				                meta::roles_table::qualified_name(),

				                meta::roles_table::role_col_name);

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                consistency_for_user(username),

				                internal_distributed_timeout_config(),

				@@ -241,7 +245,7 @@ future<authenticated_user> password_authenticator::authenticate(

				    }).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {

				        try {

				            auto res = f.get0();

				            auto salted_hash = std::experimental::optional<sstring>();

				            auto salted_hash = std::optional<sstring>();

				            if (!res->empty()) {

				                salted_hash = res->one().get_opt<sstring>(SALTED_HASH);

				            }

				@@ -253,56 +257,56 @@ future<authenticated_user> password_authenticator::authenticate(

				            std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));

				        } catch (exceptions::request_execution_exception& e) {

				            std::throw_with_nested(exceptions::authentication_exception(e.what()));

				        } catch (exceptions::authentication_exception& e) {

				            std::throw_with_nested(e);

				        } catch (...) {

				            std::throw_with_nested(exceptions::authentication_exception("authentication failed"));

				        }

				    });

				}

				future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {

				future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {

				    if (!options.password) {

				        return make_ready_future<>();

				    }

				    return _qp.process(

				            update_row_query,

				    return _qp.execute_internal(

				            update_row_query(),

				            consistency_for_user(role_name),

				            internal_distributed_timeout_config(),

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();

				}

				future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {

				future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {

				    if (!options.password) {

				        return make_ready_future<>();

				    }

				    static const sstring query = sprint(

				            "UPDATE %s SET %s = ? WHERE %s = ?",

				    static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            SALTED_HASH,

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            consistency_for_user(role_name),

				            internal_distributed_timeout_config(),

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();

				}

				future<> password_authenticator::drop(stdx::string_view name) const {

				    static const sstring query = sprint(

				            "DELETE %s FROM %s WHERE %s = ?",

				future<> password_authenticator::drop(std::string_view name) const {

				    static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",

				            SALTED_HASH,

				            meta::roles_table::qualified_name(),

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query, consistency_for_user(name),

				            internal_distributed_timeout_config(),

				            {sstring(name)}).discard_result();

				}

				future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {

				future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {

				    return make_ready_future<custom_options>();

				}

				@@ -311,75 +315,13 @@ const resource_set& password_authenticator::protected_resources() const {

				    return resources;

				}

				::shared_ptr<authenticator::sasl_challenge> password_authenticator::new_sasl_challenge() const {

				    class plain_text_password_challenge : public sasl_challenge {

				        const password_authenticator& _self;

				    public:

				        plain_text_password_challenge(const password_authenticator& self) : _self(self) {

				        }

				        /**

				         * SASL PLAIN mechanism specifies that credentials are encoded in a

				         * sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).

				         * The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}

				         * authzId is optional, and in fact we don't care about it here as we'll

				         * set the authzId to match the authnId (that is, there is no concept of

				         * a user being authorized to act on behalf of another).

				         *

				         * @param bytes encoded credentials string sent by the client

				         * @return map containing the username/password pairs in the form an IAuthenticator

				         * would expect

				         * @throws javax.security.sasl.SaslException

				         */

				        bytes evaluate_response(bytes_view client_response) override {

				            plogger.debug("Decoding credentials from client token");

				            sstring username, password;

				            auto b = client_response.crbegin();

				            auto e = client_response.crend();

				            auto i = b;

				            while (i != e) {

				                if (*i == 0) {

				                    sstring tmp(i.base(), b.base());

				                    if (password.empty()) {

				                        password = std::move(tmp);

				                    } else if (username.empty()) {

				                        username = std::move(tmp);

				                    }

				                    b = ++i;

				                    continue;

				                }

				                ++i;

				            }

				            if (username.empty()) {

				                throw exceptions::authentication_exception("Authentication ID must not be null");

				            }

				            if (password.empty()) {

				                throw exceptions::authentication_exception("Password must not be null");

				            }

				            _credentials[USERNAME_KEY] = std::move(username);

				            _credentials[PASSWORD_KEY] = std::move(password);

				            _complete = true;

				            return {};

				        }

				        bool is_complete() const override {

				            return _complete;

				        }

				        future<authenticated_user> get_authenticated_user() const override {

				            return _self.authenticate(_credentials);

				        }

				    private:

				        credentials_map _credentials;

				        bool _complete = false;

				    };

				    return ::make_shared<plain_text_password_challenge>(*this);

				::shared_ptr<sasl_challenge> password_authenticator::new_sasl_challenge() const {

				    return ::make_shared<plain_sasl_challenge>([this](std::string_view username, std::string_view password) {

				        credentials_map credentials{};

				        credentials[USERNAME_KEY] = sstring(username);

				        credentials[PASSWORD_KEY] = sstring(password);

				        return this->authenticate(credentials);

				    });

				}

				}
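The removed `plain_text_password_challenge` decoded the SASL PLAIN message by scanning backwards for NUL bytes; the new code hands that job to `plain_sasl_challenge` and only supplies a callback that builds the credentials map. RFC 4616 defines the message as `[authzid] NUL authcid NUL passwd`. The sketch below parses that form with a forward scan; `plain_credentials` and `parse_sasl_plain` are hypothetical and are not Scylla's `plain_sasl_challenge`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical result type; not a Scylla type.
struct plain_credentials {
    std::string authzid;   // optional authorization identity, may be empty
    std::string username;  // authentication identity (authcid)
    std::string password;
};

// Parses "[authzid] NUL authcid NUL passwd". Anything from a third NUL
// onward (for example a trailing NUL) is ignored.
inline plain_credentials parse_sasl_plain(std::string_view msg) {
    const auto npos = std::string_view::npos;
    const auto first = msg.find('\0');
    const auto second = first == npos ? npos : msg.find('\0', first + 1);
    if (second == npos) {
        throw std::invalid_argument("SASL PLAIN message needs two NUL separators");
    }
    const auto end = msg.find('\0', second + 1);  // npos if no third NUL
    plain_credentials c;
    c.authzid = std::string(msg.substr(0, first));
    c.username = std::string(msg.substr(first + 1, second - first - 1));
    c.password = std::string(msg.substr(second + 1, end == npos ? npos : end - second - 1));
    // Same checks, and messages, as the removed implementation.
    if (c.username.empty()) throw std::invalid_argument("Authentication ID must not be null");
    if (c.password.empty()) throw std::invalid_argument("Password must not be null");
    return c;
}
```

The messages contain embedded NULs, so build the input with an explicit length, e.g. `parse_sasl_plain(std::string_view("\0cassandra\0secret", 17))`.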

									

auth/password_authenticator.hh
									
												
				@@ -61,7 +61,7 @@ class password_authenticator : public authenticator {

				    seastar::abort_source _as;

				public:

				    static db::consistency_level consistency_for_user(stdx::string_view role_name);

				    static db::consistency_level consistency_for_user(std::string_view role_name);

				    password_authenticator(cql3::query_processor&, ::service::migration_manager&);

				@@ -71,7 +71,7 @@ public:

				    virtual future<> stop() override;

				    virtual const sstring& qualified_java_name() const override;

				    virtual std::string_view qualified_java_name() const override;

				    virtual bool require_authentication() const override;

				@@ -81,13 +81,13 @@ public:

				    virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;

				    virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;

				    virtual future<> create(std::string_view role_name, const authentication_options& options) const override;

				    virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) const override;

				    virtual future<> drop(stdx::string_view role_name) const override;

				    virtual future<> drop(std::string_view role_name) const override;

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override;

				    virtual const resource_set& protected_resources() const override;

									

auth/permissions_cache.cc
									
												
				@@ -24,19 +24,9 @@

				#include "auth/authorizer.hh"

				#include "auth/common.hh"

				#include "auth/service.hh"

				#include "db/config.hh"

				namespace auth {

				permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {

				    permissions_cache_config c;

				    c.max_entries = dc.permissions_cache_max_entries();

				    c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());

				    c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());

				    return c;

				}

				permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)

				        : _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {

				              log.debug("Refreshing permissions for {}", k.first);

									

auth/permissions_cache.hh
									
												
				@@ -22,7 +22,7 @@

				#pragma once

				#include <chrono>

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <iostream>

				#include <optional>

				@@ -37,7 +37,6 @@

				#include "auth/resource.hh"

				#include "auth/role_or_anonymous.hh"

				#include "log.hh"

				#include "stdx.hh"

				#include "utils/hash.hh"

				#include "utils/loading_cache.hh"

				@@ -59,8 +58,6 @@ namespace auth {

				class service;

				struct permissions_cache_config final {

				    static permissions_cache_config from_db_config(const db::config&);

				    std::size_t max_entries;

				    std::chrono::milliseconds validity_period;

				    std::chrono::milliseconds update_period;

									

auth/resource.cc
									
												
				@@ -61,7 +61,7 @@ std::ostream& operator<<(std::ostream& os, resource_kind kind) {

				    return os;

				}

				static const std::unordered_map<resource_kind, stdx::string_view> roots{

				static const std::unordered_map<resource_kind, std::string_view> roots{

				        {resource_kind::data, "data"},

				        {resource_kind::role, "roles"}};

				@@ -101,24 +101,25 @@ static permission_set applicable_permissions(const role_resource_view& rv) {

				            permission::DESCRIBE>();

				}

				resource::resource(resource_kind kind) : _kind(kind), _parts{sstring(roots.at(kind))}  {

				resource::resource(resource_kind kind) : _kind(kind) {

				    _parts.emplace_back(roots.at(kind));

				}

				resource::resource(resource_kind kind, std::vector<sstring> parts) : resource(kind) {

				    _parts.reserve(parts.size() + 1);

				resource::resource(resource_kind kind, utils::small_vector<sstring, 3> parts) : resource(kind) {

				    _parts.insert(_parts.end(), std::make_move_iterator(parts.begin()), std::make_move_iterator(parts.end()));

				}

				resource::resource(data_resource_t, stdx::string_view keyspace)

				        : resource(resource_kind::data, std::vector<sstring>{sstring(keyspace)}) {

				resource::resource(data_resource_t, std::string_view keyspace) : resource(resource_kind::data) {

				    _parts.emplace_back(keyspace);

				}

				resource::resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table)

				        : resource(resource_kind::data, std::vector<sstring>{sstring(keyspace), sstring(table)}) {

				resource::resource(data_resource_t, std::string_view keyspace, std::string_view table) : resource(resource_kind::data) {

				    _parts.emplace_back(keyspace);

				    _parts.emplace_back(table);

				}

				resource::resource(role_resource_t, stdx::string_view role)

				        : resource(resource_kind::role, std::vector<sstring>{sstring(role)}) {

				resource::resource(role_resource_t, std::string_view role) : resource(resource_kind::role) {

				    _parts.emplace_back(role);

				}

				sstring resource::name() const {

				@@ -173,7 +174,7 @@ data_resource_view::data_resource_view(const resource& r) : _resource(r) {

				    }

				}

				std::optional<stdx::string_view> data_resource_view::keyspace() const {

				std::optional<std::string_view> data_resource_view::keyspace() const {

				    if (_resource._parts.size() == 1) {

				        return {};

				    }

				@@ -181,7 +182,7 @@ std::optional<stdx::string_view> data_resource_view::keyspace() const {

				    return _resource._parts[1];

				}

				std::optional<stdx::string_view> data_resource_view::table() const {

				std::optional<std::string_view> data_resource_view::table() const {

				    if (_resource._parts.size() <= 2) {

				        return {};

				    }

				@@ -210,7 +211,7 @@ role_resource_view::role_resource_view(const resource& r) : _resource(r) {

				    }

				}

				std::optional<stdx::string_view> role_resource_view::role() const {

				std::optional<std::string_view> role_resource_view::role() const {

				    if (_resource._parts.size() == 1) {

				        return {};

				    }

				@@ -230,9 +231,9 @@ std::ostream& operator<<(std::ostream& os, const role_resource_view& v) {

				    return os;

				}

				resource parse_resource(stdx::string_view name) {

				    static const std::unordered_map<stdx::string_view, resource_kind> reverse_roots = [] {

				        std::unordered_map<stdx::string_view, resource_kind> result;

				resource parse_resource(std::string_view name) {

				    static const std::unordered_map<std::string_view, resource_kind> reverse_roots = [] {

				        std::unordered_map<std::string_view, resource_kind> result;

				        for (const auto& pair : roots) {

				            result.emplace(pair.second, pair.first);

				@@ -241,7 +242,7 @@ resource parse_resource(stdx::string_view name) {

				        return result;

				    }();

				    std::vector<sstring> parts;

				    utils::small_vector<sstring, 3> parts;

				    boost::split(parts, name, [](char ch) { return ch == '/'; });

				    if (parts.empty()) {
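`parse_resource()` splits a name such as `data/ks/table` or `roles/alice` on `'/'` (via `boost::split`) and looks the first part up in `reverse_roots`, which maps `"data"` and `"roles"` back to a `resource_kind`. A minimal sketch of that flow using `std::vector` in place of `utils::small_vector<sstring, 3>`; `parse_resource_sketch`, `sketch_kind`, `parsed_resource` and the per-kind length limits are assumptions, not the real validation rules.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class sketch_kind { data, role };

struct parsed_resource {
    sketch_kind kind;
    std::vector<std::string> parts;  // parts[0] is the root: "data" or "roles"
};

// Hypothetical stand-in for auth::parse_resource().
inline parsed_resource parse_resource_sketch(std::string_view name) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {  // split on '/', keeping empty segments
        const auto slash = name.find('/', start);
        const auto len = slash == std::string_view::npos ? std::string_view::npos : slash - start;
        parts.emplace_back(name.substr(start, len));
        if (slash == std::string_view::npos) {
            break;
        }
        start = slash + 1;
    }
    // Assumed limits: data[/<keyspace>[/<table>]] and roles[/<role>].
    if (parts[0] == "data" && parts.size() <= 3) {
        return {sketch_kind::data, std::move(parts)};
    }
    if (parts[0] == "roles" && parts.size() <= 2) {
        return {sketch_kind::role, std::move(parts)};
    }
    throw std::invalid_argument("The resource name '" + std::string(name) + "' is invalid.");
}
```

The error text mirrors `invalid_resource_name` in `auth/resource.hh`.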

									

auth/resource.hh
									
												
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <iostream>

				#include <optional>

				#include <stdexcept>

				@@ -54,15 +54,15 @@

				#include "auth/permission.hh"

				#include "seastarx.hh"

				#include "stdx.hh"

				#include "utils/hash.hh"

				#include "utils/small_vector.hh"

				namespace auth {

				class invalid_resource_name : public std::invalid_argument {

				public:

				    explicit invalid_resource_name(stdx::string_view name)

				            : std::invalid_argument(sprint("The resource name '%s' is invalid.", name)) {

				    explicit invalid_resource_name(std::string_view name)

				            : std::invalid_argument(format("The resource name '{}' is invalid.", name)) {

				    }

				};

				@@ -98,16 +98,16 @@ struct role_resource_t final {};

				class resource final {

				    resource_kind _kind;

				    std::vector<sstring> _parts;

				    utils::small_vector<sstring, 3> _parts;

				public:

				    ///

				    /// A root resource of a particular kind.

				    ///

				    explicit resource(resource_kind);

				    resource(data_resource_t, stdx::string_view keyspace);

				    resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table);

				    resource(role_resource_t, stdx::string_view role);

				    resource(data_resource_t, std::string_view keyspace);

				    resource(data_resource_t, std::string_view keyspace, std::string_view table);

				    resource(role_resource_t, std::string_view role);

				    resource_kind kind() const noexcept {

				        return _kind;

				@@ -123,7 +123,7 @@ public:

				    permission_set applicable_permissions() const;

				private:

				    resource(resource_kind, std::vector<sstring> parts);

				    resource(resource_kind, utils::small_vector<sstring, 3> parts);

				    friend class std::hash<resource>;

				    friend class data_resource_view;

				@@ -131,7 +131,7 @@ private:

				    friend bool operator<(const resource&, const resource&);

				    friend bool operator==(const resource&, const resource&);

				    friend resource parse_resource(stdx::string_view);

				    friend resource parse_resource(std::string_view);

				};

				bool operator<(const resource&, const resource&);

				@@ -150,7 +150,7 @@ class resource_kind_mismatch : public std::invalid_argument {

				public:

				    explicit resource_kind_mismatch(resource_kind expected, resource_kind actual)

				        : std::invalid_argument(

				            sprint("This resource has kind '%s', but was expected to have kind '%s'.", actual, expected)) {

				            format("This resource has kind '{}', but was expected to have kind '{}'.", actual, expected)) {

				    }

				};

				@@ -166,9 +166,9 @@ public:

				    ///

				    explicit data_resource_view(const resource& r);

				    std::optional<stdx::string_view> keyspace() const;

				    std::optional<std::string_view> keyspace() const;

				    std::optional<stdx::string_view> table() const;

				    std::optional<std::string_view> table() const;

				};

				std::ostream& operator<<(std::ostream&, const data_resource_view&);

				@@ -187,7 +187,7 @@ public:

				    ///

				    explicit role_resource_view(const resource&);

				    std::optional<stdx::string_view> role() const;

				    std::optional<std::string_view> role() const;

				};

				std::ostream& operator<<(std::ostream&, const role_resource_view&);

				@@ -197,20 +197,20 @@ std::ostream& operator<<(std::ostream&, const role_resource_view&);

				///

				/// \throws \ref invalid_resource_name when the name is malformed.

				///

				resource parse_resource(stdx::string_view name);

				resource parse_resource(std::string_view name);

				const resource& root_data_resource();

				inline resource make_data_resource(stdx::string_view keyspace) {

				inline resource make_data_resource(std::string_view keyspace) {

				    return resource(data_resource_t{}, keyspace);

				}

				inline resource make_data_resource(stdx::string_view keyspace, stdx::string_view table) {

				inline resource make_data_resource(std::string_view keyspace, std::string_view table) {

				    return resource(data_resource_t{}, keyspace, table);

				}

				const resource& root_role_resource();

				inline resource make_role_resource(stdx::string_view role) {

				inline resource make_role_resource(std::string_view role) {

				    return resource(role_resource_t{}, role);

				}
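A data `resource` stores its root name followed by an optional keyspace and table in `_parts`, and `data_resource_view::keyspace()` and `table()` return `std::optional<std::string_view>` into those owned strings. With at most three parts, the `utils::small_vector<sstring, 3>` capacity presumably lets the part list live inline without a separate heap allocation. A minimal model of the accessors; `data_resource_sketch` is hypothetical and uses `std::vector<std::string>`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model: parts are {"data"}, {"data", ks} or {"data", ks, table}.
class data_resource_sketch {
    std::vector<std::string> _parts{"data"};
public:
    data_resource_sketch() = default;
    explicit data_resource_sketch(std::string_view ks) { _parts.emplace_back(ks); }
    data_resource_sketch(std::string_view ks, std::string_view table) {
        _parts.emplace_back(ks);
        _parts.emplace_back(table);
    }
    // The returned views point into _parts: valid only while this object
    // is alive and unmodified.
    std::optional<std::string_view> keyspace() const {
        if (_parts.size() == 1) return std::nullopt;
        return std::string_view(_parts[1]);
    }
    std::optional<std::string_view> table() const {
        if (_parts.size() <= 2) return std::nullopt;
        return std::string_view(_parts[2]);
    }
};
```

The size checks match the ones visible in the `data_resource_view::keyspace()` and `table()` hunks of `auth/resource.cc`.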

									

auth/role_manager.hh
									
												
				@@ -21,7 +21,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <memory>

				#include <optional>

				#include <stdexcept>

				@@ -33,7 +33,7 @@

				#include "auth/resource.hh"

				#include "seastarx.hh"

				#include "stdx.hh"

				#include "exceptions/exceptions.hh"

				namespace auth {

				@@ -53,38 +53,38 @@ struct role_config_update final {

				///

				/// A logical argument error for a role-management operation.

				///

				class roles_argument_exception : public std::invalid_argument {

				class roles_argument_exception : public exceptions::invalid_request_exception {

				public:

				    using std::invalid_argument::invalid_argument;

				    using exceptions::invalid_request_exception::invalid_request_exception;

				};

				class role_already_exists : public roles_argument_exception {

				public:

				    explicit role_already_exists(stdx::string_view role_name)

				            : roles_argument_exception(sprint("Role %s already exists.", role_name)) {

				    explicit role_already_exists(std::string_view role_name)

				            : roles_argument_exception(format("Role {} already exists.", role_name)) {

				    }

				};

				class nonexistant_role : public roles_argument_exception {

				public:

				    explicit nonexistant_role(stdx::string_view role_name)

				            : roles_argument_exception(sprint("Role %s doesn't exist.", role_name)) {

				    explicit nonexistant_role(std::string_view role_name)

				            : roles_argument_exception(format("Role {} doesn't exist.", role_name)) {

				    }

				};

				class role_already_included : public roles_argument_exception {

				public:

				    role_already_included(stdx::string_view grantee_name, stdx::string_view role_name)

				    role_already_included(std::string_view grantee_name, std::string_view role_name)

				            : roles_argument_exception(

				                      sprint("%s already includes role %s.", grantee_name, role_name)) {

				                      format("{} already includes role {}.", grantee_name, role_name)) {

				    }

				};

				class revoke_ungranted_role : public roles_argument_exception {

				public:

				    revoke_ungranted_role(stdx::string_view revokee_name, stdx::string_view role_name)

				    revoke_ungranted_role(std::string_view revokee_name, std::string_view role_name)

				            : roles_argument_exception(

				                      sprint("%s was not granted role %s, so it cannot be revoked.", revokee_name, role_name)) {

				                      format("{} was not granted role {}, so it cannot be revoked.", revokee_name, role_name)) {

				    }

				};
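`roles_argument_exception` now derives from `exceptions::invalid_request_exception` instead of `std::invalid_argument` and inherits its constructors with a using-declaration. Presumably this lets code that already handles invalid CQL requests catch role errors through the same base. A minimal sketch; `invalid_request_error`, `roles_argument_error` and `role_already_exists_error` are hypothetical stand-ins for the Scylla types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for exceptions::invalid_request_exception.
class invalid_request_error : public std::runtime_error {
public:
    explicit invalid_request_error(const std::string& msg) : std::runtime_error(msg) {}
};

class roles_argument_error : public invalid_request_error {
public:
    using invalid_request_error::invalid_request_error;  // inherit constructors
};

class role_already_exists_error : public roles_argument_error {
public:
    explicit role_already_exists_error(std::string_view role_name)
            : roles_argument_error("Role " + std::string(role_name) + " already exists.") {}
};
```

A handler written against the base, `catch (const invalid_request_error& e)`, now also receives `role_already_exists_error`.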

				@@ -104,7 +104,7 @@ class role_manager {

				public:

				    virtual ~role_manager() = default;

				    virtual stdx::string_view qualified_java_name() const noexcept = 0;

				    virtual std::string_view qualified_java_name() const noexcept = 0;

				    virtual const resource_set& protected_resources() const = 0;

				@@ -115,17 +115,17 @@ public:

				    ///

				    /// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.

				    ///

				    virtual future<> create(stdx::string_view role_name, const role_config&) const = 0;

				    virtual future<> create(std::string_view role_name, const role_config&) const = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<> drop(stdx::string_view role_name) const = 0;

				    virtual future<> drop(std::string_view role_name) const = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<> alter(stdx::string_view role_name, const role_config_update&) const = 0;

				    virtual future<> alter(std::string_view role_name, const role_config_update&) const = 0;

				    ///

				    /// Grant `role_name` to `grantee_name`.

				@@ -135,7 +135,7 @@ public:

				    /// \returns an exceptional future with \ref role_already_included if granting the role would be redundant, or

				    /// create a cycle.

				    ///

				    virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const = 0;

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name) const = 0;

				    ///

				    /// Revoke `role_name` from `revokee_name`.

				@@ -144,26 +144,26 @@ public:

				    ///

				    /// \returns an exceptional future with \ref revoke_ungranted_role if the role was not granted.

				    ///

				    virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const = 0;

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) const = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<role_set> query_granted(stdx::string_view grantee, recursive_role_query) const = 0;

				    virtual future<role_set> query_granted(std::string_view grantee, recursive_role_query) const = 0;

				    virtual future<role_set> query_all() const = 0;

				    virtual future<bool> exists(stdx::string_view role_name) const = 0;

				    virtual future<bool> exists(std::string_view role_name) const = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<bool> is_superuser(stdx::string_view role_name) const = 0;

				    virtual future<bool> is_superuser(std::string_view role_name) const = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<bool> can_login(stdx::string_view role_name) const = 0;

				    virtual future<bool> can_login(std::string_view role_name) const = 0;

				};

				}
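The `role_manager` interface above documents two failure modes for `grant`: the grant is redundant, or it would create a cycle. Below is a minimal, synchronous sketch of that contract, assuming a plain in-memory map instead of Scylla's futures and CQL-backed storage. The class `toy_role_manager` and its methods are hypothetical and are not part of Scylla. `std::invalid_argument` stands in for `role_already_included` and the other exceptions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical, synchronous stand-in for the role_manager grant/query contract.
class toy_role_manager {
    // role name -> roles granted directly to it
    std::map<std::string, std::set<std::string>> _granted;

    void require_exists(std::string_view name) const {
        if (_granted.count(std::string(name)) == 0) {
            throw std::invalid_argument(std::string(name) + " doesn't exist.");
        }
    }

    // Depth-first walk: adds `name` and every role reachable through grants.
    void collect(const std::string& name, std::set<std::string>& out) const {
        if (!out.insert(name).second) {
            return; // already visited
        }
        for (const auto& r : _granted.at(name)) {
            collect(r, out);
        }
    }

public:
    void create(std::string_view name) {
        if (!_granted.emplace(std::string(name), std::set<std::string>{}).second) {
            throw std::invalid_argument(std::string(name) + " already exists.");
        }
    }

    bool exists(std::string_view name) const {
        return _granted.count(std::string(name)) != 0;
    }

    // The grantee itself plus every directly or indirectly granted role.
    std::set<std::string> query_granted_recursive(std::string_view grantee) const {
        require_exists(grantee);
        std::set<std::string> out;
        collect(std::string(grantee), out);
        return out;
    }

    void grant(std::string_view grantee, std::string_view role) {
        require_exists(grantee);
        require_exists(role);
        // Redundant: the grantee already holds `role` (or is `role`).
        if (query_granted_recursive(grantee).count(std::string(role)) != 0) {
            throw std::invalid_argument(std::string(grantee) + " already includes " + std::string(role));
        }
        // Cycle: `role` already (transitively) includes the grantee.
        if (query_granted_recursive(role).count(std::string(grantee)) != 0) {
            throw std::invalid_argument(std::string(role) + " already includes " + std::string(grantee));
        }
        _granted[std::string(grantee)].insert(std::string(role));
    }
};
```

Because both checks use the transitive closure, self-grants and indirect cycles are rejected as well as direct ones.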

									
auth/role_or_anonymous.hh
				@@ -21,7 +21,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <iosfwd>

				#include <optional>

				@@ -29,7 +29,6 @@

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace auth {

				@@ -38,7 +37,7 @@ public:

				    std::optional<sstring> name{};

				    role_or_anonymous() = default;

				    role_or_anonymous(stdx::string_view name) : name(name) {

				    role_or_anonymous(std::string_view name) : name(name) {

				    }

				};
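With this change, `role_or_anonymous` is constructed from a `std::string_view` but still stores an owning string. The sketch below illustrates why that matters. It is a simplified, hypothetical analogue: it uses `std::string` instead of seastar's `sstring`, and `is_anonymous_sketch` is an illustrative helper, not Scylla's API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Simplified stand-in for auth::role_or_anonymous (std::string replaces sstring).
struct role_or_anonymous_sketch {
    std::optional<std::string> name{};  // disengaged means "anonymous"

    role_or_anonymous_sketch() = default;

    // Copies the viewed characters, so the object does not depend on the
    // lifetime of whatever buffer the view points into.
    role_or_anonymous_sketch(std::string_view n) : name(std::string(n)) {}
};

inline bool is_anonymous_sketch(const role_or_anonymous_sketch& r) noexcept {
    return !r.name.has_value();
}
```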

									
auth/roles-metadata.cc
				@@ -36,7 +36,7 @@ namespace meta {

				namespace roles_table {

				stdx::string_view creation_query() {

				std::string_view creation_query() {

				    static const sstring instance = sprint(

				            "CREATE TABLE %s ("

				            "  %s text PRIMARY KEY,"

				@@ -51,7 +51,7 @@ stdx::string_view creation_query() {

				    return instance;

				}

				stdx::string_view qualified_name() noexcept {

				std::string_view qualified_name() noexcept {

				    static const sstring instance = AUTH_KS + "." + sstring(name);

				    return instance;

				}

				@@ -63,20 +63,19 @@ stdx::string_view qualified_name() noexcept {

				future<bool> default_role_row_satisfies(

				        cql3::query_processor& qp,

				        std::function<bool(const cql3::untyped_result_set_row&)> p) {

				    static const sstring query = sprint(

				            "SELECT * FROM %s WHERE %s = ?",

				    static const sstring query = format("SELECT * FROM {} WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::role_col_name);

				    return do_with(std::move(p), [&qp](const auto& p) {

				        return qp.process(

				        return qp.execute_internal(

				                query,

				                db::consistency_level::ONE,

				                infinite_timeout_config,

				                {meta::DEFAULT_SUPERUSER_NAME},

				                true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {

				            if (results->empty()) {

				                return qp.process(

				                return qp.execute_internal(

				                        query,

				                        db::consistency_level::QUORUM,

				                        internal_distributed_timeout_config(),

				@@ -98,10 +97,10 @@ future<bool> default_role_row_satisfies(

				future<bool> any_nondefault_role_row_satisfies(

				        cql3::query_processor& qp,

				        std::function<bool(const cql3::untyped_result_set_row&)> p) {

				    static const sstring query = sprint("SELECT * FROM %s", meta::roles_table::qualified_name());

				    static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name());

				    return do_with(std::move(p), [&qp](const auto& p) {

				        return qp.process(

				        return qp.execute_internal(

				                query,

				                db::consistency_level::QUORUM,

				                internal_distributed_timeout_config()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
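`default_role_row_satisfies` first reads the default role's row at consistency `ONE`. It retries at `QUORUM` only when that cheap read comes back empty. The sketch below models that fallback synchronously. The `consistency` enum, `query_fn`, and the choice to return `false` when neither read finds a row are assumptions, since the rest of the hunk is not shown here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class consistency { one, quorum };  // hypothetical, mirrors ONE / QUORUM

using row = std::string;
using query_fn = std::function<std::vector<row>(consistency)>;

// Cheap read first; pay for a QUORUM read only if ONE returned nothing.
inline bool default_row_satisfies_sketch(const query_fn& query,
                                         const std::function<bool(const row&)>& p) {
    auto rows = query(consistency::one);
    if (rows.empty()) {
        rows = query(consistency::quorum);
    }
    if (rows.empty()) {
        return false;  // assumption: no row at either level means "not satisfied"
    }
    return p(rows.front());
}
```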

									
auth/roles-metadata.hh
				@@ -21,13 +21,12 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <seastar/core/future.hh>

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace cql3 {

				class query_processor;

				@@ -40,13 +39,13 @@ namespace meta {

				namespace roles_table {

				stdx::string_view creation_query();

				std::string_view creation_query();

				constexpr stdx::string_view name{"roles", 5};

				constexpr std::string_view name{"roles", 5};

				stdx::string_view qualified_name() noexcept;

				std::string_view qualified_name() noexcept;

				constexpr stdx::string_view role_col_name{"role", 4};

				constexpr std::string_view role_col_name{"role", 4};

				}
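The header now exposes `constexpr std::string_view` constants and a `qualified_name()` that returns a view. Returning a view is only safe because it points into a function-local `static` string, as `roles-metadata.cc` does. Here is a small sketch of that pattern. The keyspace literal `"system_auth"` is an assumption standing in for `meta::AUTH_KS`, and the `_sketch` names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The explicit length is redundant for a literal, but the object is constexpr.
constexpr std::string_view roles_name{"roles", 5};
static_assert(roles_name.size() == 5, "length must match the literal");

// Hypothetical keyspace name standing in for meta::AUTH_KS.
constexpr std::string_view auth_ks_sketch{"system_auth"};

// The view stays valid because it refers to a function-local static that is
// built once (thread-safe initialization since C++11) and lives until exit.
inline std::string_view qualified_name_sketch() noexcept {
    static const std::string instance =
            std::string(auth_ks_sketch) + "." + std::string(roles_name);
    return instance;
}
```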

									
auth/sasl_challenge.cc
				@@ -0,0 +1,102 @@

				/*

				 * Licensed to the Apache Software Foundation (ASF) under one

				 * or more contributor license agreements.  See the NOTICE file

				 * distributed with this work for additional information

				 * regarding copyright ownership.  The ASF licenses this file

				 * to you under the Apache License, Version 2.0 (the

				 * "License"); you may not use this file except in compliance

				 * with the License.  You may obtain a copy of the License at

				 *

				 *     http://www.apache.org/licenses/LICENSE-2.0

				 *

				 * Unless required by applicable law or agreed to in writing, software

				 * distributed under the License is distributed on an "AS IS" BASIS,

				 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

				 * See the License for the specific language governing permissions and

				 * limitations under the License.

				 */

				/*

				 * Copyright (C) 2019 ScyllaDB

				 *

				 * Modified by ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "auth/sasl_challenge.hh"

				#include "exceptions/exceptions.hh"

				namespace auth {

				/**

				 * SASL PLAIN mechanism specifies that credentials are encoded in a

				 * sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).

				 * The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}

				 * authzId is optional, and in fact we don't care about it here as we'll

				 * set the authzId to match the authnId (that is, there is no concept of

				 * a user being authorized to act on behalf of another).

				 *

				 * @param bytes encoded credentials string sent by the client

				 * @return map containing the username/password pairs in the form an IAuthenticator

				 * would expect

				 * @throws javax.security.sasl.SaslException

				 */

				bytes plain_sasl_challenge::evaluate_response(bytes_view client_response) {

				    sstring username, password;

				    auto b = client_response.crbegin();

				    auto e = client_response.crend();

				    auto i = b;

				    while (i != e) {

				        if (*i == 0) {

				            sstring tmp(i.base(), b.base());

				            if (password.empty()) {

				                password = std::move(tmp);

				            } else if (username.empty()) {

				                username = std::move(tmp);

				            }

				            b = ++i;

				            continue;

				        }

				        ++i;

				    }

				    if (username.empty()) {

				        throw exceptions::authentication_exception("Authentication ID must not be null");

				    }

				    if (password.empty()) {

				        throw exceptions::authentication_exception("Password must not be null");

				    }

				    _username = std::move(username);

				    _password = std::move(password);

				    return {};

				}

				bool plain_sasl_challenge::is_complete() const {

				    return _username && _password;

				}

				future<authenticated_user> plain_sasl_challenge::get_authenticated_user() const {

				    return _when_complete(*_username, *_password);

				}

				}
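`plain_sasl_challenge::evaluate_response` above scans the client message backwards for NUL separators. For comparison, here is a forward-parsing sketch of the SASL PLAIN layout `authzid NUL authcid NUL passwd` (RFC 4616). It is stricter than the Scylla code. It requires exactly two separators and keeps the authzid, whereas the code above ignores the authzid. `plain_credentials` and `parse_sasl_plain` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct plain_credentials {
    std::string authzid;   // may be empty
    std::string authcid;   // the username
    std::string password;
};

// Splits "authzid\0authcid\0passwd". Returns nullopt unless there are exactly
// two NUL separators and both authcid and password are non-empty.
inline std::optional<plain_credentials> parse_sasl_plain(std::string_view msg) {
    const auto first = msg.find('\0');
    if (first == std::string_view::npos) {
        return std::nullopt;
    }
    const auto second = msg.find('\0', first + 1);
    if (second == std::string_view::npos) {
        return std::nullopt;
    }
    if (msg.find('\0', second + 1) != std::string_view::npos) {
        return std::nullopt;  // unexpected extra separator
    }
    plain_credentials c{
        std::string(msg.substr(0, first)),
        std::string(msg.substr(first + 1, second - first - 1)),
        std::string(msg.substr(second + 1)),
    };
    if (c.authcid.empty() || c.password.empty()) {
        return std::nullopt;
    }
    return c;
}
```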

									
auth/sasl_challenge.hh
				@@ -0,0 +1,89 @@

				/*

				 * Licensed to the Apache Software Foundation (ASF) under one

				 * or more contributor license agreements.  See the NOTICE file

				 * distributed with this work for additional information

				 * regarding copyright ownership.  The ASF licenses this file

				 * to you under the Apache License, Version 2.0 (the

				 * "License"); you may not use this file except in compliance

				 * with the License.  You may obtain a copy of the License at

				 *

				 *     http://www.apache.org/licenses/LICENSE-2.0

				 *

				 * Unless required by applicable law or agreed to in writing, software

				 * distributed under the License is distributed on an "AS IS" BASIS,

				 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

				 * See the License for the specific language governing permissions and

				 * limitations under the License.

				 */

				/*

				 * Copyright (C) 2019 ScyllaDB

				 *

				 * Modified by ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <functional>

				#include <optional>

				#include <string_view>

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include "auth/authenticated_user.hh"

				#include "bytes.hh"

				#include "seastarx.hh"

				namespace auth {

				///

				/// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).

				///

				class sasl_challenge {

				public:

				    virtual ~sasl_challenge() = default;

				    virtual bytes evaluate_response(bytes_view client_response) = 0;

				    virtual bool is_complete() const = 0;

				    virtual future<authenticated_user> get_authenticated_user() const = 0;

				};

				class plain_sasl_challenge : public sasl_challenge {

				public:

				    using completion_callback = std::function<future<authenticated_user>(std::string_view, std::string_view)>;

				    explicit plain_sasl_challenge(completion_callback f) : _when_complete(std::move(f)) {

				    }

				    virtual bytes evaluate_response(bytes_view) override;

				    virtual bool is_complete() const override;

				    virtual future<authenticated_user> get_authenticated_user() const override;

				private:

				    std::optional<sstring> _username, _password;

				    completion_callback _when_complete;

				};

				}
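`plain_sasl_challenge` does not verify credentials itself. It stores them, and the owner injects a `completion_callback` that performs the check. The sketch below is a synchronous, hypothetical analogue of that design. The callback returns a `std::string` instead of `future<authenticated_user>`, and parsing is kept minimal (see RFC 4616 for the full message format).

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, synchronous analogue of plain_sasl_challenge.
class plain_challenge_sketch {
public:
    using completion_callback = std::function<std::string(std::string_view, std::string_view)>;

    explicit plain_challenge_sketch(completion_callback f) : _when_complete(std::move(f)) {}

    // PLAIN is a single round trip: remember the credentials, send an empty reply.
    std::string evaluate_response(std::string_view client_response) {
        const auto p1 = client_response.find('\0');
        const auto p2 = (p1 == std::string_view::npos) ? p1 : client_response.find('\0', p1 + 1);
        if (p2 == std::string_view::npos) {
            throw std::invalid_argument("malformed PLAIN message");
        }
        _username = std::string(client_response.substr(p1 + 1, p2 - p1 - 1));
        _password = std::string(client_response.substr(p2 + 1));
        return {};
    }

    bool is_complete() const { return _username && _password; }

    // Verification is delegated to the injected callback.
    std::string get_authenticated_user() const { return _when_complete(*_username, *_password); }

private:
    std::optional<std::string> _username, _password;
    completion_callback _when_complete;
};
```

Keeping verification in a callback lets one challenge type serve any authenticator that can check a username/password pair.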

									
auth/service.cc
				@@ -36,12 +36,12 @@

				#include "auth/standard_role_manager.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				#include "db/config.hh"

				#include "db/consistency_level_type.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "service/migration_listener.hh"

				#include "service/migration_manager.hh"

				#include "utils/class_registrator.hh"

				#include "database.hh"

				namespace auth {

				@@ -77,17 +77,23 @@ private:

				    void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}

				    void on_drop_keyspace(const sstring& ks_name) override {

				        _authorizer.revoke_all(

				        // Do it in the background.

				        (void)_authorizer.revoke_all(

				                auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);

				        });

				    }

				    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {

				        _authorizer.revoke_all(

				        // Do it in the background.

				        (void)_authorizer.revoke_all(

				                auth::make_data_resource(

				                        ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);

				        });

				    }

				@@ -97,7 +103,7 @@ private:

				    void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}

				};

				static future<> validate_role_exists(const service& ser, stdx::string_view role_name) {

				static future<> validate_role_exists(const service& ser, std::string_view role_name) {

				    return ser.underlying_role_manager().exists(role_name).then([role_name](bool exists) {

				        if (!exists) {

				            throw nonexistant_role(role_name);

				@@ -105,30 +111,17 @@ static future<> validate_role_exists(const service& ser, stdx::string_view role_

				    });

				}

				service_config service_config::from_db_config(const db::config& dc) {

				    const qualified_name qualified_authorizer_name(meta::AUTH_PACKAGE_NAME, dc.authorizer());

				    const qualified_name qualified_authenticator_name(meta::AUTH_PACKAGE_NAME, dc.authenticator());

				    const qualified_name qualified_role_manager_name(meta::AUTH_PACKAGE_NAME, dc.role_manager());

				    service_config c;

				    c.authorizer_java_name = qualified_authorizer_name;

				    c.authenticator_java_name = qualified_authenticator_name;

				    c.role_manager_java_name = qualified_role_manager_name;

				    return c;

				}

				service::service(

				        permissions_cache_config c,

				        cql3::query_processor& qp,

				        ::service::migration_manager& mm,

				        ::service::migration_notifier& mn,

				        std::unique_ptr<authorizer> z,

				        std::unique_ptr<authenticator> a,

				        std::unique_ptr<role_manager> r)

				            : _permissions_cache_config(std::move(c))

				            , _permissions_cache(nullptr)

				            , _qp(qp)

				            , _migration_manager(mm)

				            , _mnotifier(mn)

				            , _authorizer(std::move(z))

				            , _authenticator(std::move(a))

				            , _role_manager(std::move(r))

				@@ -139,8 +132,7 @@ service::service(

				    if ((_authenticator->qualified_java_name() == password_authenticator_name())

				            && (_role_manager->qualified_java_name() != standard_role_manager_name())) {

				        throw incompatible_module_combination(

				                sprint(

				                        "The %s authenticator must be loaded alongside the %s role-manager.",

				                format("The {} authenticator must be loaded alongside the {} role-manager.",

				                        password_authenticator_name(),

				                        standard_role_manager_name()));

				    }

				@@ -149,19 +141,20 @@ service::service(

				service::service(

				        permissions_cache_config c,

				        cql3::query_processor& qp,

				        ::service::migration_notifier& mn,

				        ::service::migration_manager& mm,

				        const service_config& sc)

				            : service(

				                      std::move(c),

				                      qp,

				                      mm,

				                      mn,

				                      create_object<authorizer>(sc.authorizer_java_name, qp, mm),

				                      create_object<authenticator>(sc.authenticator_java_name, qp, mm),

				                      create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {

				}

				future<> service::create_keyspace_if_missing() const {

				    auto& db = _qp.db().local();

				future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {

				    auto& db = _qp.db();

				    if (!db.has_keyspace(meta::AUTH_KS)) {

				        std::map<sstring, sstring> opts{{"replication_factor", "1"}};

				@@ -174,15 +167,15 @@ future<> service::create_keyspace_if_missing() const {

				        // We use min_timestamp so that default keyspace metadata will loose with any manual adjustments.

				        // See issue #2129.

				        return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);

				        return mm.announce_new_keyspace(ksm, api::min_timestamp, false);

				    }

				    return make_ready_future<>();

				}

				future<> service::start() {

				    return once_among_shards([this] {

				        return create_keyspace_if_missing();

				future<> service::start(::service::migration_manager& mm) {

				    return once_among_shards([this, &mm] {

				        return create_keyspace_if_missing(mm);

				    }).then([this] {

				        return _role_manager->start().then([this] {

				            return when_all_succeed(_authorizer->start(), _authenticator->start());

				@@ -191,7 +184,7 @@ future<> service::start() {

				        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);

				    }).then([this] {

				        return once_among_shards([this] {

				            _migration_manager.register_listener(_migration_listener.get());

				            _mnotifier.register_listener(_migration_listener.get());

				            return make_ready_future<>();

				        });

				    });

				@@ -200,33 +193,34 @@ future<> service::start() {

				future<> service::stop() {

				    // Only one of the shards has the listener registered, but let's try to

				    // unregister on each one just to make sure.

				    _migration_manager.unregister_listener(_migration_listener.get());

				    return _permissions_cache->stop().then([this] {

				    return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {

				        if (_permissions_cache) {

				            return _permissions_cache->stop();

				        }

				        return make_ready_future<>();

				    }).then([this] {

				        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());

				    });

				}

				future<bool> service::has_existing_legacy_users() const {

				    if (!_qp.db().local().has_schema(meta::AUTH_KS, meta::USERS_CF)) {

				    if (!_qp.db().has_schema(meta::AUTH_KS, meta::USERS_CF)) {

				        return make_ready_future<bool>(false);

				    }

				    static const sstring default_user_query = sprint(

				            "SELECT * FROM %s.%s WHERE %s = ?",

				    static const sstring default_user_query = format("SELECT * FROM {}.{} WHERE {} = ?",

				            meta::AUTH_KS,

				            meta::USERS_CF,

				            meta::user_name_col_name);

				    static const sstring all_users_query = sprint(

				            "SELECT * FROM %s.%s LIMIT 1",

				    static const sstring all_users_query = format("SELECT * FROM {}.{} LIMIT 1",

				            meta::AUTH_KS,

				            meta::USERS_CF);

				    // This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we

				    // can potentially avoid doing a range query with a high consistency level.

				    return _qp.process(

				    return _qp.execute_internal(

				            default_user_query,

				            db::consistency_level::ONE,

				            infinite_timeout_config,

				@@ -236,7 +230,7 @@ future<bool> service::has_existing_legacy_users() const {

				            return make_ready_future<bool>(true);

				        }

				        return _qp.process(

				        return _qp.execute_internal(

				                default_user_query,

				                db::consistency_level::QUORUM,

				                infinite_timeout_config,

				@@ -246,7 +240,7 @@ future<bool> service::has_existing_legacy_users() const {

				                return make_ready_future<bool>(true);

				            }

				            return _qp.process(

				            return _qp.execute_internal(

				                    all_users_query,

				                    db::consistency_level::QUORUM,

				                    infinite_timeout_config).then([](auto results) {

				@@ -262,7 +256,7 @@ service::get_uncached_permissions(const role_or_anonymous& maybe_role, const res

				        return _authorizer->authorize(maybe_role, r);

				    }

				    const stdx::string_view role_name = *maybe_role.name;

				    const std::string_view role_name = *maybe_role.name;

				    return has_superuser(role_name).then([this, role_name, &r](bool superuser) {

				        if (superuser) {

				@@ -276,7 +270,7 @@ service::get_uncached_permissions(const role_or_anonymous& maybe_role, const res

				        return do_with(permission_set(), [this, role_name, &r](auto& all_perms) {

				            return get_roles(role_name).then([this, &r, &all_perms](role_set all_roles) {

				                return do_with(std::move(all_roles), [this, &r, &all_perms](const auto& all_roles) {

				                    return parallel_for_each(all_roles, [this, &r, &all_perms](stdx::string_view role_name) {

				                    return parallel_for_each(all_roles, [this, &r, &all_perms](std::string_view role_name) {

				                        return _authorizer->authorize(role_name, r).then([&all_perms](permission_set perms) {

				                            all_perms = permission_set::from_mask(all_perms.mask() | perms.mask());

				                        });

				@@ -293,7 +287,7 @@ future<permission_set> service::get_permissions(const role_or_anonymous& maybe_r

				    return _permissions_cache->get(maybe_role, r);

				}

				future<bool> service::has_superuser(stdx::string_view role_name) const {

				future<bool> service::has_superuser(std::string_view role_name) const {

				    return this->get_roles(std::move(role_name)).then([this](role_set roles) {

				        return do_with(std::move(roles), [this](const role_set& roles) {

				            return do_with(false, roles.begin(), [this, &roles](bool& any_super, auto& iter) {

				@@ -311,7 +305,7 @@ future<bool> service::has_superuser(stdx::string_view role_name) const {

				    });

				}

				future<role_set> service::get_roles(stdx::string_view role_name) const {

				future<role_set> service::get_roles(std::string_view role_name) const {

				    //

				    // We may wish to cache this information in the future (as Apache Cassandra does).

				    //

				@@ -322,7 +316,7 @@ future<role_set> service::get_roles(stdx::string_view role_name) const {

				future<bool> service::exists(const resource& r) const {

				    switch (r.kind()) {

				        case resource_kind::data: {

				            const auto& db = _qp.db().local();

				            const auto& db = _qp.db();

				            data_resource_view v(r);

				            const auto keyspace = v.keyspace();

				@@ -417,7 +411,7 @@ static void validate_authentication_options_are_supported(

				future<> create_role(

				        const service& ser,

				        stdx::string_view name,

				        std::string_view name,

				        const role_config& config,

				        const authentication_options& options) {

				    return ser.underlying_role_manager().create(name, config).then([&ser, name, &options] {

				@@ -441,7 +435,7 @@ future<> create_role(

				future<> alter_role(

				        const service& ser,

				        stdx::string_view name,

				        std::string_view name,

				        const role_config_update& config_update,

				        const authentication_options& options) {

				    return ser.underlying_role_manager().alter(name, config_update).then([&ser, name, &options] {

				@@ -458,7 +452,7 @@ future<> alter_role(

				    });

				}

				future<> drop_role(const service& ser, stdx::string_view name) {

				future<> drop_role(const service& ser, std::string_view name) {

				    return do_with(make_role_resource(name), [&ser, name](const resource& r) {

				        auto& a = ser.underlying_authorizer();

				@@ -474,14 +468,14 @@ future<> drop_role(const service& ser, stdx::string_view name) {

				    });

				}

				future<bool> has_role(const service& ser, stdx::string_view grantee, stdx::string_view name) {

				future<bool> has_role(const service& ser, std::string_view grantee, std::string_view name) {

				    return when_all_succeed(

				            validate_role_exists(ser, name),

				            ser.get_roles(grantee)).then([name](role_set all_roles) {

				        return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);

				    });

				}

				future<bool> has_role(const service& ser, const authenticated_user& u, stdx::string_view name) {

				future<bool> has_role(const service& ser, const authenticated_user& u, std::string_view name) {

				    if (is_anonymous(u)) {

				        return make_ready_future<bool>(false);

				    }

				@@ -491,7 +485,7 @@ future<bool> has_role(const service& ser, const authenticated_user& u, stdx::str

				future<> grant_permissions(

				        const service& ser,

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set perms,

				        const resource& r) {

				    return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {

				@@ -499,7 +493,7 @@ future<> grant_permissions(

				    });

				}

				future<> grant_applicable_permissions(const service& ser, stdx::string_view role_name, const resource& r) {

				future<> grant_applicable_permissions(const service& ser, std::string_view role_name, const resource& r) {

				    return grant_permissions(ser, role_name, r.applicable_permissions(), r);

				}

				future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r) {

				@@ -512,7 +506,7 @@ future<> grant_applicable_permissions(const service& ser, const authenticated_us

				future<> revoke_permissions(

				        const service& ser,

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set perms,

				        const resource& r) {

				    return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {

				@@ -523,7 +517,7 @@ future<> revoke_permissions(

				future<std::vector<permission_details>> list_filtered_permissions(

				        const service& ser,

				        permission_set perms,

				        std::optional<stdx::string_view> role_name,

				        std::optional<std::string_view> role_name,

				        const std::optional<std::pair<resource, recursive_permissions>>& resource_filter) {

				    return ser.underlying_authorizer().list_all().then([&ser, perms, role_name, &resource_filter](

				            std::vector<permission_details> all_details) {
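In `get_uncached_permissions`, a role's effective permissions on a resource are the bitwise OR of the permission masks of every role it holds (`all_perms.mask() | perms.mask()`). Below is a sequential sketch of that union. The `perm` bit values and the map of grants are hypothetical stand-ins for `permission_set` and the authorizer, and the superuser shortcut is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical permission bits; the real ones live in auth::permission.
enum class perm : std::uint8_t { select = 0x1, modify = 0x2, alter = 0x4 };

using mask_t = std::uint8_t;
constexpr mask_t bit(perm p) { return static_cast<mask_t>(p); }

// Effective mask = OR over the masks granted to each role the user holds
// (the role itself plus all inherited roles).
inline mask_t effective_mask(const std::set<std::string>& all_roles,
                             const std::map<std::string, mask_t>& granted_on_resource) {
    mask_t all = 0;
    for (const auto& role : all_roles) {
        const auto it = granted_on_resource.find(role);
        if (it != granted_on_resource.end()) {
            all = static_cast<mask_t>(all | it->second);
        }
    }
    return all;
}
```

Because OR is commutative and associative, the order of the per-role lookups does not matter. That is why the original can issue them with `parallel_for_each`.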

									
auth/service.hh
				@@ -21,13 +21,14 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <memory>

				#include <optional>

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/util/bool_class.hh>

				#include <seastar/core/sharded.hh>

				#include "auth/authenticator.hh"

				#include "auth/authorizer.hh"

				@@ -35,18 +36,14 @@

				#include "auth/permissions_cache.hh"

				#include "auth/role_manager.hh"

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace cql3 {

				class query_processor;

				}

				namespace db {

				class config;

				}

				namespace service {

				class migration_manager;

				class migration_notifier;

				class migration_listener;

				}

				@@ -55,8 +52,6 @@ namespace auth {

				class role_or_anonymous;

				struct service_config final {

				    static service_config from_db_config(const db::config&);

				    sstring authorizer_java_name;

				    sstring authenticator_java_name;

				    sstring role_manager_java_name;

				@@ -83,13 +78,15 @@ public:

				///

				/// All state associated with access-control is stored externally to any particular instance of this class.

				///

				class service final {

				/// peering_sharded_service inheritance is needed to be able to access shard local authentication service

				/// given an object from another shard. Used for bouncing lwt requests to correct shard.

				class service final : public seastar::peering_sharded_service<service> {

				    permissions_cache_config _permissions_cache_config;

				    std::unique_ptr<permissions_cache> _permissions_cache;

				    cql3::query_processor& _qp;

				    ::service::migration_manager& _migration_manager;

				    ::service::migration_notifier& _mnotifier;

				    std::unique_ptr<authorizer> _authorizer;

				@@ -104,7 +101,7 @@ public:

				    service(

				            permissions_cache_config,

				            cql3::query_processor&,

				            ::service::migration_manager&,

				            ::service::migration_notifier&,

				            std::unique_ptr<authorizer>,

				            std::unique_ptr<authenticator>,

				            std::unique_ptr<role_manager>);

				@@ -117,10 +114,11 @@ public:

				    service(

				            permissions_cache_config,

				            cql3::query_processor&,

				            ::service::migration_notifier&,

				            ::service::migration_manager&,

				            const service_config&);

				    future<> start();

				    future<> start(::service::migration_manager&);

				    future<> stop();

				@@ -141,13 +139,13 @@ public:

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    future<bool> has_superuser(stdx::string_view role_name) const;

				    future<bool> has_superuser(std::string_view role_name) const;

				    ///

				    /// Return the set of all roles granted to the given role, including itself and roles granted through other roles.

				    ///

				    /// \returns an exceptional future with \ref nonexistent_role if the role does not exist.

				    future<role_set> get_roles(stdx::string_view role_name) const;

				    future<role_set> get_roles(std::string_view role_name) const;

				    future<bool> exists(const resource&) const;

				@@ -166,7 +164,7 @@ public:

				private:

				    future<bool> has_existing_legacy_users() const;

				    future<> create_keyspace_if_missing() const;

				    future<> create_keyspace_if_missing(::service::migration_manager& mm) const;

				};

				future<bool> has_superuser(const service&, const authenticated_user&);

				@@ -197,7 +195,7 @@ bool is_protected(const service&, const resource&) noexcept;

				///

				future<> create_role(

				        const service&,

				        stdx::string_view name,

				        std::string_view name,

				        const role_config&,

				        const authentication_options&);

				@@ -210,7 +208,7 @@ future<> create_role(

				///

				future<> alter_role(

				        const service&,

				        stdx::string_view name,

				        std::string_view name,

				        const role_config_update&,

				        const authentication_options&);

				@@ -219,20 +217,20 @@ future<> alter_role(

				///

				/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.

				///

				future<> drop_role(const service&, stdx::string_view name);

				future<> drop_role(const service&, std::string_view name);

				///

				/// Check if `grantee` has been granted the named role.

				///

				/// \returns an exceptional future with \ref nonexistent_role if `grantee` or `name` do not exist.

				///

				future<bool> has_role(const service&, stdx::string_view grantee, stdx::string_view name);

				future<bool> has_role(const service&, std::string_view grantee, std::string_view name);

				///

				/// Check if the authenticated user has been granted the named role.

				///

				/// \returns an exceptional future with \ref nonexistent_role if the user or `name` do not exist.

				///

				future<bool> has_role(const service&, const authenticated_user&, stdx::string_view name);

				future<bool> has_role(const service&, const authenticated_user&, std::string_view name);

				///

				/// \returns an exceptional future with \ref nonexistent_role if the named role does not exist.

				@@ -242,7 +240,7 @@ future<bool> has_role(const service&, const authenticated_user&, stdx::string_vi

				///

				future<> grant_permissions(

				        const service&,

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set,

				        const resource&);

				@@ -254,7 +252,7 @@ future<> grant_permissions(

				/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not

				/// supported.

				///

				future<> grant_applicable_permissions(const service&, stdx::string_view role_name, const resource&);

				future<> grant_applicable_permissions(const service&, std::string_view role_name, const resource&);

				future<> grant_applicable_permissions(const service&, const authenticated_user&, const resource&);

				///

				@@ -265,7 +263,7 @@ future<> grant_applicable_permissions(const service&, const authenticated_user&,

				///

				future<> revoke_permissions(

				        const service&,

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set,

				        const resource&);

				@@ -290,7 +288,7 @@ using recursive_permissions = bool_class<struct recursive_permissions_tag>;

				future<std::vector<permission_details>> list_filtered_permissions(

				        const service&,

				        permission_set,

				        std::optional<stdx::string_view> role_name,

				        std::optional<std::string_view> role_name,

				        const std::optional<std::pair<resource, recursive_permissions>>& resource_filter);

				}
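The `get_roles` contract documented above (the role itself, plus every role granted directly or through other roles) is a transitive closure over the role-membership graph. In `standard_role_manager.cc` this appears to be computed by the `collect_roles` helper with `recurse` set, following each record's `member_of` set. The sketch below is a minimal, synchronous illustration of that closure. It assumes an in-memory map from role name to the roles it is a member of. `role_graph` and `get_roles_closure` are hypothetical names, not Scylla's API, and the real code issues CQL reads and returns seastar futures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the roles table:
// role name -> names of the roles it is a direct member of.
using role_graph = std::map<std::string, std::vector<std::string>, std::less<>>;
using role_set = std::set<std::string>;

// Returns the role itself plus every role reachable through membership,
// matching the documented get_roles() contract. Throws if the role is unknown.
role_set get_roles_closure(const role_graph& graph, std::string_view role_name) {
    if (graph.find(role_name) == graph.end()) {
        throw std::invalid_argument("nonexistent role: " + std::string(role_name));
    }
    role_set result;
    std::vector<std::string> pending{std::string(role_name)};
    while (!pending.empty()) {
        std::string current = std::move(pending.back());
        pending.pop_back();
        // insert() reports false for an already visited role, which also stops cycles.
        if (!result.insert(current).second) {
            continue;
        }
        auto it = graph.find(current);
        if (it == graph.end()) {
            continue; // granted role without its own entry: keep it, stop descending
        }
        for (const auto& granted : it->second) {
            pending.push_back(granted);
        }
    }
    return result;
}
```

The visited check matters because nothing in the sketch's data model prevents two roles from being granted to each other; without it the walk would not terminate.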

									
117 auth/standard_role_manager.cc

				@@ -21,7 +21,7 @@

				#include "auth/standard_role_manager.hh"

				#include <experimental/optional>

				#include <optional>

				#include <unordered_set>

				#include <vector>

				@@ -35,10 +35,12 @@

				#include "auth/common.hh"

				#include "auth/roles-metadata.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				#include "db/consistency_level_type.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "utils/class_registrator.hh"

				#include "database.hh"

				namespace auth {

				@@ -46,9 +48,9 @@ namespace meta {

				namespace role_members_table {

				constexpr stdx::string_view name{"role_members" , 12};

				constexpr std::string_view name{"role_members" , 12};

				static stdx::string_view qualified_name() noexcept {

				static std::string_view qualified_name() noexcept {

				    static const sstring instance = AUTH_KS + "." + sstring(name);

				    return instance;

				}

				@@ -72,7 +74,7 @@ struct record final {

				    role_set member_of;

				};

				static db::consistency_level consistency_for_role(stdx::string_view role_name) noexcept {

				static db::consistency_level consistency_for_role(std::string_view role_name) noexcept {

				    if (role_name == meta::DEFAULT_SUPERUSER_NAME) {

				        return db::consistency_level::QUORUM;

				    }

				@@ -80,37 +82,36 @@ static db::consistency_level consistency_for_role(stdx::string_view role_name) n

				    return db::consistency_level::LOCAL_ONE;

				}

				static future<stdx::optional<record>> find_record(cql3::query_processor& qp, stdx::string_view role_name) {

				    static const sstring query = sprint(

				            "SELECT * FROM %s WHERE %s = ?",

				static future<std::optional<record>> find_record(cql3::query_processor& qp, std::string_view role_name) {

				    static const sstring query = format("SELECT * FROM {} WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::role_col_name);

				    return qp.process(

				    return qp.execute_internal(

				            query,

				            consistency_for_role(role_name),

				            internal_distributed_timeout_config(),

				            {sstring(role_name)},

				            true).then([](::shared_ptr<cql3::untyped_result_set> results) {

				        if (results->empty()) {

				            return stdx::optional<record>();

				            return std::optional<record>();

				        }

				        const cql3::untyped_result_set_row& row = results->one();

				        return stdx::make_optional(

				        return std::make_optional(

				                record{

				                        row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),

				                        row.get_as<bool>("is_superuser"),

				                        row.get_as<bool>("can_login"),

				                        row.get_or<bool>("is_superuser", false),

				                        row.get_or<bool>("can_login", false),

				                        (row.has("member_of")

				                                 ? row.get_set<sstring>("member_of")

				                                 : role_set())});

				    });

				}

				static future<record> require_record(cql3::query_processor& qp, stdx::string_view role_name) {

				    return find_record(qp, role_name).then([role_name](stdx::optional<record> mr) {

				static future<record> require_record(cql3::query_processor& qp, std::string_view role_name) {

				    return find_record(qp, role_name).then([role_name](std::optional<record> mr) {

				        if (!mr) {

				            throw nonexistant_role(role_name);

				        }

				@@ -123,12 +124,12 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {

				    return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());

				}

				stdx::string_view standard_role_manager_name() noexcept {

				std::string_view standard_role_manager_name() noexcept {

				    static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";

				    return instance;

				}

				stdx::string_view standard_role_manager::qualified_java_name() const noexcept {

				std::string_view standard_role_manager::qualified_java_name() const noexcept {

				    return standard_role_manager_name();

				}

				@@ -166,12 +167,11 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {

				future<> standard_role_manager::create_default_role_if_missing() const {

				    return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {

				        if (!exists) {

				            static const sstring query = sprint(

				                    "INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, true, true)",

				            static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, true, true)",

				                    meta::roles_table::qualified_name(),

				                    meta::roles_table::role_col_name);

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    db::consistency_level::QUORUM,

				                    internal_distributed_timeout_config(),

				@@ -191,20 +191,20 @@ future<> standard_role_manager::create_default_role_if_missing() const {

				static const sstring legacy_table_name{"users"};

				bool standard_role_manager::legacy_metadata_exists() const {

				    return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);

				    return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);

				}

				future<> standard_role_manager::migrate_legacy_metadata() const {

				    log.info("Starting migration of legacy user metadata.");

				    static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {

				            role_config config;

				            config.is_superuser = row.get_as<bool>("super");

				            config.is_superuser = row.get_or<bool>("super", false);

				            config.can_login = true;

				            return do_with(

				@@ -227,7 +227,7 @@ future<> standard_role_manager::start() {

				        return this->create_metadata_tables_if_missing().then([this] {

				            _stopped = auth::do_after_system_ready(_as, [this] {

				                return seastar::async([this] {

				                    wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

				                    wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();

				                    if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get0()) {

				                        if (this->legacy_metadata_exists()) {

				@@ -254,13 +254,12 @@ future<> standard_role_manager::stop() {

				    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});;

				}

				future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) const {

				    static const sstring query = sprint(

				            "INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, ?, ?)",

				future<> standard_role_manager::create_or_replace(std::string_view role_name, const role_config& c) const {

				    static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, ?, ?)",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            consistency_for_role(role_name),

				            internal_distributed_timeout_config(),

				@@ -269,7 +268,7 @@ future<> standard_role_manager::create_or_replace(stdx::string_view role_name, c

				}

				future<>

				standard_role_manager::create(stdx::string_view role_name, const role_config& c) const {

				standard_role_manager::create(std::string_view role_name, const role_config& c) const {

				    return this->exists(role_name).then([this, role_name, &c](bool role_exists) {

				        if (role_exists) {

				            throw role_already_exists(role_name);

				@@ -280,7 +279,7 @@ standard_role_manager::create(stdx::string_view role_name, const role_config& c)

				}

				future<>

				standard_role_manager::alter(stdx::string_view role_name, const role_config_update& u) const {

				standard_role_manager::alter(std::string_view role_name, const role_config_update& u) const {

				    static const auto build_column_assignments = [](const role_config_update& u) -> sstring {

				        std::vector<sstring> assignments;

				@@ -300,9 +299,8 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda

				            return make_ready_future<>();

				        }

				        return _qp.process(

				                sprint(

				                        "UPDATE %s SET %s WHERE %s = ?",

				        return _qp.execute_internal(

				                format("UPDATE {} SET {} WHERE {} = ?",

				                        meta::roles_table::qualified_name(),

				                        build_column_assignments(u),

				                        meta::roles_table::role_col_name),

				@@ -312,7 +310,7 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda

				    });

				}

				future<> standard_role_manager::drop(stdx::string_view role_name) const {

				future<> standard_role_manager::drop(std::string_view role_name) const {

				    return this->exists(role_name).then([this, role_name](bool role_exists) {

				        if (!role_exists) {

				            throw nonexistant_role(role_name);

				@@ -320,11 +318,10 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {

				        // First, revoke this role from all roles that are members of it.

				        const auto revoke_from_members = [this, role_name] {

				            static const sstring query = sprint(

				                    "SELECT member FROM %s WHERE role = ?",

				            static const sstring query = format("SELECT member FROM {} WHERE role = ?",

				                    meta::role_members_table::qualified_name());

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    consistency_for_role(role_name),

				                    internal_distributed_timeout_config(),

				@@ -359,12 +356,11 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {

				        // Finally, delete the role itself.

				        auto delete_role = [this, role_name] {

				            static const sstring query = sprint(

				                    "DELETE FROM %s WHERE %s = ?",

				            static const sstring query = format("DELETE FROM {} WHERE {} = ?",

				                    meta::roles_table::qualified_name(),

				                    meta::roles_table::role_col_name);

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    consistency_for_role(role_name),

				                    internal_distributed_timeout_config(),

				@@ -379,19 +375,19 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {

				future<>

				standard_role_manager::modify_membership(

				        stdx::string_view grantee_name,

				        stdx::string_view role_name,

				        std::string_view grantee_name,

				        std::string_view role_name,

				        membership_change ch) const {

				    const auto modify_roles = [this, role_name, grantee_name, ch] {

				        const auto query = sprint(

				                "UPDATE %s SET member_of = member_of %s ? WHERE %s = ?",

				        const auto query = format(

				                "UPDATE {} SET member_of = member_of {} ? WHERE {} = ?",

				                meta::roles_table::qualified_name(),

				                (ch == membership_change::add ? '+' : '-'),

				                meta::roles_table::role_col_name);

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                consistency_for_role(grantee_name),

				                internal_distributed_timeout_config(),

				@@ -401,18 +397,16 @@ standard_role_manager::modify_membership(

				    const auto modify_role_members = [this, role_name, grantee_name, ch] {

				        switch (ch) {

				            case membership_change::add:

				                return _qp.process(

				                        sprint(

				                                "INSERT INTO %s (role, member) VALUES (?, ?)",

				                return _qp.execute_internal(

				                        format("INSERT INTO {} (role, member) VALUES (?, ?)",

				                                meta::role_members_table::qualified_name()),

				                        consistency_for_role(role_name),

				                        internal_distributed_timeout_config(),

				                        {sstring(role_name), sstring(grantee_name)}).discard_result();

				            case membership_change::remove:

				                return _qp.process(

				                        sprint(

				                                "DELETE FROM %s WHERE role = ? AND member = ?",

				                return _qp.execute_internal(

				                        format("DELETE FROM {} WHERE role = ? AND member = ?",

				                                meta::role_members_table::qualified_name()),

				                        consistency_for_role(role_name),

				                        internal_distributed_timeout_config(),

				@@ -426,7 +420,7 @@ standard_role_manager::modify_membership(

				}

				future<>

				standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view role_name) const {

				standard_role_manager::grant(std::string_view grantee_name, std::string_view role_name) const {

				    const auto check_redundant = [this, role_name, grantee_name] {

				        return this->query_granted(

				                grantee_name,

				@@ -457,7 +451,7 @@ standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view r

				}

				future<>

				standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view role_name) const {

				standard_role_manager::revoke(std::string_view revokee_name, std::string_view role_name) const {

				    return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {

				        if (!role_exists) {

				            throw nonexistant_role(sstring(role_name));

				@@ -479,7 +473,7 @@ standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view

				static future<> collect_roles(

				        cql3::query_processor& qp,

				        stdx::string_view grantee_name,

				        std::string_view grantee_name,

				        bool recurse,

				        role_set& roles) {

				    return require_record(qp, grantee_name).then([&qp, &roles, recurse](record r) {

				@@ -497,7 +491,7 @@ static future<> collect_roles(

				    });

				}

				future<role_set> standard_role_manager::query_granted(stdx::string_view grantee_name, recursive_role_query m) const {

				future<role_set> standard_role_manager::query_granted(std::string_view grantee_name, recursive_role_query m) const {

				    const bool recurse = (m == recursive_role_query::yes);

				    return do_with(

				@@ -508,15 +502,14 @@ future<role_set> standard_role_manager::query_granted(stdx::string_view grantee_

				}

				future<role_set> standard_role_manager::query_all() const {

				    static const sstring query = sprint(

				            "SELECT %s FROM %s",

				    static const sstring query = format("SELECT {} FROM {}",

				            meta::roles_table::role_col_name,

				            meta::roles_table::qualified_name());

				    // To avoid many copies of a view.

				    static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([](::shared_ptr<cql3::untyped_result_set> results) {

				@@ -534,19 +527,19 @@ future<role_set> standard_role_manager::query_all() const {

				    });

				}

				future<bool> standard_role_manager::exists(stdx::string_view role_name) const  {

				    return find_record(_qp, role_name).then([](stdx::optional<record> mr) {

				future<bool> standard_role_manager::exists(std::string_view role_name) const  {

				    return find_record(_qp, role_name).then([](std::optional<record> mr) {

				        return static_cast<bool>(mr);

				    });

				}

				future<bool> standard_role_manager::is_superuser(stdx::string_view role_name) const {

				future<bool> standard_role_manager::is_superuser(std::string_view role_name) const {

				    return require_record(_qp, role_name).then([](record r) {

				        return r.is_superuser;

				    });

				}

				future<bool> standard_role_manager::can_login(stdx::string_view role_name) const {

				future<bool> standard_role_manager::can_login(std::string_view role_name) const {

				    return require_record(_qp, role_name).then([](record r) {

				        return r.can_login;

				    });
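Two small policies are visible in the `standard_role_manager.cc` hunks. `consistency_for_role` accesses the default superuser's rows at `QUORUM` and every other role's rows at `LOCAL_ONE`. `modify_membership` builds a single `UPDATE` whose `member_of` collection operator is `+` or `-` depending on the membership change. The following is a standalone sketch of both. It assumes the default superuser is named `cassandra` (the value of `meta::DEFAULT_SUPERUSER_NAME` is not shown in this diff), and it uses hypothetical local enums in place of `db::consistency_level`. `membership_update_query` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-ins; Scylla's real types live elsewhere (db/consistency_level_type.hh).
enum class consistency_level { LOCAL_ONE, QUORUM };
enum class membership_change { add, remove };

// Assumption: the default superuser is named "cassandra".
constexpr std::string_view default_superuser_name = "cassandra";

// The default superuser gets the stronger level; all other roles use LOCAL_ONE.
consistency_level consistency_for_role(std::string_view role_name) noexcept {
    return role_name == default_superuser_name ? consistency_level::QUORUM
                                               : consistency_level::LOCAL_ONE;
}

// Produces the same text as
// format("UPDATE {} SET member_of = member_of {} ? WHERE {} = ?", table, op, role_col).
std::string membership_update_query(std::string_view table, std::string_view role_col,
                                    membership_change ch) {
    std::string q = "UPDATE ";
    q += table;
    q += " SET member_of = member_of ";
    q += (ch == membership_change::add ? '+' : '-');
    q += " ? WHERE ";
    q += role_col;
    q += " = ?";
    return q;
}
```

Building the string on each call mirrors the diff, where `modify_membership` formats its query into a local `const auto` rather than a `static const sstring`, because the operator differs between calls.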

									
29 auth/standard_role_manager.hh

				@@ -23,14 +23,13 @@

				#include "auth/role_manager.hh"

				#include <experimental/string_view>

				#include <string_view>

				#include <unordered_set>

				#include <seastar/core/abort_source.hh>

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include "stdx.hh"

				#include "seastarx.hh"

				namespace cql3 {

				@@ -43,7 +42,7 @@ class migration_manager;

				namespace auth {

				stdx::string_view standard_role_manager_name() noexcept;

				std::string_view standard_role_manager_name() noexcept;

				class standard_role_manager final : public role_manager {

				    cql3::query_processor& _qp;

				@@ -58,7 +57,7 @@ public:

				            , _stopped(make_ready_future<>()) {

				    }

				    virtual stdx::string_view qualified_java_name() const noexcept override;

				    virtual std::string_view qualified_java_name() const noexcept override;

				    virtual const resource_set& protected_resources() const override;

				@@ -66,25 +65,25 @@ public:

				    virtual future<> stop() override;

				    virtual future<> create(stdx::string_view role_name, const role_config&) const override;

				    virtual future<> create(std::string_view role_name, const role_config&) const override;

				    virtual future<> drop(stdx::string_view role_name) const override;

				    virtual future<> drop(std::string_view role_name) const override;

				    virtual future<> alter(stdx::string_view role_name, const role_config_update&) const override;

				    virtual future<> alter(std::string_view role_name, const role_config_update&) const override;

				    virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const override;

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name) const override;

				    virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const override;

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) const override;

				    virtual future<role_set> query_granted(stdx::string_view grantee_name, recursive_role_query) const override;

				    virtual future<role_set> query_granted(std::string_view grantee_name, recursive_role_query) const override;

				    virtual future<role_set> query_all() const override;

				    virtual future<bool> exists(stdx::string_view role_name) const override;

				    virtual future<bool> exists(std::string_view role_name) const override;

				    virtual future<bool> is_superuser(stdx::string_view role_name) const override;

				    virtual future<bool> is_superuser(std::string_view role_name) const override;

				    virtual future<bool> can_login(stdx::string_view role_name) const override;

				    virtual future<bool> can_login(std::string_view role_name) const override;

				private:

				    enum class membership_change { add, remove };

				@@ -97,9 +96,9 @@ private:

				    future<> create_default_role_if_missing() const;

				    future<> create_or_replace(stdx::string_view role_name, const role_config&) const;

				    future<> create_or_replace(std::string_view role_name, const role_config&) const;

				    future<> modify_membership(stdx::string_view role_name, stdx::string_view grantee_name, membership_change) const;

				    future<> modify_membership(std::string_view role_name, std::string_view grantee_name, membership_change) const;

				};

				}
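The header now declares `standard_role_manager_name()` as returning `std::string_view`, and the `.cc` hunks implement it by returning a view of a function-local `static const sstring`. This is safe because the static outlives every caller, whereas a view of a non-static local would dangle. Here is a minimal sketch of the same pattern with `std::string`. `role_manager_name` is a hypothetical name, and the prefix assumes `meta::AUTH_PACKAGE_NAME` is `org.apache.cassandra.auth.`, a value that does not appear in this diff.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical equivalent of standard_role_manager_name(); the prefix is an
// assumed value of meta::AUTH_PACKAGE_NAME.
std::string_view role_manager_name() noexcept {
    // Built once on first call and destroyed only at program exit, so the
    // returned view stays valid. A non-static local here would dangle.
    static const std::string instance = std::string("org.apache.cassandra.auth.") + "CassandraRoleManager";
    return instance;
}
```

Since C++11, initialization of a function-local static is thread-safe and happens exactly once, so every call returns a view of the same storage.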

									
19 auth/transitional.cc

				@@ -45,7 +45,6 @@

				#include "auth/default_authorizer.hh"

				#include "auth/password_authenticator.hh"

				#include "auth/permission.hh"

				#include "db/config.hh"

				#include "utils/class_registrator.hh"

				namespace auth {

				@@ -83,7 +82,7 @@ public:

				        return _authenticator->stop();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return transitional_authenticator_name();

				    }

				@@ -118,19 +117,19 @@ public:

				        });

				    }

				    virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override {

				    virtual future<> create(std::string_view role_name, const authentication_options& options) const override {

				        return _authenticator->create(role_name, options);

				    }

				    virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override {

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) const override {

				        return _authenticator->alter(role_name, options);

				    }

				    virtual future<> drop(stdx::string_view role_name) const override {

				    virtual future<> drop(std::string_view role_name) const override {

				        return _authenticator->drop(role_name);

				    }

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override {

				        return _authenticator->query_custom_options(role_name);

				    }

				@@ -202,7 +201,7 @@ public:

				        return _authorizer->stop();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return transitional_authorizer_name();

				    }

				@@ -218,11 +217,11 @@ public:

				        return make_ready_future<permission_set>(transitional_permissions);

				    }

				    virtual future<> grant(stdx::string_view s, permission_set ps, const resource& r) const override {

				    virtual future<> grant(std::string_view s, permission_set ps, const resource& r) const override {

				        return _authorizer->grant(s, std::move(ps), r);

				    }

				    virtual future<> revoke(stdx::string_view s, permission_set ps, const resource& r) const override {

				    virtual future<> revoke(std::string_view s, permission_set ps, const resource& r) const override {

				        return _authorizer->revoke(s, std::move(ps), r);

				    }

				@@ -230,7 +229,7 @@ public:

				        return _authorizer->list_all();

				    }

				    virtual future<> revoke_all(stdx::string_view s) const override {

				    virtual future<> revoke_all(std::string_view s) const override {

				        return _authorizer->revoke_all(s);

				    }
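In `transitional.cc`, the transitional authenticator and authorizer wrap another instance. They forward role operations (`create`, `alter`, `drop`, `grant`, `revoke`, `revoke_all`) to the wrapped instance, while `qualified_java_name()` returns the transitional name. That method now returns `std::string_view` rather than `const sstring&`. The sketch below shows the forwarding shape using a hypothetical, synchronous `authenticator` interface with only two members. The real interface returns seastar futures and declares many more operations. `recording_authenticator` and `forwarding_authenticator` are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical synchronous stand-in for auth::authenticator; the real
// interface returns seastar futures and declares many more operations.
class authenticator {
public:
    virtual ~authenticator() = default;
    virtual std::string_view qualified_java_name() const = 0;
    virtual void drop(std::string_view role_name) const = 0;
};

// Test double that records which roles were dropped.
class recording_authenticator final : public authenticator {
    mutable std::vector<std::string> _dropped;
public:
    std::string_view qualified_java_name() const override { return "recording"; }
    void drop(std::string_view role_name) const override { _dropped.emplace_back(role_name); }
    const std::vector<std::string>& dropped() const { return _dropped; }
};

// Reports its own name but forwards role operations to the wrapped instance,
// the same shape as the transitional authenticator in the diff.
class forwarding_authenticator final : public authenticator {
    std::unique_ptr<authenticator> _authenticator;
public:
    explicit forwarding_authenticator(std::unique_ptr<authenticator> a) : _authenticator(std::move(a)) {}
    std::string_view qualified_java_name() const override { return "transitional"; }
    void drop(std::string_view role_name) const override { _authenticator->drop(role_name); }
};
```

Returning a string literal as `std::string_view` is safe because literals have static storage duration, which is one reason a view is a convenient return type for names like these.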

									
4 backlog_controller.hh

				@@ -23,7 +23,11 @@

				#include <seastar/core/scheduling.hh>

				#include <seastar/core/timer.hh>

				#include <seastar/core/gate.hh>

				#include <seastar/core/file.hh>

				#include <chrono>

				#include <cmath>

				#include "seastarx.hh"

				// Simple proportional controller to adjust shares for processes for which a backlog can be clearly

				// defined.
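The `backlog_controller.hh` hunk shows only the header comment. To illustrate what a proportional controller means here, the sketch below maps a measured backlog to scheduling shares using a single gain and clamps the result. `proportional_controller`, `gain`, `min_shares`, and `max_shares` are hypothetical, and Scylla's actual controller may use a different control law.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tuning knobs; the real controller's parameters are not shown in this hunk.
struct proportional_controller {
    float gain;       // shares granted per unit of backlog
    float min_shares; // lower bound, so the process always gets some share
    float max_shares; // upper bound on the shares it can be given

    // Output is proportional to the current backlog, clamped to [min_shares, max_shares].
    float shares_for(float backlog) const {
        return std::clamp(gain * backlog, min_shares, max_shares);
    }
};
```

As the backlog grows, the process receives more shares and drains it faster. As the backlog shrinks, its shares fall back toward the floor, leaving capacity for other work.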

									
72 build_id.cc Normal file

				@@ -0,0 +1,72 @@

				/*

				 * Copyright (C) 2019 ScyllaDB

				 */

				#include "build_id.hh"

				#include <fmt/printf.h>

				#include <link.h>

				#include <seastar/core/align.hh>

				#include <sstream>

				#include <cassert>

				using namespace seastar;

				static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {

				    auto base = info->dlpi_addr;

				    const auto* h = info->dlpi_phdr;

				    auto num_headers = info->dlpi_phnum;

				    for (int i = 0; i != num_headers; ++i, ++h) {

				        if (h->p_type != PT_NOTE) {

				            continue;

				        }

				        auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;

				        auto* e = p + h->p_memsz;

				        while (p != e) {

				            const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);

				            if (n->n_type == NT_GNU_BUILD_ID) {

				                return n;

				            }

				            p += sizeof(Elf64_Nhdr);

				            p += n->n_namesz;

				            p = align_up(p, 4);

				            p += n->n_descsz;

				            p = align_up(p, 4);

				        }

				    }

				    assert(0 && "no NT_GNU_BUILD_ID note");

				}

				static int callback(dl_phdr_info* info, size_t size, void* data) {

				    std::string& ret = *(std::string*)data;

				    std::ostringstream os;

				    // The first DSO is always the main program, which has an empty name.

				    assert(strlen(info->dlpi_name) == 0);

				    auto* n = get_nt_build_id(info);

				    auto* p = reinterpret_cast<const char*>(n);

				    p += sizeof(Elf64_Nhdr);

				    p += n->n_namesz;

				    p = align_up(p, 4);

				    const char* desc = p;

				    for (unsigned i = 0; i < n->n_descsz; ++i) {

				        fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));

				    }

				    ret = os.str();

				    return 1;

				}

				std::string get_build_id() {

				    std::string ret;

				    int r = dl_iterate_phdr(callback, &ret);

				    assert(r == 1);

				    return ret;

				}
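`get_nt_build_id` treats each `PT_NOTE` segment as a sequence of records. Each record is an `Elf64_Nhdr` (three 32-bit words), followed by the name padded to a 4-byte boundary, then the descriptor padded to a 4-byte boundary. The descriptor of the `NT_GNU_BUILD_ID` note is the build ID that `callback` hex-encodes. The sketch below applies the same walk to an in-memory byte buffer so it can run without `dl_iterate_phdr`. `note_header` mirrors the `Elf64_Nhdr` layout, and `find_build_id` is a hypothetical name. The sketch returns `std::nullopt` when no note matches. By contrast, the diffed function ends with `assert(0 && ...)` and has no `return`, so a build with `NDEBUG` would fall off the end of a non-void function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Same layout as Elf64_Nhdr in <elf.h>: three 32-bit words.
struct note_header {
    std::uint32_t n_namesz;
    std::uint32_t n_descsz;
    std::uint32_t n_type;
};

constexpr std::uint32_t nt_gnu_build_id = 3; // value of NT_GNU_BUILD_ID in <elf.h>

// Round up to the 4-byte boundary that note names and descriptors are padded to.
constexpr std::size_t align4(std::size_t n) { return (n + 3) & ~std::size_t(3); }

// Walks a note segment held in memory and returns the hex-encoded descriptor
// of the first NT_GNU_BUILD_ID note, or std::nullopt if there is none.
std::optional<std::string> find_build_id(const std::vector<unsigned char>& notes) {
    std::size_t off = 0;
    while (off + sizeof(note_header) <= notes.size()) {
        note_header h;
        std::memcpy(&h, notes.data() + off, sizeof h); // no alignment assumptions
        const std::size_t desc_off = off + sizeof h + align4(h.n_namesz);
        const std::size_t next = desc_off + align4(h.n_descsz);
        if (next > notes.size()) {
            break; // truncated note: stop rather than read past the buffer
        }
        if (h.n_type == nt_gnu_build_id) {
            std::string hex;
            char buf[3];
            for (std::size_t i = 0; i < h.n_descsz; ++i) {
                std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(notes[desc_off + i]));
                hex += buf;
            }
            return hex;
        }
        off = next;
    }
    return std::nullopt;
}
```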

Compare commits

5208 Commits next-3.0 ... next-4.0

4 .dockerignore Normal file
4 .github/PULL_REQUEST_TEMPLATE.md vendored
5 .gitignore vendored
6 .gitmodules vendored
33 CMakeLists.txt
2 CONTRIBUTING.md
97 HACKING.md
31 MAINTAINERS
29 README-DPDK.md
66 README.md
12 SCYLLA-VERSION-GEN
1 abseil Submodule
147 alternator/auth.cc Normal file
46 alternator/auth.hh Normal file
111 alternator/base64.cc Normal file
34 alternator/base64.hh Normal file
682 alternator/conditions.cc Normal file
49 alternator/conditions.hh Normal file
50 alternator/error.hh Normal file
3656 alternator/executor.cc Normal file
82 alternator/executor.hh Normal file
127 alternator/expressions.cc Normal file
265 alternator/expressions.g Normal file
41 alternator/expressions.hh Normal file
78 alternator/expressions_eval.hh Normal file
228 alternator/expressions_types.hh Normal file
300 alternator/rjson.cc Normal file
177 alternator/rjson.hh Normal file
124 alternator/rmw_operation.hh Normal file
268 alternator/serialization.cc Normal file
72 alternator/serialization.hh Normal file
483 alternator/server.cc Normal file
83 alternator/server.hh Normal file
104 alternator/stats.cc Normal file
98 alternator/stats.hh Normal file
53 alternator/tags_extension.hh Normal file
30 api/api-doc/cache_service.json
156 api/api-doc/column_family.json
41 api/api-doc/compaction_manager.json
90 api/api-doc/error_injection.json Normal file
12 api/api-doc/failure_detector.json
4 api/api-doc/gossiper.json
4 api/api-doc/hinted_handoff.json
2 api/api-doc/messaging_service.json
109 api/api-doc/storage_proxy.json
185 api/api-doc/storage_service.json
16 api/api-doc/stream_manager.json
15 api/api-doc/system.json
26 api/api.cc
44 api/api.hh
17 api/api_init.hh
6 api/collectd.cc
230 api/column_family.cc
51 api/column_family.hh
15 api/commitlog.cc
95 api/compaction_manager.cc
27 api/config.cc
66 api/error_injection.cc Normal file
13 stdx.hh → api/error_injection.hh
2 api/lsa.cc
6 api/messaging_service.cc
271 api/storage_proxy.cc
461 api/storage_service.cc
1 api/storage_service.hh
6 api/system.cc
115 atomic_cell.cc
38 atomic_cell.hh
15 atomic_cell_hash.hh
15 atomic_cell_or_collection.hh
10 auth/allow_all_authenticator.hh
9 auth/allow_all_authorizer.hh
2 auth/authenticated_user.cc
5 auth/authenticated_user.hh
2 auth/authentication_options.hh
1 auth/authenticator.cc
28 auth/authenticator.hh
11 auth/authorizer.hh
34 auth/common.cc
8 auth/common.hh