scylladb

Author	SHA1	Message	Date
Konstantin Osipov	fd293768e7	storage_proxy: do not touch all_replicas.front() if it's empty. The list of all endpoints for a query can be empty if we have replication_factor 0 or there are no live endpoints for this token. Do not access all_replicas.front() in this case. Fixes #5935. Message-Id: <20200306192521.73486-2-kostja@scylladb.com> (cherry picked from commit `9827efe554`)	2020-06-22 18:29:15 +03:00
Gleb Natapov	22dfa48585	cql transport: do not log broken pipe error when a client closes its side of a connection abruptly Fixes #5661 Message-Id: <20200615075958.GL335449@scylladb.com> (cherry picked from commit `7ca937778d`)	2020-06-21 13:09:22 +03:00
Benny Halevy	2f3d7f1408	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling The function was reimplemented to solve the following issues. The custom implementation also improved its performance by close to 19%. Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in-place so no doubling of quotes was actually done. Add unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with stack overflow. Fixes #5671 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `0329fe1fd1`)	2020-06-21 13:07:21 +03:00
Gleb Natapov	76a08df939	commitlog: fix size of a write used to zero a segment Due to a bug the entire segment is written in one huge write of 32Mb. The idea was to split it to writes of 128K, so fix it. Fixes #5857 Message-Id: <20200220102939.30769-1-gleb@scylladb.com> (cherry picked from commit `df2f67626b`)	2020-06-21 13:03:05 +03:00
Amnon Heiman	6aa129d3b0	api/storage_service.cc: stream result of token_range The result of the get token range API can become big, which can cause large allocations and stalls. This patch replaces the implementation so that it streams the results using the HTTP stream capabilities instead of serializing and sending one big buffer. Fixes #6297 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `7c4562d532`)	2020-06-21 12:57:48 +03:00
Takuya ASADA	b4f781e4eb	scylla_post_install.sh: fix operator precedence issue with multiple statements In bash, 'A || B && C' is a problem because when A is true, C is still evaluated, since && and || have the same precedence. To avoid the issue we need to make B && C a single statement. Fixes #5764 (cherry picked from commit `b6988112b4`)	2020-06-21 12:47:05 +03:00
Takuya ASADA	27594ca50e	scylla_raid_setup: create missing directories We need to create hints, view_hints, saved_caches directories on RAID volume. Fixes #5811 (cherry picked from commit `086f0ffd5a`)	2020-06-21 12:45:27 +03:00
Rafael Ávila de Espíndola	0f2f0d65d7	configure: Reduce the dynamic linker path size gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path size. Fixes: #6494 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528202741.398695-2-espindola@scylladb.com> (cherry picked from commit `aa778ec152`)	2020-06-21 12:29:16 +03:00
Tomasz Grabiec	31c2f8a3ae	row_cache: Fix undefined behavior on key linearization This is relevant only when using partition or clustering keys which have a representation in memory which is larger than 12.8 KB (10% of LSA segment size). There are several places in code (cache, background garbage collection) which may need to linearize keys because of performing key comparison, but it's not done safely: 1) the code does not run with the LSA region locked, so pointers may get invalidated on linearization if it needs to reclaim memory. This is fixed by running the code inside an allocating section. 2) LSA region is locked, but the scope of with_linearized_managed_bytes() encloses the allocating section. If allocating section needs to reclaim, linearization context will contain invalidated pointers. The fix is to reorder the scopes so that linearization context lives within an allocating section. Example of 1 can be found in range_populating_reader::handle_end_of_stream() where it performs a lookup: auto prev = std::prev(it); if (prev->key().equal(_cache._schema, _last_key->_key)) { it->set_continuous(true); but handle_end_of_stream() is not invoked under allocating section. Example of 2 can be found in mutation_cleaner_impl::merge_some() where it does: return with_linearized_managed_bytes([&] { ... return _worker_state->alloc_section(region, [&] { Fixes #6637. Refs #6108. Tests: - unit (all) Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `e81fc1f095`)	2020-06-21 11:58:59 +03:00
Yaron Kaikov	ec12331f11	release: prepare for 3.3.4	2020-06-15 21:19:02 +03:00
Avi Kivity	ccc463b5e5	tools: toolchain: regenerate for gnutls 3.6.14 CVE-2020-13777. Fixes #6627. Toolchain source image registry disambiguated due to tighter podman defaults.	2020-06-15 08:05:58 +03:00
Calle Wilund	4a9676f6b7	gms::inet_address: Fix sign extension error in custom address formatting Fixes #5808 Seems some gcc:s will generate the code as sign extending. Mine does not, but this should be more correct anyhow. Added small stringify test to serialization_test for inet_address (cherry picked from commit `a14a28cdf4`)	2020-06-09 20:16:50 +03:00
Takuya ASADA	aaf4989c31	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6540 (cherry picked from commit `969c4258cf`)	2020-06-09 16:03:00 +03:00
Asias He	b29f954f20	gossip: Make is_safe_for_bootstrap more strict Consider 1. Start n1, n2 in the cluster 2. Stop n2 and delete all data for n2 3. Start n2 to replace itself with replace_address_first_boot: n2 4. Kill n2 before n2 finishes the replace operation 5. Remove replace_address_first_boot: n2 from scylla.yaml of n2 6. Delete all data for n2 7. Start n2 At step 7, n2 will be allowed to bootstrap as a new node, because the application state of n2 in the cluster is HIBERNATE which is not rejected in the check of is_safe_for_bootstrap. As a result, n2 will replace n2 with different tokens and a different host_id, as if the old n2 node was removed from the cluster silently. Fixes #5172 (cherry picked from commit `cdcedf5eb9`)	2020-05-25 14:30:53 +03:00
Eliran Sinvani	5546d5df7b	Auth: return correct error code when role is not found Scylla returns the wrong error code (0000 - server internal error) in response to trying to do authentication/authorization operations that involve a non-existing role. This commit changes those cases to return error code 2200 (invalid query) which is the correct one and also the one that Cassandra returns. Tests: Unit tests (Dev) All auth and auth_role dtests (cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214) Fixes #6363.	2020-05-25 12:58:38 +03:00
Amnon Heiman	541c29677f	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour: when getting an empty keyspace, it uses the first non-system keyspace (first here is basically, just a keyspace). The current implementation has two issues. First, it uses a reference to a string that is held on a stack of another function. In other words, there's a use after free that is not clear why we never hit. Second, it calls get_non_system_keyspaces twice. Though this is not a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues. First, it changes the implementation to hold a string instead of a reference to a string. Second, it stores the results from get_non_system_keyspaces and reuses them, which is more efficient and holds the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `69a46d4179`)	2020-05-25 12:48:48 +03:00
Hagit Segev	06f18108c0	release: prepare for 3.3.3	2020-05-24 23:28:07 +03:00
Tomasz Grabiec	90002ca3d2	sstables: index_reader: Fix overflow when calculating promoted index end When index file is larger than 4GB, offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com> (cherry picked from commit `a6c87a7b9e`)	2020-05-24 09:46:11 +03:00
Rafael Ávila de Espíndola	da23902311	repair: Make sure sinks are always closed In a recent next failure I got the following backtrace function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101 at ./seastar/include/seastar/core/shared_ptr.hh:463 at repair/row_level.cc:2059 This patch changes a few functions to use finally to make sure the sink is always closed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200515202803.60020-1-espindola@scylladb.com> (cherry picked from commit `311fbe2f0a`) Ref #6414	2020-05-20 09:00:57 +03:00
Asias He	2b0dc21f97	repair: Fix race between write_end_of_stream and apply_rows Consider: n1, n2, n1 is the repair master, n2 is the repair follower. === Case 1 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after row r1 is written. data: partition_start, r1 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream() data: partition_start, r1, partition_end 5) Step 2 resumes to apply the rows. data: partition_start, r1, partition_end, partition_end, partition_start, r2 === Case 2 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after partition_start for r2 is written but before _partition_opened is set to true. data: partition_start, r1, partition_end, partition_start 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream(). Since _partition_opened[node_idx] is false, partition_end is skipped, end_of_stream is written. data: partition_start, r1, partition_end, partition_start, end_of_stream This causes unbalanced partition_start and partition_end in the stream written to sstables. To fix, serialize the write_end_of_stream and apply_rows with a semaphore. Fixes: #6394 Fixes: #6296 Fixes: #6414 (cherry picked from commit `b2c4d9fdbc`)	2020-05-20 08:22:05 +03:00
Piotr Dulikowski	b544691493	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are being currently sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we will choose position of the earliest hint that failed to be sent, and will retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence the calculation of _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422 (cherry picked from commit `85d5c3d5ee`)	2020-05-20 08:06:17 +03:00
Piotr Dulikowski	d420b06844	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433 (cherry picked from commit `0c5ac0da98`)	2020-05-20 08:04:10 +03:00
Avi Kivity	b3a2cb2f68	Update seastar submodule * seastar 0ebd89a858...30f03aeba9 (1): > timer: add scheduling_group awareness Fixes #6170.	2020-05-10 18:39:20 +03:00
Hagit Segev	c8c057f5f8	release: prepare for 3.3.2	2020-05-10 18:16:28 +03:00
Gleb Natapov	038bfc925c	storage_proxy: limit read repair only to replicas that answered during speculative reads Speculative reader has more targets than needed for CL. In case there is a digest mismatch the repair runs between all of them, but that violates provided CL. The patch makes it so that repair runs only between replicas that answered (there will be CL of them). Fixes #6123 Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200402132245.GA21956@scylladb.com> (cherry picked from commit `36a24bbb70`)	2020-05-07 19:48:37 +03:00
Mike Goltsov	13a4e7db83	fix error in fstrim service (scylla_util.py) On a CentOS 7 machine, fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation. When trying to run the scylla-fstrim service manually you get an error: Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module> main() File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main cfg = parse_scylla_dirs_with_default(conf=args.config) File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default if key not in y or not y[k]: NameError: name 'k' is not defined It is caused by an error in scylla_util.py Fixes #6294. (cherry picked from commit `068bb3a5bf`)	2020-05-07 19:45:50 +03:00
Juliusz Stasiewicz	727d6cf8f3	atomic_cell: special rule for printing counter cells Until now, attempts to print counter update cell would end up calling abort() because `atomic_cell_view::value()` has no specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`, i.e. counter update IMR type. Such visitor is not easy to write if we want to intercept counters only (and not all int64_t values). Anyway, linearized byte representation of counter cell would not be helpful without knowing if it consists of counter shards or counter update (delta) - and this must be known upon `deserialize`. This commit introduces simple approach: it determines cell type on high level (from `atomic_cell_view`) and prints counter contents by `counter_cell_view` or `atomic_cell_view::counter_update_value()`. Fixes #5616 (cherry picked from commit `0ea17216fe`)	2020-05-07 19:40:47 +03:00
Tomasz Grabiec	6d6d7b4abe	sstables: Release reserved space for sharding metadata The intention of the code was to clear sharding metadata chunked_vector so that it doesn't bloat memory. The type of c is `chunked_vector*`. Assigning `{}` clears the pointer while the intended behavior was to reset the `chunked_vector` instance. The original instance is left unmodified with all its reserved space. Because of this, the previous fix had no effect because token ranges are stored entirely inline and popping them doesn't release memory. Fixes #4951 Tests: - sstable_mutation_test (dev) - manual using scylla binary on customer data on top of 2019.1.5 Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `5fe626a887`)	2020-05-07 19:06:22 +03:00
Tomasz Grabiec	28f974b810	Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael "Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data is still stored in the node's row cache, because cleanup didn't invalidate the cache." Fixes #4446. tests: - unit tests (dev mode) - dtests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test cleanup_test.py (cherry picked from commit `d0b6be0820`)	2020-05-07 16:24:51 +03:00
Piotr Sarna	5fdadcaf3b	network_topology_strategy: validate integers In order to prevent users from creating a network topology strategy instance with invalid inputs, it's not enough to use std::stol() on the input: a string "3abc" still returns the number '3', but will later confuse cqlsh and other drivers, when they ask for topology strategy details. The error message is now more human readable, since for incorrect numeric inputs it used to return a rather cryptic message: ServerError: stol() This commit fixes the issue and comes with a simple test. Fixes #3801 Tests: unit(dev) Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com> (cherry picked from commit `5b7a35e02b`)	2020-05-07 16:24:49 +03:00
Pekka Enberg	a960394f27	scripts/jobs: Keep memory reserve when calculating parallelism The "jobs" script is used to determine the amount of compilation parallelism on a machine. It attempts to ensure each GCC process has at least 4 GB of memory per core. However, in the worst case scenario, we could end up having the GCC processes take up all the system memory, forcing swapping or OOM killer to kick in. For example, on a 4 core machine with 16 GB of memory, this worst case scenario seems easy to trigger in practice. Fix up the problem by keeping a 1 GB of memory reserve for other processes and calculating parallelism based on that. Message-Id: <20200423082753.31162-1-penberg@scylladb.com> (cherry picked from commit `7304a795e5`)	2020-05-04 19:01:54 +03:00
Piotr Sarna	3216a1a70a	alternator: fix signature timestamps Generating timestamps for auth signatures used a non-thread-safe ::gmtime function instead of thread-safe ::gmtime_r. Tests: unit(dev) Fixes #6345 (cherry picked from commit `fb7fa7f442`)	2020-05-04 17:08:13 +03:00
Avi Kivity	5a7fd41618	Merge 'Fix hang in multishard_writer' from Asias " This series fix hang in multishard_writer when error happens. It contains - multishard_writer: Abort the queue attached to consumers when producer fails - repair: Fix hang when the writer is dead Fixes #6241 Refs: #6248 " * asias-stream_fix_multishard_writer_hang: repair: Fix hang when the writer is dead mutation_writer_test: Add test_multishard_writer_producer_aborts multishard_writer: Abort the queue attached to consumers when producer fails (cherry picked from commit `8925e00e96`)	2020-05-01 20:13:00 +03:00
Raphael S. Carvalho	dd24ba7a62	api/service: fix segfault when taking a snapshot without keyspace specified If no keyspace is specified when taking snapshot, there will be a segfault because keynames is unconditionally dereferenced. Let's return an error because a keyspace must be specified when column families are specified. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com> (cherry picked from commit `02e046608f`) Fixes #6336.	2020-04-30 12:57:14 +03:00
Avi Kivity	204f6dd393	Update seastar submodule * seastar a0bdc6cd85...0ebd89a858 (1): > http server: fix "Date" header format Fixes #6253.	2020-04-26 19:31:44 +03:00
Nadav Har'El	b1278adc15	alternator: unzero "scylla_alternator_total_operations" metric In commit `388b492040`, which was only supposed to move around code, we accidentally lost the line which does _executor.local()._stats.total_operations++; So after this commit this counter was always zero... This patch returns the line incrementing this counter. Arguably, this counter is not very important - a user can also calculate this number by summing up all the counters in the scylla_alternator_operation array (these are counters for individual types of operations). Nevertheless, as long as we do export a "scylla_alternator_total_operations" metric, we need to correctly calculate it and can't leave it zero :-) Fixes #5836 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219162820.14205-1-nyh@scylladb.com> (cherry picked from commit `b8aed18a24`)	2020-04-19 19:07:31 +03:00
Botond Dénes	ee9677ef71	schema: schema(): use std::stable_sort() to sort key columns When multiple key columns (clustering or partition) are passed to the schema constructor, all having the same column id, the expectation is that these columns will retain the order in which they were passed to `schema_builder::with_column()`. Currently however this is not guaranteed as the schema constructor sorts key columns by column id with `std::sort()`, which doesn't guarantee that equally comparing elements retain their order. This can be an issue for indexes, the schemas of which are built independently on each node. If there is any room for variance in the key column order, this can result in different nodes having incompatible schemas for the same index. The fix is to use `std::stable_sort()` which guarantees that the order of equally comparing elements won't change. This is a suspected cause of #5856, although we don't have hard proof. Fixes: #5856 Signed-off-by: Botond Dénes <bdenes@scylladb.com> [avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes unstable at 17 elements, and the failing schema had a clustering key with 23 elements] Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com> (cherry picked from commit `a4aa753f0f`)	2020-04-19 18:19:05 +03:00
Nadav Har'El	2060e361cf	materialized views: fix corner case of view updates used by Alternator While CQL does not allow creation of a materialized view with more than one base regular column in the view's key, in Alternator we do allow this - both partition and clustering key may be a base regular column. We had a bug in the logic handling this case: If the new base row is missing a value for one of the view key columns, we shouldn't create a view row. Similarly, if the existing base row was missing a value for one of the view key columns, a view row does not exist and doesn't need to be deleted. This was done incorrectly, and made decisions based on just one of the key columns, and the logic is now fixed (and I think, simplified) in this patch. With this patch, the Alternator test which previously failed because of this problem now passes. The patch also includes new tests in the existing C++ unit test test_view_with_two_regular_base_columns_in_key. This test was already supposed to be testing various cases of two-new-key-columns updates, but missed the cases explained above. These new tests failed badly before this patch - some of them had clean write errors, others caused crashes. With this patch, they pass. Fixes #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312162503.8944-1-nyh@scylladb.com> (cherry picked from commit `635e6d887c`)	2020-04-19 15:24:19 +03:00
Hagit Segev	6f939ffe19	release: prepare for 3.3.1	2020-04-18 00:23:31 +03:00
Kamil Braun	69105bde8a	sstables: freeze types nested in collection types in legacy sstables Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect serialization headers, which don't wrap frozen UDTs nested inside collections with the FrozenType<...> tag. When reading such SSTable, Scylla would detect a mismatch between the schema saved in schema tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema from the serialization header (which doesn't have these tags). SSTables created in Scylla versions 3.1 and above, in particular in Scylla versions that contain this commit, create correct serialization headers (which wrap UDTs in the FrozenType<...> tag). This commit does two things: 1. for all SSTables created after this commit, include a new feature flag, CorrectUDTsInCollections, presence of which implies that frozen UDTs inside collections have the FrozenType<...> tag. 2. when reading a Scylla SSTable without the feature flag, we assume that UDTs nested inside collections are always frozen, even if they don't have the tag. This assumption is safe to be made, because at the time of this commit, Scylla does not allow non-frozen (multi-cell) types inside collections or UDTs, and because of point 1 above. There is one edge case not covered: if we don't know whether the SSTable comes from Scylla or from C*. In that case we won't make the assumption described in 2. Therefore, if we get a mismatch between schema and serialization headers of a table which we couldn't confirm to come from Scylla, we will still reject the table. If any user encounters such an issue (unlikely), we will have to use another solution, e.g. using a separate tool to rewrite the SSTable. Fixes #6130. (cherry picked from commit `3d811e2f95`)	2020-04-17 09:12:28 +03:00
Kamil Braun	e09e9a5929	sstables: move definition of column_translation::state::build to a .cc file Ref #6130	2020-04-17 09:12:28 +03:00
Piotr Sarna	2308bdbccb	alternator: use partition tombstone if there's no clustering key As @tgrabiec helpfully pointed out, creating a row tombstone for a table which does not have a clustering key in its schema creates something that looks like an open-ended range tombstone. That's problematic for KA/LA sstable formats, which are incapable of writing such tombstones, so a workaround is provided in order to allow using KA/LA in alternator. Fixes #6035 Cherry-picked from `0a2d7addc0`	2020-04-16 12:14:10 +02:00
Asias He	a2d39c9a2e	gossip: Add an option to force gossip generation Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation number g1, g2, g3. n1, n2, n3 are running a scylla version with commit `0a52ecb6df` (gossip: Fix max generation drift measure). One year later, the user wants to upgrade n1, n2, n3 to a new version. When n3 does a rolling restart with the new version, n3 will use a generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's gossip update and mark n3 as down. Such unnecessary marking of node down can cause availability issues. For example: DC1: n1, n2 DC2: n3, n4 When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which causes the whole DC2 to be unavailable. To fix, we can start the node with a gossip generation within MAX_GENERATION_DIFFERENCE difference for the new node. Once all the nodes run the version with commit `0a52ecb6df`, the option is no longer needed. Fixes #5164 (cherry picked from commit `743b529c2b`)	2020-03-27 12:49:23 +01:00
Asias He	5fe2ce3bbe	gossiper: Always use the new generation number User reported an issue that after a node restart, the restarted node is marked as DOWN by other nodes in the cluster while the node is up and running normally. Consider the following: - n1, n2, n3 in the cluster - n3 shuts itself down - n3 sends shutdown verb to n1 and n2 - n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to INT_MAX - n3 restarts - n3 sends gossip shadow rounds to n1 and n2, in storage_service::prepare_to_join, - n3 receives response from n1, in gossiper::handle_ack_msg, since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 1; fiber 1 finishes faster than fiber 2 and sets _in_shadow_round = false - n3 receives response from n2, in gossiper::handle_ack_msg, since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 2; fiber 2 yields - n3 finishes the shadow round and continues - n3 resets gossip endpoint_state_map with gossiper.reset_endpoint_state_map() - n3 resumes fiber 2, apply application state about n3 into endpoint_state_map, at this point endpoint_state_map contains information including n3 itself from n2. - n3 calls gossiper.start_gossiping(generation_number, app_states, ...) with new generation number generated correctly in storage_service::prepare_to_join, but in maybe_initialize_local_state(generation_nbr), it will not set new generation and heartbeat if the endpoint_state_map contains itself - n3 continues with the old generation and heartbeat learned in fiber 2 - n3 continues the gossip loop, in gossiper::run, hbs.update_heart_beat() the heartbeat is set to the number starting from 0. - n1 and n2 will not get update from n3 because they use the same generation number but n1 and n2 have a larger heartbeat version - n1 and n2 will mark n3 as down even if n3 is alive. To fix, always use the new generation number. Fixes: #5800 Backports: 3.0 3.1 3.2 (cherry picked from commit `62774ff882`)	2020-03-27 12:49:20 +01:00
Piotr Sarna	aafa34bbad	cql: fix qualifying indexed columns for filtering When qualifying columns to be fetched for filtering, we also check if the target column is not used as an index - in which case there's no need of fetching it. However, the check was incorrectly assuming that any restriction is eligible for indexing, while it's currently only true for EQ. The fix makes a more specific check and contains many dynamic casts, but these will hopefully be gone once our long planned "restrictions rewrite" is done. This commit comes with a test. Fixes #5708 Tests: unit(dev) (cherry picked from commit `767ff59418`)	2020-03-22 09:00:51 +01:00
Hagit Segev	7ae2cdf46c	release: prepare for 3.3.0	2020-03-19 21:46:44 +02:00
Hagit Segev	863f88c067	release: prepare for 3.3.rc3	2020-03-15 22:45:30 +02:00
Avi Kivity	90b4e9e595	Update seastar submodule * seastar f54084c08f...a0bdc6cd85 (1): > tls: Fix race and stale memory use in delayed shutdown Fixes #5759 (maybe)	2020-03-12 19:41:50 +02:00
Konstantin Osipov	434ad4548f	locator: correctly select endpoints if RF=0 SimpleStrategy creates a list of endpoints by iterating over the set of all configured endpoints for the given token, until we reach keyspace replication factor. There is a trivial coding bug when we first add at least one endpoint to the list, and then compare list size and replication factor. If RF=0 this never yields true. Fix by moving the RF check before at least one endpoint is added to the list. Cassandra never had this bug since it uses a less fancy while() loop. Fixes #5962 Message-Id: <20200306193729.130266-1-kostja@scylladb.com> (cherry picked from commit `ac6f64a885`)	2020-03-12 12:09:46 +02:00
Avi Kivity	cbbb15af5c	logalloc: increase capacity of _regions vector outside reclaim lock Reclaim consults the _regions vector, so we don't want it moving around while allocating more capacity. For that we take the reclaim lock. However, that can cause a false-positive OOM during startup: 1. all memory is allocated to LSA as part of priming (`2baa16b371`) 2. the _regions vector is resized from 64k to 128k, requiring a segment to be freed (plenty are free) 3. but reclaiming_lock is taken, so we cannot reclaim anything. To fix, resize the _regions vector outside the lock. Fixes #6003. Message-Id: <20200311091217.1112081-1-avi@scylladb.com> (cherry picked from commit `c020b4e5e2`)	2020-03-12 11:25:20 +02:00
Benny Halevy	3231580c05	dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with the binary's build-id when stripping its debug info as it is passed the `--build-id-seed <version>.<release>` option. To prevent that, we need to set the following macros: unset `_unique_build_ids`, and set `_no_recompute_build_ids` to 1. Fixes #5881 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `25a763a187`)	2020-03-09 15:21:50 +02:00
Piotr Sarna	62364d9dcd	Merge 'cql3: do_execute_base_query: fix null deref ... ... when clustering key is unavailable' from Benny This series fixes null pointer dereference seen in #5794 `efd7efe` cql3: generate_base_key_from_index_pk; support optional index_ck `7af1f9e` cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable `7fe1a9e` cql3: do_execute_base_query: fixup indentation Fixes #5794 Branches: 3.3 Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug) * bhalevy/fix-5794-generate_base_key_from_index_pk: cql3: do_execute_base_query: fixup indentation cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable cql3: generate_base_key_from_index_pk; support optional index_ck (cherry picked from commit `4e95b67501`)	2020-03-09 15:20:01 +02:00
Takuya ASADA	3bed8063f6	dist/debian: fix "unable to open node-exporter.service.dpkg-new" error It seems the .service file conflicts at install time because it is installed twice, via both debian/.service and debian/scylla-server.install. We don't need to use *.install, so we can just drop the line. Fixes #5640 (cherry picked from commit `29285b28e2`)	2020-03-03 12:40:39 +02:00
Yaron Kaikov	413fcab833	release: prepare for 3.3.rc2	2020-02-27 14:45:18 +02:00
Juliusz Stasiewicz	9f3c3036bf	cdc: set TTLs on CDC log cells Cells in CDC logs used to be created while completely neglecting TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs to all cells; there are no row markers, so we need not set TTL there. Fixes #5688 (cherry picked from commit `67b92c584f`)	2020-02-26 18:12:55 +02:00
Benny Halevy	ff2e108a6d	gossiper: do_stop_gossiping: copy live endpoints vector It can be resized asynchronously by mark_dead. Fixes #5701 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com> (cherry picked from commit `f45fabab73`)	2020-02-26 13:00:11 +02:00
Gleb Natapov	ade788ffe8	commitlog: use commitlog IO scheduling class for segment zeroing There may be other commitlog writes waiting for zeroing to complete, so not using proper scheduling class causes priority inversion. Fixes #5858. Message-Id: <20200220102939.30769-2-gleb@scylladb.com> (cherry picked from commit `6a78cc9e31`)	2020-02-26 12:51:10 +02:00
Benny Halevy	1f8bb754d9	storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service Match subscription done in main() and avoid cross shard access to _lifecycle_subscribers vector. Fixes #5385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com> (cherry picked from commit `5b0ea4c114`)	2020-02-25 16:39:49 +02:00
Tomasz Grabiec	7b2eb09225	Merge fixes for use-after-frees related to shutdown of services Backport of `884d5e2bcb` and `4839ca8491`. Fixes crashes when scylla is stopped early during boot. Merged from https://github.com/xemul/scylla/tree/br-mm-combined-fixes-for-3.3 Fixes #5765.	2020-02-25 13:34:01 +01:00
Pavel Emelyanov	d2293f9fd5	migration_manager: Abort and wait cluster upgrade waiters The maybe_schedule_schema_pull waits for schema_tables_v3 to become available. This is unsafe in case migration manager goes away before the feature is enabled. Fix this by subscribing on feature with feature::listener and waiting for condition variable in maybe_schedule_schema_pull. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 14:18:15 +03:00
Pavel Emelyanov	25b31f6c23	migration_manager: Abort and wait delayed schema pulls The sleep is interrupted with the abort source, the "wait" part is done with the existing _background_tasks gate. Also we need to make sure the gate stays alive till the end of the function, so make use of the async_sharded_service (migration manager is already such). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 14:18:15 +03:00
Pavel Emelyanov	742a1ce7d6	storage_service: Unregister from gossiper notifications ... at all This unregistration doesn't happen currently, but doesn't seem to cause any problems in general, as on stop gossiper is stopped and nothing from it hits the storage_service. However (!) if an exception pops up between the point where storage_service subscribes to the gossiper and the point where the drain_on_shutdown defer action is set up, then we _may_ get into the following situation: - main's stuff gets rolled back - gossiper is not stopped (drain_on_shutdown defer is not set up) - migration manager is stopped (with deferred action in main) - a notification comes from gossiper -> storage_service::on_change might want to pull schema with the help of local migration manager -> assert(local_is_initialized) strikes Fix this by registering storage_service to gossiper a bit earlier (both are already initialized by that time) and setting up unregister defer right afterwards. Test: unit(dev), manual start-stop Bug: #5628 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200130190343.25656-1-xemul@scylladb.com>	2020-02-24 14:18:15 +03:00
Avi Kivity	4ca9d23b83	Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations" This reverts commit `bdc542143e`. Exposes a data resurrection bug (#5838).	2020-02-24 10:02:58 +02:00
Avi Kivity	9e97f3a9b3	Update seastar submodule * seastar dd686552ff...f54084c08f (2): > reactor: fallback to epoll backend when fs.aio-max-nr is too small > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as() Fixes #5638.	2020-02-20 10:25:00 +02:00
Piotr Dulikowski	183418f228	hh: handle counter update hints correctly This patch fixes a bug that appears because of an incorrect interaction between counters and hinted handoff. When a counter is updated on the leader, it sends mutations to other replicas that contain all counter shards from the leader. If consistency level is achieved but some replicas are unavailable, a hint with mutation containing counter shards is stored. When a hint's destination node is no longer its replica, it is attempted to be sent to all its current replicas. Previously, storage_proxy::mutate was used for that purpose. It was incorrect because that function treats mutations for counter tables as mutations containing only a delta (by how much to increase/decrease the counter). These two types of mutations have different serialization format, so in this case a "shards" mutation is reinterpreted as "delta" mutation, which can cause data corruption to occur. This patch backports `storage_proxy::mutate_hint_from_scratch` function, which bypasses special handling of counter mutations and treats them as regular mutations - which is the correct behavior for "shards" mutations. Refs #5833. Backports: 3.1, 3.2, 3.3 Tests: unit(dev) (cherry picked from commit `ec513acc49`)	2020-02-19 16:49:12 +02:00
Piotr Sarna	756574d094	db,view: fix generating view updates for partition tombstones The update generation path must track and apply all tombstones, both from the existing base row (if read-before-write was needed) and for the new row. One such path contained an error, because it assumed that if the existing row is empty, then the update can be simply generated from the new row. However, lack of the existing row can also be the result of a partition/range tombstone. If that's the case, it needs to be applied, because it's entirely possible that this partition row also hides the new row. Without taking the partition tombstone into account, creating a future tombstone and inserting an out-of-order write before it in the base table can result in ghost rows in the view table. This patch comes with a test which was proven to fail before the changes. Branches 3.1,3.2,3.3 Fixes #5793 Tests: unit(dev) Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com> (cherry picked from commit `e93c54e837`)	2020-02-16 20:26:28 +02:00
Rafael Ávila de Espíndola	a348418918	service: Add a lock around migration_notifier::_listeners Before this patch the iterations over migration_notifier::_listeners could race with listeners being added and removed. The addition side is not modified, since it is common to add a listener during construction and it would require a fairly big refactoring. Instead, the iteration is modified to use indexes instead of iterators so that it is still valid if another listener is added concurrently. For removal we use a rw lock, since removing an element invalidates indexes too. There are only a few places that needed refactoring to handle unregister_listener returning a future<>, so this is probably OK. Fixes #5541. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200120192819.136305-1-espindola@scylladb.com> (cherry picked from commit `27bd3fe203`)	2020-02-16 20:13:42 +02:00
Avi Kivity	06c0bd0681	Update seastar submodule * seastar 3f3e117de3...dd686552ff (1): > perftune.py: Use safe_load() for fix arbitrary code execution Fixes #5630.	2020-02-16 15:53:16 +02:00
Avi Kivity	223c300435	Point seastar submodule at scylla-seastar.git branch-3.3 This allows us to backport seastar patches to Scylla 3.3.	2020-02-16 15:51:46 +02:00
Gleb Natapov	ac8bef6781	commitlog: fix flushing an entry marked as "sync" in periodic mode After `546556b71b` we can have mixed writes into commitlog, some do flush immediately some do not. If non flushing write races with flushing one and becomes responsible for writing back its buffer into a file flush will be skipped which will cause assert in batch_cycle() to trigger since flush position will not be advanced. Fix that by checking that flush was skipped and in this case flush explicitly our file position. Fixes #5670 Message-Id: <20200128145103.GI26048@scylladb.com> (cherry picked from commit `c654ffe34b`)	2020-02-16 15:48:40 +02:00
Pavel Solodovnikov	68691907af	lwt: fix handling of nulls in parameter markers for LWT queries This patch affects the LWT queries with IF conditions of the following form: `IF col in :value`, i.e. if the parameter marker is used. When executing a prepared query with a bound value of `(None,)` (tuple with null, example for Python driver), it is serialized not as NULL but as "empty" value (serialization format differs in each case). Therefore, Scylla deserializes the parameters in the request as empty `data_value` instances, which are, in turn, translated to non-empty `bytes_opt` with empty byte-string value later. Account for this case too in the CAS condition evaluation code. Example of a problem this patch aims to fix: Suppose we have a table `tbl` with a boolean field `test` and INSERT a row with NULL value for the `test` column. Then the following update query fails to apply due to the error in IF condition evaluation code (assume `v=(null)`): `UPDATE tbl SET test=false WHERE key=0 IF test IN :v` returns false in `[applied]` column, but is expected to succeed. Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286) Fixes: #5710 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `bcc4647552`)	2020-02-16 15:29:28 +02:00
Avi Kivity	f59d2fcbf1	Merge "stop passing tracing state pointer in client_state" from Gleb " client_state is used simultaneously by many requests running in parallel while tracing state pointer is per request. Both those facts do not sit well together and as a result sometimes tracing state is being overwritten while still being used by an active request, which may cause an incorrect trace or even a crash. " Fixes #5700. Backported from `9f1f60fc38` * 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev: client_state: drop the pointer to a tracing state from client_state transport: pass tracing state explicitly instead of relying on it been in the client_state alternator: pass tracing state explicitly instead of relying on it been in the client_state	2020-02-16 15:23:41 +02:00
Asias He	bdc542143e	streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations The table::flush_streaming_mutations is used in the days when streaming data goes to memtable. After switching to the new streaming, data goes to sstables directly in streaming, so the sstables generated in table::flush_streaming_mutations will be empty. It is unnecessary to invalidate the cache if no sstables are added. To avoid unnecessary cache invalidating which pokes hole in the cache, skip calling _cache.invalidate() if the sstables is empty. The steps are: - STREAM_MUTATION_DONE verb is sent when streaming is done with old or new streaming - table::flush_streaming_mutations is called in the verb handler - cache is invalidated for the streaming ranges In summary, this patch will avoid a lot of cache invalidation for streaming. Backports: 3.0 3.1 3.2 Fixes: #5769 (cherry picked from commit `5e9925b9f0`)	2020-02-16 15:16:24 +02:00
Botond Dénes	061a02237c	row: append(): downgrade assert to on_internal_error() This assert, added by `060e3f8` is supposed to make sure the invariant of the append() is respected, in order to prevent building an invalid row. The assert however proved to be too harsh, as it converts any bug causing out-of-order clustering rows into cluster unavailability. Downgrade it to on_internal_error(). This will still prevent corrupt data from spreading in the cluster, without the unavailability caused by the assert. Fixes: #5786 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200211083829.915031-1-bdenes@scylladb.com> (cherry picked from commit `3164456108`)	2020-02-16 15:12:46 +02:00
Gleb Natapov	35b6505517	client_state: drop the pointer to a tracing state from client_state client_state is shared between requests and tracing state is per request. It is not safe to use the former as a container for the latter since a state can be overwritten prematurely by subsequent requests. (cherry picked from commit `31cf2434d6`)	2020-02-13 13:45:56 +02:00
Gleb Natapov	866c04dd64	transport: pass tracing state explicitly instead of relying on it been in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state which is per request. Currently next request may overwrite tracing state for previous one causing, in a best case, wrong trace to be taken or crash if overwritten pointer is freed prematurely. Fixes #5700 (cherry picked from commit `9f1f60fc38`)	2020-02-13 13:45:56 +02:00
Gleb Natapov	dc588e6e7b	alternator: pass tracing state explicitly instead of relying on it been in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state which is per request. This is not yet an issue for the alternator since it creates new client_state object for each request, but first of all it should not and second trace state will be dropped from the client_state, by later patch. (cherry picked from commit `38fcab3db4`)	2020-02-13 13:45:56 +02:00
Takuya ASADA	f842154453	dist/debian: keep /etc/systemd .conf files on 'remove' Since dpkg does not re-install conffiles when they are removed by the user, currently we are missing dependencies.conf and sysconfdir.conf on rollback. To prevent this, we need to stop running 'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'. Fixes #5734 (cherry picked from commit `43097854a5`)	2020-02-12 14:26:40 +02:00
Yaron Kaikov	b38193f71d	dist/docker: Switch to 3.3 release repository (#5756) Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of master. This ensures that Docker image builds that don't specify the variable build from the right repository by default.	2020-02-10 11:11:38 +02:00
Rafael Ávila de Espíndola	f47ba6dc06	lua: Handle nil returns correctly This is a minimum backport to 3.3. With this patch lua nil values are mapped to CQL null values instead of producing an error. Fixes #5667 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200203164918.70450-1-espindola@scylladb.com>	2020-02-09 18:55:42 +02:00
Hagit Segev	0d0c1d4318	release: prepare for 3.3.rc1	2020-02-09 15:55:24 +02:00
Takuya ASADA	9225b17b99	scylla_post_install.sh: fix 'integer expression expected' error awk returns float value on Debian, it causes postinst script failure since we compare it as integer value. Replaced with sed + bash. Fixes #5569 (cherry picked from commit `5627888b7c`)	2020-02-04 14:30:04 +02:00
Gleb Natapov	00b3f28199	db/system_keyspace: use user memory limits for local.paxos table Treat writes to local.paxos as user memory, as the number of writes is dependent on the amount of user data written with LWT. Fixes #5682 Message-Id: <20200130150048.GW26048@scylladb.com> (cherry picked from commit `b08679e1d3`)	2020-02-02 17:36:52 +02:00
Rafael Ávila de Espíndola	1bbe619689	types: Fix encoding of negative varint We would sometimes produce an unnecessary extra 0xff prefix byte. The new encoding matches what cassandra does. This was both an efficiency and correctness issue, as using varint in a key could produce different tokens. Fixes #5656 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> (cherry picked from commit `c89c90d07f`)	2020-02-02 16:00:58 +02:00
Avi Kivity	c36f71c783	test: make eventually() more patient We use eventually() in tests to wait for eventually consistent data to become consistent. However, we see spurious failures indicating that we wait too little. Increasing the timeout has a negative side effect in that tests that fail will now take longer to do so. However, this negative side effect is negligible to false-positive failures, since they throw away large test efforts and sometimes require a person to investigate the problem, only to conclude it is a false positive. This patch therefore makes eventually() more patient, by a factor of 32. Fixes #4707. Message-Id: <20200130162745.45569-1-avi@scylladb.com> (cherry picked from commit `ec5b721db7`)	2020-02-01 13:20:22 +02:00
Pekka Enberg	f5471d268b	release: prepare for 3.3.rc0	2020-01-30 14:00:51 +02:00
Takuya ASADA	fd5c65d9dc	dist/debian: Use tilde for release candidate builds We need to add '~' to handle rcX version correctly on Debian variants (merged at `ae33e9f`), but when we moved to relocated package we mistakenly dropped the code, so add the code again. Fixes #5641 (cherry picked from commit `dd81fd3454`)	2020-01-28 18:34:48 +02:00
Avi Kivity	3aa406bf00	tools: toolchain: dbuild: relax process limit in container Docker restricts the number of processes in a container to some limit it calculates. This limit turns out to be too low on large machines, since we run multiple links in parallel, and each link runs many threads. Remove the limit by specifying --pids-limit -1. Since dbuild is meant to provide a build environment, not a security barrier, this is okay (the container is still restricted by host limits). I checked that --pids-limit is supported by old versions of docker and by podman. Fixes #5651. Message-Id: <20200127090807.3528561-1-avi@scylladb.com> (cherry picked from commit `897320f6ab`)	2020-01-28 18:14:01 +02:00
Piotr Sarna	c0253d9221	db,view: fix checking for secondary index special columns A mistake in handling legacy checks for special 'idx_token' column resulted in not recognizing materialized views backing secondary indexes properly. The mistake is really a typo, but with bad consequences - instead of checking the view schema for being an index, we asked for the base schema, which is definitely not an index of itself. Branches 3.1,3.2 (asap) Fixes #5621 Fixes #4744 (cherry picked from commit `9b379e3d63`)	2020-01-21 23:32:11 +02:00
Avi Kivity	12bc965f71	atomic_cell: consistently use comma as separator in pretty-printers The atomic_cell pretty printers use a mix of commas and semicolons. This change makes them use commas everywhere, for consistency. Message-Id: <20200116133327.2610280-1-avi@scylladb.com>	2020-01-16 17:26:33 +01:00
Nadav Har'El	1ed21d70dc	merge: CDC: do mutation augmentation from storage proxy Merged pull request https://github.com/scylladb/scylla/pull/5567 from Calle Wilund: Fixes #5314 Instead of tying CDC handling into cql statement objects, this patch set moves it to storage proxy, i.e. shared code for mutating stuff. This means we automatically handle cdc for code paths outside cql (i.e. alternator). It also adds api handling (though initially inefficient) for batch statements. CDC is tied into storage proxy by giving the former a ref to the latter (per shard). Initially this is not a constructor parameter, because right now we have chicken and egg issues here. Hopefully, Pavel's refactoring of migration manager and notifications will untie these and this relationship can become nicer. The actual augmentation can (as stated above) be made much more efficient. Hopefully, the stream management refactoring will deal with expensive stream lookup, and eventually, we can maybe coalesce pre-image selects for batches. However, that is left as an exercise for when deemed needed. The augmentation API has an optional return value for a "post-image handler" to be used iff returned after mutation call is finished (and successful). It is not yet actually invoked from storage_proxy, but it is at least in the call chain.	2020-01-16 17:12:56 +02:00
Avi Kivity	e677f56094	Merge "Enable general centos RPM (not only centos7)" from Hagit	2020-01-16 14:13:24 +02:00
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Hagit Segev	d0405003bd	building-packages doc: Update no specific el7 on path	2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola	c42a2c6f28	configure: Add -O1 when compiling generated parsers Enabling asan enables a few cleanup optimizations in gcc. The net result is that using -fsanitize=address -fno-sanitize-address-use-after-scope Produces code that uses a lot less stack than if the file is compiled with just -O0. This patch adds -O1 in addition to -fno-sanitize-address-use-after-scope to protect the unfortunate developer that decides to build in dev mode with --cflags='-O0 -g'. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-2-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	317e0228a8	configure: Put user flags after the mode flags It is sometimes convenient to build with flags that don't match any existing mode. Recently I was tracking a bug that would not reproduce with debug, but reproduced with dev, so I tried debugging the result of ./configure.py --cflags="-O0 -g" While the binary had debug info, it still had optimizations because configure.py put the mode flags after the user flags (-O0 -O1). This patch flips the order (-O1 -O0) so that the flags passed in the command line win. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Gleb Natapov	51281bc8ad	lwt: fix write timeout exception reporting CQL transport code relies on an exception's C++ type to create correct reply, but in lwt we converted some mutation_timeout exceptions to more generic request_timeout while forwarding them which broke the protocol. Do not drop type information. Fixes #5598. Message-Id: <20200115180313.GQ9084@scylladb.com>	2020-01-16 12:05:50 +02:00
Piotr Jastrzębski	0c8c1ec014	config: fix description of enable_deprecated_partitioners Murmur3 is the default partitioner. ByteOrder and Random are the deprecated ones and should be mentioned in the description. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-16 12:05:50 +02:00
Nadav Har'El	9953a33354	merge "Adding a schema file when creating a snapshot" Merged pull request https://github.com/scylladb/scylla/pull/5294 from Amnon Heiman: To use a snapshot we need a schema file that is similar to the result of running cql DESCRIBE command. The DESCRIBE is implemented in the cql driver so the functionality needs to be re-implemented inside scylla. This series adds a describe method to the schema file and uses it when doing a snapshot. There are different approaches to handling materialized views and secondary indexes. This implementation creates each schema.cql file in its own relevant directory, so the schema for a materialized view, for example, will be placed in the snapshot directory of the table of that view. Fixes #4192	2020-01-16 12:05:50 +02:00
Piotr Dulikowski	c383652061	gossip: allow for aborting on sleep This commit makes most sleeps in gossip.cc abortable. It is now possible to quickly shut down a node during startup, most notably during the phase while it waits for gossip to settle.	2020-01-16 12:05:50 +02:00
Avi Kivity	e5e0642f2a	tools: toolchain: add dependencies for building debian and rpm packages This reduces network traffic and eliminates time for installation when building packages from the frozen toolchain, as well as isolating the build from updates to those dependencies which may cause breakage.	2020-01-16 12:05:50 +02:00
Pekka Enberg	da9dae3dbe	Merge 'test.py: add support for CQL tests' from Kostja This patch set adds support for CQL tests to test.py, as well as many other improvements: * --name is now a positional argument * test output is preserved in testlog/${mode} * concise output format * better color support * arbitrary number of test suites * per-suite yaml-based configuration * options --jenkins and --xunit are removed and xml files are generated for all runs A simple driver is written in C++ to read CQL from standard input, execute in embedded mode and produce output. The patch is checked with BYO. Reviewed-by: Dejan Mircevski <dejan@scylladb.com> * 'test.py' of github.com:/scylladb/scylla-dev: (39 commits) test.py: introduce BoostTest and virtualize custom boost arguments test.py: sort tests within a suite, and sort suites test.py: add a basic CQL test test.py: add CQL .reject files to gitignore test.py: print a colored unidiff in case of test failure test.py: add CqlTestSuite to run CQL tests test.py: initial import of CQL test driver, cql_repl test.py: remove custom colors and define a color palette test.py: split test output per test mode test.py: remove tests_to_run test.py: virtualize Test.run(), to introduce CqlTest.Run next test.py: virtualize test search pattern per TestSuite test.py: virtualize write_xunit_report() test.py: ensure print_summary() is agnostic of test type test.py: tidy up print_summary() test.py: introduce base class Test for CQL and Unit tests test.py: move the default arguments handling to UnitTestSuite test.py: move custom unit test command line arguments to suite.yaml test.py: move command line argument processing to UnitTestSuite test.py: introduce add_test(), which is suite-specific ...	2020-01-16 12:05:50 +02:00
Pekka Enberg	e8b659ec5d	dist/docker: Remove Ubuntu-based Docker image The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated since 2017. Let's remove it as unmaintained. Message-Id: <20200115102405.23567-1-penberg@scylladb.com>	2020-01-16 12:05:50 +02:00
Avi Kivity	546556b71b	Merge "allow commitlog to wait for specific entires to be flushed on disk" from Gleb " Currently commitlog supports two modes of operation. First is 'periodic' mode where all commitlog writes are ready the moment they are stored in a memory buffer and the memory buffer is flushed to a storage periodically. Second is a 'batch' mode where each write is flushed as soon as possible (after previous flush completed) and writes are only ready after they are flushed. The first option is not very durable, the second is not very efficient. This series adds an option to mark some writes as "more durable" in periodic mode meaning that they will be flushed immediately and reported complete only after the flush is complete (flushing a durable write also flushes all writes that came before it). It also changes paxos to use those durable writes to store paxos state. Note that strictly speaking the last patch is not needed since after writing to an actual table the code updates paxos table and the latter uses durable writes that make sure all previous writes are flushed. Given that both writes are supposed to run on the same shard this should be enough. But it feels right to make base table writes durable as well. " * 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev: paxos: immediately sync commitlog entries for writes made by paxos learn stage paxos: mark paxos table schema as "always sync" schema: allow schema to be marked as 'always sync to commitlog' commitlog: add test for per entry sync mode database: pass sync flag from db::apply function to the commitlog commitlog: add sync method to entry_writer	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	2ebd1463b2	tests: Handle null and not present values differently Before this patch result_set_assertions was handling both null values and missing values in the same way. This patch changes the handling of missing values so that now checking for a null value is not the same as checking for a value not being present. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114184116.75546-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Botond Dénes	0c52c2ba50	data: make cell::make_collection(): more consistent and safer `3ec889816` changed cell::make_collection() to take different code paths depending on whether its `data` argument is nothrow copyable/movable or not. In case it is not, it is wrapped in a view to make it so (see the above mentioned commit for a full explanation), relying on the method's pre-existing requirement for callers to keep `data` alive while the created writer is in use. On closer look however it turns out that this requirement is neither respected, nor enforced, at least not on the code level. The real requirement is that the underlying data represented by `data` is kept alive. If `data` is a view, it is not expected to be kept alive and callers don't, it is instead copied into `make_collection()`. Non-views however are expected to be kept alive. This makes the API error prone. To avoid any future errors due to this ambiguity, require all `data` arguments to be nothrow copyable and movable. Callers are now required to pass views of nonconforming objects. This patch is a usability improvement and is not fixing a bug. The current code works as-is because it happens to conform to the underlying requirements. Refs: #5575 Refs: #5341 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>	2020-01-16 12:05:50 +02:00
Amnon Heiman	ac8aac2b53	tests/cql_query_test: Add schema describe tests This patch adds tests for the describe method. test_describe_simple_schema tests regular tables. test_describe_view_schema tests view and index. Each test creates a table, finds the schema, calls the describe method and compares the results to the string that was used to create the table. The view tests also verify that adding an index or view does not change the base table. When comparing results, leading and trailing whitespace is ignored and all combinations of whitespace and new lines are treated equally. Additional tests may be added at a future phase if required. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:07:57 +02:00
Amnon Heiman	028525daeb	database: add schema.cql file when creating a snapshot When creating a snapshot we need to add a schema.cql file in the snapshot directory that describes the table in that snapshot. This patch adds the file using the schema describe method. get_snapshot_details and manifest_json_filter were modified to ignore the schema.cql file. Fixes #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	82367b325a	schema: Add a describe method This patch adds a describe method to a table schema. It acts similarly to the DESCRIBE cql command that is implemented in a CQL driver. The method supports tables, secondary indexes, local indexes and materialized views. Relates to: #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	6f58d51c83	secondary_index_manager: add the index_name_from_table_name function index_name_from_table_name is a reverse of index_table_name, it gets a table name that was generated for an index and return the name of the index that generated that table. Relates to #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Pavel Emelyanov	555856b1cd	migration_manager: Use in-place value factory The factory is purely a state-less thing, there is no difference what instance of it to use, so we may omit referencing the storage_service in passive_announce This is 2nd simple migration_manager -> storage_service link to cut (more to come later). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	f129d8380f	migration_manager: Get database through storage_proxy There are several places where migration_manager needs storage_service reference to get the database from, thus forming the mutual dependency between them. This is the simplest case where the migration_manager link to the storage_service can be cut -- the database reference can be obtained from storage_proxy instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone, only the migration_manager needs the database, but not the vice-versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	3f84256853	migration_manager: Remove register/unregister helpers In the 2nd patch the migration_manager kept those for simpler patching, but now we can drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9e4b41c32a	tests: Switch on migration notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9d31bc166b	cdc: Use migration_notifier to (un)register for events If no one provided -- get it from storage_service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:19 +03:00
Pavel Emelyanov	ecab51f8cc	storage_service: Use migration_notifier (and stop worrying) The storage_server needs migration_manager for notifications and carefully handles the manager's stop process not to demolish the listeners list from under itself. From now on this dependency is no longer valid (however the storage_service still seems to need the migration_manager, but this is a different story). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7814ed3c12	cql_server: Use migration_notifier in events_notifier This patch removes an implicit cql_server -> migration_manager dependency, as the former's event notifier uses the latter for notifications. This dependency also breaks a loop: storage_service -> cql_server -> migration_manager -> storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	d9edcb3f15	query_processor: Use migration_notifier This patch breaks one (probably harmless but still) dependency loop. The query_processor -> migration_manager -> storage_proxy -> tracing -> query_processor. The first link is not needed, as the query_processor needs the migration_manager purely to (un)subscribe on notifications. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	2735024a53	auth: Use migration_notifier The same as with view builder. The constructor still needs both, but the life-time reference is now for notifier only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	28f1250b8b	view_builder: Use migration notifier The migration manager itself is still needed on start to wait for schema agreement, but there's no longer the need for the life-time reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f45b23f088	storage_service: Keep migration_notifier The storage service will need this guy to initialize sub-services with. Also it registers itself with notifiers. That said, it's convenient to have the migration notifier on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a completely separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Pavel Emelyanov	074cc0c8ac	migration_manager: Helpers for on_before_ notifications Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Pavel Emelyanov	1992755c72	storage_service: Kill initialization helper from init.cc The helper just makes further patching more complex, so drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Konstantin Osipov	a665fab306	test.py: introduce BoostTest and virtualize custom boost arguments	2020-01-15 13:37:25 +03:00
Gleb Natapov	51672e5990	paxos: immediately sync commitlog entries for writes made by paxos learn stage	2020-01-15 12:15:42 +02:00
Gleb Natapov	0fc48515d8	paxos: mark paxos table schema as "always sync" We want all writes to paxos table to be persisted on storage before being declared completed.	2020-01-15 12:15:42 +02:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that use this schema will be immediately persisted on storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	0ce70c7a04	commitlog: add test for per entry sync mode	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request a mutation to be persisted on a disk before making future ready independent of which mode commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true commitlog should sync to file immediately after writing the entry and wait for flush to complete before returning.	2020-01-15 12:15:42 +02:00
Piotr Sarna	9aab75db60	alternator: clean up single value rjson comparator The comparator is refreshed to ensure the following: - null compares less to all other types; - null, true and false are comparable against each other, while other types are only comparable against themselves and null. Comparing mixed types is not currently reachable from the alternator API, because it's only used for sets, which can only use strings, binary blobs and numbers - thus, no new pytest cases are added. Fixes #5454	2020-01-15 10:57:49 +02:00
Juliusz Stasiewicz	d87d01b501	storage_proxy: intercept rpc::closed_error if counter leader is down (#5579) When counter mutation is about to be sent, a leader is elected, but if the leader fails after election, we get `rpc::closed_error`. The exception propagates high up, causing all connections to be dropped. This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters` and translates it to `mutation_write_failure_exception`. References #2859	2020-01-15 09:56:45 +01:00
Konstantin Osipov	a351ea57d5	test.py: sort tests within a suite, and sort suites This makes it easier to navigate the test artefacts. No need to sort suites since they are already stored in a dict.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	ba87e73f8e	test.py: add a basic CQL test	2020-01-15 11:41:19 +03:00
Konstantin Osipov	44d31db1fc	test.py: add CQL .reject files to gitignore To avoid accidental commit, add .reject files to .gitignore	2020-01-15 11:41:19 +03:00
Konstantin Osipov	4f64f0c652	test.py: print a colored unidiff in case of test failure Print a colored unidiff between result and reject files in case of test failure.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	d3f9e64028	test.py: add CqlTestSuite to run CQL tests Run the test and compare results. Manage temporary and .reject files. Now that there are CQL tests, improve logging. run_test success no longer means test success.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	b114bfe0bd	test.py: initial import of CQL test driver, cql_repl cql_repl is a simple program which reads CQL from stdin, executes it, and writes results to stdout. It supports --input, --output and --log options. --log is directed to cql_test.log by default. --input is stdin by default. --output is stdout by default. The result set output is printed with a basic JSON visitor.	2020-01-15 11:41:16 +03:00
Konstantin Osipov	0ec27267ab	test.py: remove custom colors and define a color palette Using a standard Python module improves readability, and allows using colors easily in other output.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	0165413405	test.py: split test output per test mode Store test temporary files and logs in ${testdir}/${mode}. Remove --jenkins and --xunit, and always write XML files at a predefined location: ${testdir}/${mode}/xml/. Use .xunit.xml extension for tests whose XML output is in xunit format, and junit.xml for an accumulated output of all non-boost tests in junit format.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4095ab08c8	test.py: remove tests_to_run Avoid storing each test twice, use per-tests list to construct a global iterable.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	169128f80b	test.py: virtualize Test.run(), to introduce CqlTest.Run next	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d05f6c3cc7	test.py: virtualize test search pattern per TestSuite CQL tests have .cql extension, while unit tests have .cc.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	abcc182ab3	test.py: virtualize write_xunit_report() Make sure any non-boost test can participate in the report.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	18aafacfad	test.py: ensure print_summary() is agnostic of test type Introduce a virtual Test.print_summary() to print a failed test summary.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	21fbe5fa81	test.py: tidy up print_summary() Now that we have tabular output, make print_summary() more concise.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c171882b51	test.py: introduce base class Test for CQL and Unit tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	fd6897d53e	test.py: move the default arguments handling to UnitTestSuite Move UnitTest default seastar argument handling to UnitTestSuite (cleanup).	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d3126f08ed	test.py: move custom unit test command line arguments to suite.yaml Load the command line arguments, if any, from suite.yaml, rather than keep them hard-coded in test.py. This allows the operations team to have easier access to these. Note I had to sacrifice dynamic smp count for mutation_reader_test (the new smp count is fixed at 3) since this is part of test configuration now.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ef6cebcbd2	test.py: move command line argument processing to UnitTestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4a20617be3	test.py: introduce add_test(), which is suite-specific	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7e10bebcda	test.py: move long test list to suite.yaml Use suite.yaml for long tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	32ffde91ba	test.py: move test id assignment to TestSuite Going forward finding and creating tests will be a responsibility of TestSuite, so the id generator needs to be shared.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b5b4944111	test.py: move repeat handling to TestSuite This way we can avoid iterating over all tests to handle --repeat. Besides, going forward the tests will be stored in two places: in the global list of all tests, for the runner, and per suite, for suite-based reporting, so it's easier if TestSuite is fully responsible for finding and adding tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	34a1b49fc3	test.py: move add_test_list() to TestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	44e1c4267c	test.py: introduce test suites - UnitTestSuite - for test/unit tests - BoostTestSuite - a tweak on UnitTestSuite, with options to log xml test output to a dedicated file	2020-01-15 10:53:24 +03:00
Konstantin Osipov	eed3201ca6	test.py: use path, rather than test kind, for search pattern Going forward there may be multiple suites of the same kind.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f95c97667f	test.py: support arbitrary number of test suites Scan entire test/ for folders that contain suite.yaml, and load tests from these folders. Skip the rest. Each folder with a suite.yaml is expected to have a valid suite configuration in the yaml file. A suite is a folder with test of the same type. E.g. it can be a folder with unit tests, boost tests, or CQL tests. The harness will use suite.yaml to create an appropriate suite test driver, to execute tests in different formats.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c1f8169cd4	test.py: add suite.yaml to boost and unit tests The plan is to move suite-specific settings to the configuration file.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ec9ad04c8a	test.py: move 'success' to TestUnit class There will be other success attributes: program return status 0 doesn't mean the test is successful for all tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b4aa4d35c3	test.py: save test output in tmpdir It is handy to have it so that a reference of a failed test is available without re-running it.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f4efe03ade	test.py: always produce xml output, derive output paths from tmpdir It reduces the number of configurations to re-test when test.py is modified. and simplifies usage of test.py in build tools, since you no longer need to bother with extra arguments.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d2b546d464	test.py: output job count in the log	2020-01-15 10:53:24 +03:00
Konstantin Osipov	233f921f9d	test.py: make test output brief&tabular New format: % ./test.py --verbose --mode=release ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [1/111] boost/UUID_test release [ PASS ] [2/111] boost/enum_set_test release [ PASS ] [3/111] boost/like_matcher_test release [ PASS ] [4/111] boost/observable_test release [ PASS ] [5/111] boost/allocation_strategy_test release [ PASS ] ^C % ./test.py foo ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [3/3] unit/memory_footprint_test debug [ PASS ] ------------------------------------------------------------------------------	2020-01-15 10:53:24 +03:00
Konstantin Osipov	879bea20ab	test.py: add a log file Going forward I'd like to make terminal output brief&tabular, but some test details are necessary to preserve so that a failure is easy to debug. This information now goes to the log file. - open and truncate the log file on each harness start - log options of each invoked test in the log, so that a failure is easy to reproduce - log test result in the log Since tests are run concurrently, having an exact trace of concurrent execution also helps debugging flaky tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	cbee76fb95	test.py: gitignore the default ./test.py tmpdir, ./testlog	2020-01-15 10:53:24 +03:00
Konstantin Osipov	1de69228f1	test.py: add --tmpdir It will be used for test log files.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	caf742f956	test.py: flake8 style fix	2020-01-15 10:53:24 +03:00
Konstantin Osipov	dab364c87d	test.py: sort imports	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7ec4b98200	test.py: make name a positional argument. Accept multiple test names, treat test name as a substring, and if the same name is given multiple times, run the test multiple times.	2020-01-15 10:53:24 +03:00
Dejan Mircevski	bb2e04cc8b	alternator: Improve comments on comparators Some comparator methods in conditions.cc use unexpected operators; explain why. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-14 22:25:55 +02:00
Tomasz Grabiec	c8a5a27bd9	Merge "storage_service: Move load_broadcaster away" from Pavel E. The storage_service struct is a collection of diverse things, most of them required only on start and on stop and/or running on shard 0 (but is nonetheless sharded). As a part of cleaning up this structure and the inter-component dependencies it generates, here's the sanitation of load_broadcaster.	2020-01-14 19:26:06 +01:00
Calle Wilund	313ed91ab0	cdc: Listen for migration callbacks on all shards Fixes #5582 ... but only populate log on shard 0. Migration manager callbacks are slightly asymmetric. Notifications for pre-create/update mutations are sent only on initiating shard (necessary, because we consider the mutations mutable). But "created" callbacks are sent on all shards (immutable). We must subscribe on all shards, but still do population of cdc table only once, otherwise we can either miss table creation or populate more than once. v2: - Add test case Message-Id: <20200113140524.14890-1-calle@scylladb.com>	2020-01-14 16:35:41 +01:00
Avi Kivity	2138657d3a	Update seastar submodule * seastar 36cf5c5ff0...3f3e117de3 (16): > memcached: don't use C++17-only std::optional > reactor: Comment why _backend is assigned in constructor body > log: restore --log-to-stdout for backward compatibility > used_size.hh: Include missing headers > core: Move some code from reactor.cc to future.cc > future-util: move parallel_for_each to future-util.cc > task: stop wrapping tasks with unique_ptr > Merge "Setup timer signal handler in backend constructor" from Pavel Fixes #5524 > future: avoid a branch in future's move constructor if type is trivial > utils: Expose used_size > stream: Call get_future early > future-util: Move parallel_for_each_state code to a .cc > memcached: log exceptions > stream: Delete dead code > core: Turn pollable_fd into a simple proxy over pollable_fd_state. > Merge "log to std::cerr" from Benny	2020-01-14 16:56:25 +02:00
Pavel Emelyanov	e1ed8f3f7e	storage_service: Remove _shadow_token_metadata This is the part of de-bloating storage_service. The field in question is used to temporary keep the _token_metadata value during shard-wide replication. There's no need to have it as class member, any "local" copy is enough. Also, as the size of token_metadata is huge, and invoke_on_all() copies the function for each shard, keep one local copy of metadata using do_with() and pass it into the invoke_on_all() by reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Asias He <asias@scylladb.com> Message-Id: <20200113171657.10246-1-xemul@scylladb.com>	2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola	054f5761a7	types: Refactor code into a serialize_varint helper This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy while serializing a decimal. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200110221422.35807-1-espindola@scylladb.com>	2020-01-14 16:28:27 +02:00
Avi Kivity	6c84dd0045	cql3: update_statement: do not set query option always_return_static_content for list read-before-write The query option always_return_static_content was added for lightweight transactions in commits `e0b31dd273` (infrastructure) and `65b86d155e` (actual use). However, the flag was added unconditionally to update_parameters::options. This caused it to be set for list read-modify-write operations, not just for lightweight transactions. This is a little wasteful, and worse, it breaks compatibility as old nodes do not understand the always_return_static_content flag and complain when they see it. To fix, remove the always_return_static_content from update_parameters::options and only set it from compare-and-swap operations that are used to implement lightweight transactions. Fixes #5593. Reviewed-by: Gleb Natapov <gleb@scylladb.com> Message-Id: <20200114135133.2338238-1-avi@scylladb.com>	2020-01-14 16:15:20 +02:00
Hagit Segev	ef88e1e822	CentOS RPMs: Remove target to enable general centos.	2020-01-14 14:31:03 +02:00
Alejo Sanchez	6909d4db42	cql3: BYPASS CACHE query counter This patch is the first part of requested full scan metrics. It implements a counter of SELECT queries with BYPASS CACHE option. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>	2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola	dca1bc480f	everywhere: Use serialized(foo) instead of data_value(foo).serialize() This is just a simple cleanup that reduces the size of another patch I am working on and is an independent improvement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114051739.370127-1-espindola@scylladb.com>	2020-01-14 12:17:12 +02:00
Pavel Emelyanov	b9f28e9335	storage_service: Remove dead drain branch The drain_in_progress variable here is the future that's set by the drain() operation itself. Its promise is set when the drain() finishes. The check for this future in the beginning of drain() is pointless. No two drain()-s can run in parallel because of run_with_api_lock() protection. Doing the 2nd drain after a successful 1st one is also impossible due to the _operation_mode check. The 2nd drain after an _exceptioned_ (and thus incomplete) 1st one will deadlock; after this patch it will instead try to drain for the 2nd time, but that should be ok. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200114094724.23876-1-xemul@scylladb.com>	2020-01-14 12:07:29 +02:00
Piotr Sarna	36ec43a262	Merge "add table with connected cql clients" from Juliusz This change introduces system.clients table, which provides information about CQL clients connected. PK is the client's IP address, CK consists of outgoing port number and client_type (which will be extended in future to thrift/alternator/redis). Table supplies also shard_id and username. Other columns, like connection_stage, driver_name, driver_version..., are currently empty but exist for C* compatibility and future use. This is an ordinary table (i.e. non-virtual) and it's updated upon accepting connections. This is also why C*'s column request_count was not introduced. In case of abrupt DB stop, the table should not persist, so it's being truncated on startup. Resolves #4820	2020-01-14 10:01:07 +02:00
Avi Kivity	1f46133273	Merge "data: make cell::make_collection() exception safe" from Botond " Most of the code in `cell` and the `imr` infrastructure it is built on is `noexcept`. This means that extra care must be taken to avoid rogue exceptions as they will bring down the node. The changes introduced by 0a453e5d3a did just that - introduced rogue `std::bad_alloc` into this code path by violating an undocumented and unvalidated assumption -- that fragment ranges passed to `cell::make_collection()` are nothrow copyable and movable. This series refactors `cell::make_collection()` such that it does not have this assumption anymore and is safe to use with any range. Note that the unit test included in this series, that was used to find all the possible exception sources will not be currently run in any of our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not being set. I plan to address this in a followup because setting this flag fails other tests using the failure injection mechanism. This is because these tests are normally run with the failure injection disabled so failures managed to lurk in without anyone noticing. Fixes: #5575 Refs: #5341 Tests: unit(dev, debug) " * 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla: test: mutation_test: add exception safety test for large collection serialization data/cell.hh: avoid accidental copies of non-nothrow copiable ranges utils/fragment_range.hh: introduce fragment_range_view	2020-01-14 10:01:06 +02:00
Nadav Har'El	5b08ec3d2c	alternator: error on unsupported ScanIndexForward=false We do not yet support the ScanIndexForward=false option for reversing the sort order of a Query operation, as reported in issue #5153. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter and giving the user wrong results. This is what this patch does. Before this patch, the reverse-order query in the xfailing test test_query.py::test_query_reverse seems to succeed - yet gives results in the wrong order. With this patch, the query itself fails - stating that the ScanIndexForward=false argument is not supported. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200105113719.26326-1-nyh@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	c4bf532d37	storage_service: Fix race in removenode/force_removenode/other Here's another theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and some other operation. Let's walk through them. First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> At this point some other operation may start. Say, drain: run_with_api_lock _operation_in_progress = "drain" storage_service::drain ... go to sleep somewhere Now let's go back to the 1st op that wakes up from its sleep. The code it executes is while (!ss._operation_in_progress.empty()) { sleep_abortable() } and while the drain is running it will never exit. However (! and this is the core of the race) should the drain operation happen _before_ the force_removenode, another check for _operation_in_progress would have made the latter exit with the "Operation drain is in progress, try again" message. Fix this inconsistency by making the check for the current operation on every wake-up from the sleep_abortable. Fixes #5591 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	cc92683894	storage_service: Fix race and deadlock in removenode/force_removenode Here's a theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and removenode (again) operations. Let's walk through them. First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true _force_remove_completion = false throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> ! at this point we have _force_remove_completion = false and _operation_in_progress = <empty>, which opens the following opportunity for the 3rd removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now here's what we have in 2nd and 3rd ops: 1. _operation_in_progress = "removenode" (set by 3rd) prevents the force_removenode from exiting its loop 2. _force_remove_completion = false (set by 1st on exit) prevents the removenode from waiting on replicating_nodes list One can start the 4th call with force_removenode, it will proceed and wake up the 3rd op, but after it we'll have two force_removenode-s running in parallel and killing each other. I propose not to set _force_remove_completion to false in removenode, but just exit and let the owner of this flag unset it once it gets the control back. Fixes #5590 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
Benny Halevy	ff55b5dca3	cql3: functions: limit sum overflow detection to integral types Other types do not have a wider accumulator at the moment. And static_cast<accumulator_type>(ret) != _sum evaluates as false for NaN/Inf floating point values. Fixes #5586 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>	2020-01-14 10:01:06 +02:00
Avi Kivity	e3310201dd	atomic_cell_or_collection: type-aware print atomic_cell or collection components Now that atomic_cell_view and collection_mutation_view have type-aware printers, we can use them in the type-aware atomic_cell_or_collection printer. Message-Id: <20191231142832.594960-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Avi Kivity	931b196d20	mutation_partition: row: resolve column name when in schema-aware printer Instead of printing the column id, print the full column name. Message-Id: <20191231142944.595272-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Takuya ASADA	5241deda2d	dist: nonroot: fix CLI tool path for nonroot (#5584) The CLI tool path is hardcoded; the correct path needs to be specified on nonroot.	2020-01-14 10:01:06 +02:00
Nadav Har'El	1511b945f8	merge: Handle multiple regular base columns in view pk Merged patch series from Piotr Sarna: "Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This series is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)" Piotr Sarna (3): db,view: fix checking if partition key is empty view: handle multiple regular base columns in view pk test: add a case for multiple base regular columns in view key alternator-test/test_gsi.py \| 1 - view_info.hh \| 5 +- cql3/statements/alter_table_statement.cc \| 2 +- db/view/view.cc \| 77 ++++++++++++++---------- mutation_partition.cc \| 2 +- test/boost/cql_query_test.cc \| 58 ++++++++++++++++++ 6 files changed, 109 insertions(+), 36 deletions(-)	2020-01-14 10:01:00 +02:00
Nadav Har'El	f16e3b0491	merge: bouncing lwt request to an owning shard Merged patch series from Gleb Natapov: "LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by the transport code that jumps to a correct shard and re-process incoming message there. The nicer way to achieve the same would be to jump to a right shard inside of the storage_proxy::cas(), but unfortunately with current implementation of the modification statements they are unusable by a shard different from where it was created, so the jump should happen before a modification statement for an cas() is created. When we fix our cql code to be more cross-shard friendly this can be reworked to do the jump in the storage_proxy." 
Gleb Natapov (4): transport: change make_result to takes a reference to cql result instead of shared_ptr storage_service: move start_native_transport into a thread lwt: Process lwt request on a owning shard lwt: drop invoke_on in paxos_state prepare and accept auth/service.hh \| 5 +- message/messaging_service.hh \| 2 +- service/client_state.hh \| 30 +++- service/paxos/paxos_state.hh \| 10 +- service/query_state.hh \| 6 + service/storage_proxy.hh \| 2 + transport/messages/result_message.hh \| 20 +++ transport/messages/result_message_base.hh \| 4 + transport/request.hh \| 4 + transport/server.hh \| 25 ++- cql3/statements/batch_statement.cc \| 6 + cql3/statements/modification_statement.cc \| 6 + cql3/statements/select_statement.cc \| 8 + message/messaging_service.cc \| 2 +- service/paxos/paxos_state.cc \| 48 ++--- service/storage_proxy.cc \| 47 ++++- service/storage_service.cc \| 120 +++++++------ test/boost/cql_query_test.cc \| 1 + thrift/handler.cc \| 3 + transport/messages/result_message.cc \| 5 + transport/server.cc \| 203 ++++++++++++++++------ 21 files changed, 377 insertions(+), 180 deletions(-)	2020-01-14 09:59:59 +02:00
Botond Dénes	300728120f	test: mutation_test: add exception safety test for large collection serialization Use `seastar::memory::local_failure_injector()` to inject all possible `std::bad_alloc`:s into the collection serialization code path. The test just checks that there are no `std::abort()`:s caused by any of the exceptions. The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is not defined.	2020-01-13 16:53:35 +02:00
Botond Dénes	3ec889816a	data/cell.hh: avoid accidental copies of non-nothrow copiable ranges `cell::make_collection()` assumes that all ranges passed to it are nothrow copyable and movable views. This is not guaranteed, is not expressed in the interface and is not mentioned in the comments either. The changes introduced by 0a453e5d3a to collection serialization, making it use fragmented buffers, fell into this trap, as it passes `bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy constructor allocates and hence can throw, triggering an `std::terminate()` inside `cell::make_collection()` as the latter is noexcept. To solve this issue, non-nothrow copyable and movable ranges are now wrapped in a `fragment_range_view` to make them so. `cell::make_collection()` already requires callers to keep alive the range for the duration of the call, so this does not introduce any new requirements to the callers. Additionally, to avoid any future accidents, do not accept temporaries for the `data` parameter. We don't ever want to move this param anyway, we will either have a trivially copyable view, or a potentially heavy-weight range that we will create a trivially copyable view of.	2020-01-13 16:53:35 +02:00
Botond Dénes	b52b4d36a2	utils/fragment_range.hh: introduce fragment_range_view A lightweight, trivially copyable and movable view for fragment ranges. Allows for uniform treatment of all kinds of ranges, i.e. treating all of them as a view. Currently `fragment_range.hh` provides lightweight, view-like adaptors for empty and single-fragment ranges (`bytes_view`). To allow code to treat owning multi-fragment ranges the same way as the former two, we need a view for the latter as well -- this is `fragment_range_view`.	2020-01-13 16:52:59 +02:00
Calle Wilund	75f2b2876b	cdc: Remove free function for mutation augmentation	2020-01-13 13:18:55 +00:00
Calle Wilund	3eda3122af	cdc: Move mutation augment from cql3::modification_statement to storage proxy Using the attached service object	2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz	27dfda0b9e	main/transport: using the infrastructure of system.clients Resolves #4820. Execution path in main.cc now cleans up system.clients table if it exists (this is done on startup). Also, server.cc now calls functions that notify about cql clients connecting/disconnecting.	2020-01-13 14:07:04 +01:00
Pavel Emelyanov	148da64a7e	storage_service: Move load_broadcaster away This simplifies the storage_service API and fixes the complaint about shared_ptr usage instead of unique_ptr. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:55:09 +03:00
Pavel Emelyanov	b6e1e6df64	misc_services: Introduce load_meter There's a lonely get_load_map() call on storage_service that needs only load broadcaster, always runs on shard 0 and that's it. Next patch will move this whole stuff into its own helper no-shard container and this is preparation for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:53:08 +03:00
Gleb Natapov	5753ab7195	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests are now running on an owning shard there is no longer a need to invoke cross shard call on paxos_state level. RPC calls may still arrive to a wrong shard so we need to make cross shard call there.	2020-01-13 10:26:02 +02:00
Gleb Natapov	d28dd4957b	lwt: Process lwt request on a owning shard LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by transport code that jumps to a correct shard and re-process incoming message there.	2020-01-13 10:26:02 +02:00
Piotr Sarna	3853594108	alternator-test: turn off TLS self-signed verification Two test cases did not ignore TLS self-signed warnings, which are used locally for testing HTTPS. Fixes #5557 Tests(test_health, test_authorization) Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>	2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola	5313828ab8	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-2-espindola@scylladb.com>	2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola	4da6dc1a7f	cql3: Change a lambda capture order to match another Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-1-espindola@scylladb.com>	2020-01-09 10:42:49 +02:00
Avi Kivity	6d454d13ac	db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete Those gratuitous lambdas make life harder for IDE users by hiding the actual types from the IDEs. Message-Id: <20200107154746.1918648-1-avi@scylladb.com>	2020-01-08 17:43:18 +01:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, because otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Gleb Natapov	feed544c5d	paxos: fix truncation time checking during learn stage The comparison is done in milliseconds, not microseconds. Fixes #5566 Message-Id: <20200108094927.GN9084@scylladb.com>	2020-01-08 14:37:07 +01:00
Gleb Natapov	2832f1d9eb	storage_service: move start_native_transport into a thread The code runs only once and it is simple if it runs in a seastar thread.	2020-01-08 14:57:57 +02:00
Gleb Natapov	7fb2e8eb9f	transport: change make_result to take a reference to cql result instead of shared_ptr	2020-01-08 14:57:57 +02:00
Avi Kivity	0bde5906b3	Merge "cql3: detect and handle int overflow in aggregate functions #5537 " from Benny " Fix overflow handling in sum() and avg(). sum: - aggregated into __int128 - detect overflow when computing result and log a warning if found avg: - fix division function to divide the accumulator type _sum (__int128 for integers) by _count Add unit tests for both cases Test: - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it. - unit(dev), cql_query_test(debug) Fixes #5536 " * 'cql3-sum-overflow' of https://github.com/bhalevy/scylla: test: cql_query_test: test avg overflow cql3: functions: protect against int overflow in avg test: cql_query_test: test sum overflow cql3: functions: detect and handle int overflow in sum exceptions: sort exception_code definitions exceptions: define additional cassandra CQL exceptions codes	2020-01-08 10:39:38 +02:00
Avi Kivity	d649371baa	Merge "Fix crash on SELECT SUM(udf(...))" from Rafael " We were failing to start a thread when the UDF call was nested in an aggregate function call like SUM. " * 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla: cql3: Fix indentation cql3: Add missing with_thread_if_needed call cql3: Implement abstract_function_selector::requires_thread remove make_ready_future call	2020-01-08 10:25:42 +02:00
Benny Halevy	dafbd88349	query: initialize read_command timestamp to now This was initialized to api::missing_timestamp but should be set to either a client provided-timestamp or the server's. Unlike write operations, this timestamp need not be unique as the one generated by client_state::get_timestamp. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>	2020-01-08 10:19:07 +02:00
Benny Halevy	39325cf297	storage_proxy: fix int overflow in service::abstract_read_executor::execute exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp, causing: service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int' Aborting on shard 1. Do not optimize cross-dc repair if read_timestamp is missing (or just negative) We're interested in reads that happen within write_timeout of a write. Fixes #5556 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>	2020-01-08 10:18:59 +02:00
Raphael S. Carvalho	390c8b9b37	sstables: Move STCS implementation to source file A header-only implementation can potentially create a problem with duplicate symbols. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>	2020-01-08 09:55:35 +02:00
Benny Halevy	20a0b1a0b6	test: cql_query_test: test avg overflow Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:50:50 +02:00
Benny Halevy	1c81422c1b	cql3: functions: protect against int overflow in avg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	9053ef90c7	test: cql_query_test: test sum overflow Add unit tests for summing up int's and bigint's with possible handling of overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	e97a111f64	cql3: functions: detect and handle int overflow in sum Detect integer overflow in cql sum functions and throw an error. Note that Cassandra quietly truncates the sum if it doesn't fit in the input type but we rather break compatibility in this case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400 Fixes #5536 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	98260254df	exceptions: sort exception_code definitions Be compatible with Cassandra source. It's easier to maintain this way. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:21 +02:00
Benny Halevy	30d0f1df75	exceptions: define additional cassandra CQL exceptions codes As of `e9da85723a` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola	282228b303	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	4316bc2e18	cql3: Add missing with_thread_if_needed call This fixes an assert when doing sum(udf(...)). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	d301d31de0	cql3: Implement abstract_function_selector::requires_thread Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola	dc9b3b8ff2	remove make_ready_future call Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:10:27 -08:00
Calle Wilund	9f6b22d882	cdc: Assign self to storage proxy object	2020-01-07 12:01:58 +00:00
Calle Wilund	fc5904372b	storage_proxy: Add (optional) cdc service object pointer member The cdc service is assigned from outside, post construction, mainly because of the chickens and eggs in main startup. Would be nice to have it unconditionally, but this is workable.	2020-01-07 12:01:58 +00:00
Calle Wilund	d6003253dd	storage_proxy: Move mutate_counters to private section It is (and shall) only be called from inside storage proxy, and we would like this to be reflected in the interface so our eventual moving of cdc logic into the mutate call chains become easier to verify and comprehend.	2020-01-07 12:01:58 +00:00
Calle Wilund	b6c788fccf	cdc: Add augmentation call to cdc service To eventually replace the free function. Main difference is this is built to both handle batches correctly and to eventually allow hanging cdc object on storage proxy, and caches on the cdc object.	2020-01-07 12:01:58 +00:00
Piotr Sarna	04dc8faec9	test: add a case for multiple base regular columns in view key The test case checks that having two base regular columns in the materialized view key (not obtainable via CQL), still works fine when values are inserted or deleted. If TTL was involved and these columns would have different expiration rules, the case would be more complicated, but it's not possible for a user to reach that case - neither with CQL, nor with alternator.	2020-01-07 12:19:06 +01:00
Piotr Sarna	155a47cc55	view: handle multiple regular base columns in view pk Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This patch is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo) Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>	2020-01-07 12:18:39 +01:00
Avi Kivity	6e0a073b2e	mutation_partition: use type-aware printing of the clustering row Now that position_in_partition_view has type-aware printing, use it to provide a human readable version of clustering keys. Message-Id: <20191231151315.602559-2-avi@scylladb.com>	2020-01-07 12:17:11 +01:00
Avi Kivity	488c42408a	position_in_partition_view: add type-aware printer If the position_in_partition_view represents a clustering key, we can now see it with the clustering key decoded according to the schema. Message-Id: <20191231151315.602559-1-avi@scylladb.com>	2020-01-07 12:15:09 +01:00
Piotr Sarna	54315f89cd	db,view: fix checking if partition key is empty Previous implementation did not take into account that a column in a partition key might exist in a mutation, but in a DEAD state - if it's deleted. There are no regressions for CQL, while for alternator and its capability of having two regular base columns in a view key, this additional check must be performed.	2020-01-07 12:05:36 +01:00
Avi Kivity	3a3c20d337	schema_tables: de-templatize diff_table_or_view() This reduces code bloat and makes the code friendlier for IDEs, as the IDE now understands the type of create_schema. Message-Id: <20191231134803.591190-1-avi@scylladb.com>	2020-01-07 11:56:54 +01:00
Avi Kivity	e5e42672f5	sstables: reduce bloat from sstables::write_simple() sstables::write_simple() has quite a lot of boilerplate which gets replicated into each template instance. Move all of that into a non-template do_write_simple(), leaving only things that truly depend on the component being written in the template, and encapsulating them with a noncopyable_function. An explicit template instantiation was added, since this is used in a header file. Before, it likely worked by accident and stopped working when the template became small enough to inline. Tests: unit (dev) Message-Id: <20200106135453.1634311-1-avi@scylladb.com>	2020-01-07 11:56:11 +01:00
Avi Kivity	8f7f56d6a0	schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete The generic lambda made IDE searches for create_table_from_table_row() fail. Message-Id: <20191231135210.591972-1-avi@scylladb.com>	2020-01-07 11:49:10 +01:00
Avi Kivity	92fd83d3af	schema_tables: make gratuitous generic lambda in create_table_from_name() concrete The lambda made IDE searches for read_table_mutations fail. Message-Id: <20191231135103.591741-1-avi@scylladb.com>	2020-01-07 11:48:56 +01:00
Avi Kivity	dd6dd97df9	schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete The generic lambda made IDE searches for create_table_from_mutations fail. Message-Id: <20191231135059.591681-1-avi@scylladb.com>	2020-01-07 11:48:39 +01:00
Avi Kivity	c63cf02745	canonical_mutation: add pretty printing Add type-aware printing of canonical_mutation objects.	2020-01-07 12:06:31 +02:00
Avi Kivity	e093121687	mutation_partition_view: add virtual visitor mutation_partition_view now supports a compile-time resolved visitor. This is performant but results in bloat when the performance is not needed. Furthermore, the template function that applies the object to the visitor is private and out-of-line, to reduce compile time. To allow visitation on mutation_partition_view objects, add a virtual visitor type and a non-template accept function. Note: mutation_partition_visitor is very similar to the new type, but different enough to break the template visitor which is used to implement the new visitor. The new visitor will be used to implement pretty printing for canonical_mutation.	2020-01-07 12:06:31 +02:00
Avi Kivity	75d9909b27	collection_mutation_view: add type-aware pretty printer Add a way for the user to associate a type with a collection_mutation_view and get a nice printout.	2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola	b80852c447	main: Explicitly allow scylla core dumps I have not looked into the security reason for disabling it when a program has file capabilities. Fixes #5560 [avi: remove extraneous semicolon] Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200106231836.99052-1-espindola@scylladb.com>	2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola	07f1cb53ea	tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1' These are the same options we use in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200107001513.122238-1-espindola@scylladb.com>	2020-01-07 11:11:49 +02:00
Takuya ASADA	238a25a0f4	docker: fix typo of scylla-jmx script path (#5551) The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx. Fixes #5542	2020-01-07 10:54:16 +02:00
Asias He	401854dbaf	repair: Avoid duplicated partition_end write Consider this: 1) Write partition_start of p1 2) Write clustering_row of p1 3) Write partition_end of p1 4) Repair is stopped due to error before writing partition_start of p2 5) Repair calls repair_row_level_stop() to tear down which calls wait_for_writer_done(). A duplicate partition_end is written. To fix, track the partition_start and partition_end written, avoid unpaired writes. Backports: 3.1 and 3.2 Fixes: #5527	2020-01-06 14:06:02 +02:00
Eliran Sinvani	e64445d7e5	debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg commit `21dec3881c` introduced a bug that will cause scylla debian build to fail. This is because the commit relied on the environment PRODUCT variable to be exported (and as a result, to propagate to the rename command that is executed by find in a subshell). This commit fixes it by explicitly passing the PRODUCT variable into the rename command. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>	2020-01-06 12:31:58 +02:00
Asias He	38d4015619	gossiper: Remove HIBERNATE status from dead state In scylla, the replacing node is set as HIBERNATE status. It is the only place we use HIBERNATE status. The replacing node is supposed to be alive and updating its heartbeat, so it is not supposed to be in dead state. This patch fixes the following problem in replacing. 1) start n1, n2 2) n2 is down 3) start n3 to replace n2, but kill n3 in the middle of the replace 4) start n4 to replace n2 After step 3 and step 4, the old n3 will stay in gossip forever until a full cluster shutdown. Note n3 will only stay in gossip but in system.peers table. User will see the annoying and infinite logs like on all the nodes rpc - client $ip_of_n3:7000: fail to connect: Connection refused Fixes: #5449 Tests: replace_address_test.py + manual test	2020-01-06 11:47:31 +02:00
Amos Kong	c5ec1e3ddc	scylla_ntp_setup: check redhat variant version by parse_version (#5434) VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so scylla_ntp_setup fails on OEL7.7 with a ValueError. - ValueError: invalid literal for int() with base 10: '7.7' This patch changes redhat_version() to return the version string, and compares it with parse_version(). Fixes #5433 Signed-off-by: Amos Kong <amos@scylladb.com>	2020-01-06 11:43:14 +02:00
Asias He	145fd0313a	streaming: Fix map access in stream_manager::get_progress When the progress is queried, e.g., by nodetool netstats, the progress info might not be updated yet. Fix it by checking before accessing the map, to avoid errors like: std::out_of_range (_Map_base::at) Fixes: #5437 Tests: nodetool_additional_test.py:TestNodetool.netstats_test	2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola	98cd8eddeb	tests: Run with halt_on_error=1:abort_on_error=1 This depends on the just emailed fixes to undefined behavior in tests. With this change we should quickly notice if a change introduces undefined behavior. Fixes #4054 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222646.89628-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola	dc5ecc9630	enum_option_test: Add explicit underlying types to enums We expect to be able to create variables with out of range values, so these enums need explicit underlying types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200102173422.68704-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Nadav Har'El	f0d8dd4094	merge: CDC rolling upgrade Merged pull request https://github.com/scylladb/scylla/pull/5538 from Avi Kivity and Piotr Jastrzębski. This series prepares CDC for rolling upgrade. This consists of reducing the footprint of cdc, when disabled, on the schema, adding a cluster feature, and redacting the cdc column when transferring it to other nodes. The latter is needed because we'll want to backport this to 3.2, which doesn't have canonical_mutations yet.	2020-01-05 17:13:12 +02:00
Gleb Natapov	720c0aa285	commitlog: update last sync timestamp when cycle a buffer If the in-memory buffer does not have enough space for an incoming mutation, it is written into a file, but the code missed updating the timestamp of the last sync, so we may sync too often. Message-Id: <20200102155049.21291-9-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	14746e4218	commitlog: drop segment gate The code that enters the gate never defers before leaving, so the gate behaves like a flag. Let's use the existing flag to prohibit adding data to a closed segment. Message-Id: <20200102155049.21291-8-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	f8c8a5bd1f	test: fix error reporting in commitlog_test Message-Id: <20200102155049.21291-7-gleb@scylladb.com>	2020-01-05 16:13:58 +02:00
Gleb Natapov	680330ae70	commitlog: introduce segment::close() function. Currently segment closing code is spread over several functions and activated based on the _closed flag. Make segment closing explicit by moving all the code into close() function and call it where _closed flag is set. Message-Id: <20200102155049.21291-6-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	a1ae08bb63	commitlog: remove unused segment::flush() parameter Message-Id: <20200102155049.21291-5-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	1e15e1ef44	commitlog: cleanup segment sync() Call cycle() only once. Message-Id: <20200102155049.21291-4-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	3d3d2c572e	commitlog: move segment shutdown code from sync() Currently sync() does two completely different things based on the shutdown parameter. Separate the code into two different functions. Message-Id: <20200102155049.21291-3-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	89afb92b28	commitlog: drop superfluous this Message-Id: <20200102155049.21291-2-gleb@scylladb.com>	2020-01-05 16:13:53 +02:00
Piotr Jastrzebski	95feeece0b	scylla_tables: treat empty cdc props as disabled Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	396e35bf20	cdc: add schema_change test for cdc_options The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after CDC was enabled and a table with CDC enabled is created, in order to make sure that the digest computed including CDC column does not change spuriously as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	c08e6985cd	cdc: allow cluster rolling upgrade Addition of cdc column in scylla_tables changes how schema digests are calculated, and affect the ABI of schema update messages (adding a column changes other columns' indexes in frozen_mutation). To fix this, extend the schema_tables mechanism with support for the cdc column, and adjust schemas and mutations to remove that column when sending schemas during upgrade. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	caa0a4e154	tests: disable CDC in schema_change_tests Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	129af99b94	cdc: Return reference from cluster_supports_cdc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	4639989964	cdc: Add CDC_OPTIONS schema_feature Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Avi Kivity	c150f2e5d7	schema_tables, cdc: don't store empty cdc columns in scylla_tables An empty cdc column in scylla_tables is hashed differently from a missing column. This causes schema mismatch when a schema is propagated to another node, because the other node will redact the schema column completely if the cluster feature isn't enabled, and an empty value is hashed differently from a missing value. Store a tombstone instead. Tombstones are removed before digesting, so they don't affect the outcome. This change also undoes the changes in `386221da84` ("schema_tables: handle 'cdc' options") to schema_change_test test_merging_does_not_alter_tables_which_didnt_change. That change enshrined the breakage into the test, instead of fixing the root cause, which was that we added an extra mutation to the schema (for cdc options, which were disabled).	2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola	3d641d4062	lua: Use existing cpp_int cast logic Different versions of boost have different rules for what conversions from cpp_int to smaller integers are allowed. We already had a function that worked with all supported versions, but it was not being used by lua. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200104041028.215153-1-espindola@scylladb.com>	2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola	88b5aadb05	tests: cql_test_env: wait for two futures starting internal services I noticed this while looking at the crashes next is currently experiencing. While I have no idea if this fixes the issue, it does avoid broken future warnings (for no_sharded_instance_exception) in a debug build. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200103201540.65324-1-espindola@scylladb.com>	2020-01-05 12:09:59 +02:00
Avi Kivity	4b8e2f5003	Update seastar submodule * seastar 0525bbb08...36cf5c5ff (6): > memcached: Fix use after free in shutdown > Revert "task: stop wrapping tasks with unique_ptr" > task: stop wrapping tasks with unique_ptr > http: Change exception formating to the generic seastar one > Merge "Avoid a few calls to ~exception_ptr" from Rafael > tests: fix core generation with asan	2020-01-03 15:48:53 +02:00
Nadav Har'El	44c2a44b54	alternator-test: test for ConditionExpression feature This patch adds a very comprehensive test for the ConditionExpression feature, i.e., the newer syntax of conditional writes replacing the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem operations. I wrote these tests while closely following the DynamoDB ConditionExpression documentation, and attempted to cover all conceivable features, subfeatures and subcases of the ConditionExpression syntax - to serve as a test for a future support for this feature in Alternator (see issue #5053). As usual, all these tests pass on AWS DynamoDB, but because we haven't yet implemented this feature in Alternator, all but one xfail on Alternator. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191229143556.24002-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	aad5eeab51	alternator: better error messages when Alternator port is taken If Alternator is requested to be enabled on a specific port but the port is already taken, the boot fails as expected - but the error log is confusing; It currently looks something like this: WARN 2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... (many more messages about the server shutting down) INFO 2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) There are two problems here. First, the "WARN" should really be an "ERROR", because it causes the server to be shut down and the user must see this error. Second, the final line in the log, something the user is likely to see first, contains only the ultimate cause for the exception (an address already in use) but not the information what this address was needed for. This patch solves both issues, and the log now looks like: ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... INFO 2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191224124127.7093-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	1f64a3bbc9	alternator: error on unsupported ReturnValues option We don't support yet the ReturnValues option on PutItem, UpdateItem or DeleteItem operations (see issue #5053), but if a user tries to use such an option anyway, we silently ignore this option. It's better to fail, reporting the unsupported option. In this patch we check the ReturnValues option and if it is anything but the supported default ("NONE"), we report an error. Also added a test to confirm this fix. The test verifies that "NONE" is allowed, and something which is unsupported (e.g., "DOG") is not ignored but rather causes an error. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191216193310.20060-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	dc93228b66	reloc: Turn the default flags into common flags These are flags we always want to enable. In particular, we want them to be used by the bots, but the bots run this script with --configure-flags, so they were being discarded. We put the user option later so that they can override the common options. Fixes #5505 Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Takuya ASADA <syuu@scylladb.com> Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	d4dfb6ff84	build-id: Handle the binary having multiple PT_NOTE headers There is no requirement that all notes be placed in a single PT_NOTE. It looks like recent lld's actually put each section in its own PT_NOTE. This change looks for build-id in all PT_NOTE headers. Fixes #5525 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191227000311.421843-1-espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
Avi Kivity	1e9237d814	dist: redhat: use parallel compression for rpm payload rpm compression uses xz, which is painfully slow. Adjust the compression settings to run on all threads. The xz utility documentation suggests that 0 threads is equivalent to all CPUs, but apparently the library interface (which rpmbuild uses) doesn't think the same way. Message-Id: <20200101141544.1054176-1-avi@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	de1171181c	user defined types: fix support for case-sensitive type names In the current code, support for case-sensitive (quoted) user-defined type names is broken. For example, a test doing: CREATE TYPE "PHone" (country_code int, number text) CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk)) Fails - the first line creates the type with the case-sensitive name PHone, but the second line wrongly ends up looking for the lowercased name phone, and fails with an exception "Unknown type ks.phone". The problem is in cql3_type_name_impl. This class is used to convert a type object into its proper CQL syntax - for example frozen<list<int>>. The problem is that for a user-defined type, we forgot to quote its name if not lowercase, and the result is wrong CQL; For example, a list of PHone will be written as list<PHone> - but this is wrong because the CQL parser, when it sees this expression, lowercases the unquoted type name PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>. The solution is for cql3_type_name_impl to use for a user-defined type its get_name_as_cql_string() method instead of get_name_as_string(). get_name_as_cql_string() is a new method which prints the name of the user type as it should be in a CQL expression, i.e., quoted if necessary. The bug in the above test was apparently caused when our code serialized the type name to disk as the string PHone (without any quoting), and then later deserialized it using the CQL type parser, which converted it into a lowercase phone. With this patch, the type's name is serialized as "PHone", with the quotes, and deserialized properly as the type PHone. While the extra quotes may seem excessive, they are necessary for the correct CQL type expression - remember that the type expression may be significantly more complex, e.g., frozen<list<"PHone">> and all of this, including the quotes, is necessary for our parser to be able to translate this string back into a type object. 
This patch may cause breakage to existing databases which used case-sensitive user-defined types, but I argue that these use cases were already broken (as demonstrated by this test) so we won't break anything that actually worked before. Fixes #5544 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200101160805.15847-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Pavel Emelyanov	34f8762c4d	storage_service: Drop _update_jobs This field is write-only. Leftover from `83ffae1` (storage_service: Drop block_until_update_pending_ranges_finished) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226091210.20966-1-xemul@scylladb.com>	2020-01-03 15:48:20 +02:00
Pavel Emelyanov	f2b20e7083	cache_hitrate_calculator: Do not reinvent the peering_sharded_service The class in question wants to run its own instances on different shards, for this sake it keeps reference on sharded self to call invoke_on() on. There's a handy peering_sharded_service<> in seastar for the same, using it makes the code nicer and shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226112401.23960-1-xemul@scylladb.com>	2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola	bbed9cac35	cql3: move function creation to a .cc file We had a lot of code in a .hh file that, while using templates, was only used for creating functions during startup. This moves it to a new .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200101002158.246736-1-espindola@scylladb.com>	2020-01-03 15:48:19 +02:00
Benny Halevy	c0883407fe	scripts: Add cpp-name-format: pretty printer Pretty-print cpp-names, useful for deciphering complex backtraces. For example, the following line: service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Is formatted as: service::storage_proxy::init_messaging_service()::{ lambda( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> )#1 }::operator()( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>	2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola	75817d1fe7	sstable: Add checks to help track problems with large_data_handler use after free I can't quite figure out how we were trying to write a sstable with the large data handler already stopped, but the backtrace suggests a good place to add extra checks. This patch adds two checks, one at the start and one at the end of sstable::write_components. The first one should give us better backtraces if the large_data_handler is already stopped. The second one should help catch some race condition. Refs: #5470 Message-Id: <20191231173237.19040-1-espindola@scylladb.com>	2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola	3c34e2f585	types: Avoid an unaligned load in json integer serialization The patch also adds a test that makes the fixed issue easier to reproduce. Fixes #5413 Message-Id: <20191231171406.15980-1-espindola@scylladb.com>	2019-12-31 19:23:42 +02:00
Gleb Natapov	bae5cb9f37	commitlog: remove unused argument during segment creation Since `99a5a77234` all segments are created equal and "active" argument is never true, so drop it. Message-Id: <20191231150639.GR9084@scylladb.com>	2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola	aa535a385d	enum_option_test: Add an explicit underlying type to an enum We expect to be able to create a variable with an out of range value, so the enum needs an explicit underlying type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222029.88942-1-espindola@scylladb.com>	2019-12-31 16:59:00 +02:00
Nadav Har'El	48a914c291	Fix uninitialized members Merged pull request https://github.com/scylladb/scylla/pull/5532 from Benny Halevy: Initialize bool members in row_level_repair and _storage_service causing ubsan errors. Fixes #5531	2019-12-31 10:32:54 +02:00
Takuya ASADA	aa87169670	dist/debian: add procps on Depends We require procps package to use sysctl on postinst script for scylla-kernel-conf. Fixes #5494 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191218234100.37844-1-syuu@scylladb.com>	2019-12-30 19:30:35 +02:00
Avi Kivity	972127e3a8	atomic_cell: add type-aware pretty printing The standard printer for atomic_cell prints the value as hex, because atomic_cell does not include the type. Add a type-aware printer that allows the user to provide the type.	2019-12-30 18:27:04 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Eliran Sinvani	21dec3881c	debian-reloc: rename build product to the name specified in SCYLLA-VERSION-GEN When the product name is other than "scylla", the debian packaging scripts go over all files that start with "scylla-" and change the prefix to be the actual product name. However, if there are no such files in the directory the script will fail, since the renaming command will get the wildcard string instead of an actual file name. This patch replaces the command with one that has the equivalent desired effect but only operates on files if there are any. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>	2019-12-30 17:45:50 +02:00
Takuya ASADA	263385cb4b	dist: stop replacing /usr/lib/scylla with symlink (#5530 ) Since we merged /usr/lib/scylla with /opt/scylladb, we removed /usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb. However, RPM does not support replacing a directory with a symlink; we work around this with a dirty hack in an RPM scriptlet, but it causes multiple issues on upgrade/downgrade. (See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/) To minimize Scylla upgrade/downgrade issues on the user side, it's better to keep the /usr/lib/scylla directory. Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb, we can create symlinks for each setup script like /usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>. Fixes #5522 Fixes #4585 Fixes #4611	2019-12-30 13:52:24 +02:00
Hagit Segev	9d454b7dc6	reloc/build_rpm.sh: Fix '--builddir' option handling (#5519 ) The '--builddir' option value is assigned to the "builddir" variable, which is wrong. The correct variable is "BUILDDIR" so use that instead to fix the '--builddir' option. Also, add logging to the script when executing the "dist/redhat_build.rpm.sh" script to simplify debugging.	2019-12-30 13:25:22 +02:00
Benny Halevy	8aa5d84dd8	storage_service: initialize _is_bootstrap_mode Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode: service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool' The use site is: service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519 While at it, initialize `_initialized` to false as well, just in case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Benny Halevy	474ffb6e54	repair: initialize row_level_repair: _zero_rows Avoid following UBSAN error: repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool' Fixes #5531 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Fabiano Lucchese	d7795b1efa	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499 ) A Linux machine typically has multiple clocksources with distinct performances. Setting a high-performant clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing optimized Linux clocksource to Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in scylla_server configuration file. This parameter is read by perftune.py which, if set to "yes", proceeds to (non persistently) setting the clocksource. On x86, TSC clocksource is used. Fixes #4474 Fixes #5474 Fixes #5480	2019-12-30 10:54:14 +02:00
Avi Kivity	e223154268	cdc: options: return an empty options map when cdc is disabled This is compatible with 3.1 and below, which didn't have that schema field at all.	2019-12-29 16:34:37 +02:00
Benny Halevy	27e0aee358	docs/debugging.md: fix anchor links Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Pavel Solodovnikov	aba9a11ff0	cql: pass variable_specifications via lw_shared_ptr Instances of `variable_specifications` are passed around as shared_ptr's, which are redundant in this case since the class is marked as `final`. Use `lw_shared_ptr` instead since we know for sure it's not a polymorphic pointer. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>	2019-12-29 16:26:26 +02:00
Benny Halevy	4c884908bb	directories: Keep a unique set of directories to initialize If any two directories of data/commitlog/hints/view_hints are the same we still end up running verify_owner_and_mode and disk_sanity(check_direct_io_support) in parallel on the same directories and hit #5510. This change uses std::set rather than std::vector to collect a unique set of directories that need initialization. Fixes #5510 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Gleb Natapov	60a851d3a5	commitlog: always flush segments atomically with writing db::commitlog::segment::batch_cycle() assumes that after a write for a certain position completes (as reported by _pending_ops.wait_for_pending()) it will also be flushed, but this is true only if writing and flushing are atomic wrt _pending_ops lock. It usually is unless flush_after is set to false when cycle() is called. In this case only writing is done under the lock. This is exactly what happens when a segment is closed. Flush is skipped because zero header is added after the last entry and then flushed, but this optimization breaks batch_cycle() assumption. Fix it by flushing after the write atomically even if a segment is being closed. Fixes #5496 Message-Id: <20191224115814.GA6398@scylladb.com>	2019-12-24 14:52:23 +02:00
Pavel Emelyanov	a5cdfea799	directories: Do not mess with per-shard base dir The hints and view_hints directory has per-shard sub-dirs, and the directories code tries to create, check and lock all of them, including the base one. The manipulations in question are excessive -- it's enough to check and lock either the base dir, or all the per-shard ones, but not everything. Let's take the latter approach for its simplicity. Fixes #5510 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Looks-good-to: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223142429.28448-1-xemul@scylladb.com>	2019-12-24 14:49:28 +02:00
Benny Halevy	f8f5db42ca	dbuild: try to pull image if not present locally Pekka Enberg <penberg@scylladb.com> wrote: > Image might not be present, but the subsequent "docker run" command will automatically pull it. Just letting "docker run" fail produces a rather confusing error message that refers to docker help, but we want to provide the user with our own help. So still fail early, but also try to pull the image if "docker image inspect" failed, indicating it's not present locally. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>	2019-12-24 11:13:23 +02:00
Benny Halevy	ee2f97680a	dbuild: just die when no image-id is provided Suggested-by: Pekka Enberg <penberg@scylladb.com> > This will print all the available Docker images, > many (most?) of them completely unrelated. > Why not just print an error saying that no image was specified, > and then perhaps print usage. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>	2019-12-24 11:13:22 +02:00
Benny Halevy	87b2f189f7	dbuild: s/usage/die/ Suggested-by: Dejan Mircevski <dejan@scylladb.com> > The use pattern of this function strongly suggests a name like `die`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>	2019-12-24 11:13:21 +02:00
Benny Halevy	718e9eb341	table: move_sstables_from_staging: fix use after free of shared_sstable Introduced in `4b3243f5b9` Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test ==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0 READ of size 8 at 0x60200023de18 thread T1 (reactor-1) #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289 #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530 #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstable>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556 #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda( std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618 #4 0x1069203 in operator()
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626 #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... 
0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20) freed by thread T1 (reactor-1) here: #0 0x7f8a153b796f in operator delete(void) (/lib64/libasan.so.5+0x11096f) #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128 #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470 #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351 #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332 #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680 #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477 #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496 #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573 #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645 #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488 #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... Fixes #5511 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>	2019-12-23 15:20:41 +02:00
Konstantin Osipov	476fbc60be	test.py: prepare to remove custom colors Add dbuild dependency on python3-colorama, which will be used in test.py instead of a hand-made palette. [avi: update tools/toolchain/image] Message-Id: <20191223125251.92064-2-kostja@scylladb.com>	2019-12-23 15:13:22 +02:00
Pavel Emelyanov	d361894b9d	batchlog_manager: Speed up token_metadata endpoints counting a bit In this place we only need to know the number of endpoints, while current code additionally shuffles them before counting. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:45 +02:00
Pavel Emelyanov	6e06c88b4c	token_metadata: Remove unused helper There are two _identical_ methods in token_metadata class: get_all_endpoints_count() and number_of_endpoints(). The former one is used (called) the latter one is not used, so let's remove it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:43 +02:00
Pavel Emelyanov	2662d9c596	migration_manager: Remove run_may_throw() first argument It's unused in this function. Also this helps getting rid of global instances of components. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:42 +02:00
Pavel Emelyanov	703b16516a	storage_service: Remove unused helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:41 +02:00
Takuya ASADA	e0071b1756	reloc: don't archive dist/ami/files/.rpm on relocatable package We should skip archiving dist/ami/files/.rpm on the relocatable package, since it isn't used. The same goes for packer and variables.json. Fixes #5508 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191223121044.163861-1-syuu@scylladb.com>	2019-12-23 14:19:51 +02:00
Tomasz Grabiec	28dec80342	db/schema_tables: Add trace-level logging of schema digesting This greatly helps to narrow down the source of schema digest mismatch between nodes. The intended use is to enable this logger on disagreeing nodes, trigger schema digest recalculation, observe which mutations differ in digest, and then examine their content. Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>	2019-12-23 12:28:22 +02:00
Konstantin Osipov	1116700bc9	test.py: do not return 0 if there are failed tests Fix a return value regression introduced when switching to asyncio. Message-Id: <20191222134706.16616-2-kostja@scylladb.com>	2019-12-22 16:14:32 +02:00
Asias He	7322b749e0	repair: Do not return working_row_buf_nr in get combined row hash verb In commit `b463d7039c` (repair: Introduce get_combined_row_hash_response), working_row_buf_nr is returned in REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was scheduled to be part of the 3.1 release, but by accident it was not backported to 3.1. To keep repair compatible between 3.1 and 3.2, we need to drop working_row_buf_nr in the 3.2 release. Fixes: #5490 Backports: 3.2 Tests: Run repair in a mixed 3.1 and 3.2 cluster	2019-12-21 20:13:15 +02:00
Takuya ASADA	8eaecc5ed6	dist/common/scripts/scylla_setup: add swap existence check Show warnings when no swap is configured on the node. Closes #2511 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220080222.46607-1-syuu@scylladb.com>	2019-12-21 20:03:58 +02:00
Pavel Solodovnikov	5a15bed569	cql3: return `result_set` by cref in `cql3::result::result_set` Changes summary: * make `cql3::result_set` movable-only * change signature of `cql3::result::result_set` to return by cref * adjust available call sites to the aforementioned method to accept cref Motivation behind this change is elimination of dangerous API, which can easily set a trap for developers who don't expect that result_set would be returned by value. There is no point in copying the `result_set` around, so make `cql3::result::result_set` to cache `result_set` internally in a `unique_ptr` member variable and return a const reference so to minimize unnecessary copies here and there. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>	2019-12-21 16:56:42 +02:00
Takuya ASADA	3a6cb0ed8c	install.sh: drop limits.d from nonroot mode The file only required for root mode. Fixes #5507 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220101940.52596-1-syuu@scylladb.com>	2019-12-21 15:26:08 +02:00
Botond Dénes	08bb0bd6aa	mutation_fragment_stream_validator: wrap exceptions into own exception type So a higher level component using the validator to validate a stream can catch only validation errors, and let any other incidental exception through. This allows building data correctors on top of the `mutation_fragment_stream_validator`, by filtering a fragment stream through a validator, catching invalid fragment stream exceptions and dropping the respective fragments from the stream. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>	2019-12-20 12:05:00 +01:00
Rafael Ávila de Espíndola	91c7f5bf44	Print build-id on startup Fixes #5426 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191218031556.120089-1-espindola@scylladb.com>	2019-12-19 15:43:04 +02:00
Avi Kivity	440ad6abcc	Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers" This reverts commit `237ba74743`. While it works for the scylla executable, it fails for iotune, which is built by seastar. It should be reinstated after we pass the correct link parameters to the seastar build system.	2019-12-19 11:20:34 +02:00
Pekka Enberg	c0aea19419	Merge "Add a timeout for housekeeping for offline installs" from Amnon " This series solves an issue with scylla_setup and prevents it from waiting forever if housekeeping cannot look for the new Scylla version. Fixes #5302 It should be backported to versions that support offline installations. " * 'scylla_setup_timeout' of git://github.com/amnonh/scylla: scylla_setup: do not wait forever if no reply is return housekeeping scylla_util.py: Add optional timeout to out function	2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola	8d777b3ad5	relocatable: Use a super long path for the dynamic linker Having a long path allows patchelf to change the interpreter without changing the PT_LOAD headers and therefore without moving the build-id out of the first page. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-1-espindola@scylladb.com>	2019-12-18 19:10:59 +02:00
Pavel Solodovnikov	c451f6d82a	LWT: Fix required participants calculation for LOCAL_SERIAL CL Suppose we have a multi-dc setup (e.g. 9 nodes distributed across 3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]). When a query that uses LWT is executed with LOCAL_SERIAL consistency level, the `storage_proxy::get_paxos_participants` function incorrectly calculates the number of required participants to serve the query. In the example above it's calculated to be 5 (i.e. the number of nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL, which is equivalent to LOCAL_QUORUM cl in this case). This behavior results in an exception being thrown when executing the following query with LOCAL_SERIAL cl: INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'} Tests: unit(dev), dtest(consistency_test.py) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>	2019-12-18 16:58:32 +01:00
Botond Dénes	cd6bf3cb28	scylla-gdb.py: static_vector: update for changed storage The actual buffer is now in a member called 'data'. Leave the old `dummy.dummy` and `dummy` as fall-back. This seems to change every Fedora release. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>	2019-12-18 17:39:56 +02:00
Tomasz Grabiec	5865d08d6c	migration_manager: Recalculate schema only on shard 0 Schema is node-global, update_schema_version_and_announce() updates all shards. We don't need to recalculate it from every shard, so install the listeners only on shard 0. Reduces noise in the logs. Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>	2019-12-18 16:43:26 +02:00
Pavel Emelyanov	998f51579a	storage_service: Rip join_ring config option The option in question apparently does not work, several sharded objects are start()-ed (and thus instantiated) in join_token_ring, while instances themselves of these objects are used during init of other stuff. This leads to broken seastar local_is_initialized assertion on sys_dist_ks, but reading the code shows more examples, e.g. the auth_service is started on join, but is used for thrift and cql servers initialization. The suggestion is to remove the option instead of fixing. The is_joined logic is kept since on-start joining still can take some time and it's safer to report real status from the API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191203140717.14521-1-xemul@scylladb.com>	2019-12-18 12:45:13 +02:00
Nadav Har'El	8157f530f5	merge: CDC: handle schema changes Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund: Moves schema creation/alter/drop awareness to use new "before" callbacks from migration manager, and adds/modifies log and streams table as part of the base table modification. Makes schema changes semi-atomic per node. While this does not deal with updates coming in before a schema change has propagated cluster, it now falls into the same pit as when this happens without CDC. Added side effect is also that now schemas are transparent across all subsystems, not just cql. Patches: cdc_test: Add small test for altering base schema (add column) cdc: Handle schema changes via migration manager callbacks migration_manager: Invoke "before" callbacks for table operations migration_listener: Add empty base class and "before" callbacks for tables cql_test_env: Include cdc service in cql tests cdc: Add sharded service that does nothing. cdc: Move "options" to separate header to avoid too much header inclusion cdc: Remove some code from header	2019-12-17 23:04:36 +02:00
Avi Kivity	1157ee16a5	Update seastar submodule * seastar 00da4c8760...0525bbb08f (7): > future: Simplify future_state_base::any move constructor > future: don't create temporary tuple on future::get(). > future: don't instantiate new future on future::then_wrapped(). > future: clean-up the Result handling in then_wrapped(). > Merge "Fix core dumps when asan is enabled" from Rafael > future: Move ignore to the base class > future: Don't delete in ignore	2019-12-17 19:47:50 +02:00
Botond Dénes	638623b56b	configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and hence changes done to it will not be picked up by ninja. To trigger a rebuild and hence version changes to appear in the `scylla` target binary, one has to do `touch configure.py`. This is counterintuitive and frustrating to people who don't know about it and wonder why their changed version is not appearing as the output of `scylla --version`. This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`, making the `build.ninja` target out-of-date whenever `SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of `configure.py` when the next target is built, allowing a build of e.g. `scylla` to pick up any changes done to the version automatically. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>	2019-12-17 17:40:04 +02:00
Avi Kivity	7152ba0c70	Merge "tests: automatically search for unit tests" from Kostja " This patch set rearranges the test files so that it is now possible to search for tests automatically, and adds this functionality to test.py " * 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev: cmake: update CMakeLists.txt to scan test/ rather than tests/ test.py: automatically lookup all unit and boost tests tests: move all test source files to their new locations tests: move a few remaining headers tests: move another set of headers to the new test layout tests: move .hh files and resources to new locations tests: remove executable property from data_listeners_test.cc	2019-12-17 17:32:18 +02:00
Amnon Heiman	dd42f83013	scylla_setup: do not wait forever if no reply is return housekeeping When scylla is installed without a network connectivity, the test if a newer version is available can cause scylla_setup to wait forever. This patch adds a limit to the time scylla_setup will wait for a reply. When there is no reply, the relevant error will be shown that it was unable to check for newer version, but this will not block the setup script. Fixes #5302 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-17 14:56:47 +02:00
Nadav Har'El	aa1de5a171	merge: Synchronize snapshot and staging sstable deletion using sem Merged pull request https://github.com/scylladb/scylla/pull/5343 from Benny Halevy. Fixes #5340 Hold the sstable_deletion_sem table::move_sstables_from_subdirs to serialize access to the staging directory. It now synchronizes snapshot, compaction deletion of sstables, and view_update_generator moving of sstables from staging. Tests: unit (dev) [except test_user_function_timestamp_return that fails for me locally, but also on master] snapshot_test.py (dev)	2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz	7fdc8563bf	system_keyspace: Added infrastructure for table `system.clients' I used the following as a reference: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java At this moment there is only info about IP, clients outgoing port, client 'type' (i.e. CQL/thrift/alternator), shard ID and username. Column `request_count' is NOT present and CK consists of (`port', `client_type'), contrary to what C's has: (`port'). Code that notifies `system.clients` about new connections goes to top-level files `connection_notifier.`. Currently only CQL clients are observed, but enum `client_type` can be used in future to notify about connections with other protocols.	2019-12-17 11:31:28 +01:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so not to move them under the feet of table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0446ce712a	view_update_generator::start: use variable binding Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	5d7c80c148	view_update_generator::start: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	02784f46b9	view_update_generator: handle errors when processing sstable Consumer may throw, in this case, break from the loop and retry. move_sstable_from_staging_in_thread may theoretically throw too, ignore the error in this case since the sstable was already processed, individual move failures are already ignored and moving from staging will be retried upon restart. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	abda12107f	sstables: move_to_new_dir: add do_sync_dirs param To be used for "batch" move of several sstables from staging to the base directory, allowing the caller to sync the directories once when all are moved rather than for each one of them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	6efef84185	sstable: return future from move_to_new_dir distributed_loader::probe_file needlessly creates a seastar thread for it and the next patch will use it as part of a parallel_for_each loop to move a list of sstables (and sync the directories once at the end). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0d2a7111b2	view_update_generator: sstable_with_table: std::move constructor args Just a small optimization. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:19:55 +02:00
Nadav Har'El	fc85c49491	alternator: error on unsupported parallel scan We do not yet support the parallel Scan options (TotalSegments, Segment), as reported in issue #5059. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter. This is what this patch does. The patch also adds a full test, test_scan.py::test_scan_parallel, for the parallel scan feature. The test passes on DynamoDB, and still xfails on Alternator after this patch - but now the Scan request fails immediately reporting the unsupported option - instead of what the pre-patch code did: returning the wrong results and the test failing just when the results do not match the expectations. Refs #5059. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191217084917.26191-1-nyh@scylladb.com>	2019-12-17 11:27:56 +02:00
Avi Kivity	f7d69b0428	Revert "Merge "bouncing lwt request to an owning shard" from Gleb" This reverts commit `64cade15cc`, reversing changes made to `9f62a3538c`. This commit is suspected of corrupting the response stream. Fixes #5479.	2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola	237ba74743	relocatable: Check that patchelf didn't mangle the PT_LOAD headers Should avoid issue #4983 showing up again. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-2-espindola@scylladb.com>	2019-12-16 20:18:32 +02:00
Avi Kivity	3b7aca3406	Merge "db: Don't create a reference to nullptr" from Rafael " Only the first patch is needed to fix the undefined behavior, but the followup ones simplify the memory management around user types. " * 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla: db: Don't use lw_shared_ptr for user_types_metadata user_types_metadata: don't implement enable_lw_shared_from_this cql3: pass a const user_types_metadata& to prepare_internal db: drop special case for top level UDTs db: simplify db::cql_type_parser::parse db: Don't create a reference to nullptr Add test for loading a schema with a non native type	2019-12-16 17:10:58 +02:00
Konstantin Osipov	d6bc7cae67	cmake: update CMakeLists.txt to scan test/ rather than tests/ A follow up on directory rename.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	e079a04f2a	test.py: automatically lookup all unit and boost tests	2019-12-16 17:47:42 +03:00
Konstantin Osipov	1c8736f998	tests: move all test source files to their new locations 1. Move tests to test (using singular seems to be a convention in the rest of the code base) 2. Move boost tests to test/boost, other (non-boost) unit tests to test/unit, tests which are expected to be run manually to test/manual. Update configure.py and test.py with new paths to tests.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	2fca24e267	tests: move a few remaining headers Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to test/lib or test/boost and update dependent .cc files. Move tests/perf_sstable.hh to test/perf/perf_sstable.hh	2019-12-16 17:47:42 +03:00
Konstantin Osipov	b9bf1fbede	tests: move another set of headers to the new test layout Move another small subset of headers to test/ with the same goals: - preserve bisectability - make the revision history traceable after a move Update dependent files.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	8047d24c48	tests: move .hh files and resources to new locations The plan is to move the unstructured content of tests/ directory into the following directories of test/: test/lib - shared header and source files for unit tests test/boost - boost unit tests test/unit - non-boost unit tests test/manual - tests intended to be run manually test/resource - binary test resources and configuration files In order to not break git bisect and preserve the file history, first move most of the header files and resources. Update paths to these files in .cc files, which are not moved.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	644595e15f	tests: remove executable property from data_listeners_test.cc Executable flag must have been committed to git by mistake.	2019-12-16 17:47:41 +03:00
Benny Halevy	d2e00abe13	tests: commitlog_test: test_allocation_failure: improve error reporting We're seeing the following error from test from time to time: fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record This is not reproducible and the error string does not contain enough information to figure out what happened exactly, therefore this patch adds an exception if the call succeeded unexpectedly and also prints the unexpected exception if one was caught. Refs #4714 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>	2019-12-16 15:38:48 +01:00
Asias He	6b7344f6e5	streaming: Fix typo in stream_result_future::maybe_complete s/progess/progress/ Refs: #5437	2019-12-16 11:12:03 +02:00
Dejan Mircevski	f3883cd935	dbuild: Fix podman invocation (#5481) The is_podman check was depending on `docker -v` printing "podman" in the output, but that doesn't actually work, since podman prints $0. Use `docker --help` instead, which will output "podman". Also return podman's return status, which was previously being dropped. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-16 11:11:48 +02:00
Avi Kivity	00ae4af94c	Merge "Sanitize and speed-up (a bit) directories set up" from Pavel " On start there are two things that scylla does on data/commitlog/etc. dirs: locks and verifies permissions. Right now these two actions are managed by different approaches, it's convenient to merge them. Also the directories class introduced in this set lays the ground for better --workdir option handling. In particular, right now the db::config entries are modified after options parse to update directories with the workdir prefix. With the directories class at hand we will be able to stop doing this. " * 'br-directories-cleanup' of https://github.com/xemul/scylla: directories: Make internals work on fs::path directories: Cleanup adding dirs to the vector to work on directories: Drop seastar::async usage directories: Do touch_and_lock and verify sequentially directories: Do touch_and_lock in parallel directories: Move the whole stuff into own .cc file directories: Move all the dirs code into .init method file_lock: Work with fs::path, not sstring	2019-12-15 16:02:46 +02:00
Takuya ASADA	5e502ccea9	install.sh: setup workdir correctly on nonroot mode Specify correct workdir on nonroot mode, to set correct path of data / commitlog / hints directories at once. Fixes #5475 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191213012755.194145-1-syuu@scylladb.com>	2019-12-15 16:00:57 +02:00
Avi Kivity	c25d51a4ea	Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)" This reverts commit `4333b37f9e`. It breaks upgrades, and the user question is not informative enough for the user to make a correct decision. Fixes #5478. Fixes #5480.	2019-12-15 14:37:40 +02:00
Pavel Emelyanov	23a8d32920	directories: Make internals work on fs::path Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	373fcfdb3e	directories: Cleanup adding dirs to the vector to work on The unordered_set is turned into vector since for fs::path there's no hash() method that's needed for set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	14437da769	directories: Drop seastar::async usage Now the only future-able operation remaining is the call to parallel_for_each(), all the rest is non-blocking preparation, so we can drop the seastar::async and just return the future from parallel_for_each. The indentation is now good, as in the previous patch it was prepared just for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	06f4f3e6d8	directories: Do touch_and_lock and verify sequentially The goal is to drop the seastar::async() usage. Currently we have two places that return futures -- calls to parallel_for_each-s. We can either chain them together or, since both are working on the same set of directories, chain actions inside them. For code simplicity I propose to chain actions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	8d0c820aa1	directories: Do touch_and_lock in parallel The list of paths that should be touch-and-locked is already at hands, this shortens the code and makes it slightly faster (in theory). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	71a528d404	directories: Move the whole stuff into own .cc file In order not to pollute the root dir place the code in utils/ directory, "utils" namespace. While doing this -- move the touch_and_lock from the class declaration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Benny Halevy	9ec98324ed	messaging_service: unregister_handler: return rpc unregister_handler future Now that seastar returns it. Fixes https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>	2019-12-12 16:38:36 +02:00
Pavel Emelyanov	f2b3c17e66	directories: Move all the dirs code into .init method The seastar::async usage is temporary, added for bisect-safety, soon it will go away. For this reason the indentation in the .init method is not "canonical", but is prepared for one-patch drop of the seastar::async. The hinted_handoff_enabled arg is there, as it's not just a parameter on config, it had been parsed in main.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:33:11 +03:00
Pavel Emelyanov	82ef2a7730	file_lock: Work with fs::path, not sstring The main.cc code that converts sstring to fs::path will be patched soon, the file_desc::open belongs to seastar and works on sstrings. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:32:10 +03:00
Konstantin Osipov	bc482ee666	test.py: remove an unused option Message-Id: <20191204142622.89920-2-kostja@scylladb.com>	2019-12-12 15:53:35 +02:00
Avi Kivity	64cade15cc	Merge "bouncing lwt request to an owning shard" from Gleb " LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by the transport code that jumps to a correct shard and re-process incoming message there. " * 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev: lwt: take raw lock for entire cas duration lwt: drop invoke_on in paxos_state prepare and accept lwt: Process lwt request on a owning shard storage_service: move start_native_transport into a thread transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-12 15:50:22 +02:00
Nadav Har'El	9f62a3538c	alternator: fix BEGINS_WITH operator for blobs The implementation of Expected's BEGINS_WITH operator on blobs was incorrect, naively comparing the base64-encoded strings, which doesn't work. This patch fixes the code to compare the decoded strings. The reason why the BEGINS_WITH test missed this bug was that we forgot to check the blob case and only tested the string case; so this patch also adds the missing test, which reproduces this bug and verifies its fix. Fixes #5457 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191211115526.29862-1-nyh@scylladb.com>	2019-12-12 14:02:56 +01:00
Dejan Mircevski	27b8b6fe9d	cql3: Fix needs_filtering() for clustering columns The LIKE operator requires filtering, so needs_filtering() must check is_LIKE(). This already happens for partition columns, but it was overlooked for clustering columns in the initial implementation of LIKE. Fixes #5400. Tests: unit(dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-12 01:19:13 +02:00
Benny Halevy	d1bcb39e7f	hinted handoff: log message after removing hints directory (#5372) To be used by dtest as an indicator that endpoint's hints were drained and hints directory is removed. Refs #5354 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	f7c2c60b07	cql3: pass a const user_types_metadata& to prepare_internal We never modify the user_types_metadata via prepare_internal, so we can pass it a const reference. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	99cb8965be	db: drop special case for top level UDTs This was originally done in `7f64a6ec4b`, but that commit was reverted in `8517eecc28`. The revert was done because the original change would call parse_raw for non UDT types. Unlike the old patch, this one doesn't change the behavior of non UDT types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	7ae9955c5f	db: simplify db::cql_type_parser::parse The variant of db::cql_type_parser::parse that has a user_types_metadata argument was only used from the variant that didn't. This inlines one in the other. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	2092e1ef6f	db: Don't create a reference to nullptr The user_types variable can be null during db startup since we have to create types before reading the system table defining user types. This avoids undefined behavior, but is unlikely that it was causing more serious problems since the variable is only used when creating user types and we don't create any until after all system tables are read, in which case the user_types variable is not null. Fixes #5193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	6143941535	Add test for loading a schema with a non native type This would have found the error with the previous version of the patch series. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:43:34 -08:00
Gleb Natapov	64cfb9b1f6	lwt: take raw lock for entire cas duration It will prevent parallel update by the same coordinator and should reduce contention.	2019-12-11 14:41:31 +02:00
Gleb Natapov	898d2330a2	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests are now running on an owning shard there is no longer a need to invoke cross shard call.	2019-12-11 14:41:31 +02:00
Gleb Natapov	964c532c4f	lwt: Process lwt request on a owning shard LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by transport code that jumps to a correct shard and re-process incoming message there.	2019-12-11 14:41:31 +02:00
Gleb Natapov	54be057af3	storage_service: move start_native_transport into a thread The code runs only once and it is simple if it runs in a seastar thread.	2019-12-11 14:41:31 +02:00
Gleb Natapov	007ba3e38e	transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-11 14:41:31 +02:00
Nadav Har'El	9e5c6995a3	alternator-test: add tests for ReturnValues parameter This patch adds comprehensive tests for the ReturnValue parameter of the write operations (PutItem, UpdateItem, DeleteItem), which can return pre-write or post-write values of the modified item. The tests are in a new test file, alternator-test/test_returnvalues.py. This feature is not yet implemented in Alternator, so all the new tests xfail on Alternator (and all pass on AWS). Refs #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127163735.19499-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Nadav Har'El	ab69bfc111	alternator-test: add xfailing tests for ScanIndexForward This patch adds tests for Query's "ScanIndexForward" parameter, which can be used to return items in reversed sort order. We test that a Limit works and returns the given number of last items in the sort order, and also that such reverse queries can be resumed, i.e., paging works in the reverse order. These tests pass against AWS DynamoDB, but fail against Alternator (which doesn't support ScanIndexForward yet), so it is marked xfail. Refs #5153. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127114657.14953-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Pekka Enberg	6bc18ba713	storage_proxy: Remove reference to MBean interface The JMX interface is implemented by the scylla-jmx project, not scylla. Therefore, let's remove this historical reference to MBeans from storage_proxy. Message-Id: <20191211121652.22461-1-penberg@scylladb.com>	2019-12-11 14:24:28 +02:00
Avi Kivity	63474a3380	Merge "Add `experimental_features` option" from Dejan " Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser. Fixes #5338 " * 'vecexper' of https://github.com/dekimir/scylla: config: Add `experimental_features` option utils: Add enum_option	2019-12-11 14:23:08 +02:00
Avi Kivity	56b9bdc90f	Update seastar submodule * seastar e440e831c8...00da4c8760 (7): > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi Fixes #5443. > install-dependencies.sh: fix arch dependencies > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny > Merge "testing: improve the observability of abandoned failed futures" from Botond > rework the fair_queue tester > directory_test: Update to use run instead of run_deprecated > log: support fmt 6.0 branch with chrono.h for log	2019-12-11 14:17:49 +02:00
Benny Halevy	105c8ef5a9	messaging_service: wait on unregister_handler Prepare for returning future<> from seastar rpc unregister_handler. Refs https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>	2019-12-11 14:17:41 +02:00
Nadav Har'El	06c3802a1a	storage_proxy: avoid overflow in view-backlog delay calculation In the calculate_delay() code for view-backlog flow control, we calculate a delay and cap it at a "budget" - the remaining timeout. This timeout is measured in milliseconds, but the capping calculation converted it into microseconds, which overflowed if the timeout is very large. This causes some tests which enable the UB sanitizer to fail. We fix this problem by comparing the delay to the budget in millisecond resolution, not in microsecond resolution. Then, if the calculated delay is short enough, we return it using its full microsecond resolution. Fixes #5412 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191205131130.16793-1-nyh@scylladb.com>	2019-12-11 14:10:54 +02:00
Nadav Har'El	2824d8f6aa	Merge: alternator: Fix EQ operator for sets Merged pull request https://github.com/scylladb/scylla/pull/5453 from Piotr Sarna: Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 13:20:25 +02:00
Piotr Sarna	421db1dc9d	alternator-test: remove XFAIL from set EQ test With this series merged, test_update_expected_1_eq_set from test_expected.py suite starts passing.	2019-12-11 12:07:39 +01:00
Piotr Sarna	a8e45683cb	alternator: add EQ comparison for sets Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 12:07:39 +01:00
Piotr Sarna	fb37394995	schema_tables: notify table deletions before creations If a set of mutations contains both an entry that deletes a table and an entry that adds a table with the same name, it's expected to be a replacement operation (delete old + create new), rather than a useless "try to create a table even though it exists already and then immediately delete the original one" operation. As such, notifications about the deletions should be performed before notifications about the creations. The place that originally suffered from this wrong order is view building - which in this case created an incorrect duplicated entry in the view building bookkeeping, and then immediately deleted it, resulting in having old, deprecated entries with stale UUIDS lying in the build queue and never proceeding, because the underlying table is long gone. The issue is fixed by ensuring the order of notifications: - drops are announced first, view drops are announced before table drops; - creations follow, table creations are announced before views; - finally, changes to tables and views are announced; Fixes #4382 Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test	2019-12-11 12:48:29 +02:00
Benny Halevy	d544df6c3c	dist/ami/build_ami.sh: support incremental build of rpms (#5191 ) Iterate over an array holding all rpm names to see if any of them is missing from `dist/ami/files`. If they are missing, look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh was run manually before dist/ami/build_ami.sh we can just collect the built rpms from its output dir. If we're still missing any rpms, then run reloc/build_rpm.sh and copy the required rpms from build/redhat/RPMS/x86_64. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2019-12-11 12:48:29 +02:00
Amnon Heiman	f43285f39a	api: replace swagger definition to use long instead of int (#5380) In swagger 1.2 int is defined as int32. We originally used int following the jmx definition, but in practice we internally use uint and int64 in many places. While the API formats the type correctly, an external system that uses a swagger-based code generator can face a type issue. This patch replaces all uses of int in a return type with long, which is defined as int64. Changing the return type has no impact on the system, but it does help external systems that use a code generator from swagger. Fixes #5347 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-11 12:48:29 +02:00
Nadav Har'El	2abac32f2e	Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected Merged pull request https://github.com/scylladb/scylla/pull/5447 by Dejan Mircevski. Adds the last missing operators in the "Expected" parameter and re-enable their tests. Fixes #5034.	2019-12-11 12:48:29 +02:00
Cem Sancak	86b8036502	Fix DPDK mode in prepare script Fixes #5455.	2019-12-11 12:48:29 +02:00
Calle Wilund	35089da983	conf/config: Add better descriptive text on server/client encryption Provide some explanation on prio strings + direction to gnutls manual. Document client auth option. Remove confusing/misleading statement on "custom options" Message-Id: <20191210123714.12278-1-calle@scylladb.com>	2019-12-11 12:48:28 +02:00
Dejan Mircevski	32af150f1d	alternator: Implement NOT_CONTAINS operator in Expected Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of recognized operators, implement check_NOT_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	bd2bd3c7c8	alternator: Implement CONTAINS operator in Expected Enable existing CONTAINS test, implement check_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	5a56fd384c	config: Add `experimental_features` option When the user wants to turn on only some experimental features, they can use this new option. The existing `experimental` option is preserved for backwards compatibility. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 11:47:03 -05:00
Piotr Sarna	9504bbf5a4	alternator: move unwrap_set to serialization header The utility function for unwrapping a set is going to be useful across source files, so it's moved to serialization.hh/serialization.cc.	2019-12-10 15:08:47 +01:00
Piotr Sarna	4660e58088	alternator: move rjson value comparison to rjson.hh The comparison struct is going to be useful across source files, so it's moved into rjson header, where it conceptually belongs anyway.	2019-12-10 15:08:47 +01:00
Botond Dénes	db0e2d8f90	scylla-gdb.py: document and add safety net to seastar::thread related commands Almost all commands provided by `scylla-gdb.py` are safe to use. The worst that could happen if they fail is that you won't get the desired information. There is one notable exception: `scylla thread`. If anything goes wrong while this command is executed - gdb crashes, a bug in the command, etc. - there is a good chance the process under examination will crash. Sometimes this is fine, but other times e.g. when live debugging a production node, this is unacceptable. To avoid any accidents add documentation to all commands working with `seastar::thread`. And since most people don't read documentation, especially when debugging under pressure, add a safety net to the `scylla thread` command. When run, this command will now warn of the dangers and will ask for explicit acknowledgment of the risk of crash, by means of passing an `--iamsure` flag. When this flag is missing, it will refuse to run. I am sure this will be very annoying but I am also sure that the avoided crashes are worth it. As part of making `scylla thread` safe, its argument parsing code is migrated to `argparse`. This changes the usage but this should be fine because it is well documented. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>	2019-12-10 11:51:57 +02:00
Eliran Sinvani	765db5d14f	build_ami: Trim ami description attribute to the allowed size The ami description attribute is only allowed to be 255 characters long. When build_ami.sh generates an ami, it generates an ami description which is a concatenation of all of the components' version strings. It can happen that the description string is too long which eventually causes the ami build to fail. This patch trims the description string to 255 characters. It is ok since the individual versions of the components are also saved in tags attached to the image. Tests: 1. Reproduced with a long description and validated that it doesn't fail after the fix. Fixes #5435 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>	2019-12-10 11:51:57 +02:00
Fabiano Lucchese	4333b37f9e	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379 ) A Linux machine typically has multiple clocksources with distinct performances. Setting a high-performant clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing optimized Linux clocksource to Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in scylla_server configuration file. This parameter is read by perftune.py which, if set to "yes", proceeds to (non persistently) setting the clocksource. On x86, TSC clocksource is used. Fixes #4474	2019-12-10 11:51:57 +02:00
Pavel Emelyanov	3a21419fdb	features: Remove _FEATURE suffix from hinted_handoff feature name All the other features are named w/o one. The internal const-s are all different, but I'm fixing it separately. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191209154310.21649-1-xemul@scylladb.com>	2019-12-10 11:51:57 +02:00
Dejan Mircevski	a26bd9b847	utils: Add enum_option This allows us to accept command-line options with a predefined set of valid arguments. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-09 09:45:59 -05:00
Calle Wilund	7c5e4c527d	cdc_test: Add small test for altering base schema (add column)	2019-12-09 14:35:04 +00:00
Calle Wilund	cb0117eb44	cdc: Handle schema changes via migration manager callbacks This allows us to create/alter/drop log and desc tables "atomically" with the base, by including these mutations in the original mutation set, i.e. batch create/alter tables. Note that population does not happen until types are actually already put into database (duh), thus there _is_ still a gap between creating cdc and it being truly usable. This may or may not need handling later.	2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola	761b19cee5	build: Split the build and host linker flags A general build system knows about 3 machines: * build: where the building is running * host: where the built software will run * target: the machine the software will produce code for The target machine is only relevant for compilers, so we can ignore it. Until now we could ignore the build and host distinction too. This patch adds the first difference: don't use host ld_flags when linking build tools (gen_crc_combine_table). The reason for this change is to make it possible to build with -Wl,--dynamic-linker pointing to a path that will exist on the host machine, but may not exist on the build machine. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191207030408.987508-1-espindola@scylladb.com>	2019-12-09 15:54:57 +02:00
Calle Wilund	27183f648d	migration_manager: Invoke "before" callbacks for table operations Potentially allowing (cdc) augmentation of mutations. Note: only does the listener part in seastar::thread, to avoid changing call behaviour.	2019-12-09 12:12:09 +00:00
Calle Wilund	f78a3bf656	migration_listener: Add empty base class and "before" callbacks for tables Empty base type makes for less boiler plate in implementations. The "before" callbacks are for listeners who need to potentially react/augment type creation/alteration _before_ actually committing type to schema tables (and holding the semaphore for this). I.e. it is for cdc to add/modify log/desc tables "atomically" with base.	2019-12-09 12:12:09 +00:00
Calle Wilund	4e406105b1	cql_test_env: Include cdc service in cql tests	2019-12-09 12:12:09 +00:00
Calle Wilund	a21e140169	cdc: Add sharded service that does nothing. But can be used to hang functionality into eventually.	2019-12-09 12:12:09 +00:00
Calle Wilund	2787b0c4f8	cdc: Move "options" to separate header to avoid too much header inclusion cdc should not contaminate the whole universe.	2019-12-09 12:12:09 +00:00
fastio	8f326b28f4	Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc} Fixes: #5394 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-08 13:54:33 +02:00
Avi Kivity	9c63cd8da5	sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417) The vm.swappiness sysctl controls the kernel's preference for swapping anonymous memory vs page cache. Since Scylla uses very large amounts of anonymous memory, and tiny amounts of page cache, the correct setting is to prefer swapping page cache. If the kernel swaps anonymous memory the reactor will stall until the page fault is satisfied. On the other hand, page cache pages usually belong to other applications, usually backup processes that read Scylla files. This setting has been used in production in Scylla Cloud for a while with good results. Users can opt out by not installing the scylla-kernel-conf package (same as with the other kernel tunables).	2019-12-08 13:04:25 +02:00
Avi Kivity	0e319e0359	Update seastar submodule * seastar 166061da3...e440e831c (8): > Fail tests on ubsan errors > future: make a couple of asserts more strict > future: Move make_ready out of line > config: Do not allow zero rates Fixes #5360 > future: add new state to avoid temporaries in get_available_state(). > future: avoid temporary future_state on get_available_state(). > future: inline future::abandoned > noncopyable_function: Avoid uninitialized warning on empty types	2019-12-06 18:33:23 +02:00
Piotr Sarna	0718ff5133	Merge 'min/max on collections returns human-readable result' from Juliusz Previously, scylla used min/max(blob)->blob overload for collections, tuples and UDTs; effectively making the results being printed as blobs. This PR adds "dynamically"-typed min()/max() functions for compound types. These types can be complicated, like map<int,set<tuple<..., and created in runtime, so functions for them are created on-demand, similarly to tojson(). The comparison remains unchanged - underneath this is still byte-by-byte weak lex ordering. Fixes #5139 * jul-stas/5139-minmax-bad-printing-collections: cql_query_tests: Added tests for min/max/count on collections cql3: min()/max() for collections/tuples/UDTs do not cast to blobs	2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz	75955beb0b	cql_query_tests: Added tests for min/max/count on collections This tests new min/max function for collections and tuples. CFs in test suite were named according to types being tested, e.g. `cf_map<int,text>' what is not a valid CF name. Therefore, these names required "escaping" of invalid characters, here: simply replacing with '_'.	2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz	9efad36fb8	cql3: min()/max() for collections/tuples/UDTs do not cast to blobs Before: cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]); cqlsh> select max(val) from ks.list_types; system.max(val) ------------------------------------------------------------ 0x00000003000000040000000300000004000000040000000400000005 After: cqlsh> select max(val) from ks.list_types; system.max(val) -------------------- [3, 4, 5] This is accomplished similarly to `tojson()`/`fromjson()`: functions are generated on demand from within `cql3::functions::get()`. Because collections can have a variety of types, including UDTs and tuples, it would be impossible to statically define max(T t)->T for every T. Until now, max(blob)->blob overload was used. Because `impl_max/min_function_for` is templated with the input/output type, which can be defined in runtime, we need type-erased ("dynamic") versions of these functors. They work identically, i.e. they compare byte representations of lhs and rhs with `bytes::operator<`. Resolves #5139	2019-12-06 12:14:51 +01:00
Avi Kivity	a18a921308	docs: maintainer.md: use command line to merge multi-commit pull requests If you merge a pull request that contains multiple patches via the github interface, it will document itself as the committer. Work around this brain damage by using the command line.	2019-12-06 10:59:46 +01:00
Botond Dénes	7b37a700e1	configure.py: make tests explicitly depend on libseastar_testing.a So that changes to libseastar_testing.a make all test targets out of date. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>	2019-12-05 19:30:34 +02:00
Piotr Sarna	3a46b1bb2b	Merge "handle hints on separate connection and scheduling group" from Piotr Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of requests saturate the connection, negatively impacting the other one. Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION. Fixes #4974. Tests: unit(release)	2019-12-05 17:25:26 +01:00
Calle Wilund	c11874d851	gms::inet_address: Use special ostream formatting to match Java To make gms::inet_address::to_string() similar in output to origin. The sole purpose being quick and easy fix of API/JMX ipv6 formatting of endpoints etc, where strings are used as lexical comparisons instead of textual representation. A better, but more work, solution is to fix the scylla-jmx bridge to do explicit parse + re-format of addresses, but there are many such callpoints. An even better solution would be to fix nodetool to not make this mistake of doing lexical comparisons, but then we risk breaking merge compatibility. But could be an option for a separate nodeprobe impl. Message-Id: <20191204135319.1142-1-calle@scylladb.com>	2019-12-05 17:01:26 +02:00
Gleb Natapov	4893bc9139	tracing: split adding prepared query parameters from stopping of a trace Currently the query_options object is passed to a trace stopping function, which makes it mandatory to keep it alive until the end of the query. The reason for that is to add prepared statement parameters to the trace. All other query options that we want to put in the trace are copied into trace_state::params_values, so let's copy prepared statement parameters there too. The trace-enabled case will become a little bit more expensive, but on the other hand we can drop a continuation that holds the query_options object alive from a fast path. It is safe to drop the call to stop_foreground_prepared() here since the tracing will be stopped in process_request_one(). Message-Id: <20191205102026.GJ9084@scylladb.com>	2019-12-05 17:00:47 +02:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Nadav Har'El	5b2f35a21a	Merge "Redis: fix the options related to Redis API, fix the DEL and GET command" Merged pull request https://github.com/scylladb/scylla/pull/5381 by Peng Jian, fixing multiple small issues with Redis: * Rename the options related to Redis API, and describe them clearly. * Rename redis_transport_port to redis_port * Rename redis_transport_port_ssl to redis_ssl_port * Rename redis_default_database_count to redis_database_count * Remove unnecessary option enable_redis_protocol * Modify the default value of the options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM * Fix the DEL command: support deleting multiple keys in one command. * Fix the GET command: return the empty string when the required key does not exist. * Fix the redis-test/test_del_non_existent_key: mark xfail.	2019-12-05 11:58:34 +02:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Piotr Sarna	79c3a508f4	table: Reduce read amplification in view update generation This commit makes sure that single-partition readers for read-before-write do not have fast-forwarding enabled, as it may lead to huge read amplification. The observed case was: 1. Creating an index. CREATE INDEX index1 ON myks2.standard1 ("C1"); 2. Running cassandra-stress in order to generate view updates. cassandra-stress write no-warmup n=1000000 cl=ONE -schema \ 'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \ keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors skip-read-validation -node 127.0.0.1; Without disabling fast-forwarding, single-partition readers were turned into scanning readers in cache, which resulted in reading 36GB (sic!) on a workload which generates less than 1GB of view updates. After applying the fix, the number dropped down to less than 1GB, as expected. Refs #5409 Fixes #4615 Fixes #5418	2019-12-05 11:58:34 +02:00
Konstantin Osipov	6a5e7c0e22	tests: reduce the number of iterations of dynamic_bitset_test This test execution time dominates by a serious margin test execution time in dev/release mode: reducing its execution time improves the test.py turnaround by over 70%. Message-Id: <20191204135315.86374-2-kostja@scylladb.com>	2019-12-05 11:58:34 +02:00
Avi Kivity	07427c89a2	gdb: change 'scylla thread' command to access fs_base register directly Currently, 'scylla thread' uses arch_prctl() to extract the value of fsbase, used to reference thread local variables. gdb 8 added support for directly accessing the value as $fs_base, so use that instead. This works from core dumps as well as live processes, as you don't need to execute inferior functions. The patch is required for debugging threads in core dumps, but not sufficient, as we still need to set $rip and $rsp, and gdb still[1] doesn't allow this. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370	2019-12-05 11:58:34 +02:00
Piotr Dulikowski	adfa7d7b8d	messaging_service: don't move `unsigned` values in handlers Performing std::move on integral types is pointless. This commit gets rid of moves of values of `unsigned` type in rpc handlers.	2019-12-05 00:58:31 +01:00
Piotr Dulikowski	77d2ceaeba	storage_proxy: handle hints through separate rpc verb	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	2609065090	storage_proxy: move register_mutation handler to local lambda This refactor makes it possible to reuse the lambda in following commits.	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	6198ee2735	hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature The feature introduced by this commit declares that hints can be sent using the new dedicated RPC verb. Before using the new verb, nodes need to know if other nodes in the cluster will be able to handle the new RPC verb.	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	2e802ca650	hh: add HINT_MUTATION verb Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of requests saturate the connection, negatively impacting the other one.	2019-12-05 00:51:49 +01:00
Avi Kivity	fd951a36e3	Merge "Let compaction wait on background deletions" from Benny " In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done. However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted. This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish. Fixes #4909 Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction "	2019-12-04 11:18:41 +02:00
Takuya ASADA	c9d8606786	dist/common/scripts/scylla_ntp_setup: relax RHEL version check We may be able to use the chrony setup script on future versions of RHEL/CentOS, so it is better to run chrony setup when the RHEL version is >= 8, not only 8. Note that Fedora still provides the ntp/ntpdate package, so we run ntp setup on it for now. (same on debian variants) Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191203192812.5861-1-syuu@scylladb.com>	2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Avi Kivity	91d3f2afce	docs: maintainers.md: fix typo in git push --force-with-lease Just one lease, not many. Reported by Piotr Sarna.	2019-12-03 18:17:46 +01:00
Calle Wilund	56a5e0a251	commitlog_replayer: Ensure applied frozen_mutation is safe during apply Fixes #5211 In `79935df959` replay apply-call was changed from one with no continuation to one with. But the frozen mutation arg was still just lambda local. Change to use do_with for this case as well. Message-Id: <20191203162606.1664-1-calle@scylladb.com>	2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Amos Kong	e26b396f16	scylla-docker: fix default data_directories in scyllasetup.py (#5399 ) Use default data_file_directories if it's not assigned in scylla.yaml Fixes #5398 Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola	1cd17887fa	build: strip debug when configured with --debuginfo 0 In a build configured with --debuginfo 0 the scylla binary still ends up with some debug info from the libraries that are statically linked in. We should avoid compiling subprojects (including seastar) with debug info when none is needed, but this at least avoids it showing up in the binary. The main motivation for this is that it is confusing to get a binary with some debug info in it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191127215843.44992-1-espindola@scylladb.com>	2019-12-03 12:41:04 +02:00
Tomasz Grabiec	0a453e5d30	Merge "Use fragmented buffers for collection de/serialization" from Botond This series refactors the collection de/serialization code to use fragmented buffers, avoiding the large allocations and the associated pains when working with large collections. Currently all operations that involve collections require deserializing them, executing the operation, then serializing them again to their internal storage format. The de/serialization operations happen in linearized buffers, which means that we have to allocate a buffer large enough to hold the entire collection. This can cause immense pressure on the memory allocator, which, in the face of memory fragmentation, might be unable to serve the allocation at all. We've seen this causing all sorts of nasty problems, including but not limited to: failing compactions, failing memtable flush, OOM crash and etc. Users are strongly discouraged from using large collections, yet they are still a fact of life and have been haunting us since forever. The proper solution for these problems would be to come up with an in-memory format for collections, however that is a major effort, with a lot of unknowns. This is something we plan on doing at some point but until it happens we should make life less painful for those with large collections. The goal of this series is to avoid the need of allocating these large buffers. Serialization now happens into a `bytes_ostream` which automatically fragments the values internally. Deserialization happens with `utils::linearizing_input_stream` (introduced by this series), which linearizes only the individual collection cells, but not the entire collection. An important goal of this series was to introduce the least amount of risk, and hence the least amount of code. This series does not try to make a revolution and completely revamp and optimize the de/serialization codepaths. These codepaths have their days numbered so investing a lot of effort into them is in vain. We can apply incremental optimizations where we deem it necessary. Fixes: #5341	2019-12-03 10:31:34 +01:00
fastio	01599ffbae	Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the returning value for GET command. Support deleting multiple keys in one DEL command. Returning the number of keys actually deleted is still not supported. Return an empty string to the client for the GET command when the required key does not exist. Fixes: #5334 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:27:40 +08:00
fastio	039b83ad3b	Redis API: Rename options related to Redis API, describe them clearly, and remove unnecessary one. Rename option redis_transport_port to redis_port, which the redis transport listens on for clients. Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients. Rename option redis_database_count. Set the redis database count. Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace. Remove option enable_redis_protocol, which is unnecessary. Fixes: #5335 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:13:35 +08:00
Nadav Har'El	7b93360c8d	Merge: redis: skip processing request of EOF Merged pull request https://github.com/scylladb/scylla/pull/5393/ by Amos Kong: ` When I test the redis cmd by echo and nc, there is a redundant error in the end. I checked by strace, currently if client read nothing from stdin, it will shutdown the socket, redis server will read nothing (0 byte) from socket. But it tries to process the empty command and returns an error. $ echo -n -e '1\r\n$4\r\nping\r\n' \|strace nc localhost 6379 \| ... \| read(0, "1\r\n$4\r\nping\r\n", 8192) = 14 \| select(5, [4], [4], [], NULL) = 1 (out [4]) \|>>> sendto(4, "1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 \| select(5, [0 4], [], [], NULL) = 1 (in [0]) \| recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) \| read(0, "", 8192) = 0 \|>>> shutdown(4, SHUT_WR) = 0 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 \| write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG \| -ERR unknown command '' \| ) = 32 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 \| close(1) = 0 \| close(4) = 0 Current result: $ echo -n -e '' \|nc localhost 6379 -ERR unknown command '' $ echo -n -e '1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' \|nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG	2019-12-03 10:40:20 +02:00
Avi Kivity	83feb9ea77	tools: toolchain: update frozen image Commit `96009881d8` added diffutils to the dependencies via Seastar's install-dependencies.sh, after it was inadvertently dropped in `1164ff5329` (update to Fedora 31; diffutils is no longer brought in as a side effect of something else). Regenerate the image to include diffutils. Ref #5401.	2019-12-03 10:36:55 +02:00
Amos Kong	fb9af2a86b	redis-test: add test_raw_cmd.py This patch added subtests for EOF process, it reads and writes the socket directly by using protocol cmds. We can add more tests in future, tests with Redis module will hide some protocol error. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
Amos Kong	4fa862adf4	redis: skip processing request of EOF When I test the redis cmd by echo and nc, there is a redundant error in the end. I checked by strace, currently if client read nothing from stdin, it will shutdown the socket, redis server will read nothing (0 byte) from socket. But it tries to process the empty command and returns an error. $ echo -n -e '1\r\n$4\r\nping\r\n' \|strace nc localhost 6379 \| ... \| read(0, "1\r\n$4\r\nping\r\n", 8192) = 14 \| select(5, [4], [4], [], NULL) = 1 (out [4]) \|>>> sendto(4, "1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 \| select(5, [0 4], [], [], NULL) = 1 (in [0]) \| recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) \| read(0, "", 8192) = 0 \|>>> shutdown(4, SHUT_WR) = 0 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 \| write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG \| -ERR unknown command '' \| ) = 32 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 \| close(1) = 0 \| close(4) = 0 Current result: $ echo -n -e '' \|nc localhost 6379 -ERR unknown command '' $ echo -n -e '1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' \|nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
Rafael Ávila de Espíndola	bb114de023	dbuild: Fix confusion about relabeling podman needs to relabel directories in exactly the same cases docker does. The difference is that podman cannot relabel /tmp. The reason it was working before is that in practice anyone using dbuild has already relabeled any directories that need relabeling, with the exception of /tmp, since it is recreated on every boot. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-2-espindola@scylladb.com>	2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola	867cdbda28	dbuild: Use a temporary directory for /tmp With this we don't have to use --security-opt label=disable. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-1-espindola@scylladb.com>	2019-12-02 18:38:14 +02:00
Botond Dénes	1d1f8b0d82	tests: mutation_test: add large collection allocation test Checking that there are no large allocations when a large collection is de/serialized.	2019-12-02 17:13:53 +02:00
Avi Kivity	28355af134	docs: add maintainer's handbook (#5396 ) This is a list of recipes used by maintainers to maintain scylla.git.	2019-12-02 15:01:54 +02:00
Calle Wilund	8c6d6254cf	cdc: Remove some code from header	2019-12-02 13:00:19 +00:00
Botond Dénes	4c59487502	collection_mutation: don't linearize the buffer on deserialization Use `utils::linearizing_input_stream` for the deserialization of the collection. Allows for avoiding the linearization of the entire cell value, instead only linearizing individual values as they are deserialized from the buffer.	2019-12-02 10:10:31 +02:00
Botond Dénes	690e9d2b44	utils: introduce linearizing_input_stream `linearizing_input_stream` allows transparently reading linearized values from a fragmented buffer. This is done by linearizing on-the-fly only those read values that happen to be split across multiple fragments. This reduces the size of the largest allocation from the size of the entire buffer (when the entire buffer is linearized) to the size of the largest read value. This is a huge gain when the buffer contains loads of small objects, and a modest gain when the buffer contains few large objects. But even in the worst case the size of the largest allocation will be less than or equal to that of the case where the entire buffer is linearized. This stream is planned to be used as glue code between the fragmented cell value and the collection deserialization code which expects to be reading linearized values.	2019-12-02 10:10:31 +02:00
Botond Dénes	065d8d37eb	tests: random-utils: get_string(): add overload that takes engine parameter	2019-12-02 10:10:31 +02:00
Botond Dénes	2f9307c973	collection_mutation: use a fragmented buffer for serialization For the serialization `bytes_ostream` is used.	2019-12-02 10:10:31 +02:00
Botond Dénes	fc5b096f73	imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly Currently the loop which writes the data from the fragmented origin to the destination, moves to the next chunk eagerly after writing the value of the current chunk, if the current chunk is exhausted. This presents a problem when we are writing the last piece of data from the last chunk, as the chunk will be exhausted and we eagerly attempt to move to the next chunk, which doesn't exist and dereferencing it will fail. The solution is to not be eager about moving to the next chunk and only attempt it if we actually have more data to write and hence expect more chunks.	2019-12-02 10:10:31 +02:00
Botond Dénes	875314fc4b	bytes_ostream: make it a FragmentRange The presence of `const_iterator` seems to be a requirement as well although it is not part of the concept. But perhaps it is just an assumption made by code using it.	2019-12-02 10:10:31 +02:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespace this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Botond Dénes	07007edab9	bytes_ostream: add output_iterator To allow it being used for serialization code, which works in terms of output iterators.	2019-12-02 10:10:31 +02:00
Takuya ASADA	c5a95210fe	dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup Currently the interactive RAID setup prompt does not list virtio-blk devices for the following reasons: - We fail to match the '-p' option in the 'lsblk --help' output because of a misuse of the regex function, so list_block_devices() always skips using the lsblk output. - We don't check for the existence of /dev/vd* when we skip using lsblk. - We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output using the '-e' option, but we actually need them. To fix the problem we need to use re.search() instead of re.match() to match the '-p' option in 'lsblk --help', add '/dev/vd*' to the block device list, and stop passing the '-e 252' option to lsblk, which excludes virtio-blk. Additionally, it is better to parse the 'TYPE' field of the lsblk output: we should skip 'loop' devices and 'rom' devices since these are not disk devices. Fixes #4066 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191201160143.219456-1-syuu@scylladb.com>	2019-12-01 18:36:48 +02:00
Takuya ASADA	124da83103	dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8 We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped ntpd/ntpdate. Fixes #4571 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191101174032.29171-1-syuu@scylladb.com>	2019-12-01 18:35:03 +02:00
Nadav Har'El	b82417ba27	Merge "alternator: Implement Expected operators LE, GE, and BETWEEN" Merged pull request https://github.com/scylladb/scylla/pull/5392 from Dejan Mircevski. Refs #5034 The patches: alternator: Implement LE operator in Expected alternator: Implement GE operator in Expected alternator: Make cmp diagnostic a value, not funct utils: Add operator<< for big_decimal alternator: Implement BETWEEN operator in Expected	2019-12-01 16:11:11 +02:00
Nadav Har'El	8614c30bcf	Merge "implement echo command" Merged pull request https://github.com/scylladb/scylla/pull/5387 from Amos Kong: This patch implemented the echo command, which returns the string back to the client. Reference: https://redis.io/commands/echo	2019-12-01 10:29:57 +02:00
Amos Kong	49fee4120e	redis-test: add test_echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:32:00 +08:00
Amos Kong	3e2034f07b	redis: implement echo command This patch implemented the echo command, which returns the string back to the client. Reference: - https://redis.io/commands/echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:30:35 +08:00
Dejan Mircevski	dcb1b360ba	alternator: Implement BETWEEN operator in Expected Enable existing BETWEEN test, and add some more coverage to it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 16:47:21 -05:00
Dejan Mircevski	c43b286f35	utils: Add operator<< for big_decimal ... and remove an existing duplicate from lua.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:32:09 -05:00
Dejan Mircevski	e0d77739cc	alternator: Make cmp diagnostic a value, not funct All check_compare diagnostics are static strings, so there's no need to call functions to get them. Instead of a function, make diagnostic a simple value. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:09:05 -05:00
Dejan Mircevski	65cb84150a	alternator: Implement GE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 12:29:08 -05:00
Dejan Mircevski	f201f0eaee	alternator: Implement LE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 11:59:52 -05:00
Avi Kivity	96009881d8	Update seastar submodule * seastar 8eb6a67a4...166061da3 (3): > install-dependencies.sh: add diffutils > reactor: replace std::optional (in _network_stack_ready) with compat::optional > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base Ref #5386.	2019-11-29 12:50:48 +02:00
Tomasz Grabiec	6562c60c86	Merge "test.py: terminate children upon signal" from Kostja Allows a signal to terminate the outstanding test tasks, to avoid dangling children.	2019-11-29 12:05:03 +02:00
Pekka Enberg	bb227cf2b4	Merge "Fix default directories in Scylla setup scripts" from Amos "Fix two problems in scylla_io_setup: - Problem 1: paths of default directories are invalid, introduced by commit `5ec1915` ("scylla_io_setup: assume default directories under /var/lib/scylla"). - Problem 2: wrong path join, introduced by commit `31ddb21` ("dist/common/scripts: support nonroot mode on setup scripts"). Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py: - Fixed default scylla directories when they aren't assigned in scylla.yaml" Fixes #5370 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> * 'scylla_io_setup' of git://github.com/amoskong/scylla: use parse_scylla_dirs_with_default to get scylla directories scylla_io_setup: fix data_file_directories check scylla_util: introduce helper to process the default scylla directories scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml scylla_io_setup: fix path join of default scylla directories	2019-11-29 12:05:03 +02:00
Ultrabug	61f1e6e99c	test.py: fix undefined variable 'options' in write_xunit_report()	2019-11-28 19:06:22 +03:00
Ultrabug	5bdc0386c4	test.py: comparison to False should be 'if cond is False:'	2019-11-28 19:06:22 +03:00
Ultrabug	737b1cff5e	test.py: use isinstance() for type comparison	2019-11-28 19:06:22 +03:00
Konstantin Osipov	c611325381	test.py: terminate children upon signal Use asyncio as a more modern way to work with concurrency. Process signals in an event loop and terminate all outstanding tests before exiting. Breaking change: this commit requires Python 3.7 or newer to run this script. The patch adds a version check and a message to enforce it.	2019-11-28 19:06:22 +03:00
Botond Dénes	cf24f4fe30	imr: move documentation to docs/ Where all the other documentation is, and hence where people would be looking for it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>	2019-11-28 16:47:52 +02:00
Avi Kivity	36dd0140a8	Update seastar submodule * seastar 5c25de907a...8eb6a67a4b (1): > util/backtrace.hh: add missing print.hh include	2019-11-28 16:47:16 +02:00
Benny Halevy	7aef39e400	tracing: one_session_records: keep local tracing ptr Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr in one_session_records when constructed so it can be used during shutdown. Fixes #5243 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-28 15:24:10 +01:00
Gleb Natapov	75499896ab	client_state: store _user as optional instead of shared_ptr _user cannot outlive client_state class instance, so there is no point in holding it in shared_ptr. Tested: debug test.py and dtest auth_test.py Message-Id: <20191128131217.26294-5-gleb@scylladb.com>	2019-11-28 15:48:59 +02:00
Gleb Natapov	1538cea043	cql: modification_statement: store _restrictions as optional instead of shared_ptr _restrictions can be optional since its lifetime is managed by modification_statement class explicitly. Message-Id: <20191128131217.26294-4-gleb@scylladb.com>	2019-11-28 15:48:54 +02:00
Gleb Natapov	ce5d6d5eee	storage_service: store thrift server as an optional instead of shared_ptr Only do_stop_rpc_server uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-3-gleb@scylladb.com>	2019-11-28 15:48:51 +02:00
Gleb Natapov	b9b99431a8	storage_service: store cql server as an optional instead of shared_ptr Only do_stop_native_transport() uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-2-gleb@scylladb.com>	2019-11-28 15:48:47 +02:00
Avi Kivity	2b7e97514a	Update seastar submodule * seastar 6f0ef32514...5c25de907a (7): > shared_future: Fix crash when all returned futures time out Fixes #5322. > future: don't create temporaries on get_value(). > reactor: lower the default stall threshold to 200ms > reactor: Simplify network initialization > reactor: Replace most std::function with noncopyable_function > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode > inet_address: Make inet_address == operator ignore scope (again)	2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Nadav Har'El	fde336a882	Merged "5139 minmax bad printing" Merged pull request https://github.com/scylladb/scylla/pull/5311 from Juliusz Stasiewicz: This is a partial solution to #5139 (only for two types) because of the above and because collections are much harder to do. They are coming in a separate PR.	2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz	3b9ebca269	tests/cql_query_test: add test for aggregates on inet+time_type This is a test to max(), min() and count() system functions on the arguments of types: `net::inet_address` and `time_native_type`.	2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz	9c23d89531	cql3/functions: add missing min/max/count for inet and time type References #5139. Aggregate functions, like max(), when invoked on `inet_address' and `time_native_type' used to choose max(blob)->blob overload, with casting of argument and result to bytes. This is because appropriate calls to `aggregate_fcts::make_XXX_function()' were missing. This commit adds them. Functioning remains the same but now clients see user-friendly representations of aggregate result, not binary. Comparing inet addresses without inet::operator< is performed by trick, where ADL is bypassed by wrapping the name of std::min/max and providing an overload of wrapper on inet type.	2019-11-28 11:18:31 +01:00
Pavel Emelyanov	8532093c61	cql: The cql_server does not need proxy reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191127153842.4098-1-xemul@scylladb.com>	2019-11-28 10:58:46 +01:00
Amos Kong	e2eb754d03	use parse_scylla_dirs_with_default to get scylla directories Use default data_file_directories/commitlog_directory if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:48:14 +08:00
Amos Kong	bd265bda4f	scylla_io_setup: fix data_file_directories check Use default data_file_directories if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:47:56 +08:00
Amos Kong	123c791366	scylla_util: introduce helper to process the default scylla directories Currently we support assigning workdir in scylla.yaml, but we use many hardcoded '/var/lib/scylla' paths in setup scripts. Some setup scripts get scylla directories by parsing scylla.yaml, so introduce parse_scylla_dirs_with_default(), which adds default values if scylla directories aren't assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:54:32 +08:00
Amos Kong	b75061b4bc	scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:38:01 +08:00
Amos Kong	ada0e92b85	scylla_io_setup: fix path join of default scylla directories Currently we check invalid paths for some default scylla directories; the directories don't exist, so the tuning is always skipped. It is caused by two problems. Problem 1: paths of default directories are invalid Introduced by commit `5ec191536e`, we try to tune some scylla default directories if they exist. But the directory paths we try are wrong. For example: - What we check: /var/lib/scylla/commitlog_directory - Correct one: /var/lib/scylla/commitlog Problem 2: wrong path join Introduced by commit `31ddb2145a`, default_path might be replaced from '/var/lib/scylla/' to '/var/lib/scylla'. Our code tries to check an invalid path that is wrongly joined, e.g.: '/var/lib/scyllacommitlog' Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:37:58 +08:00
Amos Kong	d4a26f2ad0	scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358 ) The default values of data_file_directories and commitlog_directory were commented by commit `e0f40ed16a`. It causes scylla_util.py:get_scylla_dirs() to fail in checking the values. This patch changed get_scylla_dirs() to return default data/commitlog directories if they aren't set. Fixes #5358 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-27 13:52:05 +02:00
Nadav Har'El	cb1ed5eab2	alternator-test: test Query's Limit parameter Add a test, test_query.py::test_query_limit, to verify that the Limit parameter correctly limits the number of rows returned by the Query. This was supposed to already work correctly - but we never had a test for it. As we hoped, the test passes (on both Alternator and DynamoDB). Another test, test_query.py::test_query_limit_paging, verifies that paging can be done with any setting of Limit. We already had tests for paging of the Scan operation, but not for the Query operation. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:27:26 +01:00
Nadav Har'El	c01ca661a0	alternator-test: Select parameter of Query and Scan This is a comprehensive test for the "Select" parameter of Query and Scan operations, but only for the base-table case, not index, so another future patch should add similar tests in test_gsi.py and test_lsi.py as well. The main use of the Select parameter is to allow returning just the count of items, instead of their content, but it also has other esoteric options, all of which we test here. The test currently succeeds on AWS DynamoDB, demonstrating that the test is correct, but fails on Alternator because the "Select" parameter is not yet supported. So the test is marked xfail. Refs #5058 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:22:33 +01:00
Botond Dénes	9d09f57ba5	scylla-gdb.py: scylla_smp_queues: use lazy initialization Currently the command tries to read all seastar smp queues in its initialization code in the constructor. This constructor is run each time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and sometimes also annoying errors because the sourcing happens in the wrong context and seastar symbols are not available. Avoid this by running this initializing code lazily, on the first invocation. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>	2019-11-27 12:04:57 +01:00
Tomasz Grabiec	87b72dad3e	Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov This patchset adds missing "const" function qualifiers throughout the Scylla code base, which would make code less error-prone. The changeset incorporates Kostja's work regarding const qualifiers in the cql code hierarchy along with a follow-up patch addressing the review comment of the corresponding patch set (the patch subject is "cql: propagate const property through prepared statement tree.").	2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola	91b43f1f06	dbuild: fix podman with selinux enabled With this change I am able to run tests using docker-podman. The option also exists in docker. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194101.25221-1-espindola@scylladb.com>	2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola	480055d3b5	dbuild: Fix missing docker options With the recent changes docker was missing a few options. In particular, it was missing -u. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194347.25699-1-espindola@scylladb.com>	2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola	c0a2cd70ff	lua: fix test with boost 1.66 The boost 1.67 release notes says Changed maximum supported year from 10000 to 9999 to resolve various issues So change the test to use a larger number so that we get an exception with both boost 1.66 and boost 1.67. Fixes #5344 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126180327.93545-1-espindola@scylladb.com>	2019-11-26 21:17:15 +02:00
Pavel Solodovnikov	55a1d46133	cql: some more missing const qualifiers There are several virtual functions in public interfaces named "is_*" that clearly should be marked as "const", so fix that.	2019-11-26 17:57:51 +03:00
Pavel Solodovnikov	412f1f946a	cql: remove "mutable" on _opts in select_statement _opts initialization can be safely done in the constructor, hence no need to make it mutable.	2019-11-26 17:55:10 +03:00
Piotr Sarna	d90dbd6ab0	Merge "support podman as a replacement to docker" from Avi Docker on Fedora 31 is flakey, and is not supported at all on RHEL 8. Podman is a drop-in replacement for docker; this series adds support for using podman in dbuild. Apart from actually working on Fedora 31 hosts, podman is nicer in being more secure and not requiring a daemon. Fixes #5332	2019-11-26 15:17:49 +01:00
Tomasz Grabiec	5c9fe83615	Merge "Sanitize sub-modules shutting down" from Pavel As suggested in issue #4586 here is the helper that prints "shutting down foo" message, then shuts the foo down, then prints the "[it] was successful" one. In between it catches the exception (if any) and warns this in logs. By "then" I mean literally then, not the seastar's then() :) Fixes: #4586	2019-11-26 15:14:22 +02:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
Avi Kivity	6fbb724140	conf: remove unsupported options from scylla.yaml (#5299 ) These unsupported options do nothing except to confuse users who try to tune them. Options removed: hinted_handoff_throttle_in_kb max_hints_delivery_threads batchlog_replay_throttle_in_kb key_cache_size_in_mb key_cache_save_period key_cache_keys_to_save row_cache_size_in_mb row_cache_save_period row_cache_keys_to_save counter_cache_size_in_mb counter_cache_save_period counter_cache_keys_to_save memory_allocator saved_caches_directory concurrent_reads concurrent_writes concurrent_counter_writes file_cache_size_in_mb index_summary_capacity_in_mb index_summary_resize_interval_in_minutes trickle_fsync trickle_fsync_interval_in_kb internode_authenticator native_transport_max_threads native_transport_max_concurrent_connections native_transport_max_concurrent_connections_per_ip rpc_server_type rpc_min_threads rpc_max_threads rpc_send_buff_size_in_bytes rpc_recv_buff_size_in_bytes internode_send_buff_size_in_bytes internode_recv_buff_size_in_bytes thrift_framed_transport_size_in_mb concurrent_compactors compaction_throughput_mb_per_sec sstable_preemptive_open_interval_in_mb inter_dc_stream_throughput_outbound_megabits_per_sec cross_node_timeout streaming_socket_timeout_in_ms dynamic_snitch_update_interval_in_ms dynamic_snitch_reset_interval_in_ms dynamic_snitch_badness_threshold request_scheduler request_scheduler_options throttle_limit default_weight weights request_scheduler_id	2019-11-26 15:14:21 +02:00
Amos Kong	817f34d1a9	ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330 ) Currently scylla_io_setup will skip in scylla_setup, because we didn't support those new instance types. I manually executed scylla_io_setup, and the scylla-server started and worked well. Let's apply this patch first, then check if there is some new problem in ami-test. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-26 15:14:21 +02:00
Konstantin Osipov	90346236ac	cql: propagate const property through prepared statement tree. cql_statement is a class representing a prepared statement in Scylla. It is used concurrently during execution, so it is important that its state is not changed by execution. Add const qualifier to the execution methods family, throughout the cql hierarchy. Mark a few places which do mutate prepared statement state during execution as mutable. While these are not affecting production today, as code ages, they may become a source of latent bugs and should be moved out of the prepared state or evaluated at prepare eventually: cf_property_defs::_compaction_strategy_class list_permissions_statement::_resource permission_altering_statement::_resource property_definitions::_properties select_statement::_opts	2019-11-26 14:18:17 +03:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Pavel Emelyanov	50a1ededde	main: Remove now unused defer-with-log helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	a0f92d40ee	main: Shut down sighup handler with verbose helper And (!) fix the misprinted variable name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	0719369d83	repair: Remove extra logging on shutdown The shutdown start/finish messages are already printed in verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	2d64fc3a3e	main: Shut down database with verbose_shutdown helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	636c300db5	main: Shut down prometheus with verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> --- v2: - Have stop earlier so that exceptions in start/listen do not prevent prometheus.stop from being called	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	804b152527	main: Sanitize shutting down callbacks As suggested in issue #4586 here is the helper that prints "shutting down foo" message, then shuts the foo down, then prints the "shutting down foo was successful". In between it catches the exception (if any) and warns this in logs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:45:49 +03:00
Nadav Har'El	4160b3630d	Merge "Return preimage from CDC only when it's enabled" Merged pull request https://github.com/scylladb/scylla/pull/5218 from Piotr Jastrzębski: Users should be able to decide whether they need preimage or not. There is already an option for that but it's not respected by the implementation. This PR adds support for this functionality. Tests: unit(dev). Individual patches: cdc: Don't take storage_proxy as transformer::pre_image_select param cdc::append_log_mutations: use do_with instead of shared_ptr cdc::append_log_mutations: fix undefined behavior cdc: enable preimage in test_pre_image_logging test cdc: Return preimage only when it's requested cdc: test both enabled and disabled preimage in test_pre_image_logging	2019-11-25 14:32:17 +02:00
Pavel Emelyanov	f6ac969f1e	mm: Stop migration manager Before stopping the db itself, stop the migration service. It must be stopped before RPC, but RPC is not stopped yet itself, so we should be safe here. Here's the tail of the resulting logs: INFO 2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager INFO 2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished INFO 2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server INFO 2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete. Also -- stop the mm on drain before the commitlog is stopped. [Tomasz: mm needs the cl because pulling schema changes from other nodes involves applying them into the database. So cl/db needs to be stopped after mm is stopped.] The drain logs would look like ... INFO 2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED: and then on stop ... INFO 2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished INFO 2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server INFO 2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete. Fixes #5300 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191125080605.7661-1-xemul@scylladb.com>	2019-11-25 12:59:01 +01:00
Asias He	6ec602ff2c	repair: Fix rx_hashes_nr metrics (#5213 ) In get_full_row_hashes_with_rpc_stream and repair_get_row_diff_with_rpc_stream_process_op which were introduced in the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not updated correctly. In the test we have 3 nodes and run repair on node3; we make sure the following metrics are correct. assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'], node3_metrics['scylla_repair_rx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'], node3_metrics['scylla_repair_tx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'], node3_metrics['scylla_repair_rx_row_nr']) assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'], node3_metrics['scylla_repair_tx_row_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'], node3_metrics['scylla_repair_rx_row_bytes']) assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'], node3_metrics['scylla_repair_tx_row_bytes']) Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test Fixes: #5339 Backports: 3.2	2019-11-25 13:57:37 +02:00
Piotr Jastrzebski	2999cb5576	cdc: test both enabled and disabled preimage in test_pre_image_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	222b94c707	cdc: Return preimage only when it's requested Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	c94a5947b7	cdc: enable preimage in test_pre_image_logging test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	595c9f9d32	cdc::append_log_mutations: fix undefined behavior The code was iterating over a collection that was modified at the same time. Iterators were used for that and collection modification can invalidate all iterators. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	f0f44f9c51	cdc::append_log_mutations: use do_with instead of shared_ptr This will not only save some allocations but also improve code readability. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	b8d9158c21	cdc: Don't take storage_proxy as transformer::pre_image_select param transformer has access to storage_proxy through its _ctx field. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Nadav Har'El	3eab6cd549	Merged "toolchain: update to Fedora 31" Merged pull request https://github.com/scylladb/scylla/pull/5310 from Avi Kivity: This is a minor update as gcc and boost versions did not change. A notable update is patchelf 0.10, which adds support for large binaries. A few minor issues exposed by the update are fixed in preparatory patches. Patches: dist: rpm: correct systemd post-uninstall scriptlet build: force xz compression on rpm binary payload tools: toolchain: update to Fedora 31	2019-11-24 13:38:45 +02:00
Tomasz Grabiec	e3d025d014	row_cache: Fix abort on bad_alloc during cache update Since `90d6c0b`, cache will abort when trying to detach partition entries while they're updated. This should never happen. It can happen though, when the update fails on bad_alloc, because the cleanup guard invalidates the cache before it releases partition snapshots (held by "update" coroutine). Fix by destroying the coroutine first. Fixes #5327. Tests: - row_cache_test (dev) Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>	2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola	8599f8205b	rpmbuild: don't use dwz By default rpm uses dwz to merge the debug info from various binaries. Unfortunately, it looks like addr2line has not been updated to handle this: // This works $ addr2line -e build/release/scylla 0x1234567 $ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug // now this fails $ addr2line -e build/release/scylla 0x1234567 I think the issue is https://sourceware.org/bugzilla/show_bug.cgi?id=23652 Fixes #5289 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123015734.89331-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	25d5d39b3c	reloc: Force using sha1 for build-ids The default build-id used by lld is xxhash, which is 8 bytes long. rpm requires build-ids to be at least 16 bytes long (https://github.com/rpm-software-management/rpm/issues/950). We force using sha1 for now. That has no impact in gold and bfd since that is their default. We set it in here instead of configure.py to not slow down regular builds. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123020801.89750-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	b5667b9c31	build: don't compress debug info in executables By default we were compressing debug info only in release executables. The idea, if I understand it correctly, is that those are the ones we ship, so we want a more compact binary. I don't think that was doing anything useful. The compression is just gzip, so when we ship a .tar.xz, having the debug info compressed inside the scylla binary probably reduces the overall compression a bit. When building an rpm the situation is amusing. As part of the rpm build process the debug info is decompressed and extracted to an external file. Given that most of the link time goes to compressing debug info, it is probably a good idea to just skip that. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123022825.102837-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	d84859475e	Merge "Refactor test.py and cleanup resources" from Kostja Structure the code to be able to introduce futures. Apply trivial cleanups. Switch to asyncio and use it to work with processes and handle signals. Cleanup all processes upon signal.	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	e166fdfa26	Merge "Optimize LWT query phase" from Vladimir Davydov This patch implements a simple optimization for LWT: it makes PAXOS prepare phase query locally and return the current value of the modified key so that a separate query is not necessary. For more details see patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial preparatory refactoring.	2019-11-24 11:35:29 +02:00
Pavel Solodovnikov	4879db70a6	system_keyspace: support timeouts in queries to `system.paxos` table. Also introduce supplementary `execute_cql_with_timeout` function. Remove redundant comment for `execute_cql`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	bf5f864d80	paxos: piggyback result query on prepare response Current LWT implementation uses at least three network round trips: - first, execute PAXOS prepare phase - second, query the current value of the updated key - third, propose the change to participating replicas (there's also learn phase, but we don't wait for it to complete). The idea behind the optimization implemented by this patch is simple: piggyback the current value of the updated key on the prepare response to eliminate one round trip. To generate less network traffic, only the replica closest to the coordinator sends data while other participating replicas send digests which are used to check data consistency. Note, this patch changes the API of some RPC calls used by PAXOS, but this should be okay as long as the feature is in the early development stage and marked experimental. To assess the impact of this optimization on LWT performance, I ran a simple benchmark that starts a number of concurrent clients each of which updates its own key (uncontended case) stored in a cluster of three AWS i3.2xlarge nodes located in the same region (us-west-1) and measures the aggregate bandwidth and latency. The test uses shard-aware gocql driver. Here are the results: latency 99% (ms) bandwidth (rq/s) timeouts (rq/s) clients before after before after before after 1 2 2 626 637 0 0 5 4 3 2616 2843 0 0 10 3 3 4493 4767 0 0 50 7 7 10567 10833 0 0 100 15 15 12265 12934 0 0 200 48 30 13593 14317 0 0 400 185 60 14796 15549 0 0 600 290 94 14416 15669 0 0 800 568 118 14077 15820 2 0 1000 710 118 13088 15830 9 0 2000 1388 232 13342 15658 85 0 3000 1110 363 13282 15422 233 0 4000 1735 454 13387 15385 329 0 That is, this optimization improves max LWT bandwidth by about 15% and allows running 3-4x more clients while maintaining the same level of system responsiveness.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	6160b9017d	commitlog: make sure a file is closed If allocate or truncate throws, we have to close the file. Fixes #4877 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191114174810.49004-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that the captured objects won't be destroyed until the future returned by the invoked function is resolved, so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	cfb079b2c9	types: Refactor duplicated value_cast implementation The two implementations of value_cast were almost identical. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-3-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	ef2e96c47c	storage_proxy: factor out helper to sort endpoints by proximity We need it for PAXOS.	2019-11-24 11:35:29 +02:00
Nadav Har'El	854e6c8d7b	alternator-test: test_health_only_works_for_root_path: remove wrong check The test_health_only_works_for_root_path test checks that while Alternator's HTTP server responds to a "GET /" request with success ("health check"), it should respond to different URLs with failures (page not found). One of the URLs it tested was "/..", but unfortunately some versions of Python's HTTP client canonicalize this request to just a "/", causing the request to unexpectedly succeed - and the test to fail. So this patch just drops the "/.." check. A few other nonsense URLs are attempted by the test - e.g., "/abc". Fixes #5321 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	63d4590336	storage_proxy: move digest_algorithm upper We need it for PAXOS. Mark it as static inline while we are at it.	2019-11-24 11:35:29 +02:00
Nadav Har'El	43d3e8adaf	alternator: make DescribeTable return table schema One of the fields still missing in DescribeTable's response (Refs #5026) was the table's schema - KeySchema and AttributeDefinitions. This patch adds this missing feature, and enables the previously-xfailing test test_describe_table_schema. A complication of this patch is that in a table with secondary indexes, we need to return not just the base table's schema, but also the indexes' schema. The existing tests did not cover that feature, so we add here two more tests in test_gsi.py for that. One of these secondary-index schema tests, test_gsi_2_describe_table_schema, still fails, because it outputs a range-key which Scylla added to a view because of its own implementation needs, but wasn't in the user's definition of the GSI. I opened a separate issue #5320 for that. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	f5c2a23118	serializer: add reference_wrapper handling Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or std::tuple<> as T. We need it to avoid copying query::result while serializing paxos::promise.	2019-11-24 11:35:29 +02:00
Botond Dénes	89f9b89a89	scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0 Currently even if `-a` or `-s 0` is provided, `scylla task_histogram` will scan a limited number of pages due to a bug in the scan loop's stop condition, which triggers a stop once the default sample limit is reached. Fix the loop by skipping this check when the user wants to scan all tasks. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	1452653fbc	query_context: fix use after free of timeout_config in execute_cql_with_timeout timeout_config is used by reference by cql3::query_processor::process(), see cql3::query_options, so the caller must make sure it doesn't go away.	2019-11-24 11:35:29 +02:00
Avi Kivity	ff7e78330c	tools: toolchain: dbuild: work around "podman logs --follow" hang At least some versions of 'podman logs --follow' hang when the container eventually exits (also happens with docker on recent versions). Fortunately, we don't need to use 'podman logs --follow' and can use the more natural non-detached 'podman run', because podman does not proxy SIGTERM and instead shuts down the container when it receives it. So, to work around the problem, use the same code path in interactive and non-interactive runs, when podman is in use instead of docker.	2019-11-22 13:59:05 +02:00
Avi Kivity	702834d0e4	tools: dbuild: avoid uid/gid/selinux hacks when using podman With docker, we went to considerable lengths to ensure that access to mounted volume was done using the calling user, including supplementary groups. This avoids root-owned files being left around after a build, and ensures that access to group-shared files (like /var/cache/ccache) works as expected. All of this is unnecessary and broken when using podman. Podman uses a proxy to access files on behalf of the container, so naturally all access is done using the calling user's identity. Since it remaps user and group IDs, assigning the host uid/gid is meaningless. Using --userns host also breaks, because sudo no longer works. Fix this by making all the uid/gid/selinux games specific to docker and ignore them when using podman. To preserve the functionality of tools that depend on $HOME, set that according to the host setting.	2019-11-22 13:58:29 +02:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, because otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00
Tomasz Grabiec	fb28543116	lsa: Introduce operator bool() to occupancy_stats	2019-11-22 12:08:28 +01:00
Tomasz Grabiec	a69fda819c	lsa: Expose region_impl::evictable_occupancy in the region class	2019-11-22 12:08:10 +01:00
Avi Kivity	1c181c1b85	tools: dbuild: don't mount duplicate volumes podman refuses to start with duplicate volumes, which routinely happen if the toplevel directory is the working directory. Detect this and avoid the duplicate.	2019-11-22 10:13:30 +02:00
Konstantin Osipov	b8b5834cf1	test.py: simplify message output in run_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	90a8f79d7e	test.py: use UnitTest class where possible	2019-11-21 23:16:22 +03:00
Konstantin Osipov	8cd8cfc307	test.py: rename harness command line arguments to 'options' The UnitTest class juggles the name 'args' quite a bit to construct the command line for a unit test, so let's keep the harness command line arguments and the unit test command line arguments apart by consistently calling the harness command line arguments 'options', and unit test command line arguments 'args'. Rename usage() to parse_cmd_line().	2019-11-21 23:16:22 +03:00
Konstantin Osipov	e5d624d055	test.py: consolidate argument handling in UnitTest constructor Create unique UnitTest objects in find_tests() for each found match, including repeat, to ensure each test has its own unique id. This will also be used to store execution state in the test.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	dd60673cef	test.py: move --collectd to standard args	2019-11-21 23:16:22 +03:00
Konstantin Osipov	fe12f73d7f	test.py: introduce class UnitTest	2019-11-21 23:16:22 +03:00
Konstantin Osipov	bbcdee37f7	test.py: add add_test_list() to find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	4723afa09c	test.py: add long tests with add_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	13f1e2abc6	test.py: store the non-default seastar arguments along with definition	2019-11-21 23:16:22 +03:00
Konstantin Osipov	72ef11eb79	test.py: introduce add_test() to find_tests() To avoid code duplication, and to build upon later.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b50b24a8a7	test.py: avoid an unnecessary loop in find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a5103d0092	test.py: move args.repeat processing to find_tests() It somewhat stands in the way of using asyncio. This patch also implements a more comprehensive fix for #5303, since we not only have --repeat, but run some tests in different configurations, in which case xml output is also overwritten.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0f0a49b811	test.py: introduce print_summary() and write_xunit_report() (One more moving of the code around).	2019-11-21 23:16:22 +03:00
Konstantin Osipov	22166771ef	test.py: rename test_to_run tests_to_run	2019-11-21 23:16:22 +03:00
Konstantin Osipov	1d94d9827e	test.py: introduce run_all_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	29087e1349	test.py: move out run_test() routine (Trivial code refactoring.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	79506fc5ab	test.py: introduce find_tests() Trivial code refactoring.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a44a1c4124	test.py: remove print_status_succint (Trivial code cleanup.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b9605c1d37	test.py: move mode list evaluation to usage()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0c4df5a548	test.py: add usage()	2019-11-21 23:16:22 +03:00
Pavel Emelyanov	e0f40ed16a	cli: Add the --workdir\|-W option When starting scylla daemon as non-root the initialization fails because standard /var/lib/scylla is not accessible by regular users. Making the default dir accessible for user is not very convenient either, as it will cause conflicts if two or more instances of scylla are in use. This problem can be resolved by specifying --commitlog-directory, --data-file-directories, etc on start, but it's too much typing. I propose to revive Nadav's --home option that allows to move all the directories under the same prefix in one go. Unlike Nadav's approach the --workdir option doesn't do any tricky manipulations with existing directories. Instead, as Pekka suggested, the individual directories are placed under the workdir if and only if the respective option is NOT provided. Otherwise the directory configuration is taken as is regardless of whether it's an absolute or relative path. The value substitution is done early on start. Avi suggested that this is unsafe wrt HUP config re-read and proper paths must be resolved on the fly, but this patch doesn't address that yet, here's why. First of all, the respective options are MustRestart now and the substitution is done before HUP handler is installed. Next, commitlog and data_file values are copied on start, so marking the options as LiveUpdate won't have any effect. Finally, the existing named_value::operator() returns a reference, so returning a calculated (and thus temporary) value is not possible (from my current understanding, correct me if I'm wrong). Thus if we want the _directory() to return calculated value all callers of them must be patched to call something different (e.g. _directory.get() ?) which will lead to more confusion and errors. 
Changes v3: - the option is --workdir back again - the existing *directory are only affected if unset - default config doesn't have any of these set - added the short -W alias Changes v2: - the option is --home now - all other paths are changed to be relative Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191119130059.18066-1-xemul@scylladb.com>	2019-11-21 15:07:39 +02:00
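The directory rule described in e0f40ed16a (a directory goes under the workdir only when its own option is not provided) can be sketched as below. This is a minimal Python illustration, not Scylla's C++ code: the function `resolve_directories`, the option keys, the subdirectory names, and the single-path treatment of the data directory are all assumptions made for the example.

```python
import posixpath
from typing import Dict, Optional

# Hypothetical subdirectory names used when an option is not set explicitly.
DEFAULT_SUBDIRS = {
    "commitlog_directory": "commitlog",
    "data_file_directories": "data",
}


def resolve_directories(workdir: str,
                        explicit: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Place each directory under workdir unless its option was provided."""
    resolved = {}
    for option, subdir in DEFAULT_SUBDIRS.items():
        value = explicit.get(option)
        if value is not None:
            # Explicit values are taken as is, absolute or relative.
            resolved[option] = value
        else:
            resolved[option] = posixpath.join(workdir, subdir)
    return resolved
```

Because the substitution happens once, up front, the sketch mirrors the commit's note that values are resolved early on start rather than on every config re-read.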
Rafael Ávila de Espíndola	5417c5356b	types: Move get_castas_fctn to cql3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-9-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	f06d6df4df	types: Simplify casts to string These now just use the to_string member functions, which makes it possible to move the code to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-8-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	786b1ec364	types: Move json code to its own file Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-7-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	af8e207491	types: Avoid using deserialize_value in json code This makes it independent of internal functions and makes it possible to move it to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-6-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	ed65e2c848	types: Move cql3_kind to the cql3 directory Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-5-espindola@scylladb.com>	2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola	bd560e5520	types: Fix dynamic types of some data_value objects I found these mismatched types while converting some member functions to standalone functions, since they have to use the public API that has more type checks. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-4-espindola@scylladb.com>	2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola	0d953d8a35	types: Add a test for value_cast We had no tests on when value_cast throws or when it moves the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-2-espindola@scylladb.com>	2019-11-21 12:08:45 +02:00
Konstantin Osipov	002ff51053	lua: make sure the latest master builds on Debian/Ubuntu Use pkg-config to search for Lua dependencies rather than hard-code include and link paths. Avoid using boost internals, not present in earlier versions of boost. Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com> Message-Id: <20191120170005.49649-1-kostja@scylladb.com>	2019-11-21 07:57:12 +02:00
Pavel Solodovnikov	d910899d61	configure.py: support multi-threaded linking via `gold` Use `-Wl,--threads` flag to enable multi-threaded linking when using `ld.gold` linker. Additional compilation test is required because it depends on whether or not the `gold` linker has been compiled with `--enable-threads` option. This patch introduces a substantial improvement to the link times of `scylla` binary in release and debug modes (around 30 percent). Local setup reports the following numbers with release build for linking only build/release/scylla: Single-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30 Multi-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>	2019-11-20 19:28:00 +02:00
Nadav Har'El	89d6d668cb	Merge "Redis API in Scylla" Merged patch series from Peng Jian, adding optionally-enabled Redis API support to Scylla. This feature is experimental, and partial - the extent of this support is detailed in docs/redis/redis.md. Patches: Document: add docs/redis/redis.md redis: Redis API in Scylla Redis API: graft redis module to Scylla redis-test: add test cases for Redis API	2019-11-20 16:59:13 +02:00
Piotr Sarna	086e744f8f	scripts/find-maintainer: refresh maintainers list This commit attempts to bring the maintainers list up to date to the best of my knowledge, because it got really stale over time. Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>	2019-11-20 16:56:31 +02:00
Glauber Costa	73aff1fc95	api: export system uptime via REST This will be useful for tools like nodetool that want to query the uptime of the system. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190619110850.14206-1-glauber@scylladb.com>	2019-11-20 16:44:11 +02:00
Tomasz Grabiec	9a686ac551	Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny Fixes #5277	2019-11-19 23:49:39 +01:00
Avi Kivity	1164ff5329	tools: toolchain: update to Fedora 31 This is a minor update as gcc and boost versions do not change. glibc-langpack-en no longer gets pulled in by default. As it is required by some locale use somewhere, it is added to the explicit dependencies.	2019-11-20 00:08:30 +02:00
Avi Kivity	301c835cbf	build: force xz compression on rpm binary payload Fedora 31 switched the default compression to zstd, which isn't readable by some older rpm distributions (CentOS 7 in particular). Tell it to use the older xz compression instead, so packages produced on Fedora 31 can be installed on older distributions.	2019-11-20 00:08:24 +02:00
Avi Kivity	3ebd68ef8a	dist: rpm: correct systemd post-uninstall scriptlet The post-uninstall scriptlet requires a parameter, but older versions of rpm survived without it. Fedora 31's rpm is more strict, so supply this parameter.	2019-11-20 00:03:49 +02:00
Peng Jian	e6adddd8ef	redis-test: add test cases for Redis API Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:56:16 +08:00
Peng Jian	f2801feb66	Redis API: graft redis module to Scylla In this document, the detailed design and implementation of Redis API in Scylla is provided. v2: build: work around ragel 7 generated code bug (suggested by Avi) Ragel 7 incorrectly emits some unused variables that don't compile. As a workaround, sed them away. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:55:58 +08:00
Peng Jian	0737d9e84d	redis: Redis API in Scylla Scylla has advantages and amazing features. If Redis is built on top of Scylla, it gets those features automatically. Scylla has achieved great progress in cluster master management, data persistence, failover and replication. The benefits to users are ease of use and development in their production environment, and taking advantage of Scylla. Using Ragel to parse the Redis request, the server obtains the command name and the parameters from the request, invokes Scylla's internal API to read and write the data, then replies to the client. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-11-20 04:55:56 +08:00
Peng Jian	708a42c284	Document: add docs/redis/redis.md In this document, the detailed design and implementation of Redis API in Scylla is provided. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-11-20 04:46:33 +08:00
Nadav Har'El	9b9609c65b	merge: row_marker: correct row expiry condition Merged patch set by Piotr Dulikowski: This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when expiry_timestamp <= now, while a row became expired when expiry_timestamp < now (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes #4263, Fixes #5290. Tests: unit(dev) python test described in issue #5290	2019-11-19 18:14:15 +02:00
Amnon Heiman	9df10e2d4b	scylla_util.py: Add optional timeout to out function It is useful to have an option to limit the execution time of a shell script. This patch adds an optional timeout parameter; if it is provided, the command returns with a failure once that duration has elapsed. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-11-19 17:30:28 +02:00
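An optional timeout on a command helper, as described in 9df10e2d4b, could look roughly like the sketch below. It relies only on the standard `subprocess.run` timeout support; the helper name `run_with_timeout` and its signature are hypothetical and are not the actual `out` function from scylla_util.py.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout: Optional[float] = None) -> str:
    """Run cmd and return its stripped stdout.

    With timeout=None the call waits indefinitely. Otherwise
    subprocess.run kills the child and raises subprocess.TimeoutExpired
    once the limit is exceeded; a non-zero exit status raises
    subprocess.CalledProcessError because check=True.
    """
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True, timeout=timeout)
    return result.stdout.strip()
```

Callers that want to treat a timeout as an ordinary failure can catch `subprocess.TimeoutExpired` alongside `subprocess.CalledProcessError`.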
Nadav Har'El	b38c3f1288	Merge "Add separate counters for accesses to system tables" Merged patch series from Juliusz Stasiewicz: Welcome to my first PR to Scylla! The task was intended as a warm-up ("noob") exercise; its description is here: #4182 Sorry, I also couldn't help it and did some scouting: edited descriptions of some metrics and shortened few annoyingly long LoC.	2019-11-19 15:21:56 +02:00
Piotr Dulikowski	9be842d3d8	row_marker: tests for row expiration	2019-11-19 13:45:30 +01:00
Tomasz Grabiec	5e4abd75cc	main: Abort on EBADF and ENOTSOCK by default Those are typically symptoms of use-after-free or memory corruption in the program. It's better to catch such errors sooner rather than later. That situation is also dangerous: if a valid descriptor happens to land under the invalid access, rather than the one intended for the operation, then the operation may be performed on the wrong file and result in corruption. Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>	2019-11-19 13:07:33 +02:00
Piotr Dulikowski	589313a110	row_marker: correct expiration condition This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when `expiry_timestamp <= now`, while a row became expired when `expiry_timestamp < now` (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes: #4263, #5290. Tests: - unit(dev) - python test described in issue #5290	2019-11-19 11:46:59 +01:00
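The off-by-one described in 589313a110 can be illustrated with a minimal Python sketch. The function names are hypothetical; only the two comparisons (`<=` for cells, `<` versus `<=` for rows) come from the commit message.

```python
def cell_is_expired(expiry_timestamp: int, now: int) -> bool:
    # A single cell expires when expiry_timestamp <= now.
    return expiry_timestamp <= now


def row_is_expired_before_fix(expiry_timestamp: int, now: int) -> bool:
    # Old row logic: strict inequality, so the row outlives its cells.
    return expiry_timestamp < now


def row_is_expired_after_fix(expiry_timestamp: int, now: int) -> bool:
    # Fixed row logic: same non-strict inequality as for cells.
    return expiry_timestamp <= now
```

At the instant `now == expiry_timestamp` the old row check still reports the row as live while its non-key cells already read as null, which is the window the commit closes.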
Pekka Enberg	505f2c1008	test.py: Append test repeat cycle to output XML filename Currently, we overwrite the same XML output file for each test repeat cycle. This can cause invalid XML to be generated if the XML contents don't match exactly for every iteration. Fix the problem by appending the test repeat cycle in the XML filename as follows: $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test $ ls -1 *.xml jenkins_test.release.vint_serialization_test.0.boost.xml jenkins_test.release.vint_serialization_test.1.boost.xml jenkins_test.release.vint_serialization_test.2.boost.xml Fixes #5303. Message-Id: <20191119092048.16419-1-penberg@scylladb.com>	2019-11-19 11:30:47 +02:00
Rafael Ávila de Espíndola	750adee6e3	lua: fix build with boost 1.67 and older vs fmt It is not completely clear why the fmt base code fails with boost 1.67, but it is easy to avoid. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191118210540.129603-1-espindola@scylladb.com>	2019-11-19 11:14:00 +02:00
Tomasz Grabiec	ff567649fa	Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ACK messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, an unlimited number of pending ACK messages can consume an unlimited amount of memory, which eventually causes OOM. To fix, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from a peer has the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node.	2019-11-18 10:52:38 +01:00
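The "at most one pending SYN and one pending ACK per peer" policy from ff567649fa can be sketched as a small state machine. This is an illustrative Python model under stated assumptions, not the gossiper's C++ implementation: the class `SynThrottle`, its method names, and the use of plain strings for peers and messages are all hypothetical.

```python
from typing import Any, Dict, Optional


class _PeerState:
    def __init__(self) -> None:
        self.ack_in_flight = False
        self.pending_syn: Optional[Any] = None


class SynThrottle:
    """Keep at most one in-flight ACK and one queued SYN per peer."""

    def __init__(self) -> None:
        self._peers: Dict[str, _PeerState] = {}

    def on_syn(self, peer: str, syn: Any) -> Optional[Any]:
        """Return the SYN to handle now, or None if it was queued."""
        state = self._peers.setdefault(peer, _PeerState())
        if state.ack_in_flight:
            # Overwrite any older queued SYN: the newest one carries
            # the latest information, so the previous one can be dropped.
            state.pending_syn = syn
            return None
        state.ack_in_flight = True
        return syn

    def on_ack_sent(self, peer: str) -> Optional[Any]:
        """Call when an ACK finishes sending; return the queued SYN, if any."""
        state = self._peers[peer]
        next_syn, state.pending_syn = state.pending_syn, None
        if next_syn is None:
            state.ack_in_flight = False
        # If a SYN was queued, the caller handles it next and a new ACK
        # is considered in flight, so ack_in_flight stays True.
        return next_syn
```

Memory per peer is bounded by one queued message regardless of how many SYNs arrive while an ACK is still being sent.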
Benny Halevy	f9e93bba38	sstables: compaction: move cleanup parameter to compaction_descriptor Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>	2019-11-18 10:52:20 +01:00
Avi Kivity	1fe062aed4	Merge "Add basic UDF support" from Rafael " This patch series adds only UDF support, UDA will be in the next patch series. With this all CQL types are mapped to Lua. Right now we setup a new lua state and copy the values for each argument and return. This will be optimized once profiled. We require --experimental to enable UDF in case there is some change to the table format. " * 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits) Lua: Document the conversions between Lua and CQL Lua: Implement decimal subtraction Lua: Implement decimal addition Lua: Implement support for returning decimal Lua: Implement decimal to string conversion Lua: Implement decimal to floating point conversion Lua: Implement support for decimal arguments Lua: Implement support for returning varint Lua: Implement support for returning duration Lua: Implement support for duration arguments Lua: Implement support for returning inet Lua: Implement support for inet arguments Lua: Implement support for returning time Lua: Implement support for time arguments Lua: Implement support for returning timeuuid Lua: Implement support for returning uuid Lua: Implement support for uuid and timeuuid arguments Lua: Implement support for returning date Lua: Implement support for date arguments Lua: Implement support for returning timestamp ...	2019-11-17 16:38:19 +02:00
Konstantin Osipov	48f3ca0fcb	test.py: use the configured build modes from ninja mode_list Add mode_list rule to ninja build and use it by default when searching for tests in test.py. Now it is no longer necessary to explicitly specify the test mode when invoking test.py. (cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)	2019-11-17 13:42:10 +01:00
Nadav Har'El	2fb2eb27a2	sstables: allow non-traditional characters in table name The goal of this patch is to fix issue #5280, a rather serious Alternator bug, where Scylla fails to restart when an Alternator table has secondary indexes (LSI or GSI). Traditionally, Cassandra allows table names to contain only alphanumeric characters and underscores. However, most of our internal implementation doesn't actually have this restriction. So Alternator uses the characters ':' and '!' in the table names to mark global and local secondary indexes, respectively. And this actually works. Or almost... This patch fixes a problem of listing, during boot, the sstables stored for tables with such non-traditional names. The sstable listing code needlessly assumes that the directory name, i.e., the CF names, matches the "\w+" regular expression. When an sstable is found in a directory not matching such regular expression, the boot fails. But there is no real reason to require such a strict regular expression. So this patch relaxes this requirement, and allows Scylla to boot with Alternator's GSI and LSI tables and their names which include the ":" and "!" characters, and in fact any other name allowed as a directory name. Fixes #5280. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191114153811.17386-1-nyh@scylladb.com>	2019-11-17 14:27:47 +02:00
Shlomi Livne	3e873812a4	Document backport queue and procedure (#5282) This document adds information about how fixes are tracked to be backported into releases and what is the procedure that is followed to backport those fixes. Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2019-11-17 01:45:24 -08:00
Benny Halevy	c215ad79a9	scylla-gdb: resolve: add startswith parameter Allow filtering the resolved addresses by a startswith string. The common use case is for resolving vtable ptrs, when resolving the output of `find_vptrs` that may be too long for the host (running gdb) memory size. In this case the number of vtable ptrs is considerably smaller than the total number of objects returned by find_ptrs (e.g. 462 vs. 69625 in an OOM core I examined from scylla --smp=2 --memory=1024M) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:40:54 +02:00
Benny Halevy	2f688dcf08	scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader provide template arguments for k_l and m readers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:02:05 +02:00
Kamil Braun	a67e887dea	sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285) CQL tracing would only report file I/O involving one sstable, even if multiple sstables were read from during the query. Steps to reproduce: (1) create a table with NullCompactionStrategy; (2) insert row, flush memtables; (3) insert row, flush memtables; (4) restart Scylla; (5) tracing on; (6) select * from table. The trace would only report DMA reads from one of the two sstables. Kudos to @denesb for catching this. Related issue: #4908	2019-11-17 00:38:37 -08:00
Tomasz Grabiec	a384d0af76	Merge "A set of cleanups over main() code" from Pavel E. There are ... signs of massive start/stop code rework in the main() function. While fixing the sub-modules interdependencies during start/stop I've polished these signs too, so here's the simplest ones.	2019-11-15 15:25:18 +01:00
Pavel Emelyanov	1dc490c81c	tracing: Move register_tracing_keyspace_backend forward decl into proper header Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	7e81df71ba	main: Shorten developer_mode() evaluation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	1bd68d87fc	main: Do not carry pctx all over the code v2: - do not use struct initialization extension Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	655b6d0d1e	main: Hide start_thrift Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	26f2b2ce5e	main,db: Kill some unused .hh includes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	f5b345604f	main: Factor out get_conf_sub Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	924d52573d	main: Remove unused return_value variable (and capture) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	2195edb819	gitignore: Add tags file This file is generated by ctags utility for navigation, so it is not to be tracked by git. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191031221339.19030-1-xemul@scylladb.com>	2019-11-14 16:50:11 +01:00
Gleb Natapov	e0668f806a	lwt: change format of partition key serialization for system.paxos table Serialize provided partition_key in such a way that the serialized value will hash to the same token as the original key. This way when system.paxos table is updated the update is shard local. Message-Id: <20191114135449.GU10922@scylladb.com>	2019-11-14 15:07:16 +01:00
Avi Kivity	19b665ea6b	Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil " When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example: cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>); cqlsh:b> insert into t JSON '{"k": "insert_json"}'; cqlsh:b> select * from t; k \| l \| m \| s \| u -------------------+------+------+------+------ insert_json \| [] \| {} \| {} \| This PR fixes this. Resolves #5246 and closes #5270. " * 'frozen-json' of https://github.com/kbr-/scylla: tests: add null/unset frozen collection/UDT INSERT JSON test cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON cql3: decouple execute from term binding in user_type::setter	2019-11-14 15:29:30 +02:00
Avi Kivity	4544aa0b34	Update seastar submodule * seastar 75e189c6ba...6f0ef32514 (6): > Merge "Add named semaphores" from Piotr > parallel_for_each_state: pass rvalue reference to add_future > future: Pass rvalue to uninitialized_wrapper::uninitialized_set. > dependencies: Add libfmt-dev to debian > log: Fix logger behavior when logging both to stdout and syslog. > README.md: list Scylla among the projects using Seastar	2019-11-14 15:01:18 +02:00
Juliusz Stasiewicz	1cfa458409	metrics: separate counters for `system' KS accesses Resolves #4182. Metrics per system tables are accumulated separately, depending on the origin of query (DB internals vs clients).	2019-11-14 13:14:39 +01:00
Vladimir Davydov	ab42b72c6d	cql: fix SERIAL consistency check for batch statements If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must fail according to Cassandra's documentation. However, batched writes bypass this check. Fix this.	2019-11-14 12:15:39 +01:00
Vladimir Davydov	25aeefd6f3	cql: fix CAS consistency level validation This patch resurrects Cassandra's code validating a consistency level for CAS requests. Basically, it makes CAS requests use a special function instead of validate_for_write to make error messages more coherent. Note, we don't need to resurrect requireNetworkTopologyStrategy as EACH_QUORUM should work just fine for both CAS and non-CAS writes. Looks like it is just an artefact of a rebase in the Cassandra repository.	2019-11-14 12:15:39 +01:00
Juliusz Stasiewicz	b1e4d222ed	cql3: cosmetics - improved description of metrics	2019-11-14 10:35:42 +01:00
Avi Kivity	cd075e9132	reloc: do not install dependencies when building the relocatable package The dependencies are provided by the frozen toolchain. If a dependency is missing, we must update the toolchain rather than rely on build-time installation, which is not reproducible (as different package versions are available at different times). Luckily "dnf install" does not update an already-installed package. Had that been the case, none of our builds would have been reproducible, since packages would be updated to the latest version as of the build time rather than the version selected by the frozen toolchain. So, to prevent missing packages in the frozen toolchain translating to an unreproducible build, remove the support for installing dependencies from reloc/build_reloc.sh. We still parse the --nodeps option in case some script uses it. Fixes #5222. Tests: reloc/build_reloc.sh.	2019-11-14 09:37:14 +02:00
Gleb Natapov	552c56633e	storage_proxy: do not release mutation if not all replies were received MV backpressure code frees mutation for delayed client replies earlier to save memory. The commit `2d7c026d6e` that introduced the logic claimed to do it only when all replies are received, but this is not the case. Fix the code to free only when all replies are received for real. Fixes #5242 Message-Id: <20191113142117.GA14484@scylladb.com>	2019-11-13 16:23:19 +02:00
Raphael S. Carvalho	3e70523111	distributed_loader: Release disk space of SSTables deleted by resharding Resharding is responsible for scheduling the deletion of resharded sstables, but it was not refreshing the cache of the shards those sstables belong to, which means the cache was incorrectly holding references to them even after they were deleted. As a consequence, sstables deleted by resharding did not have their disk space freed until the cache was refreshed by a subsequent procedure that triggers it. Fixes #5261. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>	2019-11-13 16:03:27 +02:00
Avi Kivity	6aed3b7471	Merge "cql: trivial cleanup" from Vova * 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev: cql: rename modification_statement::_sets_a_collection to _selects_a_collection cql: rename _column_conditions to _regular_conditions cql: remove unnecessary optional around prefetch_data	2019-11-13 15:12:10 +02:00
Avi Kivity	1cb9f9bdfe	Merge "Use a fixed-size bitset for column set" from Kostja " Use a fixed-size, rather than a dynamically growing bitset for column mask. This avoids unnecessary memory reallocation in the most common case. " * 'column_set' of ssh://github.com/scylladb/scylla-dev: schema: pre-allocate the bitset of column_set schema: introduce schema::all_columns_count() schema: rename column_mask to column_set	2019-11-13 15:08:13 +02:00
Tomasz Grabiec	f68e17eb52	Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D. Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-13 13:11:51 +01:00
Juliusz Stasiewicz	8318a6720a	cql3: error msg w/ arg counts for prepared stmts with wrong arg cnt Fixes #3748. Very small change: added argument count (expectation vs. reality) to error msg within `invalid_request_exception'.	2019-11-13 13:43:37 +02:00
Nadav Har'El	ccb9038c69	alternator: Implement Expected operators LT and GT Merged patch series from Dejan Mircevski. Implements the "LT" and "GT" operators of the Expected update option (i.e., conditional updates), and enables the pre-existing tests for them.	2019-11-13 12:07:44 +02:00
Konstantin Osipov	6159c012db	schema: pre-allocate the bitset of column_set The number of columns is usually small, and avoiding a resize speeds up bit manipulation functions.	2019-11-13 11:41:51 +03:00
Konstantin Osipov	e95d675567	schema: introduce schema::all_columns_count() schema::all_columns_count() will be used to reserve memory of the column_set bitmask.	2019-11-13 11:41:42 +03:00
Konstantin Osipov	191acec7ab	schema: rename column_mask to column_set Since it contains a precise set of columns, it's more accurate to call it a set, not a mask. Besides, the name column_mask is already used for column options on storage level.	2019-11-13 11:41:30 +03:00
Kamil Braun	d6446e352e	tests: add null/unset frozen collection/UDT INSERT JSON test When using INSERT JSON with null/unspecified frozen collection/UDT columns, the columns should be set to null. See #5270.	2019-11-12 18:24:47 +01:00
Vladimir Davydov	8110178e5d	cql: rename modification_statement::_sets_a_collection to _selects_a_collection This is merely to avoid confusion: we use _sets prefix to indicate that there are operations over static/regular columns (_sets_static_columns, _sets_regular_columns), but _sets_a_collection is set for both operations and conditions. So let's rename it to _selects_a_collection and add some comments.	2019-11-12 20:15:42 +03:00
Vladimir Davydov	a19192950e	cql: rename _column_conditions to _regular_conditions It's weird that modification_statement has _static_conditions for conditions on static columns and _column_conditions for conditions on regular columns, as if conditions on static columns are not column conditions. Let's rename _column_conditions to _regular_conditions to avoid confusion.	2019-11-12 20:15:35 +03:00
Konstantin Osipov	0ad0369684	cql: remove unnecessary optional around prefetch_data	2019-11-12 20:15:24 +03:00
Kamil Braun	6c04c5bed5	cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON Before this commit, an empty non-null value was created for frozen collection/UDT columns when an INSERT JSON statement was executed with the value left unspecified or set to null. This was incompatible with Cassandra which inserted a null (dead cell). Fixes #5270.	2019-11-12 18:05:01 +01:00
Kamil Braun	0ad7d71f31	cql3: decouple execute from term binding in user_type::setter This commit makes it possible to pass a bound value terminal directly to the setter. Continuation of commit `bfe3c20035`.	2019-11-12 18:02:21 +01:00
Takuya ASADA	614ec6fc35	install.sh: drop --pkg option, use .install file on .deb package The --pkg option of install.sh was introduced for .deb packaging, since it requires a different install directory for each subpackage. But we are actually able to use "debian/tmp" as a shared install directory, and we can then specify the file owner of the package using .install files. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191030203142.31743-1-syuu@scylladb.com>	2019-11-12 16:50:37 +02:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Piotr Dulikowski	48f7b2e4fb	table: move out table::stats to table_stats This change was done in order to be able to forward-declare the table::stats structure.	2019-11-12 13:35:41 +01:00
Avi Kivity	cf7291462d	Merge "cql3/functions: add missing min/max/count functions for ascii type" from Piotr " Adds missing overloads of functions count, min, max for type ascii. Now they work: cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE ks; cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl'); cqlsh:ks> SELECT * FROM test_ascii; id \| value ----+------- 1 \| efgh 0 \| abcd 2 \| ijkl (3 rows) cqlsh:ks> SELECT count(value) FROM test_ascii; system.count(value) --------------------- 3 (1 rows) cqlsh:ks> SELECT min(value) FROM test_ascii; system.min(value) ------------------- abcd (1 rows) cqlsh:ks> SELECT max(value) FROM test_ascii; system.max(value) ------------------- ijkl (1 rows) Tests: unit(release) cql_group_functions_tests.py (with added check for ascii type) Fixes #5147. " * '5147-fix-min-max-count-for-ascii' of https://github.com/piodul/scylla: tests/cql_query_test: add aggregate functions test cql3/functions: add missing min/max/count for ascii	2019-11-12 14:15:14 +02:00
Piotr Dulikowski	41cb16a526	tests/cql_query_test: add aggregate functions test Adds a test for min, max and avg functions for those primitive types for which those functions are working at the moment.	2019-11-12 13:01:34 +01:00
Piotr Dulikowski	6d78d7cc69	cql3/functions: add missing min/max/count for ascii Adds missing overloads of functions `count`, `min`, `max` for type `ascii`. Now they work: cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE ks; cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh'); cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl'); cqlsh:ks> SELECT * FROM test_ascii; id \| value ----+------- 1 \| efgh 0 \| abcd 2 \| ijkl (3 rows) cqlsh:ks> SELECT count(value) FROM test_ascii; system.count(value) --------------------- 3 (1 rows) cqlsh:ks> SELECT min(value) FROM test_ascii; system.min(value) ------------------- abcd (1 rows) cqlsh:ks> SELECT max(value) FROM test_ascii; system.max(value) ------------------- ijkl (1 rows) Tests: - unit(release) - cql_group_functions_tests.py (with added check for `ascii` type) Fixes #5147.	2019-11-12 13:01:34 +01:00
Rafael Ávila de Espíndola	10bcbaf348	Lua: Document the conversions between Lua and CQL Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6ffddeae5e	Lua: Implement decimal subtraction Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	aba8e531d1	Lua: Implement decimal addition Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bb84eabbb3	Lua: Implement support for returning decimal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc17312a86	Lua: Implement decimal to string conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	e83d5bf375	Lua: Implement decimal to floating point conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b568bf4f54	Lua: Implement support for decimal arguments This is just the minimum to pass a value to Lua. Right now you can't actually do anything with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6c3f050eb4	Lua: Implement support for returning varint Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dc377abd68	Lua: Implement support for returning duration Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	c3f021d2e4	Lua: Implement support for duration arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9208b2f498	Lua: Implement support for returning inet Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	64be94ab01	Lua: Implement support for inet arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	faf029d472	Lua: Implement support for returning time Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	772f2a4982	Lua: Implement support for time arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	484f498534	Lua: Implement support for returning timeuuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9c2daf6554	Lua: Implement support for returning uuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ae1a1a4085	Lua: Implement support for uuid and timeuuid arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8aeed5beb	Lua: Implement support for returning date Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	384effa54b	Lua: Implement support for date arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	63bc960152	Lua: Implement support for returning timestamp Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee95756f62	Lua: Implement support for timestamp arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	1c6d5507b4	Lua: Implement support for returning counter Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0d9d53b5da	Lua: Implement support for counter arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	74c4e58b6b	Lua: Add a test for nested types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b226511ce8	Lua: Implement support for returning maps Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5c8d1a797f	Lua: Implement support for map arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b5b15ce4e6	Lua: Implement support for returning set Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf7ba441e4	Lua: Implement support for set arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	02f076be43	Lua: Implement support for returning udt Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	92c8e94d9a	Lua: Implement support for udt arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	a7c3f6f297	Lua: Implement support for returning list Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	688736f5ff	Lua: Implement support for returning tuple Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ab5708a711	Lua: Implement support for list and tuple arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	534f29172c	Lua: Implement support for returning boolean Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b03c580493	Lua: Implement support for boolean arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dcfe397eb6	Lua: Implement support for returning floating point Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf4b7ab39a	Lua: Implement support for returning blob Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3d22433cd4	Lua: Implement support for blob arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dd754fcf01	Lua: Implement support for returning ascii Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	affb1f8efd	Lua: Implement support for returning text Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8ed347ee7	Lua: Implement support for string arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0e4f047113	Lua: Implement a visitor for return values This adds support for all integer types. Followup commits will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	34b770e2fb	Lua: Push varint as decimal This makes it substantially simpler to support both varint and decimal, which will be implemented in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9b3cab8865	Lua: Implement support for varint to integer conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5a40264d97	Lua: Implement support for varint arguments Right now it is not possible to do anything with the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3230b8bd86	Lua: Implement support for floating point arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9ad2cc2850	Lua: Implement a visitor for arguments With this we support all simple integer types. Followup patches will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee1d87a600	Lua: Plug in the interpreter This adds a wrapper around the lua interpreter so that function executions are interruptible and return futures. With this patch it is possible to write and use simple UDFs that take and return integer values. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc3bba1064	Lua: Add lua.cc and lua.hh skeleton files Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	7015e219ca	Lua: Link with liblua Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	61200ebb04	Lua: Add config options This patch just adds the config options that we will expose for the lua runtime. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	d9337152f3	Use threads when executing user functions This adds a requires_thread predicate to functions and propagates that up until we get to code that already returns futures. We can then use the predicate to decide if we need to use seastar::async. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	52b48b415c	Test that schema digests with UDFs don't change This refactors test_schema_digest_does_not_change to also test a schema with user defined functions and user defined aggregates. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	fc72a64c67	Add schema propagation and storage for UDF With this it is possible to create user defined functions and aggregates and they are saved to disk and the schema change is propagated. It is just not possible to call them yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ce6304d920	UDF: Add a feature and config option to track if udf is enabled It can only be enabled with --experimental. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola	dd17dfcbef	Reject "OR REPLACE ... IF NOT EXISTS" in the grammar The parser now rejects having both OR REPLACE and IF NOT EXISTS in the same statement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	e7e3dab4aa	Convert UDF parsing code to c++ For now this just constructs the corresponding c++ classes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5c45f3b573	Update UDF syntax This updates UDF syntax to the current specification. In particular, this removes DETERMINISTIC and adds "CALLED ON NULL INPUT" and "RETURNS NULL ON NULL INPUT". Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	c75cd5989c	transport: Add support for FUNCTION and AGGREGATE to schema_change While at it, modernize the code a bit and add a test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	dac3cf5059	Clear functions between cql_test_env runs At some point we should make the function list non static, but this allows us to write tests for now. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	de1a970b93	cql: convert functions to add, remove and replace functions Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	33f9d196f9	Add iterator version of functions::find This avoids allocating a std::vector and is more flexible since the iterator can be passed to erase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	7f9dadee5c	Implement functions::type_equals. Since the types are uniqued we can just use ==. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5cef5a1b38	types: Add a friend visitor over data_value This is a simple wrapper that allows code that is not in the types hierarchy to visit a data_value. Will be used by UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	9bf9a84e4d	types: Move the data_value visitor to a header It will be used by the UDF implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Yaron Kaikov	4a9b2a8d96	dist/docker: Add SCYLLA_REPO_URL argument to Dockerfile (#5264) This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines the RPM repository used to install Scylla from. When building a new Docker image, users can specify the argument by passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build command. If the argument is not specified, the same RPM repository is used as before, retaining the old default behavior. We intend to use this in release engineering infrastructure to specify RPM repositories for nightly builds of release branches (for example, 3.1.x), which are currently only using the stable RPMs.	2019-11-07 09:21:05 +02:00
Pavel Emelyanov	486e3f94d0	deps: Add libunistring-dev to debian With this, the previous patch to seastar, and (suddenly) the xenial repo for scylla-libthrift010-dev and scylla-antlr35-c++-dev, the build on debian buster finally passes. Signed-off-by: Pavel Emelyanov <xemul@scyladb.com> Message-Id: <CAHTybb-QFyJ7YQW0b6pjhY_xUr-_b1w_O3K1=1FOwrNM55BkLQ@mail.gmail.com>	2019-11-01 09:03:39 +02:00
Dejan Mircevski	859883b31d	alternator: Implement GT operator in Expected Add cmp_gt and use it in check_compare() to handle the GT case. Also reactivate GT tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 17:18:22 -04:00
Dejan Mircevski	0f7d837757	alternator: Factor out check_compare() Code for check_LT(), check_GT(), etc. will be nearly identical, so factor it out into a single function that takes a comparator object. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 17:01:29 -04:00
Dejan Mircevski	a47b768959	alternator: Implement LT operator in Expected Add check_LT() function and reactivate LT tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 16:07:29 -04:00
Dejan Mircevski	ceae3c182f	alternator: Overload base64_decode on rjson::value In `1ca9dc5d47`, it was established that the correct way to base64-decode a JSON value is via string_view, rather than directly from GetString(). This patch adds a base64_decode(rjson::value) overload, which automatically uses the correct procedure. It saves typing, ensures correctness (fixing one incorrect call found), and will come in handy for future EXPECTED comparisons. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 15:56:03 -04:00
Dejan Mircevski	9955f0342f	alternator: Make unwrap_number() visible unwrap_number() is now a public function in serialization.hh instead of a static function visible only in executor.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 10:46:30 -04:00
Nadav Har'El	3f859adebd	Merge: Fix filtering static columns on empty partitions Merged patch series from Piotr Sarna: An otherwise empty partition can still have a valid static column. Filtering didn't take that fact into account and only filtered full-fledged rows, which may result in non-matching rows being returned to the client. Fixes #5248	2019-10-31 10:50:21 +02:00
Pavel Emelyanov	5fe4757725	docs: The scylla's dpdk config is boolean The docs say one can pass --disable-dpdk, but that is not so. It is seastar's configure.py that has a tristate -dpdk option; scylla's can only be enabled. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <CAHTybb-rxP8DbH-wW4Zf-w89iuCirt6T6-PjZAUfVFj7C5yb=A@mail.gmail.com>	2019-10-31 10:12:17 +02:00
Vladimir Davydov	9ea8114f8c	cql: fix CAS metric label "type" label is already in use for the counter type ("derive", "gauge", etc). Using the same label for "cas" / "non-cas" overwrites it. Let's instead call the new label "conditional" and use "yes" / "no" for its value, as suggested by Kostja. Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>	2019-10-30 17:14:17 +01:00
Avi Kivity	398c482cd0	Merge "combined reader gallop mode" from Piotr " In case when a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode. In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations. In the current implementation, when the end of partition is encountered while in gallop mode, the gallop mode is ended unconditionally. A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization. As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.
Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode) test name before after improvement one_row 49.070ns 48.287ns 1.60% single_active 61.574us 61.235us 0.55% many_overlapping 488.193us 514.977us -5.49% disjoint_interleaved 57.462us 57.111us 0.61% disjoint_ranges 56.545us 56.006us 0.95% overlapping_partitions_disjoint_rows 127.039us 80.849us 36.36% Same results, normalized per mutation fragment: test name before after improvement one_row 16.36ns 16.10ns 1.60% single_active 109.46ns 108.86ns 0.55% many_overlapping 216.97ns 228.88ns -5.49% disjoint_interleaved 102.15ns 101.53ns 0.61% disjoint_ranges 100.52ns 99.57ns 0.95% overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36% Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz. Tests: unit(release) Fixes #3593. " * '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla: mutation_reader: gallop mode microbenchmark mutation_reader: combined reader gallop tests mutation_reader: gallop mode for combined reader mutation_reader: refactor prepare_next	2019-10-30 17:34:47 +02:00
Piotr Sarna	dd00470a44	tests: add a test case for filtering on static columns The test case covers filtering with an empty partition. Refs #5248	2019-10-30 15:34:10 +01:00
Piotr Sarna	ca6fe598ec	cql3: fix filtering on a static column for empty partitions An otherwise empty partition can still have a valid static column. Filtering didn't take that fact into account and only filtered full-fledged rows, which may result in non-matching rows being returned to the client. Fixes #5248	2019-10-30 15:31:54 +01:00
Tomasz Grabiec	9da3aec115	Merge "Mutation diff improvements" from Benny - accept diff_command option - standard input support	2019-10-30 13:40:58 +01:00
Tomasz Grabiec	0d9367e08f	Merge "Scyllatop: one pass update of multiple metrics" from Benny Update previous results dictionary using the update_metrics method. It calls metric_source.query_list to get a list of results (similar to discover()) then for each line in the response it updates results dictionary. New results may be appended depending on the do_append parameter (True by default). Previously, with Prometheus, each metric.update called query_list resulting in O(n^2) when all metrics were updated, like in the scylla_top dtest - causing test timeout when testing debug build. (E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)	2019-10-30 13:38:39 +01:00
Tomasz Grabiec	b7b0a53b50	Merge "Add metrics for light-weight transactions" from Vova This patch set adds metrics useful for analyzing light-weight transaction performance. The same metrics are available in Cassandra.	2019-10-30 12:09:03 +01:00
Vladimir Davydov	f0075ba845	cql: account cas requests separately This patch adds "type" label to the following CQL metrics: inserts updates deletes batches statements_in_batches The label is set to "cas" for conditional statements and "non-cas" for unconditional statements. Note, for a batch to be accounted as CAS, it is enough to have just one conditional statement. In this case all statements within the batch are accounted as CAS as well.	2019-10-30 13:44:35 +03:00
Piotr Dulikowski	81883a9f2e	mutation_reader: gallop mode microbenchmark This microbenchmark tests performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization. As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping. Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode) test name before after improvement one_row 49.070ns 48.287ns 1.60% single_active 61.574us 61.235us 0.55% many_overlapping 488.193us 514.977us -5.49% disjoint_interleaved 57.462us 57.111us 0.61% disjoint_ranges 56.545us 56.006us 0.95% overlapping_partitions_disjoint_rows 127.039us 80.849us 36.36% Same results, normalized per mutation fragment: test name before after improvement one_row 16.36ns 16.10ns 1.60% single_active 109.46ns 108.86ns 0.55% many_overlapping 216.97ns 228.88ns -5.49% disjoint_interleaved 102.15ns 101.53ns 0.61% disjoint_ranges 100.52ns 99.57ns 0.95% overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36% Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	29d6842db9	mutation_reader: combined reader gallop tests	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	2b4ca0c562	mutation_reader: gallop mode for combined reader In case when a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode. In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations. In the current implementation, when the end of partition is encountered while in gallop mode, the gallop mode is ended unconditionally. Fixes #3593.	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	2a46a09e7c	mutation_reader: refactor prepare_next Move out logic responsible for adding readers at partition boundary into `maybe_add_readers_at_partition_boundary`, and advancing one reader into `prepare_one`. This will allow reusing this logic outside `prepare_next`.	2019-10-30 09:49:12 +01:00
Avi Kivity	623071020e	commitlog: change variadic stream in read_log_file to future<struct> Since seastar::streams are based on future/promise, variadic streams suffer the same fate as variadic futures - deprecation and eventual removal. This patch therefore replaces a variadic stream in commitlog::read_log_file() with a non-variadic stream, via a helper struct. Tests: unit (dev)	2019-10-29 19:25:12 +01:00
Botond Dénes	271ab750a6	scylla-gdb.py: add replica section to scylla memory Recently, scylla memory started to go beyond just providing raw stats about the occupancy of the various memory pools, to additionally also provide an overview of the "usual suspects" that cause memory pressure. As part of this, recently `46341bd63f` added a section of the coordinator stats. This patch continues this trend and adds a replica section, with the "usual suspects": * read concurrency semaphores * execution stages * read/write operations Example: Replica: Read Concurrency Semaphores: user sstable reads: 0/100, remaining mem: 84347453 B, queued: 0 streaming sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0 system sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0 Execution Stages: data query stage: 03 "service_level_sg_0" 4967 Total 4967 mutation query stage: Total 0 apply stage: 03 "service_level_sg_0" 12608 06 "statement" 3509 Total 16117 Tables - Ongoing Operations: pending writes phaser (top 10): 2 ks.table1 2 Total (all) pending reads phaser (top 10): 3380 ks.table2 898 ks.table1 410 ks.table3 262 ks.table4 17 ks.table8 2 system_auth.roles 4969 Total (all) pending streams phaser (top 10): 0 Total (all) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>	2019-10-29 18:03:06 +01:00
Vladimir Davydov	e510288b6f	api: wire up column_family cas-related statistics	2019-10-29 19:26:18 +03:00
Vladimir Davydov	b75862610e	paxos_state: account paxos round latency This patch adds the following per table stats: cas_prepare_latency cas_propose_latency cas_commit_latency They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed by Cassandra.	2019-10-29 19:26:18 +03:00
Vladimir Davydov	21c3c98e5b	api: wire up storage_proxy cas-related statistics	2019-10-29 19:26:18 +03:00
Vladimir Davydov	c27ab87410	storage_proxy: add cas request accounting This patch implements accounting of Cassandra's metrics related to lightweight transactions, namely: cas_read_latency transactional read latency (histogram) cas_write_latency transactional write latency (histogram) cas_read_timeouts number of transactional read timeouts cas_write_timeouts number of transactional write timeouts cas_read_unavailable number of transactional read unavailable errors cas_write_unavailable number of transactional write unavailable errors cas_read_unfinished_commit number of transaction commit attempts that occurred on read cas_write_unfinished_commit number of transaction commit attempts that occurred on write cas_write_condition_not_met number of transaction preconditions that did not match current values cas_read_contention how many contended reads were encountered (histogram) cas_write_contention how many contended writes were encountered (histogram)	2019-10-29 19:25:47 +03:00
Vladimir Davydov	967a9e3967	storage_proxy: zap ballot_and_contention Pass contention by reference to begin_and_repair_paxos(), where it is incremented on every sleep. Rationale: we want to account the total number of times query() / cas() had to sleep, either directly or within begin_and_repair_paxos(), no matter if the function failed or succeeded.	2019-10-29 19:22:18 +03:00
Botond Dénes	49aa8ab8a0	scylla-gdb.py: add compatibility with Scylla 3.0 Even though every Scylla version has its own scylla-gdb.py, because we don't backport any fixes or improvements, practically we end up always using master's version when debugging older versions of Scylla too. This is made harder by the fact that both Scylla's and its dependencies' (most notably that of libstdc++ and boost) code is constantly changing between releases, requiring edits to scylla-gdb.py to make it usable with past releases. This patch attempts to make it easier to use scylla-gdb.py with past releases, more specifically Scylla 3.0. This is achieved by wrapping problematic lines in a `try: except:` and putting the backward compatible version in the `except:` clause. These lines have comments with the version they provide support for, so they can be removed when said version is not supported anymore. I did not attempt to provide full coverage, I only fixed up problems that surfaced when using my favourite commands with 3.0. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>	2019-10-29 17:05:19 +01:00
Botond Dénes	e48f301e95	repair: repair_cf_range(): extract result of local checksum calculation only once The loop that collects the results of the checksum calculations also logs any errors. The error logging includes `checksums[0]`, which corresponds to the checksum calculation on the local node. This violates the assumption of the code following the loop, which assumes that the future of `checksums[0]` is intact after the loop terminates. However, this is only true when the checksum calculation is successful and is false when it fails, as in this case the loop extracts the error and logs it. When the code after the loop checks again whether said calculation failed, it will get a false negative and will go ahead and attempt to extract the value, triggering an assert failure. Fix by making sure that even in the case of a failed checksum calculation, the result of `checksums[0]` is extracted only once. Fixes: #5238 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>	2019-10-29 17:00:37 +01:00
Avi Kivity	60ea29da90	Update seastar submodule * seastar 2963970f6b...75e189c6ba (7): > posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests > README.md: Add redpanda and smf to 'Projects using Seastar' > unix_domain_test: don't assume that at temporary_buffer is null terminated > socket_address: Use offsetof instead of null pointer > README: add projects using seastar section to readme > Adjustments for glibc 2.30 and hwloc 2.0 > Mark future::failed() as const	2019-10-29 14:34:10 +02:00
Gleb Natapov	0e9df4eaf8	lwt: mark lwt as experimental We may want to change paxos tables format and change internode protocol, so hide lwt behind experimental flag for now. Message-Id: <20191029102725.GM2866@scylladb.com>	2019-10-29 14:33:48 +02:00
Benny Halevy	79d5fed40b	mutation_fragment_stream_validator: validate end of stream in partition_key filter Currently end of stream validation is done in the destructor, but the validator may be destructed prematurely, e.g. on exception, as seen in https://github.com/scylladb/scylla/issues/5215 This patch adds an on_end_of_stream() method explicitly called by consume_pausable_in_thread. Also, the respective concepts for PartitionFilter, MutationFragmentFilter and a new one for the on_end_of_stream method were unified as FlattenedConsumerFilter. Refs #5215 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)	2019-10-29 12:35:33 +01:00
Benny Halevy	d5f53bc307	mutation_fragment_stream_validator: validate partition key monotonicity Fixes #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)	2019-10-29 12:35:33 +01:00
Gleb Natapov	e5e44bfda2	client_state: fix get_timestamp_for_paxos() to always advance a timestamp Message-Id: <20191029102336.GL2866@scylladb.com>	2019-10-29 13:07:33 +02:00
Tomasz Grabiec	c2a4c915f3	Merge "Fix a few issues with CAS requests" from Vladimir D. There are a few issues at the CQL layer, because of which the result of a CAS request execution may differ between Scylla and Cassandra. Mostly, it happens when static columns are involved. The goal of this patch set is to fix these issues, thus making Scylla's implementation of CAS yield the same results as Cassandra's.	2019-10-29 11:50:15 +01:00
Rafael Ávila de Espíndola	c74864447b	types: Simplify validate_visitor for strings We have different types for ascii and utf8, so there is no need for an extra if. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191024232911.22700-1-espindola@scylladb.com>	2019-10-29 11:02:55 +02:00
Nadav Har'El	d69ab1b588	CDC: (atomic) delta + (non-optional) pre-image data columns Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski: Adds delta and pre-image data column writes for the atomic columns in a cdc-enabled table. Note that in this patch set it is still unconditional. Adding option support comes in next set. Uses code more or less derived from alternator to select pre-image, using raw query interface. So should be fairly low overhead to query generation. Pre-image and delta mutations are mixed in with the actual modification mutations to generate the full cdc log (sans post-image).	2019-10-29 09:39:28 +02:00
Calle Wilund	7db393fe12	cdc_test: Add helper methods + preimage test Add filtering, sorting etc helpers + simple pre-image test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-29 07:49:05 +01:00
Vladimir Davydov	65b86d155e	cql: add static row to CAS failure result if there are static conditions Even if no rows match clustering key restrictions of a conditional statement with static columns conditions, we still must include the static column value into the CAS failure result set. For example, the following conditional DELETE statement create table t(k int, c int, s int static, v int, primary key(k, c)); insert into t(k, s) values(1, 1); delete v from t where k=1 and c=1 if v=1 and s=1; must return [applied=False, v=null, s=1] not just [applied=False, v=null, s=null] To fix that, set partition_slice::option::always_return_static_content for querying rows used for checking conditions so that we have the static row in update_parameters::prefetch_data even if no regular row matches clustering column restrictions. Plus modify cas_request:: applies_to() so that it sets is_in_cas_result_set flag for the static row in case there are static column conditions, but the result set happens to be empty. As pointed out by Tomek, there's another reason to set partition_slice:: option::always_return_static_content apart from building a correct result set on CAS failure. There could be a batch with two statements, one with clustering key restrictions which select no row, and another statement with only static column conditions. If we didn't enable this flag, we wouldn't get a static row even if it exists, and static column conditions would evaluate as if the static row didn't exist, for example, the following batch create table t(k int, c int, s int static, primary key(k, c)); insert into t(k, s) values(1, 1); begin batch insert into t(k, c) values(1, 1) if not exists update t set s = 2 where k = 1 if s = 1 apply batch; would fail although it clearly must succeed.	2019-10-28 22:30:37 +03:00
Vladimir Davydov	e0b31dd273	query: add flag to return static row on partition with no rows A SELECT statement that has clustering key restrictions isn't supposed to return static content if no regular row matches the restrictions, see #589. However, for the CAS statement we do need to return static content on failure so this patch adds a flag that allows the caller to override this behavior.	2019-10-28 21:50:44 +03:00
Vladimir Davydov	57d284d254	cql: exclude statements not checked by cas from result set Apart from conditional statements, there may be other reading statements in a batch, e.g. manipulating lists. We must not include rows fetched for them into the CAS result set. For instance, the following CAS batch: create table t(p int, c int, i int, l list<int>, primary key(p, c)); insert into t(p, c, i) values(1, 1, 1) insert into t(p, c, i, l) values(1, 1, 1, [1, 2, 3]) begin batch update t set i=3 where p=1 and c=1 if i=2 update t set l=l-[2] where p=1 and c=2 apply batch; is supposed to return [applied] \| p \| c \| i ----------+---+---+--- False \| 1 \| 1 \| 1 not [applied] \| p \| c \| i ----------+---+---+--- False \| 1 \| 1 \| 1 False \| 1 \| 2 \| 1 To filter out such collateral rows from the result set, let's mark rows checked by conditional statements with a special flag.	2019-10-28 21:50:43 +03:00
Vladimir Davydov	74b9e80e4c	cql: fix EXISTS check that applies only to static columns If a CQL statement only updates static columns, i.e. has no clustering key restrictions, we still fetch a regular row so that we can check it against EXISTS condition. In this case we must be especially careful: we can't simply pass the row to modification_statement::applies_to, because it may turn out that the row has no static columns set, i.e. there's in fact no static row in the partition. So we filter out such rows without static columns right in cas_request::applies_to before passing them further to modification_statement::applies_to. Example: create table t(p int, c int, s int static, primary key(p, c)); insert into t(p, c) values(1, 1); insert into t(p, s) values(1, 1) if not exists; The conditional statement must succeed in this case.	2019-10-28 21:49:37 +03:00
Vladimir Davydov	8fbf344f03	cql: ignore clustering key if statement checks only static columns In case a CQL statement has only static columns conditions, we must ignore clustering key restrictions. Example: create table t(p int, c int, s int static, v int, primary key(p, c)); insert into t(p, s) values(1, 1); update t set v=1 where p=1 and c=1 if s=1; This conditional statement must successfully insert row (p=1, c=1, v=1) into the table even though there's no regular row with p=1 and c=1 in the table before it's executed, because the statement condition only applies to the static column s, which exists and matches.	2019-10-28 21:13:19 +03:00
Vladimir Davydov	54cf903bb2	cql: differentiate static from regular EXISTS conditions If a modification statement doesn't have a clustering column restriction while the table has static columns, then EXISTS condition just needs to check if there's a static row in the partition, i.e. it doesn't need to select any regular rows. Let's treat such an EXISTS condition like a static column condition so that we can ignore its clustering key range while checking CAS conditions.	2019-10-28 21:13:05 +03:00
Vladimir Davydov	934a87999f	cql: turn prefetch_data::row into struct This will allow us to add helper methods and store extra info in each row. For example, we can add a method for checking if a row has static columns. Also, to build CAS result set, we need to differentiate rows fetched to check conditions from those fetched for reading operations. Using struct as row container will allow us to store this information in each prefetched row.	2019-10-28 21:12:52 +03:00
Vladimir Davydov	bdd62b8bc3	cql: remove static column check from create_clustering_ranges The check is pointless, because we check exactly the same while preparing the statement, see process_where_clause() method of modification_statement.	2019-10-28 21:12:43 +03:00
Vladimir Davydov	a8ddbffa75	cql: fix applies_only_to_static_columns check Currently, we set _sets_regular_columns/_sets_static_columns flags when adding regular/static conditions to modification_statement. We use them in applies_only_to_static_columns() function that returns true iff _sets_static_columns is set and _sets_regular_columns is clear. We assume that if this function returns true then the statement only deals with static columns and so must not have clustering key restrictions. Usually, that's true, but there's one exception: DELETE FROM ... statement that deletes whole rows. Technically, this statement doesn't have any column operations, i.e. _sets_regular_columns flag is clear. So if such a statement happens to have a static condition, we will assume that it only applies to static columns and mistakenly raise an error. Example: create table t(k int, c int, s int static, v int, primary key(k, c)); delete from t where k=1 and c=1 if s=1; To fix this, let's not set the above mentioned flags when adding conditions and instead check if _column_conditions array is empty in applies_only_to_static_columns().	2019-10-28 21:12:36 +03:00
Vladimir Davydov	fbb11dac11	cql: set conditions before processing where clause modification_statement::process_where_clause() assumes that both operations and conditions have been added to the statement when it's called: it uses this information to raise an error in case the statement restrictions are incompatible with operations or conditions. Currently, operations are set before this function is called, but not conditions. This results in "Invalid restrictions on clustering columns since the {} statement modifies only static columns" error while trying to execute the following statements: create table t(k int, c int, s int static, v int, primary key(k, c)); delete s from t where k=1 and c=1 if v=1; update t set s=1 where k=1 and c=1 if v=1; Fix this by always initializing conditions before processing WHERE clause.	2019-10-28 21:12:22 +03:00
Botond Dénes	edc1750297	scylla-gdb.py: introduce scylla smp-queues Print a histogram of the number of async work items in the shard's outgoing smp queues. Example: (gdb) scylla smp-queues 10747 17 -> 3 ++++++++++++++++++++++++++++++++++++++++ 721 17 -> 19 ++ 247 17 -> 20 + 233 17 -> 10 + 210 17 -> 14 + 205 17 -> 4 + 204 17 -> 5 + 198 17 -> 16 + 197 17 -> 6 + 189 17 -> 11 + 181 17 -> 1 + 179 17 -> 13 + 176 17 -> 2 + 173 17 -> 0 + 163 17 -> 8 + 1 17 -> 9 + Useful for identifying the target shard, when `scylla task_histogram` indicates a high number of async work items. To produce the histogram the command goes over all virtual objects in memory and identifies the source and target queues of each `seastar::smp_message_queue::async_work_item` object. Practically the source queue will always be that of the current shard. As this scales with the number of virtual objects in memory, it can take some time to run. An alternative implementation would be to instead read the actual smp queues, but the code of that is scary so I went for the simpler and more reliable solution. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>	2019-10-28 15:42:55 +02:00
Tomasz Grabiec	3b37027598	Merge "lwt: implement basic lightweight transactions support" from Kostja This patch set introduces light-weight transactions support to ScyllaDB. It is a subset of the full series, which adds basic LWT support and which has been reviewed thus far.	2019-10-28 11:45:28 +01:00
Tomasz Grabiec	f745819ed7	Merge "lwt: paxos protocol implementation" from Gleb This is the paxos implementation for LWT. LWT itself is not included in the patch so the code is essentially not wired yet (except read path).	2019-10-28 11:29:40 +01:00
Avi Kivity	f8ba96efcf	Merge "test_udt_mutations fixes" from Benny " mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...)) mutation_test: test_udt_mutations: fixup udt comment mutation_test: test_udt_mutations: fix end iterator in call to std::all_of mutation_test: test_udt_mutations: use int64_t constants for long_type Test: mutation_test(dev, debug) " * 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla: mutation_test: test_udt_mutations: use int64_t constants for long_type mutation_test: test_udt_mutations: fix end iterator in call to std::all_of mutation_test: test_udt_mutations: fixup udt comment	2019-10-28 10:43:52 +02:00
Calle Wilund	36328acf60	cql_assertions: Change signature to accept sstring	2019-10-28 06:16:12 +01:00
Calle Wilund	7d98f735ee	cdc: Add static columns to data/preimage mutations Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	19bba5608a	cdc: Create and perform a pre-image select for mutations As well as generate pre-image rows in the resulting log mutation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	d4ee1938c7	cdc: Add modification record for regular atomic values in mutations Fills in the data columns for regular columns iff they are atomic (not unfrozen collections)	2019-10-28 06:16:12 +01:00
Calle Wilund	3fdcbd9dff	cdc: Set row op in log Adds actual operation (part delete, range delete, update) to cdc log	2019-10-28 06:16:12 +01:00
Calle Wilund	8a6b72f47e	cdc: Add pre-image select generator method Based on a mutation, creates a pre-image select operation. Note, this uses raw proxy query to shortcut parsing etc, instead of trying to cache by generated query. Hypothesis is that this is essentially faster. The routine assumes all rows in a mutation touch same static/regular columns. If this is not always true it will need additional calculations. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	d74f32b07a	cql3::untyped_result_set: Add constructor from cql3::result_set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Calle Wilund	3ed7a9dd69	cql3::untyped_result_set: Add view getter to make non-intrusive read cheaper Also use in actual data conversion.	2019-10-28 06:16:12 +01:00
Calle Wilund	451bb7447d	cdc: Add log / log data column operation types and make data cols tuples of these Makes static/regular data columns tuple<op, value, ttl> as per spec. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-28 06:16:12 +01:00
Konstantin Osipov	e555dc502e	lwt: implement basic lightweight transactions support Support single-statement conditional updates as well as batches. This patch almost fully rewrites column_condition.cc, implementing is_satisfied_by(). Most of the remaining complications in column_condition implementation come from the need to properly handle frozen and multi-cell collection in predicates - up until now it was not possible to compare entire collection values between each other. This is further complicated since multi-cell lists and sets are returned as maps. We can no longer assume that the columns fetched by prefetch operation are non-frozen collections. IF EXISTS/IF NOT EXISTS condition fetches all columns, besides, a column may be needed to check other condition. When fetching the old row for LWT or to apply updates on list/columns, we now calculate precisely the list of columns to fetch. The primary key columns are also included in CAS batch result set, and are thus also prefetched (the user needs them to figure out which statements failed to apply). The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502 but does deviate from the origin in handling of conditions on static row cells. This is addressed in future series.	2019-10-27 23:42:49 +03:00
Konstantin Osipov	67e68dabf0	lwt: ensure we don't crash when we get a LIKE	2019-10-27 23:42:49 +03:00
Konstantin Osipov	f8f36d066c	lwt: check for unsupported collection type in condition element access We don't support conditions with element access on non-frozen UDTs, check that only supported collection types are supplied.	2019-10-27 23:42:49 +03:00
Konstantin Osipov	c9f0adf616	lwt: rewrite cql3::raw::column_condition::prepare() Restructure the code to avoid quite a bit of code duplication.	2019-10-27 23:42:47 +03:00
Konstantin Osipov	c2217df4d8	lwt: reorganize column_condition declaration and add comments	2019-10-27 23:42:03 +03:00
Konstantin Osipov	22b0240fe7	lwt: remove useless code in column_condition.hh Each column_condition and raw::column_condition construction case had a static method wrapping its constructor, simply supplying some defaults. This neither improves clarity nor maintainability.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	3e25b83391	lwt: propagate if_exists condition from the parser to AST UPDATE ... IF EXISTS is legal, but IF EXISTS condition was not propagated from the parser to AST (raw::update_statement).	2019-10-27 23:42:03 +03:00
Konstantin Osipov	df28985295	lwt: introduce cql_statement_opt_metadata cql_statement_opt_metadata is an interim node in cql (prepared) statement hierarchy parenting modification_statement and batch_statement. If there is IF condition in such statements, they return a result set, and thus have a result set metadata. The metadata itself is filled in a subsequent patch.	2019-10-27 23:42:03 +03:00
Vladimir Davydov	c8869e803e	lwt: remove commented out validateWhereClauseForConditions This logic was implemented in validate_where_clause_for_conditions() method of modification_statement class.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	eb5e82c6a1	lwt: add CAS where clause validation Add checks for conditional modification statement limitations: - WHERE clustering_key IN (list) IF condition is not supported since a condition is evaluated for a single row/cell, so allowing multiple rows to match the WHERE clause would create ambiguity, - the same is true for conditional range deletions. - ensure all clustering restrictions are eq for conditional delete We must not allow statements like create table t(p int, c int, v int, primary key (p, c)); delete from t where p=1 and c>0 if v=1; because there may be more than one statement in a partition satisfying WHERE clause, in which case it's unclear which of them should satisfy IF condition: all or just one. Raising an error on such a statement is consistent with Cassandra's behavior.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	203eb3eccc	lwt: sleep a random amount of time when retrying CAS Sleep a random interval between 0 and 100 ms before retrying CAS. Reuse sleep function, make the distribution object thread local.	2019-10-27 23:42:03 +03:00
Konstantin Osipov	0674fab05c	lwt: implement storage_proxy::cas() Introduce service::cas_request abstract base class which can be used to parameterize Paxos logic. Implement storage_proxy::cas() - compare and swap - the storage proxy entry point for lightweight transactions.	2019-10-27 23:42:03 +03:00
Gleb Natapov	70adf65341	storage_proxy: make mutation holder responsible for mutation operation Currently the code that manipulates mutations during write needs to check what kind of mutations are those and (sometimes) choose different code paths. This patch encapsulates the differences in virtual functions of mutation_holder object, so that high level code will not concern itself with the details. The functions that are added: apply_locally(), apply_remotely() and store_hint().	2019-10-27 23:21:51 +03:00
Gleb Natapov	b3e01a45d7	lwt: storage_proxy: implement paxos protocol This patch adds all functionality needed for Paxos protocol. The implementation does not strictly adhere to Paxos paper since the original paper allows setting a value only once, while for LWT we need to be able to make another Paxos round after "learn" phase completes, which requires things like repair to be introduced.	2019-10-27 23:21:51 +03:00
Gleb Natapov	8d6201a23b	lwt: Add RPC verbs needed for paxos implementation Paxos protocol has three stages: prepare, accept, learn. This patch adds rpc verb for each of those stages. To be term compatible with Cassandra the patch calls those stages: prepare, propose, commit.	2019-10-27 23:21:51 +03:00
Gleb Natapov	d1774693bf	lwt: Define state needed by paxos and persist it Paxos protocol relies on replicas having a state that persists over crashes/restarts. This patch defines such state and stores it in the database itself in the paxos table to make it persistent. The stored state is: in_progress_ballot - promised ballot proposal - accepted value proposal_ballot - the ballot of the accepted value most_recent_commit - most recently learned value most_recent_commit_at - the ballot of the most recently learned value	2019-10-27 23:21:51 +03:00
Gleb Natapov	15b935b95d	lwt: add data structures needed for paxos implementation This patch adds two data structures that will be used by paxos. First one is "proposal" which contains a ballot and a mutation representing a value paxos protocol is trying to set. Second one is "prepare_response" which is a value returned by paxos prepare stage. It contains currently accepted value (if any) and most recently learned value (again if any). The latter is used to "repair" replicas that missed previous "learn" message.	2019-10-27 23:21:51 +03:00
Benny Halevy	1895fb276e	mutation_test: test_udt_mutations: use int64_t constants for long_type Otherwise they are decomposed and serialized as 4-byte int32. For example, on my machine cell[1] looked like this: {0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}} and it failed cells_equal against: {0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}} Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 20:51:29 +02:00
Benny Halevy	fec772538c	mutation_test: test_udt_mutations: fix end iterator in call to std::all_of Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 19:49:25 +02:00
Benny Halevy	9c8cf9f51d	mutation_test: test_udt_mutations: fixup udt comment Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-27 19:47:43 +02:00
Benny Halevy	76581e7f14	docs/debugging.md: fix gdb command for retrieving shared libraries information The correct command is `info sharedlibrary`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>	2019-10-27 18:15:09 +02:00
Dejan Mircevski	2a136ba1bc	alternator: Fix race condition in set_routes() server::set_routes() was setting the value of server::_callbacks. This led to a race condition, as set_routes() is invoked on every shard simultaneously. It is also unnecessary, since _callbacks can be initialized in the constructor. Fixes #5220. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-27 12:31:24 +02:00
Avi Kivity	27ef73f4f1	Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil " Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future. Use this file to trace reads from SSTable data and index files. Fixes #4908. " * 'traced_file' of https://github.com/kbr-/scylla: sstables: report sstable index file I/O in CQL tracing sstables: report sstable data file I/O in CQL tracing tracing: add traced_file class	2019-10-26 22:53:37 +03:00
Avi Kivity	2b856a7317	Merge "Support non-frozen UDTs." from Kamil " This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted. I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code. Fixes #2201. " * 'udt' of https://github.com/kbr-/scylla: (64 commits) tests: too many UDT fields check test collection_mutation: add a FIXME. tests: add a non-frozen UDT materialized view test tests: add a UDT mutation test. tests: add a non-frozen UDT "JSON INSERT" test. tests: add a non-frozen UDT to for_each_schema_change. tests: more non-frozen UDT tests. tests: move some UDT tests from cql_query_test.cc to new file. types: handle trailing nulls in tuples/UDTs better. cql3: enable deleting single fields of non-frozen UDTs. cql3: enable setting single fields of a non-frozen UDT. cql3: enable non-frozen UDTs. cql3: introduce user_types::marker. cql3: generalize function_call::make_terminal to UDTs. cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs. cql3: use a dedicated setter operation for inserting user types. cql3: introduce user_types::value. types: introduce to_bytes_opt_vec function. cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>. cql3: make cql3_type::raw_ut::to_string distinguish frozenness. ...	2019-10-26 22:53:37 +03:00
Piotr Sarna	657e7ef5a5	alternator: add alternator health check The health check is performed simply by issuing a GET request to the alternator port - it returns the following status 200 response when the server is healthy: $ curl -i localhost:8000 HTTP/1.1 200 OK Content-Type: text/plain Content-Length: 23 Server: Seastar httpd Date: 21 Oct 2019 12:55:33 GMT healthy: localhost:8000 This commit comes with a test. Fixes #5050 Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>	2019-10-26 18:14:18 +03:00
Botond Dénes	01e913397a	tests: memtable_test: flush_reader_test: compare compacted mutations To filter out artificial differences due to different representation of an equivalent set of writes. Fixes: #5207 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>	2019-10-26 18:14:18 +03:00
Kamil Braun	432ef7c9af	sstables: report sstable index file I/O in CQL tracing Use tracing::make_traced_file when reading from the index file in index_reader.	2019-10-25 14:10:28 +02:00
Kamil Braun	394c36835a	sstables: report sstable data file I/O in CQL tracing Use tracing::make_traced_file when creating an sstable input_stream. To achieve that, trace_state needs to be plumbed down through some functions.	2019-10-25 14:10:28 +02:00
Kamil Braun	a8c9d1206a	tracing: add traced_file class This is a thin wrapper over the `seastar::file` class which adds CQL trace messages before and after I/O operations.	2019-10-25 14:10:24 +02:00
Kamil Braun	2889edea3e	tests: too many UDT fields check test	2019-10-25 12:05:10 +02:00
Kamil Braun	adfc04ebec	collection_mutation: add a FIXME. We could use iterators over cells instead of a vector of cells in collection_mutation(_view)_description. Then some use cases could provide iterators that construct the cells "on the fly".	2019-10-25 12:05:10 +02:00
Kamil Braun	45d2a96980	tests: add a non-frozen UDT materialized view test	2019-10-25 12:05:10 +02:00
Kamil Braun	e0c233ede1	tests: add a UDT mutation test.	2019-10-25 12:05:08 +02:00
Kamil Braun	a21d12faae	tests: add a non-frozen UDT "JSON INSERT" test.	2019-10-25 12:04:44 +02:00
Kamil Braun	ae3464da45	tests: add a non-frozen UDT to for_each_schema_change.	2019-10-25 12:04:44 +02:00
Kamil Braun	b87b700e66	tests: more non-frozen UDT tests.	2019-10-25 12:04:44 +02:00
Kamil Braun	474742ac5d	tests: move some UDT tests from cql_query_test.cc to new file.	2019-10-25 12:04:44 +02:00
Kamil Braun	612de1f4e3	types: handle trailing nulls in tuples/UDTs better. Comparing user types after adding new fields was bugged. In the following scenario: create type ut (a int); create table cf (a int primary key, b frozen<ut>); insert into cf (a, b) values (0, (0)); alter type ut add b int; select * from cf where b = {a:0,b:null}; the row with a = 0 should be returned, even though the value stored in the database is shorter (by one null) than the value given by the user. Until now it wouldn't have.	2019-10-25 12:04:44 +02:00
Kamil Braun	1a9034e38a	cql3: enable deleting single fields of non-frozen UDTs. This was already possible by setting the field to null, but now it supports the DELETE syntax.	2019-10-25 12:04:44 +02:00
Kamil Braun	4d271051dd	cql3: enable setting single fields of a non-frozen UDT. The commit introduces the necessary modifications to the grammar, a set_field raw operation, and a setter_by_field operation.	2019-10-25 12:04:44 +02:00
Kamil Braun	e74b5deb5d	cql3: enable non-frozen UDTs. Add a cluster feature for non-frozen UDTs. If the cluster supports non-frozen UDTs, do not return an error message when trying to create a table with a non-frozen user type.	2019-10-25 12:04:44 +02:00
Kamil Braun	7ac7a3994d	cql3: introduce user_types::marker. cql3::user_types::marker is a dedicated cql3::abstract_marker for user type placeholders in prepared CQL queries. When bound, it returns a user_types::value.	2019-10-25 12:04:44 +02:00
Kamil Braun	36999c94f4	cql3: generalize function_call::make_terminal to UDTs. Use the dedicated user_types::value. There is no way this code can be executed now, so I left a TODO.	2019-10-25 12:04:44 +02:00
Kamil Braun	49a7461345	cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs. For user types, use its dedicated setter and value.	2019-10-25 12:04:44 +02:00
Kamil Braun	40f9ce2781	cql3: use a dedicated setter operation for inserting user types. cql3::user_types::setter is a dedicated cql3::operation for inserting and updating user types. It handles the multi-cell (non-frozen) case.	2019-10-25 12:04:44 +02:00
Kamil Braun	51be1e3e9d	cql3: introduce user_types::value. This is a dedicated multi_item_terminal for user type values. Will be useful in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	abe6c2d3d2	types: introduce to_bytes_opt_vec function. It converts a vector<bytes_view_opt> to a vector<bytes_opt>. Used in a bunch of places.	2019-10-25 12:04:44 +02:00
Kamil Braun	8ff2aebd76	cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>. Previously it returned vector<cql3::raw_value>, even though we don't use unset values when setting a UDT value (fields that are not provided become nulls; that's how C* does it). This simplifies future implementation of user_types::{value, setter}.	2019-10-25 12:04:44 +02:00
Kamil Braun	f0a3af6adc	cql3: make cql3_type::raw_ut::to_string distinguish frozenness. This is used in error messages and may be useful.	2019-10-25 12:04:44 +02:00
Kamil Braun	c89de228e3	cql3: generalize some error messages to UDTs	2019-10-25 12:04:44 +02:00
Kamil Braun	fd3bc27418	cql3: disallow non-frozen UDTs when creating secondary indexes	2019-10-25 12:04:44 +02:00
Kamil Braun	ff0bd0bb7a	cql3: check for nested non-frozen UDTs in create_type_statement.	2019-10-25 12:04:44 +02:00
Kamil Braun	adf857e9ed	cql3: add cql3_type::is_user_type. This will be used in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	6ccb1ee19f	cql3: generalize create_table_statement::raw_statement::prepare to UDTs. Check for UDT with nested non-frozen collection. Check for UDT with COMPACT STORAGE. Check for UDT inside PRIMARY KEY.	2019-10-25 12:04:44 +02:00
Kamil Braun	a8c7670722	types: add multi_cell field to user_type_impl. is_value_compatible_with_internal and update_user_type were generalized to the non-frozen case. For now, all user_type_impls in the code are non-multi-cell (frozen). This will be changed in future commits.	2019-10-25 12:04:44 +02:00
Kamil Braun	b904d04925	cql3: add a TODO to implement column_conditions for UDTs. This will become relevant after LWT is implemented.	2019-10-25 12:04:44 +02:00
Kamil Braun	44534a4a0a	sstables: generalize some comments to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	b38b8af0f2	schema: generalize compound_name to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	270cf2b289	query-result-set: generalize result_set_builder to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	2ada219f2c	view: generalize create_virtual_column and maybe_make_virtual to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	574e1cd514	tests: generalize timestamp_based_splitting_writer and bucket_writer to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	6da89e40df	tests: generalize random_schema.cc:generate_collection to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	0fbfb67cbb	tests: generalize mutation_test.cc summaries to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	a3a2f65fbf	types: generalize serialize_for_cql to UDTs. Also introduces a helper "linearized" function, which implements a pattern occurring in all serialize_for_cql_aux functions.	2019-10-25 12:04:44 +02:00
Kamil Braun	05d4b2e1a4	tests: generalize data_model.cc:mutation_description::build to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	338fde672a	mp_row_consumer: generalize consume_cell (kl) and consume_column (mc) to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	5e447e3250	mutation_partition_view: generalize read_collection_cell to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	90927c075a	converting_mutation_partition_applier: generalize accept_cell to UDTs.	2019-10-25 12:04:42 +02:00
Kamil Braun	d9baff0e4b	collection_mutation: generalize collection_mutation.cc:difference to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	a344019b25	collection_mutation: generalize collection_mutation_view::last_update to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	691f00408d	collection_mutation: generalize merge to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	7f5cd8e8ce	collection_mutation: generalize collection_mutation_view_description::materialize to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	20b42b1155	collection_mutation: generalize collection_mutation_view::is_any_live to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	323370e4ba	collection_mutation: generalize deserialize_collection_mutation to UDTs.	2019-10-25 10:49:19 +02:00
Kamil Braun	393974df3b	cql3: make {lists,maps,sets}::value::from_serialized take const {}_type&. This will simplify the code a bit where from_serialized is used after switching to visitors. Also reduces the number of shared_ptr copies.	2019-10-25 10:49:19 +02:00
Kamil Braun	4327bba0db	types: introduce `(de)serialize_field_index` functions. These functions are used to translate field indices, which are used to identify fields inside UDTs, from/to a serialized representation to be stored inside sstables and mutations. They do it in a way that is compatible with C*.	2019-10-25 10:49:19 +02:00
Kamil Braun	90d05eb627	cql3: reject too long user-defined types	2019-10-25 10:49:19 +02:00
Kamil Braun	0f8f950b74	cql3: optimize multi_item_terminal::get_elements(). Now it returns const std::vector<bytes_opt>& instead of std::vector<bytes_opt>.	2019-10-25 10:49:19 +02:00
Kamil Braun	4374982de0	types: collection_type_impl::to_value becomes serialize_for_cql. The purpose of collection_type_impl::to_value was to serialize a collection for sending over CQL. The corresponding function in origin is called serializeForNativeProtocol, but the name is a bit lengthy, so I settled for serialize_for_cql. The method now became a free-standing function, using the visit function to perform a dispatch on the collection type instead of a virtual call. This also makes it easier to generalize it to UDTs in future commits. Remove the old serialize_for_native_protocol with a FIXME: implement inside. It was already implemented (to_value), just called differently. remove dead methods: enforce_limit and serialized_values. The corresponding methods in C* are auxiliary methods used inside serializeForNativeProtocol. In our case, the entire algorithm is wholly written in serialize_for_cql.	2019-10-25 10:49:19 +02:00
Kamil Braun	e5c0a992ef	cql3: make cql3_type::raw::to_string private. It only needs to be used in operator<<, which is a friend of cql3_type::raw.	2019-10-25 10:42:58 +02:00
Kamil Braun	ff4d857a9d	cql3: remove a dynamic_pointer_cast to user_type_impl. There exists a method to check if something is a user type: is_user_type(); use it instead.	2019-10-25 10:42:58 +02:00
Kamil Braun	d8f8908d34	types: introduce user_type_impl::idx_of_field method. Each field of a user type has its index inside the type. This method allows to find it easily, which is needed in a bunch of places.	2019-10-25 10:42:58 +02:00
Kamil Braun	c77643a345	cql3: make cql3_type::_frozen protected. Add is_frozen() method. No one modifies _frozen from the outside. Moving the field to `protected` makes it harder to introduce bugs.	2019-10-25 10:42:58 +02:00
Kamil Braun	d83ebe1092	collection_mutation: move collection_type_impl::difference to collection_mutation.hh.	2019-10-25 10:42:58 +02:00
Kamil Braun	7e3bbe548c	collection_mutation: move collection_type_impl::merge to collection_mutation.hh.	2019-10-25 10:42:58 +02:00
Kamil Braun	a41277a7cd	collection_mutation: move collection_type_impl::last_update to collection_mutation_view	2019-10-25 10:42:58 +02:00
Kamil Braun	30802f5814	collection_mutation: move collection_type_impl::is_any_live to collection_mutation_view	2019-10-25 10:42:58 +02:00
Kamil Braun	e16ba76c2e	collection_mutation: move collection_type_impl::is_empty to collection_mutation_view.	2019-10-25 10:42:58 +02:00
Kamil Braun	bbdb438d89	collection_mutation: easier (de)serialization of collection_mutation(s). `collection_type_impl::serialize_mutation_form` became `collection_mutation(_view)_description::serialize`. Previously callers had to cast their data_type down to collection_type to use serialize_mutation_form. Now it's done inside `serialize`. In the future `serialize` will be generalized to handle UDTs. `collection_type_impl::deserialize_mutation_form` became a free standing function `deserialize_collection_mutation` with similar benefits. Actually, no one needs to call this function manually because of the next paragraph. A common pattern consisting of linearizing data inside a `collection_mutation_view` followed by calling `deserialize_mutation_form` has been abstracted out as a `with_deserialized` method inside collection_mutation_view. serialize_mutation_form_only_live was removed, because it hadn't been used anywhere.	2019-10-25 10:42:58 +02:00
Kamil Braun	e4101679e4	collection_mutation: generalize constructor of collection_mutation to abstract_type. The constructor doesn't use anything specific to collection_type_impl. In the future it will also handle non-frozen user types.	2019-10-25 10:42:58 +02:00
Kamil Braun	b1d16c1601	types: move collection_type_impl::mutation(_view) out of collection_type_impl. collection_type_impl::mutation became collection_mutation_description. collection_type_impl::mutation_view became collection_mutation_view_description. These classes now reside inside collection_mutation.hh. Additional documentation has been written for these classes. Related function implementations were moved to collection_mutation.cc. This makes it easier to generalize these classes to non-frozen UDTs in future commits. The new names (together with documentation) better describe their purpose.	2019-10-25 10:19:45 +02:00
Kamil Braun	c0d3e6c773	atomic_cell: move collection_mutation(_view) to a new file. The classes 'collection_mutation' and 'collection_mutation_view' were moved to a separate header, collection_mutation.hh. Implementations of functions that operate on these classes, including some methods of collection_type_impl, were moved to a separate compilation unit, collection_mutation.cc. This makes it easier to modify these structures in future commits in order to generalize them for non-frozen User Defined Types. Some additional documentation has been written for collection_mutation.	2019-10-25 10:19:45 +02:00
Kamil Braun	c90ea1056b	Remove mutation_partition_applier. It had been replaced by partition_builder in commit `dc290f0af7`.	2019-10-25 10:19:45 +02:00
Asias He	f32ae00510	gossip: Limit number of pending gossip ACK2 messages Similar to "gossip: Limit number of pending gossip ACK messages", limit the number of pending gossip ACK2 messages in gossiper::handle_ack_msg. Fixes #5210	2019-10-25 12:44:28 +08:00
Asias He	15148182ab	gossip: Limit number of pending gossip ACK messages In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ack messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK messages can consume an unlimited amount of memory, which eventually causes OOM. To fix, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from a peer has the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node. Fixes #5210	2019-10-25 12:44:28 +08:00
Nadav Har'El	8bffb800e1	alternator: Use system_auth.roles for alternator authorization Merged patch series from Piotr Sarna: This series couples system_auth.roles with authorization routines in alternator. The `salted_hash` field, which is every user's hashed password, is used as a secret key for the signature generation in alternator. This series also adds related expiration verifications for alternator signatures. It also comes with more test cases and docs updates. Tests: alternator(local, remote), manual Piotr Sarna (11): alternator: add extracting key from system_auth.roles alternator: futurize verify_signature function alternator: move the api handler to a separate function alternator: use keys from system_auth.roles for authorization alternator: add key cache to authorization alternator-test: add a wrong password test alternator: verify that the signature has not expired alternator: add additional datestamp verification alternator-test: add tests for expired signatures docs: update alternator entry for authorization alternator-test: add authorization to README alternator-test/conftest.py \| 2 +- alternator-test/test_authorization.py \| 44 ++++++++- alternator-test/test_describe_endpoints.py \| 2 +- alternator/auth.hh \| 15 ++- alternator/server.hh \| 10 +- alternator/auth.cc \| 62 +++++++++++- alternator/server.cc \| 106 ++++++++++++--------- alternator-test/README.md \| 28 ++++++ docs/alternator/alternator.md \| 7 +- 9 files changed, 221 insertions(+), 55 deletions(-)	2019-10-23 20:51:08 +03:00
Tomasz Grabiec	e621db591e	Merge "Fix TTL serialization breakage" from Avi Commit `93270dd` changed gc_clock to be 64-bit, to fix the Y2038 problem. While 64-bit tombstone::deletion_time is serialized in a compatible way, TTLs (gc_clock::duration) were not. This patchset reverts TTL serialization to the 32-bit serialization format, and also allows opting-in to the 64-bit format in case a cluster was installed with the broken code. Only Scylla 3.1.0 is vulnerable. Fixes #4855 Tests: unit (dev)	2019-10-23 18:23:26 +02:00
Tomasz Grabiec	71720be4f7	Merge "storage_service: Reject nodetool cleanup when there is pending ranges" from Asias From Shlomi: 4 node cluster Node A, B, C, D (Node A: seed) cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node> cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node> while read is progressing Node D: nodetool decommission Node A: nodetool status node - wait for UL Node A: nodetool cleanup (while decommission progresses) I get the error on c-s once decommission ends java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated The problem is when a node gets new ranges, e.g, the bootstrapping node, the existing nodes after a node is removed or decommissioned, nodetool cleanup will remove data within the new ranges which the node just gets from other nodes. To fix, we should reject the nodetool cleanup when there is pending ranges on that node. Note, rejecting nodetool cleanup is not a full protection because new ranges can be assigned to the node while cleanup is still in progress. However, it is a good start to reject until we have full protection solution. Refs: #5045	2019-10-23 17:45:41 +02:00
Avi Kivity	2970578677	config: add configuration option for 3.1.0 heritage clusters Scylla 3.1.0 broke the serialization format for TTLs. Later versions corrected it, but if a cluster was originally installed as 3.1.0, it will use the broken serialization forever. This configuration option allows upgrades from 3.1.0 to succeed, by enabling the broken format even for later versions.	2019-10-23 18:36:35 +03:00
Avi Kivity	bf4c319399	gc_clock, serialization: define new serialization for gc_clock::duration (aka TTLs) Scylla 3.1.0 inadvertently changed the serialization format of TTLs (internally represented as gc_clock::duration) from 32-bit to 64-bit, as part of preparation for Y2038 (which comes earlier for TTLed cells). This breaks mutations transported in a mixed cluster. To fix this, we revert back to the 32-bit format, unless we're in a 3.1.0- heritage cluster, in which case we use the 64-bit format. Overflow of a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer. An assertion is added to verify this. This patch only defines a variable to indicate we're in a 3.1.0 heritage cluster, but a way to set it is left to a later patch.	2019-10-23 18:36:33 +03:00
Avi Kivity	771e028c1a	Update seastar submodule * seastar 6bcb17c964...2963970f6b (4): > Merge "IPv6 scope support and network interface impl" from Calle > noncopyable_function: do not copy uninitialized data > Merge "Move smp and smp queue out of reactor" from Asias > Consolidate posix socket implementations	2019-10-23 16:43:02 +03:00
Piotr Sarna	472e3cb4e1	alternator-test: add authorization to README The README paragraph informs about turning on authorization with: alternator-enforce-authorization: true and has a short note on how to set up the secret key for tests.	2019-10-23 15:05:39 +02:00
Piotr Sarna	280eb28324	docs: update alternator entry for authorization The document now mentions that secret keys are extracted from the system_auth.roles table.	2019-10-23 15:05:39 +02:00
Piotr Sarna	ebb0af3500	alternator-test: add tests for expired signatures The first test case ensures that expired signatures are not accepted, while the second one checks that signatures with dates that reach out too far into the future are also refused.	2019-10-23 15:05:39 +02:00
Piotr Sarna	a0a33ae4f3	alternator: add additional datestamp verification The authorization signature contains both a full obligatory date header and a shortened datestamp - an additional verification step ensures that the shortened stamp matches the full date.	2019-10-23 15:05:39 +02:00
Piotr Sarna	718cba10a1	alternator: verify that the signature has not expired AWS signatures have a 15-minute expiration policy. For compatibility, the same policy is applied to alternator requests. The policy also ensures that signatures dated more than 15 minutes into the future are treated as unsafe and thus not accepted.	2019-10-23 15:05:39 +02:00
Piotr Sarna	e90c4a8130	alternator-test: add a wrong password test The additional test case submits a request as a user that is expected to exist (in the local setup), but the provided password is incorrect. It also updates test_wrong_key_access so it uses an empty string for trying to authenticate as a nonexistent user, in order to cover more corner cases.	2019-10-23 15:05:39 +02:00
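A minimal sketch of the expiration policy in the entry above: a signature is accepted only if its date lies within 15 minutes of the current time, in either direction. The names `SIGNATURE_WINDOW` and `signature_time_is_acceptable` are hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

# The 15-minute window comes from the commit message; the name is hypothetical.
SIGNATURE_WINDOW = timedelta(minutes=15)


def signature_time_is_acceptable(signed_at: datetime, now: datetime) -> bool:
    """Accept only signatures dated within 15 minutes of `now`, past or future."""
    return abs(now - signed_at) <= SIGNATURE_WINDOW
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check easy to test with fixed timestamps.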
Piotr Sarna	524b03dea5	alternator: add key cache to authorization In order to avoid fetching keys from system_auth.roles system table on every request, a cache layer is introduced. And in order not to reinvent the wheel, the existing implementation of loading_cache with max size 1024 and a 1 minute timeout is used.	2019-10-23 15:05:39 +02:00
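The caching idea in the entry above can be illustrated with a small Python sketch. It is not Scylla's `loading_cache`; the class `KeyCache`, its parameters, and the least-recently-used eviction policy are assumptions made for illustration. Only the size limit of 1024 and the 1 minute timeout come from the commit message.

```python
import time
from collections import OrderedDict
from typing import Callable


class KeyCache:
    """Tiny loading cache: bounded size, entries expire after `ttl_seconds`."""

    def __init__(self, loader: Callable[[str], str], max_size: int = 1024,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # user -> (load time, key)

    def get(self, user: str) -> str:
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(user)  # mark as most recently used
            return entry[1]
        # Miss or expired entry: call the loader (e.g. a system_auth.roles query).
        value = self._loader(user)
        self._entries[user] = (now, value)
        self._entries.move_to_end(user)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

With this shape, repeated requests from the same user within a minute trigger a single load instead of one table read per request.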
Piotr Sarna	6dee7737d7	alternator: use keys from system_auth.roles for authorization Instead of having a hardcoded secret key, the server now verifies an actual key extracted from system_auth.roles system table. This commit comes with a test update - instead of 'whatever':'whatever', the credentials used for a local run are 'alternator':'secret_pass', which matches the initial contents of system_auth.roles table, which acts as a key store. Fixes #5046	2019-10-23 15:05:39 +02:00
Piotr Sarna	388b492040	alternator: move the api handler to a separate function The lambda used for handling the api request has grown a little bit too large, so it's moved to a separate method. Along with it, the callbacks are now remembered inside the class itself.	2019-10-23 15:05:39 +02:00
Piotr Sarna	a93cf12668	alternator: futurize verify_signature function The verify_signature utility will later be coupled with Scylla authorization. In order to prepare for that, it is first transformed into a function that returns future<>, and it also becomes a member of class server. It becomes a member function because that will make it easier to implement a server-local key cache.	2019-10-23 15:05:39 +02:00
Piotr Sarna	dc310baa2d	alternator: add extracting key from system_auth.roles As a first step towards coupling alternator authorization with Scylla authorization, a helper function for extracting the key (salted_hash) belonging to the user is added.	2019-10-23 15:05:39 +02:00
Asias He	f876580740	storage_service: Reject nodetool cleanup when there is pending ranges From Shlomi: 4 node cluster Node A, B, C, D (Node A: seed) cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node> cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node> while read is progressing Node D: nodetool decommission Node A: nodetool status node - wait for UL Node A: nodetool cleanup (while decommission progresses) I get the error on c-s once decommission ends java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated The problem is that when a node gets new ranges (e.g., a bootstrapping node, or the existing nodes after a node is removed or decommissioned), nodetool cleanup will remove data within the new ranges that the node has just received from other nodes. To fix this, we should reject nodetool cleanup when there are pending ranges on that node. Note that rejecting nodetool cleanup is not full protection, because new ranges can be assigned to the node while cleanup is still in progress. However, it is a good start to reject until we have a full protection solution. Refs: #5045	2019-10-23 19:20:36 +08:00
Asias He	a39c8d0ed0	Revert "storage_service: remove storage_service::_is_bootstrap_mode." It will be needed by "storage_service: Reject nodetool cleanup when there is pending ranges" This reverts commit `dbca327b46`.	2019-10-23 19:20:36 +08:00
Raphael S. Carvalho	fc120a840d	compaction: dont rely on undefined behavior when making garbage collected writer Argument evaluation order is UB, so it's not guaranteed that c->make_garbage_collected_sstable_writer() is called before compaction is moved to run(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>	2019-10-23 11:04:51 +03:00
Benny Halevy	3b3611b57a	mutation_diff: standard input support Also, now that the file name is properly quoted, it may contain space characters. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-23 08:29:58 +03:00
Benny Halevy	6feb4d5207	mutation_diff: accept diff_command option To support using other diff tools than colordiff Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-23 08:29:47 +03:00
Tomasz Grabiec	dfac542466	Merge "extend multi-cell list & set type support" from Kostja Make it possible to compare multi-cell lists and sets serialized as maps with literal values and serialize them to network using a standard format (vector of values). This is a pre-requisite patch for column condition evaluation in light-weight transactions.	2019-10-23 07:39:57 +03:00
Nadav Har'El	774f8aa4b8	docs/debugging.md: add guide on how to debug cores Merged patch series from Botond Dénes: This series extends the existing docs/debugging.md with a detailed guide on how to debug Scylla coredumps. The intended target audience is developers who are debugging their first core, hence the level of detail (hopefully enough). That said, this should be just as useful for seasoned debuggers quickly looking up some snippet they can't remember exactly. A Throubleshooting chapter is also added in this series for commonly-met problems. I decided to create this guide after having struggled myself for more than a day on just opening(!) a coredump that was produced on Ubuntu. As my main source, I used the How-to-debug-a-coredump page from the internal wiki, which contains much useful information on debugging coredumps; however, I found it to be missing some crucial information, as well as being very terse, thus being primarily useful for experienced debuggers who can fill in the blanks. The reason I'm not extending said wiki page is that I think this information should not be hidden in some internal wiki page. Also, docs/debugging.md now seems to be a much better base for such a document. This document was started as a comprehensive debugging manual for beginners (but not only for them). You will notice that the information on how to debug cores from CentOS/Redhat is quite sparse. This is because I have no experience with such cores, so for now the respective chapters are just stubs. I intend to complete them in the future after having gained the necessary experience and knowledge; however, those in possession of said knowledge are more than welcome to send a patch. 
:) Botond Dénes (4): docs/debugging.md: demote 'Starting GDB' and 'Using GDB' docs/debugging.md: fix formatting issues docs/debugging.md: add 'Debugging coredumps' subchapter docs/debugging.md: add 'Throubleshooting' subchapter docs/debugging.md \| 240 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 228 insertions(+), 12 deletions(-)	2019-10-23 07:39:57 +03:00
Rafael Ávila de Espíndola	b3372be679	install-dependencies: Add Lua Add lua as a dependency in preparation for UDF. This is the first patch since it has to go in before to allow for a frozen toolchain update. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> [avi: update frozen toolchain image] Message-Id: <20191018231442.11864-2-espindola@scylladb.com>	2019-10-23 07:39:57 +03:00
Konstantin Osipov	a30c08e04e	lwt: support for multi-cell set & list value serialization	2019-10-22 17:40:42 +03:00
Piotr Jastrzebski	eb8ae06ced	cdc: Return db_context::builder by reference from its with_* functions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-22 17:13:43 +03:00
Konstantin Osipov	605755e3f6	lwt: support for multi-cell map & list comparison with literal values Multi-cell lists and maps may be stored in different formats: as sorted vectors of pairs of values, when retrieved from storage, or as sorted vectors of values, when created from parser literals or supplied as parameter values. Implement a specialized compare for use when receiver and parameter representation don't match. Add helpers.	2019-10-22 17:07:33 +03:00
Raphael S. Carvalho	3b6583990d	sstables: Fix sluggish backlog controller with incremental compaction The problem is that backlog tracker is not being updated properly after incremental compaction. When replacing sstables earlier, we tell backlog tracker that we're done with exhausted sstables[1], but we don't tell it about the new, sealed sstables created that will replace the exhausted ones. [1]: exhausted sstable is one that can be replaced earlier by compaction. We need to notify backlog tracker about every sstable replacement which was triggered by incremental compaction. Otherwise, backlog for a table that enables incremental compaction will be lower than it actually should be. That's because new sstables being tracked as partial decrease the backlog, whereas the exhausted ones increase it. The formula for a table's backlog is basically: backlog(sstable set + compacting(1) - partial(2)) (1) compacting includes all compaction's input sstables, but the exhausted ones are removed from it (correct behavior). (2) partial includes all compaction's output sstables, but the ones that replaced the exhausted sstables aren't removed from it (incorrect behavior). This problem is fixed by making backlog tracker fully aware of the early replacement, not only the exhausted sstables, but also the new sstables that replaced the exhausted ones. The new sstables need to be moved inside the tracker from partial state to the regular one. Fixes #5157. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>	2019-10-22 16:19:57 +03:00
Vladimir Davydov	6c6689f779	cql: refactor statement accounting Rather than passing a pointer to a cql_stats member corresponding to the statement type, pass a reference to a cql_stats object and use statement_type, which is already stored in modification_statement, for determining which counter to increment. This will allow us to account conditional statements, which will have a separate set of counters, right in modification_statement::execute() - all we'll need to do is add the new counters and bump them in case execute_with_condition is called. While we are at it, remove extra inclusions from statement_type.hh so as not to introduce any extra dependencies for cql_stats.hh users. Message-Id: <20191022092258.GC21588@esperanza>	2019-10-22 12:39:14 +03:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlined into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commitlog. Tests: unit (debug) Avi Kivity (4): managed_ref: add get() accessor managed_ref: add external_memory_usage() mutation_partition: introduce lazy_row mutation_partition: make static_row optional to reduce memory footprint cell_locking.hh \| 2 +- converting_mutation_partition_applier.hh \| 4 +- mutation_partition.hh \| 284 ++++++++++++++++++++++- partition_builder.hh \| 4 +- utils/managed_ref.hh \| 12 + flat_mutation_reader.cc \| 2 +- memtable.cc \| 2 +- mutation_partition.cc \| 45 +++- mutation_partition_serializer.cc \| 2 +- partition_version.cc \| 4 +- tests/multishard_mutation_query_test.cc \| 2 +- tests/mutation_source_test.cc \| 2 +- tests/mutation_test.cc \| 12 +- tests/sstable_mutation_test.cc \| 10 +- 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	bc03b0fd47	Merge "Some refactoring of node startup code" from Kamil " The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand. This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does. I did it to help me prepare for implementing generation and gossiping of CDC streams. " * 'bootstrap-refactors' of https://github.com/kbr-/scylla: storage_service: more comments in join_token_ring db: remove system_keyspace::update_local_tokens db: improve documentation for update_tokens and get_saved_tokens in system_keyspace storage_service: remove storage_service::_is_bootstrap_mode. storage_service: simplify storage_service::bootstrap method storage_service: fix typo in handle_state_moving storage_service: remove unnecessary use of stringstream storage_service: remove redundant call to update_tokens during join_token_ring storage_service: remove storage_service::set_tokens method. storage_service: remove is_survey_mode storage_service::handle_state_normal: tokens_to_update* -> owned_tokens storage_service::handle_state_normal: remove local_tokens_to_remove db::system_keyspace::update_tokens: take tokens by const ref db::system_keyspace::prepare_tokens: make static, take tokens by const ref token_metadata::update_normal_tokens: take tokens by const ref	2019-10-22 12:11:11 +03:00
Asias He	0a52ecb6df	gossip: Fix max generation drift measure Assume n1 and n2 in a cluster with generation number g1, g2. The cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1 reboots with generation g1' which is time based, n2 will see g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update. To fix, check the generation drift with generation value this node would get if this node were restarted. This is a backport of CASSANDRA-10969. Fixes #5164	2019-10-21 20:20:55 +02:00
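The scenario in the entry above can be reproduced with a small sketch. The function names are hypothetical, and the value of `MAX_GENERATION_DIFFERENCE` (one year in seconds) is inferred from the commit message rather than copied from the code.

```python
# One year in seconds, per the "more than 1 year" wording in the commit message.
MAX_GENERATION_DIFFERENCE = 86400 * 365


def old_check_rejects(remote_generation: int, local_generation: int) -> bool:
    """Pre-fix behaviour: compare against the generation this node started with."""
    return remote_generation > local_generation + MAX_GENERATION_DIFFERENCE


def new_check_rejects(remote_generation: int, now_seconds: int) -> bool:
    """Post-fix behaviour: compare against the generation this node would get
    if it were restarted now (generations are time based)."""
    return remote_generation > now_seconds + MAX_GENERATION_DIFFERENCE
```

For a node n2 that started two years ago, a freshly rebooted n1 has a generation close to the current time, so the old check rejects it while the new one accepts it.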
Kamil Braun	f1c26bf5c9	storage_service: more comments in join_token_ring Explain why a call to update_normal_tokens is needed.	2019-10-21 11:11:03 +02:00
Kamil Braun	fb1e35f032	db: remove system_keyspace::update_local_tokens That was dead code.	2019-10-21 11:11:03 +02:00
Kamil Braun	1b0c8e5d99	db: improve documentation for update_tokens and get_saved_tokens in system_keyspace	2019-10-21 11:11:03 +02:00
Kamil Braun	dbca327b46	storage_service: remove storage_service::_is_bootstrap_mode. The flag did nothing. It was used in one place to check if there's a bug, but it can easily be proven by reading the code that the check would never pass.	2019-10-21 11:11:03 +02:00
Kamil Braun	b757a19f84	storage_service: simplify storage_service::bootstrap method The storage_service::bootstrap method took a parameter: tokens to bootstrap with. However, this method is only called in one place (join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't make sense to call this method anywhere else with any other parameter. This commit also adds a comment explaining what the method does and moves it into the private section of storage_service.	2019-10-21 11:11:03 +02:00
Kamil Braun	84b41bd89b	storage_service: fix typo in handle_state_moving	2019-10-21 11:11:03 +02:00
Kamil Braun	2ff4f9b8f4	storage_service: remove unnecessary use of stringstream	2019-10-21 11:11:03 +02:00
Kamil Braun	06cc7d409d	storage_service: remove redundant call to update_tokens during join_token_ring When a non-seed node was bootstrapping, system_keyspace::update_tokens was called twice: first right after the tokens were generated (or received if we were replacing a different node) in the call to `bootstrap`, and then later in join_token_ring. The second call was redundant. The join_token_ring call was also redundant if we were not bootstrapping and had tokens saved previously (e.g. when restarting). In that case we would have read them from LOCAL and then save the same tokens again. This commit removes the redundant call and inserts calls to update_tokens where they are necessary, when new tokens are generated. The aim is to make the code easier to understand. It also adds a comment which explains why the tokens don't need to be generated in one of the cases.	2019-10-21 11:11:03 +02:00
Kamil Braun	a223864f81	storage_service: remove storage_service::set_tokens method. After commit `36ccf72f3c`, this method was used only in one place. Its name did not make it obvious what it does and when it is safe to call it. This commit pulls out the code from set_tokens to the point where it was called (join_token_ring). The code is only possible to understand in context. This code was also saving the tokens to the LOCAL table before retrieving them from this table again. There is no point in doing that: 1. there are no races, since when join_token_ring is running, it is the only function which can call system_keyspace::update_tokens (which saves them to the LOCAL table). There cannot be multiple instances of join_token_ring. 2. Even if there were a race, this wouldn't fix anything. The tokens we retrieve from LOCAL by calling get_local_tokens().get0() could already be different in the LOCAL table when the get0() returns.	2019-10-21 11:09:59 +02:00
Kamil Braun	36ccf72f3c	storage_service: remove is_survey_mode That was dead, untested code, making it unnecessarily hard to implement new features.	2019-10-21 10:38:49 +02:00
Kamil Braun	602c7268cc	storage_service::handle_state_normal: tokens_to_update* -> owned_tokens Replace the two variables: tokens_to_update_in_metadata tokens_to_update_in_system_keyspace which were exactly the same, with one variable owned_tokens. The new name describes what the variable IS instead of what it's used for. Add a comment to clarify what "owned" means: those are the tokens the node chose and any collision was resolved positively for this node. Move the variable definition further down in the code, where it's actually needed.	2019-10-21 10:38:49 +02:00
Kamil Braun	2db07c697f	storage_service::handle_state_normal: remove local_tokens_to_remove That was dead code. Removing tokens is handled inside remove_endpoint, using the endpoints_to_remove set.	2019-10-21 10:38:49 +02:00
Kamil Braun	8c8a17a0fe	db::system_keyspace::update_tokens: take tokens by const ref	2019-10-21 10:38:49 +02:00
Kamil Braun	00dcea3478	db::system_keyspace::prepare_tokens: make static, take tokens by const ref	2019-10-21 10:38:49 +02:00
Kamil Braun	e4ac4db1c5	token_metadata::update_normal_tokens: take tokens by const ref	2019-10-21 10:38:45 +02:00
Nadav Har'El	765dc86de4	Fix legacy token column handling for local indexes Merged patch series from Piotr Sarna: Calculating the select statement for given view_info structure used to work fine, but once local indexes were introduced, a subtle bug appeared: the legacy token column does not exist in local indexes and a valid clustering key column was omitted instead. That results in potentially incorrect partition slices being used later in read-before-write. There's a long term plan for removing select_statement from view info altogether, but nonetheless the bug needs to be fixed first. Branch: master, 3.1 Tests: unit(dev) + manual confirmation that a correct legacy column is picked	2019-10-20 16:04:40 +03:00
Nadav Har'El	631846a852	CDC: Implement minimal version that logs only primary key of each change Merge a patch series from Piotr Jastrzębski (haaawk): This PR introduces CDC in its minimal version. It is now possible to create a table with CDC enabled or to enable/disable CDC on an existing table. The CDC log and description are managed when CDC is enabled or disabled for a table. For now only the primary key of the changed data is logged. To be able to co-locate cdc streams with related base table partitions, it was necessary to propagate the information about the number of shards per node. This was done through gossip. There is an assumption that all the nodes use the same value for sharding_ignore_msb_bits. If it does not hold we would have to gossip sharding_ignore_msb_bits around together with the number of shards. Fixes #4986. Tests: unit(dev, release, debug)	2019-10-20 11:41:01 +03:00
Botond Dénes	4aa734f238	scylla-gdb.py: scylla generate_object_graph: use correct obj in edges Currently, the function that generates the graph edges (and vertices) with a breadth-first traversal of the object graph accidentally uses the object that is the starting point of the graph as the "to" part of each edge. This results in the graph having each of its edges point to the starting point, as if all objects in it referenced said object directly. Fix by using the object of the currently examined object. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>	2019-10-18 13:48:20 +02:00
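A minimal sketch of the traversal described in the entry above, outside of GDB. The callback `find_referrers` is a hypothetical stand-in for whatever lookup returns the objects referencing a given object; the point is only which object ends up as the "to" side of each edge.

```python
from collections import deque


def generate_object_graph(start, find_referrers):
    """Breadth-first traversal returning (vertices, edges), where each edge is
    a (referrer, referenced) pair."""
    vertices = {start}
    edges = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for referrer in find_referrers(current):
            # The bug described above would record (referrer, start) here,
            # making every edge point at the starting object.
            edges.append((referrer, current))
            if referrer not in vertices:
                vertices.add(referrer)
                queue.append(referrer)
    return vertices, edges
```

For a chain where `b` references `a` and `c` references `b`, the correct edges are `(b, a)` and `(c, b)`; the buggy variant would have produced `(b, a)` and `(c, a)`.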
Botond Dénes	4dff50b7a4	docs/debugging.md: add 'Throubleshooting' subchapter To the 'Debugging Scylla with GDB' chapter.	2019-10-18 10:08:23 +03:00
Botond Dénes	77ea086975	docs/debugging.md: add 'Debugging coredumps' subchapter To the 'Debugging Scylla with GDB' chapter. The '### Debugging relocatable binaries built with the toolchain' subchapter is demoted to be just a section in this new subchapter. It is also renamed to 'Relocatable binaries'. This subchapter intends to be a complete guide on how to debug coredumps, from how to obtain the correct version of all the binaries all the way to how to correctly open the core with GDB.	2019-10-18 10:08:23 +03:00
Pekka Enberg	f01d0e011c	Update seastar submodule * seastar e888b1df...6bcb17c9 (4): > iotune: don't crash in sequential read test if hitting EOF > Remove FindBoost.cmake from install files > Merge "Move reactor backend out of reactor" from Asias > fair_queue: Add fair_queue.cc	2019-10-18 08:45:22 +03:00
Piotr Jastrzebski	2b26e3c904	test: change test_partition_key_logging to test_primary_key_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	997be35ef3	modification_statement: log in cdc clustering key of a change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	d8718a4ffc	test: add test_partition_key_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	96c800ed0b	modification_statement: log in cdc partition key of a change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	a1edb68b16	test: check that alter table with cdc manages log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	a45c894032	alter_table_statement: handle 'with cdc =' Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	629cdb5065	test: check that drop table with cdc removes log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	57c3377b1f	cql_test_env: add require_table_does_not_exist assertion Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	50d53cd43e	drop_table_statement: remove cdc log and desc if cdc is enabled Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	b9d6635fc5	test: check that create table with cdc sets up log and desc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:23 +02:00
Piotr Jastrzebski	81a34168a3	create_table_statement: handle 'with cdc =' Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 11:28:14 +02:00
Piotr Jastrzebski	6e29f5e826	create_table_statement: prepare announce_migration for cdc This patch wraps announce_migration logic into a lambda that will be used both when cdc is used and when it's not. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	a9e43f4e86	test: add test_with_cdc_parameter At the moment, this test only checks that table creation and alteration sets cdc_options property on a table correctly. Future patches will extend this test to cover more CDC aspects. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	8c6d860402	cql3: add cdc table property Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	386221da84	schema_tables: handle 'cdc' options cdc options will be stored in scylla_tables to preserve compatibility with Cassandra. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	8df942a320	schema_builder: handle schema::_cdc_options Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	ca9536a771	schema: add _cdc_options field Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	f079dce7b1	snitch: Provide getter for ignore_msb_bits of an endpoint Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	afe520ad77	gossip: Add application_state::IGNORE_MSB_BITS We would like to share with other nodes the value of ignore_msb_bits property used by the node. This is needed because CDC will operate on streams of changes. Each shard on each node will have its own stream that will be identified by a stream_id. Stream_id will be selected in such a way that using stream_id as partition key will locate partition identified by stream_id on a node and shard that the stream belongs to. To be able to generate such stream_id we need to know ignore_msb_bits property value for each node. IMPORTANT NOTE: At this point CDC does not support topology changes. It will work only on a stable cluster. Support for topology modifications will be added in later steps. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	b9d5851830	snitch: Provide getter for shard_count of an endpoint Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	a66d7cfe57	gossip: Add application_state::SHARD_COUNT We would like to share with other nodes the number of shards available at the node. This is needed because CDC will operate on streams of changes. Each shard on each node will have its own stream that will be identified by a stream_id. Stream_id will be selected in such a way that using stream_id as partition key will locate partition identified by stream_id on a node and shard that the stream belongs to. To be able to generate such stream_id we need to know how many shards are on each node. IMPORTANT NOTE: At this point CDC does not support topology changes. It will work only on a stable cluster. Support for topology modifications will be added in later steps. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Piotr Jastrzebski	f7ce8e4f2b	cdc: Add flag guarding its usage At first, CDC will only be enabled when experimental flag is on. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-10-17 10:55:31 +02:00
Tomasz Grabiec	d7c3e48e8c	Merge "Prepare modification_statement for LWT" from Kostja Refactor modification_statement to enable lightweight transaction implementation. This patch set re-arranges logic of modification_statement::get_mutations() and uses a column mask to identify the columns to prefetch. It also pre-computes a few modification statement properties at prepare, assuming the prepared statement is invalidated if the underlying schema changes.	2019-10-17 10:51:00 +02:00
Konstantin Osipov	5d3bf03811	lwt: pre-compute modification_statement properties at prepare They are used more extensively with introduction of lightweight transactions, and pre-computing makes it easier to reason about complexity of the scenarios where they are involved.	2019-10-16 22:44:44 +03:00
Konstantin Osipov	6e0f76ea60	lwt: use column mask to build partition_slice Pre-compute column mask of columns to prefetch when preparing a modification statement and use it to build partition_slice object for read command. Fetch only the required columns. Lightweight transactions build on this by adding the columns used in conditions and in the CAS result set to the column mask of columns to read. Batch statements unite all column masks to build a single relation for all rows modified by conditional statements of a batch.	2019-10-16 22:44:37 +03:00
Konstantin Osipov	f32a7a0763	lwt: move option set for modification statement read command Move the option set for read command to update_parameters class, since this class encapsulates the logic of working with the read command result.	2019-10-16 22:41:00 +03:00
Konstantin Osipov	c0f0ab5edd	lwt: introduce column mask Introduce a bitset container which can be used to compute all columns used in a query. Add a partition_slice constructor which uses the bitset.	2019-10-16 22:40:55 +03:00
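The column mask idea can be sketched with a Python integer used as a bitset. The class `ColumnMask` and its methods are hypothetical and do not mirror the C++ container; the sketch assumes each column is identified by a small non-negative ordinal.

```python
class ColumnMask:
    """Bitset of column ordinals, backed by an arbitrary-precision integer."""

    def __init__(self) -> None:
        self._bits = 0

    def set(self, ordinal_id: int) -> None:
        """Mark the column with this ordinal as used."""
        self._bits |= 1 << ordinal_id

    def test(self, ordinal_id: int) -> bool:
        """Return True if the column with this ordinal is marked."""
        return bool((self._bits >> ordinal_id) & 1)

    def union(self, other: "ColumnMask") -> "ColumnMask":
        """Combine two masks, e.g. for several statements of one batch."""
        result = ColumnMask()
        result._bits = self._bits | other._bits
        return result

    def ordinals(self) -> list:
        """List the marked ordinals in ascending order."""
        return [i for i in range(self._bits.bit_length()) if self.test(i)]
```

A union of per-statement masks yields the set of columns a single read would need to fetch for all of them.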
Konstantin Osipov	a00b9a92b3	lwt: refactor modification statement get_mutations() Refactor get_mutations() so that the read command and apply_updates() functions can be used in lightweight transactions. Move read_command creation to an own method, as well as apply_updates(). Rewrite get_mutations() using the new API. Avoid unnecessary shared pointers.	2019-10-16 22:32:51 +03:00
Tomasz Grabiec	7b7e4be049	Merge "lwt: introduce column_definition::ordinal_id" from Kostja Introduce a column definition ordinal_id and use it in boosted update_parameters::prefetch_data as a column index of a full row. Lightweight transactions prefetch data and return a result set. Make sure update_parameters::prefetch_data can serve as a single representation of prefetched list cells as well as condition cells and as a CAS result set. I have a lot of plans for column_definition::ordinal_id, it simplifies a lot of operations with columns and will also be used for building a bitset of columns used in a query or in multiple queries of a batch.	2019-10-16 15:11:10 +02:00
Konstantin Osipov	a2b629c3a1	lwt: boost update_parameters to serve as a CAS result set In modification_statement/batch_statement, we need to prefetch data to 1) apply list operations 2) evaluate CAS conditions 3) return CAS result set. Boost update_parameters::prefetch_data to serve as a single result set for all of the above. In case of a batch, store multiple rows for multiple clustering keys involved in the batch. Use an ordered set for columns and rows to make sure 3) CAS result set is returned to the client in an ordered manner. Deserialize the primary key and add it to result set rows since it is returned to the client as part of CAS result set. Index columns using ordinal_id - this allows having a single set for all columns and makes columns easy to look up. Remove an extra memcpy to build view objects when looking up a cell by primary key, use partition_key/clustering_key objects for lookup.	2019-10-16 15:56:50 +03:00
Konstantin Osipov	a450c25946	lwt: remove dead code in cql3/update_parameters.hh	2019-10-16 15:48:40 +03:00
Konstantin Osipov	a4ccbece5c	lwt: remove an unnecessary optional around prefetch_data Get rid of an unnecessary optional around update_parameters::prefetch_data. update_parameters won't own prefetch_data in the future anyway, since prefetch_data can be shared among multiple modification statements of a batch, each statement having its own options and hence its own update_parameters instance.	2019-10-16 15:48:25 +03:00
Konstantin Osipov	7a399ebe0d	lwt: move prefetch_data_builder to update_parameters.cc Move prefetch_data_builder class from modification_statement.cc to update_parameters.cc. We're going to share the same builder to build a result set for condition evaluation and to apply updates of batch statements, so we need to share it. No other changes.	2019-10-16 15:48:08 +03:00
Konstantin Osipov	fa73421198	lwt: introduce column_definition::ordinal_id Make sure every column in the schema, be it a column of partition key, clustering key, static or regular one, has a unique ordinal identifier. This makes it easy to compute the set of columns used in a query, as well as index row cells. Allow to get column definition in schema by ordinal id.	2019-10-16 15:46:25 +03:00
Avi Kivity	543e6974b9	Merge "Fix Incremental Compaction Efficiency" from Raphael " Incremental compaction code to release exhausted sstables was inefficient because it was basically preventing any release from ever happening. So a new solution is implemented to make the incremental compaction approach actually efficient while being cautious about not introducing data resurrection. This solution consists of storing GC'able tombstones in a temporary sstable and keeping it till the end of compaction. Overhead is avoided by not enabling it for strategies that don't work with runs composed of multiple fragments. Fixes #4531. tests: unit, longevity 1TB for incremental compaction " * 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla: tests: Check that partition is not resurrected on compaction failure tests: Add sstable compaction test for gc-only mutation compactor consumer sstables: Fix Incremental Compaction Efficiency	2019-10-16 15:15:53 +03:00
Tomasz Grabiec	054b53ac06	Merge "Introduce scylla generate_object_graph and improve scylla find and scylla fiber" from Botond Introduce `scylla generate_object_graph`, a command which generates a visual object graph, where vertices are objects and edges are references. The graph starts from the object specified by the user. The graph allows visual inspection of the object graph and hopefully allows the user to identify the object in question. Add the `--resolve` flag to `scylla find`. When specified, `scylla find` will attempt to resolve the first pointer in the found objects as a vtable pointer. If successful the pointer as well as the resolved symbol will be added to the listing. In the listing of `scylla fiber` also print the starting task (as the first item).	2019-10-15 20:11:16 +02:00
Tomasz Grabiec	c76f905497	Merge "scylla-gdb.py: improve the toolbox for investigating OOMs (but not just)" from Botond This mini-series contains assorted improvements that I found very useful while debugging OOM crashes in the past weeks: * A wrapper for `std::list`. * A wrapper for `std::variant`. * Making `scylla find` usable from python code. * Improvements to `scylla sstables` and `scylla task_histogram` commands. * The `$downcast_vptr()` convenience function. * The `$dereference_lw_shared_ptr()` convenience function. Convenience functions in gdb are similar to commands, with some key differences: * They have a defined argument list. * They can return values. * They can be part of any gdb expression in which functions are allowed. This makes them very useful for doing operations on values and then returning them so that the developer can use them in the gdb shell.	2019-10-15 19:54:09 +02:00
Avi Kivity	acc433b286	mutation_partition: make static_row optional to reduce memory footprint The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by using lazy_row instead of row. Some call sites treewide were adjusted to deal with the extra indirection. perf_simple_query appears to improve by 2%, from 163 krps to 165 krps, though it's hard to be sure due to noisy measurements. memory_footprint comparisons (before/after): mutation footprint: mutation footprint: - in cache: 1096 - in cache: 992 - in memtable: 854 - in memtable: 750 - in sstable: 351 - in sstable: 351 - frozen: 540 - frozen: 540 - canonical: 827 - canonical: 827 - query result: 342 - query result: 342 sizeof(cache_entry) = 112 sizeof(cache_entry) = 112 -- sizeof(decorated_key) = 36 -- sizeof(decorated_key) = 36 -- sizeof(cache_link_type) = 32 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 200 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 112 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 sizeof(atomic_cell_or_collection) = 8 Tests: unit (dev)	2019-10-15 15:42:05 +03:00
Avi Kivity	88613e6882	mutation_partition: introduce lazy_row lazy_row adds indirection to the row class, in order to reduce storage requirements when the row is not present. The intent is to use it for the static row, which is not present in many schemas, and is often not present in writes even in schemas that have a static row. Indirection is done using managed_ref, which is lsa-compatible. lazy_row implements most of row's methods, and a few more: - get(), get_existing(), and maybe_create(): bypass the abstraction and the underlying row - some methods that accept a row parameter also have an overload with a lazy_row parameter	2019-10-15 15:42:05 +03:00
Avi Kivity	efe8fa6105	managed_ref: add external_memory_usage() Like other managed containers, add external_memory_usage() so we can account for a partition's memory footprint in memtable/cache.	2019-10-15 15:41:42 +03:00
Botond Dénes	71923577a4	docs/debugging.md: fix formatting issues	2019-10-15 14:40:24 +03:00
Botond Dénes	4babd116d8	docs/debugging.md: demote 'Starting GDB' and 'Using GDB' They really belong to the 'Introduction' chapter, instead of being separate chapters of their own.	2019-10-15 14:40:20 +03:00
Pekka Enberg	0c1dad0838	Merge "Misc documentation cleanup" from Botond "Delete README-DPDK.md, move IDL.md to docs/ and fix docs/review-checklist.md to point to scylla's coding style document, instead of seastar's." * 'documentation-cleanup/v3' of https://github.com/denesb/scylla: docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's docs: mv coding-style.md docs/ rm README-DPDK.md docs: mv IDL.md docs/	2019-10-15 12:53:49 +02:00
Pekka Enberg	b466d7ee33	Merge "Misc documentation cleanup" from Botond "Delete README-DPDK.md, move IDL.md to docs/ and fix docs/review-checklist.md to point to scylla's coding style document, instead of seastar's." * 'documentation-cleanup/v3' of https://github.com/denesb/scylla: docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's docs: mv coding-style.md docs/ rm README-DPDK.md docs: mv IDL.md docs/	2019-10-15 08:53:22 +03:00
Benny Halevy	fef3342a34	test: random_schema::make_ckeys: fix infinite loop Allow returning fewer random clustering keys than requested since the schema may limit the total number we can generate, for example, if there is only one boolean clustering column. Fixes #5161 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-10-15 08:52:39 +03:00
Botond Dénes	544f38ea6d	docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's	2019-10-15 08:23:08 +03:00
Botond Dénes	56df6fbd58	docs: mv coding-style.md docs/ It is not discoverable in its current location (root directory) due to the sheer number of source files in there.	2019-10-15 08:23:08 +03:00
Botond Dénes	c0706e52ce	rm README-DPDK.md Probably a leftover from the era when seastar and scylla shared the same git repo.	2019-10-15 08:23:01 +03:00
Botond Dénes	061ac53332	docs: mv IDL.md docs/ Documentation should be in docs/.	2019-10-15 08:21:09 +03:00
Piotr Sarna	9e98b51aaa	view: fix view_info select statement for local indexes Calculating the select statement for given view_info structure used to work fine, but once local indexes were introduced, a subtle bug appeared: the legacy token column does not exist in local indexes and a valid clustering key column was omitted instead. That results in potentially incorrect partition slices being used later in read-before-write. There's a long term plan for removing select_statement from view info altogether, but nonetheless the bug needs to be fixed first.	2019-10-14 17:14:19 +02:00
Piotr Sarna	2ee8c6f595	index: add is_global_index() utility The helper function is useful for determining if given schema represents a global index.	2019-10-14 17:13:32 +02:00
Botond Dénes	b2e10a3f2f	scylla-gdb.py: introduce scylla generate_object_graph When investigating OOMs a prominent pattern is a size class that is exploded, using up most of the available memory alone. If one is lucky, the objects causing the OOM are instances of some virtual class, making their identification easy. Other times the objects are referenced by instances of some virtual class, allowing their identification with some work. However there are cases where neither these objects nor their direct referrers are instances of virtual classes. This is the case `scylla generate_object_graph` is intended to help with. As its name suggests, `scylla generate_object_graph` generates the object graph of the requested object. The object graph is a directed graph, where vertices are objects and edges are references between them, going from referrers to referees. The vertices contain information such as the address of the object, its size, whether it is live or not and, if applicable, the address and symbol name of its vtable. The edges contain the list of offsets the referrer has references at. The generated graph is an image, which allows the visual inspection of the object graph, allowing the developer to notice patterns and hopefully identify the problematic objects. The graph is generated with the help of Graphviz. The command generates `.dot` files which can be converted to images with the help of the `dot` utility. The command can do this if the output file is one of the supported image formats (e.g. `png`), otherwise only the `.dot` file is generated, leaving the actual image generation to the user.	2019-10-14 16:21:18 +03:00
Botond Dénes	f9e8e54603	scylla-gdb.py: boost scylla find Add `--resolve` flag, which will make the command attempt to resolve the first pointer of the found objects as a vtable pointer. If this is successful the vtable pointer as well as the symbol name will be added to the listing. This in particular makes backtracing continuation chains a breeze, as the continuation object the searched one depends on can be found at a glance in the resulting listing (instead of having to manually probe each item). The arguments of `scylla find` are now parsed via `argparse`. While at it, support for all the size classes supported by the underlying `find` command was added, in addition to `w` and `g`. However, the syntax for specifying the size class has changed: it now has to be specified with the `-s\|--size` command line argument, instead of passing `-w` or `-g`.	2019-10-14 16:21:18 +03:00
Botond Dénes	0773104f32	scylla_fiber: also print the task that is the starting point of the fiber Or in other words, the task that is the argument of the search. Example: (gdb) scylla fiber 0x60001a305910 Starting task: (task) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16 #0 (task) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16 #1 (task) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16 #2 (task) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16	2019-10-14 13:36:25 +03:00
Botond Dénes	1a8846c04a	scylla-gdb.py: move the code finding text_start and text_end to get_text_range() This code is currently duplicated in `find_vptrs()` and `scylla_task_histogram`. Refactor it out into a function. The code is also improved in two ways: * Make the search stricter, ensuring (hopefully) that indeed the executable's text section is found, not that of the first object in the `gdb file` listing. * Throw an exception in the case when the search fails.	2019-10-14 13:25:28 +03:00
Raphael S. Carvalho	7f1a2156c7	table: Don't account for shared SSTables in compaction backlog tracker We don't want to add shared sstables to table's backlog tracker because: 1) table's backlog tracker only has an influence on regular compaction 2) shared sstables are never regular compacted, they're handled by resharding which has its own backlog tracker. Such sstables belong to more than one shard, meaning that currently they're added to the backlog tracker of all shards that own them. But the thing is that such sstables end up being resharded on a shard that may be completely random. So increasing the backlog of all shards such sstables belong to won't lead to faster resharding. Also, table's backlog tracker is supposed to deal only with regular compaction. Accounting for shared sstables in table's tracker may lead to incorrect speed up of regular compactions because the controller is not aware that some relevant part of the backlog is due to pending resharding. The fix is about ignoring sstables that will be resharded and letting table's backlog tracker account only for sstables that can be worked on by regular compaction, and relying on resharding controlling itself with its own tracker. NOTE: this doesn't fix the resharding controlling issue completely, as described in #4952. We'll still need to throttle regular compaction on behalf of resharding. So subsequent work may be about: - move resharding to its own priority class, perhaps streaming. - make resharding's backlog tracker account for sstables in all of its pending jobs, not only the ongoing ones (currently limited to 1 by shard). - limit compaction shares when resharding is in progress. THIS only fixes the issue in which the controller for regular compaction accounted for sstables that are exclusive to resharding. Fixes #5077. Refs #4952. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>	2019-10-13 10:14:13 +03:00
Raphael S. Carvalho	88611d41d0	sstables: Fix major compaction's space amplification with incremental compaction Incremental compaction efficiency depends on all references to compacted sstables being released, because the file descriptors of sstable components are only closed once the sstable object is destructed. Incremental compaction is not working for major compaction because references to released sstables are being kept in the compaction manager, which prevents their disk usage from being released. So the space amplification would be the same as with a non-incremental approach, i.e. it needs twice the amount of used disk space for the table(s). With this issue fixed, the database now becomes very major compaction friendly, the space requirement becoming very low, a constant which is roughly the number of fragments being currently compacted multiplied by fragment size (1GB by default), for each table involved. Fixes #5140. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>	2019-10-13 09:55:11 +03:00
Raphael S. Carvalho	17c66224f7	tests: Check that partition is not resurrected on compaction failure Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-13 00:06:51 -03:00
Raphael S. Carvalho	6301a10fd7	tests: Add sstable compaction test for gc-only mutation compactor consumer Make sure gc'able-tombstone-only sstable is properly generated with data that comes from regular compaction's input sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:38:53 -03:00
Raphael S. Carvalho	91260cf91b	sstables: Fix Incremental Compaction Efficiency Compaction prevents data resurrection from happening by checking that there's no way data shadowed by a GC'able tombstone will survive alone, after a failure for example. Consider the following scenario: We have two runs A and B, each divided into 5 fragments, A1..A5, B1..B5. They have the following token ranges: A: A1=[0, 3] A2=[4, 7] A3=[8, 11] A4=[12, 15] A5=[16,18] B is the same as A's ranges, offset by 1: B: B1=[1,4] B2=[5,8] B3=[9,12] B4=[13,16] B5=[17,19] Let's say we are finished flushing output until position 10 in the compaction. We are currently working on A3 and B3, so obviously those cannot be deleted. Because B2 overlaps with A3, we cannot delete B2 either. Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after B2 is gone, dead data in A3 could be resurrected on failure. Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2. Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so we can't delete A1. So we can't delete any fragment. The problem with this approach is obvious: fragments can potentially not be released due to data dependency, so incremental compaction efficiency is severely reduced. To fix it, let's not purge GC'able tombstones right away in the mutation compactor step. Instead, let's have compaction write them to a separate sstable run that is deleted at the end of compaction. By making sure that tombstone information from all compacting sstables is not lost, we no longer need to have incremental compaction imposing lots of restrictions on which fragments can be released. Now, any sstable whose data is safe in a new sstable can be released right away. In addition, incremental compaction will only take place if the compaction procedure is working with at least one multi-fragment sstable run. Fixes #4531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:36:03 -03:00
Kamil Braun	ef9d5750c8	view: fix bug in virtual columns. When creating a virtual column of non-frozen map type, the wrong type was used for the map's keys. Fixes #5165.	2019-10-11 20:47:06 +03:00
Avi Kivity	f12feec2c9	Update seastar submodule * seastar 1f68be436f...e888b1df9c (8): > sharded: Make map work with mapper that returns a future > cmake: Remove FindBoost.cmake > Reduce noncopyable_function instruction cache footprint > doc: add Loops section to the tutorial > Merge "Move file related code out of reactor" from Asias > Merge "Move the io_queue code out of reactor" from Asias > cmake: expose seastar_perf_testing lib > future: class doc: explain why discarding a future is bad - main.cc now includes new file io_queue.hh - perf tests now include seastar perf utilities via user, not system, includes since those are not exported	2019-10-10 18:17:28 +03:00
Nadav Har'El	33027a36b4	alternator: Add authorization Merged patch set from Piotr Sarna: Refs #5046 This commit adds handling "Authorization:" header in incoming requests. The signature sent in the authorization is recomputed server-side and compared with what the client sent. In case of a mismatch, UnrecognizedClientException is returned. The signature computation is based on boto3 Python implementation and uses gnutls to compute HMAC hashes. This series is rebased on a previous HTTPS series in order to ease merging these two. As such, it depends on the HTTPS series being merged first. Tests: alternator(local, remote) The series also comes with a simple authorization test and a docs update. Piotr Sarna (6): alternator: migrate split() function to string_view alternator: add computing the auth signature config: add alternator_enforce_authorization entry alternator: add verifying the auth signature alternator-test: add a basic authorization test case docs: update alternator authorization entry alternator-test/test_authorization.py \| 34 ++++++++ configure.py \| 1 + alternator/{server.hh => auth.hh} \| 22 ++--- alternator/server.hh \| 3 +- db/config.hh \| 1 + alternator/auth.cc \| 88 ++++++++++++++++++++ alternator/server.cc \| 112 +++++++++++++++++++++++--- db/config.cc \| 1 + main.cc \| 2 +- docs/alternator/alternator.md \| 7 +- 10 files changed, 241 insertions(+), 30 deletions(-) create mode 100644 alternator-test/test_authorization.py copy alternator/{server.hh => auth.hh} (58%) create mode 100644 alternator/auth.cc	2019-10-10 15:57:46 +03:00
Nadav Har'El	df62499710	docs/isolation.md: copy-edit Minor spelling and syntax corrections. No new content or semantic changes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191010093457.20439-1-nyh@scylladb.com>	2019-10-10 15:17:28 +03:00
Piotr Dulikowski	c04e8c37aa	distributed_loader: populate non-system keyspaces in parallel Before this change, when populating non-system keyspaces, each data directory was scanned and for each entry (keyspace directory), a keyspace was populated. This was done in a serial fashion - population of one keyspace was not started until the previous one was done. Loading keyspaces in such fashion can introduce unnecessary waiting in case of a large number of keyspaces in one data directory. The population process is I/O intensive and barely uses CPU. This change enables parallel loading of keyspaces per data directory. Populating the next keyspace does not wait for the previous one. A benchmark was performed measuring startup time, with the following setup: - 1 data directory, - 200 keyspaces, - 2 tables in each keyspace, with the following schema: CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b)) WITH CLUSTERING ORDER BY (b DESC), - 1024 rows in each table, with values (i, 2i, 3i) for i in 0..1023, - ran on 6-core virtual machine running on i7-8750H CPU, - compiled in dev mode, - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes --datadir $DIR --commitlog-directory $DIR --hints-directory $DIR --view-hints-directory $DIR The benchmark tested: - boot time, by comparing timestamp of the first message in log, and timestamp of the following message: "init - Scylla version ... initialization completed." - keyspace population time, by comparing timestamps of messages: "init - loading non-system sstables" and "init - starting view update generator" The benchmark was run 5 times for sequential and parallel version, with the following results: - sequential: boot 31.620s, keyspace population 6.051s - parallel: boot 29.966s, keyspace population 4.360s Keyspace population time decreased by ~27.95%, and overall boot time by ~5.23%. Tests: unit(release) Fixes #2007	2019-10-10 15:12:23 +03:00
Piotr Sarna	6ca55d3c83	docs: update alternator authorization entry The entry now contains a comment that computing a signature works, but is still based on a hardcoded key.	2019-10-10 13:51:00 +02:00
Piotr Sarna	23798b7301	alternator-test: add a basic authorization test case The test case ensures that passing wrong credential results in getting an UnrecognizedClientException.	2019-10-10 13:51:00 +02:00
Piotr Sarna	97cbb9a2c7	alternator: add verifying the auth signature The signature sent in the "Authorization:" header is now verified by computing the signature server-side with a matching secret key and confirming that the signatures match. Currently the secret key is hardcoded to be "whatever" in order to work with current tests, but it should be replaced by a proper key store. Refs #5046	2019-10-10 13:51:00 +02:00
Piotr Sarna	e245b54502	config: add alternator_enforce_authorization entry The config entry will be used to turn authorization for alternator requests on and off. The default is currently off, since the key store is not implemented yet.	2019-10-10 13:51:00 +02:00
Piotr Sarna	589a22d078	alternator: add computing the auth signature A function for computing the auth signature from user requests is added, along with helper functions. The implementation is based on gnutls's HMAC. Refs #5046	2019-10-10 13:51:00 +02:00
Piotr Sarna	ca58b46b4c	alternator: migrate split() function to string_view The implementation of string split was based on sstring type for simplicity, but it turns out that more generic std::string_view will be beneficial later to avoid unneeded string copying. Unfortunately boost::split does not cooperate well with string views, so a simple manual implementation is provided instead.	2019-10-10 13:50:59 +02:00
Botond Dénes	52afbae1e5	README.md: add links to other documentation sources Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>	2019-10-10 14:15:01 +03:00
Botond Dénes	e52712f82c	docs: add README.md Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>	2019-10-10 14:14:09 +03:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have big effects on performance; typically a schema change should be a rare event. It is useful to monitor how frequently the schema changes. This patch adds a counter that increases each time a schema changes. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Asias He	b89ced4635	streaming: Do not open rpc stream connection if reader has no data We can use the reader::peek() to check if the reader contains any data. If not, do not open the rpc stream connection. It helps to reduce the port usage. Refs: #4943	2019-10-08 10:31:02 +02:00
Konstantin Osipov	94006d77b1	lwt: add cas_contention_timeout_in_ms to config Make the default conform to the origin. Message-Id: <20191006154532.54856-3-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Konstantin Osipov	383e17162a	lwt: implement query_options::check_serial_consistency() Both in a single-statement transaction and in a batch we expect that serial consistency is provided. Move the check to query_options class and make it available for reuse. Keep get_serial_consistency() around for use in transport/server.cc. Message-Id: <20191006154532.54856-2-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Piotr Sarna	36a1905e98	storage_proxy: handle unstarted write cancelling When another node is reported to be down, view updates queued for it are cancelled, but some of them may already be initiated. Until now, cancelling such a write resulted in an exception, but on a conceptual level it's not really an exception, since this behaviour is expected. The previous version of this patch was based on introducing a special exception type that was later handled specially, but it's not clear if it's a good direction. Instead, this patch simply makes this path non-exceptional, as was originally done by Nadav in the first version of the series that introduced handling unstarted write cancellations. Additionally, a message containing the information that a write is cancelled is logged with debug level.	2019-10-07 16:55:36 +03:00
Vladimir Davydov	e8bcb34ed4	api: drop /storage_proxy/metrics/cas_read/condition_not_met There's no such metric in Cassandra (although Cassandra's docs mistakenly say it exists). Having it would make no sense anyway so let's drop it. Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>	2019-10-07 16:54:39 +03:00
Piotr Sarna	5ab134abef	alternator-test: update HTTPS section of README README.md has 3 fixes applied: - s/alternator_tls_port/alternator_https_port - conf directory is mentioned more explicitly - it now correctly states that the self-signed certificate warning is explicitly ignored in tests Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>	2019-10-07 14:51:16 +03:00
Avi Kivity	8ed6f94a16	Merge "Fix handling of schema alters and eviction in cache" from Tomasz " Fixes #5134, Eviction concurrent with preempted partition entry update after memtable flush may allow stale data to be populated into cache. Fixes #5135, Cache reads may miss some writes if schema alter followed by a read happened concurrently with preempted partition entry update. Fixes #5127, Cache populating read concurrent with schema alter may use the wrong schema version to interpret sstable data. Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may fail or cause a node crash after schema alter. " * tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla: tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read tests: row_cache_stress_test: Verify all entries are evictable at the end tests: row_cache_stress_test: Exercise single-partition reads tests: row_cache_stress_test: Add periodic schema alters tests: memtable_snapshot_source: Allow changing the schema tests: simple_schema: Prepare for schema altering row_cache: Record upgraded schema in memtable entries during update memtable: Extract memtable_entry::upgrade_schema() row_cache, mvcc: Prevent locked snapshots from being evicted row_cache: Make evict() not use invalidate_unwrapped() mvcc: Introduce partition_snapshot::touch() row_cache, mvcc: Do not upgrade schema of entries which are being updated row_cache: Use the correct schema version to populate the partition entry delegating_reader: Optimize fill_buffer() row_cache, memtable: Use upgrade_schema() flat_mutation_reader: Introduce upgrade_schema()	2019-10-07 14:43:36 +03:00
Nadav Har'El	f2f0f5eb0f	alternator: add https support Merged patch series from Piotr Sarna: This series adds HTTPS support for Alternator. The series comes with --https option added to alternator-test, which makes the test harness run all the tests with HTTPS instead of HTTP. All the tests pass, albeit with security warnings that a self-signed x509 certificate was used and it should not be trusted. Fixes #5042 Refs scylladb/seastar#685 Patches: docs: update alternator entry on HTTPS alternator-test: suppress the "Unverified HTTPS request" warning alternator-test: add HTTPS info to README.md alternator-test: add HTTPS to test_describe_endpoints alternator-test: add --https parameter alternator: add HTTPS support config: add alternator HTTPS port	2019-10-07 12:38:20 +03:00
Avi Kivity	969113f0c9	Update seastar submodule * seastar c21a7557f9...1f68be436f (6): > scheduling: Add per scheduling group data support > build: Include dpdk as a single object in libseastar.a > sharded: fix foreign_ptr's move assignment > build: Fix DPDK libraries linking in pkg-config file > http server: https using tls support > Make output_stream blurb Doxygen	2019-10-07 12:18:49 +03:00
Nadav Har'El	754add1688	alternator: fix Expected's BEGINS_WITH error handling The BEGINS_WITH condition in conditional updates (via Expected) requires that the given operand be either a string or a binary. Any other operand should result in a validation exception - not a failed condition as we generate now. This patch fixes the test for this case so it will succeed against Amazon DynamoDB (before this patch it fails - this failure was masked by a typo before commit `332ffa77ea`). The patch then fixes our code to handle this case correctly. Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad type in the operand is now handled differently from a bad type in the attribute's value. We add another check to the test to verify that this is the case. Fixes #5141 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191006080553.4135-1-nyh@scylladb.com>	2019-10-06 17:16:55 +03:00
Botond Dénes	d0fa5dc34d	scylla-gdb.py: introduce the downcast_vptr convenience function When debugging one constantly has to inspect objects for which only a "virtual pointer" is available, that is a pointer that points to a common parent class or interface. Finding the concrete type and downcasting the pointer is easy enough but why do it manually when it is possible to automate it trivially? $downcast_vptr() returns any virtual pointer given to it, cast to the actual concrete object. Example: (gdb) p $1 $2 = (flat_mutation_reader::impl ) 0x60b03363b900 (gdb) p $downcast_vptr(0x60b03363b900) $3 = (combined_mutation_reader ) 0x60b03363b900 # The return value can also be dereferenced on the spot. (gdb) p *$downcast_vptr($1) $4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...	2019-10-04 17:45:47 +03:00
Botond Dénes	434a41d39b	scylla-gdb.py: introduce the dereference_lw_shared_ptr convenience function Dereferencing a `seastar::lw_shared_ptr` is another tedious manual task. The stored pointer (`_p`) has to be cast to the right subclass of `lw_shared_ptr_counter_base`, which involves inspecting the code, then writing a cast expression that gdb is willing to parse. This is something machines are so much better at doing. `$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to object, given an instance of `seastar::lw_shared_ptr`. Example: (gdb) p $1._read_context $2 = {_p = 0x60b00b068600} (gdb) p $dereference_lw_shared_ptr($1._read_context) $3 = {<seastar::enable_lw_shared_from_this<cache::read_context>> = {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...	2019-10-04 17:45:47 +03:00
Botond Dénes	f5de002318	scylla-gdb.py: scylla_sstables: also print the sstable filename And expose the method that obtains the file-name of an sstable object to python code.	2019-10-04 17:45:32 +03:00
Botond Dénes	ad7a668be9	scylla-gdb.py: scylla_task_histogram: expose internal parameters Make all the parameters of the sampling tweakable via command line arguments. I strived to keep full backward compatibility, but due to the limitations of `argparse` there is one "breaking" change. The optional positional size argument is now a non-positional argument as `argparse` doesn't support optional positional arguments. Added documentation for both the command itself as well as for all the arguments.	2019-10-04 17:44:40 +03:00
Botond Dénes	7767cc486e	scylla-gdb.py: make scylla_find usable from python code	2019-10-04 17:44:40 +03:00
Botond Dénes	9cdea440ef	scylla-gdb.py: add std_variant, a wrapper for std::variant Allows conveniently obtaining the active member via calling `get()`.	2019-10-04 17:44:40 +03:00
Botond Dénes	55e9097dd9	scylla-gdb.py: add std_list, a wrapper for an std::list std_list makes an `std::list` instance accessible from python code just like a regular (read-only) python container.	2019-10-04 17:44:40 +03:00
Botond Dénes	b8f0b3ba93	std_optional: fix get() Apparently there is now another layer of indirection: `std::_Storage`.	2019-10-04 17:43:40 +03:00
Tomasz Grabiec	020a537ade	tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read	2019-10-04 11:38:13 +02:00
Tomasz Grabiec	ebedefac29	tests: row_cache_stress_test: Verify all entries are evictable at the end	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	1b95f5bf60	tests: row_cache_stress_test: Exercise single-partition reads make_single_key_reader() currently doesn't actually create single-partition readers because it doesn't set mutation_reader::forwarding::no when it creates individual readers. The readers will default to mutation_reader::forwarding::yes and actually create scanning readers in preparation for fast-forwarding across partitions. Fix by passing mutation_reader::forwarding::no.	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	81dd17da4e	tests: row_cache_stress_test: Add periodic schema alters Reproduces #5127.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	2fc144e1a8	tests: memtable_snapshot_source: Allow changing the schema	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	22dde90dba	tests: simple_schema: Prepare for schema altering Currently, methods of simple_schema assume that table's schema doesn't change. Accessors like get_value() assume that rows were generated using simple_schema::_s. Because of that, the column_definition& for the "v" column is cached in the instance. That column_definition& cannot be used to access objects created with a different schema version. To allow using simple_schema after schema changes, column_definition& caching is now tagged with the table schema version of origin. Methods which access schema-dependent objects, like get_value(), are now accepting schema& corresponding to the objects. Also, it's now possible to tell simple_schema to use a different schema version in its generator methods.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	e6afc89735	row_cache: Record upgraded schema in memtable entries during update Cache update may defer in the middle of moving of partition entry from a flushed memtable to the cache. If the schema was changed since the entry was written, it upgrades the schema of the partition_entry first but doesn't update the schema_ptr in memtable_entry. The entry is removed from the memtable afterward. If a memtable reader encounters such an entry, it will try to upgrade it assuming it's still at the old schema. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. It's the more likely to happen the more rows there are in a partition. It's unlikely to happen with single-row partitions. Introduced in `70c7277`. Fixes #5128.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	ea461a3884	memtable: Extract memtable_entry::upgrade_schema()	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	90d6c0b9a2	row_cache, mvcc: Prevent locked snapshots from being evicted If the whole partition entry is evicted while being updated from the memtable, a subsequent read may populate the partition using the old version of data if it attempts to do it before cache update advances past that partition. Partial eviction is not affected because populating reads will notice that there is a newer snapshot corresponding to the updater. This can happen only in OOM situations where the whole cache gets evicted. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. Introduced in `70c7277`. Fixes #5134.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	57a93513bd	row_cache: Make evict() not use invalidate_unwrapped() invalidate_unwrapped() calls cache_entry::evict(), which cannot be called concurrently with cache update. invalidate() serializes it properly by calling do_update(), but evict() doesn't. The purpose of evict() is to stress eviction in tests, which can happen concurrently with cache update. Switch it to use memory reclaimer, so that it's both correct and more realistic. evict() is used only in tests.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	c88a4e8f47	mvcc: Introduce partition_snapshot::touch()	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	25e2f87a37	row_cache, mvcc: Do not upgrade schema of entries which are being updated When a read enters a partition entry in the cache, it first upgrades it to the current schema of the cache. The same happens when an entry is updated after a memtable flush. Upgrading the entry is currently performed by squashing all versions and replacing them with a single upgraded version. That has a side effect of detaching all snapshots from the partition entry. Partition entry update on memtable flush is writing into a snapshot. If that snapshot is detached by a schema upgrade, the entry will be missing writes from the memtable which fall into continuous ranges in that entry which have not yet been updated. This can happen only if the update of the entry is preempted and the schema was altered during that, and a read hit that partition before the update went past it. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. The problem is fixed by locking updated entries and not upgrading schema of locked entries. cache_entry::read() is prepared for this, and will upgrade on-the-fly to the cache's schema. Fixes #5135	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	0675088818	row_cache: Use the correct schema version to populate the partition entry The sstable reader which populates the partition entry in the cache is using the schema of the partition entry snapshot, which will be the schema of the cache at the time the partition was entered. If there was a schema change after the cache reader entered the partition but before it created the sstable reader, the cache populating reader will interpret sstable fragments using the wrong schema version. That is more likely if partitions have many rows, and the front of the partition is populated. With single-row partitions that's unlikely to happen. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. Fixes #5127.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	10992a8846	delegating_reader: Optimize fill_buffer() Use move_buffer_content_to() which is faster than fill_buffer_from() because it doesn't involve popping and pushing the fragments across buffers. We save on size estimation costs.	2019-10-03 22:03:28 +02:00
Piotr Sarna	07ac3ea632	docs: update alternator entry on HTTPS The HTTPS entry is updated - it's now supported, but still misses the same features as HTTP - CRC headers, etc.	2019-10-03 19:10:30 +02:00
Piotr Sarna	b63077a8dc	alternator-test: suppress the "Unverified HTTPS request" warning Running with --https and a self-signed certificate results in a flood of expected warnings, that the connection is not to be trusted. These warnings are silenced, as users running a local test with --https usually use self-signed certificates.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e65fd490da	alternator-test: add HTTPS info to README.md A short paragraph about running tests with `--https` and configuring the cluster to work correctly with this parameter is added to README.md.	2019-10-03 19:10:30 +02:00
Piotr Sarna	0d28d7f528	alternator-test: add HTTPS to test_describe_endpoints The test_describe_endpoints test spawns another client connection to the cluster, so it needs to be HTTPS-aware in order to work properly with --https parameter.	2019-10-03 19:10:30 +02:00
Piotr Sarna	9fd77ed81d	alternator-test: add --https parameter Running with --https parameter will result in sending the requests via HTTPS instead of HTTP. By default, port 8043 is used for a local cluster. Before running pytest --https, make sure that Scylla was properly configured to initialize a HTTPS alternator server by providing the alternator_tls_port parameter. The HTTPS-based connection runs with verification disabled, otherwise it would not work with self-signed certificates, which are useful for tests.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e1b0537149	alternator: add HTTPS support By providing a server based on a TLS socket, it's now possible to serve HTTPS requests in alternator. The HTTPS server is enabled by setting its port in scylla.yaml: alternator_tls_port=XXXX. Alternator TLS relies on the existing TLS configuration, which is provided by certificate, keyfile, truststore, priority_string options. Fixes #5042	2019-10-03 19:10:30 +02:00
Piotr Sarna	b42eb8b80a	config: add alternator HTTPS port The config variable will be used to set up a TLS-based server for serving alternator HTTPS requests.	2019-10-03 19:10:29 +02:00
Nadav Har'El	9d4e71bbc6	alternator-test: fix misleading xfail message The test test_update_expression_function_nesting() fails because DynamoDB doesn't allow an expression like list_append(list_append(:val1, :val2), :val3) but Alternator doesn't check for this (and supports this expression). The "xfail" message was outdated, suggesting that the test fails because the "SET" expression isn't supported - but it is. So replace the message by a more accurate one. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190915104708.30471-1-nyh@scylladb.com>	2019-10-03 18:45:03 +03:00
Nadav Har'El	9747019e7b	alternator: implement additional Expected operators Merged patch set from Dejan Mircevski implementing some of the missing operators for Expected: NE, IN, NULL and NOT_NULL. Patches: alternator: Factor out Expected operand checks alternator: Implement NOT_NULL operator in Expected alternator: Implement NULL operator in Expected alternator: Fix expected_1_null testcase alternator: Implement IN operator in Expected alternator: Implement NE operator in Expected alternator: Factor out common code in Expected	2019-10-03 18:12:38 +03:00
Konstantin Osipov	25ffd36d21	lwt: prepare the expression tree for IF condition evaluation Frozen empty lists/map/sets are not equal to null value, while multi-cell empty lists/map/sets are equal to null values. Return a NULL value for an empty multi-cell set or list if we know the receiver is not frozen - this makes it easy to compare the parameter with the receiver. Add a test case for inserting an empty list or set - the result is indistinguishable from NULL value. Message-Id: <20191003092157.92294-2-kostja@scylladb.com>	2019-10-03 14:56:25 +02:00
Avi Kivity	3cb081eb84	Merge " hinted handoff: fix races during shutdown and draining" from Vlad " Fix races that may lead to use-after-free events and file system level exceptions during shutdown and drain. The root cause of use-after-free events in question is that space_watchdog blocks on end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as it's accessed even if the corresponding end_point_hints_manager instance is destroyed in the context of manager::drain_for(). File system exceptions may occur when space_watchdog attempts to scan a directory while it's being deleted from the drain_for() context. In case of such an exception new hints generation is going to be blocked - including for materialized views, till the next space_watchdog round (in 1s). Issues that are fixed are #4685 and #4836. Tested as follows: 1) Patched the code in order to trigger the race with (a lot) higher probability and running slightly modified hinted handoff replace dtest with a debug binary for 100 times. A side effect of this testing was the discovery of #4836. 2) Using the same patch as above tested that there are no crashes and nodes survive stop/start sequences (they were not without this series) in the context of all hinted handoff dtests. Ran the whole set of tests with dev binary for 10 times. " * 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla: hinted handoff: fix a race on a directory removal between space_watchdog and drain_for() hinted handoff: make taking file_update_mutex safe db::hints::manager::drain_for(): fix alignment db::hints::manager: serialize calls to drain_for() db::hints: cosmetics: indentation and missing method qualifier	2019-10-03 14:38:00 +03:00
Tomasz Grabiec	aad1307b14	row_cache, memtable: Use upgrade_schema()	2019-10-03 13:28:33 +02:00
Tomasz Grabiec	3177732b35	flat_mutation_reader: Introduce upgrade_schema()	2019-10-03 13:28:33 +02:00
Asias He	a9b95f5f01	repair: Fix tracker::start and tracker::done in case of error The operation after gate.enter() in tracker::start() can fail and throw, we should call gate.leave() in such case to avoid unbalanced enter and leave calls. tracker::done() has similar issue too. Fix it by removing the gate enter and leave logic in tracker start and done. A helper tracker::run() is introduced to take care of the gate and repair status. In addition, the error log is improved. It now logs exceptions on all shards in the summary. e.g., [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)}) Fixes #5074	2019-10-03 13:33:02 +03:00
Botond Dénes	00b432b61d	querier_cache: correctly account entries evicted on insertion in the population Currently, the population stat is not increased for entries that are evicted immediately on insert, however the code that does the eviction still decreases the population stat, leading to an imbalance and in some cases the underflow of the population stat. To fix, unconditionally increase the population stat upon inserting an entry, regardless of whether it is immediately evicted or not. Fixes: #5123 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>	2019-10-03 11:49:44 +03:00
Dejan Mircevski	ac98385d04	alternator: Factor out Expected operand checks Put all AttributeValuelist size verification under verify_operand_count(), rather than have some cases invoke verify_operand_count() while others verify it in check_*() functions. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 17:11:58 -04:00
Dejan Mircevski	de18b3240b	alternator: Implement NOT_NULL operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:23:59 -04:00
Dejan Mircevski	75960639a4	alternator: Implement NULL operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:19:14 -04:00
Dejan Mircevski	e4fd5f3ef0	alternator: Fix expected_1_null testcase Testcase "For NULL, AttributeValueList must be empty" accidentally used NOT_NULL instead of NULL. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:19:14 -04:00
Dejan Mircevski	b7ac510581	alternator: Implement IN operator in Expected Add check_IN() and a switch case that invokes it. Reactivate IN tests. Add a testcase for non-scalar attribute values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 16:17:38 -04:00
Dejan Mircevski	56efa55a06	alternator: Implement NE operator in Expected Recognize "NE" as a new operator type, add check_NE() function, invoke it in verify_expected_one(), and reactivate NE tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 14:47:13 -04:00
Dejan Mircevski	af0462d127	alternator: Factor out common code in Expected Operand-count verification will be repeated a lot as more operators are implemented, so factor it out into verify_operand_count(). Also move `got` null checks to check_* functions, which reduces duplication at call sites. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-02 14:36:57 -04:00
Konstantin Osipov	e8c13efb41	lwt: move mutation hashers to mutation.hh Prepare mutation hashers for reuse in CAS implementation. Message-Id: <20190930202409.40561-2-kostja@scylladb.com>	2019-10-01 19:49:31 +02:00
Konstantin Osipov	6cde985946	lwt: remove code that no longer serves as a reference Remove ifdef'ed Java code, since LWT implementation is based on the current state of the origin. Message-Id: <20190930201022.40240-2-kostja@scylladb.com>	2019-10-01 19:46:15 +02:00
Konstantin Osipov	4d214b624b	lwt: ensure enum_set::of is constexpr. This allows using it to initialize const static members. Message-Id: <20190930200530.40063-2-kostja@scylladb.com>	2019-10-01 19:45:56 +02:00
Tomasz Grabiec	3b9bf9d448	Merge "storage_proxy: replace variadic futures with structs" from Avi Seastar variadic futures are deprecated, so replace with structs to avoid nasty deprecation warnings.	2019-10-01 19:32:55 +02:00
Avi Kivity	162730862d	storage_proxy: remove variadic future from query_partition_key_range_concurrent() Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:33:44 +03:00
Avi Kivity	968b34a2b4	storage_proxy: remove variadic future from digest_read_resolver Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:32:17 +03:00
Avi Kivity	90096da9f3	managed_ref: add get() accessor While a managed_ref emulates a reference more closely than it does a pointer, it is still nullable, so add a get() (similar to unique_ptr::get()) that can be nullptr if the reference is null. The immediate use will be mutation_partition::_static_row, which is often empty and takes up about 10% of a cache entry.	2019-09-30 20:55:36 +03:00
Nadav Har'El	c9aae13fae	docs/alternator/getting-started.md: fix indentation in example code The example Python code had wrong indentation, and wouldn't actually work if naively copy-pasted. Noticed by Noam Hasson. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190929091440.28042-1-nyh@scylladb.com>	2019-09-30 13:03:29 +03:00
Avi Kivity	c6b66d197b	Merge "Couple of preparatory patches for lwt" from Gleb " This is a collection of assorted patches that will be needed for LWT. Most of them are trivial, but one touches a lot of files, so it has a good chance of causing a rebase headache (I already had to rebase it on top of Alternator). Let's push them earlier instead of carrying them in the lwt branch. " * 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev: lwt: make _last_timestamp_micros static lwt: Add client_state::get_timestamp_for_paxos() function lwt: Pass client_state reference all the way to storage_proxy::query exceptions: Add a constructor for unavailable_exception that allows providing a custom message serializer: Add std::variant support lwt: Add missing functions to utils/UUID_gen.hh	2019-09-29 13:02:26 +03:00
Avi Kivity	9e990725d9	Merge "Simplify and explain from_varint_to_integer #5031 " from Rafael " This is the second version of the patch series. The previous one was just the second patch, this one adds more tests and another patch to make it easier to test that the new code has the same behavior as the old one. " * 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla: types: Simplify and explain from_varint_to_integer Add more cast tests	2019-09-29 11:27:55 +03:00
Tomasz Grabiec	b0e0f29b06	db: read: Filter-out sstables using its first and last keys Affects single-partition reads only. Refs #5113 When executing a query on the replica we do several things in order to narrow down the sstable set we read from. For tables which use LeveledCompactionStrategy, we store sstables in an interval set and we select only sstables whose partition ranges overlap with the queried range. Other compaction strategies don't organize the sstables and will select all sstables at this stage. The reasoning behind this is that for non-LCS compaction strategies the sstables' ranges will typically overlap and using interval sets in this case would not be effective and would result in quadratic (in sstable count) memory consumption. The assumption for overlap does not hold if the sstables come from repair or streaming, which generates non-overlapping sstables. At a later stage, for single-partition queries, we use the sstables' bloom filter (kept in memory) to drop sstables which surely don't contain given partition. Then we proceed to sstable indexes to narrow down the data file range. Tables which don't use LCS will do unnecessary I/O to read index pages for single-partition reads if the partition is outside of the sstable's range and the bloom filter is ineffective (Refs #5112). This patch fixes the problem by consulting sstable's partition range in addition to the bloom filter, so that the non-overlapping sstables will be filtered out with certainty and not depend on bloom filter's efficiency. It's also faster to drop sstables based on the keys than the bloom filter. Tests: - unit (dev) - manual using cqlsh Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>	2019-09-28 19:42:57 +03:00
Tomasz Grabiec	b93cc21a94	sstables: Fix partition key count estimation for a range The method sstable::estimated_keys_for_range() was severely under-estimating the number of partitions in an sstable for a given token range. The first reason is that it underestimated the number of sstable index pages covered by the range, by one. In extreme, if the requested range falls into a single index page, we will assume 0 pages, and report 1 partition. The reason is that we were using get_sample_indexes_for_range(), which returns entries with the keys falling into the range, not entries for pages which may contain the keys. A single page can have a lot of partitions though. By default, there is a 1:20000 ratio between summary entry size and the data file size covered by it. If partitions are small, that can be many hundreds of partitions. Another reason is that we underestimate the number of partitions in an index page. We multiply the number of pages by: (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval) / _components->summary.header.sampling_level Using defaults, that means multiplying by 128. In the cassandra-stress workload a single partition takes about 300 bytes in the data file and summary entry is 22 bytes. That means a single page covers 22 * 20'000 = 440'000 bytes of the data file, which contains about 1'466 partitions. So we underestimate by an order of magnitude. Underestimating the number of partitions will result in too small bloom filters being generated for the sstables which are the output of repair or streaming. This will make the bloom filters ineffective which results in reads selecting more sstables than necessary. The fix is to base the estimation on the number of index pages which may contain keys for the range, and multiply that by the average key count per index page. Fixes #5112. Refs #4994. 
The output of test_key_count_estimation: Before: count = 10000 est = 10112 est([-inf; +inf]) = 512 est([0; 0]) = 128 est([0; 63]) = 128 est([0; 255]) = 128 est([0; 511]) = 128 est([0; 1023]) = 128 est([0; 4095]) = 256 est([0; 9999]) = 512 est([5000; 5000]) = 1 est([5000; 5063]) = 1 est([5000; 5255]) = 1 est([5000; 5511]) = 1 est([5000; 6023]) = 128 est([5000; 9095]) = 256 est([5000; 9999]) = 256 est(non-overlapping to the left) = 1 est(non-overlapping to the right) = 1 After: count = 10000 est = 10112 est([-inf; +inf]) = 10112 est([0; 0]) = 2528 est([0; 63]) = 2528 est([0; 255]) = 2528 est([0; 511]) = 2528 est([0; 1023]) = 2528 est([0; 4095]) = 5056 est([0; 9999]) = 10112 est([5000; 5000]) = 2528 est([5000; 5063]) = 2528 est([5000; 5255]) = 2528 est([5000; 5511]) = 2528 est([5000; 6023]) = 5056 est([5000; 9095]) = 7584 est([5000; 9999]) = 7584 est(non-overlapping to the left) = 0 est(non-overlapping to the right) = 0 Tests: - unit (dev) Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>	2019-09-28 19:36:43 +03:00
Piotr Sarna	10f90d0e25	types: remove deprecated comment The comment does not apply anymore, as this definition is no more in database.hh. Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>	2019-09-27 19:32:17 +02:00
Dejan Mircevski	9a89e0c5ec	dbuild: Update README on interactive mode `dbuild` was recently (`24c732057`) updated to run in interactive mode when given no arguments; we can now update the README to mention that. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:33:27 +02:00
Dejan Mircevski	f8638d8ae1	alternator: Add build byproducts to .gitignore Add .pytest_cache and expressions.tokens to the top-level .gitignore. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:18:45 +02:00
Dejan Mircevski	332ffa77ea	alternator: Actually use BEGINS_WITH in its tests For some reason, BEGINS_WITH tests used EQ as comparison operator. Tests: pytest test_expected.py Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-26 22:41:34 +03:00
Tomasz Grabiec	5b0e48f25b	Merge "toppartitions: don't transport schema_ptr across shards" from Avi When the toppartitions operation gathers results, it copies partition keys with their schema_ptr:s. When these schema_ptr:s are copied or destroyed, they can cause leaks or premature frees of the schema in its original shard since reference count operations are not atomic. Fix that by converting the schema_ptr to a global_schema_ptr during transportation. Fixes #5104 (direct bug) Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node) Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node) Tests: new test added that fails with the existing code in debug mode, manual toppartitions test	2019-09-26 17:09:54 +02:00
Avi Kivity	36b4d55b28	tests: add test for toppartitions cross-shard schema_ptr copy	2019-09-26 17:40:46 +03:00
Avi Kivity	670f398a8a	toppartitions: do not copy schema_ptr:s in item keys across shards Copying schema_ptrs across shards results in memory corruption since lw_shared_ptr does not use atomic operations for reference counts. Prevent that by converting schema_ptr:s to global_schema_ptr:s before shipping them across shards in the map operation, and converting them back to local schema_ptr:s in the reduce operation.	2019-09-26 17:26:40 +03:00
Avi Kivity	f015bd69b7	toppartitions: compare schemas using schema::id(), not pointer to schema This allows keys from different stages in the schema's life to compare equal. This is safe since the partition key cannot change, unlike the rest of the schema. More importantly, it will allow us to compare keys made local after a pass through global_schema_ptr, which does not guarantee that the schema_ptr conversion will be the same even when starting with the same global_schema_ptr.	2019-09-26 17:15:46 +03:00
Avi Kivity	ea4976a128	schema_registry: mark global_schema_ptr move constructor noexcept Throwing move constructors are a pain; so we should try to make them noexcept. Currently, global_schema_ptr's move constructor throws an exception if used illegally (moving from a different shard); this patch changes it to an assert, on the grounds that this error is impossible to recover from. The direct motivation for the patch is the desire to store objects containing a global_schema_ptr in a chunked_vector, to move lists of partition keys across shards for the toppartitions functionality. chunked_vector currently requires noexcept move constructors for its value_type.	2019-09-26 16:56:59 +03:00
Avi Kivity	ba64ec78cf	messaging_service: use rpc::tuple instead of variadic futures for rpc Since variadic future<> is deprecated, switch to rpc::tuple for multiple return values in rpc calls. This is more or less mechanical translation.	2019-09-26 12:09:31 +02:00
Tomasz Grabiec	9183e28f2c	Merge "Recreate dependent user types" from Rafael When a user type changes we were not recreating other user types that use it. This patch series fixes that and makes it clear which code is responsible for it. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when an entry in system.types changes. Fixes #5049	2019-09-26 12:06:32 +02:00
Gleb Natapov	e0b303b432	lwt: make _last_timestamp_micros static If each client_state has its own copy of the variable two clients may generate timestamps that clash and needlessly create contention. Making the variable shared between all client_state on the same shard will make sure this will not happen to two clients on the same shard. It may still happen for two clients on two different shards or two different nodes.	2019-09-26 11:44:00 +03:00
Gleb Natapov	622d21f740	lwt: Add client_state::get_timestamp_for_paxos() function Paxos needs a unique timestamp that is greater than some other timestamp, so that the next round had more chances to succeed. Add a function that returns such a timestamp.	2019-09-26 11:44:00 +03:00
Gleb Natapov	e72a105b5e	lwt: Pass client_state reference all the way to storage_proxy::query client_state holds a state to generate monotonically increasing unique timestamp. Queries with a SERIAL consistency level need it to generate a paxos round.	2019-09-26 11:44:00 +03:00
Gleb Natapov	556f65e8a1	exceptions: Add a constructor for unavailable_exception that allows providing a custom message	2019-09-26 11:44:00 +03:00
Gleb Natapov	209414b4eb	serializer: Add std::variant support	2019-09-26 11:44:00 +03:00
Gleb Natapov	f9209e27d4	lwt: Add missing functions to utils/UUID_gen.hh Some lwt related code is missing in our UUID implementation. Add it.	2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola	5af8b1e4a3	types: recreate dependent user types. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when an entry in system.types changes. Fixes #5049 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	4c3209c549	types: Don't include dependent user types in update. The way schema changes propagate is by editing the system tables and comparing the before and after state. When a user type A uses another user type B and we modify B, the representation of A in the system table doesn't change, so this code was not producing any changes on the diff that the receiving side uses. Deleting it makes it clear that it is the receiver's responsibility to handle dependent user types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	34eddafdb0	types: Don't modify the type list in db::cql_type_parser::raw_builder With this patch db::cql_type_parser::raw_builder creates a local copy of the list of existing types and uses that internally. By doing that build() should have no observable behavior other than returning the new types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	d6b2e3b23b	types: pass a reference to prepare_internal We were never passing a null pointer and never saving a copy of the lw_shared_ptr. Passing a reference is more flexible as not all callers are required to hold the user_types_metadata in a lw_shared_ptr. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:40:30 -07:00
Avi Kivity	03260dd910	Update seastar submodule * seastar b56a8c5045...c21a7557f9 (3): > net: socket::{set,get}_reuseaddr() should not be virtual > iotune: print verbose message in case of shutdown errors > iotune: close test file on shutdown Fixes #4946.	2019-09-25 16:08:32 +03:00
Tomasz Grabiec	06b9818e98	Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny 1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand. 2. Lookup the view_update_write_response_handler id before calling timeout_cb and tolerate it not found. Just log a warning if this happened. Fixes #5032	2019-09-25 14:49:42 +02:00
Avi Kivity	83bc59a89f	Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099 )" from Tomasz " Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during table schema altering when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095. " * 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla: mvcc: Fix incorrect schema verison being used to copy the mutation when applying mutation_partition: Track and validate schema version in debug builds tests: Use the correct schema to access mutation_partition	2019-09-25 15:30:22 +03:00
Tomasz Grabiec	11440ff792	mvcc: Fix incorrect schema version being used to copy the mutation when applying Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect, because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during alter when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095.	2019-09-25 11:28:07 +02:00
Tomasz Grabiec	bce0dac751	mutation_partition: Track and validate schema version in debug builds This patch makes mutation_partition validate the invariant that it's supposed to be accessed only with the schema version which it conforms to. Refs #5095	2019-09-25 10:27:06 +02:00
Avi Kivity	721fa44c4f	Update seastar submodule * seastar e51a1a8ed9...b56a8c5045 (3): > net: add support for UNIX-domain sockets > future: Warn on promise::set_exception with no corresponding future or task > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael	2019-09-25 11:21:57 +03:00
Benny Halevy	e9388b3f03	storage_proxy::drain_on_shutdown fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	b7c7af8a75	storage_proxy: validate id from view_update_handlers_list Handle a race where a write handler is removed from _response_handlers but not yet from _view_update_handlers_list. Fixes #5032 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	1fea5f5904	storage_proxy: refactor remove_response_handler Refactor remove_response_handler_entry out of remove_response_handler, to be called on a valid iterator found by _response_handlers.find(id). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	592c4bcfc2	storage_proxy: remove_response_handler: assert id was found Help identify cases like seen in #5032 where the handler id wasn't found from the on_down -> timeout_cb path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Raphael S. Carvalho	571fa94eb5	sstables/compaction_manager: Don't perform upgrade on shared SSTables compaction_manager::perform_sstable_upgrade() fails when it feeds compaction mechanism with shared sstables. Shared sstables should be ignored when performing upgrade and so wait for reshard to pick them up in parallel. Whenever a shared sstable is brought up either on restart or via refresh, reshard procedure kicks in. Reshard picks the highest supported format so the upgrade for shared sstable will naturally take place. Fixes #5056. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>	2019-09-25 11:18:40 +03:00
Asias He	19e8c14ad1	gossiper: Improve the gossip timer callback lock handling (#5097 ) - Update the outdated comments in do_stop_gossiping. It was storage_service not storage_proxy that used the lock. More importantly, storage_service does not use it any more. - Drop the unused timer_callback_lock and timer_callback_unlock API - Use with_semaphore to make sure the semaphore usage is balanced. - Add log in gossiper::do_stop_gossiping when it tries to take the semaphore to help debug hang during the shutdown. Refs: #4891 Refs: #4971	2019-09-25 10:46:38 +03:00
Tomasz Grabiec	4d9b176aaa	tests: Use the correct schema to access mutation_partition	2019-09-24 19:46:57 +02:00
Botond Dénes	425cc0c104	doc: add debugging.md A documentation file that is intended to be a place for anything debugging related: getting started tutorial, tips and tricks and advanced guides. For now it contains a short introduction, some selected links to more in-depth documentation and some tips and tricks that I could think of off the top of my head. One of those tricks describes how to load cores obtained from relocatable packages inside the `dbuild` container. I originally intended to add that to `tools/toolchain/README.md` but was convinced that `docs/debugging.md` would be a better place for this. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>	2019-09-24 20:18:45 +03:00
Botond Dénes	d57ab83bc8	querier_cache: add `inserted` stat Recently we have seen a case where the population stat of the cache was corrupt, either due to misaccounting or some more serious corruption. When debugging something like that it would have been useful to know how many items have been inserted to the cache. I also believe that such a counter could be useful generally as well. Refs: #4918 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>	2019-09-24 10:52:49 +02:00
Avi Kivity	8e8a048ada	Merge "lsa: Assert no cross-shard region locking #5090 " from Tomasz " We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such lock. This patch adds an assert which would catch such incorrect locking at the attempt. Refs #4978 Tests: - unit (dev, release, debug) " * 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla: lsa: Assert no cross-shard region locking tests: Make managed_vector_test a seastar test	2019-09-23 19:52:47 +03:00
Avi Kivity	79d17f3c80	Update seastar submodule * seastar 2a526bb120...e51a1a8ed9 (2): > rpc: introduce rpc::tuple as a way to move away from variadic future > shared_future: don't warn on broken futures	2019-09-23 19:50:40 +03:00
Avi Kivity	1b8009d10c	sstables: compaction_manager: #include seastarx.hh Make it easier for the IDE to resolve references to the seastar namespace. In any case include files should be stand-alone and not depend on previously included files.	2019-09-23 16:12:49 +02:00
Avi Kivity	07af9774b3	relocatable: erase build directory from executable and debug info The build directory is meaningless, since it is typically some directory in a continuous integration server. That means someone debugging the relocatable package needs to issue the gdb command 'set substitute-path' with the correct arguments, or they lose source debugging. Doing so in the relocatable package build saves this step. The default build is not modified, since a typical local build benefits from having the paths hardcoded, as the debugger will find the sources automatically.	2019-09-23 13:08:15 +02:00
Tomasz Grabiec	eb08ab7ed9	lsa: Assert no cross-shard region locking We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such lock. This patch adds an assert which would catch such incorrect locking at the attempt. Refs #4978	2019-09-23 12:51:29 +02:00
Tomasz Grabiec	8bedcd6696	tests: Make managed_vector_test a seastar test LSA will depend on seastar reactor being present.	2019-09-23 12:51:24 +02:00
Raphael S. Carvalho	b4cf429aab	sstables/LCS: Fix increased write amplification due to incorrect SSTable demotion LCS demotes a SSTable from a given level when it thinks that level is inactive. Inactive level means N rounds (compaction attempt) without any activity in it, in other words, no SSTable has been promoted to it. The problem happens because the metadata that tracks inactiveness of each level can be incorrectly updated when there's an ongoing compaction. LCS has parallel compaction disabled. So if a table finds itself running a long operation like cleanup that blocks minor compaction, LCS could incorrectly think that many levels need demotion, and by the time cleanup finishes, some demotions would incorrectly take place. This problem is fixed by only updating the counter that tracks inactiveness when compaction completes, so it's not incorrectly updated when there's an ongoing compaction for the table. Fixes #4919. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>	2019-09-22 10:46:38 +03:00
Eliran Sinvani	280715ad45	Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent A recent fix to #3767 limited the amount of ranges that can return from query_ranges_to_vnodes_generator. This with the combination of a large amount of token ranges can lead to an infinite recursion. The algorithm multiplies by factor of 2 (actually a shift left by one) the amount of requested tokens in each recursion iteration. As long as the requested number of ranges is greater than 0, the recursion is implicit, and each call is scheduled separately since the call is inside a continuation of a map reduce. But if the amount of iterations is large enough (~32) the counter for requested ranges zeros out and from that moment on two things will happen: 1. The counter will remain 0 forever (0*2 == 0) 2. The map reduce future will be immediately available and this will result in the continuation being invoked immediately. The latter causes the recursive call to be a "regular" recursive call thus, through the stack and not the task queue of the scheduler, and the former causes this recursion to be infinite. The combination creates a stack that keeps growing and eventually overflows resulting in undefined behavior (due to memory overrun). This patch prevents the problem from happening: it limits the growth of the concurrency counter beyond twice the last amount of tokens returned by the query_ranges_to_vnodes_generator. It also makes sure the counter does not get stuck at zero. Testing: Unit test in dev mode. * Modified add 50 dtest that reproduce the problem Fixes #4944 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>	2019-09-22 10:33:31 +03:00
Gleb Natapov	73e3d0a283	messaging_service: enable reuseaddr on messaging service rpc Fixes #4943 Message-Id: <20190918152405.GV21540@scylladb.com>	2019-09-19 11:43:03 +03:00
Rafael Ávila de Espíndola	4d0916a094	commitlog: Handle gate_closed_exception Before this patch, if the _gate is closed, with_gate throws and forward_to is not executed. When the promise<> p is destroyed it marks its _task as a broken promise. What happens next depends on the branch. On master, we warn when the shared_future is destroyed, so this patch changes the warning from a broken_promise to a gate closed. On 3.1, we warn when the promises in shared_future::_peers are destroyed since they no longer have a future attached: The future that was attached was the "auto f" just before the with_gate call, and it is destroyed when with_gate throws. The net result is that this patch fixes the warning in 3.1. I will send a patch to seastar to make the warning on master more consistent with the warning in 3.1. Fixes #4394 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190917211915.117252-1-espindola@scylladb.com>	2019-09-17 23:41:21 +02:00
Avi Kivity	60656d1959	Update seastar submodule * seastar 84d8e9fe9b...2a526bb120 (1): > iotune: fix exception handling in case test file creation fails Fixes #5001.	2019-09-16 19:39:14 +03:00
Glauber Costa	c9f2d1d105	do not crash in user-defined operations if the controller is disabled Scylla currently crashes if we run manual operations like nodetool compact with the controller disabled. While we neither like nor recommend running with the controller disabled, due to some corner cases in the controller algorithm we are not yet at the point in which we can deprecate this and are sometimes forced to disable it. The reason for the crash is that manual operations will invoke _backlog_of_shares, which returns the backlog needed to create a certain number of shares. That function scans the existing control points, but when we run without the controller there are no control points and we crash. Backlog doesn't matter if the controller is disabled, and the return value of this function will be immaterial in this case. So to avoid the crash, we return something right away if the controller is disabled. Fixes #5016 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-16 18:26:57 +02:00
Avi Kivity	d77171e10e	build: adjust libthread_db file name to match gdb expectations gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather than the file name of libthread_db-1.0.so, so use that name to store the file in the archive. Fixes #4996.	2019-09-16 14:48:42 +02:00
Avi Kivity	7502985112	Update seastar submodule * seastar b3fb4aaab3...84d8e9fe9b (8): > Use aio fsync if available > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb > lz4: use LZ4_decompress_safe > reactor: document seastar::remove_file() > core/file.hh: remove redundant std::move() > core/{file,sstring}: do not add `const` to return value > http/api_docs: always call parent constructor > Add input_stream blurb	2019-09-16 11:52:55 +03:00
Piotr Sarna	feec3825aa	view: degrade shutdown bookkeeping update failures log to warn Currently, if updating bookkeeping operations for view building fails, we log the error message and continue. However, during shutdown, some errors are more likely to happen due to existing issues like #4384. To differentiate actual errors from semi-expected errors during shutdown, the latter are now logged with a warning level instead of error. Fixes #4954	2019-09-16 10:13:06 +03:00
Piotr Sarna	f912122072	main: log unexpected errors thrown on shutdown (#4993 ) Shutdown routines are usually implemented via the deferred_action mechanism, which runs a function in its destructor. We thus expect the function to be noexcept, but unfortunately it's not always the case. Throwing in the destructor results in terminating the program anyway, but before we do that, the exception can be logged so it's easier to investigate and pinpoint the issue. Example output before the patch: INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder terminate called without an active exception Aborting on shard 0. Backtrace: 0x000000000184a9ad (...) Example output after the patch: INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!) terminate called without an active exception Aborting on shard 0. Backtrace: 0x000000000184a9ad (...)	2019-09-16 09:42:55 +03:00
Rafael Ávila de Espíndola	1d9ba4c79b	types: Simplify and explain from_varint_to_integer This simplifies the implementation of from_varint_to_integer and avoids using the fact that a static_cast from cpp_int to uint64_t seems to just keep the low 64 bits. The boost release notes (https://www.boost.org/users/history/version_1_67_0.html) implies that the conversion function should return the maximum value a uint64_t can hold if the original value is too large. The idea of using a & with ~0 is a suggestion from the boost release notes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-15 14:44:54 -07:00
Rafael Ávila de Espíndola	6611e9faf7	Add more cast tests These cover converting a varint to a value smaller than 64 bits. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-15 14:44:54 -07:00
Benny Halevy	c22ad90c04	scyllatop: livedata, metric: expire absent metrics Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 19:48:09 +03:00
Benny Halevy	6e807a56e1	scyllatop: livedata: update all metrics based on new discovered list Update current results dictionary using the Metric.discover method. New results are added and missing results are marked as absent. (Both full metrics or specific keys) Previously, with Prometheus, each metric.update called query_list, resulting in O(n^2) behavior when all metrics were updated, like in the scylla_top dtest - causing test timeout when testing debug build. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 19:45:34 +03:00
Benny Halevy	16de4600a0	scyllatop: metric: return discover results as dict So that we can easily search by symbol for updating multiple results in a single pass. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	02707621d4	scyllatop: metric: update_info in discover So that all metric information can be retrieved in a single pass. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	3861460d3b	scyllatop: metric: refactor update method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	99ab60fc27	scyllatop: metric: add_to_results In preparation to changing results to a dict use a method to add a new metric to the results. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	b489556807	scyllatop: metric: refactor discover and discover_with_help Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	8f7c721907	scyllatop: livedata: get rid of _setupUserSpecifiedMetrics Add self._metricPatterns member and merge _setupUserSpecifiedMetrics with _initializeMetrics. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Benny Halevy	c17aee0dd3	scyllatop: add debug logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-15 16:07:19 +03:00
Tomasz Grabiec	79935df959	commitlog: replay: Respect back-pressure from memtable space to prevent OOM Commit log replay was bypassing memtable space back-pressure, and if replay was faster than memtable flush, it could lead to OOM. The fix is to call database::apply_in_memory() instead of table::apply(). The former blocks when memtable space is full. Fixes #4982. Tests: - unit (release) - manual, replay with memtable flush failing and without failing Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 11:51:56 +03:00
Tomasz Grabiec	3c49b2960b	gdb: Introduce 'scylla memtables' Example output: (gdb) scylla memtables table "ks_truncate"."standard1": (memtable) 0x60c0005a5500: total=131072, used=131072, free=0, flushed=0 table "keyspace1"."standard1": (memtable) 0x60c0005a6000: total=5144444928, used=4512728524, free=631716404, flushed=0 (memtable) 0x60c0005a8a80: total=426901504, used=374294312, free=52607192, flushed=0 (memtable) 0x60c000eb6a80: total=0, used=0, free=0, flushed=0 table "system_traces"."sessions_time_idx": (memtable*) 0x60c0005a4d80: total=131072, used=131072, free=0, flushed=0 Message-Id: <1568133476-22463-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 10:39:55 +03:00
Kamil Braun	9bf4fe669f	Auto-expand replication_factor for NetworkTopologyStrategy (#4667 ) If the user supplies the 'replication_factor' to the 'NetworkTopologyStrategy' class, it will expand into a replication factor for each existing DC for their convenience. Resolves #4210. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-09-15 10:38:09 +03:00
Tomasz Grabiec	8517eecc28	Revert "Simplify db::cql_type_parser::parse" This reverts commit `7f64a6ec4b`. Fixes #5011 The reverted commit exposes #3760 for all schemas, not only those which have UDTs. The problem is that table schema deserialization now requires keyspace to be present. If the replica hasn't received schema changes which introduce the keyspace yet, the write will fail.	2019-09-12 12:45:21 +02:00
Nadav Har'El	67a07e9cbc	README.md: mention Alternator Mention on the top-level README.md that Scylla by default is compatible with Cassandra, but also has experimental support for DynamoDB's API. Provide links to alternator/alternator.md and alternator/getting-started.md with more information about this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190911080913.10141-1-nyh@scylladb.com>	2019-09-11 18:01:58 +03:00
Avi Kivity	c08921b55a	Merge "Alternator - Add support for DynamoDB Compatible API in Scylla" from Nadav & Piotr " In this patch set, written by Piotr Sarna and myself, we add Alternator - a new Scylla feature adding compatibility with the API of Amazon DynamoDB(TM). DynamoDB's API uses JSON-encoded requests and responses which are sent over an HTTP or HTTPS transport. It is described in detail on Amazon's site: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/ Our goal is that any application written to use Amazon DynamoDB could be run, unmodified, against Scylla with Alternator enabled. However, at this stage the Alternator implementation is incomplete, and some of DynamoDB's API features are not yet supported. The extent of Alternator's compatibility with DynamoDB is described in the document docs/alternator/alternator.md included in this patch set. The same document also describes Alternator's design (and also points to a longer design document). By default, Scylla continues to listen only to Cassandra API requests and not DynamoDB API requests. To enable DynamoDB-API compatibility, you must set the alternator-port configuration option (via command line or YAML) to the port on which you wish to listen for DynamoDB API requests. For more information, see docs/alternator/alternator.md. The document docs/alternator/getting-started.md also contains some examples of how to get started with Alternator. 
" * 'alternator' of https://github.com/nyh/scylla: (272 commits) Added comments about DAX, monitoring and more alternator: fix usage of client_state alternator-test: complete test_expected.py for rest of comparison operators alternator-test: reproduce bug in Expected with EQ of set value alternator: implement the Expected request parameter alternator: add returning PAY_PER_REQUEST billing mode alternator: update docs/alternator.md on GSI/LSI situation Alternator: Add getting started document for alternator move alternator.md to its own directory alternator-test: add xfail test for GSI with 2 regular columns alternator/executor.cc: Latencies should use steady_clock alternator-test: fix LSI tests alternator-test: fix test_describe_endpoints.py for AWS run alternator-test: test_describe_endpoints.py without configuring AWS alternator: run local tests without configuring AWS alternator-test: add LSI tests alternator-test: bump create table time limit to 200s alternator: add basic LSI support alternator: rename reserved column name "attrs" alternator: migrate make_map_element_restriction to string view ...	2019-09-11 18:01:05 +03:00
Dor Laor	7d639d058e	Added comments about DAX, monitoring and more	2019-09-11 18:01:05 +03:00
Nadav Har'El	c953aa3e20	alternator-test: complete test_expected.py for rest of comparison operators This patch adds tests for all the missing comparison operators in the Expected parameter (the old-style parameter for conditional operations). All these new tests are now xfailing on Alternator (and succeeding on DynamoDB), because these operators are not yet implemented in Alternator (we only implemented EQ and BEGINS_WITH, so far - the rest are easy but need to be implemented). The test_expected.py is now hopefully comprehensive, covering the entire feature set of the "Expected" parameter and all its various cases and subcases. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190910092208.23461-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	23bb3948ee	alternator-test: reproduce bug in Expected with EQ of set value Our implementation of the "EQ" operator in Expected (conditional operation) just compares the JSON representation of the values. This is almost always correct, but unfortunately incorrect for sets - where we can have two equal sets despite having a different order. This patch just adds an (xfailing) test for this bug. The bug itself can be fixed in the future in one of several ways including changing the implementation of EQ, or changing the serialization of sets so they'll always be sorted in the same way. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190909125147.16484-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	13d657b20d	alternator: implement the Expected request parameter In this patch we implement the Expected parameter for the UpdateItem, PutItem and DeleteItem operations. This parameter allows a conditional update - i.e., do an update only if the existing value of the item matches some condition. This is the older form of conditional updates, but is still used by many applications, including Amazon's Tic-Tac-Toe demo. As usual, we do not yet provide isolation guarantees for read-modify-write operations - the item is simply read before the modification, and there is no protection against concurrent operation. This will of course need to be addressed in the future. The Expected parameter has a relatively large number of variations, and most of them are supported by this code, except that currently only two comparison operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the documentation. The rest will be implemented later. This patch also includes comprehensive tests for the Expected feature. These tests are almost exhaustive, except for one missing part (labeled FIXME) - among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH operators. We'll later need to add checks to the rest of them as well. As usual, all the tests pass on Amazon DynamoDB, and after this patch all of them succeed on Alternator too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190905125558.29133-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	c5fc48d1ee	alternator: add returning PAY_PER_REQUEST billing mode In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST billing mode entry is returned when describing a table with a DescribeTable request. Also, one test case in test_describe_table.py is no longer marked XFAIL. Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b58eadd6c9	alternator: update docs/alternator.md on GSI/LSI situation Update docs/alternator.md on the current level of compatibility of our GSI and LSI implementation vs. DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190904120730.12615-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Eliran Sinvani	a6f600c54f	Alternator: Add getting started document for alternator This patch adds a getting started document for alternator, it explains how to start up a cluster that has an alternator API port open and how to test that it works using either an application or some simple and minimal python scripts. The goal of the document is to get a user to have an up and running docker based cluster with alternator support in the shortest time possible.	2019-09-11 18:01:05 +03:00
Eliran Sinvani	573ff2de35	move alternator.md to its own directory As part of trying to make alternator more accessible to users, we expect more documents to be created so it seems like a good idea to give all of the alternator docs their own directory.	2019-09-11 18:01:05 +03:00
Piotr Sarna	6579a3850a	alternator-test: add xfail test for GSI with 2 regular columns When updating the second regular base column that is also a view key, the code in Scylla will assume it only needs to update an entry instead of replacing an old one. This leads to inconsistencies exposed in the test case. Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Amnon Heiman	722b4b6e98	alternator/executor.cc: Latencies should use steady_clock To get correct latency estimations, the executor should use a higher clock resolution. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	b470137cea	alternator-test: fix LSI tests LSI tests are amended, so they no longer needlessly XPASS: * two xpassing tests are no longer marked XFAIL * there's an additional test for partial projection that succeeds on DynamoDB and does not work fine yet in alternator Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	dc1d577421	alternator-test: fix test_describe_endpoints.py for AWS run The previous patch fixed test_describe_endpoints.py for a local run without an AWS configuration. But when running with "--aws", we do need to use that AWS configuration, and this patch fixes this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	897dffb977	alternator-test: test_describe_endpoints.py without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. The previous patch already fixed this for most tests, this patch fixes the same issue in test_describe_endpoints.py, which had a separate copy of the problematic code. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b39101cd04	alternator: run local tests without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. Also modified the README to be clearer, and more focused on the local runs. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708121420.7485-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	efff187deb	alternator-test: add LSI tests Cases for local secondary indexes are added - loosely based on test_gsi.py suite.	2019-09-11 18:01:05 +03:00
Piotr Sarna	927dc87b9c	alternator-test: bump create table time limit to 200s Unfortunately the previous 100s limit proved to be not enough for creating tables with both local and global indexes attached to them. Empirically 200s was chosen as a safe default, as the longest test oscillated around 100s with the deviation of 10s.	2019-09-11 18:01:05 +03:00
Piotr Sarna	2fcd1ff8a9	alternator: add basic LSI support With this patch, LocalSecondaryIndexes can be added to a table during its creation. The implementation is heavily shared with GlobalSecondaryIndexes and as such suffers from the same TODOs: projections, describing more details in DescribeTable, etc.	2019-09-11 18:01:05 +03:00
Nadav Har'El	7b8917b5cb	alternator: rename reserved column name "attrs" We currently reserve the column name "attrs" for a map of attributes, so the user is not allowed to use this name as a name of a key. We plan to lift this reservation in a future patch, but until we do, let's at least choose a more obscure name to forbid - in this patch ":attrs". It is even less likely that a user will want to use this specific name as a column name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190903133508.2033-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	ef7903a90f	alternator: migrate make_map_element_restriction to string view In order to elide unnecessary copying and allow more copy elision in the future, make_map_element_restriction helper function uses string_view instead of a const string reference. Message-Id: <1a3e82e7046dc40df604ee7fbea786f3853fee4d.1567502264.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	fc946ddfba	alternator: clean error, not a crash, on reserved column name Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user cannot use it as a key column name (key of the base table or GSI or LSI) because we use this name for the attribute map we add to the schema. Currently, if the user does attempt to create such a key column, the result is undefined (sometimes corrupt sstables, sometimes outright crashes). This patch fixes it to become a clean error, saying that this column name is currently reserved. The test test_create_table_special_column_name now cleanly fails, instead of crashing Scylla, so it is converted from "skip" to "xfail". Eventually we need to solve this issue completely (e.g., in rare cases rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME, or alternatively, instead of using a fixed name ATTRS_COLUMN_NAME pick one that differs from the key column names). But until we do, better fail with a clear error instead of a crash. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190901102832.7452-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	d64980f2ae	alternator-test: add initial test_condition_expression file The file initially consists of a very simple case that succeeds with `--aws` and expectedly fails without it, because the expression is not implemented yet.	2019-09-11 18:01:05 +03:00
Piotr Sarna	80edc00f62	alternator-test: add tests for unsupported expressions The test cases are marked XFAIL, as their expressions are not yet supported in alternator. With `--aws`, they pass.	2019-09-11 18:01:05 +03:00
Pekka Enberg	380a7be54b	dist/docker: Add support for Alternator This adds "alternator-address" and "alternator-port" configuration options to the Docker image, so people can enable Alternator with "docker run" with: docker run --name some-scylla -d <image> --alternator-port=8080 Message-Id: <20190902110920.19269-1-penberg@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	3fae8239fa	alternator: throw on unsupported expressions When an unsupported expression parameter is encountered - KeyConditionExpression, ConditionExpression or FilterExpression are such - alternator will return an error instead of ignoring the parameter.	2019-09-11 18:01:05 +03:00
Amnon Heiman	811df711fb	alternator/executor: update the latencies histogram This patch updates the latencies histogram for get, put, delete and update. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Amnon Heiman	4a6d1f5559	alternator/stats metrics: use labels and estimated histogram This patch makes two changes to the alternator stats: 1. It adds an estimated_histogram for the get, put, update and delete operations. 2. It changes the metrics naming so that the operation is a label, which makes the metrics easier to handle, operate on, and display. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	de53ed7cdd	alternator_test: mark test_gsi_3 as passing The test_gsi_3, involving creating a GSI with two key columns which weren't previously a base key, now passes, so drop the "xfail" marker. We still have problems with such materialized views, but not in the simple scenario tested by test_gsi_3. Later we should create a new test for the scenario which still fails, if any. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	0e6338ffd9	alternator: allow creating GSI with 2 base regular columns Creating an underlying materialized view with 2 regular base columns is risky in Scylla, as the second column's liveness will not be correctly taken into account when ensuring view row liveness. Still, if specific conditions are met: * the regular base column value is always present in the base row * no TTLs are involved then the materialized view will behave as expected. Creating a GSI with 2 base regular columns issues a warning, as it should be performed with care. Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	3325e76c6f	alternator: fix default BillingMode It is important that BillingMode should default to PROVISIONED, as it does on DynamoDB. This allows old clients, which don't specify BillingMode at all, to specify ProvisionedThroughput as allowed with PROVISIONED. Also added a test case for this case (where BillingMode is absent). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829193027.7982-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	395a97e928	alternator: correct error on missing index or table When querying on a missing index, DynamoDB returns different errors in case the entire table is missing (ResourceNotFoundException) or the table exists and just the index is missing (ValidationException). We didn't make this distinction, and always returned ValidationException, but this confuses clients that expect ResourceNotFoundException - e.g., Amazon's Tic-Tac-Toe demo. This patch adds a test for the first case (the completely missing table) - we already had a test for the second case - and returns the correct error codes. As usual the test passes against DynamoDB as well as Alternator, ensuring they behave the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829174113.5558-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	62c4ed8ee3	alternator: improve request logging We needlessly split the trace-level log message for the request to two messages - one containing just the operation's name, and one with the parameters. Moreover we printed them in the opposite order (parameters first, then the operation). So this patch combines them into one log message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829165341.3600-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	f755c22577	alternator-test: reproduce bug with using "attrs" as key column name Alternator puts in the Scylla table a column called "attrs" for all the non-key attributes. If the user happens to choose the same name, "attrs", for one of the key columns, the result of writing two different columns with the same name is a mess and corrupt sstables. This test reproduces this bug (and works against DynamoDB of course). Because the test doesn't cleanly fail, but rather leaves Scylla in a bad state from which it can't fully recover, the test is marked as "skip" until we fix this bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190828135644.23248-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	6b27eaf4d0	alternator: remove redundant key checks in UpdateItem Updating key columns is not allowed in UpdateItem requests, but the series introducing GSI support for regular columns also introduced redundant duplicate checks of this kind. This condition is already checked in the resolve_update_path helper function, and the existing test_update_expression_cannot_modify_key test makes sure that the condition is checked. Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	04a117cda3	alternator-test: improve test_update_expression_cannot_modify_key The test test_update_expression_cannot_modify_key() verifies that an update expression cannot modify one of the key columns. The existing test only tried the SET and REMOVE actions - this patch makes the test more complete by also testing the ADD and DELETE actions. This patch also makes the expected exception more picky - we now expect that the exception message contains the word "key" (as it, indeed, does on both DynamoDB and Alternator). If we get any other exception, there may be a problem. The test passed before this patch, and passes now as well - it's just stricter now. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190829135650.30928-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	81a97b2ac0	alternator-test: add test case for GSI with both keys A case which adds a global secondary index on a table with both hash and sort keys is added.	2019-09-11 18:01:05 +03:00
Piotr Sarna	615603877c	alternator: use from_single_value instead of from_singular in ck The code previously used clustering_key::from_singular() to compute a clustering key value. It works fine, but has two issues: 1. involves one redundant deserialization stage compared to from_single_value 2. does not work with compound clustering keys, which can appear when using indexes	2019-09-11 18:01:05 +03:00
Piotr Sarna	4474ceceed	alternator-test: enable passing tests With more GSI features implemented, tests with XPASS status are promoted to being enabled. One test case (test_gsi_describe) is partially done as DescribeTable now contains index names, but we could try providing more attributes (e.g. IndexSizeBytes and ItemCount from the test case), so the test is left in the XFAIL state.	2019-09-11 18:01:05 +03:00
Piotr Sarna	f922d6d771	alternator: Add 'mismatch' to serialization error message In order to match the tests and origin more properly, the error message for mismatched types is updated so it contains the word 'mismatch'.	2019-09-11 18:01:05 +03:00
Piotr Sarna	9dceea14f9	alternator: add describing GSI in DescribeTable The DescribeTable request now contains the list of index names as well. None of the attributes of the list are marked as 'required' in the documentation, so currently the implementation provides index names only.	2019-09-11 18:01:05 +03:00
Piotr Sarna	938a06e4c0	alternator: allow adding GSI-related regular columns to schema In order to be able to create a Global Secondary Index over a regular column, this column is upgraded from being a map entry to being a full member of the schema. As such, it's possible to use this column definition in the underlying materialized view's key.	2019-09-11 18:01:05 +03:00
Piotr Sarna	2a123925ca	alternator: add handling regular columns with schema definitions In order to prepare alternator for adding regular columns to schema, i.e. in order to create a materialized view over them, the code is changed so that updating no longer assumes that only keys are included in the table schema.	2019-09-11 18:01:05 +03:00
Piotr Sarna	befa2fdc80	alternator: start fetching all regular columns Since in the future we may want to have more regular columns in alternator tables' schemas, the code is changed accordingly, so all regular columns will be fetched instead of just the attribute map.	2019-09-11 18:01:05 +03:00
Piotr Sarna	53044645aa	alternator: avoid creating empty collection mutations If no regular column attributes are passed to PutItem, the attr collector serializes an empty collection mutation nonetheless and sends it. It's redundant, so instead, if the attr collector is empty, the collection does not get serialized and sent to replicas.	2019-09-11 18:01:05 +03:00
Nadav Har'El	317954fe19	alternator-test: add license blurbs Add copyright and license blurbs to all alternator-test source files. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825161018.10358-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	c9eb9d9c76	alternator: update license blurbs Update all the license blurbs to the one we use in the open-source Scylla project, licensed under the AGPL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825160321.10016-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	d6e671b04f	alternator: add initial tracing to requests Each request provides basic tracing information about itself. Example output from tracing: cqlsh> select request, parameters from system_traces.sessions where session_id = 39813070-c4ea-11e9-8572-000000000000; request | parameters ------------------+----------------------------------------------------- Alternator Query | {'query': '{"TableName": "alternator_test_15664", "KeyConditions": {"p": {"AttributeValueList": [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'} cqlsh> select session_id, activity from system_traces.events where session_id = 39813070-c4ea-11e9-8572-000000000000; session_id | activity --------------------------------------+----------------------------- 39813070-c4ea-11e9-8572-000000000000 | Querying 39813070-c4ea-11e9-8572-000000000000 | Performing a database query	2019-09-11 18:01:05 +03:00
Piotr Sarna	cb791abb9d	alternator: enable query tracing Probabilistic tracing can be enabled via REST API. Alternator will from now on create tracing sessions for its operations as well. Examples: # trace around 0.1% of all requests curl -X POST http://localhost:10000/storage_service/trace_probability?probability=0.001 # trace everything curl -X POST http://localhost:10000/storage_service/trace_probability?probability=1	2019-09-11 18:01:05 +03:00
Piotr Sarna	6c8c31bfc9	alternator: add client state Keeping an instance of client_state is a convenient way of being able to use tracing for alternator. It's also currently used in paging, so adding a client state to executor removes the need of keeping a dummy value.	2019-09-11 18:01:05 +03:00
Piotr Sarna	1ca9dc5d47	alternator: use correct string views in serialization String views used in JSON serialization should use not only the pointer returned by rapidjson, but also the string length, as it may contain \0 characters. Additionally, one unnecessary copy is elided.	2019-09-11 18:01:05 +03:00
Nadav Har'El	32b898db7b	alternator: docs/alternator.md: link to a longer document Add a link to a longer document (currently, around 40 pages) about DynamoDB's features and how we implemented or may implement them in Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825121201.31747-2-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	a5c3d11ccb	alternator: document choice of RF After changing the choice of RF in a previous patch, let's update the relevant part of docs/alternator.md. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825121201.31747-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	d20ec9f492	alternator: expand docs/alternator.md Expand docs/alternator.md with new sections about how to run Alternator, and a very brief introduction to its design. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818164628.12531-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	9b0ef1a311	alternator: refuse CreateTable if uses unsupported features If a user tries to create a table with an unsupported feature - a local secondary index, a user-defined encryption key or supporting streams (CDC), let's refuse the table creation, so the application doesn't continue thinking this feature is available to it. The "Tags" feature is also not supported, but it is more harmless (it is used mostly for accounting purposes) so we do not fail the table creation because of it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818125528.9091-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	ab25472034	alternator: migrate to visitor pattern in serialization Types can now be processed with a visitor pattern, which is more neat than a chain of if statements. Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	42d2910f2c	alternator: add from_string with raw pointer to rjson from_string is a family of functions that create rjson values from strings - now it's extended to accept a raw pointer and size. Message-Id: <d443e2e4dcc115471202759ecc3641ec902ed9e4.1566386758.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	2f53423a2f	alternator: automatically choose RF: 1 or 3 In CQL, before a user can create a table, they must create a keyspace to contain this table and, among other things, specify this keyspace's RF. But in the DynamoDB API, there is no "create keyspace" operation - the user just creates a table, and there is no way, and no opportunity, to specify the requested RF. Presumably, Amazon always uses the same RF for all tables, most likely 3, although this is not officially documented anywhere. The existing code creates the keyspace during Scylla boot, with RF=1. This RF=1 always works, and is a good choice for a one-node test run, but was a really bad choice for a real cluster with multiple nodes, so this patch fixes this choice: With this patch, the keyspace creation is delayed - it doesn't happen when the first node of the cluster boots, but only when the user creates the first table. Presumably, at that time, the cluster is already up, so at that point we can make the obvious choice automatically: a one-node cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of RF is logged - and the choice of RF=1 is considered a warning. Note that with this patch, keyspace creation is still automatic as it was before. The user may manually create the keyspace via CQL, to override this automatic choice. In the future we may also add additional keyspace configuration options via configuration flags or new REST requests, and the keyspace management code will also likely change as we start to support clusters with multiple regions and global tables. But for now, I think the automatic method is easiest for users who want to test-drive Alternator without reading lengthy instructions on how to set up the keyspace. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190820180610.5341-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	1a1935eb72	alternator-test: add a test for wrong BEGINS_WITH target type The test ensures that passing a non-compatible type to BEGINS WITH, e.g. a number, results in a validation error. Tested both locally and remotely. Message-Id: <894a10d3da710d97633dd12b6ac54edccc18be82.1566291989.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b7b998568f	alternator: add to CreateTable verification of BillingMode setting We allow BillingMode to be set to either PAY_PER_REQUEST (the default) or PROVISIONED, although neither mode is fully implemented: In the former case the payment isn't accounted, and in the latter case the throughput limits are not enforced. But other settings for BillingMode are now refused, and we add a new test to verify that. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190818122919.8431-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	66a2af4f7d	alternator-test: require a new-enough boto library The alternator tests want to exercise many of the DynamoDB API features, so they need a recent enough version of the client libraries, boto3 and botocore. In particular, only in botocore 1.12.54, released a year ago, was support for BillingMode added - and we rely on this to create pay-per-request tables for our tests. Instead of letting the user run with an old version of this library and get dozens of mysterious errors, in this patch we add a test to conftest.py which cleanly aborts the test if the libraries aren't new enough, and recommends a "pip" command to upgrade these libraries. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190819121831.26101-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	64bf2b29a8	alternator-test: exhaustive tests for DescribeTable operation The DescribeTable operation was currently implemented to return the minimal information that libraries and applications usually need from it, namely verifying that some table exists. However, this operation is actually supposed to return a lot more information fields (e.g., the size of the table, its creation date, and more) which we currently don't return. This patch adds a new test file, test_describe_table.py, testing all these additional attributes that DescribeTable is supposed to return. Several of the tests are marked xfail (expected to fail) because we did not implement these attributes yet. The test is exhaustive except for attributes that have to do with four major features which will be tested together with these features: GSI, LSI, streams (CDC), and backup/restore. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190816132546.2764-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	fbd2f5077d	alternator: enable timeouts on requests Currently Alternator starts all Scylla requests (including both reads and writes) without any timeout set. Because of bugs and/or network problems, requests can theoretically hang and waste Scylla resources for hours, long after the client has given up on them and closed their connection. The DynamoDB protocol doesn't let a user specify which timeout to use, so we should just use something "reasonable", in this patch 10 seconds. Remember that all DynamoDB read and write requests are small (even scans just scan a small piece), so 10 seconds should be above and beyond anything we actually expect to see in practice. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190812105132.18651-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	b2bd3bbc1f	alternator: add "--alternator-address" configuration parameter So far we had the "--alternator-port" option allowing to configure the port on which the Alternator server listens, but the server always listened on any address. It is important to also be able to configure the listen address - it is useful in tests running several instances of Scylla on the same machine, and useful in multi-homed machines with several interfaces. So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0 (to listen on all interfaces). It works like the many other "--*-address" options that Scylla already has. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190808204641.28648-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	ea41dd2cf8	alternator: docs/alternator.md more about filtering support Give more details about what is, and what isn't, currently supported in filtering of Scan (and Query) results. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190811094425.30951-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	88eed415bd	alternator: fix indentation It turns out that recent rjson patches introduced some buggy tabs instead of spaces due to bad IDE configuration. The indentation is restored to spaces.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3c11428d8d	alternator-test: add QueryFilter validation cases QueryFilter validation was lately supplemented with non-key column checks, which is hereby tested.	2019-09-11 18:01:05 +03:00
Piotr Sarna	0e0dc14302	alternator-test: add scan case for key equality filtering With key equality filtering enabled, a test case for scanning is provided.	2019-09-11 18:01:05 +03:00
Piotr Sarna	f1641caa41	alternator: add filtering for key equality Until now, filtering in alternator was possible only for non-key column equality relations. This commit adds support for equality relations for key columns.	2019-09-11 18:01:05 +03:00
Piotr Sarna	a2828f9daa	alternator: add validation to QueryFilter QueryFilter, according to docs, can only contain non-key attributes.	2019-09-11 18:01:05 +03:00
Piotr Sarna	d055658fff	alternator: add computing key bounds from filtering Alternator allows passing hash and sort key restrictions as filters - it is, however, better to incorporate these restrictions directly into partition and clustering ranges, if possible. It's also necessary, as optimizations inside restrictions_filter assume that it will not be fed unneeded rows - e.g. if filtering is not needed on partition key restrictions, they will not be checked.	2019-09-11 18:01:05 +03:00
Piotr Sarna	9c05051b59	alternator: extract getting key value subfunction Currently the only utility function for getting key bytes from JSON was to parse a document with the following format: "key_column_name" : { "key_column_type" : VALUE }. However, it's also useful to parse only the inner document, i.e.: { "key_column_type" : VALUE }.	2019-09-11 18:01:05 +03:00
Piotr Sarna	c84019116a	alternator: make make_map_element_restriction static The function has no outside users and thus does not need to be exposed.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3ee99a89b1	alternator: register filtering metrics Three metrics related to filtering are added to alternator: - total rows read during filtering operations - rows read and matched by filtering - rows read and dropped by filtering	2019-09-11 18:01:05 +03:00
Piotr Sarna	b3e35dab26	alternator: add bumping filtering stats When filtering is used in querying or scanning, the number of total filtered rows is added to stats.	2019-09-11 18:01:05 +03:00
Piotr Sarna	a6d098d3eb	alternator: add cql_stats to alternator stats Some underlying operations (e.g. paging) make use of cql_stats structure from CQL3. As such, cql_stats structure is added to alternator stats in order to gather and use these statistics.	2019-09-11 18:01:05 +03:00
Piotr Sarna	3ae54892cd	alternator: fix a comment typo s/Miscellenous/Miscellaneous/g	2019-09-11 18:01:05 +03:00
Piotr Sarna	ccf778578a	alternator: register read-before-write stats Read-before-write stat counters were already introduced, but the metrics need to be added to a metric group as well in order to be available for users.	2019-09-11 18:01:05 +03:00
Nadav Har'El	6f81d0cb15	alternator: initial support for GSI This patch adds partial support for GSI (Global Secondary Index) in Alternator, implemented using a materialized view in Scylla. This initial version only supports the specific cases of the index indexing a column which was already part of the base table's key - e.g., indexing what used to be a sort key (clustering key) in the base table. Indexing of non-key attributes (which today live in a map) is not yet supported in this version. Creation of a table with GSIs is supported, and so is deleting the table. UpdateTable which adds a GSI to an existing table is not yet supported. Query and Scan operations on the index are supported. DescribeTable does not yet list the GSIs as it should. Seven previously-failing tests now pass, so their "xfail" tag is removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190808090256.12374-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	33611acf44	alternator: add stats for read-before-write A simple metric counting how many read-before-writes were executed is added. Message-Id: <d8cc1e9d77e832bbdeff8202a9f792ceb4f1e274.1565274797.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	ae59340c15	alternator: complement rjson.hh comments Some comments in rjson.hh header file were not clear and are hereby amended. Message-Id: <7fa4e2cf39b95c176af31fe66f404a6a51a25bec.1565275276.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	5eb583ab09	alternator: remove missing key FIXME The case for missing key in update_item was already properly fixed along with migrating from libjsoncpp to rapidjson, but one FIXME remained in the code by mistake. Message-Id: <94b3cf53652aa932a661153c27aa2cb1207268c7.1565271432.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	436f806341	alternator: remove decimal_type FIXME Decimal precision problems were already solved by commit d5a1854d93c9448b1d22c2d02eb1c46a286c5404, but one FIXME remained in the code by mistake. Message-Id: <381619e26f8362a8681b83e6920052919acf1142.1565271198.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	b29b753196	alternator: add comments to rjson The rapidjson library needs to be used with caution in order to provide maximum performance and avoid undefined behavior. Comments added to rjson.hh describe provided methods and potential pitfalls to avoid. Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	7b02c524d0	alternator: remove a pointer-based workaround for future<json> With libjsoncpp we were forced to work around the problem of non-noexcept constructors by using an intermediate unique pointer. Objects provided by rapidjson have correct noexcept specifiers, so the workaround can be dropped.	2019-09-11 18:01:04 +03:00
Piotr Sarna	cb29d6485e	alternator: migrate to rapidjson library Profiling alternator implied that JSON parsing takes up a fair amount of CPU, and as such should be optimized. libjsoncpp is a standard library for handling JSON objects, but it also proves slower than rapidjson, which is hereby used instead. The results indicated that libjsoncpp used roughly 30% of CPU for a single-shard alternator instance under stress, while rapidjson dropped that usage to 18% without optimizations. Future optimizations should include eliding object copying, string copying and perhaps experimenting with different JSON allocators.	2019-09-11 18:01:04 +03:00
Piotr Sarna	0fd1354ef9	alternator: add handling rapidjson errors in the server If a JSON parsing error is encountered, it is transformed to a validation exception and returned to the user in JSON form.	2019-09-11 18:01:04 +03:00
Piotr Sarna	7064b3a2bf	alternator: add rapidjson helper functions Migrating from libjsoncpp to rapidjson proved to be beneficial for parsing performance. As a first step, a set of helper functions is provided to ease the migration process.	2019-09-11 18:01:04 +03:00
Piotr Sarna	0b0bfc6e54	alternator: add missing namespaces to status_type error.hh file implicitly assumed that seastar:: namespace is available when it's included, which is not always the case. To remedy that, seastar::httpd namespace is used explicitly.	2019-09-11 18:01:04 +03:00
Nadav Har'El	56309db085	alternator: correctly catch table-already-exists exception Our CreateTable handler assumed that the function migration_manager::announce_new_column_family() returns a failed future if the table already exists. But in some of our code branches, this is not the case - the function itself throws instead of returning a failed future. The solution is to use seastar::futurize_apply() to handle both possibilities (direct exception or future holding an exception). This fixes a failure of the test_table.py::test_create_table_already_exists test case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 18:01:04 +03:00
Nadav Har'El	d74b203dee	alternator: add docs/alternator.md This adds a new document, docs/alternator.md, about Alternator. The scope of this document should be expanded in the future. We begin here by introducing Alternator and its current compatibility level with Amazon DynamoDB, but it should later grow to explain the design of Alternator and how it maps the DynamoDB data model onto Scylla's. Whether this document should remain a short high-level overview, or a long and detailed design document, remains an open question. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190805085340.17543-1-nyh@scylladb.com>	2019-09-11 18:01:04 +03:00
Piotr Sarna	75ee13e5f2	dependencies: add rapidjson The rapidjson fast JSON parsing library is used instead of libjsoncpp in the Alternator subproject. [avi: update toolchain image to include the new dependency] Message-Id: <a48104dec97c190e3762f927973a08a74fb0c773.1564995712.git.sarna@scylladb.com>	2019-09-11 18:00:44 +03:00
Nadav Har'El	5eaf73a292	alternator: fix sharing of a seastar::shared_ptr between threads The function attrs_type() returns a supposed singleton, but because it is a seastar::shared_ptr we can't use the same one for multiple threads, and need to use a separate one per thread. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190804163933.13772-1-nyh@scylladb.com>	2019-09-11 16:06:05 +03:00
Nadav Har'El	1b1ede9288	alternator: fix cross-shard use of CQL type objects The CQL type singletons like utf8_type et al. are separate for separate shards and cannot be used across shards. So whatever hash tables we use to find them also need to be per-shard. If we fail to do this, we get errors running the debug build with multiple shards. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190804165904.14204-1-nyh@scylladb.com>	2019-09-11 16:05:39 +03:00
Nadav Har'El	7eae889513	alternator-test: some more GSI tests Expand the GSI test suite. The most important new test is test_gsi_key_not_in_index(), where the index's key includes just one of the base table's key columns, but not a second one. In this case, the Scylla implementation will nevertheless need to add the second key column to the view (as a clustering key), even though it isn't considered a key column by the DynamoDB API. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190718085606.7763-1-nyh@scylladb.com>	2019-09-11 16:05:38 +03:00
Nadav Har'El	10ad60f7de	alternator: ListTables should not list materialized views Our ListTables implementation uses get_column_families(), which lists both base tables and materialized views. We will use materialized views to implement DynamoDB's secondary indexes, and those should not be listed in the results of ListTables. The patch also includes a test for this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190717133103.26321-2-nyh@scylladb.com>	2019-09-11 16:04:29 +03:00
Nadav Har'El	676ada4576	alternator-test: move list_tables to util.py The list_tables() utility function was used only in test_table.py but I want to use it elsewhere too (in GSI test) so let's move it to util.py. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190717133103.26321-1-nyh@scylladb.com>	2019-09-11 16:04:28 +03:00
Piotr Sarna	f3963865f5	alternator: make set_sum exception more user-friendly As in case of set_diff, an exception message in set_sum should include the user-provided request (ADD) rather than our internal helper function set_sum.	2019-09-11 16:03:27 +03:00
Piotr Sarna	9dd8644e4a	alternator-tests: enable DELETE case for sets UpdateExpression's case for DELETE operation for sets is enabled.	2019-09-11 16:03:26 +03:00
Piotr Sarna	2b215b159c	alternator: implement set DELETE UpdateExpression's DELETE operation for set is implemented on top of set_diff helper function.	2019-09-11 16:02:25 +03:00
Piotr Sarna	fe72a6740c	alternator: add set difference helper function A function for computing the set difference of two sets represented as JSON is added.	2019-09-11 16:01:03 +03:00
Nadav Har'El	e13c56be0b	alternator: fail attempt to create table with GSI Although we do not support GSI yet, until now we silently ignored CreateTable's GSI parameter, and the user wouldn't know the table wasn't created as intended. In this patch, GSI is still unsupported, but now CreateTable will fail with an error message that GSI is not supported. We need to change some of the tests which test the error path, and expect an error - but should not consider a table creation error as the expected error. After this patch, test_gsi.py still fails all the tests on Alternator, but much more quickly :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711161420.18547-1-nyh@scylladb.com>	2019-09-11 16:00:01 +03:00
Piotr Sarna	336c90daaa	alternator-test: add stub case for set add duplication The test case for adding two sets with common values is added. This case is a stub, because boto3 transforms the result into a Python set, which removes duplicates on its own. A proper TODO is left in order to migrate this case to a lower-level API and check the returned JSON directly for lack of duplicates.	2019-09-11 16:00:00 +03:00
Piotr Sarna	67c95cb303	alternator-test: enable tests for ADD operation Tests for UpdateExpression::ADD are enabled.	2019-09-11 15:59:59 +03:00
Piotr Sarna	f29c2f6895	alternator: add ADD operation UpdateExpression is now able to perform ADD operation on both numbers and sets.	2019-09-11 15:59:00 +03:00
Piotr Sarna	a5f2926056	alternator: add helper function for adding sets A helper function that allows creating a set sum out of two sets represented in JSON is added.	2019-09-11 15:57:41 +03:00
Piotr Sarna	18686ff288	alternator: add unwrap_set It will be needed later to implement adding sets.	2019-09-11 15:56:15 +03:00
Piotr Sarna	09993cf857	alternator: add get_item_type_string helper function It will be useful later for ensuring that parameters for various functions have matching types.	2019-09-11 15:52:31 +03:00
Nadav Har'El	d54c82209c	alternator: fix Query verification of appropriate key columns The Query operation's conditions can be used to search for a particular hash key or both hash and sort keys - but not any other combinations. We previously forgot to verify most errors, so in this patch we add missing verifications - and tests to confirm we fail the query when DynamoDB does. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711132720.17248-1-nyh@scylladb.com>	2019-09-11 15:51:27 +03:00
Nadav Har'El	fbe63ddcc4	alternator-test: more GSI tests Add more tests for GSI - tests that DescribeTable describes the GSI, and test the case of more than one GSI for a base table. Unfortunately, creating an empty table with two GSIs routinely takes on DynamoDB more than a full minute (!), so because we now have a test with two GSIs, I had to increase the timeout in create_test_table(). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190711112911.14703-1-nyh@scylladb.com>	2019-09-11 15:51:26 +03:00
Piotr Sarna	a3be9dda7f	alternator-test: enable if_not_exists-related tests Test cases that relied on the implementation of if_not_exists are enabled.	2019-09-11 15:51:25 +03:00
Piotr Sarna	cec82490d2	alternator: implement if_not_exists The if_not_exists function is implemented on the basis of recently added read-before write mechanism.	2019-09-11 15:50:22 +03:00
Piotr Sarna	b14e3c0e72	alternator: rename holds_path to a more generic name The holds_path() utility function is actually used to check if a value needs read before write, so its name is changed to more fitting check_needs_read_before_write.	2019-09-11 15:49:19 +03:00
Nadav Har'El	5fc7b0507e	alternator: fix bug in collection mutations Alternator currently keeps an item's attributes inside a map, and we had a serious bug in the way we build mutations for this map: We didn't know there was a requirement to build this mutation sorted by the attribute's name. When we neglect to do this sorting, this confuses Scylla's merging algorithms, which assume collection cells are thus sorted, and the result can be duplicate cells in a collection, and the visible effect is a mutation that seems to be ignored - because both old and new values exist in the collection. So this patch includes a new helper class, "attribute_collector", which helps collect attribute updates (put and del) and extract them in correctly sorted order. This helper class also eliminates some duplication of arcane code to create collection cells or deletions of collection cells. This patch includes a simple test that previously failed, and one xfail test that failed just because of this bug (this was the test that exposed this bug). Both tests now succeed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190709160858.6316-1-nyh@scylladb.com>	2019-09-11 15:48:18 +03:00
Nadav Har'El	5cce53fed9	alternator-test: exhaustive tests for GSI This patch adds what is hopefully an exhaustive test suite for the global secondary indexing (GSI) feature, and all its various complications and corner cases of how GSIs can be created, deleted, named, written, read, and more (the tests are heavily documented to explain what they are testing). All these tests pass on DynamoDB, and fail on Alternator, so they are marked "xfail". As we develop the GSI feature in Alternator piece by piece, we should make these tests start to pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708160145.13865-1-nyh@scylladb.com>	2019-09-11 15:48:17 +03:00
Nadav Har'El	9eea90d30d	alternator-test: another test for BatchWriteItem This adds another test for BatchWriteItem: That if one of the operations is invalid - e.g., has a wrong key type - the entire batch is rejected, and none of its operations are done - even the valid ones. The test succeeds, because we already handle this case correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190707134610.30613-1-nyh@scylladb.com>	2019-09-11 15:48:16 +03:00
Nadav Har'El	01f4cf1373	alternator-test: test UpdateItem's SET with #reference Test an operation like SET #one = #two, where the RHS has a reference to a name, rather than the name itself. Also verify that DynamoDB gives an error if ExpressionAttributeNames includes names not needed by either the left- or right-hand side of such assignments. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708133311.11843-1-nyh@scylladb.com>	2019-09-11 15:48:15 +03:00
Piotr Sarna	e482f27e2f	alternator-test: add test for reading key before write The test case checks if reading keys in order to use their values in read-before-write updates works fine.	2019-09-11 15:48:14 +03:00
Piotr Sarna	7b605d5bec	alternator-test: add test case for nested read-before-write A test for read-before-write in nested paths (inside a function call or inside a +/- operator) is added.	2019-09-11 15:48:13 +03:00
Piotr Sarna	da795d8733	alternator-test: enable basic read-before-write cases With unsafe read-before-write implemented, simple cases can be enabled by removing their xfail flag.	2019-09-11 15:48:12 +03:00
Piotr Sarna	2e473b901a	alternator: fix indentation	2019-09-11 15:48:09 +03:00
Piotr Sarna	bf13564a9d	alternator: add unsafe read-before-write to update_item In order to serve update requests that depend on read-before-write, a proper helper function which fetches the existing item with a given key from the database is added. This read-before-write mechanism is not considered safe, because it provides no linearizability guarantees and offers no synchronization protection. As such, it should be considered a placeholder that works fine on a single machine and/or with no concurrent access to the same key.	2019-09-11 15:45:21 +03:00
Piotr Sarna	2fb711a438	alternator: add context parameters to calculate_value The calculate_value utility function is going to need more context in order to resolve paths present in the right-hand side of update_item operators: update_info and schema.	2019-09-11 15:40:17 +03:00
Piotr Sarna	cbe1836883	alternator: add allowing key columns when resolving path Historically, resolving a path checked for key columns, which are not allowed to be on the left-hand side of the assignment. However, path resolving will now also be used for right-hand side, where it should be allowed to use the key value.	2019-09-11 15:39:15 +03:00
Piotr Sarna	20a6077fb3	alternator: add optional previous item to calculate_value In order to implement read-before-write in the future, calculate_value now accepts an additional parameter: previous_item. If read-before-write was performed, previous_item will contain an item for the given key which already exists in the database at the time of the update.	2019-09-11 15:38:13 +03:00
Piotr Sarna	784aaaa8ff	alternator: move describe_item implementation up It will be needed later to add read-before-write to update_item.	2019-09-11 15:37:13 +03:00
Nadav Har'El	bd4dfa3724	alternator-test: move create_test_table() to util.py This patch moves the create_test_table() utility function, which creates a test table with a unique name, from the fixtures (conftest.py) to util.py. This will allow reusing this function in tests which need to create tables but not through the existing fixtures. In particular we will need to do this for GSI (global secondary index) tests in the next patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708104438.5830-1-nyh@scylladb.com>	2019-09-11 15:37:12 +03:00
Nadav Har'El	ce13a0538c	alternator-test: expand tests of duplicate items in BatchWriteItem The tests we had for BatchWriteItem's refusal to accept duplicate keys only used test_table_s, with just a hash key. This patch adds tests for test_table, i.e., a table with both hash and sort keys - to check that we check duplicates in that case correctly as well. Moreover, the expanded tests also verify that although identical keys are not allowed, keys with just one component (hash or sort key) the same but the other not the same - are fine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190705191737.22235-1-nyh@scylladb.com>	2019-09-11 15:37:11 +03:00
Nadav Har'El	9bc2685a92	alternator-test: run local tests without configuring AWS Even when running against a local Alternator, Boto3 wants to know the region name, and AWS credentials, even though they aren't actually needed. For a local run, we can supply garbage values for these settings, to allow a user who never configured AWS to run tests locally. Running against "--aws" will, of course, still require the user to configure AWS. Also modified the README to be clearer, and more focused on the local runs. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708121420.7485-1-nyh@scylladb.com>	2019-09-11 15:37:10 +03:00
Nadav Har'El	cb42c75e0a	alternator-test: don't hardcode us-east-1 region For "--aws" tests, use the default region chosen by the user in the AWS configuration (~/.aws/config or environment variable), instead of hard-coding "us-east-1". Patch by Pekka Enberg. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190708105852.6313-1-nyh@scylladb.com>	2019-09-11 15:37:09 +03:00
Piotr Sarna	8f9e720f10	alternator-test: enable precision test for add With big_decimal-based implementation, the precision test passes. Message-Id: <6d631a43901a272cb9ebd349cb779c9677ce471e.1562318971.git.sarna@scylladb.com>	2019-09-11 15:37:08 +03:00
Piotr Sarna	78e495fac3	alternator: allow arithmetics without losing precision Calculating value represented as 'v1 + v2' or 'v1 - v2' was previously implemented with a double type, which offers limited precision. From now on, these computations are based on big_decimal, which allows returning values without losing precision. This patch depends on 'add big_decimal arithmetic operators' series. Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>	2019-09-11 15:36:08 +03:00
Piotr Sarna	466f25b1e8	alternator-test: enable batch duplication cases With duplication checks implemented, batch write and delete tests no longer need to be marked @xfail. Message-Id: <6c5864607e06e8249101bd711dac665743f78d9f.1562325663.git.sarna@scylladb.com>	2019-09-11 15:36:07 +03:00
Piotr Sarna	eb7ada8387	alternator: add checking for duplicate keys in batches Batch writes and batch deletes do not allow multiple entries for the same key. This patch implements checking for duplicated entries and throws an error if applicable. Message-Id: <450220ba74f26a0893430cb903e4749f978dfd31.1562325663.git.sarna@scylladb.com>	2019-09-11 15:35:01 +03:00
Nadav Har'El	b810fa59c4	alternator-test: move utility functions to a new "util.py" Move some common utility functions to a common file "util.py" instead of repeating them in many test files. The utility functions include random_string(), random_bytes(), full_scan(), full_query(), and multiset() (the more general version, which also supports freezing nested dicts). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190705081013.1796-1-nyh@scylladb.com>	2019-09-11 15:35:00 +03:00
Nadav Har'El	2fb77ed9ad	alternator: use std::visit for reading std::variant The idiomatic way to use an std::variant depending on the type it holds is to use std::visit. This modern API makes it unnecessary to write many boiler-plate functions to test and cast the type of the variant, and makes it impossible to forget one of the options. So in this patch we throw out the old ways, and welcome the new. Thanks to Piotr Sarna for the idea. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190704205625.20300-1-nyh@scylladb.com>	2019-09-11 15:33:57 +03:00
Nadav Har'El	4d07e2b7c5	alternator: support BatchGetItem This patch adds to Alternator an implementation of the BatchGetItem operation, which allows to start a number of GetItem requests in parallel in a single request. The implementation is almost complete - the only missing feature is the ability to ask only for non-top-level attributes in ProjectionExpression. Everything else should work, and this patch also includes tests which, as usual, pass on DynamoDB and now also on Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:33:50 +03:00
Nadav Har'El	d1a5512a35	alternator: fix second boot Amazingly, it appears we never tested booting Alternator a second time :-) Our initialization code creates a new keyspace, and was supposed to ignore the error if this keyspace already existed - but we thought the error will come as an exceptional future, which it didn't - it came as a thrown exception. So we need to change handle_exception() to a try/catch. With this patch, I can kill Alternator and it will correctly start again. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:22:48 +03:00
Nadav Har'El	374162f759	alternator: generate error on spurious key columns Operations which take a key as parameter, namely GetItem, UpdateItem, DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given key is missing one of the necessary key attributes, or has the wrong types for them. But they should also fail if the given key has spurious attributes beyond those actually needed in a key. So this patch adds this check, and tests to confirm that we do these checks correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:21:50 +03:00
Nadav Har'El	da4da6afbf	alternator: fix PutItem to really replace item. The PutItem operation, and also the PutRequest of BatchWriteItem, are supposed to completely replace the item - not to merge the new value with the previous value. We implemented this wrongly - we just wrote the new item forgetting a tombstone to remove the old item. So this patch fixes these operations, and adds tests which confirm the fix (as usual, these tests pass on DynamoDB, failed on Alternator before this patch, and pass after the patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:20:55 +03:00
Nadav Har'El	a0fffcebde	alternator: add support for DeleteRequest in BatchWriteItem Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:20:01 +03:00
Nadav Har'El	83b91d4b49	alternator: add DeleteItem Add support for the DeleteItem operation, which deletes an item. The basic deletion operation is supported. Still not supported are: 1. Parameters to conditionally delete (ConditionalExpression or Expected) 2. Parameters to return pre-delete content 3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:19:46 +03:00
Nadav Har'El	b09603ed9b	alternator: cleaner error on DeleteRequest In BatchWriteItem, we currently only support the PutRequest operation. If a user tries to use DeleteRequest (which we don't support yet), he will get a bizarre error. Let's test the request type more carefully, and print a better error message. This will also be the place where eventually we'll actually implement the DeleteRequest. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:16:02 +03:00
Nadav Har'El	a7f7ce1a73	alternator-test: tests for BatchWriteItem This patch adds more comprehensive tests for the BatchWriteItem operation, in a new file batch_test.py. The one test we already had for it was also moved from test_item.py here. Some of the tests still xfail for two reasons: 1. Support for the DeleteRequest operation of BatchWriteItem is missing. 2. Tests that forbid duplicate keys in the same request are missing. As usual, all tests succeed on DynamoDB, and hopefully (I tried...) cover all the BatchWriteItem features. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:16:01 +03:00
Nadav Har'El	a8dd3044e2	alternator: support (most of) ProjectionExpression DynamoDB has two similar parameters - AttributesToGet and ProjectionExpression - which are supported by the GetItem, Scan and Query operations. Until now we supported only the older AttributesToGet, and this patch adds support to the newer ProjectionExpression. Besides having a different syntax, the main difference between AttributesToGet and ProjectionExpression is that the latter also allows fetching only a specific nested attribute, e.g., a.b[3].c. We do not support this feature yet, although it would not be hard to add it: With our current data representation, it means fetching the top-level attribute 'a', whose value is a JSON, and then post-filtering it to take out only the '.b[3].c'. We'll do that later. This patch also adds more test cases to test_projection_expression.py. All tests except three which check the nested attributes now pass, and those three xfail (they succeed on DynamoDB, and fail as expected on Alternator), reminding us what still needs to be done. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:01 +03:00
Nadav Har'El	98c4e646a5	alternator-test: tests for yet-unimplemented ProjectionExpression Our GetItem, Query and Scan implementations support the AttributesToGet parameter to fetch only a subset of the attributes, but we don't yet support the more elaborate ProjectionExpression parameter, which is similar but has a different syntax and also allows to specify nested document paths. This patch adds extensive testing of all the ProjectionExpression features. All these tests pass against DynamoDB, but fail against the current Alternator so they are marked "xfail". These tests will be helpful for developing the ProjectionExpression feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:00 +03:00
Nadav Har'El	7c9e64ed81	alternator-test: more tests for AttributesToGet parameter The AttributesToGet parameter - saying which attributes to fetch for each item - is already supported in the GetItem, Query and Scan operations. However, we only had a test for it for Scan. This patch adds similar tests also for the GetItem and Query operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:59 +03:00
Nadav Har'El	9c53f33003	alternator-test: another test for top-level attribute overwrite Yet another test for overwriting a top-level attribute which contains a nested document - here, overwriting it by just a string. This test passes. In the current implementation we don't yet support updates to specific attribute paths (e.g. a.b[3].c) but we do support well writing and over-writing top-level attributes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:58 +03:00
Nadav Har'El	f6fa971e96	alternator: initial implementation of "+" and "-" in UpdateExpression This patch implements the last (finally!) syntactic feature of the UpdateExpression - the ability to do SET a=val1+val2 (where, as before, each of the values can be a reference to a value, an attribute path, or a function call). The implementation is not perfect: It adds the values as double-precision numbers, which can lose precision. So the patch adds a new test which checks that the precision isn't lost - a test that currently fails (xfail) on Alternator, but passes on DynamoDB. The pre-existing test for adding small integer now passes on Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:14:01 +03:00
Nadav Har'El	a5af962d80	alternator: support the list_append() function in UpdateExpression In the previous patch we added function-call support in the UpdateExpression parser. In this patch we add support for one such function - list_append(). This function takes two values, confirms they are lists, and concatenates them. After this patch only one function remains unimplemented: if_not_exists(). We also split the test we already had for list_append() into two tests: One uses only value references (":val") and passes after this patch. The second test also uses references to other attributes and will only work after we start supporting read-modify-write. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:13:07 +03:00
Nadav Har'El	9d2eba1c75	alternator: parse more types of values in UpdateExpression Until this patch, in update expressions like "SET a = :val", we only allowed the right-hand-side of the assignment to be a reference to a value stored in the request - like ":val" in the above example. But DynamoDB also allows the value to be an attribute path (e.g., "a.b[3].c"), or a function of a bunch of other values. This patch adds support for parsing all these value types. This patch only adds the correct parsing of these additional types of values, but they are still not supported: reading existing attributes (i.e., read-modify-write operations) is still not supported, and neither of the two functions which UpdateExpression needs to support is supported yet. Nevertheless, the parsing is now correct, and the "unknown_function" test starts to pass. Note that DynamoDB allows the right-hand side of an assignment to be not only a single value, but also value+value and value-value. This possibility is not yet supported by the parser and will be added later. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:12:06 +03:00
Piotr Sarna	cb50207c7b	alternator-test: add initial filtering test for scans Currently the only supported case is equality on non-key attributes. More complex filtering tests are also included in test_query.py.	2019-09-11 15:12:05 +03:00
Piotr Sarna	b5eb3aed10	alternator-test: add initial filtering test for query The test cases verify that equality-based filtering on non-key attributes works fine. It also contains test stubs for key filtering and non-equality attribute filtering.	2019-09-11 15:12:04 +03:00
Piotr Sarna	319e946d8f	alternator-test: diversify attribute values in filled test table Filled test table used to have identical non-key attributes for all rows. These values are now diversified in order to allow writing filtering test cases.	2019-09-11 15:12:03 +03:00
Piotr Sarna	e4516617eb	alternator: add filtering to Query Query requests now accept QueryFilter parameter.	2019-09-11 15:11:10 +03:00
Piotr Sarna	4ea02bec89	alternator: enable filtering for Scan Scans can now accept ScanFilter parameter to perform filtering on returned rows.	2019-09-11 15:10:12 +03:00
Piotr Sarna	8cb078f757	alternator: add initial filtering implementation Filtering is currently only implemented for the equality operator on non-key attributes. Next steps (TODO) involve: 1. Implementing filtering for key restrictions 2. Implementing non-key attribute filtering for operators other than EQ. It, in turn, may involve introducing 'map value restrictions' notion to Scylla, since now it only allows equality restrictions on map values (alternator attributes are currently kept in a CQL map). 3. Implementing FilterExpression in addition to deprecated QueryFilter	2019-09-11 15:08:50 +03:00
Nadav Har'El	aa94e7e680	alternator: clean up parsing of attribute-path components Before this patch, we read either an attribute name like "name" or a reference to one "#name", as one type of token - NAME. However, while attribute paths indeed can use either one, in some other contexts - such as a function name - only "name" is allowed, so we need to distinguish between two types of tokens: NAME and NAMEREF. While separating those, I noticed that we incorrectly allowed a "#" followed by zero alphanumeric characters to be considered a NAMEREF, which it shouldn't. In other words, NAMEREF should have ALNUM+, not ALNUM*. Same for VALREF, which can't be just a ":" with nothing after it. So this patch fixes these mistakes, and adds tests for them. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:08:36 +03:00
Nadav Har'El	13476c8202	alternator: complain about unused values or names in UpdateExpression DynamoDB complains, and fails an update, if the update contains in ExpressionAttributeNames or ExpressionAttributeValues names which aren't used by the expression. Let's do the same, although sadly this means more work to track which of the references we've seen and which we haven't. This patch makes two previously xfail (expected fail) tests become successful tests on Alternator (they always succeeded against DynamoDB). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:35 +03:00
Nadav Har'El	c4fc02082b	alternator-test: complete test for UpdateItem's UpdateExpression The existing tests in test_update_expression.py thoroughly tested the UpdateExpression features which we currently support. But tests for features which Alternator doesn't yet support were partial. In this patch, we add a large number of new tests to test_update_expression.py aiming to cover ALL the features of UpdateExpression, regardless of whether we already support it in Alternator or not. Every single feature and esoteric edge-case I could discover is covered in these tests - and as far as I know these tests now cover the entire UpdateExpression feature. All the tests succeed on DynamoDB, and confirm our understanding of what DynamoDB actually does on all these cases. After this patch, test_update_expression.py is a whopper, with 752 lines of code and 37 separate test functions. 23 out of these 37 tests are still "xfail" - they succeed on DynamoDB but fail on Alternator, because of several features we are still missing. Those missing features include direct updates of nested attributes, read-modify-write updates (e.g., "SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"), the ADD and DELETE operations on sets, and various other small missing pieces. The benefit of this whopper test is two-fold: First, it will allow us to test our implementation as we continue to fill it (i.e., "test-driven development"). Second, all these tested edge cases basically "reverse engineer" how DynamoDB's expression parser is supposed to work, and we will need this knowledge to implement the still-missing features of UpdateExpression. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:34 +03:00
Nadav Har'El	ede5943401	alternator-test: test for UpdateItem's UpdateExpression This patch adds an extensive array of tests for UpdateItem's UpdateExpression support, which was introduced in the previous patch. The tests include verification of various edge cases of the parser, support for ":value" and "#name" references, functioning SET and REMOVE operations, combinations of multiple such operations, and much more. As usual, all these tests were run and succeed on DynamoDB, as well as on Alternator - to confirm Alternator behaves the same as DynamoDB. There are two tests marked "xfail" (expected to fail), because Alternator still doesn't support the attribute copy syntax (e.g., "SET a = b", doing a read-before-write). There are some additional areas which we don't support - such as the DELETE and ADD operations or SET with functions - but those areas aren't yet tested in these tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:07:33 +03:00
Nadav Har'El	4baa0d3b67	alternator: enable support for UpdateItem's UpdateExpression For the UpdateItem operation, so far we supported updates via the AttributeUpdates parameter, specifying which attributes to set or remove and how. But this parameter is considered deprecated, and DynamoDB supports a more elaborate way to modify attributes, via an "UpdateExpression". In the previous patch we added a function to parse such an UpdateExpression, and in this patch we use the result of this parsing to actually perform the required updates. UpdateExpression is only partially supported after this patch. The basic "SET" and "REMOVE" operations are supported, but various other cases aren't fully supported and will be fixed in followup patches. The following patch will add extensive tests to confirm exactly what works correctly with the new UpdateExpression support. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:34 +03:00
Nadav Har'El	829bafd181	alternator: add expression parsers The DynamoDB protocol is based on JSON, and most DynamoDB requests describe the operation and its parameters via JSON objects such as maps and lists. However, in some types of requests an "expression" is passed as a single string, and we need to parse this string. These cases include: 1. Attribute paths, such as "a[3].b.c", are used in projection expressions as well as inside other expressions described below. 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f", used in conditional updates, filters, and other places. 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d" This patch introduces the framework to parse these expressions, and an implementation of parsing update expressions. These update expressions will be used in the UpdateItem operation in the next patch. All these expression syntaxes are very simple: Most of them could be parsed as regular expressions, or at most a simple hand-written lexical analyzer and recursive-descent parser. Nevertheless, we decided to specify these parsers in the same ANTLR3 language already used in the Scylla project for parsing CQL, hopefully making these parsers easier to reason about, and easier to change if needed - and reducing the amount of boilerplate code. The parsing of update expressions is mostly complete except that in SET actions, only the "path = value" form is supported and not yet forms such as "path1 = path2" (which does read-before-write) or "path1 = path1 + value" or "path = function(...)". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:12 +03:00
Nadav Har'El	f0f50607a7	alternator-test: split nested-document tests to new file We need to write more tests for various cases of handling nested documents and nested attributes. Let's collect them all in the same test file. This patch mostly moves existing code, but also adds one small test, test_nested_document_attribute_write, which just writes a nested document and reads it back (it's mostly covered by the existing test_put_and_get_attribute_types, but is specifically about a nested document). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:11 +03:00
Nadav Har'El	12abe8e797	alternator-test: make local test the default We usually run Alternator tests against the local Alternator - testing against AWS DynamoDB is rarer, and usually just done when writing the test. So let's make "pytest" without parameters default to testing locally. To test against AWS, use "pytest --aws" explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:10 +03:00
Piotr Sarna	b67f22bfc6	alternator: move related functions to serialization.cc Existing functions related to serialization and deserialization are moved to serialization.cc source file. Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>	2019-09-11 15:06:05 +03:00
Piotr Sarna	fdba9866fc	alternator: apply new serialization to reads and writes Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem, UpdateItem, ...) are now serialized and deserialized in binary form instead of raw JSON, provided that their type is S, B, BOOL or N. Optimized serialization for the rest of the types will be introduced as follow-ups. Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>	2019-09-11 15:02:21 +03:00
Piotr Sarna	b3fd4b5660	alternator: add simple attribute serialization routines Attributes used to be written into the database in raw JSON format, which is far from optimal. This patch introduces more robust serialization routines for simple alternator types: S, B, BOOL, N. Serialization uses the first byte to encode attribute type and follows with serializing data in binary form. More complex types (sets, lists, etc.) are currently still serialized in raw JSON and will be optimized in follow-up patches. Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>	2019-09-11 15:01:07 +03:00
Piotr Sarna	27f00d1693	alternator: move error class to a separate header Error class definitions were previously in server.hh, but they are separate entities - future .cc files can use the errors without the need of including server definitions. Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>	2019-09-11 14:52:58 +03:00
Nadav Har'El	52810d1103	configure.py: move alternator source files to separate list For some unknown reason we put the list of alternator source files in configure.py inside the "api" list. Let's move it into a separate list. We could have just put it in the scylla_core list, but that would cause frequent and annoying patch conflicts when people add alternator source files and Scylla core source files concurrently. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:52:39 +03:00
Nadav Har'El	d4b3c493ad	alternator: stub support for UpdateItem with UpdateExpression So far for UpdateItem we only supported the old-style AttributeUpdates parameter, not the newer UpdateExpression. This patch begins the path to supporting UpdateExpression. First, trying to use both parameters should result in an error, and this patch does this (and tests this). Second, passing neither parameters is allowed, and should result in an empty item being created. Finally, since today we do not yet support UpdateExpression, this patch will cause UpdateItem to fail if UpdateExpression is used, instead of silently being ignored as we did so far. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:40 +03:00
Nadav Har'El	04856a81f5	alternator-tests: two simple tests for nested documents This patch adds two simple tests for nested documents, which pass: test_nested_document_attribute_overwrite() tests what happens when we UpdateItem a top-level attribute to a dictionary. We already tested this works on an empty item in a previous test, but now we check what happens when the attribute already existed, and already was a dictionary, and now we update it to a new dictionary. In the test attribute a was {b:3, c:4} and now we update it to {c:5}. The test verifies that the new dictionary completely replaces the old one - the two are not merged. The new value of the attribute is just {c:5}, not {b:3, c:5}. The second test verifies that the AttributeUpdates parameter of UpdateItem cannot be used to update just a nested attribute. Any dots in the attribute name are considered an actual dot - not part of a path of attribute names. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:39 +03:00
Nadav Har'El	b782d1ef8d	alternator-test: test_query.py: change item list comparison Comparing two lists of items without regard for order is not trivial. For this reason some tests in test_query.py only compare arrays of sort keys, and those tests are fine. But other tests used a trick of converting a list of items into a set via set_of_frozen_elements() and comparing these sets. This trick is almost correct, but it can miss cases where items repeat. So in this patch, we replace the set_of_frozen_elements() approach by a similar one using a multiset (set with repetitions) instead of a set. A multiset in Python is "collections.Counter". This is the same approach we also started to use in test_scan.py in a recent patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:51:38 +03:00
Nadav Har'El	15f47a351e	alternator: remove unused code Remove the incomplete and unused function to convert DynamoDB type names to ScyllaDB type objects: DynamoDB has a different set of types relevant for keys and for attributes. We already have a separate function, parse_key_type(), for parsing key types, and for attributes - we don't currently parse the type names at all (we just save them as JSON strings), so the function we removed here wasn't used, and was in fact #if'ed out. It was never completed, and it now started to decay (the type for numbers is wrong), so we're better off completely removing it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:50:44 +03:00
Nadav Har'El	b63bd037ea	alternator: implement correct "number" type for keys This patch implements a fully working number type for keys, and now Alternator fully and correctly supports every key type - strings, byte arrays, and numbers. The patch also adds a test which verifies that Scylla correctly sorts number sort keys, and also correctly retrieves them to the full precision guaranteed by DynamoDB (38 decimal digits). The implementation uses Scylla's "decimal" type, which supports arbitrary precision decimal floating point, and in particular supports the precision specified by DynamoDB. However, "decimal" is actually over-qualified for this use, so might not be optimal for the more specific requirements of DynamoDB. So a FIXME is left to optimize this case in the future. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:47 +03:00
Nadav Har'El	cb1b2b1fc2	alternator-test: test_scan.py: change item list comparison Comparing two lists of items without regard for order is not trivial. test_scan.py currently has two ways of doing this, both unsatisfactory: 1. We convert each list to a set via set_of_frozen_elements(), and compare the sets. But this comparison can miss cases where items repeat. 2. We use sorted() on the list. This doesn't work on Python 3 because it removed the ability to compare (with "<") dictionaries. So in this patch, we replace both by a new approach, similar to the first one except we use a multiset (set with repetitions) instead of a set. A multiset in Python is "collections.Counter". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:46 +03:00
Nadav Har'El	4a1b6bf728	alternator-test: drop "test_2_tables" fixture Creating and deleting tables is the slowest part of our tests, so we should lower the number of tables our tests create. We had a "test_2_tables" fixture as a way to create two tables, but since our tests already create other tables for testing different key types, it's faster to reuse those tables - instead of creating two more unused tables. On my system, a "pytest --local", running all 38 tests locally, drops from 25 seconds to 20 seconds. As a bonus, we also have one fewer fixture ;-) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:45 +03:00
Nadav Har'El	013fb1ae38	alternator-text: fix errors in len/length variable name Also change "xrange" to "range" to appease Python 3 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:44 +03:00
Nadav Har'El	30a123d8ad	DynamoDB limits the size of hash keys to 2048 bytes, sort keys to 1024 bytes, and the entire item to 400 KB which therefore also limits the size of one attribute. This test checks that we can reach up to these limits, with binary keys and attributes. The test does not check what happens once we exceed these limits. In such a case, DynamoDB throws an error (I checked that manually) but Alternator currently simply succeeds. If in the future we decide to add artificial limits to Alternator as well, we should add such tests as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:43 +03:00
Nadav Har'El	b91eca28bd	alternator-test: don't use "len" as a parameter name "len" is an unfortunate choice for a variable name, in case one day the implementation may want to call the built-in "len" function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:42 +03:00
Nadav Har'El	e21e0e6a37	alternator-test: test sort-key ordering - for both string and binary keys We already have a test for string sort-key ordering of items returned by the Scan operation, and this test adds a similar test for the Query operation. We verify that items are retrieved in the desired sorted order (sorted by the aptly-named sort key) and not in creation order or any other wrong order. But beyond just checking that Query works as expected (it should, given it uses the same machinery as Scan), the nice thing about this test is that it doesn't create a new table - it uses a shared table and creates one random partition inside it. This makes this test faster and easier to write (no need for a new fixture), and most importantly - easily allows us to write similar tests for other key types. So this patch also tests the correct ordering of binary sort keys. It helped expose bugs in previous versions of the binary key implementation. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:41 +03:00
Nadav Har'El	1d058cf753	alternator-test: test item operations with binary keys Simple tests for item operations (PutItem, GetItem) with binary key instead of string for the hash and sort keys. We need to be able to store such keys, and then retrieve them correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:40 +03:00
Nadav Har'El	4bfd5d7ed1	alternator: add support for bytes as key columns Until now we only supported string for key columns (hash or sort key). This patch adds support for the bytes type (a.k.a binary or blob) as well. The last missing type to be supported in keys is the number type. Note that in JSON, bytes values are represented with base64 encoding, so we need to decode them before storing the decoded value, and re-encode when the user retrieves the value. The decoding is important not just for saving storage space (the encoding is 4/3 the size of the decoded) but also for correct sorting of the binary keys. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:49:35 +03:00
Nadav Har'El	57b46a92d7	alternator: add base64 encoding and decoding functions The DynamoDB API uses base64 encoding to encode binary blobs as JSON strings. So we need functions to do these conversions. This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64 but doesn't actually copy code from it. I didn't write any specific unit tests for this code, but it will be exercised and tested in a following patch which tests Alternator's use of these functions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:46:13 +03:00
Piotr Sarna	0980fde9d5	alternator-test: add dedicated BEGINS_WITH case to Query BEGINS_WITH behaves in a special way when a key postfix consists of <255> bytes. The initial test does not use that and instead checks UTF-8 characters, but once bytes type is implemented for keys, it should also test specifically for corner cases, like strings that consist of <255> byte only. Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>	2019-09-11 14:46:12 +03:00
Piotr Sarna	5bc7bb00e0	alternator-test: rename test_query_with_paginator Paginator is an implementation detail and does not belong in the name, and thus the test is renamed to test_query_basic_restrictions. Message-Id: <849bc9d210d0faee4bb8479306654f2a59e18517.1558524028.git.sarna@scylladb.com>	2019-09-11 14:46:11 +03:00
Piotr Sarna	9e2ecf5188	alternator: fix string increment for BEGINS_WITH BEGINS_WITH statement increments a string in order to compute the upper bound for a clustering range of a query. Unfortunately, the previous implementation was not correct, as it appended a <0> byte if the last character was <255>, instead of incrementing a last-but-one character. If the string contains <255> bytes only, the returned upper bound is infinite. Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>	2019-09-11 14:45:17 +03:00
Nadav Har'El	7b9180cd99	alternator: common get_read_consistency() function We had several places in the code that need to parse the ConsistentRead flag in the request. Let's add a function that does this, and while at it, checks for more error cases and also returns LOCAL_QUORUM and LOCAL_ONE instead of QUORUM and ONE. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:44:24 +03:00
Nadav Har'El	56907bf6c6	alternator: for writes, use LOCAL_QUORUM instead of QUORUM As Shlomi suggested in the past, it is more likely that when we eventually support global tables, we will use LOCAL_QUORUM, not QUORUM. So let's switch to that now. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:44:20 +03:00
Nadav Har'El	8c347cc786	alternator-test: verify that table with only hash key also works So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem), were arbitrarily done on a test table with both hash key and sort key (both with string type). While this covers most of the code paths, we still need to verify that the case where there is not a sort key, also works fine. E.g., maybe we have a bug where a missing clustering key is handled incorrectly or an error is incorrectly reported in that case? But in this patch we add tests for the hash-key-only case, and see that it already works correctly. No bug :-) We add a new fixture test_table_s for creating a test table with just a single string key. Later we'll probably add more of these test tables for additional key types. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:41:16 +03:00
Nadav Har'El	c53b2ebe4d	alternator-test: also test for missing part of key Another type of key type error can be to forget part of the key (the hash or sort key). Let's test that too (it already works correctly, no need to patch the code). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:41:15 +03:00
Nadav Har'El	f58abb76d6	alternator: gracefully handle wrong key types When a table has a hash key or sort key of a certain type (this can be string, bytes, or number), one cannot try to choose an item using values of different types. We previously did not handle this case gracefully, and PutItem handled it particularly badly - writing malformed data to the sstable and basically hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json() functions to verify the expected type, and fail gracefully if the user sent the wrong type. This patch also adds tests for these failures, for the GetItem, PutItem, and UpdateItem operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:40:23 +03:00
Nadav Har'El	9ee912d5cf	alternator: correct handling of missing item in GetItem According to the documentation, trying to GetItem a non-existent item should result in an empty response - NOT a response with an empty "Item" map as we do before this patch. This patch fixes this case, and adds a test case for it. As usual, we verify that the test case also works on Amazon DynamoDB, to verify DynamoDB really behaves the way we think it does. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:39:32 +03:00
Nadav Har'El	7f73f561d5	alternator: fix support for empty items If an empty item (i.e., no attributes except the key) is created, or an item becomes empty (by deleting its existing attributes), the empty item must be maintained - it cannot just disappear. To do this in Scylla, we must add a row marker - otherwise an empty attribute map is not enough to keep the row alive. This patch includes 4 test cases for all the various ways an empty item can be created or a non-empty item can be emptied, and verifies that the empty item can be correctly retrieved (as usual, to verify that our expectation of "correctness" is indeed correct, we run the same tests against DynamoDB). All these 4 tests failed before this patch, and now succeed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:38:40 +03:00
Nadav Har'El	95ed2f7de8	alternator: remove two unused lines of code These lines of code were superfluous and their result unused: the make_item_mutation() function finds the pk and ck on its own. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:37:49 +03:00
Nadav Har'El	eb81b31132	alternator: add statistics This patch adds a statistics framework to Alternator: Executor has (for each shard) a _stats object which contains counters for various events, and also is in charge of making these counters visible via Scylla's regular metrics API (http://localhost:9180/metrics). This patch includes a counter for each of DynamoDB's operation types, and we increase the ones we support when handled. We also added counters for total operations and unsupported operations (operation types we don't yet handle). In the future we can easily add many more counters: Define the counter in stats.hh, export it in stats.cc, and increment it where relevant in executor.cc (or server.cc). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:36:26 +03:00
Piotr Sarna	d267e914ad	alternator-test: add initial Query test The test covers simple restrictions on primary keys. Message-Id: <2a7119d380a9f8572210571c565feb8168d43001.1558356119.git.sarna@scylladb.com>	2019-09-11 14:36:25 +03:00
Piotr Sarna	b309c9d54b	alternator: implement basic Query The implementation covers the following restrictions - equality for hash key; - equality, <, <=, >, >=, between, begins_with for sort key. Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>	2019-09-11 14:36:16 +03:00
Piotr Sarna	8571046d3e	alternator: move do_query to separate function A fair portion of code from scan() will be used later to implement query(), so it's extracted as a helper function. Message-Id: <d3bc163a1cb2032402768fcbc6a447192fba52a4.1558356119.git.sarna@scylladb.com>	2019-09-11 14:31:31 +03:00
Nadav Har'El	4a8b2c794d	alternator-test: another edge case for Scan with AttributesToGet Ask to retrieve only an attribute name which none of the items have. The result should be a silly list of empty items, and indeed it is. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:30 +03:00
Nadav Har'El	c766d1153d	alternator-test: shorten test_scan.py by reusing full_scan more Use full_scan() in another test instead of open-coding the scan. There are two more tests that could have used full_scan(), but since they seem to be specifically adding more assertions or using a different API ("paginators"), I decided to leave them as-is. But new tests should use full_scan(). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:29 +03:00
Nadav Har'El	2666b29c77	alternator-test: test AttributesToGet parameter in Scan request This is a short, but extensive, test of the AttributesToGet parameter to Scan, which allows selecting only some of the attributes for output. The AttributesToGet feature has several non-obvious features. Firstly, it doesn't require that any key attributes be selected. So since each item may have different non-key attributes, some scanned items may be missing some of the selected columns, and some of the items may even be missing all the selected columns - in which case DynamoDB returns an empty item (and doesn't entirely skip this item). This test covers all these cases, and it adds yet another item to the 'filled_test_table' fixture, one which has different attributes, so we can see these issues. As usual, this test passes in both DynamoDB and Alternator, to assure we correspond to the right behavior, not just what we think is right. This test actually exposed a bug in the way our code returned empty items (items which had none of the selected columns), a bug which was fixed by the previous patch. Instead of having yet another copy of table-scanning code, this patch adds a utility function full_scan(), to scan an entire table (with optional extra parameters for the scan) and return the result as an array. We should simplify existing tests in test_scan.py by using this new function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:31:28 +03:00
Avi Kivity	446faba49c	Merge "dbuild: add --image option, help, and usage" from Benny * tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla: dbuild: add usage dbuild: add help option dbuild: list available images when no image arg is given dbuild: add --image option	2019-09-11 14:30:45 +03:00
Nadav Har'El	f871a4bc87	alternator: fix bug in returning an empty item in a Scan When a Scan selects only certain attributes, and none of the key attributes are selected, for some of the scanned items nothing will remain to be output, but still Dynamo outputs an empty item in this case. Our code had a bug where after each item we "moved" the object leaving behind a null object, not an empty map, so a completely empty item wasn't output as an empty map as expected, and resulted in boto3 failing to parse the response. This simple one-line patch fixes the bug, by resetting the item to an empty map after moving it out. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:30:37 +03:00
Piotr Sarna	8525b14271	alternator: add lookup table for requests Instead of using a really long if-else chain, requests are now looked up via a routing table. Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>	2019-09-11 14:29:59 +03:00
Piotr Sarna	f3440f2e4a	alternator-test: migrate filled_test_table to use batches Filled test table fixture now takes advantage of batch writes in order to run faster. Message-Id: <e299cdffa9131d36465481ca3246199502d65e0c.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:58 +03:00
Piotr Sarna	4c3bdd3021	alternator-test: add batch writing test case Message-Id: <a950799dd6d31db429353d9220b63aa96676a7a7.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:57 +03:00
Piotr Sarna	c0ecd1a334	alternator: add basic BatchWriteItem The initial implementation only supports PutRequest requests, without serving DeleteRequest properly. Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>	2019-09-11 14:29:50 +03:00
Nadav Har'El	9a0c13913d	alternator: improve where DescribeEndpoints gets its information Instead of blindly returning "localhost:8000" in response to DescribeEndpoints and for sure causing us problems in the future, the right thing to do is to return the same domain name which the user originally used to get to us, be it "localhost:8000" or "some.domain.name:1234". But how can we know what this domain name was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header, and the DynamoDB driver I tested (boto3) adds it as expected, indeed with the expected value of "localhost:8000" on my local setup. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:25:22 +03:00
Nadav Har'El	a4a3b2fe43	alternator-test: test for sort order of items in a single partition Although different partitions are returned by a Scan in (seemingly) random order, items in a single partition need to be returned sorted by their sort key. This adds a test to verify this. This patch adds to the filled_test_table fixture, which until now had just one item in each partition, another partition (with the key "long") with 164 additional items. The test_scan_sort_order_string test then scans this table, and verifies that the items are really returned in sorted order. The sort order is, of course, string order. So we have the first item with sort key "1", then "10", then "100", then "101", "102", etc. When we implement numeric keys we'll need to add a version of this test which uses a numeric clustering key and verifies the sort order is numeric. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:25:21 +03:00
Nadav Har'El	32c388b48c	alternator: fix clustering key setup Because of a typo, we incorrectly set the table's sort key as a second partition key column instead of a clustering key column. This has bad but subtle consequences - such as that the items are not sorted according to the sort key. So in this patch we fix the typo. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:24:30 +03:00
Nadav Har'El	29e0f68ee0	alternator: add initial implementation of DescribeEndpoints DescribeEndpoints is not a very important API (and by default, clients don't use it) but I wanted to understand how DynamoDB responds to it, and what better way than to write a test :-) And then, if we already have a test, let's implement this request in Scylla as well. This is a silly implementation, which always returns "localhost:8000". In the future, this will need to be configurable - we're not supposed here to return this server's IP address, but rather a domain name which can be used to get to all servers. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:22:47 +03:00
Avi Kivity	211b0d3eb4	Merge "sstables, gdb: Centralize tracking of sstable instances" from Tomasz " Currently, GDB scripts locate sstables by scanning the heap for bag_sstable_set containers. That has disadvantages: - not all containers are considered - it's extremely slow on large heaps - fragile, new containers can be added, and we won't even know This series fixes all above by adding a per-shard sstable tracker which tracks sstable objects in a linked-list. " * 'sstable-tracker' of github.com:tgrabiec/scylla: gdb: Use sstable tracker to get the list of sstables gdb: Make intrusive_list recognize member_hook links sstables: Track whether sstable was already open or not sstables: Track all instances of sstable objects sstables: Make sstable object not movable sstables: Move constructor out of line	2019-09-11 14:22:41 +03:00
Nadav Har'El	982b5e60e7	alternator: unify and improve TableName field handling Most of the request types need a TableName parameter, specifying the name of the table they operate on. There's a lot of boilerplate code required to get this table name and verify that it is valid (the parameter exists, is a string, passes DynamoDB's naming rules, and the table actually exists), which resulted in a lot of code duplication - and in some cases missing checks. So this patch introduces two utility functions, get_table_name() and get_table(), to fetch a table name or the schema of an existing table, from the request, with all necessary validation. If validation fails, the appropriate api_error() is thrown so the user gets the right error message. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:21:53 +03:00
Nadav Har'El	b8fc783171	alternator-test: clean up conftest.py Remove unused random-string code from conftest.py, and also add a TODO comment how we should speed up filled_test_table fixture by using a batch write - when that becomes available in Alternator. (right now this fixture takes almost 4 seconds to prepare on a local Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:21:52 +03:00
Piotr Sarna	a4387079ac	alternator-test: add initial scan test Message-Id: <c28ff1d38930527b299fe34e9295ecd25607398c.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:51 +03:00
Piotr Sarna	b6d148c9e0	alternator-test: add filled test table fixture The fixture creates a test table and fills it with random data, which can be later used for testing reads. Message-Id: <649a8b8928e1899c5cbd82d65d745a464c1163c8.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:50 +03:00
Piotr Sarna	4def674731	alternator: implement basic scan The most basic version of Scan request is implemented. It still contains a list of TODOs, among which the support for Segments parameter for scan parallelism. Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>	2019-09-11 14:21:39 +03:00
Piotr Sarna	0ce3866fb5	alternator: lower debug messages verbosity in the HTTP server The HTTP server still uses WARN log level to log debug messages, which is way higher than necessary. These messages are degraded to TRACE level. Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>	2019-09-11 14:12:40 +03:00
Nadav Har'El	d45220fb39	alternator-test: simplify test_put_and_get_attribute_types The test test_put_and_get_attribute_types needlessly named all the different attributes and their variables, causing a lot of repetition and chance for mistakes when adding additional attributes to the test. In this rewrite, we only have a list of items, and automatically build attributes with them as values (using sequential names for the attributes) and check we read back the same item (Python's dict equality operator checks the equality recursively, as expected). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:39 +03:00
Nadav Har'El	ea32841dab	alternator-test: test all attribute types Although we planned to initially support only string types, it turns out for the attributes (not the key), we actually support all types already, including all scalar types (string, number, bool, binary and null) and more complex types (list, nested document, and sets). This adds a test which PutItem's these types and verifies that we can retrieve them. Note that this test deals with top-level attributes only. There is no attempt to modify only a nested attribute (and with the current code, it wouldn't work). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:38 +03:00
Nadav Har'El	c645538061	alternator-test: rewrite ListTables test In our tests, we cannot really assume that ListTables should return only the tables we created for the test, or even that a page size of 100 will be enough to list our 3 pages. The issue is that on a shared DynamoDB, or in hypothetical cases where multiple tests are run in parallel, or previous tests had catastrophic errors and failed to clean up, we have no idea how many unrelated tables there are in the system. There may be hundreds of them. So every ListTables test will need to use paging. So in this re-implementation, we begin with a list_tables() utility function which calls ListTables multiple times to fetch all tables, and return the resulting list (we assume this list isn't so huge it becomes unreasonable to hold it in memory). We then use this utility function to fetch the table list with various page sizes, and check that the test tables we created are listed in the resulting list. There's no longer a separate test for "all" tables (really was a page of 100 tables) and smaller pages (1,2,3,4) - we now have just one test that does the page sizes 1,2,3,4, 50 and 100. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:12:37 +03:00
Piotr Sarna	6b83e17b74	alternator: add tests to ListTables command Test cases cover both listing appropriate table names and pagination. Message-Id: <e7d5f1e5cce10c86c47cdfb4d803149488935ec0.1557402320.git.sarna@scylladb.com>	2019-09-11 14:12:36 +03:00
Piotr Sarna	dfbf4ffe0f	alternator-test: add 2 tables fixture For some tests, more than 1 table is needed, so another fixture that provided two additional test tables is added. Message-Id: <75ae9de5cc1bca19594db1f0bc03260f83459380.1557402320.git.sarna@scylladb.com>	2019-09-11 14:12:35 +03:00
Piotr Sarna	b6dde25bcc	alternator: implement ListTables ListTables is used to extract all table names created so far. Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>	2019-09-11 14:10:54 +03:00
Piotr Sarna	b73a9f3744	alternator: use trace level for debug messages In the early development stage, warn level was used for all debug messages, while it's more appropriate to use 'trace' or 'debug'. Message-Id: <419ca5a22bc356c6e47fce80b392403cefbee14d.1557402320.git.sarna@scylladb.com>	2019-09-11 14:10:02 +03:00
Nadav Har'El	4ed9aa4fb4	alternator-test: cleanup in conftest.py This patch cleans up some comments and reorganizes some functions in conftest.py, where the test_table fixture was defined. The goal is to later add additional types of test tables with different schemas (e.g., just a partition key, different key types, etc.) without too much code duplication. This patch doesn't change anything functional in the tests, and they still pass ("pytest --local" runs all tests against the local Alternator). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:10:01 +03:00
Nadav Har'El	5c564b7117	alternator: make ck_from_json() easier to use The ck_from_json() utility function is easier to use if it handles the no-clustering-key case itself, instead of requiring callers to handle that case separately. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:09:06 +03:00
Nadav Har'El	3ae0066aae	alternator: add support for UpdateItem's DELETE operation So far we supported UpdateItem only with PUT operations - this patch adds support for DELETE operations, to delete specific attributes from an item. Only the case of a missing value is supported. DynamoDB also provides the ability to pass the old value, and only perform the deletion if the value and/or its type is still up-to-date - but we don't support this yet and fail such a request if it is attempted. This patch also includes a test for this case in alternator-test/ Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:08:57 +03:00
Nadav Har'El	81679d7401	alternator-test: add tests for UpdateItem Add initial tests for UpdateItem. Only the features currently supported by our code (only string attributes, only "PUT" action) are tested. As usual, this test (like all others) was tested to pass on both DynamoDB and Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:03:10 +03:00
Nadav Har'El	0c2a440f7f	alternator: add initial UpdateItem implementation Add an initial UpdateItem implementation. As with PutItem and GetItem, we are still limited to string attributes. This initial implementation of UpdateItem implements only the "PUT" action (not "DELETE" and certainly not "ADD") and not any of the more advanced options. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 14:03:00 +03:00
Piotr Sarna	686d1d9c3c	alternator: add attrs_column() helper function Message-Id: <d93ae70ccd27fe31d0bc6915a20d83d7a85342cf.1557223199.git.sarna@scylladb.com>	2019-09-11 13:08:52 +03:00
Piotr Sarna	6ad9b10317	alternator: make constant names more explicit KEYSPACE and ATTRS constants refer to their names, not objects, so they're named more explicitly. Message-Id: <14b1f00d625e041985efbc4cbde192bd447cbf03.1557223199.git.sarna@scylladb.com>	2019-09-11 13:07:14 +03:00
Piotr Sarna	2975ca668c	alternator: remove inaccessible return statement Message-Id: <afaef20e7e110fa23271fb8c3dc40cec0716efb6.1557223199.git.sarna@scylladb.com>	2019-09-11 13:06:21 +03:00
Piotr Sarna	6e8db5ac6a	alternator: inline keywords It was decided that all alternator-specific keywords can be inlined in code instead of defining them as constants. Message-Id: <6dffb9527cfab2a28b8b95ac0ad614c18027f679.1557223199.git.sarna@scylladb.com>	2019-09-11 13:04:38 +03:00
Nadav Har'El	50a69174b3	alternator: some cleanups in validate_table_name() Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:03:44 +03:00
Nadav Har'El	0e06d82a1f	alternator: clean up api_error() interface All operation-generated error messages should have the 400 HTTP error code. It's a real nag to have to type it every time. So make it the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:01:47 +03:00
Nadav Har'El	0634629a79	alternator-test: test for error on creating an already-existing table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:01:46 +03:00
Nadav Har'El	6fe6cf0074	alternator: correct error when trying to CreateTable an existing table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:00:54 +03:00
Nadav Har'El	871dd7b908	alternator: fix return object from PutItem Without special options, PutItem should return nothing (an empty JSON result). Previously we had trouble doing this, because instead of returning an empty JSON result, we converted an empty string into JSON :-) So the existing code had an ugly workaround which worked, sort of, for the Python driver but not for the Java driver. The correct fix, in this patch, is to invent a new type json_string which is a string already in JSON and doesn't need further conversion, so we can use it to return the empty result. PutItem now works from YCSB's Java driver. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 13:00:47 +03:00
Nadav Har'El	ae1ee91f3c	alternator-test: more examples in README.md Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:56:07 +03:00
Nadav Har'El	886438784c	alternator-test: test table name limit of 222 bytes, instead of 255. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:56:06 +03:00
Nadav Har'El	28e7fa20ed	alternator: limit table names to 222 bytes Although we would like to allow table names up to 255 bytes, this is not currently possible because Scylla tacks additional 33 bytes to create a directory name, and directory names are limited to 255 bytes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:55:07 +03:00
Nadav Har'El	a702e5a727	alternator-test: verify appropriate error when invalid key type is used Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:55:06 +03:00
Nadav Har'El	8af58b0801	alternator: better key type parsing The supported key types are just S(tring), B(lob), or N(umber). Other types are valid for attributes, but not for keys, and should not be accepted. And wrong types used should result in the appropriate user-visible error. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:54:12 +03:00
Nadav Har'El	6cdcf5abac	alternator-test: additional cases of invalid schemas in CreateTable Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:54:11 +03:00
Nadav Har'El	9839183157	alternator: better invalid schema detection for CreateTable To be correct, CreateTable's input parsing needs to work in reverse from what it did: First, the key columns are listed in KeySchema, and then each of these (and potentially more, e.g., from indexes) need to appear in AttributeDefinitions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:53:22 +03:00
Nadav Har'El	8bfbc1bae5	alternator-test: tests for CreateTable with bad schema Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:53:21 +03:00
Benny Halevy	0f01a4c1b8	dbuild: add usage Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:53:02 +03:00
Benny Halevy	f43bffdf9c	dbuild: add help option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:52:50 +03:00
Nadav Har'El	dc34c92899	alternator: better error handling for schema errors in CreateTable Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:52:31 +03:00
Nadav Har'El	77de0af40f	alternator-test: test for PutItem to nonexistent table We expect to see the right error code, not some "internal error". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:52:30 +03:00
Nadav Har'El	ca3553c880	alternator: PutItem: appropriate error for a non-existent table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:38 +03:00
Nadav Har'El	275a07cf10	alternator-test: add another column to test_basic_string_put_and_get() Just to make sure our success isn't limited to just a single non-key attribute, let's add another one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:37 +03:00
Nadav Har'El	6ca72b5fed	alternator: GetItem should by default return all the columns, not none The test pytest --local test_item.py::test_basic_string_put_and_get Now passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:51:31 +03:00
Benny Halevy	c840c43fa7	dbuild: list available images when no image arg is given Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 12:51:26 +03:00
Nadav Har'El	9920143fb7	alternator: change empty return of PutItem Without any arguments, PutItem should return no data at all. But somehow, for reasons I don't understand, the boto3 driver gets confused from an empty JSON thinking it isn't JSON at all. If we return a structure with an empty "attributes" field, boto3 is happy. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:49:20 +03:00
Nadav Har'El	8dec31d23b	alternator: add initial implementation of DeleteTable Add an initial implementation of DeleteTable, enough for making the pytest --local test_table.py::test_create_and_delete_table Pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:45:42 +03:00
Nadav Har'El	41d4b88e78	alternator: on unknown operation, return standard API error When given an unknown operation (we didn't implement yet many of them...) we should throw the appropriate api_error, not some random exception. This allows the client to understand the operation is not supported and stop retrying - instead of retrying thinking this was a weird internal error. For example the test pytest --local test_table.py::test_create_and_delete_table Now fails immediately, saying Unsupported operation DeleteTable. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:45:04 +03:00
Nadav Har'El	1b1921bc94	alternator: fix JSON in DescribeTable response The structure's name in DescribeTable's output is supposed to be called "Table", not "TableDescription". Putting it in the wrong place caused the driver's table creation waiters to fail. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:44:14 +03:00
Nadav Har'El	6a455035ba	alternator: validate table name in CreateTable validate table name in CreateTable, and if it doesn't fit DynamoDB's requirement, return the appropriate error as drivers expect. With this patch, test_table.py::test_create_table_unsupported_names now passes (albeit with a one minute pause - this is a bug with keep-alive support...). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:24 +03:00
Nadav Har'El	0da214c2fe	alternator-test: test_create_table_unsupported_names minor fix Check the expected error message to contain just ValidationException instead of an overly specific text message from DynamoDB, so we aren't so constrained in our own messages' wording. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:23 +03:00
Nadav Har'El	4f721a0637	alternator-test: test for creating table with very long name Dynamo allows table names up to 255 characters, but when this is tested on Alternator, the results are disastrous: mkdir with such a long directory name fails, Scylla considers this an unrecoverable "I/O error", and exits the server. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:22 +03:00
Nadav Har'El	6967dd3d8f	test-table: test DescribeTable on non-existent table Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:21 +03:00
Nadav Har'El	d0cdc65b4c	Add "--local" option to run test against local Scylla installation For example "pytest --local test_item.py" Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:21 +03:00
Nadav Har'El	079c7c3737	test_item.py: basic string put and get test Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:20 +03:00
Nadav Har'El	4550f3024d	test_table fixture: be quicker to realize table was created. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:19 +03:00
Nadav Har'El	f1f76ed17b	test_table fixture: automatically delete Automatically delete the test table when the test ends. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:18 +03:00
Nadav Har'El	a946e255c6	test_item.py: start testing CRUD operations Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:17 +03:00
Nadav Har'El	4d7d871930	Start to use "test fixtures" Start to use "test fixtures" defined in conftest.py: The connection to the DynamoDB API, and also temporary tables, can be reused between multiple tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:16 +03:00
Nadav Har'El	6984ccf462	Add some table tests and README Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:43:15 +03:00
Nadav Har'El	f66ec337f7	alternator: very initial implementation of DescribeTable This initial implementation is enough to pass a test of getting a failure for a non-existent table - test_table.py::test_describe_table_non_existent_table and to recognize an existing table. But it's still missing a lot of fields for an existing table (among others, the schema). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:41:32 +03:00
Nadav Har'El	ad9eb0a003	alternator: errors should be output from server as Dynamo drivers expect Exceptions from the handlers need to be output in a certain way - as a JSON with specific fields - as DynamoDB drivers expect them to be. If a handler throws an alternator::api_error with these specific fields, they are output, but any other exception is converted into the same format as an "Internal Error". After this patch, executor code can throw an alternator::api_error and the client will receive this error in the right format. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:40:55 +03:00
Nadav Har'El	db49bc6141	alternator: add alternator::api_error exception type DynamoDB error messages are returned in JSON format and expect specific information: Some HTTP error code (often but not always 400), a string error "type" and a user-readable message. Code that wants to return user-visible exceptions should use this type, and in the next patch we will translate it to the appropriate JSON string. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:39:26 +03:00
Nadav Har'El	9d72bc3167	alternator: table creation time is in seconds The "Timestamp" type returned for CreationDateTime can be one of several things but if it is a number, it is supposed to be the time in seconds since the epoch - not in milliseconds. Returning milliseconds as we wrongly did causes boto3 (AWS's Python driver) to throw a parse exception on this response. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:38:41 +03:00
Nadav Har'El	c0518183c2	alternator: require alternator-port configuration Until now, we always opened the Alternator port along with Scylla's regular ports (CQL etc.). This should really be made optional. With this patch, by default Alternator does NOT start and does not open a port. Run Scylla with --alternator-port=8000 to open an Alternator API port on port 8000, as was the default until now. It's also possible to set this in scylla.yaml. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 12:38:31 +03:00
Piotr Sarna	2ec78164bc	alternator: add minimal HTTP interface The interface works on port 8000 by default and provides the most basic alternator operations - it's an incomplete set without validation, meant to allow testing as early as possible.	2019-09-11 12:34:18 +03:00
Benny Halevy	443e0275ab	dbuild: add --image option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-11 11:46:33 +03:00
Tomasz Grabiec	06154569d5	gdb: Use sstable tracker to get the list of sstables	2019-09-10 17:05:19 +02:00
Tomasz Grabiec	a141d30eca	gdb: Make intrusive_list recognize member_hook links GDB now gives "struct boost::intrusive::member_hook" from template_arguments()	2019-09-10 17:05:19 +02:00
Tomasz Grabiec	c014c79d4b	sstables: Track whether sstable was already open or not Some sstable objects correspond to sstables which are being written and are not sealed yet. Such sstables don't have all the fields filled-in. Tools which calculate statistics (like GDB scripts) need to distinguish such sstables.	2019-09-10 17:05:18 +02:00
Tomasz Grabiec	33bef82f6b	sstables: Track all instances of sstable objects Will make it easier to collect statistics about sstable in-memory metadata.	2019-09-10 17:05:16 +02:00
Tomasz Grabiec	fd74504e87	sstables: Make sstable object not movable Will be easier to add non-movable fields. We don't really need it to be movable, all instances should be managed by a shared pointer.	2019-09-10 17:04:54 +02:00
Tomasz Grabiec	589c7476e0	sstables: Move constructor out of line	2019-09-10 17:04:54 +02:00
Tomasz Grabiec	785fe281e7	gdb: scylla sstables: Print table name Message-Id: <1568121825-32008-1-git-send-email-tgrabiec@scylladb.com>	2019-09-10 16:36:21 +03:00
Glauber Costa	6651f96a70	sstables: do not keep sharding information from scylla metadata in memory (#4915) There is no reason to keep parts of the Scylla Metadata component in memory after it is read, parsed, and its information fed into the SSTable. We have seen systems in which the Scylla metadata component is one of the heaviest memory users, more than the Summary and Filter. In particular, we use the token metadata, which is the largest part of the Scylla component, to calculate a single integer -> the shards that are responsible for this SSTable. Once we do that, we never use it again. Tests: unit (release/debug), + manual scylla write load + reshard. Fixes #4951 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-09 22:28:51 +03:00
Tomasz Grabiec	a09479e63c	Merge "Validate position in partition monotonicity" from Benny Introduce mutation_fragment_stream_validator class and use it as a Filter to flat_mutation_reader::consume_in_thread from sstable::write_components to validate partition region and optionally clustering key monotonicity. Fixes #4803	2019-09-09 15:38:31 +02:00
Benny Halevy	42f6462837	config: enable_sstable_key_validation by default in debug build Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	34d306b982	config: add enable_sstable_key_validation option Key monotonicity validation incurs overhead to store the last key and to compare keys; therefore, provide an option to enable/disable it (disabled by default). Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	507c99c011	mutation_fragment_stream_validator: add compare_keys flag Storing and comparing keys is expensive. Add a flag to enable/disable this feature (disabled by default). Without the flag, only the partition region monotonicity is validated, allowing repeated clustering rows, regardless of clustering keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	bc2ef1d409	mutation_fragment: declare partition_region operator<< in header file Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	496467d0a2	sstables: writer: Validate input mutation fragment stream Fixes #4803 Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	a37acee68f	position_in_partition: define operator=(position_in_partition_view) The respective constructor is explicit. Define this assignment operator to be used by flat_mutation_reader mutation_fragment_stream_validator filter so that it can use mutation_fragment::position() verbatim and keep its internal state as position_in_partition. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	41b60b8bc5	compaction: s/filter_func/make_partition_filter/ It expresses the purpose of this function better as suggested by Tomasz Grabiec. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	24c7320575	dbuild: run interactive shell by default If not given any other args to run, just run an interactive shell. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190909113140.9130-1-bhalevy@scylladb.com>	2019-09-09 15:15:57 +03:00
Nadav Har'El	2543760ee6	docs/metrics.md: document additional "labels" Recently we started to make more use of the concept of metric labels - several metrics which share the same name, but differ in the value of some label such as "group" (for different scheduling groups). This patch documents this feature in docs/metrics.md, gives the example of scheduling groups, and explains a couple more relevant Prometheus syntax tricks. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190909113803.15383-1-nyh@scylladb.com>	2019-09-09 15:15:57 +03:00
Botond Dénes	59a96cd995	scylla-gdb.py: introduce scylla task-queues This command provides an overview of the reactors task queues. Example: id name shares tasks A 00 "main" 1000.00 4 01 "atexit" 1000.00 0 02 "streaming" 200.00 0 A 03 "compaction" 171.51 1 04 "mem_compaction" 1000.00 0 *A 05 "statement" 1000.00 2 06 "memtable" 8.02 0 07 "memtable_to_cache" 200.00 0 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190906060039.42301-1-bdenes@scylladb.com>	2019-09-09 15:15:57 +03:00
Avi Kivity	8e8975730d	Update seastar submodule * seastar cb7026c16f...b3fb4aaab3 (10): > Revert "scheduling groups: Adding per scheduling group data support" > scheduling groups: Adding per scheduling group data support > rpc: check that two servers are not created with the same streaming id > future: really ignore exceptions in ignore_ready_future > iostream: Constify eof() function > apply.hh: add missing #include for size_t > scheduling_group_demo: add explicit yields since future::get() no longer does > Fix buffer size used when calling accept4() > future-util: reduce allocations and continuations in parallel_for_each > rpc: lz4_decompressor: Add a static constexpr variable decleration for Cpp14 compatibility	2019-09-09 15:15:34 +03:00
Gleb Natapov	9e9f64d90e	messaging_service: configure different streaming domain for each rpc server A streaming domain identifies a server across shards. Each server should have different one. Fixes: #4953 Message-Id: <20190908085327.GR21540@scylladb.com>	2019-09-08 14:05:40 +03:00
Piotr Sarna	01410c9770	transport: make sure returning connection errors happens inside the gate. Previously, the gate could get closed too early, which would result in shutting down the server before it had an opportunity to respond to the client. Refs #4818	2019-09-08 13:23:20 +03:00
Avi Kivity	5663218fac	Merge "types: Fix decimal to integer and varint to integer conversion" from Rafael " The release notes for boost 1.67.0 includes: Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target Since we just moved out of boost 1.66, we have to update our code. This fixes issue #4960 " * 'espindola/fix-4960' of https://github.com/espindola/scylla: types: fix varint to integer conversion types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer types: fix decimal to integer conversion types: extract helper for converting a decimal to a cppint types: rename and detemplate make_castas_fctn_from_decimal_to_integer	2019-09-08 10:45:42 +03:00
Avi Kivity	244218e483	Merge "simplify date type" from Rafael " With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl. " * 'espindola/simplify-date-type' of https://github.com/espindola/scylla: types: Reduce duplication around date_type_impl types: Don't use date_type_native_type when we want a timestamp types: Remove timestamp_native_type types: Don't specialize data_type_for for db_clock::time_point types: Make it harder to create date_type	2019-09-08 10:21:48 +03:00
Rafael Ávila de Espíndola	3bac4ebac7	types: Reduce duplication around date_type_impl According to the comments, the only difference between date_type_impl and timestamp_type_impl is the comparison function. This patch makes that explicit by merging all code paths except: * The warning when converting between the two * The compare function The date_type_impl type can still be user visible via very old sstables or via the thrift protocol. It is not clear if we still need to support either, but with this patch it is easy to do so. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	36d40b4858	types: Don't use date_type_native_type when we want a timestamp In these cases it is pretty clear that the original code wanted to create a timestamp_type data_value but was creating a date_type one because of the old defaults. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	01cd21c04d	types: Remove timestamp_native_type Now that we know that anything expecting a date_type has been converted to date_type_native_type, switch to using db_clock::time_point when we want a timestamp_type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	df6c2d1230	types: Don't specialize data_type_for for db_clock::time_point This also moves every user to date_type_native_type. A followup patch will convert to timestamp_type when appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola	e09fa2dcff	types: Make it harder to create date_type date_type was replaced with timestamp_type, but it was very easy to create a date_type instead of a timestamp_type by accident. This patch changes the code so that a date_type is no longer implicitly used when constructing a data_value. All existing code that was depending on this is converted to explicitly using date_type_native_type. A followup patch will convert to timestamp_type when appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-07 10:07:33 -07:00
Gleb Natapov	f78b2c5588	transport: remove remaining cruft related to cql's server load balancing Commit `7e3805ed3d` removed the load balancing code from cql server, but it did not remove most of the cruft that load balancing introduced. Most of the complexity (and probably the main reason the code never worked properly) is around service::client_state class which is copied before being passed to the request processor (because in the past the processing could have happened on another shard) and then merged back into the "master copy" because a request processing may have changed it. This commit removes all this copying. The client_request is passed as a reference all the way to the lowest layer that needs it and its copy constructor is removed to make sure nobody copies it by mistake. tests: dev, default c-s load of 3 node cluster Message-Id: <20190906083050.GA21796@scylladb.com>	2019-09-07 18:17:53 +03:00
Avi Kivity	3b5aa13437	Merge "Optimize type find" from Rafael " This avoids a double dispatch on _kind and also removes a few shared_ptr copies. The extra work was a small regression from the recent types refactoring. " * 'espindola/optimize_type_find' of https://github.com/espindola/scylla: types: optimize type find implementation types: Avoid shared_ptr copies	2019-09-07 18:14:36 +03:00
Gleb Natapov	5b9dc00916	test: fix query_processor_test::test_query_counters to use SERIAL consistency correctly It is not possible to scan a table with SERIAL consistency only to read a single partition. Message-Id: <20190905143023.GQ21540@scylladb.com>	2019-09-07 18:07:01 +03:00
Gleb Natapov	e52ebfb957	cql3: remove unused next_timestamp() function next_timestamp() just calls get_timestamp() directly and nobody uses it anyway. Message-Id: <20190905101648.GO21540@scylladb.com>	2019-09-05 17:20:21 +03:00
Botond Dénes	783277fb02	stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase Currently when an error happens during the receive and distribute phase it is swallowed and we just return a -1 status to the remote. We only log errors that happen during responding with the status. This means that when streaming fails, we only know that something went wrong, but the node on which the failure happened doesn't log anything. Fix by also logging errors happening in the receive and distribute phase. Also mention the phase in which the error happened in both error log messages. Refs: #4901 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>	2019-09-05 13:43:00 +02:00
Rafael Ávila de Espíndola	dd81e94684	types: fix varint to integer conversion The previous code was using the boost::multiprecision::cpp_int to integer conversion, but that doesn't have the same semantics as cql for signed numbers. This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test. Fixes #4960 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	263e18b625	types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer It will be used when converting varint to integer too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	2d453b8e17	types: fix decimal to integer conversion The previous code was using the boost::multiprecision::cpp_rational to integer conversion, but that doesn't have the same semantics as cql. This patch avoids creating a cpp_rational in the first place and works just with integers. This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola	fb760774dd	types: extract helper for converting a decimal to a cppint It will also be used in the decimal to integer conversion. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 15:08:07 -07:00
Rafael Ávila de Espíndola	40e6882906	types: rename and detemplate make_castas_fctn_from_decimal_to_integer It was only ever used for varint. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-04 14:54:47 -07:00
Avi Kivity	301246f6c0	storage_proxy: protect _view_update_handlers_list iterators from invalidation on_down() iterates over _view_update_handlers_list, but it yields during iteration, and while it yields, elements in that list can be removed, resulting in a use-after-free. Prevent this by registering iterators that can be potentially invalidated, and any time we remove an element from the list, check whether we're removing an element that is being pointed to by a live iterator. If that is the case, advance the iterator so that it points at a valid element (or at the end of the list). Fixes #4912. Tests: unit (dev)	2019-09-04 17:19:28 +03:00
Tomasz Grabiec	9f5826fd4b	Merge "Use canonical mutations for background schema sync" from Botond Currently the background schema sync (push/pull) uses frozen mutation to send the schema mutations over the wire to the remote node. For this to work correctly, both nodes have to have the exact same schema for the system schema tables, as attempting to unpack the frozen mutation with the wrong schema leads to undefined behaviour. To avoid this and to ensure syncing schema between nodes with different schema table schema versions is defined we migrate the background schema sync to use canonical mutations for the transfer of the schema mutations. Canonical mutations are immune to this problem, as they support deserializing with any version of the schema, older or newer one. The foreground schema sync mechanisms -- the on-demand schema pulls on reads and writes -- already use canonical mutations to transmit the schema mutations. It is important to note that due to this change, column-level incompatibilities between the schema mutations and the schema used to deserialize them will be hidden. This is undesired and should be fixed in a follow-up (#4956). Table level incompatibilities are detected and schema mutations containing such mutations will be rejected just like before. This patch adds canonical mutation support to the two background schema sync verbs: * `DEFINITIONS_UPDATE` (schema push) * `MIGRATION_REQUEST` (schema pull) Both verbs still support the old frozen mutation schema transfer, albeit that path is now much less efficient. After all nodes are upgraded, the pull verb can effectively avoid sending frozen mutations altogether, completely migrating to canonical mutations. Unfortunately this was not possible for the push verb, so that one now has an overhead as it needs to send both the frozen and canonical mutations. Fixes: #4273	2019-09-04 13:58:14 +02:00
Benny Halevy	bc29520eb8	flat_mutation_reader: consume_in_thread: add mutation_filter For validating mutation_fragment's monotonicity. Note: forwarding constructor allows implicit conversion by current callers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-04 13:42:37 +03:00
Rafael Ávila de Espíndola	000514e7cc	sstable: close file_writer if an exception is thrown The previous code was not exception safe and would eventually cause a file to be destroyed without being closed, causing an assert failure. Unfortunately it doesn't seem to be possible to test this without error injection, since using an invalid directory fails before this code is executed. Fixes #4948 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190904002314.79591-1-espindola@scylladb.com>	2019-09-04 13:28:55 +03:00
Botond Dénes	7adc764b6e	messaging_service: add canonical_support to schema pull and push verbs The verbs are: * DEFINITIONS_UPDATE (push) * MIGRATION_REQUEST (pull) Support was added in a backward-compatible way. The push verb, sends both the old frozen mutation parameter, and the new optional canonical mutation parameter. It is expected that new nodes will use the latter, while old nodes will fall-back to the former. The pull verb has a new optional `options` parameter, which for now contains a single flag: `remote_supports_canonical_mutation_retval`. This flag, if set, means that the remote node supports the new canonical mutation return value, thus the old frozen mutations return value can be left empty.	2019-09-04 10:32:44 +03:00
Botond Dénes	d9a8ff15d8	service::migration_manager: add canonical_mutation merge_schema_from() overload Add an overload which takes a vector of canonical mutations. Going forward, this is the overload to use.	2019-09-04 10:32:44 +03:00
Botond Dénes	e02b93cae1	schema_tables: convert_schema_to_mutations: return canonical_mutations In preparation to the schema push/pull migrating to use canonical mutations, convert the method producing the schema mutations to return a vector of canonical mutations. The only user, MIGRATION_REQUEST verb, converts the canonical mutations back to frozen mutations. This is very inefficient, but this path will only be used in mixed clusters. After all nodes are upgraded the verb will be sending the canonical mutations directly instead.	2019-09-04 08:47:20 +03:00
Rafael Ávila de Espíndola	b100f95adc	types: optimize type find implementation This turns find into a template so there is only one switch over the kind of each type in the search. To evaluate the change in code size, I added [[noinline]] to find and obtained the following results. The release columns for release in the before case have an extra column because the functions are sufficiently complex to trigger gcc to split them in hot + cold. before: dev release (hot + cold split) find 0x35f = 863 0x3d5 + 0x112 = 1255 references_duration 0x62 + 0x22 + 0x8 = 140 0x55 + 0x1f + 0x2a + 0x8 = 166 references_user_type 0x6b + 0x26 + 0x111 = 418 0x65 + 0x1f + 0x32 + 0x11b = 465 after: dev release find 0xd6 + 0x1b4 = 650 0xd2 + 0x1f5 = 711 references_duration 0x13 = 19 0x13 = 19 references_user_type 0x1a = 26 0x21 = 33 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-03 08:23:21 -07:00
Rafael Ávila de Espíndola	e0065b414e	types: Avoid shared_ptr copies They are somewhat expensive (in code size at least) and not needed everywhere. Inside the getter the variables are 'const data_type&', so we can return that. Everything still works when a copy is needed, but in code that just wants to check a property we avoid the copy. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-03 07:43:35 -07:00
Benny Halevy	bdfb73f67d	scripts/create-relocatable-package: ldd: print executable name in exception Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190903080511.534-1-bhalevy@scylladb.com>	2019-09-03 15:34:38 +03:00
Avi Kivity	294a86122e	Merge "nonroot installer" from Takuya " This is nonroot installer patchset v9. " * 'nonroot_v9' of https://github.com/syuu1228/scylla: dist/common/scripts: support nonroot mode on setup scripts reloc/python3: add install.sh on python relocatable package install.sh: add --nonroot mode dist/common/systemd: untemplataize *.service, use drop-in units instead dist/debian: delete debian/*.install, debian/*.dirs	2019-09-03 15:33:20 +03:00
Piotr Sarna	7b297865e1	transport: wait for the connections to finish when stopping (#4818) During CQL request processing, a gate is used to ensure that the connection is not shut down until all ongoing requests are done. However, the gate might have been left too early if the database was not ready to respond immediately - which could result in trying to respond to an already closed connection later. This issue is solved by postponing leaving the gate until the continuation chain that handles the request is finished. Refs #4808	2019-09-03 14:49:11 +03:00
Avi Kivity	8fb59915bb	Merge "Minor cleanup patches for sstables" from Asias * 'cleanup_sstables' of https://github.com/asias/scylla: sstables: Move leveled_compaction_strategy implementation to source file sstables: Include dht/i_partitioner.hh for dht::partition_range	2019-09-03 14:47:44 +03:00
Takuya ASADA	31ddb2145a	dist/common/scripts: support nonroot mode on setup scripts Since nonroot mode requires running everything as a non-privileged user, most of the setup scripts cannot be used in nonroot mode. We only provide the following functions in nonroot mode: - EC2 check - IO setup - Node exporter installer - Dev mode setup The rest of the functions will be skipped in scylla_setup. To implement nonroot mode in the setup scripts, scylla_util provides utility functions to abstract the difference in directory structure between a normal installation and nonroot mode.	2019-09-03 20:06:35 +09:00
Takuya ASADA	cfa8885ae1	reloc/python3: add install.sh on python relocatable package To support nonroot installation on scylla-python3, add install.sh on scylla-python3 relocatable package.	2019-09-03 20:06:30 +09:00
Takuya ASADA	2de14e0800	install.sh: add --nonroot mode This implements a way to install Scylla that does not require root privileges, is not distribution dependent, and does not use a package manager.	2019-09-03 20:06:24 +09:00
Takuya ASADA	cde798dba5	dist/common/systemd: untemplataize *.service, use drop-in units instead Since a systemd unit can override parameters using a drop-in unit, we don't need mustache templates for them. Also, drop the --disttype and --target options from install.sh since they are not required anymore, and introduce --sysconfdir instead for non-redhat distributions.	2019-09-03 20:06:15 +09:00
Takuya ASADA	49a360f234	dist/debian: delete debian/*.install, debian/*.dirs Since `ac9b115`, we switched to install.sh on Debian so we don't rely on .deb specific packaging scripts anymore. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2019-09-03 20:06:09 +09:00
Benny Halevy	7827e3f11d	tests: test_large_data: do not stop database Now that compaction returns only after the compacted sstables are deleted, we no longer need to stop the database to force waiting for deletes (that were previously done asynchronously). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	19b67d82c9	table::on_compaction_completion: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	8dd6e13468	table::on_compaction_completion: wait for background deletes Don't let background deletes accumulate uncontrollably. Fixes #4909 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	da6645dc2c	table: refresh_snapshot before deleting any sstables The row cache must not hold references to any sstable we're about to delete. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:29 +03:00
Nadav Har'El	6c4ad93296	api/compaction_manager: do not hold map on the stack Merged patch series by Amnon Heiman: This patch fixes a bug that a map is held on the stack and then is used by a future. Instead, the map is now moved to the relevant lambda function. Fixes #4824	2019-09-01 13:16:34 +03:00
Avi Kivity	e962beea20	toolchain: update to Fedora 30 and gcc 9.2 In Fedora 30 we have a new boost version, so we no longer need to use our patched boost, so we also remove the scylladb/toolchain copr.	2019-09-01 12:05:26 +03:00
Piotr Sarna	23c891923e	main: make sure view_builder doesn't propagate semaphore errors Stopping services which occurs in a destructor of deferred_action should not throw, or it will end the program with terminate(). View builder breaks a semaphore during its shutdown, which results in propagating a broken_semaphore exception, which in turn results in throwing an exception during stop().get(). In order to fix that issue, semaphore exceptions are explicitly ignored, since they're expected to appear during shutdown. Fixes #4875	2019-09-01 11:59:57 +03:00
Tomasz Grabiec	c8f8a9450f	Merge "Improve cpu instruction set support checks" from Avi To prevent termination with SIGILL, tighten the instruction set support checks. First, check for CLMUL too. Second, add a check in scylla_prepare to catch the problem early. Fixes #4921.	2019-08-30 16:54:04 +02:00
Avi Kivity	07010af44c	scylla_prepare: verify processor satisfies instruction set requirements Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them. There is a check in main(), but that happens after the code is running and it may already be too late. Add a check in scylla_prepare which runs before the main executable.	2019-08-29 15:34:29 +03:00
Avi Kivity	9579946e72	main: extend CPU feature check to verify that PCLMUL is available Since `79136e895f`, we use the pclmul instruction set, so check it is there.	2019-08-29 15:13:32 +03:00
Gleb Natapov	e61a86bbb2	to_string: Add operator<< overload for std::tuple. Message-Id: <20190829100902.GN21540@scylladb.com>	2019-08-29 13:35:02 +03:00
Rafael Ávila de Espíndola	036f51927c	sstables: Remove unused include Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190827210424.37848-1-espindola@scylladb.com>	2019-08-28 11:32:44 +03:00
Benny Halevy	869b518dca	sstables: auto-delete unsealed sstables Fixes #4807 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190827082044.27223-1-bhalevy@scylladb.com>	2019-08-28 09:46:17 +03:00
Botond Dénes	969aa22d51	configure.py: promote unused result warning to error Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190827111428.6829-2-bdenes@scylladb.com>	2019-08-28 09:46:17 +03:00
Botond Dénes	480b42b84f	tests/gossip_test: silence discarded future warning Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190827111428.6829-1-bdenes@scylladb.com>	2019-08-28 09:46:17 +03:00
Avi Kivity	d85339e734	Update seastar submodule * seastar 20bfd61955...cb7026c16f (2): > net: dpdk: suppress discarded future warning > Merge "Optimize promises in then/then_wrapped" from Rafael	2019-08-28 09:46:17 +03:00
Avi Kivity	f1d73d0c13	Merge "systemd: put scylla processes in systemd slices. #4743" from Glauber " It is well known that seastar applications, like Scylla, do not play well with external processes: CPU usage from external processes may confuse the I/O and CPU schedulers and create stalls. We have also recently seen that memory usage from other application's anonymous and page cache memory can bring the system to OOM. Linux has a very good infrastructure for resource control contributed by amazingly bright engineers in the form of cgroup controllers. This infrastructure is exposed by SystemD in the form of slices: a hierarchical structure to which controllers can be attached. In true systemd way, the hierarchy is implicit in the filenames of the slice files. A "-" symbol defines the hierarchy, so the files that this patch presents, scylla-server and scylla-helper, essentially create a "scylla" cgroup at the top level with "server" and "helper" children. Later we mark the Services needed to run scylla as belonging to one or the other through the Slice= directive. Scylla DBAs can benefit from this setup by using the systemd-run utility to fire ad-hoc commands. Let's say for example that someone wants to hypothetically run a backup and transfer files to an external object store like S3, making sure that the amount of page cache used won't create swap pressure leading to database timeouts. One can then run something like: sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool (or even better, the backup tool can itself be a systemd timer) " * 'slices' of https://github.com/glommer/scylla: systemd: put scylla processes in systemd slices. move postinst steps to an external script	2019-08-26 20:16:55 +03:00
Benny Halevy	20083be9f6	sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>	2019-08-26 19:50:21 +03:00
Avi Kivity	b9e9d7d379	Merge "Resolve discarded future warnings" from Botond " The warning for discarded futures will only become useful, once we can silence all present warnings and flip the flag to make it become error. Then it will start being useful in finding new, accidental discarding of futures. This series silences all remaining warnings in the Scylla codebase. For those cases where it was obvious that the future is discarded on purpose, the author taking all necessary precaution (handling exception) the warning was simply silenced by casting the future to void and adding a relevant comment. Where the discarding seems to have been done in error, I have fixed the code to not discard it. To the rest of the sites I added a FIXME to fix the discarding. " * 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla: treewide: silence discarded future warnings for questionable discards treewide: silence discarded future warnings for legit discards tests: silence discarded future warnings tests/cql_query_test.cc: convert some tests to thread	2019-08-26 19:40:25 +03:00
Botond Dénes	136fc856c5	treewide: silence discarded future warnings for questionable discards This patch silences the remaining discarded future warnings, those where it cannot be determined with reasonable confidence that this was indeed the actual intent of the author, or that the discarding of the future could lead to problems. For all those places a FIXME is added, with the intent that these will be soon followed-up with an actual fix. I deliberately haven't fixed any of these, even if the fix seems trivial. It is too easy to overlook a bad fix mixed in with so many mechanical changes.	2019-08-26 19:28:43 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Botond Dénes	cff4c4932d	tests: silence discarded future warnings	2019-08-26 18:54:44 +03:00
Botond Dénes	486fa8c10c	tests/cql_query_test.cc: convert some tests to thread Some tests are currently discarding futures unjustifiably, however adding code to wait on these futures is quite inconvenient due to the continuation style code of these tests. Convert them to run in a seastar thread to make the fix easier.	2019-08-26 18:54:44 +03:00
Tomasz Grabiec	ac5ff4994a	service: Announce the new schema version when features are enabled Introduced in `c96ee98`. We call update_schema_version() after features are enabled and we recalculate the schema version. This method is not updating gossip though. The node will still use its database::version() to decide on syncing, so it will not sync and stay inconsistent in gossip until the next schema change. We should call update_schema_version_and_announce() instead so that the gossip state is also updated. There is no actual schema inconsistency, but the joining node will think there is and will wait indefinitely. Making a random schema change would unblock it. Fixes #4647. Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>	2019-08-26 17:54:59 +03:00
Avi Kivity	a7b82af4c3	Update seastar submodule * seastar afc5bbf511...20bfd61955 (18): > reactor: closing file used to check if direct_io is supported > future: set_coroutine(): s/state()/_state/ > tests/perf/perf_test.hh: suppress discarded future warning > tests: rpc: fix memory leak in timeout wraparound tests > Revert "future-util: reduce allocations and continuations in parallel_for_each" > reactor: fix rename_priority_class() build failure in C++14 mode > future: mark future_state_base::failed() as unlikely > future-util: reduce allocations and continuations in parallel_for_each > future-utils: generalize when_all_estimate_vector_capacity() > output_stream: Add comment on sequentiality > docs/tutorial.md: minor cleanups in first section > core: fix a race in execution stages (Fixes #4856, fixes #4766) > semaphore: use semaphore's clock type in with_semaphore()/get_units() > future: fix doxygen documentation for promise<> > sharded: fixed detecting stop method when building with clang > reactor: fixed locking error in rename_priority_class > Assert that append_challenged_posix_file_impl are closed. > rpc: correctly handle huge timeouts	2019-08-26 15:37:58 +03:00
Asias He	3ea1255020	storage_service: Use sleep_abortable instead of sleep (#4697) Make the sleep abortable so that it is able to break the loop during shutdown. Fixes #4885	2019-08-26 13:35:44 +03:00
Asias He	2f24fd9106	sstables: Move leveled_compaction_strategy implementation to source file It is better than putting everything in header.	2019-08-26 16:49:48 +08:00
Asias He	b69138c4e4	sstables: Include dht/i_partitioner.hh for dht::partition_range Get rid of one FIXME.	2019-08-26 16:35:18 +08:00
Nadav Har'El	b60d201a11	API: column_family.cc Add get_built_indexes implementation Merged patch series from Amnon Heiman <amnon@scylladb.com> This patch adds an implementation of the get built index API and removes a FIXME. The API returns a list of secondary indexes that belong to a column family and have already been fully built. Example: CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) ); CREATE index on scylla_demo.mytableID (time); $ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid' ["mytableid_time_idx"]	2019-08-25 18:37:44 +03:00
Amnon Heiman	2d3185fa7d	column_family.cc: remove unhandled future The sum_ratio struct is a helper struct that is used when calculating a ratio over multiple shards. Originally it was created thinking that it may need to use a future; in practice the future was never used and was ignored. This patch removes the future from the implementation and eliminates an unhandled future warning from the compilation. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:51:14 +03:00
Amnon Heiman	21dee3d8ef	API:column_family.cc Add get_build_index implementation This patch adds an implementation of the get build index API and removes a FIXME. The API returns the list of the built secondary indexes that belong to a column family. Example: CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) ); CREATE index on scylla_demo.mytableID (time); $ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid' ["mytableid_time_idx"] Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:46:49 +03:00
Juliana Oliveira	711ed76c82	auth: standard_role_manager: read null columns as false When a role is created through the `create role` statement, the 'is_superuser' and 'can_login' columns are set to false by default. Likewise, `list roles`, `alter roles` and `* roles` operations expect to find a boolean when reading the same columns. This is not the case, though, when a user directly inserts to `system_auth.roles` and doesn't set those columns. Even though manually creating roles is not a desired day-to-day operation, it is an insert just like any other and it should work. `* roles` operations, on the other hand, are not prepared for this deviations. If a user manually creates a role and doesn't set boolean values to those columns, `* roles` will return all sorts of errors. This happens because `* roles` is explicitly expecting a boolean and casting for it. This patch makes `* roles` more friendly by considering the boolean variable `false` - inside `* roles` context - if the actual value is `null`; it won't change the `null` value. Fixes #4280 Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20190816032617.61680-1-juliana@scylladb.com>	2019-08-25 11:52:43 +03:00
Pekka Enberg	118a141f5d	scylla_blocktune.py: Kill btrfs related FIXME The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no longer relevant for Scylla deployments, as Red Hat dropped support for the file system in 2017. Message-Id: <20190823114013.31112-1-penberg@scylladb.com>	2019-08-24 20:40:08 +03:00
Botond Dénes	18581cfb76	multishard_mutation_query: create_reader(): use the caller's priority class The priority class the shard reader was created with was hardcoded to be `service::get_local_sstable_query_read_priority()`. At the time this code was written, priority classes could not be passed to other shards, so this method, receiving its priority class parameters from another shard, could not use it. This is now fixed, so we can just use whatever the caller wants us to use. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>	2019-08-23 16:10:43 +02:00
Tomasz Grabiec	080989d296	Merge "cql3: cartesian product limits" from Avi Cartesian products (generated by IN restrictions) can grow very large, even for short queries. This can overwhelm server resources. Add limit checking for cartesian products, and configuration items for users that are not satisfied with the default of 100 records fetched. Fixes #4752. Tests: unit (dev), manual test with SIGHUP.	2019-08-21 19:35:59 +02:00
Avi Kivity	67b0d379e0	main: add glue between db::config and cql3::cql_config Copy values between the flat db::config and the hierarchical cql_config, adding observers to keep the values updated.	2019-08-21 19:35:59 +02:00
Avi Kivity	8c7ad1d4cd	cql: single_column_clustering_key_restrictions: limit cartesian products Cartesian products (via IN restrictions) make it easy to generate huge primary key sets with simple queries, overflowing server resources. Limit them in the coordinator and report an exception instead of trying to execute a query that would consume all of our memory. A unit test is added.	2019-08-21 19:35:59 +02:00
Avi Kivity	3a44fa9988	cql3, treewide: introduce empty cql3::cql_config class and propagate it We need a way to configure the cql interpreter and runtime. So far we relied on accessing the configuration class via various backdoors, but that causes its own problems around initialization order and testability. To avoid that, this patch adds an empty cql_config class and propagates it from main.cc (and from tests) to the cql interpreter via the query_options class, which is already passed everywhere. Later patches will fill it with contents.	2019-08-21 19:35:59 +02:00
Rafael Ávila de Espíndola	86c29256eb	types: Fix references_user_type This was broken since the type refactoring. It was checking the static type, which is always abstract_type. Unfortunately we only had dtests for this. This can probably be optimized to avoid the double switch over kind, but it is probably better to do the simple fix first. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190821155354.47704-1-espindola@scylladb.com>	2019-08-21 19:13:59 +03:00
Dejan Mircevski	ea9d358df9	cql3: Optimize LIKE regex construction Currently we create a regex from the LIKE pattern for every row considered during filtering, even though the pattern is always the same. This is wasteful, especially since we require costly optimization in the regex compiler. Fix it by reusing the regex whenever the pattern is unchanged since the last call. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-21 16:45:47 +03:00
Piotr Sarna	526f4c42aa	storage_proxy: fix iterator liveness issue in on_down (#4876) The loop over view update handlers used a guard in order to ensure that the object is not prematurely destroyed (thus invalidating the iterator), but the guard itself was not in the right scope. Fixed by replacing a 'for' loop with a 'while' loop, which moves the iterator incrementation inside the scope in which it's still guarded and valid. Fixes #4866	2019-08-21 15:44:43 +03:00
Avi Kivity	4ef7429c4a	build: build seastar in build directory Currently, seastar is built in seastar/build/{mode}. This means we have two build directories: build/{mode} and seastar/build/{mode}. This patch changes that to have only a single build directory (build/{mode}). It does that by calling Seastar's cmake directly instead of through Seastar's ./configure.py. However, to support dpdk, if that is enabled it calls cmake through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does). All ./configure.py flags are translated to cmake variables, in the same way that Seastar does. Contains fix from Rafael to pass the flags for the correct mode.	2019-08-21 13:10:17 +02:00
Rafael Ávila de Espíndola	278b6abb2b	Improve documentation on the system.large_* tables This clarifies that "rows" are clustering rows and that there is no information about individual collection elements. The patch also documents some properties common to all these tables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190820171204.48739-1-espindola@scylladb.com>	2019-08-21 10:36:25 +03:00
Vlad Zolotarov	d253846c91	hinted handoff: fix a race on a directory removal between space_watchdog and drain_for() The endpoint directories scanned by space_watchdog may get deleted by the manager::drain_for(). If a deleted directory is given to a lister::scan_dir() this will end up in an exception and as a result a space_watchdog will skip this round and hinted handoff is going to be disabled (for all agents including MVs) for the whole space_watchdog round. Let's make sure this doesn't happen by serializing the scanning and deletion using end_point_hints_manager::file_update_mutex. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 11:46:46 -04:00
Vlad Zolotarov	b34c36baa2	hinted handoff: make taking file_update_mutex safe end_point_hints_manager::file_update_mutex is taken by space_watchdog but while space_watchdog is waiting for it the corresponding end_point_hints_manager instance may get destroyed by manager::drain_for() or by manager::stop(). This will end up in a use-after-free event. Let's change the end_point_hints_manager's API in a way that would prevent such an unsafe locking: - Introduce the with_file_update_mutex(). - Make end_point_hints_manager::file_update_mutex() method private. Fixes #4685 Fixes #4836 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 11:26:19 -04:00
Vlad Zolotarov	dbad9fcc7d	db::hints::manager::drain_for(): fix alignment Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Vlad Zolotarov	7a12b46fc9	db::hints::manager: serialize calls to drain_for() If drain_for() is running together with itself: one instance for the local node and one for some other node, erasing of elements from the _ep_managers map may lead to a use-after-free event. Let's serialize drain_for() calls with a semaphore. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Vlad Zolotarov	09600f1779	db::hints: cosmetics: indentation and missing method qualifier Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-08-20 10:58:36 -04:00
Avi Kivity	698b72b501	relocatable: switch from run-time relocation to install-time relocation Our current relocation works by invoking the dynamic linker with the executable as an argument. This confuses gdb since the kernel records the dynamic linker as the executable, not the real executable. Switch to install-time relocation with patchelf: when installing the executable and libraries, all paths are known, and we can update the path to the dynamic loader and to the dynamic libraries. Since patchelf itself is dynamically linked, we have to relocate it dynamically (with the old method of invoking it via the dynamic linker). This is okay since it's a one-time operation and since we don't expect to debug core dumps of patchelf crashes. We lose the ability to run scylla directly from the uninstalled tarball, but since the nonroot installer is already moving in the direction of requiring install.sh, that is not a great loss, and certainly the ability to debug is more important. dh_strip barfs on some binaries which were treated with patchelf, so exclude them from dh_strip. This doesn't lose any functionality, since these binaries didn't have debug information to begin with (they are already-stripped Fedora executables). Fixes #4673.	2019-08-20 00:25:43 +02:00
Botond Dénes	4cb873abfe	query::trim_clustering_row_ranges_to(): fix handling of non-full prefix keys Non-full prefix keys are currently not handled correctly as all keys are treated as if they were full prefixes, and therefore they represent a point in the key space. Non-full prefixes however represent a sub-range of the key space and therefore require null extending before they can be treated as a point. As a quick reminder, `key` is used to trim the clustering ranges such that they only cover positions >= key. Thus, `trim_clustering_row_ranges_to()` does the equivalent of intersecting each range with (key, inf). When `key` is a prefix, this would exclude all positions that are prefixed by key as well, which is not desired. Fixes: #4839 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>	2019-08-20 00:24:51 +02:00
Avi Kivity	21d6f0bb16	Merge "Add LIKE test cases for all non-string types #4859 " from Dejan " Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types. Tests: unit (dev) " * 'like-nonstring' of https://github.com/dekimir/scylla: cql_query_test: Add LIKE tests for all types cql_query_test: Remove LIKE-nonstring-pattern case cql_query_test: Move a testcase elsewhere in file	2019-08-20 00:24:51 +02:00
Tomasz Grabiec	6813ae22b0	Merge "Handle termination signals during streaming" from Avi In `b197924`, we changed the shutdown process not to rely on the global reactor-defined exit, but instead added a local variable to hold the shutdown state. However, we did not propagate that state everywhere, and now streaming processes are not able to abort. Fix that by enhancing stop_signal with a sharded<abort_source> member that can be propagated to services. Propagate it to storage_service and thence to boot_strapper and range_streamer so that streaming processes can be aborted. Fixes #4674 Fixes #4501 Tests: unit (dev), manual bootstrap test	2019-08-20 00:24:51 +02:00
Avi Kivity	2c7435418a	Merge "database: assign proper io priority for streaming view updates" from Piotr " Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority. Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...} Tests: unit(dev) " * 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla: db,view: wrap view update generation in stream scheduling group database: assign proper io priority for streaming view updates	2019-08-20 00:24:51 +02:00
Pekka Enberg	d0eecbf3bb	api/storage_proxy: Wire up hinted-handoff status to API We support hinted-handoff now, so let's return its status via the API. Message-Id: <20190819080006.18070-1-penberg@scylladb.com>	2019-08-20 00:24:50 +02:00
Piotr Sarna	3cc5a04301	db,view: wrap view update generation in stream scheduling group Generating view updates is used by streaming, so the service itself should also run under the matching scheduling group.	2019-08-20 00:24:50 +02:00
Piotr Sarna	1ab07b80b4	database: assign proper io priority for streaming view updates Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority.	2019-08-20 00:24:50 +02:00
Tomasz Grabiec	b9447d0319	Revert "relocatable: switch from run-time relocation to install-time relocation" This reverts commit `4ecce2d286`. Should be committed via the next branch.	2019-08-20 00:22:30 +02:00
Avi Kivity	4ecce2d286	relocatable: switch from run-time relocation to install-time relocation Our current relocation works by invoking the dynamic linker with the executable as an argument. This confuses gdb since the kernel records the dynamic linker as the executable, not the real executable. Switch to install-time relocation with patchelf: when installing the executable and libraries, all paths are known, and we can update the path to the dynamic loader and to the dynamic libraries. Since patchelf itself is dynamically linked, we have to relocate it dynamically (with the old method of invoking it via the dynamic linker). This is okay since it's a one-time operation and since we don't expect to debug core dumps of patchelf crashes. We lose the ability to run scylla directly from the uninstalled tarball, but since the nonroot installer is already moving in the direction of requiring install.sh, that is not a great loss, and certainly the ability to debug is more important. dh_strip barfs on some binaries which were treated with patchelf, so exclude them from dh_strip. This doesn't lose any functionality, since these binaries didn't have debug information to begin with (they are already-stripped Fedora executables). Fixes #4673.	2019-08-20 00:20:19 +02:00
Glauber Costa	da260ecd61	systemd: put scylla processes in systemd slices. It is well known that seastar applications, like Scylla, do not play well with external processes: CPU usage from external processes may confuse the I/O and CPU schedulers and create stalls. We have also recently seen that memory usage from other application's anonymous and page cache memory can bring the system to OOM. Linux has a very good infrastructure for resource control contributed by amazingly bright engineers in the form of cgroup controllers. This infrastructure is exposed by SystemD in the form of slices: a hierarchical structure to which controllers can be attached. In true systemd way, the hierarchy is implicit in the filenames of the slice files. a "-" symbol defines the hierarchy, so the files that this patch presents, scylla-server and scylla-helper, essentially create a "scylla" cgroup at the top level with "server" and "helper" children. Later we mark the Services needed to run scylla as belonging to one or the other through the Slice= directive. Scylla DBAs can benefit from this setup by using the systemd-run utility to fire ad-hoc commands. Let's say for example that someone wants to hypothetically run a backup and transfer files to an external object store like S3, making sure that the amount of page cache used won't create swap pressure leading to database timeouts. One can then run something like: ``` sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool ``` (or even better, the backup tool can itself be a systemd timer) Changes from last version: - No longer use the CPUQuota - Minor typo fixes - postinstall fixup for small machines Benchmark results: ================== Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs). We have to fill the cache as we read, so this should stress CPU, memory and disk I/O. 
cassandra-stress command: ``` cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform$1..150000000$ ``` Baseline results: ``` Results: Op rate : 13,830 op/s [READ: 13,830 op/s] Partition rate : 13,830 pk/s [READ: 13,830 pk/s] Row rate : 13,830 row/s [READ: 13,830 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.8 ms [READ: 2.8 ms] Latency 99.9th percentile : 3.4 ms [READ: 3.4 ms] Latency max : 12.0 ms [READ: 12.0 ms] Total partitions : 4,149,130 [READ: 4,149,130] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Question 1: =========== Does putting scylla in a special slice affect its performance ? Results with Scylla running in a slice: ``` Results: Op rate : 13,811 op/s [READ: 13,811 op/s] Partition rate : 13,811 pk/s [READ: 13,811 pk/s] Row rate : 13,811 row/s [READ: 13,811 row/s] Latency mean : 1.4 ms [READ: 1.4 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.2 ms [READ: 2.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.6 ms] Latency 99.9th percentile : 3.3 ms [READ: 3.3 ms] Latency max : 23.2 ms [READ: 23.2 ms] Total partitions : 4,151,409 [READ: 4,151,409] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion : No significant change Question 2: =========== What happens when there is a CPU hog running in the same server as scylla? 
CPU hog: ``` taskset -c 0 /bin/sh -c "while true; do true; done" & taskset -c 1 /bin/sh -c "while true; do true; done" & taskset -c 2 /bin/sh -c "while true; do true; done" & taskset -c 3 /bin/sh -c "while true; do true; done" & sleep 330 ``` Scenario 1: CPU hog runs freely: ``` Results: Op rate : 2,939 op/s [READ: 2,939 op/s] Partition rate : 2,939 pk/s [READ: 2,939 pk/s] Row rate : 2,939 row/s [READ: 2,939 row/s] Latency mean : 6.8 ms [READ: 6.8 ms] Latency median : 5.3 ms [READ: 5.3 ms] Latency 95th percentile : 11.0 ms [READ: 11.0 ms] Latency 99th percentile : 14.9 ms [READ: 14.9 ms] Latency 99.9th percentile : 17.1 ms [READ: 17.1 ms] Latency max : 26.3 ms [READ: 26.3 ms] Total partitions : 884,460 [READ: 884,460] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: CPU hog runs inside scylla-helper slice ``` Results: Op rate : 13,527 op/s [READ: 13,527 op/s] Partition rate : 13,527 pk/s [READ: 13,527 pk/s] Row rate : 13,527 row/s [READ: 13,527 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.8 ms [READ: 3.8 ms] Latency max : 18.7 ms [READ: 18.7 ms] Total partitions : 4,069,934 [READ: 4,069,934] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Question 3: =========== What happens when there is an I/O hog running in the same server as scylla? 
I/O hog: (Data in the cluster is 2x size of memory) ``` while true; do find /var/lib/scylla/data -type f -exec grep glauber {} + done ``` Scenario 1: I/O hog runs freely: ``` Results: Op rate : 7,680 op/s [READ: 7,680 op/s] Partition rate : 7,680 pk/s [READ: 7,680 pk/s] Row rate : 7,680 row/s [READ: 7,680 row/s] Latency mean : 2.6 ms [READ: 2.6 ms] Latency median : 1.3 ms [READ: 1.3 ms] Latency 95th percentile : 7.8 ms [READ: 7.8 ms] Latency 99th percentile : 10.9 ms [READ: 10.9 ms] Latency 99.9th percentile : 16.9 ms [READ: 16.9 ms] Latency max : 40.8 ms [READ: 40.8 ms] Total partitions : 2,306,723 [READ: 2,306,723] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Scenario 2: I/O hog runs in the scylla-helper systemd slice: ``` Results: Op rate : 13,277 op/s [READ: 13,277 op/s] Partition rate : 13,277 pk/s [READ: 13,277 pk/s] Row rate : 13,277 row/s [READ: 13,277 row/s] Latency mean : 1.5 ms [READ: 1.5 ms] Latency median : 1.4 ms [READ: 1.4 ms] Latency 95th percentile : 2.4 ms [READ: 2.4 ms] Latency 99th percentile : 2.9 ms [READ: 2.9 ms] Latency 99.9th percentile : 3.5 ms [READ: 3.5 ms] Latency max : 183.4 ms [READ: 183.4 ms] Total partitions : 3,984,080 [READ: 3,984,080] Total errors : 0 [READ: 0] Total GC count : 0 Total GC memory : 0.000 KiB Total GC time : 0.0 seconds Avg GC time : NaN ms StdDev GC time : 0.0 ms Total operation time : 00:05:00 ``` Conclusion: With systemd slice we can keep the performance very close to baseline Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-19 14:31:28 -04:00
Avi Kivity	c32f9a8f7b	dht: check for aborts during streaming Propagate the abort_source from main() into boot_strapper and range_stream and check for aborts at strategic points. This includes aborting running stream_plans and aborting sleeps between retries. Fixes #4674	2019-08-18 20:41:07 +03:00
Avi Kivity	5af6f5aa22	main: expose SIGINT/SIGTERM as abort_source In order to propagate stop signals, expose them as sharded<abort_source>. This allows propagating the signal to all shards, and integrating it with sleep_abortable(). Because sharded<abort_source>::stop() will block, we'll now require stop_signal to run in a thread (which is already the case).	2019-08-18 20:15:26 +03:00
Avi Kivity	20aed3398d	Merge "Simplify types" from Rafael " This is hopefully the last large refactoring on the way of UDF. In UDF we have to convert internal types to Lua and back. Currently almost all our types are hidden in types.cc and expose functionality via virtual functions. While it should be possible to add convert_{to\|from}_lua virtual functions, that seems like a bad design. In compilers, the type definition is normally public and different passes know how to reason about each type. The alias analysis knows about int and floats, not the other way around. This patch series is inspired by both the LLVM RTTI (https://www.llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) and std::variant. The series makes the types public, adds a visit function and converts the various virtual methods to just use visit. As a small example of why this is useful, it then moves a bit of cql3 and json specific logic out of types.cc and types.hh. In a similar way, the UDF code will be able to use visit to convert objects to Lua. In comparison with the previous versions, this series doesn't require the intermediate step of converting void* to data_value& in a few member functions. This version also has fewer double dispatches, and I am fairly confident it has all the tools for avoiding all double dispatches. 
" * 'simplify-types-v3' of https://github.com/espindola/scylla: (80 commits) types: Move abstract_type visit to a header types: Move uuid_type_impl to a header types: Move inet_addr_type_impl to a header types: Move varint_type_impl to a header types: Move timeuuid_type_impl to a header types: Move date_type_impl to a header types: Move bytes_type_impl to a header types: Move utf8_type_impl to a header types: Move ascii_type_impl to a header types: Move string_type_impl to a header types: Move time_type_impl to a header types: Move simple_date_type_impl to a header types: Move timestamp_type_impl to a header types: Move duration_type_impl to a header types: Move decimal_type_impl to a header types: Move floating point types to a header types: Move boolean_type_impl to a header types: Move integer types to a header types: Move integer_type_impl to a header types: Move simple_type_impl to a header ...	2019-08-18 19:04:05 +03:00
Takuya ASADA	f574112301	dist/debian: handle --dist correctly On `ac9b115`, it mistakenly ignores --dist option. It should set 'housekeeping' template variable to 'enable'. Fixes #4857 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190816120127.14099-1-syuu@scylladb.com>	2019-08-18 15:00:33 +03:00
Avi Kivity	14d40cc659	Update seastar submodule * seastar fe2b5b0c6...afc5bbf51 (4): > Merge "handle discarded futures or suppress warning" from Benny > Remove variadic futures from the Seastar implementation > Revert "Merge "handle discarded futures or suppress warning" from Benny" > io_queue: Forward declare smp class	2019-08-17 12:18:18 +03:00
Dejan Mircevski	48bb89fcb7	cql_query_test: Add LIKE tests for all types As requested in a prior code review [1], ensure that LIKE cannot be used on any non-string type. [1] https://github.com/scylladb/scylla/pull/4610#pullrequestreview-255590129 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:55:35 -04:00
Dejan Mircevski	ef071bf7ce	cql_query_test: Remove LIKE-nonstring-pattern case This testcase was previously commented out, pending a fix that cannot be made. Currently it is impossible to validate the marker-value type at filtering time. The value is entered into the options object under its presumed type of string, regardless of what it was made from. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:07:44 -04:00
Dejan Mircevski	20e688e703	cql_query_test: Move a testcase elsewhere in file Somehow this test case sits in the middle of LIKE-operator tests: test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash Move it so LIKE test cases are contiguous. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-16 17:07:44 -04:00
Glauber Costa	ffc328c924	move postinst steps to an external script There are systemd-related steps done in both rpm and deb builds. Move that to a script so we avoid duplication. The tests are so far a bit specific to the distributions, so it needs to be adapted a bit. Also note that this also fixes a bug with rpm as a side-effect: rpm does not call daemon-reload after potentially changing the systemd files (it is only implied during postun operations, that happen during uninstall). daemon-reload was called explicitly for debian packages, and now it is called for both. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-08-15 10:43:17 -04:00
Rafael Ávila de Espíndola	7f0a434cfa	types: Move abstract_type visit to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	dccefd1ddb	types: Move uuid_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	038728a381	types: Move inet_addr_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	1966416cb3	types: Move varint_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	9229f99c86	types: Move timeuuid_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	993f132619	types: Move date_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	a299ed3b9b	types: Move bytes_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	09ac2a1bc6	types: Move utf8_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	da472a65ec	types: Move ascii_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	b98bac65b0	types: Move string_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	3e5b1e2630	types: Move time_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	909df932ac	types: Move simple_date_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	8f3bebb6e8	types: Move timestamp_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola	3260153d35	types: Move duration_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	2f6a26b1c1	types: Move decimal_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	480ca52b59	types: Move floating point types to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	6a4ec7488e	types: Move boolean_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	404b26d3fa	types: Move integer types to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	bd3e725605	types: Move integer_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	03aca28dc5	types: Move simple_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	e8ba37fa5a	types: Move counter_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	cb03c79a48	types: Move empty_type_impl to a header Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	1cb7127bf3	types: Make abstract_type::serialize a static helper Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	b175657ee7	types: Devirtualize abstract_type::validate Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola	bf96f1111c	types: Make abstract_type::serialized_size a static helper Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 16:25:41 -07:00
Rafael Ávila de Espíndola	6831e05471	types: Move functions that use abstract_type::serialized_size out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	047e34a31d	types: Remove serialize_value It is no longer needed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1e0663c56c	types: Devirtualize abstract_type::from_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	68b26047cc	types: Devirtualize abstract_type::serialize Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	18da5f9001	types: Devirtualize abstract_type::from_json_object Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	b987b2dcbe	types: Devirtualize abstract_type::to_json_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	b4bc888eac	types: Refactor abstract_type::serialized_size The following logic was duplicated: * For all types, if value is null, the result is zero. * For non collection types, if the native object is empty, the result is zero. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	968365b7e3	types: Devirtualize abstract_type::serialized_size Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	793bc50d69	types: Delete abstract_type::validate_collection_member Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	37686964f0	types: Devirtualize abstract_type::hash Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	396f5c7656	types: Devirtualize abstract_type::native_typeid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	492043a77d	types: Devirtualize abstract_type::native_value_delete Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	4d849d7742	types: Devirtualize abstract_type::native_value_clone Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ba887b7e56	types: Delete abstract_type::native_value_destroy Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5c0e78d70c	types: Delete abstract_type::native_value_move Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	2bc6471a1e	types: Delete abstract_type::native_value_copy Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	33394dfdc1	types: Delete abstract_type::native_value_size Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	c22ca2f9c9	types: Delete abstract_type::native_value_alignment Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	37c0f5b985	types: Devirtualize get_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f633f70616	types: Devirtualize abstract_type::is_value_compatible_with_internal It now is a static helper. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	19c9a033d9	types: Devirtualize abstract_type::is_compatible_with Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d245d08045	types: Devirtualize abstract_type::is_string Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ae30d78ca9	types: Devirtualize abstract_type::equal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f087756684	types: Implement less with compare We defined less for some types and compare for others. There is no type for which compare is substantially more expensive, so define it for all types and implement less with compare. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	9bbf55e9c0	types: Devirtualize abstract_type::compare Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a5daa8d258	types: Devirtualize abstract_type::less Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a3e898a648	types: Devirtualize abstract_type::deserialize Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	8145faa66f	types: Inline is_byte_order_comparable into only user Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	325418db16	types: Devirtualize abstract_type::is_byte_order_comparable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d2b063877b	types: Devirtualize abstract_type::is_byte_order_equal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	21da060b24	types: Devirtualize abstract_type::update_user_type The type walking is similar to what the find function does, but refactoring it doesn't seem worth it if these are the only two uses. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ae6e96a1e2	types: Refactor references_duration and references_user_type With this patch the logic for walking all nested types is moved to a helper function. It also fixes reversed_type_impl not being handled in references_duration. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	25a5631a46	types: Devirtualize abstract_type::references_user_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	544337f380	types: Devirtualize abstract_type::references_duration Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a6b48bda03	types: Devirtualize abstract_type::is_native Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	f5b4fe5685	types: Devirtualize abstract_type::is_atomic Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	ec09fb94cb	types: Devirtualize abstract_type::is_multi_cell Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1bea7747ce	types: Devirtualize abstract_type::is_tuple Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1581805a8d	types: Devirtualize abstract_type::is_collection Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	1137695cb2	types: Devirtualize abstract_type::is_counter Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	d3ba0d132a	types: Devirtualize abstract_type::is_user_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	0ff539500f	types: Devirtualize abstract_type::cql3_type_name_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5314b489e3	types: Devirtualize abstract_type::get_cql3_kind_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	2f0c64844f	types: Devirtualize abstract_type::is_reversed Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	33d2ec8e1c	types: Devirtualize abstract_type::underlying_type Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	064db9b92e	types: Devirtualize abstract_type::to_string_impl Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	69d6fd21d2	types: Add a listlike_collection_type_impl class With this we can share code that wants to access the element type of set and list. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	a4837301a6	types: Move _is_multi_cell to collection_type_impl It was duplicated in each concrete collection type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	de6d6c46a1	types: Remove collection_type_impl::kind All uses have been switched to abstract_type::kind. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	c80c19459e	types: Add a visitor over data_value Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	5701051857	types: Add a generic visit over abstract_type The api is inspired by std::variant. This bridges the runtime type of an abstract_type object to a compile time overload resolution. For example, it is possible to have a single lambda to visit a string_type_impl, but it corresponds to two leaf types (ascii and utf8). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	e5c7deaeb5	types: Add a kind to abstract_type The type hierarchy is closed, so we can give each leaf an enum value. This will be used to implement a visitor pattern and reduce code duplication. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
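The two commits above (a `kind` enum on each leaf type, then a generic visit built on it) can be illustrated with a minimal sketch. Every name here (`visit_type`, `describe`, the three-member `kind` enum and the tiny class hierarchy) is a hypothetical stand-in and not Scylla's actual code; the sketch only shows how a switch over a closed enum can turn a runtime type into compile-time overload resolution.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Illustrative stand-ins only: the real hierarchy has many more leaf types.
enum class kind { ascii, utf8, int32 };

struct abstract_type {
    const kind k;
    explicit abstract_type(kind k) : k(k) {}
};

// Intermediate class shared by two leaf types, as in the commit message.
struct string_type_impl : abstract_type { using abstract_type::abstract_type; };
struct ascii_type_impl : string_type_impl { ascii_type_impl() : string_type_impl(kind::ascii) {} };
struct utf8_type_impl : string_type_impl { utf8_type_impl() : string_type_impl(kind::utf8) {} };
struct int32_type_impl : abstract_type { int32_type_impl() : abstract_type(kind::int32) {} };

// Switch on the runtime kind and downcast to the matching leaf type;
// the compiler then picks the visitor overload for that static type.
template <typename Visitor>
auto visit_type(const abstract_type& t, Visitor&& v) {
    switch (t.k) {
    case kind::ascii: return v(static_cast<const ascii_type_impl&>(t));
    case kind::utf8:  return v(static_cast<const utf8_type_impl&>(t));
    case kind::int32: return v(static_cast<const int32_type_impl&>(t));
    }
    assert(false && "unreachable: the type hierarchy is closed");
    std::abort();
}

// One overload handles both string leaf types through their common base.
struct describe {
    std::string operator()(const string_type_impl&) const { return "string"; }
    std::string operator()(const int32_type_impl&) const { return "int32"; }
};
```

Because the downcast happens before the call, a visitor only needs an overload for `string_type_impl` to cover both the ascii and utf8 leaves.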
Rafael Ávila de Espíndola	5c098eb7d0	types: Add more tests for abstract_type::to_string_impl The corresponding code is correct, but I noticed no tests would fail if it was broken while refactoring it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	096de10eee	types: Remove abstract_type::equals All types are interned, so we can just compare the pointers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola	6a8ffb35ff	types: Make a few concrete_type member functions public These only use public member functions from data_value, so there is no reason for not making them public too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-08-14 10:02:00 -07:00
Gleb Natapov	1779c3b7f6	move admission control semaphore from cql server to storage_service There are two reasons for the move. First is that cql server lifetime is shorter than storage_proxy one and the latter stores references to the semaphore in each service_permit it holds. Second - we want thrift (and in the future other user APIs) to share the same admission control memory pool. Fixes #4844 Message-Id: <20190814142614.GT17984@scylladb.com>	2019-08-14 18:49:56 +03:00
Gleb Natapov	a1e9e6faa2	storage_service: remove outdated comment We in fact do stop cql server in storage_service::drain_on_shutdown() which is called in main.cc during shutdown. Message-Id: <20190814085027.GP17984@scylladb.com>	2019-08-14 11:52:49 +03:00
Avi Kivity	9f512509c7	github: remove github pull request template (#4833 ) Since we do accept pull requests (in a long-running experiment), the pull request template suggesting not to use them is inaccurate, and many requesters forget to remove the boilerplate. Remove the outdated template.	2019-08-14 09:28:39 +03:00
Pekka Enberg	595434a554	Merge "docker: relax permission checks" from Avi "Commit `e3f7fe4` added file owner validation to prevent Scylla from crashing when it tries to touch a file it doesn't own. However, under docker, we cannot expect to pass this check since user IDs are from different namespaces: the process runs in a container namespace, but the data files usually come from a mounted volume, and so their uids are from the host namespace. So we need to relax the check. We do this by reverting `b1226fb`, which causes Scylla to run as euid 0 in docker, and by special-casing euid 0 in the ownership verification step. Fixes #4823." * 'docker-euid-0' of git://github.com/avikivity/scylla: main: relax file ownership checks if running under euid 0 Revert "dist/docker/redhat: change user of scylla services to 'scylla'"	2019-08-13 19:55:05 +03:00
Tomasz Grabiec	64ff1b6405	cql: alter type: Format field name as text instead of hex Fixes #4841 Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>	2019-08-13 16:25:48 +03:00
Tomasz Grabiec	34cff6ed6b	types: Fix abort on type alter which affects a compact storage table with no regular columns Fixes #4837 Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>	2019-08-13 16:25:02 +03:00
Avi Kivity	1ed3356e0e	main: relax file ownership checks if running under euid 0 During startup, we check that the data files are owned by our euid. But in a container environment, this is impossible to enforce because uid/username mappings are different between the host and the container, and the data files are likely to be mounted from the host. To allow for such environments, relax the checks if euid=0. This both matches what happens in a container (programs run as root) and the kernel access checks (euid 0 can do anything). We can reconsider this when container uid mapping is better developed. Fixes #4823. Fixes #4536.	2019-08-13 14:36:08 +03:00
Avi Kivity	ca28fdc37d	Revert "dist/docker/redhat: change user of scylla services to 'scylla'" This reverts commit `b1226fb15a`. When the data volume is mounted from the host (as is usual in container deployments), we can't expect that the files will be owned by the in-container scylla user. So that commit didn't really fix #4536. A follow-up patch will relax the check so it passes in a container environment.	2019-08-13 14:36:00 +03:00
Pekka Enberg	fed38f5179	reloc/build_reloc.sh: Add '--configure-flags' command line option This adds a '--configure-flags FLAGS' command line option, which overrides the flags passed to scylla.git 'configure.py' script. We need this for flexibility of custom builds in Jenkins pipelines, for example. Message-Id: <20190813095428.13590-1-penberg@scylladb.com>	2019-08-13 14:05:25 +03:00
Tomasz Grabiec	0cf4fab2ca	Merge "Multishard combining reader more robust reader recreation" from Botond Make the reader recreation logic more robust, by moving away from deciding which fragments have to be dropped based on a bunch of special cases, instead replacing this with a general logic which just drops all already seen fragments (based on their position). Special handling is added for the case when the last position is a range tombstone with a non full prefix starting position. Reproducer unit tests are added for both cases. Refs #4695 Fixes #4733	2019-08-13 11:53:07 +02:00
Gleb Natapov	00c4078af3	cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state We should wait for a future returned from add_local_application_state() to resolve before issuing new calculation, otherwise two add_local_application_state() may run simultaneously for the same state. Fixes #4838. Message-Id: <20190812082158.GE17984@scylladb.com>	2019-08-13 11:48:38 +03:00
Botond Dénes	fe58324fb9	tests: test_multishard_combining_reader_as_mutation_source: don't copy mutations cross shard It's illegal. Freeze-unfreeze them instead when crossing shard boundaries.	2019-08-13 10:16:02 +03:00
Botond Dénes	d746fb59a7	mutation_reader_test: harden test_multishard_combining_reader_as_mutation_source Add `single_fragment_buffer` test variable. When set, the shard readers are created with a max buffer size of 1, effectively causing them to read a single fragment at a time. This, when combined with `evict_readers=true` will stress the recreate reader logic to the max.	2019-08-13 10:16:02 +03:00
Botond Dénes	899afc0661	flat_mutation_reader_assertions: produces_range_tombstone(): be more lenient Be more tolerant with different but equivalent representation of range deletions. When expecting a range tombstone, keep reading range tombstones while these can be merged with the cumulative range tombstone, resulting from the merging of the previous range tombstones. This results in tests tolerating range tombstones that are split into several, potentially overlapping range tombstones, representing the same underlying deletion.	2019-08-13 10:16:02 +03:00
Botond Dénes	53e1dca5ca	tests/mutation_source_test: generate_mutation_sets() add row that falls into deleted prefix This is tailored to the multishard_combining_reader, to make sure it doesn't lose rows following a range tombstone with a prefix starting position (whose prefix their keys fall into).	2019-08-13 09:47:55 +03:00
Botond Dénes	6bfe468a17	multishard_combining_reader: remote_reader::recreate_reader(): restore indentation	2019-08-13 09:47:55 +03:00
Botond Dénes	68353acc1c	multishard_combining_reader: remote_reader: use next instead of last pos Currently the remote reader uses the last seen fragment's position to calculate the position the reader should continue from when the reader is recreated after having been evicted. Recently it was discovered that this logic breaks down badly when this last position is a non-full clustering prefix (a range tombstone start bound). In this case, if only the last position is available, there is no good way of computing the starting position. Starting after this position will potentially miss any rows that fall into the prefix (the current behaviour). Starting from before it will cause all range tombstones with said prefix to be re-emitted, causing other problems. A better solution is to exploit the fact that sometimes we also know what the next fragment is. These "some" times are the exact times that are problematic with the current approach -- when the last fragment is a range tombstone. Exploiting this extra knowledge allows for a much better way for calculating the starting position: instead of maintaining the last position, we maintain the next position, which is always safe to start from. This is not always possible, but in many cases we can know for sure what the next position is, for example if the last position was a static row we can be sure the next position is the first clustering position (or partition end). In the few cases where we cannot calculate the next position we fall back to the previous logic and start from after the last positions. The good news is that in these remaining cases (the last fragment is a clustering row) it is safe to do so. This patch also does some refactoring of the remote-reader internals, all fill-buffer related logic is grouped together in a single `fill_buffer()` method.	2019-08-13 09:47:55 +03:00
Botond Dénes	3949189918	multishard_combining_reader: remote_reader::do_fill_buffer(): reorganize drop logic To make it more readable.	2019-08-13 09:47:55 +03:00
Botond Dénes	20c06adf80	position_in_partition: add for_partition_start()	2019-08-13 09:47:55 +03:00
Botond Dénes	87973498a1	query: refactor trim_clustering_row_ranges_to() Allow expressing `pos` in terms of a `position_in_partition_view`, which allows finer control of the exact position, allowing specifying position before, at or after a certain key. The previous overload is kept for backward compatibility, invoking the new overload behind the curtains.	2019-08-13 09:47:55 +03:00
Botond Dénes	3a5e7db9b6	tests: add unit test for query::trim_clustering_row_ranges_to() We are about to do a major refactoring of this method. Add extensive unit tests to ensure we don't break it in the process.	2019-08-13 09:47:55 +03:00
Botond Dénes	1b4e88b972	position_in_partition_view: add get_bound_weight()	2019-08-13 09:47:55 +03:00
Avi Kivity	0d0ee20f76	Merge "Implement `sstable_info` API command (info on sstables)" from Calle " Refs #4726 Implement the api portion of a "describe sstables" command. Adds rest types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint) " * 'sstabledesc' of https://github.com/elcallio/scylla: api/storage_service: Add "sstable_info" command sstables/compress: Make compressor pointer accessible from compression info sstables.hh: Add attribute description API to file extension sstables.hh: Add compression component accessor sstables.hh: Make "has_component" public	2019-08-12 21:16:08 +03:00
Dejan Mircevski	8be147d069	cql3: Handle empty LIKE pattern Match SQL's LIKE in allowing an empty pattern, which matches only an empty text field. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola	99c7f8457d	logalloc: Add a migrators_base that is common to debug and release This simplifies the debug implementation and it now should work with scylla-gdb.py. It is not clear what, if anything, is lost by not using random ids. They were never being reused in the debug implementation anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190618144755.31212-1-espindola@scylladb.com>	2019-08-12 19:44:55 +03:00
Calle Wilund	2b19bfbfbc	types: Remove obsolete "FIXME" inet_addr_type_impl has supported ipv6 for some time now. Message-Id: <20190812142731.6384-1-calle@scylladb.com>	2019-08-12 17:30:15 +03:00
Calle Wilund	1afc899e37	type_parser: Fix/improve exception messages Removes long-standing FIXME for message detail Also simplifies some code, removing duplication. Message-Id: <20190812134144.2417-1-calle@scylladb.com>	2019-08-12 17:03:43 +03:00
Calle Wilund	fdf2017487	cql3::term: Remove unneeded const_cast Removed no longer needed FIXME (to_string became const long ago) Message-Id: <20190812133943.2011-1-calle@scylladb.com>	2019-08-12 17:00:46 +03:00
Amnon Heiman	6a0490c419	api/compaction_manager: indentation	2019-08-12 14:04:40 +03:00
Amnon Heiman	8181601f0e	api/compaction_manager: do not hold map on the stack This patch fixes a bug that a map is held on the stack and then is used by a future. Instead, the map is now wrapped with do_with. Fixes #4824 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-12 14:04:00 +03:00
Asias He	131acc09cc	repair: Adjust parallelism according to memory size (#4696 ) After commit `8a0c4d5` (Merge "Repair switch to rpc stream" from Asias), we increased the row buffer size for repair from 512KiB to 32MiB per repair instance. We allow repairing 16 ranges (16 repair instances) in parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB per user requested repair. In addition, the repair master node can hold data from all the repair followers, so the memory usage on repair master can be larger than 512MiB. We need to provide a way to limit the memory usage. In this patch, we limit the total memory used by repair to 10% of the shard memory. The number of ranges that can be repaired in parallel is: max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range. For example, if each shard has 4096MiB of memory, then we will have max_repair_ranges_in_parallel = 4096MiB / 10 / 32MiB = 12. Fixes #4675	2019-08-12 11:09:27 +03:00
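The arithmetic in the repair commit above can be checked with a small sketch. The function below is a hypothetical helper, not the code from the patch; the 10% budget and the 32MiB per-range buffer come from the commit message, while the clamp to at least one range is an assumption added so that very small shards can still repair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024 * 1024;

// Hypothetical helper: how many ranges may be repaired in parallel on a shard.
std::uint64_t max_repair_ranges_in_parallel(std::uint64_t shard_memory,
                                            std::uint64_t memory_per_range = 32 * MiB) {
    assert(memory_per_range > 0);
    // Budget for repair: 10% of the shard's memory (from the commit message).
    const std::uint64_t max_repair_memory = shard_memory / 10;
    // Integer division rounds down; the lower bound of 1 is an assumption.
    return std::max<std::uint64_t>(1, max_repair_memory / memory_per_range);
}
```

For a shard with 4096MiB the budget is about 409.6MiB, which fits twelve 32MiB buffers, matching the figure in the commit message.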
Avi Kivity	e6cde72d2b	Merge "Fix cql server admission control to take all leftover work into account" from Gleb " Current admission control takes a permit when a cql request starts and releases it when the reply is sent, but some requests may leave background work behind after that point (some because there is genuine background work to do, like completing a write or doing a read repair, and some because a read/write may be stuck in a queue longer than the request's timeout), so after Scylla replies with a timeout some resources are still occupied. The series fixes this by passing the permit down to storage_proxy where it is held until all background work is completed. Fixes #4768 " * 'gleb/admission-v3' of github.com:scylladb/seastar-dev: transport: add a metric to follow memory available for service permit. storage_proxy: store a permit in a read executor storage_proxy: store a permit in a write response handler Pass service permit to storage_proxy transport: introduce service_permit class and use it instead of semaphore_units transport: hold admission a permit until a reply is sent transport: remove cql server load balancer	2019-08-12 11:02:37 +03:00
Gleb Natapov	3e27c2198a	transport: add a metric to follow memory available for service permit. Add a metric to follow memory available for service permit. When this memory is close to zero cql server stops admitting new requests.	2019-08-12 10:20:43 +03:00
Gleb Natapov	7d7b1685aa	storage_proxy: store a permit in a read executor A read executor exists until read operation completes in its entirety so storing a permit there guarantees that it will be freed only after no background work is left for the request on this server.	2019-08-12 10:20:43 +03:00
Gleb Natapov	d5ced800f0	storage_proxy: store a permit in a write response handler A write response handler exists until write operation completes in its entirety so storing a permit there guarantees that it will be freed only after no background work is left for the request on this server.	2019-08-12 10:20:43 +03:00
Gleb Natapov	6a4207f202	Pass service permit to storage_proxy Current cql transport code acquires a permit before processing a query and releases it when the query gets a reply, but some queries leave work behind. If the work is allowed to accumulate without any limit, a server may eventually run out of memory. To prevent that, the permit system should account for the background work as well. The patch is a first step in this direction. It passes a permit down to storage proxy, where it will later be held by background work.	2019-08-12 10:20:43 +03:00
Raphael S. Carvalho	b436c41128	compaction_manager: Prevent sstable runs from being partially compacted Manager trims sstables off to allow compaction jobs to proceed in parallel according to their weights. The problem is that trimming procedure is not sstable run aware, so it could incorrectly remove only a subset of a sstable run, leading to partial sstable run compaction. Partial compaction of a sstable run could lead to inefficiency because the run structure would be messed up, affecting all amplification factors, and the same generation could even end up being compacted twice. This is fixed by making the trim procedure respect the sstable runs. Fixes #4773. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>	2019-08-11 17:20:20 +03:00
Gleb Natapov	ddff7f48cf	transport: introduce service_permit class and use it instead of semaphore_units service_permit is a new class that allows sharing a permit among different parts of request processing many of which can complete at different times.	2019-08-11 16:08:55 +03:00
Gleb Natapov	2daa72b7dc	transport: hold admission a permit until a reply is sent Current code releases the admission permit too soon. If new requests are admitted faster than the client reads replies back, the reply queue can grow very big. The patch delays the service permit release until after a reply is sent.	2019-08-11 16:08:55 +03:00
Gleb Natapov	7e3805ed3d	transport: remove cql server load balancer It is buggy, unused and unnecessarily complicates the code.	2019-08-11 16:08:52 +03:00
Nadav Har'El	f9d6eaf5ff	reconcilable_result: switch to chunked_vector Merged patch series from Avi Kivity: In rare but valid cases (reconciling many tombstones, paging disabled), a reconciled_result can grow large. This triggers large allocation warnings. Switch to chunked_vector to avoid the large allocation. In passing, fix chunked_vector's begin()/end() const correctness, and add the reverse iterator function family which is needed by the conversion. Fixes #4780. Tests: unit (dev) Commit Summary utils: chunked_vector: make begin()/end() const correct utils::chunked_vector: add rbegin() and related iterators reconcilable_result: use chunked_vector to hold partitions	2019-08-11 16:03:13 +03:00
Avi Kivity	ce2b0b2682	Merge "Add listen/rpc "prefer_ipv6" options to DNS lookup #4775 " from Calle " Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus etc . Fixes #4751 Adds using a preferred address family to dns name lookups related to listen address and rpc address, adhering to the respective "prefer" options. API, prometheus and broadcast address are all considered to be covered by the "listen_interface_prefer_ipv6" option. Note: scylla does not yet support actual interface binding, but these options should apply equally to address name parameters. Setting a "prefer_ipv6" option automatically enables ipv6 dns family query. " * 'calle/ipv6' of https://github.com/elcallio/scylla: init: Use the "prefer_ipv6" options available for rpc/listen address/interface inet_address: Add optional "preferred type" to lookup config: Add rpc_interface_prefer_ipv6 parameter config: Add listen_interface_prefer_ipv6 parameter config.cc: Fix enable_ipv6_dns_lookup actual param name	2019-08-11 15:21:45 +03:00
Pekka Enberg	73113c0ea4	utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code The FIXME was added in the very first commit ("utils: Convert utils/FBUtilities.java") that introduced the fb_utilities class as a stub. However, we have long implemented the parts that we actually use, so drop the FIXME as obsolete. In addition, drop the remaining uncommented Java code as unused and also obsolete. Message-Id: <20190808182758.1155-1-penberg@scylladb.com>	2019-08-11 10:26:36 +03:00
Botond Dénes	fd925f6049	position_in_partition_view: add constructor with bound_weight This is a low level constructor which allows directly providing a bound weight to go with the key.	2019-08-09 10:54:27 +03:00
Pekka Enberg	547c072f93	dbuild: Make Maven local repository accessible The Maven build tool ("mvn"), which is used by scylla-jmx and scylla-tools-java, stores dependencies in a local repository stored at $HOME/.m2. Make sure it's accessible to dbuild. Message-Id: <20190808140216.26141-1-penberg@scylladb.com>	2019-08-08 17:36:13 +03:00
Avi Kivity	8f19b16fe4	Update seastar submodule * seastar ed608e3c9e...fe2b5b0c6b (2): > Merge "handle discarded futures or suppress warning" from Benny > output_stream: Add close() blurb	2019-08-08 16:22:38 +03:00
Avi Kivity	4a5ec61438	Update seastar submodule * seastar a1cf07858b...ed608e3c9e (4): > core: Add ability to abort on EBADF and ENOTSOCK > Revert "Merge "handle discarded futures or suppress warning" from Benny" > Merge "handle discarded futures or suppress warning" from Benny > reactor: remove replace variadic future<pollable_fd, socket_address> with future<tuple>	2019-08-08 14:22:29 +03:00
Raphael S. Carvalho	76cde84540	sstables/compaction_manager: Fix logic for filtering out partial sstable runs ignore_partial_runs() brings confusion because i__p__r() equal to true doesn't mean filter out partial runs from compaction. It actually means not caring about compaction of a partial run. The logic was wrong because any compaction strategy that chooses not to ignore partial sstable run[1] would have any fragment composing it incorrectly becoming a candidate for compaction. This problem could make compaction include only a subset of fragments composing the partial run or even make the same fragment be compacted twice due to parallel compaction. [1]: partial sstable run is a sstable that is still being generated by compaction and as a result cannot be selected as candidate whatsoever. Fix is about making sure partial sstable run has none of its fragments selected for compaction. And also renaming i__p__r. Fixes #4729. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	7d4bf10d87	docs/building-packages.md: Document how to build Scylla packages This documents the steps needed to build Scylla's Linux packages with the relocatable package infrastructure we use today. Message-Id: <20190807134017.4275-1-penberg@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	79cece9f33	toolchain: Fix default command for dbuild Docker image Running "dbuild" without a build command fails as follows: $ ./tools/toolchain/dbuild Error: This command has to be run under the root user. Israel Fruchter discovered that the default command of our Docker image is this: "Cmd": [ "bash", "-c", "dnf -y install python3-cassandra-driver && dnf clean all" ] Let's make "/bin/bash" the default command instead, which will make "dbuild" with no build command to return to the host shell. Message-Id: <20190807133955.4202-1-penberg@scylladb.com>	2019-08-08 14:11:35 +03:00
Pekka Enberg	76cdec222f	build_reloc.sh: Remove "--with" passed to "configure.py" The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to the configure.py script. This is redundant as the "scylla-package.tar.gz" target of ninja already limits itself to them. Removing the "--with" options allows building unit tests after a relocatable package has been built without having to rebuild anything. Message-Id: <20190807130505.30089-1-penberg@scylladb.com>	2019-08-07 16:28:00 +03:00
Avi Kivity	e548bdb2e8	thrift, transport: switch to new seastar accept() API (#4814 ) Seastar switched accept() to return a single struct instead of a variadic future, adjust the code to the new API to avoid deprecation warnings.	2019-08-07 15:23:26 +02:00
Pekka Enberg	f68fffd99a	reloc/build_reloc.sh: Make build mode configurable Add a '--mode <mode>' command line option to the 'build_reloc.sh' script so that we can create relocatable packages for debug builds. The '--mode' command line option defaults to 'release' so existing users are unaffected. Message-Id: <20190807120759.32634-1-penberg@scylladb.com>	2019-08-07 16:19:37 +03:00
Asias He	fee26b9f6e	repair: Fix use after free in do_estimate_partitions_on_local_shard (#4813 ) We need to keep the sstables object alive during the operation of do_for_each. Notes: No need to backport to 3.1. Fixes #4811	2019-08-07 15:19:21 +02:00
Asias He	49a73aa2fc	streaming: Move stream_mutation_fragments_cmd to a new file (#4812 ) Avoid including the lengthy stream_session.hh in messaging_service. More importantly, fix the build because currently messaging_service.cc and messaging_service.hh do not include stream_mutation_fragments_cmd. I am not sure why it builds on my machine. Spotted this when backporting the "streaming: Send error code from the sender to receiver" to 3.0 branch. Refs: #4789	2019-08-07 14:59:46 +02:00
Asias He	288371ce75	streaming: Do not call rpc stream flush in send_mutation_fragments The stream close() guarantees the data sent will be flushed. No need to call the stream flush() since the stream is not reused. Follow up fix for commit `bac987e32a` (streaming: Send error code from the sender to receiver). Refs #4789	2019-08-07 14:31:17 +02:00
Avi Kivity	689fc72bab	Update seastar submodule * seastar d199d27681...a1cf07858b (1): > Merge 'Do not return a variadic future from server_socket::accept()' from Avi Seastar configure.py now has --api-level=1, to keep us on the old variadic future server_socket::accept() API.	2019-08-06 18:37:27 +03:00
Avi Kivity	97f66c72af	Update seastar submodule * seastar d90834443c...d199d27681 (3): > sharded: support for non-cooperative service types > shared_future: silence warning about discarded future > Fix backtrace suppression message in cpu_stall_detector. Fixes #4560.	2019-08-06 18:00:48 +03:00
Asias He	bac987e32a	streaming: Send error code from the sender to receiver In case of error on the sender side, the sender does not propagate the error to the receiver. The sender will close the stream. As a result, the receiver will get nullopt from the source in get_next_mutation_fragment and pass mutation_fragment_opt with no value to the generating_reader. In turn, the generating_reader generates end of stream. However, the last element that the generating_reader has generated can be any type of mutation_fragment. This makes the sstable that consumes the generating_reader violate the mutation_fragment stream rule. To fix, we need to propagate the error. However, RPC streaming does not support propagating the error in the framework. The user has to send an error code explicitly. Fixes: #4789	2019-08-06 16:54:56 +02:00
Piotr Jastrzebski	24f6d90a45	sstables: add test of sstables_mutation_reader for missing partition_end Reproduces #4783 Issue was fixed by `9b8ac5ecbc` Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-08-06 15:11:19 +03:00
Calle Wilund	6c62e5741e	init: Use the "prefer_ipv6" options available for rpc/listen address/interface Fixes #4751 Adds using a preferred address family to dns name lookups related to listen address and rpc address, adhering to the respective "prefer" options. API, prometheus and broadcast address are all considered to be covered by the "listen_interface_prefer_ipv6" option. Note: scylla does not yet support actual interface binding, but these options should apply equally to address name parameters. Setting a "prefer_ipv6" option automatically enables ipv6 dns family query.	2019-08-06 08:32:10 +00:00
Calle Wilund	6c0c1309b3	inet_address: Add optional "preferred type" to lookup Allows using prio in address family dns lookup. I.e. prefer ipv4/ipv6 if avail.	2019-08-06 08:32:10 +00:00
Calle Wilund	d3410f0e48	config: Add rpc_interface_prefer_ipv6 parameter As already existing in scylla.yaml	2019-08-06 08:32:10 +00:00
Calle Wilund	0028cecb8e	config: Add listen_interface_prefer_ipv6 parameter As already existing in scylla.yaml. https://github.com/apache/cassandra/blob/cassandra-3.11/conf/cassandra.yaml#L622	2019-08-06 08:32:10 +00:00
Calle Wilund	39d18178eb	config.cc: Fix enable_ipv6_dns_lookup actual param name When adding option (and iterating through config refactoring) the member name and the config param name got out of sync	2019-08-06 08:32:09 +00:00
Calle Wilund	298da3fc4b	api/storage_service: Add "sstable_info" command Assembles information and attributes of sstables in one or more column families. v2: * Use (not really legal) nested "type" in json * Rename "table" param to "cf" for consistency * Some comments on data sizes * Stream result to avoid huge string allocations on final json	2019-08-06 08:14:15 +00:00
Calle Wilund	95a8ff12e7	sstables/compress: Make compressor pointer accessible from compression info	2019-08-06 07:07:44 +00:00
Calle Wilund	d15c63627c	sstables.hh: Add attribute description API to file extension	2019-08-06 07:07:44 +00:00
Calle Wilund	4c67d702c2	sstables.hh: Add compression component accessor	2019-08-06 07:07:44 +00:00
Calle Wilund	770f912221	sstables.hh: Make "has_component" public	2019-08-06 07:07:44 +00:00
Avi Kivity	b77c4e68c2	Merge "Add Zstandard compression #4802 " from Kamil " This adds the option to compress sstables using the Zstandard algorithm (https://facebook.github.io/zstd/). To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor' to the 'compression' argument when creating a table. You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available compression levels. Resolves #2613. This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes was passed to the compressor as chunk length in kilobytes. Fortunately, none of the compressors implemented until now used this parameter. Example usage (assuming there exists a keyspace 'a'): create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'}; Notes: 1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule. 2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK. 3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit. " * 'zstd' of https://github.com/kbr-/scylla: Configure: rename seastar_pool to submodule_pool, add more submodules to the pool Add unit tests for Zstd compression Enable tests that use compressed sstable files Add ZStandard compression Fix the value of the chunk length parameter passed to compressors	2019-08-05 16:29:27 +03:00
Botond Dénes	23cc6d6fb2	make_flat_mutation_reader_from_fragments: reader: silence discarded future warning The fragment reader calls `fast_forward_to()` from its constructor to discard fragments that fall outside the query range. Move the fast-forward code into an internal void-returning method, and call that from both the constructor and `fast_forward_to()`, to avoid a warning on a discarded future<>. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>	2019-08-05 16:21:50 +03:00
Kamil Braun	3a0308f76f	Configure: rename seastar_pool to submodule_pool, add more submodules to the pool Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	c3c7c06e10	Add unit tests for Zstd compression Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	8b58cdab0a	Enable tests that use compressed sstable files The files in tests/sstables/3.x/compressed/ were not used in the tests. This commit: - renames the directory to tests/sstables/3.x/lz4/, - adds analogous directories and files for other compressors, - adds tests using these files, - does some minor refactoring. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:56 +02:00
Kamil Braun	f14e6e73bb	Add ZStandard compression This adds the option to compress sstables using the Zstandard algorithm (https://facebook.github.io/zstd/). To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor' to the 'compression' argument when creating a table. You can also specify a 'compression_level'. See Zstd documentation for the available compression levels. Resolves #2613. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:55:53 +02:00
Kamil Braun	7a61bcb021	Fix the value of the chunk length parameter passed to compressors This commit also fixes a bug in sstables/compress.cc, where chunk length in bytes was passed to the compressor as chunk length in kilobytes. Fortunately, none of the compressors implemented until now used this parameter. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-08-05 14:31:33 +02:00
Avi Kivity	95c0804731	Merge "Catch unclosed partition sstable write #4794 " from Tomasz " Not emitting partition_end for a partition is incorrect. SStable writer assumes that it is emitted. If it's not, the sstable will not be written correctly. The partition index entry for the last partition will be left partially written, which will result in errors during reads. Also, statistics and sstable key ranges will not include the last partition. It's better to catch this problem at the time of writing, and not generate bad sstables. Another way of handling this would be to implicitly generate a partition_end, but I don't think that we should do this. We cannot trust the mutation stream when invariants are violated, we don't know if this was really the last partition which was supposed to be written. So it's safer to fail the write. Enabled for both mc and la/ka. Passing --abort-on-internal-error on the command line will switch to aborting instead of throwing an exception. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down. " * 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla: sstables: writer: Validate that partition is closed when the input mutation stream ends config, exceptions: Add helper for handling internal errors utils: config_file: Introduce named_value::observe()	2019-08-04 15:18:31 +03:00
Asias He	3b39a59135	storage_service: Replicate and advertise tokens early in the boot up process When a node is restarted, there is a race between gossip starting (other nodes will mark this node up again and send requests) and the tokens being replicated to other shards. Here is an example: - n1, n2 - n2 is down, n1 thinks n2 is down - n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends reads/writes to n2, but n2 hasn't replicated the token_metadata to all the shards. - n2 complains: token_metadata - sorted_tokens is empty in first_token_index! token_metadata - sorted_tokens is empty in first_token_index! token_metadata - sorted_tokens is empty in first_token_index! token_metadata - sorted_tokens is empty in first_token_index! token_metadata - sorted_tokens is empty in first_token_index! token_metadata - sorted_tokens is empty in first_token_index! storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error (sorted_tokens is empty in first_token_index!) The code path looks like below: 0 storage_service::init_server 1 prepare_to_join() 2 add gossip application state of NET_VERSION, SCHEMA and so on. 3 _gossiper.start_gossiping().get() 4 join_token_ring() 5 _token_metadata.update_normal_tokens(tokens, get_broadcast_address()); 6 replicate_to_all_cores().get() 7 storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS The race described above is between line 3 and line 6. To fix, we can replicate the token_metadata early after it is filled with the tokens read from system table before gossip starts. So that when other nodes think this restarting node is up, the tokens are already replicated to all the shards. In addition, this patch also fixes the issue that other nodes might see a node missing the TOKENS and STATUS application state in gossip if that node failed in the middle of a restarting process, i.e., it is killed after line 3 and before line 7. As a result we could not replace the node. Tests: update_cluster_layout_tests.py Fixes: #4709 Fixes: #4723	2019-08-04 15:18:31 +03:00
Avi Kivity	aebb9bd755	Merge "tests/mutation_source_test: pass query time to populate" from Botond " Although `733c68cb1` made sure to synchronize the query time used for compaction happening in the mutation_source_test suite and that happening in the `flat_mutation_assertions` class, there remained another hidden compaction that potentially could use a different timestamp and hence produce false positive test failures. This was hastily fixed by `cea3338e3`, by just increasing the TTL of cells, thus avoiding possible differences in compaction output. This mini-series is the proper fix to this problem. It passes a query time to the populate function, allowing the users of the mutation source test suite to forward it to any compaction they might be doing on the data. The quick fix is reverted in favor of the proper fix. Refs: #4747 " * 'mutation_source_tests_proper_ttl_fix/v1' of https://github.com/denesb/scylla: Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl" tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time tests/mutation_source_test: add populate_fn overload with query_time	2019-08-04 15:18:31 +03:00
Tomasz Grabiec	43c7144133	sstables: writer: Validate that partition is closed when the input mutation stream ends Not emitting partition_end for a partition is incorrect. Sstable writer assumes that it is emitted. If it's not, the sstable will not be written correctly. The partition index entry for the last partition will be left partially written, which may result in errors during reads. Also, statistics and sstable key ranges will not include the last partition. It's better to catch this problem at the time of writing, and not generate bad sstables. Another way of handling this would be to implicitly generate a partition_end, but I don't think that we should do this. We cannot trust the mutation stream when invariants are violated, we don't know if this was really the last partition which was supposed to be written. So it's safer to fail the write. Enabled for both mc and la/ka.	2019-08-02 11:13:54 +02:00
Tomasz Grabiec	bf70ee3986	config, exceptions: Add helper for handling internal errors The handler is intended to be called when internal invariants are violated and the operation cannot safely continue. The handler either throws (default) or aborts, depending on configuration option. Passing --abort-on-internal-error on the command line will switch to aborting. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down.	2019-08-02 11:13:54 +02:00
Tomasz Grabiec	61a9cfbfa9	utils: config_file: Introduce named_value::observe()	2019-08-02 11:13:53 +02:00
Avi Kivity	093d2cd7e5	reconcilable_result: use chunked_vector to hold partitions Usually, a reconcilable_result holds very few partitions (1 is common), since the page size is limited by 1MB. But if we have paging disabled or if we are reconciling a range full of tombstones, we may see many more. This can cause large allocations. Change to chunked_vector to prevent those large allocations, as they can be quite expensive. Fixes #4780.	2019-08-01 18:49:13 +03:00
Avi Kivity	eaa9a5b0d7	utils::chunked_vector: add rbegin() and related iterators Needed as an std::vector replacement.	2019-08-01 18:39:47 +03:00
Avi Kivity	df6faae980	utils: chunked_vector: make begin()/end() const correct begin() of a const vector should return a const_iterator, to avoid giving the caller the ability to mutate it. This slipped through since iterator's constructor does a const_cast. Noticed by code inspection.	2019-08-01 18:38:53 +03:00
Botond Dénes	0b748bb8fe	Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl" This reverts commit `cea3338e38`. The above was a quick fix to allow the tests to pass, there is a proper fix now.	2019-08-01 13:05:46 +03:00
Botond Dénes	ac91f1f6b8	tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time Use the query_time passed in to the populate function and forward it to the sstable constructor, so that the compaction happening during sstable write uses the same query time that any compaction done by the mutation source test suite does.	2019-08-01 13:04:21 +03:00
Botond Dénes	ce1ed2cb70	tests/mutation_source_test: add populate_fn overload with query_time So tests that do compaction can pass the query_time they used for it to clients that do some compaction themselves, making sure all compactions happen with the same query time, avoiding false positive test failures.	2019-08-01 13:03:03 +03:00
Vlad Zolotarov	15eaf2fd8e	dist: scylla_util.py: get_mode_cpuset(): don't let false alarm error messages Don't let perftune.py print a false alarm error message when we calculate a compute CPU set for tuning modes. This may happen when we calculate a CPU set for non-MQ tuning modes on small systems on which these modes are forbidden because they would result in a zero CPU set, e.g. sq_split on a system with a single physical core. We are going to use the new --get-cpu-mask-quiet execution mode added to seastar/script/perftune.py by the "perftune.py: introduce --get-cpu-mask-quiet" series, which returns a zero CPU set if that's what it turns out to be instead of exiting with an error as --get-cpu-mask would do in such a case. The rest of the scylla_util.py logic is going to handle a zero CPU set returned by get_mode_cpuset() correctly. Fixes #4211 Fixes #4443 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190731212901.9510-1-vladz@scylladb.com>	2019-08-01 11:14:39 +03:00
Botond Dénes	339be3853d	foreign_reader: silence warning about discarded future And add a comment explaining why this is fine. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801062234.69081-1-bdenes@scylladb.com>	2019-08-01 10:11:24 +03:00
Avi Kivity	47b0f40d27	Merge "introduce metrics for non-local queries" from Konstantin " A fix for #4338 "storage_proxy add a counter for cql requests that arrived to a non replica" Such requests should be tracked since forwarding them to a correct replica can create a lot of network noise and incur a significant performance penalty. The current metrics are considered insufficient after the introduction of heat-weighted load balancing. " Fixes #4388. * 'gh-4338' of https://github.com/kostja/scylla: metrics: introduce a metric for non-local reads metrics: account writes forwarded by a coordinator in an own metric.	2019-08-01 10:09:33 +03:00
Avi Kivity	77686ab889	Merge "Make SSTable cleanup run aware" from Raphael " Fixes #4663. Fixes #4718. " * 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla: tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id table: Make SSTable cleanup run aware compaction: introduce constants for compaction descriptor compaction: Make it possible to config the identifier of the output sstable run table: do not rely on undefined behavior in cleanup_sstables	2019-07-31 19:10:22 +03:00
Botond Dénes	a41e8f0bcf	query::consume_page: move away from variadic future Require the `consumer` to return 0 or 1 value in its future. Update all downstream code. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190731140440.57295-1-bdenes@scylladb.com>	2019-07-31 18:49:47 +03:00
Avi Kivity	320fd2be60	Update seastar submodule * seastar 3f88e9068b...d90834443c (12): > Print warning when somaxconn lower than backlog parameter used for listen() > Merge "perftune.py: introduce --get-cpu-mask-quiet" from Vlad > seastar-json2code: Handle "$ref"-usage for nested object types properly > Make future [[nodiscard]] > Allow pass listen_options to http_server::listen > Handle EPOLLHUP and EPOLLERR from epoll explicitly > reactor: fix false positives in the stall detector due to large task queue > Merge "Small asan related improvements" from Rafael > thread: reduce allocations during context switch > thread: remove deprecated thread_scheduling_group and its unit test > reactor: make _polls to be non atomic > reactor: remove unused _tasks_processed variable	2019-07-31 18:30:10 +03:00
Takuya ASADA	60ec8b2a04	install.sh: install everything when --pkg is not specified Since the previous commit `ac9b115a8f`, install.sh requires specifying a single package using --pkg, and there is no way to select all packages. It should select all packages when running install.sh without --pkg. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190731013245.5857-1-syuu@scylladb.com>	2019-07-31 16:43:57 +03:00
Asias He	5d3e4d7b73	messaging_service: Check if messaging_service is stopped before get_rpc_client get_rpc_client assumes the messaging_service is not stopped. We should check is_stopping() before we call get_rpc_client. We do such a check in existing code, e.g., send_message and friends. Do the same check in the newly introduced make_sink_and_source_for_stream_mutation_fragments() and friends for row level repair. Fixes: #4767	2019-07-31 11:44:57 +03:00
Avi Kivity	74349bdf7e	Merge "Partially devirtualize CQL restrictions" from Piotr " This series is a batch of first small steps towards devirtualizing CQL restrictions: - one artificial parent class in the hierarchy is removed: abstract_restriction - the following functions are devirtualized: * is_EQ() * is_IN() * is_slice() * is_contains() * is_LIKE() * is_on_token() * is_multi_column() Future steps can involve the following: - introducing a std::variant of restriction targets: it's either a column or a vector of columns - introducing a std::variant of restriction values: it's one of: {term, term_slice, std::vector<term>, abstract_marker} The steps above will allow devirtualizing most of the remaining virtual functions in favor of std::visit. They will also reduce the number of subclasses, e.g. what's currently `token_restriction::IN_with_values` can be just an instance of `restriction`, knowing that it's on a token, having a target of std::vector<column> and a value of std::vector<term>. Tests: unit(dev), dtest: cql_tests, cql_additional_tests " * 'refactor_restrictions_2' of https://github.com/psarna/scylla: cql3: devirtualize is_on_token() cql3: devirtualize is_multi_column() cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE tests: add enum_set adding case cql3: allow adding enum_sets cql3: remove abstract_restriction class	2019-07-31 11:44:57 +03:00
Vlad Zolotarov	9df53b8bca	configure.py: ignore 'thrift -version' exit code (At least) on Ubuntu 19 'thrift -version' prints the expected string but its exit status is non-zero: $ thrift -version Thrift version 0.9.1 $ echo $? 1 We don't really care about the exit status but rather about the printed version string. If there is going to be some problem with the command, e.g. it's missing, the printed string is not going to be as expected anyway - let's verify that explicitly by checking the format of the returned string in that case. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190722211729.24225-1-vladz@scylladb.com>	2019-07-31 11:44:57 +03:00
Botond Dénes	cea3338e38	tests/mutation_source_tests: generate_mutation_sets() use larger ttl Currently all cells generated by this method use a ttl of 1. This causes test flakiness as tests often compact the input and output mutations to weed out artificial differences between them. If this compaction is not done with the exact same query time, then some cells will be expired in one compaction but not in the other. `733c68cb1` attempted to solve this by passing the same query time to `flat_mutation_reader_assertions::produce_compacted()` as well as `mutation_partition::compact_for_query()` when compacting the input mutation. However a hidden compaction spot remained: the ka/la sstable writer also does some compaction, and for this it uses the time point passed to the `sstable` constructor, which defaults to `gc_clock::now()`. This leads to false positive failures in `sstable_mutation_test.cc`. At this point I don't know what the original intent was behind this low `ttl` value. To solve the immediate problem of the tests failing, I increased it. If it turns out that this `ttl` value has a good reason, we can do a more involved fix, of making sure all sstables written also get the same query time as that used for the compaction. Fixes: #4747 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190731081522.22915-1-bdenes@scylladb.com>	2019-07-31 11:44:57 +03:00
Piotr Sarna	2f65144a20	cql3: devirtualize is_on_token() Instead of being a virtual function, is_on_token leverages the existing enum inside the `restriction` class.	2019-07-29 17:18:50 +02:00
Piotr Sarna	68aa42c545	cql3: devirtualize is_multi_column() Instead of being a virtual function, is_multi_column leverages an enum.	2019-07-29 17:18:50 +02:00
Piotr Sarna	83fbfe5a4f	cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE Instead of virtual functions, operation for each restriction is determined by an enum value it stores.	2019-07-29 17:18:49 +02:00
Piotr Sarna	e9798354ae	tests: add enum_set adding case	2019-07-29 17:15:51 +02:00
Piotr Sarna	989c31f68b	cql3: allow adding enum_sets Enum set can now be added to another enum set in order to create a sum of both.	2019-07-29 17:15:51 +02:00
Piotr Sarna	5e06801f12	cql3: remove abstract_restriction class All restrictions inherit from `abstract_restriction` class, which has only one parent class: `restriction`. To simplify the inheritance tree, `restriction` and `abstract_restriction` are merged into one class named `restriction`.	2019-07-29 15:54:39 +02:00
Botond Dénes	733c68cb13	tests: flat_reader_assertions::produces_compacted(): add query_time param `produces_compacted()` is usually used in tandem with another compaction done on the expected output (`m` param). This is usually done so that even though the reader works with an uncompacted stream, checking the result will not fail due to insignificant changes to the data, e.g. expired collection cells dropped while merging two collections. Currently, the two compactions, the one inside `produces_compacted()` and the one done by the caller, use two separate calls to `gc_clock::now()` to obtain the query time. This can lead to off-by-one errors in the two query times and subsequently artificial differences between the two compacted mutations, ultimately failing the test due to a false positive. To prevent this, allow callers to pass in a query time, the same one they used to compact the input mutation (`m`). This solves another source of flakiness in unit tests using the mutation source test suite. Refs: #4695 Fixes: #4747 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190726144032.3411-1-bdenes@scylladb.com>	2019-07-28 10:59:50 +03:00
Botond Dénes	f215286525	tests/mutation_reader_tests: move away from variadic futures Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724101005.19126-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Botond Dénes	0f30bc0004	mutation_reader: move away from variadic futures Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724102246.20450-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Botond Dénes	6742c77229	scylla-gdb.py: fix scylla_ptr Broken since `b3adabda2`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190726140532.124406-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Avi Kivity	b272db368f	sstable: index_reader: close index_reader::reader more robustly If we had an error while reading, then we would have failed to close the reader, which in turn can cause memory corruption. Make the closing more robust by using then_wrapped (that doesn't skip on exception) and log the error for analysis. Fixes #4761.	2019-07-26 14:26:04 +02:00
Avi Kivity	fcf3195e54	Update seastar submodule * seastar c1be3c912f...3f88e9068b (3): > reactor: improve handling of connect storms > json: Make date formatter use RFC8601/RFC3339 format > reactor: fix deadlock of stall detector vs dlopen Fixes #4759.	2019-07-25 18:29:54 +03:00
Takuya ASADA	ac9b115a8f	dist/debian: use install.sh on Debian Currently, install.sh is only used for building .rpm packages. We have a similar build script under dist/debian, and it sometimes becomes inconsistent with install.sh. Since most of the package build process is the same, we should share install.sh between the .rpm and .deb package build processes. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190725123207.2326-1-syuu@scylladb.com>	2019-07-25 18:22:42 +03:00
Botond Dénes	6dd8c4da83	test_multishard_combining_reader_non_strictly_monotonic_positions: use the same deletion_time for tombstones Across all calls to `make_fragments_with_non_monotonic_positions()`, to prevent off-by-one errors between the separately generated test input and expected output. This problem was already supposed to be fixed by `5f22771ea8` but for some reason that only used the same deletion time inside a single call, which will still fall short in some cases. This should hopefully fix this problem for good. Refs: #4695 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724073240.125975-1-bdenes@scylladb.com>	2019-07-25 12:37:34 +02:00
Kamil Braun	148d4649d6	Add option to create a XUnit output file for non-boost tests in test.py. (#4757 ) If the user specifies an output file name using "--xunit=<filename>", test.py will write the test results of non-boost tests to the file in the XUnit XML format. Every boost test creates its own results file already. Resolves #4680. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-25 12:47:47 +03:00
Vlad Zolotarov	53cf90b075	ec2_snitch: properly build the AWS meta server address Explicitly pass the port number of the AWS metadata server API when creating a corresponding socket. This patch fixes the regression introduced by `4ef940169f`. Fixes #4719 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-07-25 10:50:01 +03:00
Tomasz Grabiec	3af8431a40	Merge "compaction: allow collecting purged data" from Botond compaction: allow collecting purged data Allow the compaction initiator to pass an additional consumer that will consume any data that is purged during the compaction process. This allows the separate retention of these dead cells and tombstones until some long-running process like compaction safely finishes. If the process fails or is interrupted the purged data can be used to prevent data resurrection. This patch was developed to serve as the basis for a solution to #4531 but it is not a complete solution in and of itself. This series is a continuation of the patch: "[PATCH v1 1/3] Introduce Garbage Collected Consumer to Mutation Compactor" by Raphael S. Carvalho <raphaelsc@scylladb.com>. Refs: #4531 * https://github.com/denesb/scylla.git compaction_collect_purged_data/v8: Introduce compaction_garbage_collector interface collection_type_impl::mutation: compact_and_expire() add collector parameter row: add garbage_collector row_marker: de-inline compact_and_expire() row_marker: add garbage_collector Introduce Garbage Collected Consumer to Mutation Compactor tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations() tests/random_schema: generate_random_mutations(): remove `engine` parameter tests/random_schema: add assert to make_clustering_key() tests/random_schema: generate_random_mutations(): allow customizing generated data tests: random_schema: futurize generate_random_mutations() random_schema: generate_random_mutations(): restore indentation data_model: extend ttl and expiry support tests/random_schema: generate_random_mutations(): generate partition tombstone random_schema: add ttl and expiry support tests/random: add get_bool() overload with random engine param random_schema: generate_random_mutations(): ensure partitions are unique tests: add unit tests for the data stream split in compaction	2019-07-23 17:12:28 +02:00
Avi Kivity	44b5878011	Merge "Fix possible stalls in row level repair" from Asias " After switching to rpc stream interface, we increased the row buffer size. Code that works on the buffer and does not yield can stall the reactor. This series fixes the issue by futurizing or running the code in thread and yield. Fixes: #4642 " * 'repair_switch_to_rpc_stream_fix_stall' of https://github.com/asias/scylla: repair: Enable rpc stream in row level repair repair: Wrap with foreign_ptr to avoid cross cpu free repair: Futurize get_repair_rows_size and row_buf_size repair: Avoid calling get_repair_rows_size in get_sync_boundary repair: Futurize row_buf_csum repair: Yield inside get_set_diff repair: Use get_repair_rows_size helper in get_sync_boundary repair: Avoid stall in do_estimate_partitions_on_local_shard remove get_row_diff repair: Futurize get_row_diff to avoid stall repair: Fix possible stall in request_row_hashes repair: Allow default construct for repair_row repair: Remove apply_rows repair: Run get_row_diff_with_rpc_stream in a thread repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread repair: Run get_row_diff inside a thread repair: Add apply_rows_on_master_in_thread repair: Add apply_rows_on_follower repair: Futurize working_row_hashes repair: Remove get_full_row_hashes helper	2019-07-22 15:54:06 +03:00
Avi Kivity	9e630eb734	Update seastar submodule * seastar 44a300cd50...c1be3c912f (9): > execution_stage: prevent unbounded growth > io queues: Add renaming functionality to io priority class > scheduling: Add rename functionality to scheduling groups > net: Add listen_backlog option for posix stack > future: deprecate variadic futures > include,tests: add workaround for missing guaranteed copy elision > core/dpdk_rte: handle 64+ cores > perftune: add a dry-run mode > build: support building dpdk on arm64 Fixes #4749.	2019-07-22 15:41:54 +03:00
Avi Kivity	e03c7003f1	toppartitions: fix race between listener removal and reads Data listener reads are implemented as flat_mutation_readers, which take a reference to the listener and then execute asynchronously. The listener can be removed between the time when the reference is taken and actual execution, resulting in a dangling pointer dereference. Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes don't need protection because they execute atomically. Fixes #4661. Tests: unit (dev)	2019-07-22 13:26:18 +02:00
Avi Kivity	d730969278	Merge "make sure failure to create snapshots won't crash the node" from Glauber " Issue #4558 describes a situation in which failure to execute clearsnapshots will hard crash the node. The problem is that clearsnapshots will internally use lister::rmdir, which in turn has two in-tree users: clearing snapshots and clearing temporary directories during sstable creation. The way it is currently coded, it wraps the io functions in io_check, which means that failures to remove the directory will crash the database. We recently saw how benign failures crashed a database during clearsnapshot: we had snapshot creation running in parallel, adding more files to the directory that wasn't empty by the time of deletion. I have also seen very often users add files to existing directories by accident, which is another possibility to trigger that. This patch removes the io_check from lister, and moves it to the caller in which we want to be more strict. We still want to be strict about the creation of temporary directories, since users shouldn't be touching that in any way. Also while working on that, I realized we have no tests for snapshots of any kind in tree, so let's write some " * 'snapshots' of https://github.com/glommer/scylla: tests: add tests for snapshots. lister: don't crash the node on failure to remove snapshot	2019-07-22 11:09:23 +03:00
Rafael Ávila de Espíndola	636e2470b1	Always close commitlog files We were using segment::_closed to decide whether _file was already closed. Unfortunately they are not exactly the same thing. As far as I understand it, segments can be closed and reused without actually closing the file. Found with a seastar patch that asserts on destroying an open append_challenged_posix_file_impl. Fixes #4745. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190721171332.7995-1-espindola@scylladb.com>	2019-07-22 10:08:57 +03:00
Vlad Zolotarov	5632c0776e	tests: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Fix this by explicitly using to_hex() converter. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190716221231.22605-3-vladz@scylladb.com>	2019-07-21 16:42:54 +03:00
Nadav Har'El	db8d4a0cc6	Add computed columns Merged patch series by Piotr Sarna: This series introduces the concept of "computed" column, which represents values not provided directly by the user, but computed on the fly - possibly using other column values. It will be used in the future to implement map value indexing, collection indexing, etc. Right now the only use is the token column for secondary indexes - which is a column computed from the base partition key value. After this series, another one that depends on it and adds map value indexing will be pushed. Tests: unit(dev) Piotr Sarna (14): schema: add computed info to column definition schema: add implementation of computing token column schema: allow marking columns as computed in schema builder service: add computed columns feature view: check for computed columns in view view: remove unused token_for function database: add fixing previous secondary index schemas tests: disable computed columns feature in schema change test tests: add schema change test regeneration comment db: add system_schema.computed_columns docs: init system_schema_keyspace.md with column computations tests: generate new test case for schema change + computed cols index: mark token column as 'computed' when creating mv tests: add checking computed columns in SI column_computation.hh \| 63 ++++++++ db/schema_features.hh \| 4 +- db/schema_tables.hh \| 4 + idl/frozen_schema.idl.hh \| 1 + schema.hh \| 40 +++++ schema_builder.hh \| 4 +- schema_mutations.hh \| 18 ++- service/storage_service.hh \| 8 + view_info.hh \| 2 - database.cc \| 6 +- db/schema_tables.cc \| 146 ++++++++++++++++-- db/view/view.cc \| 46 +++--- index/secondary_index_manager.cc \| 2 +- schema.cc \| 58 ++++++- schema_mutations.cc \| 14 +- service/storage_service.cc \| 5 + tests/schema_change_test.cc \| 63 ++++++-- tests/secondary_index_test.cc \| 28 ++++ docs/system_schema_keyspace.md \| 40 +++++ plus about 200 new test sstable files	2019-07-21 13:05:46 +03:00
Piotr Sarna	4d1eaf8478	tests: add checking computed columns in SI The test case checks if token column generated for global indexing is indeed only present in global indexes and is marked as a computed column.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a8f7d64a08	index: mark token column as 'computed' when creating mv Secondary indexes use a computed token column to preserve proper query ordering. This column is now marked as 'computed'.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1c0ef5f9e9	tests: generate new test case for schema change + computed cols The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after computed columns are allowed, in order to make sure that the digest computed including computed columns does not change spuriously as well.	2019-07-19 11:58:42 +02:00
Piotr Sarna	1e54752167	docs: init system_schema_keyspace.md with column computations The documentation file for system_schema keyspace is introduced, and its first entry describes the column_computation table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	c1d5aef735	db: add system_schema.computed_columns Information on which columns of a table are 'computed' is now kept in system_schema.computed_columns system table.	2019-07-19 11:58:42 +02:00
Piotr Sarna	589200f5a2	tests: add schema change test regeneration comment Schema change test might need regenerating every time a system table is added. In order to save future developer's time on debugging this test, a short description of that requirement is added.	2019-07-19 11:58:42 +02:00
Piotr Sarna	03ade01db7	tests: disable computed columns feature in schema change test In order to make sure that old schema digest is not recomputed and can be verified - computed columns feature is initially disabled in schema_change_test. The reason for that is as follows: running CQL test env assumes that we are running the newest cluster with all features enabled. However, the mere existence of some features might influence digest calculation. So, in order for the existing test to work correctly, it should have exactly the same set of cluster supported features as it had during its creation. It used to be "all features", but now it's "all features except computed columns". One can think of that as running a cluster with some nodes not yet knowing what computed columns are, so they are not taken into account when computing digests. Additionally, a separate test case that takes computed column digest into account will be generated and added in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	17c323c096	database: add fixing previous secondary index schemas If a schema was created before computed columns were implemented, its token column may not have been marked as computed. To remedy this, if no computed column is found, the schema will be recreated. The code will work correctly even without this patch in order to support upgrading from legacy versions, but it's still important: it transforms token columns from the legacy format to new computed format, which will eventually (after a few release cycles) allow dropping the support for legacy format altogether.	2019-07-19 11:58:42 +02:00
Piotr Sarna	3c5dd94306	view: remove unused token_for function The function was only used once in code removed in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	6a6871aa0e	view: check for computed columns in view Currently, having a 'computed' column in view update generation indicates that token value needs to be generated and assigned to it.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a0e02df36a	service: add computed columns feature Computed columns feature should be checked before creating index schemas the new way - by adding computed column names to system_schema.computed_columns.	2019-07-19 11:58:42 +02:00
Piotr Sarna	a1100e3737	schema: allow marking columns as computed in schema builder In order to be able to transform legacy materialized view definitions, builder is now able to mark an existing column as computed.	2019-07-19 11:58:41 +02:00
Piotr Sarna	65bf6d34fe	schema: add implementation of computing token column Computed column of 'token' type can now have its value computed.	2019-07-19 11:47:48 +02:00
Piotr Sarna	491b7a817f	schema: add computed info to column definition Some columns may represent not user-provided values, but ones computed from other columns. Currently an example is token column used in secondary indexes to provide proper ordering. In order to avoid hardcoding special cases in execution stage, optional additional information for computed columns is stored in column definition.	2019-07-19 11:47:46 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of a partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Asias He	64a4c0ede2	streaming: Do not open rpc stream connection if ranges are not relevant to a shard Given a list of ranges to stream, stream_transfer_task will create a reader with the ranges and create an rpc stream connection on all the shards. When the user provides ranges to repair with the -st -et options, e.g., using scylla-manager, such ranges can belong to only one shard, and repair will pass such ranges to streaming. As a result, only one shard will have data to send while the rpc stream connections are created on all the shards, which can cause the kernel to run out of ports on some systems. To mitigate the problem, do not open the connection if the ranges do not belong to the shard at all. Refs: #4708	2019-07-18 18:31:21 +03:00
Avi Kivity	51cff8ad23	Merge "Fix storage service for tests" from Botond " Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard. Refs: #4695 " * 'fix_storage_service_for_tests' of https://github.com/denesb/scylla: tests: storage_service_for_tests: broadcast config to all shards tests: move storage_service_for_tests impl to test_services.cc	2019-07-18 18:27:47 +03:00
Nadav Har'El	997b92a666	migration_manager: allow dropping table and all its views The function announce_column_family_drop() drops (deletes) a base table and all the materialized-views used for its secondary indexes, but not other materialized views - if there are any, the operation refuses to continue. This is exactly what CQL's "DROP TABLE" needs, because it is not allowed to drop a table before manually dropping its views. But there is no inherent reason why we can't support an operation to delete a table and all its views - not just those related to indexes. This patch adds such an option to announce_column_family_drop(). This option is not used by the existing CQL layer, but can be used by other code automating operations programmatically without CQL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716150559.11806-1-nyh@scylladb.com>	2019-07-18 13:26:25 +02:00
Takuya ASADA	bd7d1b2d38	dist/common/systemd: change stop timeout sec to 900s Currently scylla-server.service uses DefaultTimeoutStopSec = 90; if Scylla is not able to shut down cleanly within 90 sec we may have data corruption on the node. Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both TimeoutStartSec and TimeoutStopSec to 900. See #4700 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190717095416.10652-1-syuu@scylladb.com>	2019-07-17 15:37:47 +03:00
Nadav Har'El	759752947b	drop_index_statement: fix column_family() All statement objects which derive from cf_statement, including drop_index_statement, have a column_family() returning the name of the column family involved in this statement. For most statements this is known at the time of construction, because it is part of the statement, but for "DROP INDEX", the user doesn't specify the table's name - just the index name. So we need to override column_family() to find the table name. The existing implementation assert()ed that we can always find such a table, but this is not true - for example, in a DROP INDEX with "IF EXISTS", it is perfectly fine for no such table to exist. In this case we don't want a crash, and not even an exception - it's fine that we just return an empty table name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716180104.15985-1-nyh@scylladb.com>	2019-07-17 09:44:47 +03:00
Glauber Costa	be26cbd952	tests: add tests for snapshots. While inspecting the snapshot code, I realized that we don't have any tests for it. So I decided to add some. Unfortunately I couldn't come up with a test of clearsnapshot reliably failing to remove the directory: relying on create snapshot + clearsnapshot is racy (it does not always happen), and other tricks that can be used to reproduce this -- like creating a root-owned file inside the snapshots directory -- are environment-dependent, and a bit ugly for unit tests. Dtests would probably be a better place for that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-07-16 13:35:53 -04:00
Glauber Costa	2008d982c3	lister: don't crash the node on failure to remove snapshot lister::rmdir has two in-tree users: clearing snapshots and clearing temporary directories during sstable creation. The way it is currently coded, it wraps the io functions in io_check, which means that failures to remove the directory will crash the database. We recently saw how benign failures crashed a database during clearsnapshot: we had snapshot creation running in parallel, adding more files to the directory that wasn't empty by the time of deletion. I have also seen very often users add files to existing directories by accident, which is another possibility to trigger that. This patch removes the io_check from lister, and moves it to the caller in which we want to be more strict. We still want to be strict about the creation of temporary directories, since users shouldn't be touching that in any way. Fixes #4558 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-07-16 13:35:36 -04:00
Kamil Braun	4417e78125	Fix timestamp_type_impl::timestamp_from_string. Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00. Fixes #4641. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-16 19:16:56 +03:00
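The behavior described in commit 4417e78125 above, accepting a trailing 'z' or 'Z' as UTC+00:00, can be illustrated with a minimal Python sketch. This is not the Scylla implementation (which is C++ inside timestamp_type_impl); the helper name `parse_timestamp_with_z` is hypothetical and only demonstrates the idea of normalizing the suffix before parsing.

```python
from datetime import datetime


def parse_timestamp_with_z(text):
    """Parse an ISO-8601 style timestamp, treating a trailing 'z'/'Z' as UTC.

    Hypothetical helper for illustration only; it is not Scylla code.
    """
    # Rewrite the 'Z' shorthand into an explicit zero offset, which
    # datetime.fromisoformat() understands on Python 3.7 and later.
    if text and text[-1] in "zZ":
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)
```

Both `parse_timestamp_with_z("2019-07-16T19:16:56Z")` and the lower-case `z` variant yield the same timezone-aware value as the explicit `+00:00` form.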
Asias He	722ab3bb65	repair: Log repair id in check_failed_ranges Add the word `id` before the repair id in the log. It makes the log easier to figure out what the number stands for.	2019-07-16 19:10:19 +03:00
Avi Kivity	43690ecbdf	Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny " disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize with background deletions done by on_compaction_completion to ensure no sstables will be created or deleted during reshuffle_sstables after storage_service::load_new_sstables disables sstable writes. Fixes #4622 Test: unit(dev), nodetool_additional_test.py migration_test.py " * 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla: table: document _sstables_lock/_sstable_deletion_sem locking order table: disable_sstable_write: acquire _sstable_deletion_sem table: uninline enable_sstable_write table: reshuffle_sstables: add log message	2019-07-16 19:06:58 +03:00
Amnon Heiman	399d79fc6f	init: do not allow replace-address for seeds If a node is a seed node, it can not be started with replace-address-first-boot or the replace-address flag. The issue is that as a seed node it will generate new tokens instead of replacing the existing one the user expects it to replace when supplying the flags. This patch will throw a bad_configuration_error exception in this case. Fixes #3889 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-16 18:53:19 +03:00
Calle Wilund	dbc3499fd1	server: Fix cql notification inet address serialization Fixes #4717 Bug in ipv6 support series caused inet_address serialization to include an additional "size" parameter in the address chunk. Message-Id: <20190716134254.20708-1-calle@scylladb.com>	2019-07-16 16:51:59 +03:00
Botond Dénes	b40cf1c43d	tests: storage_service_for_tests: broadcast config to all shards Due to recent changes to the config subsystem, configuration has to be broadcast to all shards if one wishes to use it on them. The `storage_service_for_tests` has a `sharded<gms::gossiper>` member, which reads config values on initialization on each shard, causing a crash as the configuration was initialized only on shard 0. Add a call to `config::broadcast_to_all_shards()` to ensure all shards have access to valid config values.	2019-07-16 10:37:17 +03:00
Botond Dénes	fc9f46d7c1	tests: move storage_service_for_tests impl to test_services.cc Let's make it easier to find.	2019-07-16 10:36:49 +03:00
Raphael S. Carvalho	7180731d43	tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:50 -03:00
Raphael S. Carvalho	332c2ff710	table: Make SSTable cleanup run aware The cleanup procedure will move any sstable out of its sstable run because sstables are cleaned up individually and they end up receiving a new run identifier, meaning a table may potentially end up with a new sstable run for each of the sstables cleaned. SStable cleanup needs to be run aware, so that the run structure is not messed up after the operation is done. Given that only one fragment or other, composing a sstable run, may need cleanup, it's better to keep them in their original sstable run. Fixes #4663. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:47 -03:00
Raphael S. Carvalho	8c97e0e43e	compaction: introduce constants for compaction descriptor Make it easier for users, and also avoid duplicating knowledge about descriptor defaults across the codebase. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:44 -03:00
Raphael S. Carvalho	a1db29e705	compaction: Make it possible to config the identifier of the output sstable run Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:38 -03:00
Raphael S. Carvalho	0e732ed1cf	table: do not rely on undefined behavior in cleanup_sstables It shouldn't rely on argument evaluation order, which is ub. Fixes #4718. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:22 -03:00
Paweł Dziepak	060e3f8ac2	mutation_partition: verify row::append_cell() precondition row::append_cell() has a precondition that the new cell column id needs to be larger than that of any other already existing cell. If this precondition is violated the row will end up in an invalid state. This patch adds assertion to make sure we fail early in such cases.	2019-07-15 23:25:06 +02:00
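The precondition check described in commit 060e3f8ac2 above can be sketched as follows. This is a simplified Python model, not the C++ `row` class from Scylla: the `Row` class and its list-of-tuples storage are assumptions made for illustration. It shows only the idea of failing early when an appended cell's column id is not larger than that of every existing cell.

```python
class Row:
    """Toy model of a row whose cells are kept in ascending column-id order.

    Hypothetical illustration; the real row::append_cell() is C++ code.
    """

    def __init__(self):
        self._cells = []  # list of (column_id, value), ascending by column_id

    def append_cell(self, column_id, value):
        # Precondition: the new id must be larger than the last (and
        # therefore largest) id already stored. Fail early instead of
        # leaving the row in an invalid, unsorted state.
        if self._cells and column_id <= self._cells[-1][0]:
            raise AssertionError(
                f"append_cell: column id {column_id} is not larger than "
                f"existing id {self._cells[-1][0]}"
            )
        self._cells.append((column_id, value))

    def column_ids(self):
        return [column_id for column_id, _ in self._cells]
```

Appending ids 1 and then 3 succeeds, while a subsequent attempt to append id 2 raises immediately and leaves the stored cells unchanged.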
Botond Dénes	5f22771ea8	tests/mutation_reader_test stabilize test_multishard_combining_reader_non_strictly_monotonic_positions Currently the test_multishard_combining_reader_non_strictly_monotonic_positions is flaky. The test is somewhat unconventional, in that it doesn't use the same instance of data as the input to the test and as its expected output, instead it invokes the method which generates this data (`make_fragments_with_non_monotonic_positions()`) twice, first to generate the input, and secondly to generate the expected output. This means that the test is prone to any deviation in the data generated by said method. One such deviation, discovered recently, is that the method doesn't explicitly specify the deletion time of the generated range tombstones. This results in this deletion time sometimes differing between the test input and the expected output. Solve by explicitly passing the same deletion time to all created range tombstones. Refs: #4695	2019-07-15 23:24:16 +02:00
Tomasz Grabiec	14700c2ac4	Merge "Fix the system.size_estimates table" from Kamil Fixes a segfault when querying for an empty keyspace. Also, fixes an infinite loop on smp > 1. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. Fixes #4689.	2019-07-15 22:09:30 +02:00
Asias He	8774adb9d0	repair: Avoid deadlock in remove_repair_meta Start n1, n2 Create ks with rf = 2 Run repair on n2 Stop n2 in the middle of repair n1 will notice n2 is DOWN, gossip handler will remove repair instance with n2 which calls remove_repair_meta(). Inside remove_repair_meta(), we have: ``` 1 return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) { 2 return rm->stop(); 3 }).then([repair_metas, from] { 4 rlogger.debug("Removed all repair_meta for single node {}", from); 5 }); ``` Since 3.1, we start 16 repair instances in parallel which will create 16 readers. The reader semaphore is 10. At line 2, it calls ``` 6 future<> stop() { 7 auto gate_future = _gate.close(); 8 auto writer_future = _repair_writer.wait_for_writer_done(); 9 return when_all_succeed(std::move(gate_future), std::move(writer_future)); 10 } ``` The gate protects the reader to read data from disk: ``` 11 with_gate(_gate, [] { 12 read_rows_from_disk 13 return _repair_reader.read_mutation_fragment() --> calls reader() to read data 14 }) ``` So line 7 won't return until all the 16 readers return from the call of reader(). The problem is, the reader won't release the reader semaphore until the reader is destroyed! So, even if 10 out of the 16 readers have finished reading, they won't release the semaphore. As a result, the stop() hangs forever. To fix in the short term, we can delete the reader, aka, drop the repair_meta object once it is stopped. Refs: #4693	2019-07-15 21:51:57 +02:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Botond Dénes	135c84c29a	tests: add unit tests for the data stream split in compaction	2019-07-15 17:38:00 +03:00
Botond Dénes	719ad51bea	random_schema: generate_random_mutations(): ensure partitions are unique Duplicate partitions can appear as a result of the same partition key being generated more than once. For now we simply remove any duplicates. This means that in some circumstances there will be fewer partitions generated than asked for.	2019-07-15 17:38:00 +03:00
Botond Dénes	eaedbed069	tests/random: add get_bool() overload with random engine param	2019-07-15 17:38:00 +03:00
Botond Dénes	057f9aa655	random_schema: add ttl and expiry support When generating data, the user can now also generate ttls and expiry for all generated atoms. This happens in a controlled way, via a generator functor, very similar to how the timestamps are generated. This functor is also used by `random_schema` to generate `deletion_time` for all tombstones, so the user now has full control of when all of the atoms can be GC'd.	2019-07-15 17:38:00 +03:00
Botond Dénes	76a853e345	tests/random_schema: generate_random_mutations(): generate partition tombstone	2019-07-15 17:38:00 +03:00
Botond Dénes	4d9f3e5705	data_model: extend ttl and expiry support	2019-07-15 17:38:00 +03:00
Botond Dénes	96d3c1efb1	random_schema: generate_random_mutations(): restore indentation	2019-07-15 17:38:00 +03:00
Botond Dénes	b26fe76fc1	tests: random_schema: futurize generate_random_mutations() To avoid reactor stalls when generating many and/or large partitions.	2019-07-15 17:38:00 +03:00
Botond Dénes	cf135c6257	tests/random_schema: generate_random_mutations(): allow customizing generated data Allow callers to specify the number of partitions generated, as well as the number of clustering rows and range tombstones generated per partition.	2019-07-15 17:38:00 +03:00
Botond Dénes	d2930ffa53	tests/random_schema: add assert to make_clustering_key() Verify that the schema does indeed have clustering columns. Better an assert than a cryptic "division by 0" exception deeper in the call stack.	2019-07-15 17:38:00 +03:00
Botond Dénes	d90ac6bd7b	tests/random_schema: generate_random_mutations(): remove `engine` parameter Use an internally created instance of random engine. Passing a readily seeded engine from the outside is pointless now that we have a mechanism to seed entire test suites with a command line algorithm: the internal engine is seeded from tests::random, so the seed of the test suite determines the internal seed as well. Update the sole user of this method (mutation_writer_test.cc) to not generate local seeds anymore.	2019-07-15 17:38:00 +03:00
Botond Dénes	fd2f53f292	tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations() We plan on allowing other tests to use this method. The first step is to make it available in a header.	2019-07-15 17:38:00 +03:00
Botond Dénes	7a4a609e88	Introduce Garbage Collected Consumer to Mutation Compactor Introduce consumer in mutation compactor that will only consume data that is purged away from regular consumer. The goal is to allow compaction implementation to do whatever it wants with the garbage collected data, like saving it for preventing data resurrection from ever happening, like described in issue #4531. noop_compacted_fragments_consumer is made available for users that don't need this capability. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 17:38:00 +03:00
Botond Dénes	4c2781edaa	row_marker: add garbage_collector The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed the row_marker when it expired and would be discarded. The collector param is optional and defaults to nullptr.	2019-07-15 17:38:00 +03:00
Botond Dénes	7db2006162	row_marker: de-inline compact_and_expire()	2019-07-15 17:38:00 +03:00
Botond Dénes	4c7a7ffe8f	row: add garbage_collector The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed all atoms that are expired and would be discarded. The body of `compact_and_expire()` was changed so that it checks cells' tombstone coverage before it checks their expiry, so that cells that are both covered by a tombstone and also expired are not passed to the collector. The collector is forwarded to `collection_type_impl::mutation::compact_and_expire()` as well. The collector param is optional and defaults to nullptr.	2019-07-15 17:38:00 +03:00
Botond Dénes	307b48794d	collection_type_impl::mutation: compact_and_expire() add collector parameter The new collector parameter is a pointer to a `compaction_garbage_collector` implementation. This collector is passed all atoms that are expired and would be discarded. The body of `compact_and_expire()` was changed so that it checks cells' tombstone coverage before it checks their expiry, so that cells that are both covered by a tombstone and also expired are not passed to the collector. The collector param is optional and defaults to nullptr. To accommodate the collector, which needs to know the column id, a new `column_id` parameter was added as well.	2019-07-15 17:37:55 +03:00
Calle Wilund	1ed9a44396	utils::config_file: Propagate broadcast_to_all_shards to dependent files Fixes #4713 Modifying config files to use sharded storage misses the fact that extensions are allowed to add non-member config fields to the main configuration, typically from "extra" config_file objects. Unless those "extra" files are broadcast when the main file is broadcast, the values will not be readable from other shards. This patch propagates the broadcast to all other config files whose entries are in the top level object. This ensures we always keep data up to date on config reload. Message-Id: <20190715135851.19948-1-calle@scylladb.com>	2019-07-15 17:02:09 +03:00
Nadav Har'El	9cc9facbea	configure.py: atomically overwrite build.ninja configure.py currently takes some time to write build.ninja. If the user interrupts (e.g., control-C) configure.py, it can leave behind a partial or even empty build.ninja file. This is most frustrating when the user didn't explicitly run "configure.py", but rather just ran "ninja" and ninja decided to run configure.py, and after interrupting it the user cannot run "ninja" again because build.ninja is gone. Another result of losing build.ninja is that the user now needs to remember which parameters to run "configure.py" with, because the old ones stored in build.ninja were lost. The solution in this patch is simple: We write the new build.ninja contents into a temporary file, not directly into build.ninja. Then, only when the entire file has been successfully written, do we rename the temporary file to its intended name - build.ninja. Fixes #4706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190715122129.16033-1-nyh@scylladb.com>	2019-07-15 15:34:48 +03:00
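The write-to-temporary-then-rename approach described in commit 9cc9facbea above can be sketched in Python as follows. This is a minimal illustration, not the actual change made to configure.py: the function name `write_atomically` and the temporary file prefix are assumptions. It relies on `os.replace()`, which atomically replaces the destination when both paths are on the same filesystem (on POSIX systems).

```python
import os
import tempfile


def write_atomically(path, contents):
    """Replace `path` with `contents` without ever exposing a partial file.

    Hypothetical helper for illustration; not the code from configure.py.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that the final
    # rename does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        # Only after the whole file is written does it take the final name.
        os.replace(tmp_path, path)
    except BaseException:
        # Also covers KeyboardInterrupt (control-C): drop the partial
        # temporary file and leave any existing target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the write is interrupted, the previous file at `path` is still intact, so a later build can keep using it.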
Botond Dénes	5002ebb73f	Introduce compaction_garbage_collector interface This interface can be used to implement a garbage collector that collects atoms that are purged due to expiry during compaction. The intended usage is collecting purged atoms for safekeeping until the compaction process finishes safely, to be dropped only at the end when the compaction is known to have finished successfully.	2019-07-15 15:30:43 +03:00
Eliran Sinvani	997a146c7f	auth: Prevent race between role_manager and password_authenticator When scylla is started for the first time with PasswordAuthenticator enabled, it can be that a record of the default superuser will be created in the table with the can_login and is_superuser set to null. It happens because the module in charge of creating the row is the role manager and the module in charge of setting the default password salted hash value is the password authenticator. Those two modules are started together. In the case when the password authenticator finishes its initialization first, in the period until the role manager completes its initialization, the row contains those null columns and any login attempt in this period will cause a memory access violation since those columns are not expected to ever be null. This patch removes the race by starting the password authenticator and authorizer only after the role manager has finished its initialization. Tests: 1. Unit tests (release) 2. Auth and cqlsh auth related dtests. Fixes #4226 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>	2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola	67c624d967	Add documentation for large_rows and large_cells Fixes #4552 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190614151907.20292-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Amnon Heiman	1c6dec139f	API: compaction_manager add get pending tasks by table The pending tasks by table name API returns an array of pending tasks by keyspace/table names. After this patch the following command would work: curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table' Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-12 19:21:26 +03:00
Takuya ASADA	842f75d066	reloc: provide libthread_db.so.1 to debug thread on gdb In scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug but we actually do not have libthread_db.so.1 in /opt/scylladb/libreloc since it's not available on ldd result with scylla binary. To debug thread, we need to add the library in a relocatable package manually. Fixes #4673 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190711111058.7454-1-syuu@scylladb.com>	2019-07-12 19:21:26 +03:00
Piotr Sarna	ac7531d8d9	db,hints: decouple in-flight hints limits from resource manager The resource manager is used to manage common resources between various hints managers. In-flight hints used to be one of the shared resources, but it proves to cause starvation, when one manager eats the whole limit - which may be especially painful if the background materialized views hints manager starves the regular hints manager, which can in turn start failing user writes because of admission control. This patch makes the limit per-manager again, which effectively reverts the limit to its original behavior. Fixes #4483 Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>	2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola	4e7ffb80c0	cql: Fix use of UDT in reversed columns We were missing calls to underlying_type in a few locations and so the insert would think the given literal was invalid and the select would refuse to fetch a UDT field. Fixes #4672 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190708200516.59841-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Kamil Braun	60a4867a5b	Fix infinite looping when performing a range query on system.size_estimates. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. This commit fixes the issue and closes #4689.	2019-07-12 18:09:15 +02:00
Kamil Braun	ba5a02169e	Fix segmentation fault when querying system.size_estimates for an empty keyspace.	2019-07-12 18:02:10 +02:00
Kamil Braun	a1665b74a9	Refactor size_estimates_virtual_reader Move the implementation of size_estimates_mutation_reader to a separate compilation unit to speed up compilation times and increase readability. Refactor tests to use seastar::thread.	2019-07-12 17:53:00 +02:00
Benny Halevy	6dad9baa1c	table: disable_sstable_write: acquire _sstable_deletion_sem `disable_sstable_write` needs to acquire `_sstable_deletion_sem` to properly synchronize with background deletions done by `on_compaction_completion` to ensure no sstables will be created or deleted during `reshuffle_sstables` after `storage_service::load_new_sstables` disables sstable writes. Fixes #4622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	c6bad3f3c2	table: reshuffle_sstables: add log message To mark the point in time writes are disabled and scanning of the data directory is beginning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Asias He	aa8d7af4f0	repair: Enable rpc stream in row level repair Add the row_level_diff_detect_algorithm::send_full_set_rpc_stream as supported algo. If both repair master and followers support it, the master will use the rpc stream interface, otherwise use the old rpc verb interface.	2019-07-11 08:59:48 +08:00
Asias He	38b72b398b	repair: Wrap with foreign_ptr to avoid cross cpu free The moved set_diff and rows will be freed on the target cpu instead of the source cpu, which will cause a lot of cross-cpu frees. To fix, wrap them in foreign_ptr.	2019-07-11 08:59:48 +08:00
Asias He	06c84be257	repair: Futurize get_repair_rows_size and row_buf_size To prevent stall when number of rows inside row buf is large.	2019-07-11 08:36:39 +08:00
Asias He	809c992b30	repair: Avoid calling get_repair_rows_size in get_sync_boundary Instead of calling get_repair_rows_size() which might stall with large number of rows, return the size of the rows from read_rows_from_disk.	2019-07-11 08:36:39 +08:00
Asias He	4d41f8e57e	repair: Futurize row_buf_csum To prevent stall when number of rows inside row buf is large.	2019-07-11 08:36:39 +08:00
Asias He	0ef167c9c8	repair: Yield inside get_set_diff get_set_diff always runs inside a thread, so we can thread::maybe_yield() to avoid stall.	2019-07-11 08:36:39 +08:00
Asias He	f871d9edd4	repair: Use get_repair_rows_size helper in get_sync_boundary We have a helper get_repair_rows_size to get the row size in the list.	2019-07-11 08:36:39 +08:00
Asias He	ccbc9fb0ca	repair: Avoid stall in do_estimate_partitions_on_local_shard Do not use boost::accumulate which does not yield. Use do_for_each for each sstable to avoid stall.	2019-07-11 08:36:39 +08:00
Asias He	b7b5cb33e8	remove get_row_diff	2019-07-11 08:36:39 +08:00
Rafael Ávila de Espíndola	281f3a69f8	mc writer: Fix exception safety when closing _index_writer This fixes a possible cause of #4614. From the backtrace in that issue, it looks like a file is being closed twice. The first point in the backtrace where that seems likely is in the MC writer. My first idea was to add a writer::close and make it the responsibility of the code using the writer to call it. That way we would move work out of the destructor. That is a bit hard since the writer is destroyed from flat_mutation_reader::impl::~consumer_adapter and that would need to get a close function too. This patch instead just fixes an exception safety issue. If _index_writer->close() throws, _index_writer is still valid and ~writer will try to close it again. If the exception was thrown after _completed.set_value(), that would explain the assert about _completed.set_value() being called twice. With this patch the path outside of the destructor now moves the writer to a local variable before trying to close it. Fixes #4614 Message-Id: <20190710171747.27337-1-espindola@scylladb.com>	2019-07-10 19:27:19 +02:00
Paweł Dziepak	eb7d17e5c5	lsa: make sure align_up_for_asan() doesn't cause reads past end of segment In debug mode the LSA needs objects to be 8-byte aligned in order to maximise coverage from the AddressSanitizer. Usually `close_active()` creates a dummy object that covers the end of the segment being closed. However, if the last real object ends in the last eight bytes of the segment then that dummy won't be created because of the alignment requirements. This broke exit conditions on loops trying to read all objects in the segment and caused them to attempt to dereference the address at the end of the segment. This patch fixes that. Fixes #4653.	2019-07-10 19:19:24 +02:00
Avi Kivity	e32bdb6b90	Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil " If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481 and #4651. " * 'multidc' of https://github.com/kbr-/scylla: Warn user about using SimpleStrategy with Multi DC deployment Add warning support to the CQL binary protocol implementation	2019-07-10 16:47:07 +03:00
Avi Kivity	138b28ae43	Merge "Fix command line parsing and add logging." from Kamil " Fixes #4203 and #4141. " * 'cmdline' of https://github.com/kbr-/scylla: Add logging of parsed command line options Fix command line argument parsing in main.	2019-07-10 16:40:57 +03:00
Avi Kivity	405fd517b0	Merge "IPv6 support" from Calle " Fixes #2027 Modifies inet address type in scylla to use seastar::net::inet_address, and removes explicit use of ipv4_addr in various network code in favour of socket_address. Thus capable of resolving and binding to ipv6. Adds config option to enable/disable ipv6 (default enabled), so upgrading cluster can continue to work while running mixed version nodes (since gossip message address serialization becomes different). " * 'calle/ipv6' of https://github.com/elcallio/scylla: test-serialization: Add small roundtrip test for inet address (v4 + v6) inet_address/init: Make ipv6 default enabled db::config: Add enable ipv6 switch (default off) gms::inet_address: Make serialization ipv6 aware Remove usage of inet_address::raw_addr() Replace use of "ipv4_addr" with socket_address inet_address: Add optional family to lookup gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address types: Add ipv6_address support	2019-07-10 15:07:56 +03:00
Benny Halevy	b4dc118639	tests: logalloc_test: scale down test_region_groups Post commit `b3adabda2d` (Reduce logalloc differences between debug and release) logalloc_test's memory footprint has grown, in particular in test_region_groups, and it triggers the oom killer on our test automation machines. This patch scales down this test case so it requires less memory. Fixes #4669 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-10 12:06:10 +02:00
Pekka Enberg	bb53c109b4	test.py: Add option for repeating test execution This adds a '--repeat N' command line option to test.py, which can be used to execute the tests N times. This is useful for finding flakey tests, for example. Message-Id: <20190710092115.15960-1-penberg@scylladb.com>	2019-07-10 12:42:39 +03:00
Botond Dénes	ce647fac9f	timestamp_based_splitting_writer: fix the handling of partition tombstone Currently the handling of partition tombstones is broken in multiple ways: * The partition-tombstone is lost when the bucket is calculated for its timestamp (due to a misplaced `std::exchange()`). * When the `partition_start` fragment (containing the partition tombstone) is actually written to the bucket we emit another `partition_start` fragment before it because the bucket has not seen that partition before and we fail to notice that we are actually writing the partition header. This bug was allowed to fly under the radar because the unit test was accidentally not creating partition tombstones in the generated data (due to a mistake). It was discovered while working on unit tests for another test and fixing the data generation function to actually generate partition tombstones. This patch fixes both problems in the handling of partition tombstones but it doesn't yet fix the test. That is deferred until the patch series which uncovered this bug is merged to avoid merge conflicts. The other series mentioned here is: [PATCH v6 00/15] compaction: allow collecting purged data Fixes: #4683 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>	2019-07-10 12:36:57 +03:00
Pekka Enberg	e6cc90aa98	test: add 'eventually' block to index paging test (#4681 ) Without 'eventually', the test is flaky because the index can still be not up to date while checking its conditions. Fixes #4670 Tests: unit(dev)	2019-07-10 11:46:03 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Amnon Heiman	2fbc5ea852	config_file.hh: get_value return a pointer to the value The get_value method returns a pointer to the value that is used by the value_to_json method. The assumption is that the void pointer points to the actual value. Fixes #4678 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-10 10:40:35 +03:00
Piotr Sarna	ebbe038d19	test: add 'eventually' block to index paging test Without 'eventually', the test is flaky because the index can still be not up to date while checking its conditions. Fixes #4670	2019-07-09 17:07:16 +02:00
Asias He	39ca044dab	repair: Allow repair when a replica is down Since commit `bb56653` (repair: Sync schema from follower nodes before repair), the behaviour of handling a down node during repair has changed. That is, if a repair follower is down, the master will fail to sync schema with it and the repair of the range will be skipped. This means a range cannot be repaired unless all the nodes for the replicas are up. To fix, we filter out the nodes that are down, mark the repair as partial, and repair with the nodes that are still up. Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test Fixes: #4616 Backports: 3.1 Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>	2019-07-09 10:07:36 +03:00
Konstantin Osipov	56f3bda4c7	metrics: introduce a metric for non-local reads A read which arrived to a non-replica and had to be forwarded to a replica by the coordinator is accounted in an own metric, reads_coordinator_outside_replica_set. Most often such read is produced by a driver which is unaware of token distribution on the ring. If a read was forwarded to another replica due to heat weighted load balancing or query preference set by the user, it's not accounted in the metric. In case of a multi-partition read (a query using IN statement, e.g. x in (1, 2, 3)), if any of the keys is read from a non-local node the read is accounted as a non-local. The rationale behind it is that if the user tries to be careful and send IN queries only to the same vnode, they are rewarded with the counter staying at zero, while if they send multi-partition IN queries without any precautions, they will see the metric go up which gives them a starting point for investigating performance problems. Closes #4338	2019-07-08 19:23:38 +03:00
Calle Wilund	5dfc356380	test-serialization: Add small roundtrip test for inet address (v4 + v6) Verify we get back what we put in.	2019-07-08 15:28:21 +00:00
Konstantin Osipov	da1d1b74da	metrics: account writes forwarded by a coordinator in an own metric. Add a metric to account writes which arrived to a non-replica and had to be forwarded by a coordinator to a replica. The name of the added metric is 'writes_coordinator_outside_replica_set'. Do not account forwarded read repair writes, since they are already accounted by a reads_coordinator_outside_replica_set metric, added in a subsequent patch. In scope of #4338.	2019-07-08 18:17:48 +03:00
Calle Wilund	3cfb79e0ff	inet_address/init: Make ipv6 default enabled Makes lookup find any (incl ipv6 numeric) address. Init will look at enable_ipv6 and use explicit ipv4 family lookup if not enabled.	2019-07-08 14:13:10 +00:00
Calle Wilund	1f5e1d22bf	db::config: Add enable ipv6 switch (default off) Off by default to prevent problems during cluster migration when needing to gossip with non-ipv6 aware nodes.	2019-07-08 14:13:09 +00:00
Calle Wilund	c540e36fe2	gms::inet_address: Make serialization ipv6 aware Because inet_address was initially hardcoded to ipv4, its wire format is not very forward compatible. Since we potentially need to communicate with older version nodes, we manually define the new serial format for inet_address to be: ipv4: 4 bytes address ipv6: 4 bytes marker 0xffffffff (invalid address) 16 bytes data -> address	2019-07-08 14:13:09 +00:00
Calle Wilund	e9816efe06	Remove usage of inet_address::raw_addr()	2019-07-08 14:13:09 +00:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
Calle Wilund	5ba545f493	inet_address: Add optional family to lookup	2019-07-08 14:13:09 +00:00
Calle Wilund	5fd811ec8a	gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address Thusly handle all types net::inet_address can handle. I.e. ipv6.	2019-07-08 14:13:09 +00:00
Calle Wilund	482fd72ca2	types: Add ipv6_address support As ipv4, just redirect to inet_address.	2019-07-08 14:09:25 +00:00
Asias He	b7abaa04da	repair: Futurize get_row_diff to avoid stall The copy of _working_row_buf and boost::copy_range can stall if the number of rows is big. Futurize get_row_diff to avoid stall.	2019-07-08 15:22:16 +08:00
Asias He	a4b24e44a3	repair: Fix possible stall in request_row_hashes The std::find_if and std::copy can stall if the number of rows is big. Introduce a helper move_row_buf_to_working_row_buf that moves the rows and yields to avoid stall.	2019-07-08 15:22:16 +08:00
Asias He	b48dc42e73	repair: Allow default construct for repair_row All members of repair_row are now optional. Enable the default constructor so that _row_buf.resize() can work.	2019-07-08 15:22:16 +08:00
Asias He	18fb0714a0	repair: Remove apply_rows It is not used any more. The user now calls apply_rows_on_master_in_thread and apply_rows_on_follower instead.	2019-07-08 15:22:16 +08:00
Asias He	882530ce26	repair: Run get_row_diff_with_rpc_stream in a thread So that we can make get_row_diff_source_op run inside a thread, in turn it can now call apply_rows_on_master_in_thread which eliminates stall.	2019-07-08 15:22:16 +08:00
Asias He	948b833d74	repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread So it can use apply_rows_on_master_in_thread which eliminates stall.	2019-07-08 15:22:16 +08:00
Asias He	7f29d13984	repair: Run get_row_diff inside a thread So it can use apply_rows_on_master_in_thread which eliminates stall.	2019-07-08 15:22:16 +08:00
Asias He	6b2e3946fb	repair: Add apply_rows_on_master_in_thread Like apply_rows, except it runs inside a thread and runs on master node only.	2019-07-08 15:22:16 +08:00
Asias He	7c6a29027f	repair: Add apply_rows_on_follower Add a version for apply_rows on follower node only.	2019-07-08 15:22:16 +08:00
Asias He	cc14c6e0c4	repair: Futurize working_row_hashes To avoid stall when the number of rows is big.	2019-07-08 15:22:16 +08:00
Asias He	f3d2ba6ec7	repair: Remove get_full_row_hashes helper It is a single wrapper for working_row_hashes and is used only once. Remove it.	2019-07-08 15:22:16 +08:00
Benny Halevy	a0499bbd31	lister::guarantee_type: do not follow symlink Similar to commit `9785754e0d` lister::guarantee_type needs to check the entry's type, not the symlink it may point to. Fixes #4606 The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken symlink and following it fails, as it should, with the default follow_symlink::yes Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>	2019-07-07 15:29:28 +03:00
Avi Kivity	63edd46562	Merge "Expand big decimal with arithmetic operators" from Piotr " This miniseries expands big_decimal interface with convenience operators (-=, +, -), provides test cases for it and makes one of the constructors explicit. Tests: unit(dev) " * 'expand_big_decimal_interface' of https://github.com/psarna/scylla: utils: make string-based big decimal constructor explicit tests: add more operators to big decimal tests utils: add operators to big_decimal	2019-07-06 12:26:08 +03:00
Avi Kivity	24caf0824d	Merge "Complete the LIKE operator" from Dejan " Implement LIKE parsing, intermediate representation, and query processing. Add tests for this implementation (leaving the LIKE functionality tests in tests/like_matcher_test.cc). Refs #4477. " * 'finish-like' of https://github.com/dekimir/scylla: cql3: Add LIKE operator to CQL grammar cql3: Ensure LIKE filtering for partition columns cql3: Add LIKE restriction cql3: Add LIKE relation	2019-07-06 12:26:08 +03:00
kbr-	8995945052	Implement tuple_type_impl::to_string_impl. (#4645 ) Resolves #4633. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-06 12:26:08 +03:00
Avi Kivity	187859ad78	review-checklist: mention that the guidelines are not absolute rules and can be overridden	2019-07-06 12:26:08 +03:00
Kamil Braun	c0915c40eb	Warn user about using SimpleStrategy with Multi DC deployment If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-05 09:25:03 +02:00
Kamil Braun	35dbe9371c	Add warning support to the CQL binary protocol implementation The CQL binary protocol v4 adds support for server-side warnings: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec This adds a convenient API to add warnings to messages returned to the user. Resolves #4651. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-05 09:24:56 +02:00
Kamil Braun	2f0f53ac72	Add logging of parsed command line options The recognized command line options are now being printed when Scylla is run, together with the whole command used. Fixes #4203.	2019-07-05 09:00:28 +02:00
Piotr Sarna	eed2543bcc	utils: make string-based big decimal constructor explicit As a rule of thumb, single-parameter constructors should be explicit in order to avoid unexpected implicit conversions.	2019-07-04 11:33:00 +02:00
Piotr Sarna	7e722f8dd5	tests: add more operators to big decimal tests	2019-07-04 11:32:57 +02:00
Piotr Sarna	a5e41408ec	utils: add operators to big_decimal For convenience, operators -=, + and - are implemented on top of +=.	2019-07-04 11:32:53 +02:00
Dejan Mircevski	6727e8f073	cql3: Add LIKE operator to CQL grammar Extend the grammar with LIKE and add CQL query tests for it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 11:01:13 +02:00
Dejan Mircevski	1c583de8bb	cql3: Ensure LIKE filtering for partition columns Partition columns are implicitly filtered whenever possible, avoiding expensive post-processing. But there are exceptions, eg, when partition key is only partially restricted, or for CONTAINS expressions. Here we add LIKE to this list of exceptions. Also fix compute_bounds() to punt on LIKE restrictions, which cannot be translated into meaningful bounds. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:59:13 +02:00
Dejan Mircevski	63cec653e5	cql3: Add LIKE restriction This restriction leverages like_matcher to perform filtering. Make single_column_relation::new_LIKE_restriction() return this new restriction. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:58:56 +02:00
Dejan Mircevski	21d7722594	cql3: Add LIKE relation Add a new type of relation with operator LIKE. Handle it in relation::to_restriction by introducing a new virtual method for it. The temporary implementation of this method returns null; that will be replaced in a subsequent patch. Add abstract_type::is_string() to recognize string columns and disallow LIKE operator on non-string columns. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:54:30 +02:00
Kamil Braun	f155a2d334	Fix command line argument parsing in main. Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run. The first parse is there to check if the "--version" flag is present --- in this case the version is printed and the program exits. The second parsing is correct; however, most of the arguments were improperly treated as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument). This happened because the arguments weren't known to the command line parser. This commit fixes the issue by moving the parsing code until after the arguments are registered. Resolves #4141. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-03 14:11:34 +02:00
Avi Kivity	8a0c4d508a	Merge "Repair switch to rpc stream" from Asias " The put_row_diff, get_row_diff and get_full_row_hashes verbs are switched to use rpc stream instead of rpc verb. They are the verbs that could send big rpc messages. The rpc stream sink and source are created per repair follower for each of the above 3 verbs. The sink and source are shared for multiple requests during the entire repair operation for a given range, so there is no overhead to setup rpc stream. The row buffer is now increased to 32MiB from 256KiB, giving better bandwidth in high latency links. The downside of bigger row buffer is reduced possibility that all the rows inside a row buffer are identical. This causes more full hashes to be exchanged. To address this issue, the plan is to add better set reconciliation algorithm in addition to the current send full hashes. I compared rebuild using regular stream plan with repair using rpc stream. With 2 nodes, 1 smp, 8M rows, delete all data on one of the nodes before repair or rebuild. 1) repair using seastar rpc verb Time to complete: 82.17s 2) rebuild using regular streaming which uses seastar rpc stream Time to complete: 63.87s 3) repair using seastar rpc stream Time to complete: 68.48s For 1) and 3), the improvement is 16.6% (repair using rpc verb v.s. repair using rpc stream) For 2) and 3), the difference is 7.2% (repair v.s. stream) The result is promising for the future repair-based bootstrap/replace node operations. NOTE: We do not actually enable rpc stream in row level repair for now. We will enable it after we fix the stall issues caused by handling bigger row buffers.
Fixes #4581 " * 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits) docs: Add RPC stream doc for row level repair repair: Mark some of the helper functions static repair: Increase max row buf size repair: Hook rpc stream version of verbs in row level repair repair: Add use_rpc_stream to repair_meta repair: Add is_rpc_stream_supported repair: Add needs_all_rows flag to put_row_diff repair: Optimize get_row_diff repair: Register repair_get_full_row_hashes_with_rpc_strea repair: Register repair_put_row_diff_with_rpc_stream repair: Register repair_get_row_diff_with_rpc_stream repair: Add repair_get_full_row_hashes_with_rpc_stream_handler repair: Add repair_put_row_diff_with_rpc_stream_handler repair: Add repair_get_row_diff_with_rpc_stream_handler repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op repair: Add repair_put_row_diff_with_rpc_stream_process_op repair: Add repair_get_row_diff_with_rpc_stream_process_op repair: Add put_row_diff_with_rpc_stream repair: Add put_row_diff_sink_op repair: Add put_row_diff_source_op ...	2019-07-03 10:08:55 +03:00
Asias He	f686f0b9d6	docs: Add RPC stream doc for row level repair This documents RPC stream usage in row level repair.	2019-07-03 08:09:57 +08:00
Asias He	78ae5af203	repair: Mark some of the helper functions static They are used only inside repair/row_level.cc. Make them static.	2019-07-03 08:09:57 +08:00
Asias He	e8c13444ba	repair: Increase max row buf size If the cluster supports row level repair with rpc stream interface, we can use bigger row buf size to have better repair bandwidth in high latency links.	2019-07-03 08:01:37 +08:00
Asias He	7d08a8d223	repair: Hook rpc stream version of verbs in row level repair If rpc stream is supported, use the rpc stream version of the get_row_diff, put_row_diff, get_full_row_hashes.	2019-07-03 08:01:37 +08:00
Asias He	fccaa0324f	repair: Add use_rpc_stream to repair_meta Determine if rpc stream should be used.	2019-07-03 08:01:37 +08:00
Asias He	7bf0c646be	repair: Add is_rpc_stream_supported Given a row_level_diff_detect_algorithm, return if this algo supports rpc stream interface.	2019-07-03 08:01:04 +08:00
Asias He	1c92643f02	repair: Add needs_all_rows flag to put_row_diff So we can avoid copy _working_row_buf in get_row_diff on master node if there is only one follower node and all repair rows are needed by follower node.	2019-07-03 07:56:22 +08:00
Asias He	6595417567	repair: Optimize get_row_diff Move _working_row_buf instead of copying it if this is a follower node or a master node with only one follower. In these cases, the _working_row_buf will not be used after this function, so we can move it.	2019-07-03 07:56:22 +08:00
Asias He	c4eb0ee361	repair: Register repair_get_full_row_hashes_with_rpc_strea Register the get_full_row_hashes rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	b56cced5b8	repair: Register repair_put_row_diff_with_rpc_stream Register the put_row_diff rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	67130031b1	repair: Register repair_get_row_diff_with_rpc_stream Register the get_row_diff rpc stream verb.	2019-07-03 07:56:22 +08:00
Asias He	f255f902bd	repair: Add repair_get_full_row_hashes_with_rpc_stream_handler It is the handler for the get_full_row_hashes rpc stream verb on the receiving side.	2019-07-03 07:56:17 +08:00
Asias He	e3267ad98c	repair: Add repair_put_row_diff_with_rpc_stream_handler It is the handler for the put_row_diff rpc stream verb on the receiving side.	2019-07-03 07:55:24 +08:00
Asias He	06ac014261	repair: Add repair_get_row_diff_with_rpc_stream_handler It is the handler for the get_row_diff rpc stream verb on the receiving side.	2019-07-03 07:54:43 +08:00
Asias He	5f25969da3	repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op It is the helper for the get_full_row_hashes rpc stream verb handler.	2019-07-03 07:54:03 +08:00
Asias He	39d5a9446e	repair: Add repair_put_row_diff_with_rpc_stream_process_op It is the helper for the put_row_diff rpc stream verb handler.	2019-07-03 07:53:21 +08:00
Asias He	049e793fe5	repair: Add repair_get_row_diff_with_rpc_stream_process_op It is the helper for the get_row_diff rpc stream verb handler.	2019-07-03 07:52:12 +08:00
Avi Kivity	fca1ae69ff	database: convert _cfg from a pointer to a reference _cfg cannot be null, so it can be converted to a reference to indicate this. Follow-up to `fe59997efe`.	2019-07-02 17:57:50 +02:00
Calle Wilund	f317d7a975	commitlog: Simplify commitlog extension iteration Fixes #4640 Iterating extensions in commitlog.cc should mimic that in sstables.cc, i.e. a simple future-chain. Should also use same order for read and write open, as we should preserve transformation stack order. Message-Id: <20190702150028.18042-1-calle@scylladb.com>	2019-07-02 18:37:44 +03:00
Takuya ASADA	332a6931c4	dist/redhat: fix install path of scripts On recent changes install.sh mistakenly copies dist/common/scripts to /opt/scylladb/scripts/scripts, it should be /opt/scylladb/scripts. Same on /opt/scylladb/scyllatop as well. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190702120030.13729-1-syuu@scylladb.com>	2019-07-02 17:29:33 +03:00
Asias He	b1188f299e	repair: Add put_row_diff_with_rpc_stream It is rpc stream version of put_row_diff. It uses rpc stream instead of rpc verb to put the repair rows to follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	31b30486a7	repair: Add put_row_diff_sink_op It is a helper that works on the sink() of the put_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	dbe035649b	repair: Add put_row_diff_source_op It is a helper that works on the source() of the put_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	72d3563da1	repair: Add get_row_diff_with_rpc_stream It is rpc stream version of get_row_diff. It uses rpc stream instead of rpc verb to get the repair rows from follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	4cb44baa08	repair: Add get_row_diff_sink_op It is a helper that works on the sink() of the get_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	a1e19514f9	repair: Add get_row_diff_source_op It is a helper that works on the source() of the get_row_diff rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	473bd7599c	repair: Add get_full_row_hashes_with_rpc_stream It is rpc stream version of get_full_row_hashes. It uses rpc stream instead of rpc verb to get the repair hashes data from follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	1e2a598fe7	repair: Add get_full_row_hashes_sink_op It is a helper that works on the sink() of the get_full_row_hashes rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	149c54b000	repair: Add get_full_row_hashes_source_op It is a helper that works on the source() of the get_full_row_hashes rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	b3e7299032	repair: Add sink and source object into repair_meta They will soon be used to sync repair hashes and repair rows between master and follower nodes.	2019-07-02 21:22:41 +08:00
Asias He	acd40fd529	repair: Add sink_source_for_put_row_diff Use sink_source_for_repair to define sink_source_for_put_row_diff with sink = repair_row_on_wire_with_cmd, source = repair_stream_cmd for REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	4405f7a6ff	repair: Add sink_source_for_get_row_diff Use sink_source_for_repair to define sink_source_for_get_row_diff with sink = repair_hash_with_cmd, source = repair_row_on_wire_with_cmd for REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	0bffd07e7e	repair: Add sink_source_for_get_full_row_hashes Use the sink_source_for_repair to define sink_source_for_get_full_row_hashes with sink = repair_stream_cmd, source = repair_hash_with_cmd for REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM rpc stream verb.	2019-07-02 21:22:41 +08:00
Asias He	8400dafa12	repair: Add sink_source_for_repair helper class It is used to store the sink and source objects for the rpc stream verbs used by row level repair.	2019-07-02 21:22:41 +08:00
Asias He	37b3de4ea0	messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	a7c7ba9765	messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	dc92bda93b	messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support	2019-07-02 21:18:55 +08:00
Asias He	f312c95b74	messaging_service: Add do_make_sink_source helper It is used by the row level repair rpc stream verbs to make sink and source object.	2019-07-02 21:18:55 +08:00
Asias He	bc295a00a6	messaging_service: Add rpc stream verb for row level repair - REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM Get repair rows from follower nodes - REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM Put repair rows to follower nodes - REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM: Get full hashes from follower nodes	2019-07-02 21:18:55 +08:00
Asias He	c93113f3a5	idl: Add repair_row_on_wire_with_cmd	2019-07-02 21:18:54 +08:00
Asias He	a90fb24efc	idl: Add repair_hash_with_cmd	2019-07-02 21:18:37 +08:00
Asias He	599d40fbe9	idl: Add repair_stream_cmd	2019-07-02 21:18:15 +08:00
Asias He	672c24f6b0	idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm	2019-07-02 21:17:36 +08:00
Avi Kivity	c987397e52	transport: reject initial frames with wild body sizes (#4620 ) If someone opens a connection to port 9042 and sends some random bytes, there is a 1 in 64 probability we'll recognize it as a valid frame (since we only check the version byte, allowing versions 1-4) and we'll try to read frame.length bytes for the body. If this value is very large, we'll run out of memory very quickly. Fix this by checking for reasonable body size (100kB). The initial message must be a STARTUP, whose body is a [string map] of options, of which just three are recognized. 100kB is plenty for future expansion. Note that this does not replace true security on listening ports and only serves to protect against mistakes, not attacks. An attacker can easily exhaust server memory by opening many connections and trickle-feeding them small amounts of data so they appear alive. We can't use the config item native_transport_max_frame_size_in_mb, because that can be legitimately large (and the default is atrocious, 256MB). Fixes #4366.	2019-07-01 19:02:34 +02:00
Tomasz Grabiec	eb496b5eae	Merge "Allow changing configuration at runtime" from Avi This patchset allows changing the configuration at runtime. The user triggers this by editing the configuration file normally, then signalling the database with SIGHUP (as is traditional). The implementation is somewhat complicated due to the need to store non-atomic mutable state per-shard and to synchronize the values in all shards. This is somewhat similar to Seastar's sharded<>, but that cannot be used since the configuration is read before Seastar is initialized (due to the need to read command-line options). Tests: unit (dev, debug), manual test with extra prints (dev) Ref #2689 Fixes #2517.	2019-07-01 15:04:59 +02:00
Avi Kivity	28a514820d	Update seastar submodule * seastar a5b9f77d52...44a300cd50 (1): > build: fix dpdk library link order Should fix the build with dpdk enabled.	2019-07-01 11:56:59 +03:00
Takuya ASADA	02c6db29c8	dist/debian: manage .pyc as a part of package Since `828b63f4fb` only added .pyc files to the .rpm package, we also need to add them to the .deb package. See #4612 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190629023739.8472-1-syuu@scylladb.com>	2019-06-30 15:54:42 +03:00
Avi Kivity	af2a3859f6	Update seastar submodule * seastar b629d5ef7a...a5b9f77d52 (6): > perftune.py: add comment explaining why we don't log errors when binding NVMe IRQs for all but i3.nonmetal machines > sharded: do a two phase shutdown for sharded services > chunked_fifo: add iterator > perftune.py: fix the i3 metal detection pattern > core/memory: remove translation api > reactor: file_type: offer option to not follow symbolic links	2019-06-30 11:32:21 +03:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	c98d1ea942	tests: cql_test_env: prepare config for updateable values Once we start using updateable_value<>, we must make it refer to the updateable_value_source<> on the same shard, and to do that we need to call broadcast_to_all_shards() first (this creates the per-shard copy).	2019-06-28 16:43:25 +03:00
Avi Kivity	8cffec37aa	main: re-read configuration file on SIGHUP Trap SIGHUP and signal a loop to re-read the configuration file.	2019-06-28 16:43:25 +03:00
Avi Kivity	2ee07bb09b	main: preserve config::client_encryption_options configuration source With dynamically updateable configuration, tracking the source of a value is more important, since we'll accept or reject updates depending on the source. Fix the source of client_encryption_options, which we RMW, by preserving the original source.	2019-06-28 16:43:25 +03:00
Avi Kivity	6061a833a3	config: make values updateable Replace the per-shard value we store with an updateable_value_source, which allows updating it dynamically and allows users to track changes. The broadcast_to_all_shards() function is augmented to apply modifications when called on a live system.	2019-06-28 16:43:25 +03:00
Avi Kivity	f7de01d082	config: store copies of config items per shard Since some of our values are not atomic (strings) and the administrative information needed to track references to values is also not atomic, we will need to store them per-shard. To do that we add a vector of per-shard data to config_file, where each element is itself a vector of configuration items. Since we need to operate generically on items (copying them from shard to shard) we store them in a type-erased form. Only mutable state is stored per-shard.	2019-06-28 16:43:25 +03:00
Avi Kivity	fb23cd1ff6	Introduce updateable_value The updateable_value and updateable_value_source classes allow broadcasting configuration changes across the application. The updateable_value_source class represents a value that can be updated, and updateable_value tracks its source and reflects changes. A typical use replaces "uint64_t config_item" with "updateable_value<uint64_t> config_item", and from now on changes to the source will be reflected in config_item. For more complicated uses, which must run some callback when configuration changes, you can also call config_item.observe(callback) to be actively notified of changes.	2019-06-28 16:43:25 +03:00
Avi Kivity	8d7c1c7231	db: seed_provider_type: add operator==() Dynamically updateable configuration requires checking whether configuration items changed or not, so we can skip firing notifiers for the common case where nothing changed. This patch adds a comparison operator for seed_provider_type, which was missing it.	2019-06-28 16:43:25 +03:00
Avi Kivity	da2a98cde6	config: don't allow assignment to config values Currently, we allow adjusting configuration via cfg.whatever() = 5; by returning a mutable reference from cfg.whatever(). Soon, however, this operation will have side effects (updating all references to the config item, and triggering notifiers). While this can be done with a proxy, it is too tricky. Switch to an ordinary setter interface: cfg.whatever.set(5); Because boost::program_options no longer gets a reference to the value to be written to, we have to move the update to a notifier, and the value_ex() function has to be adjusted to infer whether it was called with a vector type after it is called, not before.	2019-06-28 16:43:25 +03:00
Avi Kivity	b146fd1356	config: make noncopyable config_file and db::config are soon not going to be copyable. The reason is that in order to support live updating, we'll need per-shard copies of each value, and per-shard tracking of references to values. While these can be copied, it will be an asynchronous operation and thus cannot be done from a copy constructor. So to prepare for these changes, replace all copies of db::config by references and delete config_file's copy constructor. Some existing references had to be made const in order to adapt to the const-ness of db::config now being propagated (rather than being terminated by a non-const copy).	2019-06-28 16:43:25 +03:00
Avi Kivity	fe59997efe	database: don't copy config object Copying the config object breaks the link between the original and the copied object, so updates to config items will not be visible. To allow updates, don't copy any more, and instead keep a pointer. The pointer won't work well once config is updateable, since the same object is shared across multiple shards, but that can be addressed later.	2019-06-28 15:20:39 +03:00
Avi Kivity	339699b627	database: remove default constructor Currently, database::_cfg is a copy of the global configuration. But this means that we have multiple master copies of the configuration, which makes updating the configuration harder. In order to eliminate the copy we have to eliminate the database default constructor, which creates a config object, so that all remaining constructors can receive config by reference and retain that reference.	2019-06-28 15:20:39 +03:00
Avi Kivity	70d8127400	gossip_test: pass configuration to database object We want to eliminate the default database constructor (to be explained in the next patch), so eliminate its only use in gossip_test, using the regular constructor instead.	2019-06-28 15:20:39 +03:00
Glauber Costa	d916601ea4	toppartitions: fix typo toppartitons -> toppartitions Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190627160937.7842-1-glauber@scylladb.com>	2019-06-27 19:13:58 +03:00
Tomasz Grabiec	e071445373	Merge "More precise poisoning in logalloc" from Rafael With this, unused descriptors and objects should always be poisoned. * https://github.com/espindola/scylla/ align-descriptors-so-that-they-are-poisoned-v4: Convert macros to inline functions More precise poisoning in logalloc	2019-06-27 16:30:40 +02:00
Takuya ASADA	eabb872789	dist/redhat: install /usr/sbin symlinks correctly In the current scylla.spec, the shell glob pattern "scylla_setup" is not expanded correctly; it mistakenly creates a symlink named "/usr/sbin/scylla_setup". We need to expand the pattern and create a symlink for each setup script. Fixes #4605 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190627053530.10406-2-syuu@scylladb.com>	2019-06-27 14:22:40 +03:00
Takuya ASADA	828b63f4fb	dist/redhat: manage .pyc as a part of package Since we don't install .pyc files in our package, python3 generates .pyc files when we launch a setup script for the first time. We then have unmanaged files under the script directory, which remain when the Scylla package is upgraded or removed. We need to compile the .py files when we generate the relocatable package and add the compiled .pyc files to the .rpm/.deb packages. Fixes #4612 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190627053530.10406-1-syuu@scylladb.com>	2019-06-27 14:22:39 +03:00
Rafael Ávila de Espíndola	d8dbacc7f6	More precise poisoning in logalloc This change aligns descriptors and values to 8 bytes so that poisoning a descriptor or value doesn't interfere with other descriptors and values. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-26 13:13:48 -07:00
Rafael Ávila de Espíndola	6a2accb483	Convert macros to inline functions Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-26 13:13:48 -07:00
Avi Kivity	dd76943125	Merge "Segregate data when streaming by timestamp for time window compaction strategy" from Botond " When writing streamed data into sstables, while using time window compaction strategy, we have to emit a new sstable for each time window. Otherwise we can end up with sstables mixing data from wildly different windows, ruining the compaction strategy's ability to drop entire sstables when all data within is expired. This gets worse as these mixed sstables get compacted together with sstables that used to contain a single time window. This series provides a solution to this by segregating the data by its atoms' time-windows. This is done on the new RPC streaming and the new row-level repair, memtable-flush and compaction, ensuring that the segregation requirement is respected at all times. Fixes: #2687 " * 'segregate-data-into-sstables-by-time-window-streaming/v2.1' of ssh://github.com/denesb/scylla: streaming,repair: restore indentation repair: pass the data stream through the compaction strategy's interposer consumer streaming: pass the data stream through the compaction strategy's interposer consumer TWCS: implement add_interposer_consumer() compaction_strategy: add add_interposer_consumer() Add mutation_source_metadata tests: add unit test for timestamp_based_splitting_writer Add timestamp_based_splitting_writer Introduce mutation_writer namespace	2019-06-26 19:18:52 +03:00
Tomasz Grabiec	3e30a33e31	Merge "Introduce tests::random_schema" from Botond Most of our tests use overly simplistic schemas (`simple_schema`) or very specialized ones that focus on exercising a specific area of the tested code. This is fine in most places as not all code is schema dependent, however practice has shown that there can be nasty bugs hiding in dark corners that only appear with a schema that has a specific combination of types. This series introduces `tests::random_schema`, a utility class for generating random schemas and random data for them. An important goal is to make using random schemas in tests as simple and convenient as possible, therefore fostering the appearance of tests using random schemas. Random schema was developed to help test code I'm currently working on, which segregates data by time-windows. As I wasn't confident in my ability to think of every possible combination of types that can break my code, I came up with random-schema to help me find these corner cases. So far I consider it a success: it already found bugs in my code that I'm not sure I would have found if I had relied on specific schemas. It also found bugs in unrelated areas of the code, which proves my point in the first paragraph. * https://github.com/denesb/scylla.git random_schema/v5: tests/data_model: approximate to the modeled data structures data_value: add ascii constructor tests/random-utils.hh: add stepped_int_distribution tests/random-utils.hh: get_int() add overloads that accept external rand engine tests/random-utils.hh: add get_real() tests: introduce random_schema	2019-06-26 18:10:20 +02:00
Botond Dénes	12b8405720	streaming,repair: restore indentation Deferred from the previous two patches.	2019-06-26 18:45:36 +03:00
Botond Dénes	e3f4692868	repair: pass the data stream through the compaction strategy's interposer consumer	2019-06-26 18:45:36 +03:00
Botond Dénes	9c2407573c	streaming: pass the data stream through the compaction strategy's interposer consumer	2019-06-26 18:45:36 +03:00
Botond Dénes	ee563928df	TWCS: implement add_interposer_consumer() Exploit the interposer customization point to inject a consumer that will segregate the mutation stream based on the contained atoms' timestamps, allowing the requirements of TWCS to be maintained every time sstables are written to disk. For the implementation, `timestamp_based_splitting_writer` is used, with a classifier that maps timestamps to windows.	2019-06-26 18:45:36 +03:00
Tomasz Grabiec	2d3e3640df	Merge "Collection: use utils::chunked_vector to store the cells" from Botond This is a band-aid patch that is supposed to fix the immediate problem of large collections causing large allocations. The proper fix is to use IMR but that will take time. In the meanwhile alleviate the pressure on the memory allocator by using a chunked storage collection (utils::chunked_vector) instead of std::vector. In the linked issue seastar::chunked_fifo was also proposed as the container to use, however chunked fifo is not traversable in reverse which disqualifies it from this role. Refs: #3602	2019-06-26 15:32:25 +02:00
Botond Dénes	a280dcfe4c	compaction_strategy: add add_interposer_consumer() This will be the customization point for compaction strategies, used to inject a specific interposer consumer that can manipulate the fragment stream so that it satisfies the requirements of the compaction strategy. For now the only candidate for injecting such an interposer is time-window compaction strategy, which needs to write sstables that only contain atoms belonging to the same time-window. By default no interposer is injected. Also add an accompanying customization point `adjust_partition_estimate()` which returns the estimated per-sstable partition-estimate that the interposer will produce.	2019-06-26 15:45:59 +03:00
Botond Dénes	3ce902a4be	Add mutation_source_metadata This struct contains metadata regarding a mutation_source. Currently it contains the min and max timestamp. This will be used later by compaction strategies to determine whether a given mutation stream has to be split or not.	2019-06-26 15:45:59 +03:00
Botond Dénes	25d7cbedc0	tests: add unit test for timestamp_based_splitting_writer	2019-06-26 15:45:59 +03:00
Botond Dénes	df29600eec	Add timestamp_based_splitting_writer This writer implements the core logic of time-window based data segregation. It splits the fragment stream provided by a reader, such that each atom (cell) in the stream will be written into a consumer based on the time-window its timestamp belongs to. The end result is that each consumer will only see fragments whose atoms all have timestamps belonging to the same time-window. When a mutation fragment has atoms belonging to different time-windows, it is split into as many fragments as needed so each has only atoms that belong to the same time-window.	2019-06-26 15:45:59 +03:00
Botond Dénes	2693f1838a	Introduce mutation_writer namespace Currently there is a single mutation_writer: `multishard_writer`, however in the next patch we are going to add another one. This is the right moment to move these into a common namespace (and folder), we have way too much stuff scattered already in the top-level namespace (and folder). Also rename `tests/multishard_writer_test.cc` to `tests/mutation_writer_test.cc`, this test-suite will be the home of all the different mutation writers' unit test cases.	2019-06-26 15:45:59 +03:00
Avi Kivity	adcc95dddc	Merge "sstable: mc: reader: Optimize multi-partition scans for data sets with small partitions" from Tomasz " Currently, the parser and the consumer save their state and return control to the caller, which then figures out that it needs to enter a new partition, and that it doesn't need to skip. We do it twice, after row end, and after row start. All this work could be avoided if the consumer installed by the reader adjusted its state and pushed the fragments on the spot. This patch achieves just that. This results in less CPU overhead. The ka/la reader is left still stopping after row end. Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe): perf_fast_forward -c1 -m1G --run-tests=small-partition-skips: Before: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.952372 4 1000000 1050009 755 1050765 1046585 976.0 971 124256 1 0 0 0 0 0 0 0 99.7% After: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.790178 4 1000000 1265538 1150 1266687 1263684 975.0 971 124256 2 0 0 0 0 0 0 0 99.6% Tests: unit (dev) " * 'sstable-optimize-partition-scans' of https://github.com/tgrabiec/scylla: sstable: mc: reader: Do not stop parsing across partitions sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader sstables: reader: Simplify _single_partition_read checking sstables: reader: Update stats from on_next_partition() sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range() sstables: ka/la: reader make push_ready_fragments() safe to call many times sstables: mc: reader: Move out-of-range check out of push_ready_fragments() sstables: reader: Return void from push_ready_fragments() sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range() sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end	2019-06-26 13:19:12 +03:00
Avi Kivity	06a9596491	tests: cql_test_env: disable commitlog O_DSYNC O_DSYNC causes commitlog to pre-allocate each commitlog segment by writing zeroes into it. In normal operation, this is amortized over the many times the segment will be reused. In tests, this is wasteful, but under the default workstation configuration with /tmp using tmpfs, no actual writes occur. However on a non-default configuration with /tmp mounted on a real disk, this causes huge disk I/O and eventually a crash (observed in schema_change_test). The crash is likely only caused indirectly, as the extra I/O (exacerbated by many tests running in parallel) causes timeouts. I reproduced this problem by running 15 copies of schema_change_test in parallel with /tmp mounted on a real filesystem. Without this change, I usually observe one or two of the copies crashing; with the change they complete (and much more quickly, too).	2019-06-26 12:15:53 +02:00
Asias He	f0f0beba2e	repair: Move the global tracker object into repair_service The tracker object was a static object in repair.cc. At the time we initialize it, we do not know the smp::count, so we have to initialize the _repairs object when it is used on the fly. void init_repair_info() { if (_repairs.size() != smp::count) { _repairs.resize(smp::count); } } This introduces a race if init_repair_info is called on different threads (shards). To fix, put the tracker object inside the newly introduced repair_service object which is created in main.cc. Fixes #4593 Message-Id: <b1adef1c0528354d2f92f8aaddc3c4bee5dc8a0a.1561537841.git.asias@scylladb.com>	2019-06-26 12:53:10 +03:00
Botond Dénes	572a738777	collection: use chunked_vector to store cells This is a quick fix to the immediate problem of large collections causing large allocations, triggering stalls or OOM. The proper fix is to use IMR for storing the cells, but that is a complex change that will require time, so let's not stall/OOM in the meanwhile.	2019-06-26 11:40:44 +03:00
Botond Dénes	c68ffc330e	types: don't copy collection_type_impl::mutation_view Just because it's a view doesn't mean it's cheap to copy.	2019-06-26 11:39:41 +03:00
Asias He	fb3f0125ee	repair: Add default constructor for partition_key_and_mutation_fragments This is useful when we want to add an empty partition_key_and_mutation_fragments.	2019-06-26 09:12:55 +08:00
Asias He	3fc53a6b72	repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm It is used to negotiate if the master can use the rpc stream interface to transfer data.	2019-06-26 09:12:55 +08:00
Asias He	6054a56333	repair: Add repair_row_on_wire_with_cmd It is used to contain both a repair cmd and repair_row_on_wire object.	2019-06-26 09:12:55 +08:00
Asias He	9f36d775dc	repair: Add repair_hash_with_cmd It is a wrapper that contains both a repair cmd and a repair_hash object.	2019-06-26 09:12:55 +08:00
Asias He	6b59279e26	repair: Add repair_stream_cmd It is used by row level repair to add a small protocol on top of the rpc stream interface.	2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola	94d2194c77	dht: token: Simplify operator< While this is a strict weak ordering, it is not obvious and duplicates a bit of logic. This patch simplifies it by using tri_compare. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190621204820.37874-1-espindola@scylladb.com>	2019-06-25 19:06:30 +03:00
Tomasz Grabiec	269e65a8db	Merge "Sync schema before repair" from Asias This series makes sure new schema is propagated to repair master and follower nodes before repair. Fixes #4575 * dev.git asias/repair_pull_schema_v2: migration_manager: Add sync_schema repair: Sync schema from follower nodes before repair	2019-06-25 19:05:29 +03:00
Amos Kong	f0cd589a75	dist: suppress the yaml load warning YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. Fix it by using the new safe interface, yaml.safe_load(). Signed-off-by: Amos Kong <amos@scylladb.com> Cc: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>	2019-06-25 19:05:29 +03:00
Avi Kivity	fc629bb14f	Merge "cql3: lift infinite bound check" from Benny & Piotr " If the database supports infinite bound range deletions, CQL layer will no longer throw an error indicating that both ranges need to be specified. Fixes #432 Update test_range_deletion_scenarios unit test accordingly. " * 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla: cql3: lift infinite bound check if it's supported service: enable infinite bound range deletions with mc database: add flag for infinite bound range deletions	2019-06-25 19:05:29 +03:00
Nadav Har'El	a88c9ca5a5	Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next Piotr Sarna says: Fixes #4540 This series adds proper handling of aggregation for paged indexed queries. Before this series, returned results were presented to the user in a per-page partial manner, while they should have been returned as a single aggregated value. Tests: unit(dev) Piotr Sarna (8): cql3: split execute_base_query implementation cql3: enable explicit copying of query_options cql3: add a query options constructor with explicit page size cql3: add proper aggregation to paged indexing cql3: make DEFAULT_COUNT_PAGE_SIZE constant public tests: add query_options to cquery_nofail tests: add indexing + paging + aggregation test case tests: add indexing+paging test case for clustering keys	2019-06-25 19:05:29 +03:00
Avi Kivity	7195f75fb2	Update seastar submodule * seastar ded50bd8a4...b629d5ef7a (9): > sharded: no_sharded_instance_exception: fix grammar > core,net: output_stream: remove redundant std::move() > perftune: make sure that ethtool -K has a chance of succeeding > net/dpdk: upgrade to dpdk-19.05 > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones > reactor: provide an uptime function > rpc: add sink::flush() to streaming api > Use a table to document the various build modes > foreign_ptr: Fix compilation error due to unused variable	2019-06-25 19:05:29 +03:00
Avi Kivity	9d21341733	review-checklist.md: add common checks - code style - naming - micro-performance - concurrency - unit-testing - templates and type erasure - singletons	2019-06-25 19:05:29 +03:00
Piotr Sarna	efa7951ea5	main: stop view builder conditionally The view builder is started only if it's enabled in config, via the view_building=true variable. Unfortunately, stopping the builder was unconditional, which may result in failed assertions during shutdown. To remedy this, view building is stopped only if it was previously started. Fixes #4589	2019-06-25 19:05:29 +03:00
Asias He	bb5665331c	repair: Sync schema from follower nodes before repair Since commit "repair: Use the same schema version for repair master and followers", repair master and followers use the same schema version that master decides to use during the whole repair operation. If master has an older version of the schema, repair could ignore the data which makes use of the new schema, e.g., writes to new columns. To fix, always sync the schema agreement before repair. The master node pulls schema from followers and applies it locally. The master then uses the "merged" schema. The followers use get_schema_for_write() to pull the "merged" schema. Fixes #4575 Backports: 3.1	2019-06-25 17:13:47 +08:00
Asias He	14c1a71860	migration_manager: Add sync_schema Makes sure this node knows about all schema changes known by "nodes" that were made prior to this call. Refs: #4575 Backports: 3.1	2019-06-25 17:13:47 +08:00
Botond Dénes	d00cb4916c	tests: introduce random_schema random_schema is a utility class that provides methods for generating random schemas as well as generating data (mutations) for them. The aim is to make using random schemas in tests as simple and convenient as is using `simple_schema`. For this reason the interface of `random_schema` follows closely that of `simple_schema` to the extent that it makes sense. An important difference is that `random_schema` relies on `data_model` to actually build mutations. So all its mutation-related operations work with `data_model::mutation_description` instead of actual `mutation` objects. Once the user has arrived at the desired mutation description they can generate an actual mutation via `data_model::mutation_description::build()`. In addition to the `random_schema` class, the `random_schema.hh` header exposes the generic utility classes for generating types and values that it internally uses. random_schema is fully deterministic. Using the same seed and the same set of operations is guaranteed to result in generating the same schema and data.	2019-06-25 12:01:33 +03:00
Botond Dénes	070d72ee23	tests/random-utils.hh: add get_real()	2019-06-25 12:01:33 +03:00
Botond Dénes	2d9f6c3b63	tests/random-utils.hh: get_int() add overloads that accept external rand engine	2019-06-25 12:01:33 +03:00
Botond Dénes	2a7710129e	tests/random-utils.hh: add stepped_int_distribution	2019-06-25 12:01:33 +03:00
Botond Dénes	a3f9932a2f	data_value: add ascii constructor To allow a `data_value` with `ascii_type` to be constructed.	2019-06-25 12:01:33 +03:00
Botond Dénes	1bd8b77770	tests/data_model: approximate to the modeled data structures Make the data modelling structures model their "real" counterparts more closely, allowing the user greater control on the produced data. The changes: * Add timestamp to atomic_value (which is now a struct, not just an alias to bytes). * Add tombstone to collection. * Add row_tombstone to row. * Add bound kinds and tombstone to range_tombstone. Great care was taken to preserve backward compatibility, to avoid unnecessary changes in existing code.	2019-06-25 12:01:33 +03:00
Piotr Sarna	add40d4e59	cql3: lift infinite bound check if it's supported If the database supports infinite bound range deletions, CQL layer will no longer throw an error indicating that both ranges need to be specified. [bhalevy] Update test_range_deletion_scenarios unit test accordingly. Fixes #432 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-24 15:58:34 +03:00
Piotr Sarna	c19fdc4c90	service: enable infinite bound range deletions with mc As soon as it's agreed that the cluster supports sstables in mc format, infinite bound range deletions in statements can be safely enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-24 15:58:28 +03:00
Piotr Sarna	e77ef849af	database: add flag for infinite bound range deletions Database can only support infinite bound range deletions if sstable mc format is supported. As a first step to implement these checks, an appropriate flag is added to database.	2019-06-24 15:57:47 +03:00
Piotr Sarna	b668ee2b2d	tests: add indexing+paging test case for clustering keys Indexing a non-prefix part of the clustering key has a separate code path (see issue #3405), so it deserves a separate test case.	2019-06-24 14:51:17 +02:00
Piotr Sarna	3d9a37f28f	tests: add indexing + paging + aggregation test case Indexed queries used to erroneously return partial per-page results for aggregation queries. This test case used to reproduce the problem and now ensures that there would be no regressions. Refs #4540	2019-06-24 14:06:42 +02:00
Piotr Sarna	60cafcc39c	tests: add query_options to cquery_nofail The cquery_nofail utility is extended, so it can accept custom query options, just like execute_cql does.	2019-06-24 14:06:41 +02:00
Piotr Sarna	fe18638de3	cql3: make DEFAULT_COUNT_PAGE_SIZE constant public The constant will be later used in test scenarios.	2019-06-24 13:21:37 +02:00
Piotr Sarna	bb08af7e68	cql3: add proper aggregation to paged indexing Aggregated and paged filtering needs to aggregate the results from all pages in order to avoid returning partial per-page results. It's a little bit more complicated than regular aggregation, because each paging state needs to be translated between the base table and the underlying view. The routine keeps fetching pages from the underlying view, which are then used to fetch base rows, which go straight to the result set builder. Fixes #4540	2019-06-24 13:21:32 +02:00
Piotr Sarna	97d476b90f	cql3: add a query options constructor with explicit page size For internal use, there already exists a query_options constructor that copies data from another query_options with overwritten paging state. This commit adds an option to overwrite page size as well.	2019-06-24 13:21:32 +02:00
Piotr Sarna	fa89e220ef	cql3: enable explicit copying of query_options	2019-06-24 12:57:04 +02:00
Piotr Sarna	7a8b243ce4	cql3: split execute_base_query implementation In order to handle aggregation queries correctly, the function that returns base query results is split into two, so it's possible to access raw query results, before they're converted into end-user CQL message.	2019-06-24 12:57:03 +02:00
Benny Halevy	b1e78313fe	log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0) pow2_rank is undefined for 0. bucket_of currently works around that by using a bitmask of 0. To allow asserting that count_{leading,trailing}_zeros are not called with 0, we want to avoid it at all call sites. Fixes #4153 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>	2019-06-23 19:32:51 +03:00
Avi Kivity	779b378785	Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny " partitioned_sstable_set is not self sufficient because it relies on compatible_ring_position_view, which in turn relies on lifetime of sstable object. This leads to use-after-free. Fix this problem by introducing compatible_ring_position and using it in p__s__s. Fixes #4572. Test: unit (dev), compaction dtests (dev) " * 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla: tests: Test partitioned sstable set's self-sufficiency sstables: Fix partitioned_sstable_set by making it self sufficient Introduce compatible_ring_position and compatible_ring_position_or_view	2019-06-23 17:17:18 +03:00
Raphael S. Carvalho	14fa7f6c02	tests: Test partitioned sstable set's self-sufficiency Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:13 +03:00
Raphael S. Carvalho	293557a34e	sstables: Fix partitioned_sstable_set by making it self sufficient Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view as key for interval map, which is constructed from a decorated key in sstable object. If sstable object is destroyed, like when compaction releases it early, partitioned set potentially no longer works because c__r__p__v would store information that is already freed, meaning its use implies use-after-free. Therefore, the problem happens when partitioned set tries to access the interval of its interval map and uses freed information from c__r__p__v. Fix is about using the newly introduced compatible_ring_position_or_view which can hold a ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable object. Retire compatible_ring_position_view.hh as it is now unused. Fixes #4572. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:13 +03:00
Raphael S. Carvalho	9a83561700	Introduce compatible_ring_position and compatible_ring_position_or_view The motivation for supporting ring position is that containers using it can be self sufficient. The existing compatible_ring_position_view could lead to use after free when the ring position data, it was built from, is gone. The motivation for compatible_ring_position_or_view is to allow lookup on containers that don't support different key types using c__r__p, and also to avoid unnecessary copies. If the user is provided only with a ring_position_view, c__r__p__or_v could be built from it and used for lookups. Converting ring_position_view to ring_position is very bug prone because there could be information lost in the process. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-23 16:29:12 +03:00
Rafael Ávila de Espíndola	65ac0a831c	Add to_string_impl that takes a data_value Currently to_string takes raw bytes. This means that to print a data_value it has to first be serialized to be passed to to_string, which will then deserialize it. This patch adds a virtual to_string_impl that takes a data_value and implements a now non virtual to_string on top of it. I don't expect this to have a performance impact. It mostly documents how to access a data_value without converting it to bytes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620183449.64779-3-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	3bd5dd7570	Add a few more tests of data_value::to_string I found that no tests covered this code while refactoring it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620183449.64779-2-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Nadav Har'El	6e87bca65d	storage_proxy: fix race and crash in case of MV and other node shutdown Recently, in merge commit `2718c90448`, we added the ability to cancel pending view-update requests when we detect that the target node went down. This is important for view updates because these have a very long timeout (5 minutes), and we wanted to make this timeout even longer. However, the implementation caused a race: Between creating the update's request handler (create_write_response_handler()) and actually starting the request with this handler (mutate_begin()), there is a preemption point and we may end up deleting the request handler before starting the request. So mutate_begin() must gracefully handle the case of a missing request handler, and not crash with a segmentation fault as it did before this patch. Eventually the lifetime management of request handlers could be refactored to avoid this delicate fix (which requires more comments to explain than code), or even better, it would be more correct to cancel individual writes when a node goes down, not drop the entire handler (see issue #4523). However, for now, let's not do such invasive changes and just fix bug that we set out to fix. Fixes #4386. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190620123949.22123-1-nyh@scylladb.com>	2019-06-23 16:03:06 +03:00
Asias He	b99c75429a	repair: Avoid searching all the rows in to_repair_rows_on_wire The repair_rows in row_list are sorted. It is only possible for the current repair_row to share the same partition key with the last repair_row inserted into repair_row_on_wire. So, no need to search from the beginning of the repair_rows_on_wire to avoid quadratic complexity. To fix, look at the last item in repair_rows_on_wire. Fixes #4580 Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>	2019-06-23 16:03:06 +03:00
Benny Halevy	883cb4318f	Merge pull request #4583 from bhalevy/init-and-shutdown-logging Init and shutdown logging	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	3660caff77	Reduce memory used by all tests Tests without custom flags were already being run with -m2G. Tests with custom flags have to manually specify it, but some were missing it. This could cause tests to fail with std::bad_alloc when two concurrent tests tried to allocate all the memory. This patch adds -m2G to all tests that were missing it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620002921.101481-1-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Avi Kivity	9229afe64f	Merge "Fix infinite paging for indexed queries" from Piotr " Fixes #4569 This series fixes the infinite paging for indexed queries issue. Before this fix, paging indexes tended to end up in an infinite loop of returning pages with 0 results, but has_more_pages flag set to true, which confused the drivers. Tests: unit(dev) Branches: 3.0, 3.1 " * 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla: tests: add test case for finishing index paging cql3: fix infinite paging for indexed queries	2019-06-23 16:03:06 +03:00
Takuya ASADA	2135d2ae7f	dist/debian: install capabilities.conf on postinst script We still have the "{{^jessie}}" tag on the scylla-server systemd unit file to skip using AmbientCapabilities on Debian 8, but it can no longer work: since we moved to a single binary .deb package for all Debian variants, we must share the same systemd unit file across all of them. To do so we need a separate file in /etc/systemd to define AmbientCapabilities; create the file while running the postinst script only if the distribution is not Debian 8, just like we do in .rpm. See #3344 See #3486 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190619064224.23035-1-syuu@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	46341bd63f	gdb: Print coordinator stats related to memory usage from 'scylla memory' Example: Coordinator: fg writes: 150 bg writes: 39980, 21429280 B fg reads: 0 bg reads: 0 hints: 0 B view hints: 0 B Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906745-17150-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	f7e79b07d1	lsa: Respect the reclamation step hint from seastar allocator This will allow us to reduce the amount of segment compaction when reclaiming on behalf of a large allocation because we'll evict much more up front. Tests: - unit (dev) Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906584-16770-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Tomasz Grabiec	c5184b3dd0	gdb: Print region_impl pointer from scylla lsa Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1559906684-17019-1-git-send-email-tgrabiec@scylladb.com>	2019-06-23 16:03:06 +03:00
Alexys Jacob	98bc9edf6f	thrift/: support version 0.11+ after THRIFT-2221 Thrift 0.11 changed to generate c++ code with std::shared_ptr instead of boost::shared_ptr. - https://issues.apache.org/jira/browse/THRIFT-2221 This was forcing scylla to stick with older versions of thrift. Fixes issue #3097. thrift: add type aliases to build with old and new versions. update to using namespace =	2019-06-23 16:03:06 +03:00
Takuya ASADA	e4320d6537	dist/debian: run 'systemctl daemon-reload' automatically on package install/uninstall Since we cannot use dh --with=systemd, because we don't want to automatically enable systemd units and instead manage them with our setup scripts, we have to run 'systemctl daemon-reload' manually. (With dh --with=systemd, the systemd helper automatically provides such scripts) Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190618000210.28972-1-syuu@scylladb.com>	2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola	8c067c26d9	Add support for the sanitize build mode in scylla Running tests in debug mode takes 25:22.08 in my machine. Using sanitize instead takes that down to 10:46.39. The mode is opt in, in that it must be explicitly selected with "configure.py --mode=sanitize" or "ninja sanitize". It must also be explicitly passed to test.py. Unfortunately building with asan, optimizations and debug info is very slow and there is nothing like -gline-tables-only in gcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190617170007.44117-1-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Benny Halevy	1fd91eb616	main: add logging for deferred stopping Increase visibility of init messages to help diagnose init and shutdown issues. Ref #4384 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Benny Halevy	cbbe5a519a	main: improve init logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Benny Halevy	e96b1afdbd	supervisor::notify log at info level rather than trace Increase visibility of init messages to help diagnose init and shutdown issues. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-06-20 13:04:36 +03:00
Tomasz Grabiec	fa2ed3ecce	sstable: mc: reader: Do not stop parsing across partitions Currently, parser and the consumer save its state and return the control to the caller, which then figures out that it needs to enter a new partition, and that it doesn't need to skip. We do it twice, after row end, and after row start. All this work could be avoided if the consumer installed by the reader adjusted its state and pushed the fragments on the spot. This patch achieves just that. This results in less CPU overhead. The ka/la reader is left still stopping after row end. Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe): perf_fast_forward -c1 -m1G --run-tests=small-partition-skips: Before: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.952372 4 1000000 1050009 755 1050765 1046585 976.0 971 124256 1 0 0 0 0 0 0 0 99.7% After: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 0.790178 4 1000000 1265538 1150 1266687 1263684 975.0 971 124256 2 0 0 0 0 0 0 0 99.6%	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	386079472a	sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader This state will be needed by the consumer to handle crossing partition boundaries on its own. While at it, document it.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	92cb07debd	sstables: reader: Simplify _single_partition_read checking The old code was making advance_to_next_partition() behave incorrectly when _single_partition_read, which was compensated by a check in read_partition(). Cleaner to exit early.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	7f4c041ba0	sstables: reader: Update stats from on_next_partition() After partition_start is emitted directly from the parser's consumer, read_partition() will not always be called for each produced partition.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	0964a8fb38	sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range() out_of_range() cannot change to true when the position falls into the ranges, we only need to check it when it falls outside them.	2019-06-19 14:29:02 +02:00
Tomasz Grabiec	556ccf4373	sstables: ka/la: reader make push_ready_fragments() safe to call many times Not a bug fix, just makes the implementation more robust against changes. Before this patch this might have resulted in partition_end being pushed many times.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	ef6edff673	sstables: mc: reader: Move out-of-range check out of push_ready_fragments() Currently, calling push_ready_fragments() with _mf_filter disengaged or with _mf_filter->out_of_range() causes it to call _reader->on_out_of_clustering_range(), which emits the partition_end fragment. It's incorrect to emit this fragment twice, or zero times, so correctness depends on the fact that push_ready_fragments() is called exactly once when transitioning between partitions. This is proved to be tricky to ensure, especially after partition_end starts to be emitted in a different path as well. Ensuring that push_ready_fragments() is NOT called after partition_end is emitted from consume_partition_end() becomes tricky. After having to fix this problem many times after unrelated changes to the flow, I decide that it's better to refactor. This change moves the call of on_out_of_clustering_range() out of push_ready_fragments(), making the latter safe to call any number of times. The _mf_filter->out_of_range() check is moved to sites which update the filter. It's also good because it gets rid of conditionals.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	552fe21812	sstables: reader: Return void from push_ready_fragments() The result is ignored, which is fine, so make it official to avoid confusion.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	1488b57933	sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range() The old name is confusing, because we're not always ending the stream when we call it.	2019-06-19 14:29:01 +02:00
Tomasz Grabiec	9b8ac5ecbc	sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end Currently, if there is a fragment in _ready and _out_of_range was set after row end was consumed, push_ready_fragments() would return without emitting partition_end. This is problematic once we make consume_row_start() emit partition_start directly, because we will want to assume that all fragments for the previous partition are emitted by then. If they're not, then we'd emit partition_start before partition_end for the previous partition. The fix is to make sure that push_ready_fragments() emits everything.	2019-06-19 14:14:38 +02:00
Piotr Sarna	b8cadc928c	tests: add test case for finishing index paging The test case makes sure that paging indexes does not result in an infinite loop. Refs #4569	2019-06-19 14:10:13 +02:00
Piotr Sarna	88f3ade16f	cql3: fix infinite paging for indexed queries Indexed queries need to translate between view table paging state and base table paging state, in order to be able to page the results correctly. One of the stages of this translation is overwriting the paging state obtained from the base query, in order to return view paging state to the user, so it can be used for fetching next pages. Unfortunately, in the original implementation the paging state was overwritten only if more pages were available, while if 'remaining' pages were equal to 0, nothing was done. This is not enough, because the paging state of the base query needs to be overwritten unconditionally - otherwise a guard paging state value of 'remaining == 0' is returned back to the client along with 'has_more_pages = true', which will result in an infinite loop. This patch correctly overwrites the base paging state unconditionally. Fixes #4569	2019-06-19 14:10:13 +02:00
Tomasz Grabiec	cd1ff1fe02	Merge "Use same schema version for repair nodes" from Asias This patch set fixes repair nodes using different schema version and optimizes the hashing thanks to the fact now all nodes uses same schema version. Fixes: #4549 * seastar-dev.git asias/repair_use_same_schema.v3: repair: Use the same schema version for repair master and followers repair: Hash column kind and id instead of column name and type name	2019-06-18 12:42:53 +02:00
Asias He	4285801af9	repair: Hash column kind and id instead of column name and type name It is guaranteed repair nodes use the same schema. It is faster to hash column kind and id. Changing the hashing of mutation fragment causes incompatibility with mixed clusters. Let's backport to the 3.1 release, which includes row level repair for the first time and is not released yet. Refs: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Asias He	3db136f81e	repair: Use the same schema version for repair master and followers Before this patch, repair master and followers use their own schema version at the point repair starts independently. The schemas can be different due to schema change. Repair uses the schema to serialize mutation_fragment and deserialize the mutation_fragment received from peer nodes. Using different schema versions to serialize and deserialize causes undefined behaviour. To fix, we use the schema the repair master decides for all the repair nodes involved. On top of this patch, we could do another step to make sure all nodes have the latest schema. But let's do it in a separate patch. Fixes: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Rafael Ávila de Espíndola	8672eddff2	Document the best practices for when to use asserts/exceptions/logs The intention is just to document what is currently done. If someone wants to propose changes, that can be done after the current practices have been documented. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190524135109.29436-1-espindola@scylladb.com>	2019-06-18 12:13:01 +03:00
Rafael Ávila de Espíndola	26c0814a88	Add test large collection warning This was already working, but we were not testing for it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190617181706.66490-1-espindola@scylladb.com>	2019-06-18 10:27:55 +02:00
Nadav Har'El	6aab1a61be	Fix deciding whether a query uses indexing The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary). Fixes #4539 Closes https://github.com/scylladb/scylla/pull/4544 Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla tests: add case for partition key index and filtering cql3: fix deciding if a query uses indexing	2019-06-18 01:01:14 +03:00
Takuya ASADA	7320c966bc	dist/common/scripts/scylla_setup: don't proceed with empty NIC name Currently the NIC selection prompt in scylla_setup just proceeds with setup when the user presses the Enter key at the prompt. The prompt should ask for the NIC name again until the user inputs a correct NIC name. Fixes #4517 Message-Id: <20190617124925.11559-1-syuu@scylladb.com>	2019-06-17 15:52:29 +03:00
Avi Kivity	938b74f47a	Merge "Fix gcc9 build" from Paweł " These patches fix remaining issues with gcc9 build, that involve a gcc9 bug, a gcc9 bug, and a stricter warning. Tests: unit(debug, dev, release). " * 'fix-gcc9-build' of https://github.com/pdziepak/scylla: dht/ring_position: silence complaints about uninitialised _token_bound xx_hasher: disable -Warray-bounds api/column_family: work around gcc9 bug in seastar::future<std::any>	2019-06-17 15:23:24 +03:00
Tomasz Grabiec	f798f724c8	frozen_mutation: Guard against unfreezing using wrong schema Currently, calling unfreeze() using the wrong version of the schema results in undefined behavior. That can cause hard-to-debug problems. Better to throw in such cases. Refs #4549. Tests: - unit (dev) Message-Id: <1560459022-23786-1-git-send-email-tgrabiec@scylladb.com>	2019-06-17 15:23:24 +03:00
Asias He	f32371727b	repair: Avoid copying position in to_repair_rows_list No need to make a copy because it is not used to construct repair_row any more since commit `9079790f85` (repair: Avoid writing row with same partition key and clustering key more than once). Use mf->position() instead. Refs: #4510 Backports: 3.1 Message-Id: <7b21edcc3368036b6357b5136314c0edc22ad4d2.1560753672.git.asias@scylladb.com>	2019-06-17 15:23:24 +03:00
Paweł Dziepak	483f66332b	dht/ring_position: silence complaints about uninitialised _token_bound	2019-06-17 13:11:20 +01:00
Paweł Dziepak	82b8450922	xx_hasher: disable -Warray-bounds In release mode gcc9 has a false positive warning about out of bound access in xxhash implementation: ./xxHash/xxhash.c:799:27: error: array subscript -3 is outside array bounds of ‘long unsigned int [1]’ [-Werror=array-bounds] This is solved by disabling -Warray-bounds in the xxhash code.	2019-06-17 13:09:54 +01:00
Paweł Dziepak	8a13d96203	api/column_family: work around gcc9 bug in seastar::future<std::any> There is a gcc9 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 that makes it impossible to pass std::any through a seastar::future<T>. Fortunately, there is only one user of seastar::future<std::any> in Scylla and it is not performance-critical. This patch avoids the gcc9 bug by using seastar::future<std::unique_ptr<std::any>>.	2019-06-17 13:06:28 +01:00
Glauber Costa	91b71a0b1a	do not allow multiple snapshot operations at the same time We saw a node crashing today with nodetool clearsnapshot being called. After investigation, the reason is that nodetool clearsnapshot was called at the same time a new snapshot was created with the same tag. nodetool clearsnapshot can't delete all files in the directory, because new files had by then been created in that directory, and crashes on I/O error. There are many problems with allowing those operations to proceed in parallel. Even if we fix the code not to crash and return an error on directory non-empty, the moment they do any amount of work in parallel the result of the operation becomes undefined. Some files in the snapshot may have been deleted by clear, for example, and a user may then not be able to properly restore from the backup if this snapshot was used to generate a backup. Moreover, although we could lock at the granularity of a keyspace or column family, I think we should use a big hammer here and lock the entire snapshot creation/deletion to avoid surprises (for example, if a user requests creation of a snapshot for all keyspaces, and another process requests clear of a single keyspace) Fixes #4554 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190614174438.9002-1-glauber@scylladb.com>	2019-06-16 10:30:13 +03:00
Rafael Ávila de Espíndola	44eb939aa6	Use the sanitizer flags from seastar In practice, we always want to use the same sanitizer flags with seastar and scylla. Seastar was already marking its sanitizer flags public, so what was missing was exporting the link flags via pkgconfig and dropping the duplicates from scylla. I am doing this after wasting some time editing the wrong file. This depends on the seastar patch to export the sanitizer flags in pkgconfig. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-06-16 09:21:10 +03:00
Takuya ASADA	f582a759ee	dist: merge /usr/lib/scylla to /opt/scylladb We used to use /opt/scylladb just for the Scylla build toolchain and dependency libraries, not for the Scylla main package. But since we merged the relocatable package, the Scylla main binary and dependency libraries are all located under /opt/scylladb, and only setup scripts remained in /usr/lib/scylla. It is strange to keep using both /usr/lib/<app name> and /opt/<app name>, so we should merge them into a single place. Message-Id: <20190614011038.17827-1-syuu@scylladb.com>	2019-06-14 21:03:36 +03:00
Piotr Jastrzebski	a41c9763a9	sstables: distinguish empty and missing cellpath Before this patch mc sstables writer was ignoring empty cellpaths. This is wrong behaviour because it is possible to have an empty key in a map. In such a case, our writer creates a wrong sstable that we can't read back. This is because a complex cell expects a cellpath for each simple cell it has. When the writer ignores an empty cellpath it writes nothing, and instead it should write a length of zero to the file so that we know there's an empty cellpath. Fixes #4533 Tests: unit(release) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>	2019-06-14 20:36:41 +03:00
Asias He	9079790f85	repair: Avoid writing row with same partition key and clustering key more than once Consider master: row(pk=1, ck=1, col=10) follower1: row(pk=1, ck=1, col=20) follower2: row(pk=1, ck=1, col=30) When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1, ck=1, col=30) from follower1 and follower2. Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1, col=30) to follower1, follower1 will write the row with the same pk=1, ck=1 twice, which violates uniqueness constraints. To fix, we apply the row with same pk and ck into the previous row. We only need this on repair follower because the rows can come from multiple nodes. While on repair master, we have a sstable writer per follower, so the rows fed into sstable writer can come from only a single node. Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test Fixes: #4510 Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>	2019-06-13 17:19:19 +02:00
Asias He	912ce53fc5	repair: Allow repair_row to initialize partially On repair follower node, only decorated_key_with_hash and the mutation_fragment inside repair_row are used in apply_rows() to apply the rows to disk. Allow repair_row to initialize partially and throw if the uninitialized member is accessed to be safe. Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>	2019-06-13 17:18:53 +02:00
Benny Halevy	2fd2713fda	conf: update conf/scylla.yaml default large data warning thresholds They are currently inconsistent with db/config.cc and missing compaction_large_cell_warning_threshold_mb Fixes #4551 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613133657.15370-1-bhalevy@scylladb.com>	2019-06-13 16:45:27 +03:00
Benny Halevy	4ad06c7eeb	tests/perf: provide random-seed option Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613114307.31038-2-bhalevy@scylladb.com>	2019-06-13 14:45:49 +03:00
Benny Halevy	43e4631e6a	tests: random-utils: use seastar::testing::local_random_engine To provide test reproducibility use the seastar local_random_engine. To reproduce a run, use the --random-seed command line option with the seed printed accordingly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613114307.31038-1-bhalevy@scylladb.com>	2019-06-13 14:45:48 +03:00
Benny Halevy	fe2d629e20	mutation_reader_test: test_multishard_combining_reader_reading_empty_table: fix non-atomic sharing of shards_touched It needs to be a std::vector<std::atomic<bool>> otherwise threads step on each other in shared memory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190613112359.21884-1-bhalevy@scylladb.com>	2019-06-13 14:44:43 +03:00
Piotr Sarna	2c2122e057	tests: add a test case for filtering clustering key The test case makes sure that clustering key restriction columns are fetched for filtering if they form a clustering key prefix, but not a primary key prefix (partition key columns are missing). Ref #4541 Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:56 +03:00
Piotr Sarna	c4b935780b	cql3: fix qualifying clustering key restrictions for filtering Clustering key restrictions can sometimes avoid filtering if they form a prefix, but that can happen only if the whole partition key is restricted as well. Ref #4541 Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:47 +03:00
Piotr Sarna	adeea0a022	cql3: fix fetching clustering key columns for filtering When a column is not present in the select clause, but used for filtering, it usually needs to be fetched from replicas. Sometimes it can be avoided, e.g. if primary key columns form a valid prefix - then, they will be optimized out before filtering itself. However, clustering key prefix can only be qualified for this optimization if the whole partition key is restricted - otherwise the clustering columns still need to be present for filtering. This commit also fixes tests in cql_query_test suite, because they now expect more values - columns fetched for filtering will be present as well (only internally, the clients receive only data they asked for). Fixes #4541 Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>	2019-06-13 10:38:37 +03:00
Glauber Costa	8a3fe3ac9b	debian: correctly relocate python scripts Relocation of python scripts mentions scylla-server in paths explicitly. It should use {{product}} instead. The current build is failing when {{product}} is different than scylla-server Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190613012518.28784-1-glauber@scylladb.com>	2019-06-13 09:39:36 +03:00
Takuya ASADA	b1226fb15a	dist/docker/redhat: change user of scylla services to 'scylla' On branch-3.1 / master, we are getting following error: ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999 ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999) It seems like owner verification of data directory fails because scylla-server process is running in root but data directory owned by scylla, so we should run services as scylla user. Fixes #4536 Message-Id: <20190611113142.23599-1-syuu@scylladb.com>	2019-06-12 20:29:06 +03:00
Takuya ASADA	60d8a99f05	dist/common/scripts/scylla_setup: verify system umask is acceptable for scylla-server To avoid 'Bad permmisons' error when user changed default umask, we need to verify system umask is acceptable for scylla-server. Fixes #4157 Message-Id: <20190612130343.6043-1-syuu@scylladb.com>	2019-06-12 20:29:06 +03:00
Avi Kivity	cac812661c	Update seastar submodule * seastar 253d6cb...ded50bd (14): > Only export sanitizer flags if used > perftune.py: use pyudev.Devices methods instead of deprecated pyudev.Device ones > Add a Sanitize build mode > Merge "perftune.py : new tuning modes" from Vlad > reactor: clarify how submit_to() destroys the function object > Export the sanitizer flags via pkgconfig > smp: Delete unprocessed work items > iotune: fixed finding mountpoint infinite loop > net: Fix dereferencing moved object > Always enable the exception scalability hack > Merge "Simple cleanups in future.hh" from Rafael > tests: introduce testing::local_random_engine > core/deleter: Fix abort when append() is called twice with a shared deleter > rpc stream: do not crash if a stream is used after eos	2019-06-12 20:28:48 +03:00
Asias He	b463d7039c	repair: Introduce get_combined_row_hash_response Currently, REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the repair_hash object. In the future, we will use set reconciliation algorithm to decode the full row hashes in working row buf. It is useful to return the number of rows inside working row buf in addition to the combined row hashes to make sure the decode is successful. It is also better to use a wrapper class for the verb response so we can extend the return values later more easily with IDL. Fixes #4526 Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>	2019-06-12 13:51:29 +03:00
Takuya ASADA	30414d9c23	dist/ami: install scylla debug symbols by default On AMI creation, install scylla-debuginfo by default. closes #4542 Message-Id: <20190612102355.21386-1-syuu@scylladb.com>	2019-06-12 13:49:46 +03:00
Eliran Sinvani	2b44d8ed42	cql: Allow user manipulation queries to use cql keywords for a name This commit allows the CREATE/DROP/ALTER USER cql queries to use cql keywords for the user name (for example "empty"). Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190612104301.8322-1-eliransin@scylladb.com>	2019-06-12 13:48:10 +03:00
Dejan Mircevski	a52a56bfc0	utils: Add like_matcher A utility for matching text with LIKE patterns, and a battery of tests. Tests: unit(dev,debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-06-12 13:14:53 +03:00
Piotr Sarna	7b2de7ac5b	tests: add case for partition key index and filtering The test ensures that partition key index does not influence filtering decisions for regular columns. Ref #4539	2019-06-12 11:53:02 +02:00
Rafael Ávila de Espíndola	bf87b7e1df	logalloc: Use asan to poison free areas With this patch, when using asan, we poison segment memory that has been allocated from the system but should not be accessible to user code. Should help with debugging use-after-free bugs. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190607140313.5988-1-espindola@scylladb.com>	2019-06-12 11:46:45 +02:00
Piotr Sarna	adc51e57c1	cql3: fix deciding if a query uses indexing The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary). Fixes #4539	2019-06-12 11:44:16 +02:00
Raphael S. Carvalho	62aa0ea3fa	sstables: fix log of failure on large data entry deletion by fixing use-after-move Fixes #4532. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>	2019-06-12 10:55:46 +03:00
Juliana Oliveira	43f92ae6d5	cql: functions: add min/max/count for boolean type Explicitly add min/max/count functions and tests for boolean type. Tests: unit (release) Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20190612015215.GA2618@shenzou.localdomain>	2019-06-12 10:11:08 +03:00
Benny Halevy	3ad005ba17	build-ami: fix branch detection failure when not in git tree Introduced in `513d01d53e` The script is trying to determine the branch to shallow clone when an rpm is missing and has to be built. This functionality in the current implementation assumes it is being run inside a git repository, but that need not be the case if the script is triggered after local rpms were placed in the local directory. This happens when putting all necessary rpm files in: dist/ami/files And then running: dist/ami/build_ami.sh --localrpm The dist/ami/ and dist/ami/files are the only ones required for this action so querying the git repository in that situation makes no sense. Fixes #4535 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190611112455.13862-1-bhalevy@scylladb.com>	2019-06-11 19:08:02 +03:00
Piotr Sarna	1a5e5433bf	cql3: make add_restriction helper functions public In order to allow building statement restrictions manually instead of providing WHERE clause from CQL layer, helper functions that add single restrictions are made public. Message-Id: <31fa23a5e5ef927128f23b9fcb3362a2582d86bb.1560237237.git.sarna@scylladb.com>	2019-06-11 16:01:35 +03:00
Tomasz Grabiec	8c4baab81e	Merge "view: ignore duplicated key entries in progress virtual reader" from Piotr S. Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from tgrabiec@scylladb.com	2019-06-11 13:55:25 +02:00
Piotr Sarna	85a3a4b458	view: ignore duplicated key entries in progress virtual reader Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>	2019-06-11 13:01:31 +02:00
Nadav Har'El	5ef928a63d	coding-style.md: mention "using namespace seastar" All Scylla code is written with "using namespace seastar", i.e., no "seastar::" prefix for Seastar symbols. Document this in the coding style. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190610203948.18075-1-nyh@scylladb.com>	2019-06-11 10:39:03 +03:00
Calle Wilund	26702612f3	api.hh: Fix bool parsing in req_param Fixes #4525 req_param uses boost::lexical_cast to convert text->var. However, lexical_cast does not handle textual booleans, thus param=true causes not only wrong values, but exceptions. Message-Id: <20190610140511.15478-1-calle@scylladb.com>	2019-06-10 17:11:47 +03:00
Gleb Natapov	9213d56a06	storage_proxy: align background and foreground repair metric names One is plural another is not, make them all plural. Message-Id: <20190605135940.GI25001@scylladb.com>	2019-06-10 11:34:36 +03:00
Benny Halevy	2017de9387	build-ami: delete extra parenthesis in branch_arg calculation Fixing a typo Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190610062113.5604-1-bhalevy@scylladb.com>	2019-06-10 11:29:44 +03:00
Avi Kivity	591d2968cc	storage_proxy: limit resources consumed in cross-shard operations Currently, each shard protects itself by not reading from rpc and the native transport if in-flight requests consume too much memory for that shard. However, if all shards then forward their requests to some other shard, then that shard can easily run out of memory since its load can be multiplied by the number of shards that send it requests. To protect against this, use the new Seastar smp_service_group infrastructure. We create three groups: read, write, and write ack (the latter is needed to avoid ABBA deadlocks if shard A exhausts all its resources sending writes to shard B, and shard B simultaneously does the same; neither will be able to send acknowledgements, so if the writes are throttled, they will never be unthrottled until a timeout occurs). Range scans are not addressed by this patch since they are handled by multishard_mutation_query, which has its own complex cross-shard communication scheme, but they could use a similar solution. Ref #1105 (missing range scan protection) Tests: unit (dev) Message-Id: <20190512142243.17795-1-avi@scylladb.com>	2019-06-07 10:53:23 +02:00
Vlad Zolotarov	20a610f6bc	fix_system_distributed_tables.py: declare the 'port' argument as 'int' If a port value is passed as a string, cluster.connect() fails with Python3.4. Let's fix this by explicitly declaring the 'port' argument as 'int'. Fixes #4527 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190606133321.28225-1-vladz@scylladb.com>	2019-06-06 20:19:57 +03:00
Benny Halevy	c188f838bc	build-ami: use ssh git URLs Rather than https, for cert-based passwordless access. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190606133648.15877-2-bhalevy@scylladb.com>	2019-06-06 20:02:13 +03:00
Benny Halevy	513d01d53e	build-ami: use current git branch for shallow-clone of other repos We want to use the same branch on the other repos build-ami needs as the one we're building for. Automatically find the current branch using the `git branch` command. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190606133648.15877-1-bhalevy@scylladb.com>	2019-06-06 20:02:13 +03:00
Juliana Oliveira	fd83f61556	Add a warning for partitions with too many rows This patch adds a warning option to the user for situations where rows count may get bigger than initially designed. Through the warning, users can be aware of possible data modeling problems. The threshold is initially set to '100,000'. Tests: unit (dev) Message-Id: <20190528075612.GA24671@shenzou.localdomain>	2019-06-06 19:48:57 +03:00
Piotr Sarna	74f6ab7599	db: drop unnecessary double computation when feeding hash When feeding hash for schema digest, compact_for_schema_digest is mistakenly called twice, which may result in needless recomputation. Message-Id: <8f52201cf428a55e7057d8438025275023eb9288.1559826555.git.sarna@scylladb.com>	2019-06-06 16:16:47 +03:00
Rafael Ávila de Espíndola	b3adabda2d	Reduce logalloc differences between debug and release A lot of code in scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR is not defined. In particular, refill_emergency_reserve in the default allocator case is empty, but in the seastar allocator case it compacts segments. I am trying to debug a crash that seems to involve memory corruption around the lsa allocator, and being able to use a debug build for that would be awesome. This patch reduces the differences between the two cases by having a common segment_pool that defers only a few operations to different segment_store implementations. Tests: unit (debug, dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190606020937.118205-1-espindola@scylladb.com>	2019-06-06 12:55:56 +03:00
Nadav Har'El	95bab04cf9	docs/metrics.md: "instance" label no longer comes from Scylla Prometheus needs to remember which "instance" (node) each measurement came from. But it doesn't actually need Scylla to tell it the instance name - it knows which node it got each measurement from. After Seastar commit `79281ef287` which fixed Seastar issue https://github.com/scylladb/seastar/issues/477, the "instance" label on measurements no longer comes from Scylla but rather is added by Prometheus. This patch corrects the documentation to explain the current situation, instead of incorrectly saying that Scylla adds the "instance" label itself. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190602074629.14336-1-nyh@scylladb.com>	2019-06-06 12:42:30 +03:00
Piotr Sarna	f50f418066	types: isolate deserializing iterator to separate file In order to be used outside types.cc, listlike deserializing iterator is moved to a separate header. Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>	2019-06-05 17:46:51 +03:00
Pekka Enberg	eb00095bca	relocate_python_scripts.py: Fix node-exporter install on Debian variants The relocatable Python is built from Fedora packages. Unfortunately TLS certificates are in a different location on Debian variants, which causes "node_exporter_install" to fail as follows: Traceback (most recent call last): File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module> data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True) File "/usr/lib/scylla/scylla_util.py", line 40, in curl with urllib.request.urlopen(req) as res: File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen return opener.open(url, data, timeout) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open response = self._open(req, data) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open '_open', req) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain result = func(*args) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open context=self._context, check_hostname=self._check_hostname) File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)> Unable to retrieve version information node exporter setup failed. Fix the problem by overriding the SSL_CERT_FILE environment variable to point to the correct location of the TLS bundle. Message-Id: <20190604175434.24534-1-penberg@scylladb.com>	2019-06-04 21:12:21 +03:00
Piotr Sarna	b3396dbb57	types: migrate to_json_string to use bytes view The to_json_string utility implementation was based on const references instead of views, which can be a source of unnecessary memory copying. This patch migrates all to_json_string to use bytes_view and leaves the const reference version as a thin wrapper. Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>	2019-06-04 19:17:46 +03:00
Avi Kivity	06d77aa548	Merge "Introduce queue reader" from Botond " Technically queue_reader already exists, however so far it was a private utility in `multishard_writer.cc`. This mini-series makes it public and generally useful. The interface is made safer and simpler and the implementation is improved so it doesn't have two separate buffers. Also, unit tests are added. Tests: mutation_reader_test:debug/test_queue_reader, multishard_writer_test:debug " * 'queue_reader/v2' of https://github.com/denesb/scylla: queue_reader: use the reader's buffer as the queue Make queue_reader public	2019-06-04 13:46:15 +03:00
Botond Dénes	2ccd8ee47c	queue_reader: use the reader's buffer as the queue The queue reader currently uses two buffers, a `_queue` that the producer pushes fragments into and its internal `_buffer` where these fragments eventually end up being served to the consumer from. This double buffering is not necessary. Change the reader to allow the producer to push fragments directly into the internal `_buffer`. This complicates the code a little bit, as the producer logic of `seastar::queue` has to be folded into the queue reader. On the other hand this introduces proper memory consumption management, as well as reduces the amount of consumed memory and eliminates the possibility of outside code mangling with the queue. Another big advantage of the change is that there is now an explicit way to communicate the EOS condition, no need to push a disengaged `mutation_fragment_opt`. The producer of the queue reader now pushes the fragments into the reader via an opaque `queue_reader_handle` object, which has the producer methods of `seastar::queue`. Existing users of queue readers are refactored to use the new interface. Since the code is more complex now, unit tests are added as well.	2019-06-04 13:39:26 +03:00
Glauber Costa	cbaea172cd	python3: add the cassandra driver to the relocatable package We have a script in tree that fixes the schema for distributed system tables, like tracing, should they change their schema. We use it all the time but unfortunately it is not distributed with the scylla package, which makes using it harder (we want to do this in the server, but consistent updates will take a while). One of the problems with the script today that makes distributing it harder is that it uses the python3 cassandra driver, which we don't want to have as a server dependency. But now with the relocatable packages in place there is no reason not to just add it. [avi: adjust tools/toolchain/image to point to a new image with python3-cassandra-driver] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190603162447.24215-1-glauber@scylladb.com>	2019-06-03 19:34:55 +03:00
Konstantin Osipov	29c27bfc28	storage_proxy: remove unnecessary lambdas in metrics binding Remove unnecessary lambdas when binding metrics of the storage proxy. Message-Id: <20190603133753.1724-1-kostja@scylladb.com>	2019-06-03 16:55:16 +03:00
Botond Dénes	a597e46792	Make queue_reader public Extract it from `multishard_writer.cc` and move it to `mutation_reader.{hh,cc}` so other code can start using it too.	2019-06-03 12:08:37 +03:00
Takuya ASADA	25112408a7	dist/debian: support relocatable python3 on Debian variants Unlike CentOS, Debian variants have a python3 package in the official repository, so we don't have to use relocatable python3 on these distributions. However, the official python3 version differs between distributions, which may cause issues. Also, our scripts and packaging implementation increasingly presuppose the existence of relocatable python3, which causes issues on Debian variants. Switching to relocatable python3 on Debian variants avoids these issues and makes it easier to manage Scylla python3 environments across multiple distributions. Fixes #4495 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190531112707.20082-1-syuu@scylladb.com>	2019-06-02 14:59:43 +03:00
Raphael S. Carvalho	f360d5a936	sstables: export output operator for sstable run It wasn't being exported in any header. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190527182246.19007-1-raphaelsc@scylladb.com>	2019-06-02 10:25:51 +03:00
Avi Kivity	7a0c6cd583	Revert "dist/debian: support relocatable python3 on Debian variants" This reverts commit `4d119cbd6d`. It breaks build_deb.sh: 18:39:56 + seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py 18:39:56 Traceback (most recent call last): 18:39:56 File "./relocate_python_scripts.py", line 116, in <module> 18:39:56 fixup_scripts(archive, args.scripts) 18:39:56 File "./relocate_python_scripts.py", line 104, in fixup_scripts 18:39:56 fixup_script(output, script) 18:39:56 File "./relocate_python_scripts.py", line 79, in fixup_script 18:39:56 orig_stat = os.stat(script) 18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+' 18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1	2019-05-29 13:58:41 +03:00
Konstantin Osipov	fcd52d6187	Update README.md with more recent build instructions on Ubuntu Building on Ubuntu 18 or 19 following the current build instructions doesn't work. Add information about a few pitfalls. Switch README.md to recommending dbuild and move the details to HACKING.md. Message-Id: <20190520152738.GA15198@atlas>	2019-05-29 12:26:12 +03:00
Takuya ASADA	4d119cbd6d	dist/debian: support relocatable python3 on Debian variants Unlike CentOS, Debian variants have a python3 package in the official repository, so we don't have to use relocatable python3 on these distributions. However, the official python3 version differs between distributions, which may cause issues. Also, our scripts and packaging implementation increasingly presuppose the existence of relocatable python3, which causes issues on Debian variants. Switching to relocatable python3 on Debian variants avoids these issues and makes it easier to manage Scylla python3 environments across multiple distributions. Fixes #4495 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190526105138.677-1-syuu@scylladb.com>	2019-05-26 13:56:30 +03:00
Glauber Costa	71c4375a66	scylla_io_setup: adjust values for i3en instances Apparently we are having some issues running iotune in the i3en instances, as the values do not always make sense. We believe it is something that XFS is doing, and running fio directly on the device (no filesystem) provides more meaningful results (more in line with AWS published expected values). For now, let's use fio instead. In this patch I have run fio for our 4 dimensions in each of the three types of disks (large, xlarge, 3xlarge). Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190524111454.27956-1-glauber@scylladb.com>	2019-05-24 19:37:58 +03:00
Avi Kivity	53dfaf9121	Update seastar submodule * seastar 5cb1234b0...253d6cb69 (3): > reactor: disable nowait aio again > Merge "Restructure `timer` implementations to avoid circular dependencies" from Jesse > Fix build command in building-docker.md	2019-05-24 14:33:05 +03:00
Raphael S. Carvalho	cabeb12b4e	sstables: add output operator for sstable run The output will look as follows: Run = { Identifier: 647044fd-d3d4-43c4-b014-b546943ead0d Fragments = { 1471=-9223317893235177836:-7063220874380325121 1478=5924386327138804918:8070482595977135657 1472=-7063202587832032132:-4903425074566642766 1473=-4903298949436784325:-2739716797579745183 1474=-2739703419744073436:-589328117804966275 1477=3734534455848060136:5924372906965333873 1476=1579822226461317527:3734518878340722529 1475=-589322393539097068:1579813857236466583 1479=8070499046054048682:9223317594733741806 } } Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190524043331.5093-1-raphaelsc@scylladb.com>	2019-05-24 08:36:08 +03:00
Paweł Dziepak	899ebe483a	Merge "Fix empty counters handling in MC" from Piotr " Before this patchset empty counters were incorrectly persisted for MC format. No value was written to disk for them. The correct way is to still write a header that informs the counter is empty. We also need to make sure that reading wrongly persisted empty counters works because customers may have sstables with wrongly persisted empty counters. Fixes #4363 " * 'haaawk/4363/v3' of github.com:scylladb/seastar-dev: sstables: add test for empty counters docs: add CorrectEmptyCounters to sstable-scylla-format sstables: Add a feature for empty counters in Scylla.db. sstables: Write header for empty counters sstables: Remove unused variables in make_counter_cell sstables: Handle empty counter value in read path	2019-05-23 13:05:53 +01:00
Piotr Jastrzebski	fdbf4f6f53	sstables: add test for empty counters Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	e91e1a1dde	docs: add CorrectEmptyCounters to sstable-scylla-format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	a962696e44	sstables: Add a feature for empty counters in Scylla.db. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:24 +02:00
Piotr Jastrzebski	b35030ae7e	sstables: Write header for empty counters When storing an empty counter we should still write its header that indicates the emptiness. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-23 10:10:08 +02:00
Amnon Heiman	f3b6c5fe2f	API: storage_proxy add CAS and View endpoints Some nodetool commands in 3.0 use the CAS and View metrics. CAS is not implemented and we don't have all the metrics for View but we still don't want those nodetool commands to fail. After this patch the following would work and will return empty: curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_read/moving_average_histogram' curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/view_write/moving_average_histogram' curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_write/moving_average_histogram' This patch is needed for #4416 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190521141235.20856-1-amnon@scylladb.com>	2019-05-22 14:25:17 +03:00
Avi Kivity	698f52d257	Merge "tests: Replace ad-hoc cql utilities with general ones" from Dejan " One local utility function in cql_query_test.cc duplicates an existing exception_predicate member. Another can be generalized for wider use in the future. This patch accomplishes both, retiring a to-do item. Tests: unit (dev) " * 'use-utils-predicate-in-cql_test' of https://github.com/dekimir/scylla: tests/cql: Replace equery() with cquery_nofail() tests: Add cquery_nofail() utility tests: Drop redundant function	2019-05-22 10:09:12 +03:00
Dejan Mircevski	09acb32d35	tests/cql: Replace equery() with cquery_nofail() Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:38:09 -04:00
Dejan Mircevski	a9849ecba7	tests: Add cquery_nofail() utility Most tests await the result of cql_test_env::execute_cql(). Most would also benefit from reporting errors with top-level location included. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:28:14 -04:00
Dejan Mircevski	1d8bfc4173	tests: Drop redundant function make_predicate_for_exception_message_fragment() is redundant now that exception_utils has landed. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-21 23:28:14 -04:00
Avi Kivity	d481521a2e	Update seastar submodule * seastar 3f7a5e1...5cb1234 (5): > build: Help Seastar to find Boost on Fedora 30 > Merge 'Reinstate nowait aio support' from Avi > Fix documentation link in README.md > sharded: add variants to invoke_on() that accept an smp_service_group > improve error message on AIO setup failure	2019-05-21 20:15:09 +03:00
Benny Halevy	fae4ca756c	cql3: select_statement: provide default initializer for parameters::_bypass_cache Fixes #4503 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>	2019-05-21 20:06:40 +03:00
Piotr Jastrzebski	a6484b28a1	sstables: Remove unused variables in make_counter_cell Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-21 12:07:31 +02:00
Piotr Jastrzebski	f711cce024	sstables: Handle empty counter value in read path Due to a bug in an sstable writer, empty counters were stored without a header. The correct way of storing an empty counter is to still write a header that indicates the emptiness. The next patch in this series fixes the write path, but we have to make sure that we handle incorrectly serialized counters in the read path because there may exist sstables with counters stored without a header. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-05-21 12:07:12 +02:00
Takuya ASADA	a55330a10b	dist/ami: output scylla version information to AMI tags and description Users may want to know which version of packages are used for the AMI, it's good to have it on AMI tags and description. To do this, we need to download .rpm from specified .repo, extract version information from .rpm. Fixes #4499 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190520123924.14060-2-syuu@scylladb.com>	2019-05-20 15:46:06 +03:00
Takuya ASADA	abe44c28c5	dist/ami: build scylla-python3 when specified --localrpm Since we switched to relocatable python3, we need to build it for AMI too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190520123924.14060-1-syuu@scylladb.com>	2019-05-20 15:46:05 +03:00
Konstantin Osipov	25087536bc	main: developer-mode configuration option uses dash, not underscore Message-Id: <20190520115524.101871-1-kostja@scylladb.com>	2019-05-20 15:14:11 +03:00
Calle Wilund	1e37e1d40c	commitlog: Add optional use of O_DSYNC mode Refs #3929 Optionally enables O_DSYNC mode for segment files, and when enabled ignores actual flushing and just barriers any ongoing writes. Iff using O_DSYNC mode, we will not only truncate the file to max size, but also do an actual initial write of zero:s to it, since XFS (intended target) has observably less good behaviour on non-physical file blocks. Once written (and maybe recycled) we should have rather satisfying throughput on writes. Note that the O_DSYNC behaviour is hidden behind a default disabled option. While user should probably seldom worry about this, we should add some sort of logic in main/init that unless specified by user, evaluates the commitlog disk and sets this to true if it is using XFS and looks ok. This is because using O_DSYNC on things like EXT4 etc has quite horrible performance. All above statements about performance and O_DSYNC behaviour are based on a sampling of benchmark results (modified fsqual) on a statistically non-significant selection of disks. However, at least there the observed behaviour is a rather large difference between ::fallocate:ed disk area vs. actually written using O_DSYNC on XFS, and O_DSYNC on EXT4. Note also that measurements on O_DSYNC vs. no O_DSYNC do not take into account the wall-clock time of doing manual disk flush. This is intentionally ignored, since in the commitlog case, at least using periodic mode, flushes are relatively rare. Message-Id: <20190520120331.10229-1-calle@scylladb.com>	2019-05-20 15:10:48 +03:00
Avi Kivity	d92973ba86	Merge "scylla-gdb.py: scylla_fiber: add fallback mode" from Botond " Add a fallback-mode that can be used when the `scylla ptr` cannot be used, either because the application is not built with the seastar allocator, or due to bugs. The fallback mode relies on a more primitive method for determining how much memory to scan looking for task pointers inside the task object. This mode, being more primitive, is less prone to errors, but is more wasteful and less precise. " * 'scylla-fiber-fallback-mode/v2' of https://github.com/denesb/scylla: scylla-gdb.py: scylla_fiber: add fallback mode scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used() scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers scylla-gdb.py: scylla_fiber: fix misaligned text in docstring	2019-05-19 18:34:55 +03:00
Takuya ASADA	4b08a3f906	reloc/python3: add license files on relocatable python3 package It's better to have license files on our python3 distribution. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190516094329.13273-1-syuu@scylladb.com>	2019-05-19 18:30:19 +03:00
Jesse Haber-Kucharsky	68353a8265	build: Don't build `iotune` unconditionally We compile Seastar unconditionally so that changes to Seastar files are reflected in Scylla when it's built. We don't need to unconditionally build `iotune` in the same way. `iotune` is still listed as a build artifact, so it will be built if `ninja` is invoked without a particular target. However, building a specific target (like `ninja build/dev/scylla`) will not build `iotune`. Fixes #4165 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <9fb96a281580a8743e04d5dd11398be53960cb58.1558100815.git.jhaberku@scylladb.com>	2019-05-19 18:24:05 +03:00
Avi Kivity	5a276d44af	Merge "row_cache: Make invalidate() preemptible" from Tomasz " This patchset fixes reactor stalls caused by cache invalidation not being preemptible. This becomes a problem when there is a lot of partitions in cache inside the invalidated range. This affects high-level operations like nodetool refresh, table truncation, repair and streaming. Fixes #2683 The improvement on stalls was measured using tests/perf_row_cache_update: Before: Small partitions, no overwrites: invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]} Small partition with a few rows: invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]} Large partition, lots of small rows: invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]} After: Small partitions, no overwrites: invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]} Small partition with a few rows: invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]} Large partition, lots of small rows: invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]} The maximum scheduling latency went down form 339 ms to 0.5 ms (task quota). Tests: - unit (dev) " * tag 'cache-preemptible-invalidation-v2' of github.com:tgrabiec/scylla: row_cache: Make invalidate() preemptible row_cache: Switch _prev_snapshot_pos to be a ring_position_ext dht: Introduce ring_position_ext dht: ring_position_view: Take key by const pointer tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion tests: perf_row_cache_update: Report stalls around invalidation	2019-05-19 10:47:46 +03:00
Takuya ASADA	f625284113	dist/debian: apply product name variable on override_dh_auto_install To make product name templatization work correctly, we cannot use "debian/scylla-server" as the package contents directory path; we need to use a template like "debian/{{product}}-server" instead. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190517121946.18248-1-syuu@scylladb.com>	2019-05-19 10:46:08 +03:00
Gleb Natapov	31bf4cfb5e	cache_hitrate_calculator: make cache hitrate calculation preemptable The calculation is done in a non-preemptable loop over all tables, so if the number of tables is very large it may take a while, since we also build a string for gossiper state. Make the loop preemptable and also make the string calculation more efficient by preallocating memory for it. Message-Id: <20190516132748.6469-3-gleb@scylladb.com>	2019-05-16 15:32:36 +02:00
Gleb Natapov	4517c56a57	cache_hitrate_calculator: do not copy stats map for each cpu invoke_on_all() copies provided function for each shard it is executed on, so by moving stats map into the capture we copy it for each shard too. Avoid it by putting it into the top level object which is already captured by reference. Message-Id: <20190516132748.6469-2-gleb@scylladb.com>	2019-05-16 15:32:24 +02:00
Dejan Mircevski	8dcb35913a	table: Avoid needless allocation of cell lockers All `table` instances currently unconditionally allocate a cell locker for counter cells, though not all need one. Since the lockers occupy quite a bit of memory (as reported in #4441), it's wasteful to allocate them when unneeded. Fixes #4441. Tests: unit (dev, debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190515190910.87931-1-dejan@scylladb.com>	2019-05-16 11:10:38 +03:00
Avi Kivity	5b2c8847c7	Merge "Pre timestamp based data segregation cleanup" from Botond " This series contains loosely related generic cleanup patches that the timestamp based data segregation series depends on. Most of the patches have to do with making headers self-sustainable, that is compilable on their own. This was needed to be able to ensure that the new headers introduced or touched by that series are self-sustainable too. This series also introduces `schema_fwd.hh` which contains a forward declaration of `schema` and `schema_ptr` classes. No effort was made to find and replace all existing ad-hoc schema forward declarations in the source tree. " * 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla: encoding_stats.hh: add missing include sstables/time_window_compaction_strategy.hh: make self-sufficient sstables/size_tiered_compaction_strategy.hh: make self-sufficient sstables/compaction_strategy_impl.hh: make header self-sufficient compaction_strategy.hh: use schema_fwd.hh db/extensions.hh: use schema_fwd.hh Add schema_fwd.hh	2019-05-15 17:37:06 +03:00
Asias He	51c4f8cc47	repair: Fix use after free in remove_repair_meta for repair_metas We should capture repair_metas so that it will not be freed until the parallel_for_each is finished. Fixes: #4333 Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>	2019-05-15 17:22:51 +03:00
Calle Wilund	e7003f1051	sstable: Make all sstable components subject to file extensions Makes opening all sstable components go through same file open routine, optionally applying extensions to each (except TOC which is special). Also ensures we read Scylla metadata before other non-TOC components, as we might need this for extensions (hint hint). Message-Id: <20190513201821.14417-1-calle@scylladb.com>	2019-05-15 17:14:58 +03:00
Botond Dénes	a0010f52c5	scylla-gdb.py: scylla_fiber: add fallback mode The current implementation of the `scylla fiber` command relies on the `scylla ptr` command to provide metadata on pointers, more specifically the boundaries of the region the object they point to occupies. However, in debug mode, seastar is using the standard allocator and thus the `scylla ptr` command doesn't work. To work around this, provide a fallback mode for debug builds. This mode assumes pointers point to the start of objects and scans a configurable region of memory. While less exact than the variant relying on `scylla ptr` it still works reasonably well. The size of the to-be-scanned memory region can be set using the `--scanned-region-size` command line argument. This defaults to 512. Additionally, add a flag (`--force-fallback-mode`) to force using the fallback mode. This is useful if `scylla ptr` is not working for any reason.	2019-05-15 15:46:42 +03:00
Botond Dénes	c78d667153	scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used() Determines whether the application is using the seastar allocator or not. This is done by attempting to resolve the `seastar::memory::cpu_mem` symbol. To avoid the expensive symbol lookup the result is cached. This means that loading a new inferior will possibly return the wrong value. The cache can be flushed by re-sourcing the `scylla-gdb.py` script.	2019-05-15 15:44:38 +03:00
Botond Dénes	c3a06da8fb	scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers	2019-05-15 15:43:34 +03:00
Botond Dénes	4964671e83	scylla-gdb.py: scylla_fiber: fix misaligned text in docstring	2019-05-15 15:43:29 +03:00
Avi Kivity	8e19121e98	Merge "Implement simple selection alongside aggregation" from Dejan " Although CQL allows SELECT statements with both simple and aggregate selectors, Scylla disallows them. This patch removes that restriction and ensures that mixed simple/aggregate selection works as specified both with and without GROUP BY. Tests: unit (dev) " * 'aggregate-and-simple-select-together' of https://github.com/dekimir/scylla: cql: Fix mixed selection with GROUP BY cql: Allow mixing of aggregate and simple selectors	2019-05-14 20:03:58 +03:00
Dejan Mircevski	f9b00a4318	cql: Fix mixed selection with GROUP BY GROUP BY is currently supported by simple_selection, the class used when all selectors are simple. But when selectors are mixed, we use selection_with_processing, which does not yet support GROUP BY. This patch fixes that. It also adapts one testcase in filtering_test to the new behavior of simple_selector. The test currently expects the last value seen, but simple_selector now outputs the first value seen. (More details: the WHERE clause implicitly selects the columns it references, and unit tests are forced to provide expected values for these columns. The user-visible result is unchanged in the test; users never see the WHERE column values due to filtering in cql::transport, outside unit tests.) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-14 12:50:39 -04:00
Dejan Mircevski	06e3b36164	cql: Allow mixing of aggregate and simple selectors Scylla currently rejects SELECT statements with both simple and aggregate selectors, but Cassandra allows them. This patch brings parity to Scylla. Fixes #4447. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-14 10:34:02 -04:00
Botond Dénes	fe3b798b51	scylla-gdb.py: scylla fiber: add seastar::smp_message_queue::async_work_item to the whitelist Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4c49fcf5391e027eae68707c9e6ab2f9188c2ea4.1557838171.git.bdenes@scylladb.com>	2019-05-14 17:09:32 +03:00
Avi Kivity	82b91c1511	Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz " Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. Refs #4485. " * tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla: tests: Add test which verifies that schema digest stays the same tests: Add sstables for the schema digest test schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition db/schema_tables: Move feed_hash_for_schema_digest() to .cc file hashing: Introduce type-erased interface for the hasher hashing: Introduce C++ concept for the hasher hashers: Rename hasher to cryptopp_hasher gc_clock: Fix hashing to be backwards-compatible	2019-05-14 16:59:50 +03:00
Tomasz Grabiec	285ada5035	Merge "config: remove _make_config_values macro" from Avi The _make_config_values macro reduces duplication (both the item name and the types need to be available as C++ identifiers and as runtime strings), but is hard to work with. The macro is huge and editors don't handle it well, errors aren't identified at the correct location, and since the macro doesn't have types, it's hard to refactor. This series replaces the macro with ordinary C++ code. Some repetition is introduced, but IMO the result is easier to maintain than the macro. As a bonus the bulk of the code is moved away from the header file. Tests: unit (dev), manual testing of the config REST API * https://github.com/avikivity/scylla config-no-macro/v2 config: make the named_value type name available without requiring _make_config_values config: remove value_status from named_value template parameter list config: add named_value::value_as_json() api: config: stop using _make_config_values config: auto-add named_values into config_file config: add allowed_values parameter to named_value constructor config: convert _make_config_values to individual named_value member declarations and initializers	2019-05-14 16:00:23 +03:00
Avi Kivity	987739898f	docs: document SSTable Scylla.db component Document the format and meaning of the various bits of the Scylla.db component. Message-Id: <20190513081605.7394-1-avi@scylladb.com>	2019-05-14 16:00:23 +03:00
Avi Kivity	786ce70dfc	doc: mention the Slack workspace as a place to get help Message-Id: <20190514090420.5598-1-avi@scylladb.com>	2019-05-14 16:00:23 +03:00
Botond Dénes	c2ec78358b	encoding_stats.hh: add missing include	2019-05-14 13:27:30 +03:00
Botond Dénes	eeacf45b4a	sstables/time_window_compaction_strategy.hh: make self-sufficient	2019-05-14 13:27:30 +03:00
Botond Dénes	9953cecc83	sstables/size_tiered_compaction_strategy.hh: make self-sufficient	2019-05-14 13:27:30 +03:00
Botond Dénes	d02c2253a5	sstables/compaction_strategy_impl.hh: make header self-sufficient Add missing includes and forward declarations. De-inline some methods.	2019-05-14 13:27:30 +03:00
Botond Dénes	20d9d18ab3	compaction_strategy.hh: use schema_fwd.hh	2019-05-14 13:27:30 +03:00
Botond Dénes	690ef09b8f	db/extensions.hh: use schema_fwd.hh	2019-05-14 13:27:30 +03:00
Botond Dénes	48bf1d5629	Add schema_fwd.hh	2019-05-14 13:27:30 +03:00
Tomasz Grabiec	6159d5522d	tests: Add test which verifies that schema digest stays the same (cherry picked from commit `8019634dba`)	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	815295547d	tests: Add sstables for the schema digest test Generated by running test_schema_digest_does_not_change with regenerate set to true. (cherry picked from commit `1f2995c8c5`)	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	9de071d214	schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. That's not a problem during normal cluster operation because the tombstones will expire at all nodes at the same time, and schema digest, although it changes, will change to the same value on all nodes at about the same time. This fix changes digest calculation to not feed any digest for partitions which are empty after compaction. The digest returned by schema_mutations::digest() is left unchanged by this patch. It affects the table schema version calculation. It's not changed because the version is calculated on boot, where we don't yet know all the cluster features. It's possible to fix this but it's more complicated, so this patch defers that. Refs #4485.	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	3a4a903674	db/schema_tables: Move feed_hash_for_schema_digest() to .cc file	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	b0eecdcb8f	hashing: Introduce type-erased interface for the hasher The motivation is to allow hiding the definition of functions accepting a hasher. For one, this reduces (re)compilation times, because we can put the definition in .cc	2019-05-14 10:43:06 +02:00
Avi Kivity	1cf72b39a5	Merge "Unbreak the Unbreakable Linux" from Glauber " scylla_setup is currently broken for OEL. This happens because the OS detection code checks for RHEL and Fedora. CentOS returns itself as RHEL, but OEL does not. " * 'unbreakable' of github.com:glommer/scylla: scylla_setup: be nicer about unrecognized OS scylla_util: recognize OEL as part of the RHEL family	2019-05-13 21:38:21 +03:00
Glauber Costa	3b64727244	scylla_setup: be nicer about unrecognized OS Right now if the user tries to execute this in an unrecognized OS, the following will be thrown: Traceback (most recent call last): File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module> do_verify_package('scylla-enterprise-jmx') File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package if res != 0: UnboundLocalError: local variable 'res' referenced before assignment It would be a lot nicer to exit gracefully and print a message saying what is going on. This was caught when running on OEL, which the previous patch fixed. Still, there are other unknown OS out there the users may try to run on. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-05-13 14:31:49 -04:00
Glauber Costa	6c15ae5b36	scylla_util: recognize OEL as part of the RHEL family Oracle Linux is a RHEL-like distribution and we support it just fine, but our new incarnation of scylla_setup is failing to recognize it. os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and only shows an ID string, which is set to 'ol'. So let's recognize this. Fixes: #4493 Branches: 3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-05-13 14:31:38 -04:00
Tomasz Grabiec	77fb34821b	row_cache: Make invalidate() preemptible This change inserts preemption points between removal of partitions. The main complication is in maintaining consistency in the face of concurrent population or eviction. We use the same mechanism which is used by memtable updates. _prev_snapshot_pos is the ring position which partitions the ring into the part which is already updated in cache and the one which is yet to be updated. That position should be set accordingly on preemption. In case of invalidation, updating means removing all entries in the range and marking the range as discontinuous. When resuming invalidation of a range we continue from _prev_snapshot_pos as the lower bound. This affects high-level operations like nodetool refresh, table truncation, repair and streaming. Fixes #2683 The improvement on stalls was measured using tests/perf_row_cache_update: Before: Small partitions, no overwrites: invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]} Small partition with a few rows: invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]} Large partition, lots of small rows: invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]} After: Small partitions, no overwrites: invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]} Small partition with a few rows: invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]} Large partition, lots of small rows: invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]} The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).	2019-05-13 19:32:00 +02:00
Tomasz Grabiec	595e1a540e	row_cache: Switch _prev_snapshot_pos to be a ring_position_ext dht::ring_position cannot represent all ring_position_view instances, in particular those obtained from dht::ring_position_view::for_range_start(). To allow using the latter, switch to views.	2019-05-13 19:30:50 +02:00
Tomasz Grabiec	1530224377	dht: Introduce ring_position_ext It's an owning version of ring_position_view. Note that ring_position has a narrower domain than the ring_position_view for historical reasons, so we cannot use that.	2019-05-13 19:30:50 +02:00
Tomasz Grabiec	b08180c7fa	dht: ring_position_view: Take key by const pointer	2019-05-13 19:30:39 +02:00
Tomasz Grabiec	ed697306be	tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion	2019-05-13 19:18:20 +02:00
Tomasz Grabiec	b516e5fdbf	tests: perf_row_cache_update: Report stalls around invalidation	2019-05-13 10:47:03 +02:00
Avi Kivity	a8b3cb8a28	Update seastar submodule * seastar f73690e...3f7a5e1 (7): > Revert "Make sure all allocations/deallocations are properly byte aligned" > http: fix request content for POST requests > doc: discourage generic lambdas and unconstrained templates > smp: add smp_service_group for smp::submit_to() resource control > Revert "smp: add smp_service_group for smp::submit_to() resource control" > smp: add smp_service_group for smp::submit_to() resource control > Make sure all allocations/deallocations are properly byte aligned	2019-05-12 13:32:41 +03:00
Tomasz Grabiec	fd349a3c65	hashing: Introduce C++ concept for the hasher	2019-05-10 12:54:30 +02:00
Tomasz Grabiec	5c2f5b522d	hashers: Rename hasher to cryptopp_hasher So that we can introduce a truly generic interface named "hasher".	2019-05-10 12:54:08 +02:00
Tomasz Grabiec	b7ece4b884	gc_clock: Fix hashing to be backwards-compatible Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. (cherry picked from commit `549d0eb2f3`)	2019-05-10 12:48:46 +02:00
Avi Kivity	fdace36fa5	Merge "Fixes for GCC9 build" from Paweł " This series contains fixes for GCC9 build, mostly corrections needed after changes in libstdc++. With this series and a workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 (not included) Scylla builds and passes unit tests with GCC9 (tested on Fedora 30, development mode only). Tests: unit(dev with gcc8 and gcc9). " * tag 'gcc9-fixes/v1' of https://github.com/pdziepak/scylla: tests/imr: add missing noexcept counters: bytes_view::pointer is not const pointer imr/fundamental: use bytes_view::const_pointer for const pointer	2019-05-09 21:51:24 +03:00
Paweł Dziepak	96eec203bd	tests/imr: add missing noexcept The concepts require that serialisers passed to the IMR are noexcept. GCC9 started verifying that.	2019-05-09 17:38:24 +01:00
Paweł Dziepak	ae9e083b02	counters: bytes_view::pointer is not const pointer In libstdc++ for gcc9 std::basic_string_view::pointer isn't const any more. As a result the compiler is complaining about reinterpret_cast casting away const. The solution is to use std::conditional<> to choose between const pointer for counter view and non-const pointer for mutable counter view.	2019-05-09 17:31:35 +01:00
Paweł Dziepak	c19576319f	imr/fundamental: use bytes_view::const_pointer for const pointer In libstdc++ shipped with gcc9 std::basic_string_view::pointer is no longer constant, which is causing the compiler to complain about dropping const in reinterpret_cast. The solution is to use std::basic_string_view::const_pointer.	2019-05-09 17:30:15 +01:00
Paweł Dziepak	49b4aeca4d	Merge "hinted handoff: prevent sending attempts" from Vlad " Fix the broken logic that is meant to prevent sending hints when node is in a DOWN NORMAL state. " * 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla: hints_manager: rename the state::ep_state_is_not_normal enum value hinted handoff: fix the logic that detects that the destination node is in DN state hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() types.cc: fix the compilation with fmt v5.3.0	2019-05-09 15:18:57 +01:00
Avi Kivity	db536776d9	tools: toolchain: fix dbuild in interactive mode regression Before `ede1d248af`, running "tools/toolchain/dbuild -it -- bash" was a nice way to play in the toolchain environment, for example to start a debugger. But that commit caused containers to run in detached mode, which is incompatible with interactive mode. To restore the old behavior, detect that the user wants interactive mode, and run the container in non-detached mode instead. Add the --rm flag so the container is removed after execution (as it was before `ede1d248af`). Message-Id: <20190506175942.27361-1-avi@scylladb.com>	2019-05-09 15:01:21 +02:00
Dejan Mircevski	d5f587b83d	Narrow down build dependences of duration_test In 0ea6df, duration_test was made to link against all tests/*.o files. This isn't necessary, as it only needs tests/exception_utils.o. This patch narrows down duration_test's dependences to only exception_utils. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190508211630.108228-1-dejan@scylladb.com>	2019-05-09 15:01:21 +02:00
Dejan Mircevski	e4ec89473e	tests: Cover indexing errors in frozen collections Add new test cases: - disallow creating a non-FULL index on frozen collections - disallow repeated creation of a FULL index on frozen collections - disallow FULL indexes on non-frozen collections - disallow referencing frozen-map entries in the WHERE clause Also add error-message expectations to existing test cases. Fixes #3654. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190509025806.124499-1-dejan@scylladb.com>	2019-05-09 15:25:11 +03:00
Dejan Mircevski	4eeec4a452	tests: drop util.hh The file tests/util.hh was somehow committed despite `git mv`ing it to tests/exception_utils.hh. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190508210203.106295-1-dejan@scylladb.com>	2019-05-09 14:45:33 +03:00
Takuya ASADA	19a973cd05	dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE Since other build_*.sh are for running inside extracted relocatable package, they have SCYLLA-PRODUCT-FILE on top of the directory, but build_ami.sh is not running in such condition, we need to run SCYLLA-VERSION-GEN first, then refer to build/SCYLLA-PRODUCT-FILE. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190509110621.27468-1-syuu@scylladb.com>	2019-05-09 14:45:31 +03:00
Vlad Zolotarov	f07c341efc	hints_manager: rename the state::ep_state_is_not_normal enum value Rename this state value to better reflect the reality: state::ep_state_is_not_normal -> state::ep_state_left_the_ring The manager gets to this state when the destination Node has left the ring. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 15:46:47 -04:00
Vlad Zolotarov	93ba700458	hinted handoff: fix the logic that detects that the destination node is in DN state When node is in a DN state its gossiper state may be NORMAL, SHUTDOWN or "" depending on the use case. In addition to that if node has been removed from the ring its state is also going to be removed from the gossiper_state map. Let's consider the above when deciding if node is in the DN state. Fixes #4461 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 14:53:01 -04:00
Glauber Costa	a23531ebd5	Support AWS i3en instances AWS just released their new instances, the i3en instances. The instance is already verified to work well with scylla; the only adjustments we need are to advertise that we support it, and to pre-fill the disk information according to the performance numbers obtained by running the instance. Fixes #4486 Branches: 3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190508170831.6003-1-glauber@scylladb.com>	2019-05-08 20:09:44 +03:00
Avi Kivity	a86fdeb02b	Merge "Implement GROUP BY" from Dejan " Cassandra has supported GROUP BY in SELECT statements since 2016 (v3.10), while ScyllaDB currently treats it as a syntax error. To achieve parity with Cassandra in this important bit of functionality, this patch adds full support for GROUP BY, from parsing to validation to implementation to testing. " * 'groupby-implPP' of https://github.com/dekimir/scylla: Implement grouping in selection processing Propagate GROUP BY indices to result_set_builder Process GROUP BY columns into select_statement Parse GROUP BY clause, store column identifiers	2019-05-08 18:35:12 +03:00
Dejan Mircevski	d51e4a589d	Implement grouping in selection processing Make result_set_builder obey its _group_by_cell_indices by recognizing group boundaries and resetting the selectors. Also make simple_selectors work correctly when grouping. Fixes #2206. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 11:05:36 -04:00
Dejan Mircevski	c3929aee3a	Propagate GROUP BY indices to result_set_builder Ensure that the indices recorded in select_statement are passed to result_set_builder when one is created for processing the cell values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:10:10 -04:00
Dejan Mircevski	274a77f45e	Process GROUP BY columns into select_statement Validate raw GROUP BY identifiers and translate them into a select_statement member. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:10:10 -04:00
Dejan Mircevski	e1fb414805	Parse GROUP BY clause, store column identifiers Extend the grammar file with GROUP BY, collect the column identifiers, and store them in raw::select_statement. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:09:22 -04:00
Avi Kivity	ab3f044daa	Revert "Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz" This reverts commit `dcb263b36b`, reversing changes made to `a6759dc6aa`. schema_change_test fails consistently on master with it.	2019-05-08 16:19:38 +03:00
JP-Reddy	56420dc650	scylla_io_setup: TypeError in iotune_args array from scylla_io_setup script Whenever the iotune_args array uses "--smp", it needs cpudata.smp() which returns an integer instead of a string. So when iotune_args is passed to subprocess.check_call(), it actually throws "TypeError: expected str, bytes or os.PathLike object, not int" but "%s did not pass validation tests, it may not be on XFS..." is shown as the exception. Even though the user inputs correct arguments, it might still throw an error and confuse the user that he/she has not passed the right arguments. One simple fix is to use str(cpudata.smp()) instead of cpudata.smp(). Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com> Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>	2019-05-07 20:13:54 +03:00
Paweł Dziepak	8a16cbc50d	Merge "treewide: adjust for gcc 9" from Avi " gcc 9 complains a lot about pessimizing moves, narrowing conversions, and has tighter deduction rules, plus other nice warnings. Fix problems found by it, and make some non-problems compile without warnings. " * tag 'gcc9/v1' of https://github.com/avikivity/scylla: types: fix pessimizing moves thrift: fix pessimizing moves tests: fix pessimizing moves tests: cql_query_test: silence narrowing conversion warning test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T> table: fix potentially wrong schema when reading from zero sstables storage_proxy: fix pessimizing moves memtable: fix pessimizing moves IDL: silence narrowing conversion in bool serializer compaction: fix pessimizing moves cache: fix pessimizing moves locator: fix pessimizing moves database: fix pessimizing moves cql: fix pessimizing moves cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9	2019-05-07 12:19:29 +01:00
Avi Kivity	43867fe618	types: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:01:36 +03:00
Avi Kivity	1b760297f5	thrift: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:01:15 +03:00
Avi Kivity	0ff6e48e77	tests: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:00:58 +03:00
Avi Kivity	b60d58d6bd	tests: cql_query_test: silence narrowing conversion warning Make it explicit to gcc 9 that the conversion to bool is intended.	2019-05-07 09:59:44 +03:00
Avi Kivity	5636b621a7	test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T> gcc 9 is unable to decide whether to call role_name's copy or move constructor. Help it by casting.	2019-05-07 09:58:21 +03:00
Avi Kivity	add20eb9a6	table: fix potentially wrong schema when reading from zero sstables We use the schema during creation of the mutation_source rather than during the query itself. Likely they're the same, and since no rows are returned from a zero-sstable query, harmless. But gcc 9 complains. Fix by using the query's schema.	2019-05-07 09:56:30 +03:00
Avi Kivity	985a30a01c	storage_proxy: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:56:09 +03:00
Avi Kivity	fd3c493961	memtable: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:55:53 +03:00
Avi Kivity	17c268cd55	IDL: silence narrowing conversion in bool serializer bool serializers are now aliases to int8_t serializers, but gcc 9 complains about narrowing conversions, due to the path int8_t -> int -> bool. A bad narrowing conversion here cannot happen in practice, but massage the code a little to silence it.	2019-05-07 09:28:24 +03:00
Avi Kivity	d7cbd3dc61	compaction: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:28:12 +03:00
Avi Kivity	9c7eb95f78	cache: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:27:50 +03:00
Avi Kivity	c42d59d805	locator: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:27:27 +03:00
Avi Kivity	96a0073929	database: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:58 +03:00
Avi Kivity	03e9cdbfb0	cql: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:20 +03:00
Avi Kivity	c26ec176dd	cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9 We use uninitialized<T> (wrapping an optional<T>) to adjust to the parser's way of laying out the code, but this fails with gcc 9 (presumably for the correct reasons) when converting from uninitialized<T> back to optional<T>. Add a conversion operator to make it build.	2019-05-07 09:21:22 +03:00
Dejan Mircevski	0ea6df2cd1	tests: Add predicates for checking exception messages Many tests verify exception messages. Currently, they do so via verbose lambdas or inner functions that hide test-failure locations. This patch adds utilities for quick creation of message-checking tests and replaces existing ad-hoc methods with these new utilities. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190506210006.124645-1-dejan@scylladb.com>	2019-05-07 07:11:07 +03:00
Avi Kivity	dcb263b36b	Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz " Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. Branches: 3.1 " * tag 'fix-gc_clock-digest-v1' of github.com:tgrabiec/scylla: tests: Add test which verifies that schema digest stays the same tests: Add sstables for the schema digest test gc_clock: Fix hashing to be backwards-compatible	2019-05-07 07:04:40 +03:00
Tomasz Grabiec	8019634dba	tests: Add test which verifies that schema digest stays the same	2019-05-06 18:43:43 +02:00
Tomasz Grabiec	1f2995c8c5	tests: Add sstables for the schema digest test Generated by running test_schema_digest_does_not_change with regenerate set to true.	2019-05-06 18:43:43 +02:00
Tomasz Grabiec	549d0eb2f3	gc_clock: Fix hashing to be backwards-compatible Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460.	2019-05-06 18:43:43 +02:00
Avi Kivity	a6759dc6aa	Update seastar submodule * seastar 4cdccae...f73690e (16): > sstring: silence technically correct but unhelpful warning in sstring move ctor > cmake: add a seastar_supports_flag function > future: Fix build with libc++'s non-trivially-constructible std::tuple<> > Revert "Make sure all allocations are properly bytes aligned" > Merge "future: simplify future_state management" from Rafael > Make sure all allocations are properly bytes aligned > util/log: use correct clock type > core/reactor: don't assume system_clock::duration is in nanoseconds > Merge "Optimize the future_state move constructor" from Rafael > rpc: don't use boost/variant.hpp directly > core/memory: Omit [[gnu::leaf]] attribute on clang > Fix build with std::filesystem > Merge "Fix clang build and tests" from Rafael > cmake: Move ) out of quotes > Merge "Fix some bugs found by (or perhaps in) gcc 9" by Avi > Deduplicate Seastar dependencies management in CMake scripts	2019-05-06 19:17:37 +03:00
Gleb Natapov	1d851a3892	messaging: catch an error that sending of CLIENT_ID may return Avoid a warning about unhandled exception. Message-Id: <20190506122718.GL21208@scylladb.com>	2019-05-06 18:13:51 +03:00
Glauber Costa	79a5351651	scylla-housekeeping: timeout eventually scylla-housekeeping always wants to run in the installation to check if we are running the latest version. This happens regardless of whether or not we said yes or no to the housekeeping scylla_setup question - as that question only deals with whether or not we want to do this through a timer. It is fine to try to run scylla-housekeeping, as long as we time it out. The current code doesn't. The naive solution is to add a timeout parameter to urllib.request.open. However, that timeout is not respected and in my tests I saw real timeouts up to four times higher than the timeout we set. For a reasonable 5s timeout, this means a 20s real timeout which can lead to a very bad user experience. This seems to be a known problem with this module according to a quick Google search. This patch then takes a slightly more complex solution and uses multiprocess to enforce a well-defined user-visible timeout. Fixes #3980 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190506122335.5707-1-glauber@scylladb.com>	2019-05-06 17:37:59 +03:00
Gleb Natapov	b8188e1e2f	storage_proxy: avoid copying of a topology and endpoint array in batchlog code The batchlog code makes copies of the topology and endpoint array in the batchlog endpoint choosing code. There is a remark that at least the endpoint copy is deliberate because Cassandra code has it. We do not have to follow. Our endpoint calculation code is atomic, so we can use a reference. Message-Id: <20190506115815.GK21208@scylladb.com>	2019-05-06 17:36:50 +03:00
Raphael S. Carvalho	ef5681486f	compaction: do not unconditionally delete a new sstable in interrupted compaction After incremental compaction, new sstables may have already replaced old sstables at any point. Meaning that a new sstable is in-use by table and an old sstable is already deleted when compaction itself is UNFINISHED. Therefore, we should NEVER delete a new sstable unconditionally for an interrupted compaction, or data loss could happen. To fix it, we'll only delete new sstables that didn't replace anything in the table, meaning they are unused. Found the problem while auditing the code. Fixes #4479. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>	2019-05-06 16:55:36 +03:00
Avi Kivity	1c65ba6e66	Use correct scylla_tables schema for removing version column Mutations carry their schema, so use that instead of bringing in a global schema, which may change as features are added. Message-Id: <20190505132542.6472-1-avi@scylladb.com>	2019-05-06 13:51:08 +02:00
Paweł Dziepak	51e98e0e11	tests/perf_fast_forward: report average number of aio operations perf_fast_forward is used to detect performance regressions. The two main metrics used for this are fragments per second and the number of the IO operations. The former is a median of several runs, but the latter is just the actual number of asynchronous IO operations performed in the run that happened to be picked as a median frag/s-wise. There's not always a direct correlation between frag/s and aio and the latter can vary which makes the latter hard to compare. In order to make this easier a new metric was introduced: "average aio" which reports the average number of asynchronous IO operations performed in a run. This should produce much more stable results and therefore make the comparison more meaningful. Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>	2019-05-06 11:47:31 +02:00
Piotr Sarna	cf8d2a5141	Revert "view: cache is_index for view pointer" This reverts commit `dbe8491655`. Caching the value was not done in a correct manner, which resulted in longevity tests failures. Fixes #4478 Branches: 3.1 Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>	2019-05-06 11:45:46 +03:00
Benny Halevy	d9136f96f3	commitlog: descriptor: skip leading path from filename std::regex_match of the leading path may run out of stack with long paths in debug build. Using rfind instead to look up the last '/' in the pathname and skip it if found. Fixes #4464 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>	2019-05-05 17:51:56 +03:00
Benny Halevy	3a2fa82d6e	time_window_backlog_tracker: fix use after free Fixes #4465 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>	2019-05-05 12:47:51 +03:00
Glauber Costa	47d04e49e8	scylla_setup: respect user's decision not to call housekeeping The setup script asks the user whether or not housekeeping should be called, and in the first time the script is executed this decision is respected. However if the script is invoked again, that decision is not respected. This is because the check has the form: if (housekeeping_cfg_file_exists) { version_check = ask_user(); } if (version_check) { do_version_check() } else { dont_do_it() } When it should have the form: if (housekeeping_cfg_file_exists) { version_check = ask_user(); if (version_check) { do_version_check() } else { dont_do_it() } } (Thanks python) This is problematic in systems that are not connected to the internet, since housekeeping will fail to run and crash the setup script. Fixes #4462 Branches: master, branch-3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190502034211.18435-1-glauber@scylladb.com>	2019-05-02 18:46:41 +03:00
Glauber Costa	99c00547ad	make scylla_util OS detection robust against empty lines Newer versions of RHEL ship the os-release file with newlines at the end, which our script was not prepared to handle. As such, scylla_setup would fail. This patch makes our OS detection robust against that. Fixes #4473 Branches: master, branch-3.1 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190502152224.31307-1-glauber@scylladb.com>	2019-05-02 18:33:35 +03:00
Paweł Dziepak	cf451f0e62	Merge "gdb: Fixes and improvements to memory analysis" from Tomasz " One of the fixes is for incorrect recognition of memory pages as belonging or not belonging to small allocation pools in some cases. Also, compensates for https://github.com/scylladb/seastar/issues/608 in "scylla memory", which improves accuracy of the small allocation pool report. Fixes "scylla task_histogram" to not look into pages which do not belong to live small allocation pool spans. Fixes #4367 Fixes #4368 " * tag 'gdb-fix-span-qualification-v2' of github.com:tgrabiec/scylla: gdb: Print size of large allocations in 'scylla ptr' gdb: Fix 'scylla ptr' for free pages gdb: Set is_live and offset for large allocations properly in 'scylla ptr' gdb: Fix 'scylla ptr' misqualifying pointers gdb: Make 'scylla memory' show unused memory in small pools gdb: Fix small pool memory usage reporting in 'scylla memory' gdb: Switch 'scylla memory' to use the span_checker to find large spans gdb: Switch task_histogram to use the span_checker gdb: Introduce span_checker	2019-05-02 14:25:30 +01:00
Gleb Natapov	95c6d19f6c	batchlog_manager: fix array out of bound access endpoint_filter() function assumes that each bucket of std::unordered_multimap contains elements with the same key only, so its size can be used to know how many elements with a particular key are there. But this is not the case, elements with multiple keys may share a bucket. Fix it by counting keys in other way. Fixes #3229 Message-Id: <20190501133127.GE21208@scylladb.com>	2019-05-01 17:30:11 +03:00
Nadav Har'El	2710f382de	secondary index: expand test of secondary-index and UPDATE requests The existing unit test test_secondary_index_contains_virtual_columns reproduced a bug (issue #4144) with indexing of primary-key columns, but we only actually tested clustering columns. In issue #4471 there was a question whether we may still have a bug when indexing partition-key columns. This patch adds a test that verifies that we don't, and this case works well too. Refs #4144 Refs #4471 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190501113500.25900-1-nyh@scylladb.com>	2019-05-01 12:53:23 +01:00
Nadav Har'El	a45b6e41a0	materialized views and secondary index: sometimes allow dropping base columns Until this patch, dropping columns from a table was completely forbidden if this table has any materialized views or secondary indexes. However, this is excessively harsh, and not compatible with Cassandra which does allow dropping columns from a base table which has a secondary index on other columns. This incompatibility was raised in the following Stackoverflow question: https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490 In this patch, we allow dropping a base table column if none of its materialized views needs this column. Columns selected by a view (as regular or key columns) are needed by it, of course, but when virtual columns are used (namely, there is a view with same key columns as the base), all columns are needed by the view, so unfortunately none of the columns may be dropped. After this patch, when a base-table column cannot be dropped because one of the materialized views needs it, the error message will look like: exceptions::invalid_request_exception: Cannot drop column a from base table ks.cf: a materialized view cf_a_idx_index needs this column. This patch also includes extensive testing for the cases where dropping columns are now allowed, and not allowed. The secondary-index tests are especially interesting, because they demonstrate that now usually (when a non-key column is being indexed) dropping columns will be allowed, which is what originally bothered the Stackoverflow user. Fixes #4448. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190429214805.2972-1-nyh@scylladb.com>	2019-04-30 12:13:10 +01:00
Nadav Har'El	92d5f61ba5	cql: support single-value IN restriction wherever EQ restriction is supported There are several places where IN restrictions are not currently supported, especially in queries involving a secondary index. However, when the IN restriction has just a single value, it is nothing more than an equality restriction and can be converted into one and be supported. So this patch does exactly this. Note that Cassandra does this conversion since August 2016, and therefore supports the special case of single-value IN even where general IN is not supported. So it's important for Cassandra compatibility that we do this conversion too. This patch also includes a test with two queries involving a secondary index that were previously disallowed because of the "IN" on the primary key or the indexed column - and are now allowed when the IN restriction has just a single value. A third query tested is not related to secondary indexes, but confirms we don't break multi-column single-value IN queries. Fixes #4455. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190428160317.23328-1-nyh@scylladb.com>	2019-04-30 12:13:06 +01:00
Tomasz Grabiec	1adcb3637e	Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond The shard readers of the multishard reader assumed that the positions in the data stream are strictly monotonous. This assumption is invalid. Range tombstones can have positions that they can share with other range tombstones and/or a clustering row. The effect of this false assumption was that when the shard reader was evicted such that the last seen fragment was a range tombstone, when recreated it would skip any unseen fragments that have the same position as that of the last seen range tombstone. Fixes: #4418 Branches: master, 3.0, 2019.1 Tests: unit(dev) * https://github.com/denesb/scylla.git multishard_reader_handle_non_strictly_monotonous_positions/v4: multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer() multishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code position_in_partition_view: add region() accessor multishard_combining_reader: fix handling of non-strictly monotonous positions flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice tests: add unit test for multishard reader correctly handling non-strictly monotonous positions	2019-04-30 12:35:28 +02:00
Tomasz Grabiec	077c639e42	Merge "Simplify the result_set_row API" from Rafael Currently null and missing values are treated differently. Missing values throw no_such_column. Null values return nullptr, std::nullopt or throw null_column_value. The api is a bit confusing since a function returning a std::optional either returns std::nullopt or throws depending on why there is no value. With this patch series only get_nonnull throws and there is only one exception type. * https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2: query-result-set: merge handling of null and missing values Remove result_set_row::has Return a reference from get_nonnull	2019-04-30 11:06:29 +02:00
Rafael Ávila de Espíndola	63c47117b5	Return a reference from get_nonnull No reason to copy if we don't have to. Now that get_nonnull doesn't copy, replace a raw use of get_data_value with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola	0474458872	Remove result_set_row::has Now that the various get methods return nullptr or std::nullopt on missing values, we don't need to do double lookups. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola	2770b29036	query-result-set: merge handling of null and missing values Nothing seems to differentiate a missing and a null value. This patch then merges the two exception types and now the only method that throws is get_nonnull. The other methods return nullptr or std::nullopt as appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:20 -07:00
Avi Kivity	3726a4fbd9	Merge "Fix schema disagreement during rolling upgrade" from Tomasz " After `7c87405`, schema sync includes system_schema.view_virtual_columns in the schema digest. Old nodes don't know about this table and will not include it in the digest calculation. As a result, there will be schema disagreement until the whole cluster is upgraded. Also, the order in which tables were hashed changed in `7c87405`, which causes digests to differ in some schemas. Fixes #4457. " * tag 'fix-disagreement-during-upgrade-v2' of github.com:tgrabiec/scylla: db/schema_tables: Include view_virtual_columns in the digest only when all nodes do storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature db/schema_tables: Hash schema tables in the same order as on 3.0 db/schema_tables: Remove table name caching from all_tables() treewide: Propagate schema_features to db::schema::all_tables() enum_set: Introduce full() service/storage_service: Introduce cluster_schema_features() schema: Introduce schema_features schema_tables: Propagate storage_service& to merge_schema() gms/feature: Introduce a more convenient when_enabled() gms/feature: Mark all when_enabled() overloads as const	2019-04-29 14:23:53 +03:00
Avi Kivity	ede1d248af	tools: toolchain: improve dbuild signal handling Currently, we use --sig-proxy to forward signals to the container. However, this requires the container's co-operation, which usually doesn't exist. For example, docker run --sig-proxy fedora:29 bash -c "sleep 5" Does not respond to ctrl-C. This is a problem for continuous integration. If a build is aborted, Jenkins will first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give up and use SIGKILL. If the graceful termination doesn't work, we end up with an orphan container running on the node, which can then consume enough memory and CPU to harm the following jobs. To fix this, trap signals and handle them by killing the container. Also trap shell exit, and even kill the container unconditionally, since if Jenkins happens to kill the "docker wait" process the regular paths will not be taken. We lose a lot by running the container asynchronously with the dbuild shell script, so we need to add it back: - log display: via the "docker logs" command - auto-removal of the container: add a "docker rm -f" command on signal or normal exit Message-Id: <20190424130112.794-1-avi@scylladb.com>	2019-04-29 10:05:21 +02:00
Botond Dénes	aa18bb33b9	tests: add unit test for multishard reader correctly handling non-strictly monotonous positions	2019-04-29 10:24:14 +03:00
Botond Dénes	51e81cf027	flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice To be able to support this new overload, the reader is made partition-range aware. It will now correctly only return fragments that fall into the partition-range it was created with. For completeness' sake and to be able to test it, also implement `fast_forward_to(const dht::partition_range)`. Slicing is done by filtering out non-overlapping fragments from the initial list of fragments. Also add a unit test that runs it through the mutation_source test suite.	2019-04-29 10:24:14 +03:00
Tomasz Grabiec	c96ee9882b	db/schema_tables: Include view_virtual_columns in the digest only when all nodes do After `7c87405`, schema sync includes system_schema.view_virtual_columns in the schema digest. Old nodes don't know about this table and will not include it in the digest calculation. As a result, there will be schema disagreement until the whole cluster is upgraded. Fix this by taking the new table into account only when the whole cluster is upgraded. The table should not be used for anything before this happens. This is not currently enforced, but should be. Fixes #4457.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	a108df09f9	storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature Needed for determining if all nodes in the cluster are aware of the new schema table. Only when all nodes are aware of it can we take it into account when calculating the schema digest, otherwise there would be permanent schema disagreement during a rolling upgrade.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	73b859005c	db/schema_tables: Hash schema tables in the same order as on 3.0 The commit `7c87405` also indirectly changed the order of schema tables during hash calculation (index table should be taken after all other tables). This shows up when there is an index created and any of {user defined type, function, or aggregate}. Refs #4457.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	394a684a99	db/schema_tables: Remove table name caching from all_tables() The set of table names will depend on the features and thus will be dynamic.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	3cb7b2d72e	treewide: Propagate schema_features to db::schema::all_tables()	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	f33f0d759d	enum_set: Introduce full()	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	1d9b88dceb	service/storage_service: Introduce cluster_schema_features()	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	0633fcde10	schema: Introduce schema_features	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	6e2c190b5f	schema_tables: Propagate storage_service& to merge_schema() We will need to calculate cluster schema features at the time we calculate the schema digest.	2019-04-28 12:33:10 +02:00
Tomasz Grabiec	6db002163f	gms/feature: Introduce a more convenient when_enabled() It can be invoked with a lambda without the ceremony of creating a class deriving from gms::feature::listener. The returned registration object controls listener's scope.	2019-04-28 12:33:10 +02:00
Tomasz Grabiec	22c07b9183	gms/feature: Mark all when_enabled() overloads as const	2019-04-28 12:33:10 +02:00
Rafael Ávila de Espíndola	ee9f3388f6	cql_query_test: Fix a use after return There was nothing keeping the verify lambda alive after the return. It worked most of the time since the only state kept by the lambda was a pointer to cql_test_env. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190426203823.15562-1-espindola@scylladb.com>	2019-04-27 08:06:35 +03:00
Avi Kivity	07d06aee43	Update seastar submodule * seastar e84d2647c...4cdccae53 (4): > Merge "future: Move some code out of line" from Rafael > tests: socket_test: Add missing virtual and override > build: Don't pass -Wno-maybe-uninitialized to clang > Merge "expose file_permssions for creating files and dirs in API" from Benny	2019-04-26 22:58:48 +03:00
Tomasz Grabiec	c6274fdef3	keys: Avoid implicit conversion to partition_key in the hasher of partition_key_view Message-Id: <1556230107-13557-1-git-send-email-tgrabiec@scylladb.com>	2019-04-26 20:02:35 +03:00
Botond Dénes	bc08f8fd07	flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice To be able to run the mutation-source test suite with this reader. In the next patch, this reader will be used in testing another reader, so it is important to make sure it works correctly first.	2019-04-26 12:43:45 +03:00
Botond Dénes	eba310163d	multishard_combining_reader: fix handling of non-strictly monotonous positions The shard readers under a multishard reader are paused after every operation executed on them. When paused they can be evicted at any time. When this happens, they will be re-created lazily on the next operation, with a start position such that they continue reading from where the evicted reader left off. This start position is determined from the last fragment seen by the previous reader. When this position is a clustering position, the reader will be recreated such that it reads the clustering range (from the half-read partition): (last-ckey, +inf). This can cause problems if the last fragment seen by the evicted reader was a range-tombstone. Range tombstones can share the same clustering position with other range tombstones and potentially one clustering row. This means that when the reader is recreated, it will start from the next clustering position, ignoring any unread fragments that share the same position as the last seen range tombstone. To fix, ensure that on each fill-buffer call, the buffer contains all fragments for the last position. To this end, when the last fragment in the buffer is a range tombstone (with pos x), we continue reading until we see a fragment with a position y that is greater. This way it is ensured that we have seen all fragments for pos x and it is safe to resume the read, starting from after position x.	2019-04-26 11:38:12 +03:00
Botond Dénes	b30af48c83	position_in_partition_view: add region() accessor	2019-04-26 11:38:12 +03:00
Vlad Zolotarov	274b9d8069	hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check gossiper::is_alive() has many unneeded checks (e.g. is_me(ep)) that are irrelevant for the HH use case and we may safely skip them. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:16:07 -04:00
Vlad Zolotarov	74b4076ceb	hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() sender has its own reference to the local gossiper - use it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Vlad Zolotarov	fe82437dea	types.cc: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Fix this by explicitly using to_hex() converter. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Piotr Sarna	037b517c85	service: initialize system distributed keyspace after schema agreement In order to avoid schema disagreements during upgrades (which may lead to deadlocks), system distributed keyspace initialization is moved right before starting the bootstrapping process, after the schema agreement checks already succeeded. Fixes #3976 Message-Id: <932e642659df1d00a2953df988f939a81275774a.1556204185.git.sarna@scylladb.com>	2019-04-25 18:44:08 +02:00
Raphael S. Carvalho	ccb29c6c20	sstables: make partitioned sstable set available to custom compaction strategies To make it available, we'll need to make the usage of level metadata optional (it is used to deal with the interval map's fragmentation issue when level 0 falls behind), and also introduce an interface for compaction strategies to implement make_sstable_set() that instantiates a partitioned sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190424232948.668-1-raphaelsc@scylladb.com>	2019-04-25 12:59:04 +03:00
Botond Dénes	a3f79bfe5e	multishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code Reduce the number of indentations - use early return for the short path.	2019-04-24 10:55:16 +03:00
Botond Dénes	bbd3f0acc3	multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer()	2019-04-24 10:55:16 +03:00
Avi Kivity	b19792405f	main: RAII-ify shutdown Instead of app_template::run_deprecated() and at_exit() hooks, use app_template::run() and RAII (via defer()) to stop services. This makes it easier to add services that do support shutdown correctly. Ref #2737 Message-Id: <20190420175733.29454-1-avi@scylladb.com>	2019-04-23 16:13:39 +02:00
Avi Kivity	9a6c86e2a7	config: convert _make_config_values to individual named_value member declarations and initializers While causing some duplication (names are explicitly instead of implicitly stringified, and names are repeated in the member declaration and initializer), it is overall more maintainable than the huge macro. It is easier to overload named_value constructors when you can get error reporting on the line where the error occurs, for example.	2019-04-23 16:29:03 +03:00
Avi Kivity	4b3c2f6514	config: add allowed_values parameter to named_value constructor The _make_config_values() macro supplies an optional list of allowed values for a config item, so support that, even though no one uses it yet.	2019-04-23 16:29:03 +03:00
Avi Kivity	d959fbfc16	config: auto-add named_values into config_file By passing a config_file into named_value, we remove another call to the _make_config_values() macro.	2019-04-23 16:29:03 +03:00
Avi Kivity	b663cd1765	api: config: stop using _make_config_values Now that named_value::value_as_json() exists, make use of it to report the current value of a configuration variable via the REST API, instead of _make_config_values().	2019-04-23 16:29:03 +03:00
Avi Kivity	6033b6a079	config: add named_value::value_as_json() Currently, the REST API does its own conversion of named_value into json. This requires it to use the _make_config_values macro to perform iteration of all config items, since it needs to preserve the concrete type of the item while iterating, so it can select the correct json conversion. Since we want to remove that macro, we need to provide a different way to convert a config item to json. So this patch adds a value_as_json(). To hide json_return_value from the rest of the system, we extend config_type with a conversion function to handle the details. This usually calls the json_return_type constructor directly, but when it doesn't have default translation, it interposes a conversion into a type that json recognizes. I didn't bother maintaining the existing type names, since they're C++ names which don't make sense for the UI.	2019-04-23 16:28:19 +03:00
Avi Kivity	db3f61776f	config: remove value_status from named_value template parameter list The value_status is only needed at run-time, and removing it from the template parameter list reduces type proliferation (which leads to code bloat) and simplifies the code.	2019-04-23 16:15:28 +03:00
Avi Kivity	daf5744daa	config: make the named_value type name available without requiring _make_config_values I want to remove the _make_config_values macro, but it is needed now in api/config.cc to make the type names available. So as a first step, copy the type names to config_src. Further changes can extract it from there. Because we want to add more type information in following patches, place the type name in a new config_type object, instead of allocating a string_view in config_src.	2019-04-23 16:13:54 +03:00
Tomasz Grabiec	21fbf59fa8	lsa: Fix compact_and_evict() being called with a too low step compact_and_evict gets memory_to_release in bytes while reclamation step is in segments. Broken in `f092decd90`. It doesn't make much difference with the current default step of 1 segment since we cannot reclaim less than that, so shouldn't cause problems in practice. Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>	2019-04-23 13:14:43 +03:00
Gleb Natapov	c6b3b9ff13	cache_hitrate_calculator: wait for ongoing calculation to complete during stop Currently stop returns a ready future immediately. This is not a problem since the calculation loop holds a shared pointer to the local service, so it will not be destroyed until the calculation completes, and the global database object db, which is also used by the calculation, is never destroyed. But the latter is just a workaround for a shutdown sequence that cannot handle it and will be changed one day. Make the cache hitrate calculation service ready for it. Message-Id: <20190422113538.GR21208@scylladb.com>	2019-04-22 14:44:42 +03:00
Takuya ASADA	64c2aa8f9b	reloc/python3: add missing SCYLLA-PRODUCT-FILE to python3 relocatable package Since `214c74a`, we need SCYLLA-PRODUCT-FILE on relocatable package so add it on python3 package as well. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190422085620.22486-1-syuu@scylladb.com>	2019-04-22 13:56:38 +03:00
Gleb Natapov	306f5b99b5	cache_hitrate_calculator: fix use after free in non_system_filter lambda non_system_filter lambda is defined static which means it is initialized only once, so the 'this' that it will capture will belong to the shard where the function runs first. During service destruction the function may run on a different shard and access another shard's service that may already be freed. Fixed #4425 Message-Id: <20190421152139.GN21208@scylladb.com>	2019-04-21 18:22:31 +03:00
Amnon Heiman	9ad63efcfe	Adding node_exporter to docker This patch add the node_exporter to the docker image. It install it create and run a service with it. After this patch node_exporter will run and will be part of scylla Docker image. Fixes #4300 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190421130643.6837-1-amnon@scylladb.com>	2019-04-21 18:12:58 +03:00
Benny Halevy	0c9aaef673	sstables: make lambdas that std::move mutable As noticed by Rafael Ávila de Espíndola <espindola@scylladb.com> regarding commit `5a99023d4a`: Without the lambda being mutable, the second std::move actually doesn't move anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421150422.19304-1-bhalevy@scylladb.com>	2019-04-21 18:11:42 +03:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check since the compiler can't derive the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Tomasz Grabiec	f092decd90	lsa: Fix potential bad_alloc even though evictable memory exists When we start the LSA reclamation it can be that segment_pool::_free_segments is 0 under some conditions and segment_pool::_current_emergency_reserve_goal is set to 1. The reclamation step is 1 segment, and compact_and_evict_locked() frees 1 segment back into the segment_pool. However, segment_pool::reclaim_segments() doesn't free anything to the standard allocator because the condition _free_segments > _current_emergency_reserve_goal is false. As a result, tracker::impl::reclaim() returns 0 as the amount of released memory, tracker::reclaim() returns memory::reclaiming_result::reclaimed_nothing and the seastar allocator thinks it's a real OOM and throws std::bad_alloc. The fix is to change compact_and_evict() to make sure that reserves are met, by releasing more if they're not met at entry. This change also allows us to drop the variant of allocate_segment() which accepts the reclamation step as a means to refill reserves faster. This is now not needed, because compact_and_evict() will look at the reserve deficit to increase the amount of memory to reclaim. Fixes #4445 Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>	2019-04-20 09:17:49 +03:00
Avi Kivity	704600f829	Update seastar submodule * seastar eb03ba5cd...e84d2647c (14): > Fix hardcoded python paths in shebang line > Disable -Wmaybe-uninitialized everywhere > app_template: allow opting out of automatic SIGINT/SIGTERM handling > build: Restore DPDK machine inference from cflags > http: capture request content for POST requests > Merge "Simplify future_state and promise" from Rafael > temporary_buffer: fix memleak on fast path > perftune.py: allow explicitly giving a CPU mask to be used for binding IRQs > perftune.py: fix the sanity check for args.tune > perftune.py: identify fast-path hardware queues IRQs of Mellanox NICs > memory: malloc_allocator should be always available > Merge "Using custom allocator in the posix network stack" from Elazar > memory: Tell reclaimers how much should be reclaimed > net/ipv4_addr: add std::hash & operator== overloads	2019-04-20 09:16:53 +03:00
Avi Kivity	d485facea2	Revert "tools: toolchain: improve dbuild signal handing" This reverts commit `6c672e674b`. It loses build logs, and the patch that restores logs causes build failures, so the whole thing needs to be revisited.	2019-04-19 15:16:42 +03:00
Takuya ASADA	0a874f1897	dist/docker/redhat: prioritize /opt/scylladb/python3/bin on $PATH To prevent running entrypoint script in another python3 package like python36 in EPEL, move /opt/scylladb/python3/bin to top of $PATH. It won't happen on this container image, but may occurs when user tries to extend the image. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417165806.12212-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Takuya ASADA	c3dae6673f	dist/common/scripts: use out() to run perftune.py perftune.py executes hwloc-calc, the command is now provided as relocatable binary, placed under /opt/scylladb/bin. So we need to add the directory to PATH when calling subprocess.check_output(), but our utility function already do that, switch to it. Fixes #4443 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190418124345.24973-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Benny Halevy	9785754e0d	distributed_loader: do not follow symlinks when verifying mode and owner We allow only regular files and directories so to detect symlinks we must not follow them. Fixes #4375 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190418051627.9298-1-bhalevy@scylladb.com>	2019-04-19 11:47:40 +03:00
Takuya ASADA	214c74a71d	dist: merge product name parameter on single place When we add product name customization, we mistakenly defined the parameter on each package build script. Number of script is increasing since we recently added relocatable python3 package, we should merge it in single place. Also we should save the parameter on relocatable package, just like version-release parameters. So move the definition to SCYLLA-VERSION-GEN, save it to build/SCYLLA-PRODUCT-FILE then archive it to relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417163335.10191-1-syuu@scylladb.com>	2019-04-19 11:47:40 +03:00
Paweł Dziepak	d47ea66ec6	messaging_service: add lz4_fragmented RPC compressor Seastar now supports two RPC compression algorithms: the original LZ4 one and LZ4_FRAGMENTED. The latter uses the lz4 stream interface, which allows it to process large messages without fully linearising them. Since RPC requests used by Scylla often contain user-provided data that potentially could be very large, LZ4_FRAGMENTED is a better choice for the default compression algorithm. Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>	2019-04-18 19:07:14 +03:00
Takuya ASADA	592fec32a0	dist/common/scripts: use /etc/os-release to detect distributions Since we moved to the relocatable .rpm, Scylla is now able to run on Amazon Linux 2. However, is_redhat_variant() in scylla_util.py does not work on Amazon Linux 2, since it does not have /etc/redhat-release. So we need to switch to /etc/os-release, and use ID_LIKE to detect Redhat variants/Debian variants. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417115634.9635-1-syuu@scylladb.com>	2019-04-18 19:07:14 +03:00
Takuya ASADA	3cf7cf015a	dist/docker/redhat: use relocatable python3 on docker-entrypoint.py Switch to relocatable python3 instead of EPEL's python3 on docker-entrypoint.py. Also drop unneeded dependencies, since we switched to relocatable scylla image. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190417111024.6604-1-syuu@scylladb.com>	2019-04-18 19:07:14 +03:00
Paweł Dziepak	85409c1a16	Merge "Validate elements of collections" from Piotr " Previously we weren't validating elements of collections so it was possible to add non-UTF-8 string to a column with type list<text>. Tests: unit(release) Fixes #4009 " * 'haaawk/4009/v5' of github.com:scylladb/seastar-dev: types: Test correct map validation types: Test correct in clause validation types: Test correct tuple validation types: Test correct set validation types: Test correct list validation types: Add test_tuple_elements_validation types: Add test_in_clause_validation types: Add test_map_elements_validation types: Add test_set_elements_validation types: Add test_list_elements_validation types: Validate input when tuples types: Validate input when parsing a set types: Validate input when parsing a map types: Validate input when parsing a list types: Implement validation for tuple types: Implement validation for set types: Implement validation for map types: Implement validation for list types: Add cql_serialization_format parameter to validate	2019-04-18 19:07:14 +03:00
Botond Dénes	6e85d1e8c1	date_type_impl: add notice explaining why its not used And why is it still in the code. The note has been copied from Origin. Refs: #4419 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>	2019-04-18 19:07:14 +03:00
Piotr Jastrzebski	134b59a425	table_helper: take insert function arguments by value Previous version wasn't working correctly with r-values. Fixes #4438 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <5017b04901c47bd826b2e411e603ce01e42a83a5.1555424512.git.piotr@scylladb.com>	2019-04-16 17:34:35 +03:00
Tomasz Grabiec	5dc3f5ea33	Merge "Properly enable MC format on the cluster" from Piotr 1. All nodes in the cluster have to support MC_SSTABLE_FEATURE 2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE then it should start using MC format. 3. Once all shards start to use MC then a node should broadcast that unbounded range tombstones are now supported by the cluster. 4. Once whole cluster supports unbounded range tombstones we can start accepting them on CQL level. tests: unit(release) Fixes #4205 Fixes #4113 * seastar-dev.git dev/haaawk/enable_mc/v11: system_keyspace: Add scylla_local system_keyspace: add accessors for SCYLLA_LOCAL storage_service: add _sstables_format field feature: add when_enabled callbacks system_keyspace: add storage_service param to setup Add sstable format helper methods Register feature listeners in storage_service Add service::read_sstables_format Use read_sstables_format in main.cc Use _sstables_format to determine current format Add _unbounded_range_tombstones_feature Update supported features on format change	2019-04-16 14:07:05 +02:00
Avi Kivity	6c672e674b	tools: toolchain: improve dbuild signal handing Currently, we use --sig-proxy to forward signals to the container. However, this requires the container's co-operation, which usually doesn't exist. For example, docker run --sig-proxy fedora:29 bash -c "sleep 5" Does not respond to ctrl-C. This is a problem for continuous integration. If a build is aborted, Jenkins will first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give up and use SIGKILL. If the graceful termination doesn't work, we end up with an orphan container running on the node, which can then consume enough memory and CPU to harm the following jobs. To fix this, trap signals and handle them by killing the container. Also trap shell exit, and even kill the container unconditionally, since if Jenkins happens to kill the "docker wait" process the regular paths will not be taken. Message-Id: <20190415084040.12352-1-avi@scylladb.com>	2019-04-16 14:07:05 +02:00
Tomasz Grabiec	ac0d435c3e	Merge "hinted handoff: don't reuse_segments and discard corrupted segments" from Vlad This series addresses two issues in the hinted handoff that should complete fixing the infamous #4231. In particular the second patch removes the requirement to manually delete hints files after upgrading to 3.0.4. Tested with manual unit testing. * https://github.com/vladzcloudius/scylla.git hinted_handoff_drop_broken_segments-v3: hinted handoff: disable "reuse_segments" commitlog: introduce a segment_error hinted handoff: discard corrupted segments	2019-04-16 14:07:05 +02:00
Avi Kivity	643bddbecc	Update seastar submodule * seastar 6f73675...eb03ba5 (11): > tests: tests C++14 dialect in continuous integration > rpc/compressor/lz4: fix std:variant related compiler errors > tests: futures_test: allow project to compile with C++14 > Merge "io_queue: make io_priority_class namespace global" from Benny > future::then_wrapped: use std::terminate instead of abort > reactor: make metric about task quota violations less sensitive > Merge "Add LZ4_FRAGMENTED compressor for RPC" from Paweł > Fix build issues with Clang 7 > Merge "file_stat follow_symlink option and related fixes" from Benny > doc/tutorial.md: reword mention of seastar::thread premption on get() > tests: semaphore_test: relax timeouts Fixes #4272.	2019-04-16 14:34:32 +03:00
Raphael S. Carvalho	52e1125b52	sstables: do not destroy sstable runs after resharding Resharding wasn't preserving the sstable run structure, which depends on all fragments sharing the same run identifier. So let's make resharding run aware, meaning that a run will be created for each shard involved. tests: release mode. Fixes #4428. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190415193556.16435-1-raphaelsc@scylladb.com>	2019-04-16 10:34:49 +03:00
Tomasz Grabiec	ff66b27754	gdb: heapprof: Coalesce parents in the flamegraph mode This change drops the hit count from the name of the node, because it prevents coalescing of nodes which are shared parents for paths with different counts. This lack of coalescing makes the flamegraph a lot less useful. Message-Id: <1555348576-26382-1-git-send-email-tgrabiec@scylladb.com>	2019-04-15 21:05:08 +03:00
Tomasz Grabiec	3fd82021b1	schema_tables: Serialize schema merges fairly All schema changes made to the node locally are serialized on a semaphore which lives on shard 0. For historical reasons, they don't queue but rather try to take the lock without blocking and retry on failure with a random delay from the range [0, 100 us]. Contenders which do not originate on shard 0 will have an extra disadvantage as each lock attempt will be longer by the across-shard round trip latency. If there is constant contention on shard 0, contenders originating from other shards may keep failing to take the lock. Schema merge executed on behalf of a DDL statement may originate on any shard. Same for the schema merge which is coming from a push notification. Schema merge executed as part of the background schema pull will originate on shard 0 only, where the application state change listeners run. So if there are constant schema pulls, DDL statements may take a long time to get through. The fix is to serialize merge requests fairly, by using the blocking semaphore::wait(), which is fair. We don't have to back-off any more, since submit_to() no longer has a global concurrency limit. Fixes #4436. Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>	2019-04-15 20:40:38 +03:00
Botond Dénes	c6314e422f	tests/mutation_source_test: use a single random seed Currently, each instantiation of `random_mutation_generator::impl` will generate a new random seed for itself. Although these are printed, mapping back all the printed seeds to the exact source location where each has to be substituted in is non-trivial. This makes reproducing random test failures very hard. To solve this problem, use `tests::random::get_int()` to produce the random seed of the `random_mutation_generator::impl` instances. This way the seeds of all the mutation generators will be derived from a single "master" seed that is easily replaced after a test failure, hopefully also leading to easily reproducible random test failures. I checked that after substituting in a previously generated master random seed, all derived seeds were exactly the same. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <0471415938fc27485975ef9213d37d94bff20fd5.1555329062.git.bdenes@scylladb.com>	2019-04-15 17:37:31 +03:00
Avi Kivity	3afbe219cd	Merge "UDF/UDA related cleanups and refactoring" from Rafael " These are patches I wrote while working on UDF/UDA, but IMHO they are independent improvements and are ready for review. Tests: unit (debug) dtest (release) I checked that all tests in nosetests -v user_types_test.py sstabledump_test.py cqlsh_tests/cqlsh_tests.py now pass. " * 'espindola/udf-uda-refactoring-v3' of https://github.com/espindola/scylla: Refactor user type merging cql_type_parser::raw_builder: Allow building types incrementally cql3: delete dead code Include missing header return a const reference from return_type delete unused var Add a test on nested user types.	2019-04-15 16:52:13 +03:00
Glauber Costa	c01ed239a3	fix typo in create table statement error message specifed -> specified Fixes #4434 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190415125206.2993-1-glauber@scylladb.com>	2019-04-15 16:51:13 +03:00
Benny Halevy	b543ab4c76	sstables: remove_temp_dir: do not return then_wrapped future f.get_exception makes the future invalid so it must not be returned. Instead, make_exception_future<> with the exception ptr. Fixes #4435. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190415111909.30499-1-bhalevy@scylladb.com>	2019-04-15 16:42:49 +03:00
Glauber Costa	b9327f81cf	conf: stop telling people to run auto_bootstrap: false auto_bootstrap: false provides negligible gains for new clusters and it is extremely dangerous everywhere else. We have seen a couple of times in which users, confused by this, added this flag by mistake and added nodes with it. While they were pleased by the extremely fast times to add nodes, they were later displeased to find their data missing. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190414012028.20767-1-glauber@scylladb.com>	2019-04-14 10:42:25 +03:00
Piotr Jastrzebski	2c599122e1	Update supported features on format change Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:38:31 +02:00
Piotr Jastrzebski	9c7e3dd470	Add _unbounded_range_tombstones_feature This requires introduction of storage_service::get_known_features and using it with check_knows_remote_features. Otherwise a node joining the existing cluster won't be able to join because it does not support unbounded range tombstones yet. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	96ad8f7df9	Use _sstables_format to determine current format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	da1eba5bdb	Use read_sstables_format in main.cc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	7339e9de30	Add service::read_sstables_format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	9934740c39	Register feature listeners in storage_service Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:36:58 +02:00
Piotr Jastrzebski	7a62235259	Add sstable format helper methods Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	caa6798f2c	system_keyspace: add storage_service param to setup Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	460fb260cb	feature: add when_enabled callbacks Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	081542cf00	storage_service: add _sstables_format field Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	0211541d84	system_keyspace: add accessors for SCYLLA_LOCAL Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	4c205b733a	system_keyspace: Add scylla_local Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Benny Halevy	adf539fb2c	tests: sstable_test_env::do_with_async: wait_for_background_jobs To solve memory leak seen in sstable_datafile_test -t test_old_format_non_compound_range_tombstone_is_read Refs #4376 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190411154621.9716-1-bhalevy@scylladb.com>	2019-04-11 18:50:42 +03:00
Takuya ASADA	4636284856	dist/ami: drop EPEL, convert scylla_install_ami script to python2 We have to run this script in python2, since we dropped EPEL from dependencies, and the script is installer for rpms so we cannot use relocatable python3 for it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190411151858.2292-1-syuu@scylladb.com>	2019-04-11 18:21:48 +03:00
Glauber Costa	f3a24b6c22	dist: remove curl dependency to simplify dependency list further Although curl is widely available, there is no reason to depend on it. There are mainly three users, as indicated by grep: 1) scylla-housekeeping 2) scripts within the AMI 3) docker image The AMI has its own RPM and it already depends on curl. While we could get rid of the curl dependency there too, we can do that later. Docker is its own thing and it only needs it at build time anyway. For the main scylla repo, this patch changes scylla-housekeeping so as not to depend on the curl binary and use urllib directly instead. We can then remove curl from our dependency list. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190411125642.9754-1-glauber@scylladb.com>	2019-04-11 16:12:36 +03:00
Benny Halevy	8181acd83b	test.py: fail if given test name not found Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190411092041.24712-1-bhalevy@scylladb.com>	2019-04-11 12:31:23 +03:00
Tzach Livyatan	f444c949bd	Fix the Dockerhub documentation for listen-address Fix listen-address documentation: it is used for internal communication, not for external clients Signed-off-by: Tzach Livyatan <tzach@scylladb.com> Message-Id: <20190410181409.16078-1-tzach@scylladb.com>	2019-04-11 11:53:40 +03:00
Botond Dénes	f201f8abab	types: fix date_type_impl::less() (timestamp cql type) date_type_impl::less() invokes `compare_unsigned()` to compare the underlying raw byte values. `compare_unsigned()` is a tri comparator, however `date_type_impl::less()` implicitly converted the returned value to bool. In effect, `date_type_impl::less()` would always return `true` when the two compared values were not equal. Found while working on a unit test which employs a randomly generated schema to test a component. Fixes #4419. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>	2019-04-10 21:01:25 +03:00
Botond Dénes	90721468f0	tests/mutation_diff: remove false-positive diff of the partition header Currently the partition header will always be reported as different when comparing two mutations. This is because they are prepended with the "expected: " and "... but got: " texts. This generates unnecessary noise. Inject a new line between the prefix and the partition-header proper. This way the partition header will only show up in the diff when there is an actual difference. The "expected: " and "... but got: " phrases are still shown as different on the top of the diff but this is fine as one can immediately see that they are not part of the data and additionally they help the reader in determining which part of the diff is the expected one and which is the actual one. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <29e0f413d248048d7db032224a3fd4180bf1b319.1554909144.git.bdenes@scylladb.com>	2019-04-10 18:05:36 +02:00
Raphael S. Carvalho	8a117c338a	compaction: fix use-after-free when calculating backlog after schema change The problem happens after a schema change because we fail to properly remove ongoing compaction, which stopped being tracked, from list that is used to calculate backlog, so it may happen that a compaction read monitor (ceases to exist after compaction ends) is used after freed. Fixes #4410. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com>	2019-04-10 15:54:39 +03:00
Vlad Zolotarov	db2ba0df61	hinted handoff: discard corrupted segments If we discover that a current segment is corrupted there is nothing we can do about it. This patch does the following: 1) Drops the corrupted segment and moves to the next one. 2) Logs such events as ERRORs. 3) Introduces a new metric that counts such events. Fixes #4364 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:54:20 -04:00
Vlad Zolotarov	1cba4a54bb	commitlog: introduce a segment_error Introduce a common base class for all errors that indicate that the current segment has "issues". This allows a laconic "catch" clause for all such errors. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:31:13 -04:00
Vlad Zolotarov	00fe2acb35	hinted handoff: disable "reuse_segments" Hinted handoff doesn't utilize this feature (which was developed with a commitlog in mind). Since it's enabled by default we need to explicitly disable it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 11:13:41 -04:00
Piotr Jastrzebski	dee64c30b3	types: Test correct map validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:23 +02:00
Piotr Jastrzebski	3d94f0aaf0	types: Test correct in clause validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:23 +02:00
Piotr Jastrzebski	36853a7a5c	types: Test correct tuple validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	94bdc1c868	types: Test correct set validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	429a8e082a	types: Test correct list validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	910d81e03e	types: Add test_tuple_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	e2fe9ca5d0	types: Add test_in_clause_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	cd11959a8e	types: Add test_map_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	22f541af1d	types: Add test_set_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	be405e24e9	types: Add test_list_elements_validation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	47e242efc5	types: Validate input when tuples Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	c4df3014ac	types: Validate input when parsing a set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	8a7b05ae26	types: Validate input when parsing a map Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	16596ec045	types: Validate input when parsing a list Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	8482764003	types: Implement validation for tuple Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	bd2823b623	types: Implement validation for set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	086d8abf89	types: Implement validation for map Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	4a51ee6e34	types: Implement validation for list Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	f5f6367674	types: Add cql_serialization_format parameter to validate Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Takuya ASADA	e3a5ac2945	reloc: run fix_sharedlib() only on application/x-sharedlib and application/x-pie-executable We need to prevent to run fix_sharedlib() on non-ELF files. Fixes #4415 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190409114941.28276-1-syuu@scylladb.com>	2019-04-09 14:54:54 +03:00
Tomasz Grabiec	1b1f241c94	gdb: Print size of large allocations in 'scylla ptr'	2019-04-09 13:44:15 +02:00
Tomasz Grabiec	cda1781a77	gdb: Fix 'scylla ptr' for free pages Fixes runtime error which happens because the setter is expected to take an argument, but our definition doesn't take one. We're not really expecting the setter to be called with False, so don't use setter semantics.	2019-04-09 13:44:15 +02:00
Tomasz Grabiec	13efabe74c	gdb: Set is_live and offset for large allocations properly in 'scylla ptr' Before: (gdb) scylla ptr 0x601000860003 thread 1, large, free After: (gdb) scylla ptr 0x601000860003 thread 1, large, live (0x601000860000 +3) Omission from `e1ea4db7ca`.	2019-04-09 13:22:06 +02:00
Tomasz Grabiec	4002d8db7c	gdb: Fix 'scylla ptr' misqualifying pointers It can be that page::pool is != nullptr and page::offset_in_span is 0 for a page which is inside a large allocation span (live or dead). This may lead to misqualification of a pointer as belonging to a small allocation pool. Only the first page of a span contains reliable information. This patch changes the code to use the span_checker, which knows the real boundaries of spans and exposes reliable information via the span object. Fixes #4368	2019-04-09 13:22:06 +02:00
Tomasz Grabiec	4d3399ee1f	gdb: Make 'scylla memory' show unused memory in small pools Example output: Small pools: objsz spansz usedobj memory unused wst% 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 1 4096 0 0 0 0.0 2 4096 0 0 0 0.0 2 4096 0 0 0 0.0 3 4096 0 0 0 0.0 3 4096 0 0 0 0.0 4 4096 0 0 0 0.0 5 4096 0 0 0 0.0 6 4096 0 0 0 0.0 7 4096 0 0 0 0.0 8 4096 241 8192 6264 76.5 10 4096 0 8192 8192 99.9 12 4096 35943 454656 23340 1.4 14 4096 0 8192 8192 99.8 16 4096 1171 24576 5840 23.8 20 4096 1007 24576 4436 17.7 24 4096 59380 1437696 12576 0.5 28 4096 548 16384 1040 6.2 32 4096 69433 2314240 92384 0.3 40 4096 36447 1564672 106792 0.4 48 4096 34099 1748992 112240 0.4	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	ac7a393be5	gdb: Fix small pool memory usage reporting in 'scylla memory' Uses span_checker to work around for corrupted _pages_in_use. Refs https://github.com/scylladb/seastar/issues/608 As a bonus, calculates use_count correctly for fallback spans.	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	d0567476e5	gdb: Switch 'scylla memory' to use the span_checker to find large spans Simplifies code.	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	4b748e601c	gdb: Switch task_histogram to use the span_checker It can be that page::pool is != nullptr and page::offset_in_span is 0 for a page which is inside a large allocation span (live or dead). This may lead to misqualification of that span as belonging to a small allocation pool and interpreting its contents as if it contained small objects. Only the first page of a span contains reliable information. This patch changes the code to use the span_checker, which knows the real boundaries of spans and exposes reliable information via the span object. Another problem was that the command scanned dead spans as well. This is no longer the case after this patch. I've seen this command report thousands of no longer live sstable writers and various continuations because of those problems. Fixes #4367	2019-04-09 13:22:05 +02:00
Tomasz Grabiec	c7215a2f67	gdb: Introduce span_checker The purpose is to encapsulate iteration and lookup of seastar allocator memory spans.	2019-04-09 13:22:05 +02:00
Rafael Ávila de Espíndola	89b2c4ddc5	Refactor user type merging The comparison of tables before and after mutation is now done by a generic diff_rows function. The same function will be used for user defined functions and user defined aggregates. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 14:16:40 -07:00
Rafael Ávila de Espíndola	4f1260f3e3	cql_type_parser::raw_builder: Allow building types incrementally Before this patch raw_builder would always start with an empty list of user types. This means that every time a type is added to a keyspace, every type in that keyspace needs to be recreated. With this patch we pass a keyspace_metadata instead of just the keyspace name and can construct new user types on top of previous ones. This will be used in the followup patch, where only new types are created. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 14:06:51 -07:00
Rafael Ávila de Espíndola	c037b266b4	cql3: delete dead code In c++ TOKEN_FUNCTION_NAME is only needed in the .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	1db0b83711	Include missing header abstract_function.hh uses function, which is defined in function.hh, so it should include it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	4551691b5d	return a const reference from return_type We define data_type as using data_type = shared_ptr<const abstract_type>; Since it is a shared_ptr, it cannot be copied into another thread since that would create a race condition incrementing the reference counter. In particular, before this patch it is not legal to call return_type from another thread. With this patch read only access from another thread is possible. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	35f1b1055d	delete unused var Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola	b577082c64	Add a test on nested user types. This would have found a bug in a previous version of this series. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 10:54:33 -07:00
Takuya ASADA	1f009b5e9b	dist/redhat/python3: drop SCYLLA-*-FILE files in rpm Related to #4409. These are more files that are not needed at runtime, so drop them too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190405074030.3990-1-syuu@scylladb.com>	2019-04-08 11:52:48 +03:00
Rafael Ávila de Espíndola	6191fd7701	Avoid duplicated read_keyspace_mutation calls There were many calls to read_keyspace_mutation. One in each function that prepares a mutation for some other schema change. With this patch they are all moved to a single location. Tests: unit (dev, debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190328024440.26201-1-espindola@scylladb.com>	2019-04-07 09:26:56 +03:00
Takuya ASADA	d180caea89	dist/redhat/python3: drop dist/ files in rpm These files are not needed at runtime, so drop them. Fixes #4409 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190405071445.18678-1-syuu@scylladb.com>	2019-04-07 09:26:56 +03:00
Amos Kong	db9a721d02	scylla_kernel_check: update kb_fs_not_qualified_aio doc link The doc has been moved to https://docs.scylladb.com/troubleshooting/error_messages/kb_fs_not_qualified_aio/ Fixes #4398 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <75fdc97d222667f4402cadc7a46e52d6f38a32a8.1554375560.git.amos@scylladb.com>	2019-04-07 09:26:56 +03:00
Glauber Costa	2305cc88f3	relocatable python: Be more permissive with mime type checking Fedora28 python magic used to return a x-sharedlib mime type for .so files. Fedora29 changed that to x-pie-executable, so the libraries are no longer relocated. Let's be more permissive and relocate everything that starts with application/. Fixes #4396 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190404140929.7119-1-glauber@scylladb.com>	2019-04-07 09:26:56 +03:00
Piotr Jastrzebski	882ea9caf0	tests: Fix use after free in check_multi_schema Refs #4376 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7d7b4cf69cea1e4d31058d8f1fd2c01f1dd11c58.1554387442.git.piotr@scylladb.com>	2019-04-07 09:26:56 +03:00
Piotr Jastrzebski	4485868d27	tests: Fix use after free in check_read_indexes Refs #4376 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0dc76b2a55bebc49558f30e8d2894973ce817577.1554386770.git.piotr@scylladb.com>	2019-04-07 09:26:56 +03:00
Tomasz Grabiec	a717e11026	Merge "row level repair shutdown fixes" from Asias This series fixes row level repair shutdown related issues we saw with dtests, e.g., use after free of the repair meta object, fail to stop a table during shutdown. Fixes: #4044 Fixes: #4314 Fixes: #4333 Fixes: #4380 Tests: repair_additional_test.py:RepairAdditionalTest.repair_abort_test repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test * sestar-dev.git asias/repair.fix.shutdown.v1: repair: Wait for pending repair_meta operation before removing it repair: Check shutdown in row level repair repair: Remove repair meta when node is dead repair: Remove all row level repair during shtudown	2019-04-05 15:47:25 +03:00
Avi Kivity	e63bc6b1e3	Update seastar submodule * seastar 63d8607...6f73675 (5): > Merge "seastar-addr2line: improve the context of backtraces" from Botond > log: fix std::system_error ostream operator to print full error message > Revert "threads: yield on get if we had run for too long." > core/queue: Document concurrency constraints > core/memory: Make small pools use the full span size Fixes #4407. Fixes #4316.	2019-04-05 15:47:25 +03:00
Avi Kivity	b1c4c371fa	Merge "fix I/O calculation for i3.metal instances" from Glauber " Calculation of IO properties is slightly wrong for i3.metal, because we get the number of disks wrong. The reason for that is our check for ephemeral nvme disks, that pre-date the time in which root devices were exposed as nvme devices (nitro and metal instances). " toolchain updated with python3-psutil * 'ec2fixes' of github.com:glommer/scylla: scylla_util.py: do not include root disks in ephemeral list scylla-python3: include the psutil module fix typo in scylla_ec2_check	2019-04-05 15:46:59 +03:00
Asias He	f212dfb887	streaming: Reject stream if the _sys_dist_ks or _view_update_generator are not ready They are of type db::system_distributed_keyspace and db::view::view_update_generator. n1 is in normal status n2 boots up and _sys_dist_ks or _view_update_generator are not initialized n1 runs stream, n2 is the follower. n2 uses the _sys_dist_ks or _view_update_generator "Assertion `local_is_initialized()' failed" is observed Fixes #4360 Message-Id: <4ae13e1640ac8707a9ba0503a2744f6faf89ecf4.1554330030.git.asias@scylladb.com>	2019-04-04 10:48:00 +03:00
Avi Kivity	8abba6f6a6	Merge "Avoid copying data_type" from Rafael " With these changes we avoid a std::vector<data_value> copy, which is nice in itself, but also makes it possible to call get_list from other shards. " * 'espindola/result-set-v3' of https://github.com/espindola/scylla: Avoid copying a std::vector in get_list query-result-set: add and use a get_ptr method	2019-04-03 21:29:22 +03:00
Asias He	99da196e6f	repair: Reject repair if the _sys_dist_ks or _view_update_generator are not ready They are of type db::system_distributed_keyspace and db::view::view_update_generator. n1 is in normal status n2 boots up and _sys_dist_ks or _view_update_generator are not initialized n1 runs repair, n2 is the follower. n2 uses the _sys_dist_ks or _view_update_generator "Assertion `local_is_initialized()' failed" is observed Fixes #4360 Message-Id: <6616c21078c47137a99ba71baf82594ba709597c.1553742487.git.asias@scylladb.com>	2019-04-03 21:29:22 +03:00
Rafael Ávila de Espíndola	74f956e5a8	Avoid copying a std::vector in get_list For now this is just an optimization. But it also avoids copying data_type, which will allow this be used across shards. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-03 09:20:12 -07:00
Rafael Ávila de Espíndola	c2a8807c35	query-result-set: add and use a get_ptr method This moves a copy up the call stack and makes it possible to avoid it completely by passing a reference type to get_nonnull. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-03 09:19:52 -07:00
Tomasz Grabiec	3356a085d2	lsa: Cover more bad_alloc cases with abort When --abort-on-lsa-bad-alloc is enabled we want to abort whenever we think we can be out of memory. We covered failures due to bad_alloc thrown from inside of the allocation section, but did not cover failures from reservations done at the beginning of with_reserve(). Fix by moving the trap into reserve(). Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>	2019-04-03 16:39:40 +03:00
Glauber Costa	0e9a50ab57	scylla_util.py: do not include root disks in ephemeral list Nitro instances (and metal ones) put their root device in nvme (as a protocol. it is still EBS). Our algorithm so far has relied on parsing the nvme devices to figure out which ones are ephemeral but it will break for those instances. Out of our supported instances so far, the i3.metal is the only one in which this breaks. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-03 07:57:00 -04:00
Glauber Costa	6d7ac87136	scylla-python3: include the psutil module Using a new python3 module has never been that easy! So we'll unapologetically use psutil and don't even worry about whether or not CentOS supports it (it doesn't) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-02 17:24:25 -04:00
Glauber Costa	027eee5f13	fix typo in scylla_ec2_check enahanced -> enhanced Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-04-02 17:24:00 -04:00
Dejan Mircevski	a66a5d423a	query_processor: Add query-count metrics ... with labels for each consistency level. Fixes https://github.com/scylladb/scylla/issues/4309 ("add counters breaking up cql requests based on consistency_level"). Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <1554127055-17705-1-git-send-email-dejan@scylladb.com>	2019-04-02 19:08:25 +03:00
Avi Kivity	be6905da84	Update seastar submodule * seastar 5572de7...63d8607 (6): > test: verify that negative sleep time doesn't cause infinite sleep > httpd: Change address handling to use socket_address > dns: Change "unspecififed" address search type to retrive first avail > Allow when_all and when_all_succeed to take function arguments > when_all: abort if memory allocation fails > inet_address: Add missing constructor impl.	2019-04-02 16:56:56 +03:00
Asias He	b98d95ebf0	repair: Remove all row level repair during shtudown We saw a dtest fail to stop a node like: ``` ERROR: repair_one_missing_row_test (repair_additional_test.RepairAdditionalTest) ---------------------------------------------------------------------- Traceback (most recent [2019.1.3.node1.repair.zip](https://github.com/scylladb/scylla/files/2723244/2019.1.3.node1.repair.zip) call last): File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 2521, in repair_one_missing_row_test return RepairAdditionalBase._repair_one_missing_row_test(self) File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 1842, in _repair_one_missing_row_test self.check_rows_on_node(node2, nr_rows) File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 34, in check_rows_on_node node.stop(wait_other_notice=True) File "/home/asias/src/cloudius-systems/scylla-ccm/ccmlib/scylla_node.py", line 496, in stop raise NodeError("Problem stopping node %s" % self.name) NodeError: Problem stopping node node1 ``` The problem is: 1) repair_meta is created repair_meta -> repair_writer::create_writer() -> t.stream_in_progress() repair_meta -> repair_reader::repair_reader -> cf.read_in_progress() 2) repair_meta is stored in _repair_metas map. 3) Shutdown repair, repair_meta is not removed from the _repair_metas map 4) Shutdown database which waits for the utils::phased_barrier. To fix, we should stop and remove all the repair_meta objects from the _repair_metas map. Tests: 30 successful runs of the repair_kill_2_test Fixes: #4044	2019-04-02 19:28:53 +08:00
Asias He	344d0ee37d	repair: Remove repair meta when node is dead Repair follower nodes will create a repair meta object when the repair master node starts a repair. Normally, the repair meta object is removed when the repair master finishes the repair and sends the verb REPAIR_ROW_LEVEL_STOP to all the followers to remove the repair meta object. If the repair master is killed suddenly, no one will remove the repair meta object. To prevent keeping this repair meta object forever, we should remove such objects when gossip detects a node is dead with the gossip listener. Fixes: #4380 Reviewed-by: Botond Dénes <bdenes@scylladb.com>	2019-04-02 19:28:53 +08:00
Asias He	b061157b21	repair: Check shutdown in row level repair During node shutdown, we should abort the repair as soon as possible. Check if we are in shutdown in row level repair steps. Refs: #4044	2019-04-02 19:28:53 +08:00
Asias He	e3e489328e	repair: Wait for pending repair_meta operation before removing it We remove the repair_meta object in remove_repair_meta upon receiving the stop row level repair rpc verb. It is possible that there is a pending operation on the repair_meta. To avoid use after free, we should not remove the repair_meta object until all the pending operations are done. Use a gate to protect it. Fixes: #4333 Fixes: #4314 Tests: 50 successful runs of repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test	2019-04-02 19:28:53 +08:00
Vlad Zolotarov	0dc0a6025d	query_pager::fetch_page: cosmetics: fix code alignment Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190401214030.5570-2-vladz@scylladb.com>	2019-04-02 11:53:10 +03:00
Asias He	70fbe85b3e	main: Add shutdown database log It is useful to know which step we are in during the shutdown process. Refs: #4044 Message-Id: <f7c94c60d039560bfacd6d473f7d828940cc55b7.1554172140.git.asias@scylladb.com>	2019-04-02 11:49:00 +03:00
Benny Halevy	3749148339	storage_service: fix handling of load_new_sstables exception ignore_ready_future in load_new_ss_tables broke migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests. The java.io.NotSerializableException in nodetool was caused by exceptions that were too long. This fix prints the problematic file names to the node system log and includes the cause in the resulting exception so as to provide the user with information about the nature of the error. Fixes #4375 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>	2019-04-02 11:46:19 +03:00
Avi Kivity	988dfd7209	Merge "add relocatable CLI tools required for scylla setup scripts" from Takuya " To make the offline installer easier, we need to minimize dependencies as much as possible. Python dependencies were already dropped when Glauber added the relocatable python3; now it's time to drop the rest of the command line tools used by the scylla setup tools. (Even though the scripts have been converted to python3, they still execute some external commands, so these commands should be distributed with the offline installer.) Note that some CLI tools, such as the NTP and RAID ones, haven't been added, since these tools have daemons, not just a CLI. To use them in offline mode, users have to install them manually. But both NTP setup and RAID setup are optional, so users can still run Scylla w/o them. " Toolchain updated to docker.io/scylladb/scylla-toolchain:fedora-29-20190401 for changes in install-dependencies.sh; also updates to gnutls 3.6.7 security release. * 'reloc_clitools_v5' of https://github.com/syuu1228/scylla: reloc: add relocatable CLI tools for scylla setup scripts dist/redhat: drop systemd-libs from dependency dist/redhat: drop file from dependency since it seems unused dist/redhat: drop pciutils from dependency since it only used in DPDK mode	2019-04-01 14:23:04 +03:00
Raphael S. Carvalho	d59f716e1c	table: fix wild disk usage stat after sstables are discarded by truncate Truncate would make disk usage stat go wild because it isn't updated when sstables are removed in table::discard_sstables(). Let's update the stat after sstables are removed from the sstable set. Fixes #3624. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190328154918.25404-1-raphaelsc@scylladb.com>	2019-04-01 13:55:11 +03:00
Duarte Nunes	b2dd8ce065	database: Make exception message more accurate It's the sstable read queue that's overloaded, not the inactive one (which can be considered empty when we can't admit newer reads). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190328003533.6162-1-duarte@scylladb.com>	2019-04-01 13:53:50 +03:00
Takuya ASADA	75a7859019	reloc: add relocatable CLI tools for scylla setup scripts To minimize dependencies of Scylla, add relocatable image of CLI tools required for scylla setup scripts.	2019-04-01 02:59:01 +09:00
Takuya ASADA	a3c1b9fcf3	dist/redhat: drop systemd-libs from dependency Since we switched to relocatable package, we don't need distribution native libraries, so the package is not needed anymore.	2019-04-01 02:58:22 +09:00
Takuya ASADA	a3741b4052	dist/redhat: drop file from dependency since it seems unused The package is not used in our script anymore, drop it.	2019-04-01 02:57:43 +09:00
Takuya ASADA	7d78515d5b	dist/redhat: drop pciutils from dependency since it only used in DPDK mode Since we don't use DPDK mode by default, and the mode is not officially supported, drop pciutils from package dependency. Users who want to use DPDK mode need to install the package manually.	2019-04-01 02:56:31 +09:00
Avi Kivity	77a0d5c5da	Update seastar submodule * seastar 05efbce...5572de7 (5): > posix_file_impl::list_directory: do not ignore symbolic link file type > prometheus: yield explicitly after each metric is processed > thread: add maybe_yield function > metrics: add vector overload of add_group() > memory: tone down message for memory allocator	2019-03-31 15:26:21 +03:00
Tomasz Grabiec	4c0584289b	tests: cql_test_env: Fix _feature_service not being initialized We moved from uninitialized field instead of the constructor parameter. No known issues. Message-Id: <1553854544-26719-1-git-send-email-tgrabiec@scylladb.com>	2019-03-31 13:05:35 +03:00
Takuya ASADA	b1bba0c1b0	dist/redhat/python3: product name customization support Currently the scylla-python3 package name is hardcoded; we need to support package renaming just like for other scylla packages. This is required to release the enterprise version. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190329003941.12289-1-syuu@scylladb.com>	2019-03-29 19:22:24 +02:00
Amos Kong	98cb7d145b	scylla_setup: don't repeatedly select disks if it's assigned Currently scylla_setup gets stuck selecting disks in non-interactive mode. Fixes #4370 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <8fb445708a6ac0d2130f8a8d041b1d8d71f1cf14.1553745961.git.amos@scylladb.com>	2019-03-28 15:21:36 +02:00
Avi Kivity	65dd45d9cf	Merge "sstable: validate file ownership and mode." from Benny " File must be either owned by the process uid or have both read and write access to it, so it could be (hard) linked when sysctl fs.protected_hardlinks is enabled. Fixes #3117 " * 'projects/valid_owner_and_mode/v3-rebased' of https://github.com/bhalevy/scylla: storage_service: handle load_new_sstables exception init: validate file ownership and mode. treewide: use std::filesystem	2019-03-28 14:58:14 +02:00
Benny Halevy	956cb2e61c	storage_service: handle load_new_sstables exception Refs #3117 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:54:56 +02:00
Benny Halevy	e3f7fe44c0	init: validate file ownership and mode. Files and directories must be owned by the process uid. Files must have read access and directories must have read, write, and execute access. Refs #3117 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:40:12 +02:00
Benny Halevy	ff4d8b6e85	treewide: use std::filesystem Rather than {std::experimental,boost,seastar::compat}::filesystem On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote: > The intent for seastar::compat was to allow the application to choose > the C++ dialect and have seastar follow, rather than have seastar choose > the types and have the application follow (as in your patch). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:21:10 +02:00
Dejan Mircevski	aa11f5f35e	Drop unused #include v2: fix "From" field in email Tests: unit/cql_query_test (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <1553099087-11621-1-git-send-email-dejan@scylladb.com>	2019-03-28 01:48:19 +00:00
Duarte Nunes	d8fcdefe4a	tests/view_schema_test: Remove debug output A stray std::cout remained. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 21:58:10 +00:00
Tomasz Grabiec	2b8bf0dbf8	Merge "db/view: Apply tracked tombstones for new updates" from Duarte When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) * https://github.com/duarten/scylla materialized-views/4321/v1.1: db/view: Apply tracked tombstones for new updates tests/view_schema_test: Add reproducer for #4321	2019-03-27 13:24:28 +01:00
Duarte Nunes	f609848b69	tests/view_schema_test: Add reproducer for #4321 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 12:01:39 +00:00
Duarte Nunes	ded9221187	db/view: Apply tracked tombstones for new updates When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 12:01:39 +00:00
Glauber Costa	043d102ab6	commitlog: fix typo in error message maxiumum -> maximum Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190326191108.7573-1-glauber@scylladb.com>	2019-03-26 21:32:56 +02:00
Avi Kivity	a77762b02a	Merge "Optimise vint deserialisation" from Paweł " Variable length integers are used extensively by the SSTables mc format. The current deserialisation routine is quite naive in that it reads each byte separately. Since those vints usually appear inside much larger buffers, we optimise for such cases, read 8 bytes at once and then mask out the unneeded parts (as well as fix their order because they are big-endian). Tests: unit(dev). perf_vint (average time per element when deserializing 1000 vints): before: vint.deserialize 69442000 14.400ns 0.000ns 14.399ns 14.400ns after: vint.deserialize 241502000 4.140ns 0.000ns 4.140ns 4.140ns perf_fast_forward (data on /tmp): large-partition-single-key-slice on dataset large-part-ds1: before: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000278 8792 2 7190 119 7367 1960 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000344 96 99 288100 4335 307689 193809 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000339 13254 100 295263 2824 301734 222725 2 108 2 0 0 1 1 0 0 1 100.0% after: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000236 10001 2 8461 59 8718 2261 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000285 89 99 347500 2441 355826 215745 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000293 14369 100 341302 1512 350123 222049 2 108 2 0 0 1 1 0 0 1 100.0% " * tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla: sstable: pass full length of buffer to vint deserialiser vint: optimise deserialisation routine vint: drop deserialize_type structure tests/vint: reduce test dependencies tests/perf: add performance test for vint serialisation	2019-03-26 16:41:44 +02:00
Avi Kivity	4b330b3911	Merge "introduce sstables manager" from Benny " This series introduces a rudimentary sstables manager that will be used for making, deleting, and tracking sstables. The motivation for having a sstables manager is detailed in https://github.com/scylladb/scylla/issues/4149. The gist of it is that we need a proper way to manage the life cycle of sstables to solve potential races between compaction and various consumers of sstables, so they don't get deleted by compaction while being used. In addition, we plan to add global statistics methods like returning the total capacity used by all sstables. This patchset changes the way class sstable gets the large_data_handler. Rather than passing it separately for writing the sstable and when deleting sstables, we provide the large_data_handler when the sstable object is constructed and then use it when needed. Refs #4149 " * 'projects/sstables_manager/v3' of https://github.com/bhalevy/scylla: sstables: provide large_data_handler to constructor sstables_manager: default_sstable_buffer_size need not be a function sstables: introduce sstables_manager sstables: move shareable_components def to its own header tests: use global nop_lp_handler in test_services sstables: compress.hh: add missing include sstables: reorder entry_descriptor constructor params sstables: entry_descriptor: get rid of unused ctor sstables: make load_shared_components a method of sstable sstables: remove default params from sstable constructor database: add table::make_sstable helper distributed_loader: pass column_family to load_sstables_with_open_info distributed_loader: no need for forward declaration of load_sstables_with_open_info distributed_loader: reshard: use default params for make_sstable	2019-03-26 16:31:40 +02:00
Benny Halevy	223e1af521	sstables: provide large_data_handler to constructor And use it for writing the sstable and/or when deleting it. Refs #4198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:24:19 +02:00
Benny Halevy	c23f658d0e	sstables_manager: default_sstable_buffer_size need not be a function Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	eebc3701a5	sstables: introduce sstables_manager The goal of the sstables manager is to track and manage sstables life-cycle. There is a sstable manager instance per database and it is passed to each column-family (and test environment) on construction. All sstables created, loaded, and deleted pass through the sstables manager. The manager will make sure consumers of sstables are in sync so that sstables will not be deleted while in use. Refs #4149 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	b50c041aa2	sstables: move shareable_components def to its own header To be used by sstables_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	2cd11208a1	tests: use global nop_lp_handler in test_services Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	0e3f9c25e4	sstables: compress.hh: add missing include Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	33cbfe81f2	sstables: reorder entry_descriptor constructor params To match make_sstable's in preparation of moving to sstables_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	ac5f9c1eae	sstables: entry_descriptor: get rid of unused ctor Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	adf8428321	sstables: make load_shared_components a method of sstable and open code its static part in the caller (distributed_loader) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	ff7b7910f1	sstables: remove default params from sstable constructor The goal is to construct sstables only via make_sstables that will be moved to class sstables_manager in a later patch. Defining the default values in both interfaces is unneeded and may lead to them going out of sync. Therefore, have only make_sstables provide the default parameter values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	3a17053cb8	database: add table::make_sstable helper In most cases we make a sstable based on the table schema and soon - large_data_handler. Encapsulate that in a make_sstable method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	67f705ae04	distributed_loader: pass column_family to load_sstables_with_open_info Rather than just its schema. In preparation for adding table::make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	99875ba966	distributed_loader: no need for forward declaration of load_sstables_with_open_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	7a8ab1d6f1	distributed_loader: reshard: use default params for make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Avi Kivity	5e39b62fcc	Merge "configure: Optionally don't compress debug in executables" from Rafael " Most of the binaries we link in a debug build are linked with -s, so the only impact is build/debug/scylla, which grows by 583 MiB when using --compress-exec-debuginfo=0. On the other hand, not having to recompress all the debug info from all the used object files is a pretty big win when debugging an issue. For example, linking build/debug/scylla goes from 56.01s user 15.86s system 220% cpu 32.592 total to 27.39s user 19.51s system 991% cpu 4.731 total Note how the cpu time is "only" 2x better, but given that compressing debug info is a long serial task, the wall time is 6.8x better. Tests: unit (debug) " * 'espindola/dont-compress-debug-v5' of https://github.com/espindola/scylla: configure: Add a --compress-exec-debuginfo option configure: Move some flags from cxx_ld_flags to cxxflags configure: rename per mode opt to cxx_ld_flags configure: remove per mode libs configure: remove sanitize_libs and merge sanitize into opt configure: split a ld_flags_{mode} out of cxxflags_{mode}	2019-03-26 15:25:07 +02:00
Avi Kivity	fad1be0ddc	Update seastar submodule * seastar caa98f8...05efbce (2): > fix use after free in rpc server handler > rpc: wait for send_negotiation_frame Fixes #4336.	2019-03-26 14:33:37 +02:00
Gleb Natapov	1abc50ad8a	messaging_service: make sure a client is unique for a destination Function messaging_service::get_rpc_client() is supposed to either return an existing client or create one and return it. The function is supposed to be atomic, so after checking that the requested client does not exist it is safe to assume emplace() will succeed. But we saw bugs that made the function non-atomic. Let's add an assert that will make such bugs easier to catch if they happen in the future. Message-Id: <20190326115741.GX26144@scylladb.com>	2019-03-26 14:19:08 +02:00
Avi Kivity	a696a3daf2	Merge "Fix decimal and varint serialization" from Piotr " Fixes #4348 v2 changes: * added a unit test This miniseries fixes decimal/varint serialization - it did not update output iterator in all cases, which may lead to overwriting decimal data if any other value follows them directly in the same buffer (e.g. in a tuple). It also comes with a reproducing unit test covering both decimals and varints. Tests: unit (dev) dtest: json_test.FromJsonUpdateTests.complex_data_types_test json_test.FromJsonInsertTests.complex_data_types_test json_test.ToJsonSelectTests.complex_data_types_test " * 'fix_varint_serialization_2' of https://github.com/psarna/scylla: tests: add test for unpacking decimals types: fix varint and decimal serialization	2019-03-26 13:00:19 +02:00
Piotr Sarna	e538163a29	tests: add test for unpacking decimals Refs #4348	2019-03-26 11:52:44 +01:00
Piotr Sarna	287a02dc05	types: fix varint and decimal serialization Varint and decimal types serialization did not update the output iterator after generating a value, which may lead to corrupted sstables - variable-length integers were properly serialized, but if anything followed them directly in the buffer (e.g. in a tuple), their value would be overwritten. Fixes #4348 Tests: unit (dev) dtest: json_test.FromJsonUpdateTests.complex_data_types_test json_test.FromJsonInsertTests.complex_data_types_test json_test.ToJsonSelectTests.complex_data_types_test Note that dtests still do not succeed 100% due to formatting differences in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query correctness issue.	2019-03-26 11:02:43 +01:00
Rafael Ávila de Espíndola	ddac002fd4	Make atomic_cell comparison symmetrical I noticed a test failure with Mutation inequality is not symmetric for ... And the difference between the two mutations was that one atomic_cell was live and the other wasn't. Looking at the code I found a few cases where the comparison was not symmetrical. This patch fixes them. This patch will not fix the test, as it will now fail with a "Mutations differ" error, but that is probably an independent issue. Ref #3975. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190325194647.54950-1-espindola@scylladb.com>	2019-03-26 11:14:22 +02:00
Vlad Zolotarov	c798563cb0	scylla_util.py: ignore perftune.py's error messages when calling it in order to get mode's CPU mask When we call perftune.py in order to get a particular mode's cpu set (e.g. mode=sq_split) it may fail and print an error message to stderr because there are too few CPUs for a particular configuration mode (e.g. when there are only 2 CPUs and the mode is sq_split). We already treat these situations correctly however we let the corresponding perftune.py error message get out into the syslog. This is definitely confusing, stressful and annoying. Let's not let these messages out. Fixes #4211 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190325220018.22824-1-vladz@scylladb.com>	2019-03-26 11:08:31 +02:00
Vlad Zolotarov	afa176851b	transport: result_message: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using the "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Resolve this by explicitly using the operator<<() across the whole operator<<(std::ostream& os, const result_message::rows& msg) function. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20190325203628.5902-1-vladz@scylladb.com>	2019-03-26 11:06:18 +02:00
Benny Halevy	af7f2a07f4	table::open_sstable: test has_scylla_component after load has_scylla_component is always false before loading the sstable. Also, return exception future rather than throwing. Hit with the following dtests: counter_tests.TestCounters.upgrade_test counter_tests.TestCountersOnMultipleNodes.counter_consistency_node__test resharding_test.ReshardingTest_nodes?_with_CompactionStrategy.resharding_counter_test update_cluster_layout_tests.TestUpdateClusterLayout.increment_decrement_counters_in_threads_nodes_restarted_test Fixes #4306 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190326084151.18848-1-bhalevy@scylladb.com>	2019-03-26 10:58:52 +02:00
Avi Kivity	f259a4c3b4	Merge "Remove usage of static gossiper object in init.cc and storage_service" from Asias " This series removes the usage of the static gossiper object in init.cc and storage_service. Follow up series will remove more in other components. This is the effort to clean up the component dependencies and have better shutdown procedure. Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test, dtests. " * tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev: storage_service: Do not use the global gms::get_local_gossiper() storage_service: Pass gossiper object to storage_service gms: Remove i_failure_detector.hh gossip: Get rid of the gms::get_local_failure_detector static object dht: Do not use failure_detector::is_alive in failure_detector_source_filter tests: Fix stop snitch in gossip_test.cc gossiper: Do not use value_factory from storage_service object gossiper: Use cfg options from _cfg instead of get_local_storage_service gossiper: Pass db::config object to gossiper class init: Pass gossiper object to init_ms_fd_gossiper	2019-03-26 08:54:46 +02:00
Avi Kivity	1d9699d833	Update seastar submodule * seastar 33baf62...caa98f8 (8): > Merge "Add file_accessible and file_stat methods" from Benny > future::then: use std::terminate instead of abort > build: Allow cooked dependencies with configure.py > tests: Show a test's output when it fails > posix_file_impl: Bypass flush() call iff opened with O_DSYNC > posix_file_impl: Propagate and keep open_flags > open_flags: Add O_DSYNC value > build: Forward variables to CMake correctly	2019-03-25 15:45:52 +02:00
Avi Kivity	a7520c0ba9	Merge "Turn cql3_type into a trivial wrapper over data_type" from Rafael " Both cql3_type and abstract_type are normally used inside shared_ptr. This creates a problem when an abstract_type needs to refer to a cql3_type as that creates a cycle. To avoid warnings from asan, we were using a std::unordered_map to store one of the edges of the cycle. This avoids the warning, but wastes even more memory. Even before this series cql3_type was a fairly light weight structure. This patch pushes in that direction and now cql3_type is a struct with a single member variable, a data_type. This avoids the reference cycle and is easier to understand IMHO. The one corner case is varchar. In the old system cql3_type::varchar and cql3_type::text don't compare equal, but they both map to the same data_type. In the new system they would compare equal, so we avoid the confusion by just removing the cql3_type::varchar variable. Tests: unit (dev) " * 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla: Turn cql3_type into a trivial wrapper over data_type Delete cql3_type::varchar Simplify db::cql_type_parser::parse Add a test for the varchar column representation	2019-03-25 15:03:16 +02:00
Tomasz Grabiec	80020118d0	Merge "Fix a couple of bugs related to large entry deletion" from Rafael The crash observed in issue #4335 happens because delete_large_data_entries is passed a deleted name. Normally we don't get a crash, but a garbage name, and we fail to delete entries from system.large_. Adding a test for the fix found another issue that the second patch in this series fixes. Tests: unit (dev) Fixes #4335. https://github.com/espindola/scylla guthub/fix-use-after-free-v4: large_data_handler: Fix a use after destruction large_data_handler: Make a variable non static Allow large_data_handler to be stopped twice Allow table to be stopped twice Test that large data entries are deleted	2019-03-25 10:37:36 +01:00
Avi Kivity	8c6306897d	Merge "load_new_sstables: validate new_tables before calling row_cache::invalidate" from Benny " Validate the to-be-loaded sstables in the open_sstable phase and handle any exceptions before calling cf.get_row_cache().invalidate. Currently if exception is thrown from distributed_loader::open_sstable cf._sstables_opened_but_not_loaded may be left partially populated. Fixes #4306 Tests: unit (dev) - next-gating dtests (dev) - migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test_expect_fail - with bypassing exception in distributed_loader::flush_upload_dir to trigger the exception in table::open_sstable " * 'issues/4306/v3' of https://github.com/bhalevy/scylla: table: move sstable counters validation from load_sstable to open_sstable distributed_loader::load_new_sstables: handle exceptions in open_sstable	2019-03-24 20:30:44 +02:00
Avi Kivity	bd3a836e6c	Merge "fixes for relocatable python3 packaging" from Takuya " Aligned way to build relocatable rpm with existing relocatable packages. " * 'relocatable-python3-fix-v3' of https://github.com/syuu1228/scylla: reloc: allow specify rpmbuild dir reloc/python3: archive package version number on build_reloc.sh reloc/python3: archive rpm build script in the relocatable package, build rpm using the script relloc/python3: fix PyYAML package name reloc: rename python3 relocatable package filename to align same style with other packages reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3	2019-03-24 20:29:56 +02:00
Duarte Nunes	93a1c27b31	service/storage_proxy: Don't consider view hints for MV backpressure When a view replica becomes unavailable, updates to it are stored as hints at the paired base replica. This on-disk queue of pending view updates grows as long as there are view updates and the view replica remains unavailable. Currently, we take that relative queue size into account when calculating the delay for new base writes, in the context of the backpressure algorithm for materialized views. However, the way we're calculating that on-disk backlog is wrong, since we calculate it per-device and then feed it to all the hints managers for that device. This means that normal hints will show up as backlog for the view hints manager, which in turn introduces delays. This can make the view backpressure mechanism kick in even if the cluster uses no materialized views. There's yet another way in which considering the view hints backlog is wrong: a view replica that is unavailable for some period of time can cause the backlog to grow to a point where all base writes are applied with the maximum delay of 1 second. This turns a single-node failure into cluster unavailability. The fix to both issues is to simply not take this on-disk backlog into account for the backpressure algorithm. Fixes #4351 Fixes #4352 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190321170418.25953-1-duarte@scylladb.com>	2019-03-24 20:29:56 +02:00
Benny Halevy	32bf0f36ef	table: move sstable counters validation from load_sstable to open_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-24 18:25:09 +02:00
Benny Halevy	564be8b720	distributed_loader::load_new_sstables: handle exceptions in open_sstable Propagate exception to caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-24 18:25:09 +02:00
Takuya ASADA	efb3865840	reloc: allow specify rpmbuild dir Added the same option, --builddir, to python3/build_rpm.sh to specify the rpmbuild dir.	2019-03-24 00:34:09 +09:00
Takuya ASADA	dc5cec4194	reloc/python3: archive package version number on build_reloc.sh Instead of getting python3 version number on build_rpm.sh, archive version number when generating python3 relocatable package.	2019-03-24 00:27:24 +09:00
Takuya ASADA	4fed4fecf6	reloc/python3: archive rpm build script in the relocatable package, build rpm using the script Since we archive the rpm/deb build script in the relocatable package and build rpm/deb using that script, align the python relocatable package too. Also added SCYLLA-RELOCATABLE-FILE, SCYLLA-RELEASE-FILE and SCYLLA-VERSION-FILE since these files are required for the relocatable package.	2019-03-24 00:27:16 +09:00
Takuya ASADA	b1283b23bb	relloc/python3: fix PyYAML package name On Fedora 29 (Scylla official toolchain uses it), PyYAML package name is "python3-pyyaml", no uppercase character.	2019-03-24 00:27:02 +09:00
Takuya ASADA	3762c4447a	reloc: rename python3 relocatable package filename to align same style with other packages	2019-03-24 00:26:48 +09:00
Takuya ASADA	a515324732	reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3 To make easier to find build scripts and keep script filename simpler, move them to python3 directory.	2019-03-24 00:25:50 +09:00
Tomasz Grabiec	bc4a614e17	Merge "Add scylla fiber gdb command" from Botond Debugging continuations is challenging. There is no support from gdb for finding out which continuation was this continuation called from, nor what other continuations are attached to it. GDB's `bt` command is of limited use, at best a handful of continuations will appear in the backtrace, those that were ready. This series attempts to fill part of this void and provides a command that answers the latter question: what continuations are attached to this one? `scylla fiber` allows for walking a continuation chain, printing each continuation. It is supposed to be the seastar equivalent of `bt`. The continuation chain is walked starting from an arbitrary task, specified by the user. The command will print all continuations attached to the specified task. This series also contains some loosely related cleanup of existing commands and code in `scylla-gdb.py`. * https://github.com/denesb/scylla.git scylla-fiber-gdb-command/v4: scylla-gdb.py: fix static_vector scylla-gdb.py: std_unique_ptr: add get() method scylla-gdb.py: fix existing documentation scylla-gdb.py: fix tasks and task-stats commands scylla-gdb.py: resolve(): add cache parameter scylla-gdb.py: scylla_ptr: move actual logic into analyze() scylla-gdb.py: scylla_ptr: make analyze() usable for outside code scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input scylla-gdb.py: add scylla fiber command	2019-03-23 10:20:20 +02:00
Asias He	7447c92d63	storage_service: Do not use the global gms::get_local_gossiper() Use the gossiper object stored in _gossiper member from storage_service.	2019-03-22 09:11:26 +08:00
Asias He	b91452ed4c	storage_service: Pass gossiper object to storage_service Pass the gossiper object to storage_service class in order to avoid the usage of the static object returned from get_local_gossiper().	2019-03-22 09:11:26 +08:00
Asias He	b2c110699e	gms: Remove i_failure_detector.hh It is not used any more.	2019-03-22 09:08:51 +08:00
Asias He	af579a055b	gossip: Get rid of the gms::get_local_failure_detector static object Store the failure_detector object inside gossiper object. - No more the global object sharded<failure_detector> - No need to initialize sharded<failure_detector> manually which simplifies the code in tests/cql_test_env.cc and init.cc.	2019-03-22 09:08:51 +08:00
Asias He	2b6a4050c2	dht: Do not use failure_detector::is_alive in failure_detector_source_filter Switch failure_detector_source_filter to use get_local_gossiper::is_alive directly since we are going to remove the static gms::get_local_failure_detector object soon. Pass the nodes that are down to the filter directly, so that the range_streamer does not depend on gossiper at all.	2019-03-22 08:26:47 +08:00
Asias He	9dbc4af1dd	tests: Fix stop snitch in gossip_test.cc It should stop snitch not failure detector. Fix it up. We are going to remove the static failure_detector object soon.	2019-03-22 08:26:47 +08:00
Asias He	967794798a	gossiper: Do not use value_factory from storage_service object Avoid using value_factory from storage_service inside gossiper.	2019-03-22 08:26:47 +08:00
Asias He	4a55617c6c	gossiper: Use cfg options from _cfg instead of get_local_storage_service Gossiper has db::config _cfg now, avoid using the get_local_storage_service() to get config options.	2019-03-22 08:26:44 +08:00
Asias He	ee1227b3ae	gossiper: Pass db::config object to gossiper class Gossiper calls service::get_local_storage_service() to get cfg options. To avoid cyclic dependency, pass the cfg object to gossiper directly.	2019-03-22 08:25:16 +08:00
Asias He	1652ee512a	init: Pass gossiper object to init_ms_fd_gossiper In order to avoid the usage of the static gossiper object returned from get_local_gossiper().	2019-03-22 08:25:16 +08:00
Rafael Ávila de Espíndola	51754ab068	Test that large data entries are deleted This area is hard to test since we only issue deletes during compaction and we wait for deletes only during shutdown. That is probably worth it, seeing that two independent bugs would have been found by this test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:48:20 -07:00
Rafael Ávila de Espíndola	bd1593c12a	Allow table to be stopped twice This will be used in a testcase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:47:59 -07:00
Rafael Ávila de Espíndola	c8da28a3eb	Allow large_data_handler to be stopped twice This will be used in a testcase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 10:47:23 -07:00
Rafael Ávila de Espíndola	c0b0a6baeb	configure: Add a --compress-exec-debuginfo option The default is the old behavior, but it is now possible to configure with --compress-exec-debuginfo=0 to get faster links but larger binaries. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:55:54 -07:00
Rafael Ávila de Espíndola	ab53055640	configure: Move some flags from cxx_ld_flags to cxxflags They are moved because they are not relevant for linking. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:55:39 -07:00
Rafael Ávila de Espíndola	e11cefab9c	configure: rename per mode opt to cxx_ld_flags It is the same name used in the build.ninja file. A followup patch will add cxxflags and move compiler only flags there. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:46:58 -07:00
Rafael Ávila de Espíndola	443a85a68c	configure: remove per mode libs It was always empty. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:46:32 -07:00
Rafael Ávila de Espíndola	35c7ec6777	configure: remove sanitize_libs and merge sanitize into opt These are flags we want to pass to both compilation and linking. There is nothing special about the fact that they are sanitizer related. With {sanitize} being passed to the link, we don't need {sanitize_libs}. We do need to make sure -fno-sanitize=vptr is the last one in the command line. Before we were implicitly getting it from seastar, but it is bad practice to get some sanitizer flags from seastar but not others. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-21 09:43:02 -07:00
Duarte Nunes	5752174762	Merge 'Use staging directory for uploaded sstables awaiting view updates' from Piotr " This series adds moving sstables uploaded via `nodetool refresh` to staging/ directory if they require generating view updates from them. Previous behavior (leaving these sstables in upload/ directory until view updates are generated) might have caused sstables with conflicting names to be mistakenly overwritten by the user. Fixes #4047 Tests: unit (dev) dtest: backup_restore_tests.py + backup_restore_tests.py modified with having materialized view definitions " * 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla: sstables: simplify requires_view_building loader: move uploaded view pending sstables to staging	2019-03-21 12:46:02 -03:00
Gleb Natapov	bb93d990ad	messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream Current code captures a reference to rpc::client in a continuation, but there is no guarantee that the reference will be valid when the continuation runs. Capture a shared pointer to rpc::client instead. Fixes #4350. Message-Id: <20190314135538.GC21521@scylladb.com>	2019-03-21 12:46:01 -03:00
Tomasz Grabiec	69775c5721	row_cache: Fix abort in cache populating read concurrent with memtable flush When we're populating a partition range and the population range ends with a partition key (not a token) which is present in sstables and there was a concurrent memtable flush, we would abort on the following assert in cache::autoupdating_underlying_reader: utils::phased_barrier::phase_type creation_phase() const { assert(_reader); return _reader_creation_phase; } That's because autoupdating_underlying_reader::move_to_next_partition() clears the _reader field when it tries to recreate a reader but it finds the new range to be empty: if (!_reader || _reader_creation_phase != phase) { if (_last_key) { auto cmp = dht::ring_position_comparator(_cache._schema); auto&& new_range = _range.split_after(_last_key, cmp); if (!new_range) { _reader = {}; return make_ready_future<mutation_fragment_opt>(); } Fix by not asserting on _reader. creation_phase() will now be meaningful even after we clear the _reader. The meaning of creation_phase() is now "the phase in which the reader was last created or 0", which makes it valid in more cases than before. If the reader was never created we will return 0, which is smaller than any phase returned by cache::phase_of(), since cache starts from phase 1. This shouldn't affect current behavior, since we'd abort() if called for this case, it just makes the value more appropriate for the new semantics. Tests: - unit.row_cache_test (debug) Fixes #4236 Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>	2019-03-21 12:46:00 -03:00
Asias He	c0f744b407	storage_service: Wait for gossip to settle only if do_bind is set In commit `71bf757b2c`, we call wait_for_gossip_to_settle(), which takes some time to complete, in storage_service::prepare_to_join(). tests/cql_query_test calls init_server with do_bind == false, which in turn calls storage_service::prepare_to_join(). Since there is only one node in the test, there is no point in waiting for gossip to settle. To make the cql_query_test fast again, do not call wait_for_gossip_to_settle if do_bind is false. Before this patch, cql_query_test takes forever to complete. After this patch, it takes 10s. Tests: tests/cql_query_test Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>	2019-03-21 12:46:00 -03:00
Avi Kivity	a9cf07369f	Merge "Add local indexes" from Piotr " This series adds support for local indexing, i.e. when the index table resides on the same partition as base data. It addresses the performance issue of having an indexed query that also specifies a partition key - index will be queried locally. " * 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits) tests: add cases for local index prefix optimization tests: add create/drop local index test case tests: add non-standard names cases to local index tests tests: add multi pk case for local index tests tests: add test for malformed local index definitions tests: add local index paging test tests: add local indexing test cql3: add CREATE INDEX syntax for local indexes cql3: use serialization function to create index target string index: add serialization function for index targets index: use proper local index target when adding index index: add parsing target column name from local index targets db: add checking for local index in schema tables index: add checking if serialized target implies local index index: enable parsing multi-key targets index: move target parser code to .cc file json: add non-throwing overload for to_json_value cql3: add checking for local indexes in has_supporting_index() cql3: move finding index restrictions to prepare stage cql3: add picking an index by score ...	2019-03-21 12:46:00 -03:00
Nadav Har'El	561c640ed1	materialized views: allow view without clustering columns When a materialized view was created, the verification code artificially forbade creating a view without a clustering key column. However, there is no real reason to forbid this. In the trivial case, the original base table might not have had a clustering key, and the view might want to use the exact same key. In a more complex case, a view may want to have all the primary key columns as partition key columns, and that should be fine. The patch also includes a regression test, which failed before this patch, and succeeds with it (we test that we can create materialized views in both aforementioned scenarios, and these materialized views work as expected). Duarte raised the opinion that the "trivial" case of a view table with a key identical to that of the base should be disallowed. However, this should be done, if at all (I think it shouldn't), in a follow-up patch, which will implement the non-triviality requirement consistently (e.g., require view primary key to be different from base's, regardless of the existence or non-existence of clustering columns). Fixes #4340. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190320122925.10108-1-nyh@scylladb.com>	2019-03-21 12:45:52 -03:00
Glauber Costa	34b640993f	storage proxy: add tracepoints about delays When we are tracing requests, we would like to know everything that happened to a query that can contribute to it having increased latencies. We insert some of those latencies explicitly due to throttling, but we do not log that into tracing. In the case of storage proxy, we do have a log message at trace level but that is rarely used: trace messages are too heavy of a hammer, there is no way to specify specific queries, etc. The correct place for that is CQL tracing. This patch moves that message to CQL tracing. We also add a matching tracepoint assuring us that no delay happened if that's the case. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190320163350.15075-1-glauber@scylladb.com>	2019-03-21 12:45:52 -03:00
Avi Kivity	eddb98e8c6	Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz " Static compact tables are tables with compact storage and no clustering columns. Before this patch, Scylla was writing rows of static compact tables as clustered rows instead of as static rows. That's because in our in-memory model such tables have regular rows and no static row. In Cassandra's schema (since 3.x), those tables have columns which are marked as static and there are no regular columns. This worked fine as long as Scylla was writing and reading those sstables. But when importing sstables from Cassandra, our reader was skipping the static row, since it's not present in our schema, and returning no rows as a result. Also, Cassandra, and Scylla tools, would have problems reading those sstables. Fix this by writing rows for such tables the same way as Cassandra does. In order to support rolling downgrade, we do that only when all nodes are upgraded. Fixes #4139. Tests: - unit (dev) " * tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla: tests: sstables: Test reading of static compact sstable generated by Cassandra tests: sstables: Add test for writing and reading of static compact tables sstables: mc: Write static compact tables the same way as Cassandra sstable: mc: writer: Set _static_row_written inside write_static_row() sstables: Add sstable::features() sstables: mc: writer: Prepare write_static_row() for working with any column_kind storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag sstables: mc: writer: Build indexed_columns together with serialization_header sstables: mc: writer: De-optimize make_serialization_header() sstable: mc: writer: Move attaching of mc-specific components out of generic code	2019-03-21 12:45:51 -03:00
Rafael Ávila de Espíndola	53ab298957	Turn cql3_type into a trivial wrapper over data_type Both cql3_type and abstract_type are normally used inside shared_ptr. This creates a problem when an abstract_type needs to refer to a cql3_type as that creates a cycle. To avoid warnings from asan, we were using a std::unordered_map to store one of the edges of the cycle. This avoids the warning, but wastes even more memory. Even before this patch cql3_type was a fairly light weight structure. This patch pushes in that direction and now cql3_type is a struct with a single member variable, a data_type. This avoids the reference cycle and is easier to understand IMHO. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 14:10:28 -07:00
Rafael Ávila de Espíndola	c76148b6ce	Delete cql3_type::varchar varchar is just an alias for text. Handle that conversion directly in the parser and delete the cql3_type::varchar variable. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 14:07:46 -07:00
Rafael Ávila de Espíndola	7f64a6ec4b	Simplify db::cql_type_parser::parse Since its first version, db::cql_type_parser::parse had special cases for native and user defined types. Those are not necessary, as the general parser has no problem handling them. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola	088d59aced	Add a test for the varchar column representation We map varchar to text, and so does cassandra. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola	8d9baf9843	large_data_handler: Make a variable non static The value computed is not static since `f254664fe6`, but unfortunately that was missed in that commit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 09:31:21 -07:00
Rafael Ávila de Espíndola	e7749e7aee	large_data_handler: Fix a use after destruction The path leading to the issue was: The sstable name is allocated and passed to maybe_delete_large_data_entries by reference auto name = sst->get_filename(); return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size()); A future is created with a reference to it large_partitions = with_sem([&s, &filename, this] { return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS); }); The semaphore blocks. The filename is destroyed. delete_large_data_entries is called with a destroyed filename. The reason this did not reproduce trivially in a debug build was that the sstable itself was in the stack and the destructed value was read as an internal value, and so asan had nothing to complain about. Unfortunately we also had no tests that the entry in system.large_rows was actually deleted. This patch passes the name by value. It might create up to 3 copies of it. If that is too inefficient it can probably be avoided with a do_with in maybe_delete_large_data_entries. Fixes #4335 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 09:30:42 -07:00
Rafael Ávila de Espíndola	c250a26e68	configure: split a ld_flags_{mode} out of cxxflags_{mode} Flags that we want to pass to gcc during compilation and linking are in cxx_ld_flags_{mode}. With this patch, we no longer pass -I. -I build/{mode}/gen to the link, which should have no impact. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 08:33:23 -07:00
Piotr Sarna	9695a47e96	sstables: simplify requires_view_building Since sstables uploaded via upload/ directory are no longer left there awaiting view updates, the only remaining valid directory is staging/.	2019-03-20 13:47:21 +01:00
Botond Dénes	0c381572fd	repair::row_level: pin table for local reads The repair reader depends on the table object being alive while it is reading. However, for local reads, there was no synchronization between the lifecycle of the repair reader and that of the table. In some cases this can result in use-after-free. Solve by using the table's existing mechanism for lifecycle extension: `read_in_progress()`. For the non-local reader, when the local node's shard configuration is different from the remote one's, this problem is already solved, as the multishard streaming reader already pins table objects on the used shards. This creates an inconsistency that might be surprising (in a bad way). One reader takes care of pinning needed resources while the other one doesn't. I was torn on how to reconcile this, and decided to go with the simplest solution, explicitly pinning the table for local reads, that is, preserving the inconsistency. It was suggested that this inconsistency be remedied by building resource pinning into the local reader as well [1] but there is opposition to this [2]. Adding a wrapper reader which does just the resource pinning seems excessive, both in code and runtime overhead. Spotted while investigating repair-related crashes which occurred during interrupted repairs. Fixes: #4342 [1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050 [2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657 Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>	2019-03-20 14:45:22 +02:00
Piotr Sarna	986004a959	loader: move uploaded view pending sstables to staging When loading tables uploaded via `nodetool refresh`, they used to be left in upload/ directory if view updates would need to be generated from them. Since view update generation is asynchronous, sstables left in the directory could erroneously get overwritten by the user, who decides to upload another batch of sstables and some of the names collided. To remedy this, uploaded sstables that need view updates are moved to staging/ directory with a unique generation number, where they await view update generation. Fixes #4047	2019-03-20 13:44:29 +01:00
Juliana Oliveira	8cd6028d0d	Dockerfile: remove cgroup volume mount Mounting /sys/fs/cgroup inside the image causes docker cgroup to not be mounted internally. Therefore, hosts cannot limit resources on Scylla. This patch removes the cgroup volume mount, allowing folders under /sys/fs/cgroup to be created inside docker. Message-Id: <20190320122053.GA20256@shenzou.localdomain>	2019-03-20 14:30:27 +02:00
Nadav Har'El	7c874057f5	materialized_views: propagate "view virtual columns" between nodes db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed to list the same schema tables - the former is the list of their names, and the latter is the list of their schemas. This code duplication makes it easy to forget to update one of them, and indeed recently the new "view_virtual_columns" was added to all_tables() but not to ALL. What this patch does is to make ALL a function instead of constant vector. The newly named all_table_names() function uses all_tables() so the list of schema tables only appears once. So that nobody worries about the performance impact, all_table_names() caches the list in a per-thread vector that is only prepared once per thread. Because after this patch all_table_names() has the "view_virtual_columns" that was previously missing, this patch also fixes #4339, which was about virtual columns in materialized views not being propagated to other nodes. Unfortunately, to test the fix for #4339 we need a test with multiple nodes, so we cannot test it here in a unit test, and will instead use the dtest framework, in a separate patch. Fixes #4339 Branches: 3.0 Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190320063437.32731-1-nyh@scylladb.com>	2019-03-20 09:14:59 -03:00
Nadav Har'El	ccf731a820	Materialized views: add metric for current flow-control delay The materialized views flow control mechanism works by adding a certain delay to each client request, designed to slow down the client to the rate at which we can complete the background view work. Until now we could observe this mechanism only indirectly, in whether or not it succeeded in keeping the view backlog bounded. But we had no way to directly observe the delay that we decided to add. In fact, we had a bug where this delay was constantly zero, and we didn't even notice :-) So in this patch we add a new metric, scylla_storage_proxy_coordinator_last_mv_flow_control_delay. The metric is a floating point number, in units of seconds. This metric is somewhat peculiar in that it always contains the last delay used for some request - unlike other metrics it doesn't measure the "current" value of something. Moreover, it can jump wildly because there is no guarantee that each request's delay will be identical (in particular, different requests may involve different base replicas which have different view backlogs, so decide on different delays). In the future we may want to supplement this metric with some sort of delay histogram. But even this simple metric is already useful to debug certain scenarios and understand if the materialized-views flow control is working or not. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190227133630.26328-1-nyh@scylladb.com>	2019-03-20 09:14:59 -03:00
Tomasz Grabiec	fbeae4ffeb	toolchain: Install gdb in the image Scylla built using the frozen toolchain needs to be debugged on a system with matching libraries. It's easiest if it's also done on the same image. Install gdb in the image so that it's always out there when we need it. Fixes #4329 Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>	2019-03-20 13:35:26 +02:00
Piotr Sarna	41679de13e	tests: add cases for local index prefix optimization The cases check if incorporating clustering key prefix into the indexed query works fine (i.e. does not require filtering and returns proper rows).	2019-03-20 10:51:27 +01:00
Piotr Sarna	56a0e6d992	tests: add create/drop local index test case	2019-03-20 10:51:27 +01:00
Piotr Sarna	3c61c8e18a	tests: add non-standard names cases to local index tests New test cases cover case-sensitive column/table names and names with non-alphanumeric characters like commas and parentheses.	2019-03-20 10:51:27 +01:00
Piotr Sarna	d664e0e522	tests: add multi pk case for local index tests	2019-03-20 10:51:27 +01:00
Piotr Sarna	3b39029924	tests: add test for malformed local index definitions	2019-03-20 10:51:27 +01:00
Piotr Sarna	4b82011cd3	tests: add local index paging test	2019-03-20 10:51:27 +01:00
Piotr Sarna	8836500fcd	tests: add local indexing test A test case for local indexing is added to the SI suite.	2019-03-20 10:51:27 +01:00
Piotr Sarna	cedec95f8d	cql3: add CREATE INDEX syntax for local indexes In order to create a local index, the syntax used is: CREATE INDEX t ON ((p1, p2, p3), v); where (p1, p2, p3) are partition key columns (all of them), and v is the indexed column.	2019-03-20 10:51:27 +01:00
Piotr Sarna	1fd61c5ac4	cql3: use serialization function to create index target string Instead of building the string manually, a serialization function is called to create a string out of index target list.	2019-03-20 10:51:27 +01:00
Piotr Sarna	757419b524	index: add serialization function for index targets Since target_parser is responsible for deserializing target strings, the function that serializes them belongs in the same class.	2019-03-20 10:51:26 +01:00
Piotr Sarna	074ed2c8a5	index: use proper local index target when adding index With global indexes, target column name is always the same as the string kept in 'options[target]' field. It's not the case for local indexes, and so a proper extracting function is used to get the value.	2019-03-20 10:20:24 +01:00
Piotr Sarna	2fcae3d0ec	index: add parsing target column name from local index targets When (re)creating a local index, the target string needs to be used to parse out the actual indexed column: "(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column". This column is later used to determine if an index should be applied to a SELECT statement.	2019-03-20 10:20:24 +01:00
Piotr Sarna	e0d7807eed	db: add checking for local index in schema tables Based on which targets the index has, it will be either local or global - local indexes have their full base partition key embedded in their targets.	2019-03-20 10:20:24 +01:00
Piotr Sarna	de5e5ee1a5	index: add checking if serialized target implies local index This utility enables checking if the specified target indicated having a local index, even before base table schema is known.	2019-03-20 10:20:24 +01:00
Piotr Sarna	5672edc149	index: enable parsing multi-key targets Parsing index targets that consist of partition key columns followed by clustering key columns is enabled.	2019-03-20 10:20:24 +01:00
Piotr Sarna	9782381dd4	index: move target parser code to .cc file It will be useful later when expanding the implementation.	2019-03-20 10:20:24 +01:00
Piotr Sarna	25264d61ee	json: add non-throwing overload for to_json_value It will be needed later to avoid unnecessary try-catch blocks.	2019-03-20 10:20:24 +01:00
Piotr Sarna	b46ab76d4b	cql3: add checking for local indexes in has_supporting_index() With local indexes it's not sufficient to check if a single restriction is supported by an index in order to decide that it can be used, because local indexes can be leveraged only when the full partition key is properly restricted. (It also serves as a great example why restrictions code would greatly benefit from a facelift! :) )	2019-03-20 10:20:24 +01:00
Piotr Sarna	87f6e37caa	cql3: move finding index restrictions to prepare stage Index restrictions that match a given index were recomputed during execution stage, which is redundant and prone to errors. Now, used index restrictions are cached in a prepare statement.	2019-03-20 10:20:22 +01:00
Piotr Sarna	9823898b27	cql3: add picking an index by score Instead of choosing the first index that we find (in column def order), the index with highest score is picked. Currently local indexes score higher than global ones if restrictions allow local indexing to be applied.	2019-03-20 10:20:02 +01:00
Piotr Sarna	2f173f7ed8	cql3: add handling paging state for local indexes When computing paging state for local indexes, the partition and clustering keys are different than with global ones: - partition key is the same as base's - clustering key starts with the indexed column	2019-03-20 10:20:02 +01:00
Piotr Sarna	75dd964751	cql3: add handling partition slices for local indexes For local indexes, a slice will consist of the indexed column followed by base clustering columns.	2019-03-20 10:20:01 +01:00
Piotr Sarna	b12162c8f5	cql3: add returning correct partition ranges for local indexes Local indexes always share the partition range with their base.	2019-03-20 09:51:46 +01:00
Piotr Sarna	da8e8f18b3	cql3: make read_posting_list a member function It already accepts several arguments that can be extracted from 'this', and more will be added in the future. New parameters include lambdas prepared during prepare stage that define how to extract partition/clustering key ranges depending on which index is used, so keeping it a static function will result in unbounded number of parameters with complex types, which will in turn make the function header almost illegible for a reader. Hence, read_posting_list becomes a member function with easy access to any data prepared during prepare stage.	2019-03-20 09:51:46 +01:00
Piotr Sarna	85017c5ad4	cql3: look for indexed column definition only once There's no need to look for the column definition inside a loop.	2019-03-20 09:51:46 +01:00
Piotr Sarna	8002471c81	cql3: allow index target to keep multiple columns Instead of having just one column definition, index target is now a variant of either single column definition or a vector of them. The vector is expected to be used when part of a target definition is enclosed in parentheses: $ CREATE INDEX ON t((p),v); or $ CREATE INDEX ON t((p1,p2), v); etc. This feature will allow providing (possibly composite) base partition key to CREATE INDEX statement, which will result in creating a local index.	2019-03-20 09:51:46 +01:00
Piotr Sarna	a45022dbc7	docs: document index target serialization Index target serialization format is extended for the purpose of local indexing. Both new and old formats are described in docs.	2019-03-20 09:51:46 +01:00
Piotr Sarna	9c984f9da9	index: fix indentation	2019-03-20 09:51:46 +01:00
Piotr Sarna	3b908b7b5d	index: add base partition keys to local index schema When the index is local, its partition key in the underlying materialized view is the same as the base's, and the indexed column is the first clustering key. This implementation ensures that view and base rows will reside on the same partition, while querying the indexed column will be possible by putting it as the first clustering key part.	2019-03-20 09:51:46 +01:00
Piotr Sarna	90d47ca183	schema: add is_local_index cached value to index metadata In order to quickly distinguish global indexes from local ones, a cached boolean value is introduced.	2019-03-20 09:51:46 +01:00
Botond Dénes	ddf795d2f9	configure.py: add check header targets Our guidelines dictate that each header is self-sufficient, i.e. after including it into an empty .cc file, the .cc file can be compiled without having to include any other header file. Currently we don't have any tool to check that a header is self-sufficient. This patch aims to remedy that by adding a target to check each header, as well as a target to check all the headers. For each header a target is generated that does the equivalent of including the header into an empty .cc file, then compiling the resulting .cc file. This target is called {header_name}.o, so for the header `myheader.hh` this will be `build/dev/myheader.hh.o` (if the dev build-mode is used). Also a target, `checkheaders`, is added which validates all headers in the project. This currently fails as we have many headers that are not self-sufficient. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <fdf550dc71203417252f1d8144e7a540eec074a1.1552636812.git.bdenes@scylladb.com>	2019-03-19 17:35:18 +02:00
Botond Dénes	721dd70d93	scylla-gdb.py: add scylla fiber command The scylla fiber command traverses a continuation chain, given an arbitrary task pointer. Example (cropped for brevity): (gdb) scylla fiber this #0 (task) 0x0000600000550360 0x000000000468ac40 vtable for seastar... #1 (task) 0x0000600000550300 0x00000000046c3778 vtable for seastar... #2 (task) 0x00006000018af600 0x00000000046c37a0 vtable for seastar... #3 (task) 0x00006000005502a0 0x00000000046c37f0 vtable for seastar... #4 (task*) 0x0000600001a65e10 0x00000000046c6b10 vtable for seastar... scylla fiber can be passed any expression that evaluates to a task pointer. C++ variables, raw addresses and GDB variables (e.g. $1) all work. The command works by scanning the task object for pointers. If a pointer is found it is dereferenced. If successful it checks that the pointer dereferences to a vtable, the class for which is a known task. If this succeeds the found task is saved, the scan then recursively proceeds to scan the newly found task until a task with no further attached continuations is found.	2019-03-19 17:06:41 +02:00
Botond Dénes	697fc5cefe	scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input	2019-03-19 17:06:41 +02:00
Botond Dénes	e1ea4db7ca	scylla-gdb.py: scylla_ptr: make analyze() usable for outside code Instead of a formatted message, intended for humans, return a `pointer_metadata` object, suitable for use by code. The formatting of the pointer metadata into the human readable message is now done by the `pointer_metadata.__str__()` method, at the call site. Also make `analyze()` a class method, making it possible to call it without having to create a `scylla_ptr` command instance, possibly confusing GDB.	2019-03-19 17:06:41 +02:00
Botond Dénes	e77b6d12d1	scylla-gdb.py: scylla_ptr: move actual logic into analyze() In preparation for this method being made usable for outside code.	2019-03-19 17:06:41 +02:00
Botond Dénes	7d5c0ff666	scylla-gdb.py: resolve(): add cache parameter Allow callers to prevent the resolved name from being saved. Useful when one is just probing addresses but doesn't want to flood the cache with useless symbols.	2019-03-19 17:06:41 +02:00
Botond Dénes	48b96d25b3	scylla-gdb.py: fix tasks and task-stats commands These two commands have been broken for some time, roughly since the CPU scheduler was merged. Fix them and move the task queue parsing code into a common method, which is now used by both commands.	2019-03-19 17:06:41 +02:00
Botond Dénes	87c28df429	scylla-gdb.py: fix existing documentation Some commands are documented, but not in the python way. Refactor these commands so they use the standard python way for self documenting. In addition to being more "python", this makes these documentation strings discoverable by GDB so they appear in the `help scylla` output.	2019-03-19 17:06:41 +02:00
Botond Dénes	e1dffc3850	scylla-gdb.py: std_unique_ptr: add get() method Add a `get()` method that retrieves the wrapped pointer without dereferencing it. All existing methods are refactored to use this new method to obtain the pointer instead of directly accessing the members. This way only a single method has to be fixed if the object implementation changes.	2019-03-19 17:06:41 +02:00
Botond Dénes	c51b11c0ed	scylla-gdb.py: fix static_vector Apparently a new 'dummy' level was added.	2019-03-19 17:06:41 +02:00
Glauber Costa	7119440cbc	tests: make sure that commitlog replay works after truncate. Tomek and I recently had a discussion about whether or not a commitlog replay would be safe after we dropped or truncated a table that is not flushed (durable, but auto_snapshots being false). While we agreed that would be safe, we both agreed we would feel better with a unit test covering that. This patch adds such a test (btw, it passes). Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190318223811.6862-1-glauber@scylladb.com>	2019-03-19 11:30:51 +01:00
Avi Kivity	0441b59a70	Update seastar submodule * seastar 463d24e...33baf62 (3): > reactor: improve detection of io_pgetevents() > rpc: fix stack use after free in frame reading functions > core/thread: enable move-only functions	2019-03-19 11:44:35 +02:00
Takuya ASADA	32cee92d56	dist/debian: don't strip ld.so On some environments dh_strip fails at libreloc/ld.so, so it's better to skip it too, just like libprotobuf.so.15. The error message is: dh_strip -Xlibprotobuf.so.15 --dbg-package=scylla-server-dbg strip:debian/scylla-server/opt/scylladb/libreloc/ld.so[.gnu.build.attributes]: corrupt GNU build attribute note: bad description size: Bad value dh_strip: strip --remove-section=.comment --remove-section=.note --strip-unneeded debian/scylla-server/opt/scylladb/libreloc/ld.so returned exit code 1 0 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190319005153.26506-1-syuu@scylladb.com>	2019-03-19 11:06:44 +02:00
Asias He	71bf757b2c	gossiper: Enable features only after gossip is settled n1, n2, n3 in the cluster, shutdown n1, n2, n3 start n1, n2 start n3, we saw features are enabled using the system table while n1 and n2 are already up and running in the cluster. INFO 2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, 
STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} The problem is we enable the features too early in the start up process. We should enable features after gossip is settled. Fixes #4289 Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>	2019-03-18 18:25:29 +01:00
Dejan Mircevski	c7d05b88a6	Update GCC version check in configure.py This brings the version check up-to-date with README.md and HACKING.md, which were updated by commit fa2b03 ("Replace std::experimental types with C++17 std version.") to say that minimum GCC 8.1.1 is required. Tests: manually run configure.py with various `--compiler` values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190318130543.24982-1-dejan@scylladb.com>	2019-03-18 15:24:25 +02:00
Tomasz Grabiec	33f15aa1b5	tests: sstables: Test reading of static compact sstable generated by Cassandra	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	c78568daef	tests: sstables: Add test for writing and reading of static compact tables	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	47ca280e57	sstables: mc: Write static compact tables the same way as Cassandra Static compact tables are tables with compact storage and no clustering columns. Before this patch, Scylla was writing rows of static compact tables as clustered rows instead of static rows. That's because in our in-memory model such tables have regular rows and no static row. In Cassandra's schema (since 3.x), those tables have columns which are marked as static and there are no regular columns. This worked fine as long as Scylla was writing and reading those sstables. But when importing sstables from Cassandra, our reader was skipping the static row, since it's not present in the schema, and returning no rows as a result. Also, Cassandra, and Scylla tools, would have problems reading those sstables. Fix this by writing rows for such tables the same way as Cassandra does. In order to support rolling downgrade, we do that only when all nodes are upgraded. Fixes #4139.	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	b0ff68d8d9	sstable: mc: writer: Set _static_row_written inside write_static_row()	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	b68df143a1	sstables: Add sstable::features()	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	cf9721e855	sstables: mc: writer: Prepare write_static_row() for working with any column_kind	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	fefef7b9eb	storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag When enabled on all nodes, sstable writers will start to produce correct MC-format sstables for compact storage tables by writing rows into the static row (like C*) rather than into the regular row. We only do that when all nodes are upgraded to support rolling downgrade. After all nodes are upgraded, regular rolling downgrade will not be possible. Refs #4139	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	52d634025d	sstables: mc: writer: Build indexed_columns together with serialization_header The set of columns in both must match, so it's better to build them together. Later the logic for choosing columns will become more complicated, and this patch will allow for avoiding duplication.	2019-03-18 11:18:33 +01:00
Tomasz Grabiec	701ac53b80	sstables: mc: writer: De-optimize make_serialization_header() So that it's easier to make it use schema_v3 conditionally in later patches. It's not on the hot path, so it shouldn't matter that we don't reserve the vectors.	2019-03-18 11:15:18 +01:00
Tomasz Grabiec	8bb8d67a93	sstable: mc: writer: Move attaching of mc-specific components out of generic code	2019-03-18 11:15:18 +01:00
Tomasz Grabiec	b0e6f17a22	Merge "Fix empty remote common_features in check_knows_remote_features" from Asias Three nodes in the cluster node1, node2, node3 Shutdown the whole cluster Start node1 Start node2, node2 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.2 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't started yet, node1 sees node3 has empty features. In get_supported_features(), an empty common features will be returned if an empty features of a node is seen. To fix, we should fallback to use the features saved in system table. Start node3, node3 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't inserted its own features into gossip endpoint_state_map. get_supported_features() returns the common features of all nodes in endpoint_state_map. To fix, we should fallback to use the features stored in the system table for such node in this case. 
Fixes #4225 Fixes #4341 * dev asias/fix_check_knows_remote_features.upstream.v4.1: gossiper: Remove unused register_feature and unregister_feature gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node gossiper: Log feature is enabled only if the feature is not enabled previously gossiper: Fix empty remote common_features in check_knows_remote_features	2019-03-18 10:56:10 +01:00
Asias He	1d59f26c11	gossiper: Fix empty remote common_features in check_knows_remote_features Three nodes in the cluster node1, node2, node3 Shutdown the whole cluster Start node1 Start node2, node2 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.2 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't started yet, node1 sees node3 has empty features. In get_supported_features(), an empty common features will be returned if an empty features of a node is seen. To fix, we should fallback to use the features saved in system table. Start node3, node3 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't inserted its own features into gossip endpoint_state_map. get_supported_features() returns the common features of all nodes in endpoint_state_map. To fix, we should fallback to use the features stored in the system table for such node in this case. Fixes #4225	2019-03-18 10:56:10 +01:00
Asias He	acb4badbc3	gossiper: Log feature is enabled only if the feature is not enabled previously We saw the log "Feature FOO is enabled" more than once like below. It is better to log it only when the feature is not enabled previously. gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - InetAddress 127.0.0.2 is now UP, status = NORMAL	2019-03-18 10:56:10 +01:00
Asias He	f32f08c91e	gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node Remove unused check_features helper as well.	2019-03-18 10:56:09 +01:00
Asias He	6dbcb2e0c9	gossiper: Remove unused register_feature and unregister_feature They are not used any more.	2019-03-18 10:56:09 +01:00
Benny Halevy	ecf88d8e2e	compaction: fix sstable_window_size calculation if only unit/size is set A user who changes the default UNIT from DAYS to HOURS and does not set compaction_window_size will end up with a window of 24H instead of 1H. According to the docs https://docs.scylladb.com/getting-started/compaction/#twcs-options compaction_window_size should default to a value of 1. Fixes #4310 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190307131318.13998-1-bhalevy@scylladb.com>	2019-03-18 11:19:18 +02:00
Takuya ASADA	02be95365f	reloc/build_rpm.sh: don't use '*' for tar xf argument It works by accident, but the '*' is just expanded by bash to the matched files in the current directory and is not correctly recognized by tar. Use the full file name instead. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190312172243.5482-2-syuu@scylladb.com>	2019-03-18 11:09:55 +02:00
Takuya ASADA	5b10b6a0ce	reloc/build_reloc.sh: enable DPDK We get following link error when running reloc/build_reloc.sh in dbuild, need to enable DPDK on Seastar: g++: error: /usr/lib64/librte_cfgfile.so: No such file or directory g++: error: /usr/lib64/librte_cmdline.so: No such file or directory g++: error: /usr/lib64/librte_ethdev.so: No such file or directory g++: error: /usr/lib64/librte_hash.so: No such file or directory g++: error: /usr/lib64/librte_kvargs.so: No such file or directory g++: error: /usr/lib64/librte_mbuf.so: No such file or directory g++: error: /usr/lib64/librte_eal.so: No such file or directory g++: error: /usr/lib64/librte_mempool.so: No such file or directory g++: error: /usr/lib64/librte_mempool_ring.so: No such file or directory g++: error: /usr/lib64/librte_pmd_bnxt.so: No such file or directory g++: error: /usr/lib64/librte_pmd_e1000.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ena.so: No such file or directory g++: error: /usr/lib64/librte_pmd_enic.so: No such file or directory g++: error: /usr/lib64/librte_pmd_fm10k.so: No such file or directory g++: error: /usr/lib64/librte_pmd_qede.so: No such file or directory g++: error: /usr/lib64/librte_pmd_i40e.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ixgbe.so: No such file or directory g++: error: /usr/lib64/librte_pmd_nfp.so: No such file or directory g++: error: /usr/lib64/librte_pmd_ring.so: No such file or directory g++: error: /usr/lib64/librte_pmd_sfc_efx.so: No such file or directory g++: error: /usr/lib64/librte_pmd_vmxnet3_uio.so: No such file or directory g++: error: /usr/lib64/librte_ring.so: No such file or directory Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190312172243.5482-1-syuu@scylladb.com>	2019-03-18 11:09:55 +02:00
Piotr Sarna	2e05d86cf3	service: reduce number of spawned threads when notifying Commit `9c544df217` introduced running up/down/join/leave notifications in threaded context, but spawned a thread for every notification, while it could be done once for all notifiees. Reported-by: Avi Kivity <avi@scylladb.com> Message-Id: <34815d5aa11902c4a052cff38f4c45c45ff919d8.1552897848.git.sarna@scylladb.com>	2019-03-18 10:45:47 +02:00
Avi Kivity	64fa2dd1d2	Merge "gdb: Introduce 'scylla sstables'" from Tomasz " Finds all sstables on current shard and prints useful information, like on-disk and in-memory usage. Example: (gdb) scylla sstables (sstables::sstable) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096) ... total (shard-local): count=144, data_file=782839677, in_memory=59774408 Because of the way it finds sstables (bag_sstable_set), doesn't yet support tables using LeveledCompactionStrategy. " * 'gdb-scylla-sstables' of github.com:tgrabiec/scylla: gdb: Introduce 'scylla sstables' gdb: Introduce find_instances() gdb: Extract std_unique_ptr.get() gdb: Add chunked_vector wrapper gdb: Add small_vector wrapper gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint() gdb: Add wrapper for seastar::lw_shared_ptr gdb: Add std_vector.external_memory_footprint() gdb: Add wrapper for boost::variant gdb: Add wrapper for std::optional	2019-03-17 19:37:44 +02:00
Takuya ASADA	270f9cf9e6	dist/debian: fix installing scyllatop Since we removed dist/common/bin/scyllatop we are getting a build error on .deb package build (`1bb65a0888`). To fix the error we need to create a symlink for /usr/bin/scyllatop. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190316162105.28855-1-syuu@scylladb.com>	2019-03-17 19:37:44 +02:00
Tomasz Grabiec	05e2c87936	gdb: Introduce 'scylla sstables' Finds all sstables on current shard and prints useful information, like on-disk and in-memory usage. Example: (gdb) scylla sstables (sstables::sstable) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096) (sstables::sstable) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	929653f51d	gdb: Introduce find_instances()	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	fc4952c579	gdb: Extract std_unqiue_ptr.get()	2019-03-15 15:12:48 +01:00
Tomasz Grabiec	e47a5019f2	gdb: Add chunked_vector wrapper	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	a6da71e4da	gdb: Add small_vector wrapper	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	0e8589cfdf	gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint()	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	380c6fbdfe	gdb: Add wrapper for seastar::lw_shared_ptr	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	93e5e0d644	gdb: Add std_vector.external_memory_footprint()	2019-03-15 15:12:47 +01:00
Tomasz Grabiec	8866b1320a	gdb: Add wrapper for boost::variant	2019-03-15 15:12:46 +01:00
Tomasz Grabiec	dd237c32af	gdb: Add wrapper for std::optional	2019-03-15 15:12:46 +01:00
Paweł Dziepak	f4f56027bf	Merge "Detect partitioner mismatch" from Piotr " Refuse to accept SSTables that were created with partitioner different than the one used by the Scylla server. Fixes #4331 " * 'haaawk/4331/v4' of github.com:scylladb/seastar-dev: sstables: Add test for sstable::validate_partitioner sstables: Add sstable::validate_partitioner and use it	2019-03-15 11:45:10 +00:00
Piotr Jastrzebski	2b0437a147	sstables: Add test for sstable::validate_partitioner Make sure the exception is thrown when Scylla tries to load an SSTable created with non-compatible partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-03-15 10:47:47 +01:00
Piotr Jastrzebski	4aea97f120	sstables: Add sstable::validate_partitioner and use it Scylla server can't read sstables that were created with different partitioner than the one being used by Scylla. We should make sure that Scylla identifies such mismatch and refuses to use such SSTables. We can use partitioner information stored in validation metadata (Statistics.db file) for each SSTable and compare it against partitioner used by Scylla. Fixes #4331 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-03-15 10:14:37 +01:00
Rafael Ávila de Espíndola	94c28cfb16	sstable: Wait for future returned by maybe_record_large_cells. A previous version of the patch that introduced these calls had no limit on how far behind the large data recording could get, and maybe_record_large_cells returned null. The final version switched to a semaphore, but unfortunately these calls were not updated. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190314195856.66387-1-espindola@scylladb.com>	2019-03-14 21:01:37 +01:00
Paweł Dziepak	349601ac32	sstable: pass full length of buffer to vint deserialiser vint deserialiser can be more performant if it is allowed to do an overread (i.e. read more memory than the value it is deserialising). In case of sstable reads those vints are going to be usually in the middle of a much larger buffer, so let's pass the whole length of the buffer and enable this optimisation.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	552fc0c6b9	vint: optimise deserialisation routine At the moment, vint deserialisation is using a naive approach, reading each byte separately. In practice, vints will most often appear inside larger buffers. That means we can read 8 bytes at a time and then figure out the unneeded parts and mask them out. This way we avoid a loop and do fewer memory loads, which are much more expensive than arithmetic operations (even if they hit the cache).	2019-03-14 13:37:06 +00:00
Paweł Dziepak	57de2c26b3	vint: drop deserialize_type structure Deserialisation function returns a structure containing both the value and its length in the input buffer. In the vast majority of the cases the caller will already know the length and having this structure will make it harder for the compiler to emit good code, especially if the function is not inlined. In practice I've seen the structure causing register pressure problems that lead to spilling variables to memory.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	6110278439	tests/vint: reduce test dependencies vint serialisation test doesn't need whole Scylla, so let's reduce its dependencies to improve build times.	2019-03-14 13:37:06 +00:00
Paweł Dziepak	54a079cdb5	tests/perf: add performance test for vint serialisation	2019-03-14 13:37:06 +00:00
Piotr Sarna	9c544df217	service: run notifying code in threaded context In order to allow yielding when handling endpoint lifecycle changes, notifiers now run in threaded context. Implementations which used this assumption before are supplemented with assertions that they indeed run in seastar::async mode. Fixes #4317 Message-Id: <45bbaf2d25dac314e4f322a91350705fad8b81ed.1552567666.git.sarna@scylladb.com>	2019-03-14 12:56:53 +00:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Paweł Dziepak	d4d2eb2ed5	Update seastar submodule * seastar e640314...463d24e (3): > Merge 'Handle IOV_MAX limit in posix_file_impl' from Paweł > core: remove unneeded 'exceptional future ignored' report > tests/perf: support multiple iterations in a single test run	2019-03-13 14:24:58 +00:00
Tomasz Grabiec	2ef9d9c12e	Merge "Record large cells to system.large_cells" from Rafael Issue #4234 asks for a large collection detector. Discussing the issue Benny pointed out that it is probably better to have a generic large cell detector as it makes a natural progression on what we already warn on (large partitions and large rows). This patch series implements that. It is on top of shutdown-order-patches-v7 which is currently on next. With the changes to use a semaphore this patch series might be getting a bit big. Let me know if I should split it. * https://github.com/espindola/scylla espindola/large-cells-on-top-of-shutdown-v5: db: refactor large data deletion code db: Rename (maybe_)?update_large_partitions db: refactor a try_record helper large_data_handler: assert it is not used after stop() db: don't use _stopped directly sstables: delete dead error handling code. large_data_handler: Remove const from a few functions large_data_handler: propagate a future out of stop() large_data_handler: Run large data recording in parallel Create a system.large_cells table db: Record large cells Add a test for large cells	2019-03-13 09:44:57 +01:00
Rafael Ávila de Espíndola	f983570ac8	Add a test for large cells Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	63251b66c1	db: Record large cells Fixes #4234. Large cells are now recorded in system.large_cells. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	d17083b483	Create a system.large_cells table This is analogous to the system.large_rows table, but holds individual cells, so it also needs the column name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	8b4ae95168	large_data_handler: Run large data recording in parallel With this change the futures returned by large_data_handler will not normally wait for entries to be written to system.large_rows or system.large_partitions. We use a semaphore to bound how far behind system.large_* table updates can get. This should avoid delaying sstables writes in the common case, which is more relevant once we warn of large cells since the default threshold will be just 1MB. Note that there is no ordering between the various maybe_record_* and maybe_delete_large_data_entries requests. This means that we can end up with a stale entry that is only removed once the TTL expires. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	54b856e5e4	large_data_handler: propagate a future out of stop() stop() will close a semaphore in a followup patch, so it needs to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	989ab33507	large_data_handler: Remove const from a few functions These will use a member semaphore variable in a followup patch, so they cannot be const. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	0b763ec19b	sstables: delete dead error handling code. maybe_delete_large_data_entries handles exceptions internally, so the code this patch deletes would never run. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	5fcb3ff2d7	db: don't use _stopped directly This gives flexibility in how it is implemented. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	a17a936882	large_data_handler: assert it is not used after stop() This should have been changed in the patch db: stop the commit log after the tables during shutdown But unfortunately I missed it then. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	f3089bf3d1	db: refactor a try_record helper We had almost identical error handling for large_partitions and large_rows. Refactor in preparation for large_cells. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola	d7f263d334	db: Rename (maybe_)?update_large_partitions This renames it to record_large_partitions, which matches record_large_rows. It also changes the signature to be closer to record_large_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola	f254664fe6	db: refactor large data deletion code The code for deleting entries from system.large_partitions was almost a duplicate from the code for deleting entries from system.large_rows. This patch unifies the two, which also improves the error message when we fail to delete entries from system.large_partitions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:16:04 -07:00
Asias He	b8158dd65d	streaming: Get rid of the keep alive timer in streaming There is no guarantee that rpc streaming makes progress in some time period. Remove the keep alive timer in streaming to avoid killing the session when the rpc streaming is just slow. The keep alive timer is used to close the session in the following case: n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver) kill -9 n2 We need this because we do not kill the session when gossip thinks a node is down, because we think the node being down might only be temporary and it is a waste to drop the previous work that has been done, especially when the stream session takes a long time. Since in range_streamer, we do not stream all data in a single stream session, we stream 10% of the data per time, and we have retry logic. I think it is fine to kill a stream session when gossip thinks a node is down. This patch changes to close all stream sessions with the node that gossip thinks is down. Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com>	2019-03-12 12:20:28 +01:00
Duarte Nunes	2718c90448	Merge 'Add canceling long-standing view update requests' from Piotr " This series allows canceling view update requests when a node is discovered DOWN. View updates are sent in the background with long timeout (5 minutes), and in case we discover that the node is unavailable, there's no point in waiting that long for the request to finish. What's more, waiting for these requests occurs on shutdown, which may result in waiting 5 minutes until Scylla properly shuts down, which is bad for both users and dtests. This series implements storage_proxy as a lifecycle subscriber, so it can react to membership changes. It also keeps track of all "interruptible" writes per endpoint, so once a node is detected as DOWN, an artificial timeout can be triggered for all aforementioned write requests. Fixes #3826 Fixes #3966 Fixes #4028 " * 'write_hints_for_view_updates_on_shutdown_4' of https://github.com/psarna/scylla: service: remove unused stop_hints_manager storage_proxy: add drain_on_shutdown implementation main: register storage proxy as lifecycle subscriber storage_proxy: add endpoint_lifecycle_subscriber interface storage_proxy: register view update handlers for view write type storage_proxy: add intrusive list of view write handlers storage_proxy: add view_update_write_response_handler	2019-03-08 13:34:46 -03:00
Piotr Sarna	ae52b3baa7	tests: fix complex timestamp test flakiness Complex timestamp tests were ported from dtest and contained a potential race - rows were updated with TTL 1 and then checked if the row exists in both base and view replicas in an eventually() loop. During this loop however, TTL of 1 second might have already passed and the row could have been deleted from base. This patch changes the mentioned TTL to 30 seconds, making the tests extremely unlikely to be flaky. Message-Id: <6b43fe31850babeaa43465eb771c0af45ee6e80d.1552041571.git.sarna@scylladb.com>	2019-03-08 13:34:27 -03:00
Tomasz Grabiec	eb5506275b	Merge "Further enhancements to perf_fast_forward" from Paweł This series contains several improvements to perf_fast_forward that either address some of the problems seen in the automated runs or help understanding the results. The main problem was that test small-partition-slicing had a preparation stage disproportionately long compared to the actual testing phase. While the fragments per second results weren't affected by that, it restricted the number of iterations of the test that we were able to run, and the test, whose single iterations are short (and more prone to noise), was executed only four times. This was solved by sharing the preparation stage with all iterations, thus enabling the test to be run many times and improving the stability of the results. Another improvement is the ability to dump all test results and process them producing histograms. This allows us to see what the distribution of particular statistics looks like and if there are some complications. Refs #4278. * https://github.com/pdziepak/scylla.git more-perf_fast_forward/v1: tests/perf_fast_forward: print number of iterations of each test tests/perf_fast_forward: reuse keys in small partition slicing test tests/perf_fast_forward: extract json result file writing logic tests/perf_fast_forward: add an option to dump all results tests/perf_fast_forward: add script for analysing full results	2019-03-07 12:22:13 -03:00
Piotr Sarna	aea4b7ea78	service: remove unused stop_hints_manager Stopping hints manager now occurs when draining storage proxy and it shouldn't be executed independently, so it's removed from external API.	2019-03-07 13:44:06 +01:00
Piotr Sarna	cc806909d7	storage_proxy: add drain_on_shutdown implementation When storage proxy is shutting down, all interruptible writes can be timed out in order not to wait for them. Instead, the mechanism will fall back to storing hints and/or not progressing with view building.	2019-03-07 13:44:05 +01:00
Piotr Sarna	c61d0ee8aa	main: register storage proxy as lifecycle subscriber In order to be able to act when node joins/leaves, storage proxy is registered as an endpoint lifecycle subscriber. Fixes #3826 Fixes #4028	2019-03-07 12:10:40 +01:00
Piotr Sarna	92df1d5a6b	storage_proxy: add endpoint_lifecycle_subscriber interface Storage proxy is able to react to membership changes in order to cancel long-standing operations for an endpoint.	2019-03-07 12:10:40 +01:00
Piotr Sarna	f9ff97511f	storage_proxy: register view update handlers for view write type View update handlers have a specialized class, so all writes of type write_type::VIEW are now registered as such.	2019-03-07 12:10:40 +01:00
Piotr Sarna	75ec5fa876	storage_proxy: add intrusive list of view write handlers In order to be able to iterate over view update write response handlers, an intrusive list of them is added to storage proxy. This way iteration can be easily yielded without invalidating iterators and all logic is moved to slow path.	2019-03-07 12:10:40 +01:00
Piotr Sarna	c2048a0758	storage_proxy: add view_update_write_response_handler View update write response handler inherits from a regular write response handler, but it's also possible to link it intrusively in order to be able to induce timeouts on them later.	2019-03-07 12:10:40 +01:00
Paweł Dziepak	0ba7a3c55a	tests/perf_fast_forward: add script for analysing full results perf_fast_forward with flag --dump-all-results reports the results of every test iteration that was executed. This patch introduces a python script that can analyse those results (in json format) and present them in a more human-friendly way. For now, the only option is to plot histograms of selected statistics.	2019-03-06 15:48:49 +00:00
Paweł Dziepak	4220b90b22	tests/perf_fast_forward: add an option to dump all results perf_fast_forward runs each test case multiple times and reports a summary of those results (median, min, max, and median absolute deviation). While very convenient the summary may hide some important information (e.g. the distribution of the results). This patch adds an option to report results of every single executed iteration.	2019-03-06 15:48:48 +00:00
Paweł Dziepak	55ed8b2472	tests/perf_fast_forward: extract json result file writing logic We are about to report, depending on flags, both full results as well as the results summary written now. Most of the logic is going to be identical.	2019-03-06 15:48:45 +00:00
Paweł Dziepak	daafde21c5	tests/perf_fast_forward: reuse keys in small partition slicing test	2019-03-06 15:48:42 +00:00
Paweł Dziepak	0eb1e570aa	tests/perf_fast_forward: print number of iterations of each test	2019-03-06 15:48:38 +00:00
Avi Kivity	0beeb2f721	Merge "implement upgradesstables + scub" from Calle " Fixes #4245 Breaks up "perform_cleanup" in parameterized "rewrite_sstables" and implements upgrade + scrub in terms of this. Both run as a "regular" compaction, but ignore the normal criteria for compaction and select obsolete/all tables. We also ensure all previous compactions are done so we can guarantee all tables are rewritten post invocation of command. " * 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev: api::storage_service: Implement "scrub" api/storage_service: Implement "upgradesstables" api::storage_service: Add keyspace + tables helper compaction_manager: Add perform_sstable_scrub compaction_manager: Add perform_sstable_upgrade compaction_manager: break out rewrite_sstables from cleanup table: parameterize cleanup_sstables	2019-03-06 15:47:26 +02:00
Duarte Nunes	a29ec4be76	Merge 'Update system.large_partitions during shutdown' from Rafael " Currently any large partitions found during shutdown are not recorded. The reason is that the database commit log is already off, so there is nowhere to record it to. One possible solution is to have an independent system database. With that the regular db is shutdown first and writes can continue to the system db. That is a pretty big change. It would also not allow us to record large partitions in any system tables. This patch series instead tries to stop the commit log later. With that any large partitions are recorded to the log and moved to a sstable on the next startup. " * 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla: db: stop the commit log after the tables during shutdown db: stop the compaction manager earlier db: Add a stop_database helper db: Don't record large partitions in system tables	2019-03-06 10:36:38 -03:00
Calle Wilund	ef1bdebd0a	api::storage_service: Implement "scrub"	2019-03-06 13:13:21 +00:00
Calle Wilund	23f4c982ea	api/storage_service: Implement "upgradesstables" Fixes #4245 Implemented as a compaction barrier (forcing previous compactions to finish) + parameterized "cleanup", with sstable list based on parameters.	2019-03-06 13:13:21 +00:00
Calle Wilund	3b5588dddd	api::storage_service: Add keyspace + tables helper To avoid repeating code to get keyspace + tables	2019-03-06 13:13:21 +00:00
Calle Wilund	c0bb6a4bef	compaction_manager: Add perform_sstable_scrub Suspiciously similar to an unconditional upgrade	2019-03-06 13:13:21 +00:00
Calle Wilund	7585b8c310	compaction_manager: Add perform_sstable_upgrade Rewrites obsolete/all sstables via compaction	2019-03-06 13:13:21 +00:00
Tomasz Grabiec	889f31fabe	Merge "fix slow truncation under flush pressure" from Glauber Truncating a table is very slow if the system is under pressure. Because in that case we mostly just want to get rid of the existing data, it shouldn't take this long. The problem happens because truncate has to wait for memtable flushes to end, twice. This is regardless of whether or not the table being truncated has any data. 1. The first time is when we call truncate itself: if auto_snapshot is enabled, we will flush the contents of this table first and we are expected to be slow. However, even if auto_snapshot is disabled we will still do it -- which is a bug -- if the table is marked as durable. We should just not flush in this case and it is a silly bug. 2. The second time is when we call cf->stop(). Stopping a table will wait for a flush to finish. At this point, regardless of which path (Durable or non-durable) we took in the previous step we will have no more data in the table. However, calling `flush()` still needs to acquire a flush_permit, which means we will wait for whichever memtable is flushing at that very moment to end. If the system is under pressure and a memtable flush will take many seconds, so will truncate. Even if auto_snapshots are enabled, we shouldn't have to flush twice. The first flush should already put us in a state in which the next one is immediate (maybe holding on to the permit, maybe destroying the memtable_list already at that point -> since no other memtables should be created). If auto_snapshots are not enabled, the whole thing should just be instantaneous. This patchset fixes that by removing the flush need when !auto_snapshot, and special casing the flush of an empty table. Fixes #4294 * git@github.com:glommer/scylla.git slowtruncate-v2: database: immediately flush tables with no memtables. truncate: do not flush memtables if auto_snapshot is false.	2019-03-06 13:54:58 +01:00
Eliran Sinvani	479131259e	auth: prevent failure due to race in tables creation This commit rewrites the logic of table creation at startup of the auth mechanism to be race proof. This is done by simply ignoring the already_exists exception as done in system_distributed_keyspace. The old creation logic tested for existence of the column family and right after called announce_new_column_family with the newly created table schema. The problem was that it does not prevent a race since the announcement itself is a fiber and the created table can still be gossiped from another node, causing the announce function to throw an already_exists exception that in turn crashes scylla. Message-Id: <20190306075016.28131-1-eliransin@scylladb.com>	2019-03-06 13:09:09 +01:00
Rafael Ávila de Espíndola	16ed9a2574	db: stop the commit log after the tables during shutdown This allows for system.large_partitions to be updated if a large partition is found while writing the last sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	a3e1f14134	db: stop the compaction manager earlier We want to finish all large data logging in stop_system, so stopping the compaction manager should be the first thing stop_system does. The make_ready_future<>() will be removed in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	765d8535f1	db: Add a stop_database helper This reduces code duplication. A followup patch will add more code to stop_database. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:45 -08:00
Rafael Ávila de Espíndola	0b86a99592	db: Don't record large partitions in system tables This will allow us to delay shutdown of all system tables in a uniform way. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 17:52:00 -08:00
Tomasz Grabiec	c584f48c32	Merge "transport: sort bound ranges in read reques in order to conform to cql definitions" from Eliran According to the cql definitions, if no ORDER BY clause is present, records should be returned ordered by the clustering keys. Since the backend returns the ranges according to their order of appearance in the request, the bounds should be sorted before sending it to the backend. This kind of sorting is needed in queries that generate more than one bound to be read, examples of such queries are: 1. a SELECT query with an IN clause. 2. a SELECT query on a mixed order tuple of columns (see #2050). The assumption this commit makes is the correctness of the bounds list, that is, the bounds are non overlapping. If this wasn't true, multiple occurrences of the same record could have been returned for certain queries. Tests: 1. Unit tests release 2. All dtest that requires #2050 and #2029 Fixes #2029	2019-03-05 21:07:15 +01:00
Avi Kivity	3cfbd682ec	Merge "Add JSON support to tuples and UDT" from Piotr " Fixes #3708 This series adds JSON serialization and deserialization procedures to tuples and user defined types. Tests: unit (dev) " * 'add_tuple_and_udt_json_support_2' of https://github.com/psarna/scylla: tests: add test cases for JSON and UDT types: add JSON support to UDT tests: add JSON tuple tests types: add JSON support for tuples	2019-03-05 20:06:15 +02:00
Glauber Costa	c2c6c71398	truncate: do not flush memtables if auto_snapshot is false. Right now we flush memtables if the table is durable (which in practice it almost always is). We are truncating, so we don't want the data. We should only flush if auto_snapshot is true. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-03-05 11:22:48 -05:00
Glauber Costa	ed8261a0fe	database: immediately flush tables with no memtables. If a table has no data, it may still take a long time to flush. This is because before we even try to flush, we need to acquire a permit and that can take a while if there is a long running flush already queued. We can special case the situation in which there is no data in any of the memtables owned by table and return immediately. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-03-05 11:22:48 -05:00
Piotr Sarna	a5c66d5ce1	tests: add test cases for JSON and UDT	2019-03-05 16:25:18 +01:00
Piotr Sarna	ebf0eb92bb	types: add JSON support to UDT User defined types can now be serialized to and deserialized from JSON. Fixes #3708	2019-03-05 16:08:05 +01:00
Piotr Sarna	c2064d152d	tests: add JSON tuple tests	2019-03-05 16:08:05 +01:00
Piotr Sarna	aa0cc8a8a2	types: add JSON support for tuples Tuples can now be serialized to and deserialized from JSON. Refs #3708	2019-03-05 16:08:04 +01:00
Piotr Sarna	e9bc2a7912	cql3: fix error message for lack of primary keys in JSON When any primary key part is not present in INSERT JSON statement, proper error message will be presented to the client. Tests: unit (dev) Message-Id: <3aa99703523c45056396a0b6d97091da30206dab.1551797502.git.sarna@scylladb.com>	2019-03-05 16:54:46 +02:00
Avi Kivity	256b7d34e2	Update seastar submodule * seastar ab54765...e640314 (10): > net: enable IP_BIND_ADDRESS_NO_PORT before binding a socket during connection > core: show address in error message for posix_listen failures > fmt: remove submodule > tests: fix loopback socket close() to not fail when the peer's side is already closed > Merge "Add suffixes to target names" from Jesse > temporary_buffer: improve documentation for alignment param requirements > docs: Fix dependencies for split tutorial target > deleter: prevent early memory free caused by deleter append. > doc/tutorial.md: introduce memory allocation foreign_ptr > Fix CLI help message (network & DPDK options) Toolchain and configure.py updated for fmt submodule removal.	2019-03-05 15:51:38 +02:00
Botond Dénes	817490cda1	tests/multishard_mutation_query_test: fuzzy_test: replace BOOST_WARN_* with logger::debug() fuzzy_test performs some checks that are expected to fail and whose failure does not influence the outcome of the test. For this it uses the `BOOST_WARN_*` family of macros. These will just log a warning when their predicate fails. This can however confuse someone looking at the logs trying to determine the cause of a failure. Since these checks are performed primarily to provide an aid in debugging failures, replace them with a conditional debug-level log message. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f550a9d9ab1b5b4aeb4f81860cbd3d924fc86898.1551792035.git.bdenes@scylladb.com>	2019-03-05 15:24:53 +02:00
Botond Dénes	0ed0d3297a	tests/multishard_mutation_query_test: test_abandoned_read: reduce querier TTL The `test_abandoned_read` verifies that an abandoned read does a proper cleanup. One of the things checked is that after the querier TTL expires, the saved queriers are cleaned-up. This check however had a very tight timing. The TTL was 2s and the test waited 2s before it did the check, which is wrapped in an `eventually_true()` (max +1s). The TTL timer scans the queriers with a period of TTL/2 so a querier can live 1.5*TTL time. This means that the 2s + 1s wait time is just on the limit and with some bad luck (and a slow machine) it can fail. Reduce the TTL in this test to 1s to relax the dependence on timing. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <ed0d45b5a07960b83b391d289cade9b9f60c7785.1551787638.git.bdenes@scylladb.com>	2019-03-05 14:10:04 +02:00
Eliran Sinvani	eeb0845be0	unit test: validate order instead of just content in the mixed order token test This change amends the functionality of the result generation: it changes the behaviour to return the expected results vector sorted in the expected order of appearance in the result set. Then the result set is validated for both content and order. Tests: unit tests (Release) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
Eliran Sinvani	13284d9272	unit test: change IN clause tests to validate with ordering_spec Whenever a query with an IN clause on clustering keys is executed, assuming only one partition, the rows are ordered according to the clustering keys. This commit adds the order validation to the content validation whenever possible (which means removing the ignore order part). Tests: unit tests (Release) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
Eliran Sinvani	7df0c873aa	transport: sort bound ranges in read requests in order to conform to cql definitions According to the cql definitions, if no ORDER BY clause is present, records should be returned ordered by the clustering keys. Since the backend returns the ranges according to their order of appearance in the request, the bounds should be sorted before sending them to the backend. This kind of sorting is needed in queries that generate more than one bound to be read; examples of such queries are: 1. a SELECT query with an IN clause. 2. a SELECT query on a mixed order tuple of columns (see #2050). The assumption this commit makes is the correctness of the bounds list, that is, the bounds are non-overlapping. If this weren't true, multiple occurrences of the same record could be returned for certain queries. Tests: 1. Unit tests release 2. All dtest that requires #2050 and #2029 Fixes #2029 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2019-03-05 13:51:17 +02:00
Avi Kivity	5993c05a1b	Merge "partitioner: Futurize split_range_to_single_shard" from Asias " Futurize split_range_to_single_shard to fix reactor stall. Fixes: #3846 " * tag 'asias/split_range_to_single_shard/v4' of github.com:scylladb/seastar-dev: partitioner: Futurize split_range_to_single_shard tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc	2019-03-05 11:25:36 +02:00
Asias He	58fae5f4c1	partitioner: Futurize split_range_to_single_shard We saw reactor stalls when closing SSTables. The backtrace looks like: Oct 12 19:00:51 dell-1 scylla[435045]: Backtrace:[Backtrace #0] void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/sylla/scylla/seastar/util/backtrace.hh:56 seastar::backtrace_buffer::append_backtrace() at /home/sylla/scylla/seastar/core/reactor.cc:410 (inlined by) print_with_backtrace at /home/sylla/scylla/seastar/core/reactor.cc:431 seastar::reactor::block_notifier(int) at /home/sylla/scylla/seastar/core/reactor.cc:749 _L_unlock_13 at funlockfile.c:? std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>::_Optional_base(std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>&&) at /opt/scylladb/include/c++/7/experimental/optional:247 (inlined by) std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >::optional(std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >&&) at /opt/scylladb/include/c++/7/experimental/optional:493 (inlined by) wrapping_range<dht::ring_position>::wrapping_range(wrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:61 (inlined by) nonwrapping_range<dht::ring_position>::nonwrapping_range(nonwrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:430 (inlined by) void __gnu_cxx::new_allocator<nonwrapping_range<dht::ring_position> >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/ext/new_allocator.h:136 (inlined by) void std::allocator_traits<std::allocator<nonwrapping_range<dht::ring_position> > >::construct<nonwrapping_range<dht::ring_position>, 
nonwrapping_range<dht::ring_position> >(std::allocator<nonwrapping_range<dht::ring_position> >&, nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/alloc_traits.h:475 (inlined by) nonwrapping_range<dht::ring_position>& std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::emplace_back<nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/deque.tcc:167 (inlined by) std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::push_back(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/stl_deque.h:1558 (inlined by) dht::split_range_to_single_shard(dht::i_partitioner const&, schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:454 dht::split_range_to_single_shard(schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:464 create_sharding_metadata at /home/sylla/scylla/sstables/sstables.cc:2075 (inlined by) sstables::sstable::write_scylla_metadata(seastar::io_priority_class const&, unsigned int, sstables::sstable_enabled_features) at /home/sylla/scylla/sstables/sstables.cc:2435 sstables::sstable_writer_m::consume_end_of_stream() at /home/sylla/scylla/sstables/sstables.cc:3483 sstables::compaction::finish_new_sstable(std::experimental::fundamentals_v1::optional<sstables::sstable_writer>&, seastar::lw_shared_ptr<sstables::sstable>&) at /home/sylla/scylla/sstables/compaction.cc:338 (inlined by) sstables::regular_compaction::stop_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:579 (inlined by) sstables::regular_compaction::finish_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:585 sstables::compacting_sstable_writer::consume_end_of_stream() at 
/home/sylla/scylla/sstables/compaction.cc:494 (inlined by) auto compact_mutation_state<(emit_only_live_rows)0, (compact_for_sstables)1>::consume_end_of_stream<sstables::compacting_sstable_writer>(sstables::compacting_sstable_writer&) at /home/sylla/scylla/./mutation_compactor.hh:292 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer>::consume_end_of_stream() at /home/sylla/scylla/./mutation_compactor.hh:397 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >::consume_end_of_stream() at /home/sylla/scylla/./mutation_reader.hh:366 (inlined by) auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:288 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:370 (inlined by) operator() at /home/sylla/scylla/sstables/compaction.cc:757 (inlined by) apply at /home/sylla/scylla/seastar/core/apply.hh:35 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/apply.hh:43 (inlined by) 
apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/future.hh:1309 (inlined by) operator() at /home/sylla/scylla/./seastar/core/thread.hh:315 (inlined by) _M_invoke at /opt/scylladb/include/c++/7/bits/std_function.h:316 std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706 (inlined by) seastar::thread_context::main() at /home/sylla/scylla/seastar/core/thread.cc:313 The call chain is: sstable_writer_k_l::consume_end_of_stream and mc::writer::consume_end_of_stream -> sstable::write_scylla_metadata -> create_sharding_metadata -> dht::split_range_to_single_shard Since sstable writer assumes a thread context. We can futurize dht::split_range_to_single_shard. Fixes: #3846 Tests: dtest + build/dev/tests/partitioner_test	2019-03-05 17:21:27 +08:00
Benny Halevy	1021eb29c9	distributed_loader: fix old format counters exception table::load_sstable: fix missing arg in old format counters exception Properly catch and log the exception in load_new_sstables. Abort when the exception is caught to keep current behavior. Seen with migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test without enable_dangerous_direct_import_of_cassandra_counters. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190301091235.2914-1-bhalevy@scylladb.com>	2019-03-04 17:36:09 +01:00
Avi Kivity	026821fb59	Merge "Record large rows in the system.large_rows table" from Rafael " This fixes #3988. We already have a system.large_partitions, but only a warning for large rows. These patches close the gap by also recording large rows into a new system.large_rows. " * 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla: Add a testcase for large rows Populate system.large_rows. Create a system.large_rows table Extract a key_to_str helper Don't call record_large_rows if stopped Add a delete_large_rows_entries method to large_data_handler db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void Rename maybe_delete_large_partitions_entry Rename log_large_row to record_large_rows Rename maybe_log_large_row to maybe_record_large_rows	2019-03-04 18:31:10 +02:00
Avi Kivity	da0a25859b	Merge "Improvements to commitlog logs" from Paweł " This series contains minor improvements to commitlog log messages that have helped investigating #4231, but are not specific to that bug. " * tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla: commitlog: use consistent chunk offsets in logs commitlog: provide more information in logs commitlog: remove unnecessary comment	2019-03-04 14:52:46 +02:00
Paweł Dziepak	00b33de25c	commitlog: use consistent chunk offsets in logs Logs in commitlog writer use offset in the file of the chunk header to identify chunks. However, the replayer is using offset after the header for the same purpose. This causes unnecessary confusion suggesting that the replayer is reading at the wrong position. This patch changes the replayer so that it reports chunk header offsets.	2019-03-04 12:15:50 +00:00
Paweł Dziepak	813b00a1a6	commitlog: provide more information in logs This commit adds some more information to the logs. Motivated by experience investigating #4231. * size of each write * position of each write * log message for final write	2019-03-04 12:15:50 +00:00
Paweł Dziepak	1a657e9c5f	commitlog: remove unnecessary comment	2019-03-04 12:15:50 +00:00
Avi Kivity	d95dec22d9	Merge "Fix commitlog chunks overwriting each other" from Paweł " This series fixes a problem in the commitlog cycle() function that confused in-memory and on-disk size of chunks it wrote to disk. The former was used to decide how much data needs to be actually written, and the latter was used to compute the offset of the next chunk. If two chunk writes happened concurrently, the one positioned earlier in the file could corrupt the header of the next one. Fixes #4231. Tests: unit(dev), dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup,test_commitlog_replay_with_alter_table) " * tag 'fix-commitlog-cycle/v1' of https://github.com/pdziepak/scylla: commitlog: write the correct buffer size utils/fragmented_temporary_buffer_view: add remove suffix	2019-03-04 14:14:32 +02:00
Tomasz Grabiec	58e7ad20eb	sstable/compaction: Use correct schema in the writing consumer Introduced in `2a437ab427`. regular_compaction::select_sstable_writer() creates the sstable writer when the first partition is consumed from the combined mutation fragment stream. It gets the schema directly from the table object. That may be a different schema than the one used by the readers if there was a concurrent schema alter during that small time window. As a result, the writing consumer attached to readers will interpret fragments using the wrong version of the schema. One effect of this is storing values of some columns under a different column. This patch replaces all column_family::schema() accesses with accesses to the _schema member which is obtained once per compaction and is the same schema which readers use. Fixes #4304. Tests: - manual tests with hard-coded schema change injection to reproduce the bug - build/dev/scylla boot - tests/sstable_mutation_test Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>	2019-03-04 13:27:19 +02:00
Paweł Dziepak	434023425d	commitlog: write the correct buffer size Commitlog files contain multiple chunks. Each chunk starts as a single (possibly fragmented) buffer. The size of that buffer in memory may be larger than the size in the file. cycle() was incorrectly using the in-memory size to write the whole buffer to the file. That sometimes caused data corruption, since a smaller on-file size was used to compute the offset of the next chunk and there could be multiple chunk writes happening at the same time. This patch solves the issue by ensuring that only the actual on-file size of the chunk is written.	2019-03-04 10:25:48 +00:00
Paweł Dziepak	ca8d1025c0	utils/fragmented_temporary_buffer_view: add remove suffix This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is also necessary to adjust remove_prefix() since now the total size of all fragments may be larger than the size of the view if both those operations are performed.	2019-03-04 10:23:45 +00:00
Asias He	3861f538dc	tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc We are going to convert split_range_to_single_shard to return a future.	2019-03-04 09:41:09 +08:00
Avi Kivity	8f71e7ffd4	Merge "auth: Prevent disallowed roles from logging in" from Jesse " This series heavily refactors `auth_test` in anticipation of the last patch, which fixes a bug and which should be backported. Branches: branch-3.0, branch-2.3 " Fixes #4284 * 'jhk/check_can_login/v2' of https://github.com/hakuch/scylla: auth: Reject logins from disallowed roles tests: Restrict the scope of a variable tests: Simplify boolean assertions in `auth_test` tests: Abstract out repeated assertion checking tests: Do not use the `auth` namespace tests: Validate authentication correctly tests: Ensure test roles are created and dropped tests: Use `static` variables in `auth_test` tests: Remove non-useful test	2019-03-02 17:13:06 +02:00
Asias He	a949ccee82	repair: Reject combination of -dc and -hosts options 4 nodes in the cluster n1, n2 in dc1 n3, n4 in dc2 dc1 RF=2, dc2 RF=2. If we run nodetool repair -hosts 127.0.0.1,127.0.03 -dc "dc1,dc2" multi on n1, the -hosts option will be ignored and only the -dc option will be used to choose which hosts to repair. In this case, n1 to n4 will be repaired. If the user wants to select specific hosts to repair with, there is no need to specify the -dc option. Using the -hosts option is enough. Reject the combination so as not to surprise the user. In https://issues.apache.org/jira/browse/CASSANDRA-9876, the same logic is introduced as well. Refs #3836 Message-Id: <e95ac1099f98dd53bb9d6534316005ea3577e639.1551406529.git.asias@scylladb.com>	2019-03-02 16:42:29 +02:00
Juliana Oliveira	6322293263	dist/docker: add ssh server Scylla Manager communicates through SSH, so this patch adds SSH server to Scylla's docker image in order for it to be configurable by Scylla Manager. Message-Id: <20190301161428.GA12148@shenzou.localdomain>	2019-03-01 19:11:35 +02:00
Avi Kivity	41078de096	tools: toolchain: update image for gcc-8.3.1-2.fc29.x86_64 tests: unit (debug, dev, release)	2019-03-01 16:42:18 +02:00
Duarte Nunes	44966d0a66	Merge 'Fix view update generation optimizations' from Piotr " This series aims to fix inconsistencies in recent view update generation series (`435447998`). First of all, it checks view row marker liveness instead of that of a base row marker when deciding if optimizations can be applied or not. Secondly, tests based on creating mutations directly are removed. Instead: - dtest case which detected inconsistencies in previous series is ported to be a unit test - the above case is also expanded to cover views with regular base column in their key - additional test for TTL and timestamps is added and it's based on CQL Tests: unit (dev) dtest: materialized_views_test.TestMaterializedViews.test_no_base_column_in_view_pk_complex_timestamp_without_flush Fixes: #4271 " * 'fix_virtual_columns_liveness_checks_in_update_optimization_5' of https://github.com/psarna/scylla: tests: add view update optimization case for TTL database: add view_stats getter tests: port complex timestamp view test from dtest db,view: fix virtual columns liveness checks tests: remove update generating test case	2019-03-01 10:58:39 -03:00
Jesse Haber-Kucharsky	a139afc30c	auth: Reject logins from disallowed roles When the `LOGIN` option for a role is set to `false`, Scylla should not permit the role to log in. Fixes #4284 Tests: unit (debug)	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	320b4a7b99	tests: Restrict the scope of a variable	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	f8764a12e6	tests: Simplify boolean assertions in `auth_test`	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	879217ccaf	tests: Abstract out repeated assertion checking	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	3c8eeb0e86	tests: Do not use the `auth` namespace	2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky	afed9c7bee	tests: Validate authentication correctly There are additional validation steps that the server executes in addition to simply invoking the authenticator, so we adapt the tests to also perform that validation. We also eliminate lots of code duplication.	2019-02-28 15:01:14 -05:00
Jesse Haber-Kucharsky	baefde0f6c	tests: Ensure test roles are created and dropped Since the role manager and authenticator work in tandem, the test cases should use the wrapper for `auth::service` to create and drop users instead of just doing it through the authenticator.	2019-02-28 15:00:20 -05:00
Jesse Haber-Kucharsky	fd88d59ad9	tests: Use `static` variables in `auth_test` This way, we avoid copies and alleviate resource-management concerns.	2019-02-28 14:59:38 -05:00
Jesse Haber-Kucharsky	f274982522	tests: Remove non-useful test Password handling is verified in its own test suite, and this test not only makes a number of assumptions about implementation details, but also tries to verify a hashing scheme (bcrypt) which is not supported on most Linux distributions.	2019-02-28 14:58:27 -05:00
Avi Kivity	7c968f4a9e	build: move XXH_PRIVATE_API and SEASTAR_TESTING_MAIN non-mode-specific These defines are global, so they can be in the mode-agnostic cxxflags rather than the mode-specific cxxflags_{mode}. Message-Id: <20190228081247.20116-1-avi@scylladb.com>	2019-02-28 09:51:02 +00:00
Piotr Sarna	032f8e2893	tests: add view update optimization case for TTL This test case checks whether redundant updates are omitted and the essential ones are still generated.	2019-02-28 10:47:20 +01:00
Piotr Sarna	67e63d4dd7	database: add view_stats getter It will be used for testing purposes	2019-02-28 10:47:20 +01:00
Piotr Sarna	09b8d2e9d6	tests: port complex timestamp view test from dtest This test was useful in discovering corner cases for TTLs of virtual columns, so it's ported to unit test suite from dtest. The test is also extended with a mirrored case for base regular column that is included in view pk.	2019-02-28 10:47:20 +01:00
Piotr Sarna	5f85a7a821	db,view: fix virtual columns liveness checks When looking for optimization paths, columns selected in a view are checked against multiple conditions - unfortunately virtual columns were erroneously skipped from that check, which resulted in ignoring their TTLs. That can lead to overoptimizing and not including vital liveness info in view rows, which can then result in rows disappearing too early.	2019-02-28 10:47:19 +01:00
Piotr Sarna	b963543762	tests: remove update generating test case This test case should have been based on CQL instead of creating artificial update scenarios. It also contains invalid cases regarding base and view row marker, so it's removed here and replaced with CQL-based test in this same series.	2019-02-28 10:40:47 +01:00
Avi Kivity	20eadb2c39	relocatable-package: package and redirect gnutls configuration gnutls requires a configuration file, and the configuration file must match the one used by the library. Since we ship our own version of the library with the relocatable package, we must also ship the configuration file. Luckily, it is possible to override the location of the configuration file via an environment variable, so all we need to do is to copy the file to the archive and provide the environment variable in the thunk that adjusts the library path. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190227110529.14146-1-avi@scylladb.com>	2019-02-28 10:57:32 +02:00
Avi Kivity	4022a919f6	test: allocate at least one logical core per unit test Currently, we only allocate memory for concurrent unit test runs. This can cause CPU overcommit when running test.py on machines with a lot of memory but few cores. This overcommit can cause timeouts in tests that are time-sensitive (bad practice, but can happen) and makes the desktop sluggish. Improve by allocating at least one logical core per running test. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190227132516.22147-1-avi@scylladb.com>	2019-02-28 10:34:33 +02:00
Dan Yasny	6dbb48a12a	node_health_check: collect scylla.d contents with node_health_check We are missing data for CPU conf files and potentially other information when collecting node data. Fixes #4094 Message-Id: <20190225204727.20805-5-dyasny@scylladb.com>	2019-02-28 10:23:19 +02:00
Dan Yasny	9055e7a49e	node_health_check: Add redhat-release to health check if present Collect /etc/redhat-release as well as os-release from relevant hosts. The problem with os-release is that it doesn't contain the minor version of the EL OS family. Since this is only present in Red Hat distributions and derivatives, it will not be collected in Debian derivatives. Another approach is to use lsb_release -a but it will not provide anything more useful than os-release on Debian and lsb needs to be installed on EL derivatives first. Fixes #4093 Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>	2019-02-28 10:23:12 +02:00
Dan Yasny	2f26390f52	node_health_check: Use clear hostname instead of -i for filenames and report names Hostname -i produces a garbled output on new systems with ipv6 enabled, better to use the clean hostname instead, for the file names. Message-Id: <20190225204727.20805-3-dyasny@scylladb.com>	2019-02-28 10:23:06 +02:00
Dan Yasny	f483c594ee	node_health_check: Detect the address for the CQL (port 9042) listener and use it The script relies on hostname -i for host address, which can be wrong in some systems. This patch checks for where the defined CQL_PORT is listening, and uses the correct IP address instead. Message-Id: <20190225204727.20805-2-dyasny@scylladb.com>	2019-02-28 10:22:58 +02:00
Avi Kivity	632c7c303a	Merge "auth: Restructure SASL code" from Jesse " This series restructures the SASL code that was previously internal to the `password_authenticator` so that it can be used in other contexts. " * 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla: auth: Rename SASL challenge class for "PLAIN" auth: Make a ctor `explicit` auth: Move `sasl_challenge` to its own file auth: Decouple SASL code from its parent class	2019-02-28 10:19:41 +02:00
Jesse Haber-Kucharsky	f2d92f81e8	auth: Report a more specific error with bad creds Without this change, the resulting error message for an invalid password is "authentication failed". With this change, we report "Username and/or password are incorrect". Fixes #4285 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <32d00be8af5075ee10d2c14f85b76843a9adac10.1551306914.git.jhaberku@scylladb.com>	2019-02-28 09:53:57 +02:00
Jesse Haber-Kucharsky	3d883e8cf2	auth: Rename SASL challenge class for "PLAIN"	2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky	0c955b7992	auth: Make a ctor `explicit`	2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky	dc41f1098b	auth: Move `sasl_challenge` to its own file This will allow authenticators other than `password_authenticator` to make use of the PLAIN SASL authentication code.	2019-02-27 18:36:52 -05:00
Jesse Haber-Kucharsky	2d59fa6be9	auth: Decouple SASL code from its parent class This way, we can (in the future) use this implementation of the SASL "PLAIN" mechanism in contexts other than `password_authenticator`.	2019-02-27 18:11:31 -05:00
Avi Kivity	88322086cb	Merge "Add fuzzer-type unit test for range scans" from Botond " This series adds a fuzzer-type unit test for range scans, which generates a semi-random dataset and executes semi-random range scans against it, validating the result. This test aims to cover a wide range of corner cases with the help of randomness. Data and queries against it are generated in such a way that various corner cases and their combinations are likely to be covered. The infrastructure under range scans has undergone massive changes in the last year, growing in complexity and scope. The correctness of range scans is critical for the correct functioning of any Scylla cluster, and while the current unit tests served well in detecting any major problems (mostly while developing), they are too simplistic and can only be relied on to check the correctness of the basic functionality. This test aims to extend coverage drastically, testing cases that the author of the range-scan code or that of the existing unit tests didn't even think existed, by relying on some randomness. Fixes: #3954 (deprecates really) " * 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla: tests/multishard_mutation_query_test: add fuzzy test tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan() tests/test_table: add advanced `create_test_table()` overload tests/test_table: make `create_test_table()` customizable query: add trim_clustering_row_ranges_to() tests/test_table: add keyspace and table name params tests/test_table: s/create_test_cf/create_test_table/ tests: move create_test_cf() to tests/test_table.{hh,cc} tests/multishard_mutation_query_test: drop many partition test tests/multishard_mutation_query_test: drop range tombstone test	2019-02-27 17:26:53 +02:00
Avi Kivity	cc2f9841c4	Merge "Simplify -g and -gz checks in configure.py" from Rafael * 'simplify-g-gz-check-v2' of https://github.com/espindola/scylla: Assume -gz is always available Assume -g is always available	2019-02-27 17:19:37 +02:00
Duarte Nunes	871790a340	Merge 'Hide virtual columns write time and ttl from the user' from Piotr " This miniseries hides virtual columns' writetime and ttl from the user. Tests: unit (dev) Fixes #4288 " * 'hide_virtual_columns_writetime_and_ttl_2' of https://github.com/psarna/scylla: tests: add test for hiding virtual columns from WRITETIME cql3: hide virtual columns from WRITETIME() and TTL() schema: add column_definition::is_hidden_from_cql	2019-02-27 14:36:08 +00:00
Calle Wilund	93602ecee3	compaction_manager: break out rewrite_sstables from cleanup Allowing additional behaviour control. Such as which tables, and whether to actually lock ourselves out as a "cleanup".	2019-02-27 14:25:31 +00:00
Calle Wilund	7fb6bbe68c	table: parameterize cleanup_sstables To allow using the logic for one-sstable-at-a-time compaction (i.e. rewrite) of sstables without the "normal" cleanup logic and partition selection.	2019-02-27 14:25:31 +00:00
Piotr Sarna	09eb0429ce	tests: add test for hiding virtual columns from WRITETIME Visibility checks for virtual columns' WRITETIME and TTL are added.	2019-02-27 15:08:16 +01:00
Piotr Sarna	af39787bf0	cql3: hide virtual columns from WRITETIME() and TTL() Virtual columns should not be visible to the user, so they are now hidden not only from directly selecting them, but also via WRITETIME() and TTL() keywords. Fixes #4288	2019-02-27 15:08:15 +01:00
Piotr Sarna	b0ab4c28cf	schema: add column_definition::is_hidden_from_cql Right now the only columns hidden from CQL are view virtual columns, but in case of expanding this set, a helper function is provided.	2019-02-27 15:07:54 +01:00
Avi Kivity	d189e12438	tests: database_test: fix misaligned dma write test_distributed_loader_with_pending_delete issues a dma write, but violates the unwritten contract to temporary_buffer::aligned(), which requires that size be a multiple of alignment. As a result the test fails spuriously. Instead of playing with the alignment, rewrite that snippet to use the easier-to-use make_file_output_stream(). Introduced in `1ba88b709f`. Branches: master. Message-Id: <20190226181850.3074-1-avi@scylladb.com>	2019-02-27 09:00:31 +01:00
Rafael Ávila de Espíndola	d9e0b47d53	Add a testcase for large rows Tests: unit (release) Fixes #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:56:50 -08:00
Rafael Ávila de Espíndola	25f81cf3e3	Populate system.large_rows. It now records large rows when they are first written to an sstable and removes them when the sstable is deleted. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola	66d8a0cf93	Create a system.large_rows table This is analogous to the system.large_partitions table, but holds individual rows, so it also needs the clustering key of the large rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	da4c0da78a	Extract a key_to_str helper It will be used in more places in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	b7fd03d0fd	Don't call record_large_rows if stopped The implementations of large_data_handler should only be called if large_data_handler hasn't been stopped yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	0c401f56f8	Add a delete_large_rows_entries method to large_data_handler This will be responsible for removing large rows from system.large_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	81a21ea425	db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void These functions will record into tables in a followup patch, so they will need to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	d4c001cba8	Rename maybe_delete_large_partitions_entry It will also delete large rows, so rename it to maybe_delete_large_data_entries. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	e9a13aff90	Rename log_large_row to record_large_rows It will also record into a table in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	6fb7066755	Rename maybe_log_large_row to maybe_record_large_rows It will also record into a table in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola	a586ac209a	Assume -gz is always available It is available since clang 5 and gcc 5. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola	054078b6af	Assume -g is always available From the log it looks like these checks were added in 2014 because of a broken clang. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola	87106ea5e2	Improve the build mode documentation With this patch HACKING suggests using just ./configure.py and passing the mode to ninja. It also expands on the characteristics of each mode and mentions the dev mode. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190208020444.19145-1-espindola@scylladb.com>	2019-02-26 19:54:50 +02:00
Nadav Har'El	da54d0fc7d	Materialized views: fix accidental zeroing of flow-control delay The materialized-views flow control carefully calculates an amount of microseconds to delay a client to slow it down to the desired rate - but then a typo (std::min instead of std::max) causes this delay to be zeroed, which in effect completely nullifies the flow control algorithm. Before this fix, experiments suggested that view flow control was not having any effect and view backlog not bounded at all. After this fix, we can see the flow control having its desired effect, and the view backlog converging. Fixes #4143. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190226161452.498-1-nyh@scylladb.com>	2019-02-26 18:22:18 +02:00
Tomasz Grabiec	1a63a313c8	Merge "repair: Rename names to be consistent with rpc verb " from Asias Some of the function names are not updated after we change the rpc verb names. Rename them to make them consistent with the rpc verb names. * seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1: repair: Rename request_sync_boundary to get_sync_boundary repair: Rename request_full_row_hashes to get_full_row_hashes repair: Rename request_combined_row_hash to get_combined_row_hash repair: Rename request_row_diff to get_row_diff repair: Rename send_row_diff to put_row_diff repair: Update function name in docs/row_level_repair.md	2019-02-26 13:01:36 +01:00
Tomasz Grabiec	b06aac4fdb	Merge "Fix temporary spurious schema version mismatch when nodes are restarted" from Asias Fixes: #4148 Fixes: #4258 Tests: resharding_test.py:reshardingtest_nodes4_with_sizetieredcompactionstrategy.resharding_by_smp_increase_test * seastar-dev.git asias/fix_schema_mismatch_when_nodes_restarts/v1: database: Add update_schema_version and announce_schema_version storage_service: Add application_state::SCHEMA when gossip starts	2019-02-26 12:55:52 +01:00
Avi Kivity	5f94bc902a	transport: add option to disable shard-aware drivers The shard-aware drivers can cause a huge amount of connections to be created when there are tens of thousands of clients. While normally the shard-aware drivers are beneficial, in those cases they can consume too much memory. Provide an option to disable shard awareness from the server (it is likely to be easier to do this on the server than to reprovision those thousands of clients). Tests: manual test with wireshark. Message-Id: <20190223173331.24424-1-avi@scylladb.com>	2019-02-26 12:44:11 +01:00
Asias He	459836079c	storage_service: Add application_state::SCHEMA when gossip starts In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw: 4 nodes in the tests n1, n2, n3, n4 are started n1 is stopped n1 is changed to use different shard config n1 is restarted ( 2019-01-27 04:56:00,377 ) The backtrace happened on n2 right after n1 restarts: 0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled 1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled 2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled 3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed) 4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status = 5 Segmentation fault on shard 0. 6 Backtrace: 7 0x00000000041c0782 8 0x00000000040d9a8c 9 0x00000000040d9d35 10 0x00000000040d9d83 11 /lib64/libpthread.so.0+0x00000000000121af 12 0x0000000001a8ac0e 13 0x00000000040ba39e 14 0x00000000040ba561 15 0x000000000418c247 16 0x0000000004265437 17 0x000000000054766e 18 /lib64/libc.so.6+0x0000000000020f29 19 0x00000000005b17d9 The theory is: migration_manager::maybe_schedule_schema_pull is scheduled, at this time n1 has SCHEMA application_state, when n1 restarts, n2 gets new application state from n1 which does not have SCHEMA yet, when migration_manager::maybe_schedule wakes up from the 60 sleep, n1 has non-empty endpoint_state but empty application_state for SCHEMA. We dereference the nullptr application_state and abort. In commit `da80f27f44`, we fixed the problem by checking the pointer before dereference. To prevent this from happening in the first place, we had better add application_state::SCHEMA when gossip starts. This way, peer nodes always see the application_state::SCHEMA when a node restarts. 
Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test Fixes #4148 Fixes #4258	2019-02-26 19:30:22 +08:00
Asias He	75edbe939d	database: Add update_schema_version and announce_schema_version Split the update_schema_version_and_announce() into update_schema_version() and announce_schema_version(). This is going to be used in storage_service::prepare_to_join() where we want to first update the schema version, start gossip, announce the schema version.	2019-02-26 19:10:02 +08:00
Amnon Heiman	b8a838c66c	node_exporter_install: Add a force install option It is sometimes useful to force reinstallation of the node_exporter, for example during upgrade or if something is wrong with the current installation. This patch adds a --force command line option. If --force is given to node_exporter_install, it will reinstall node_exporter to the latest version, regardless of whether it was already installed. The symbolic link in /usr/bin/node_exporter will be set to the installed version, so if there are other installed versions, they will remain. Examples: $ sudo ./dist/common/scripts/node_exporter_install node_exporter already installed, you can use `--force` to force reinstallation $ sudo ./dist/common/scripts/node_exporter_install --force node_exporter already installed, reinstalling Fixes #4201 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190225151120.21919-1-amnon@scylladb.com>	2019-02-25 20:16:58 +02:00
Pekka Enberg	ca288189a9	dist/ami: Support different products for the AMI Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so that we can override package names for enterprise AMIs. Message-Id: <20190225063319.19516-1-penberg@scylladb.com>	2019-02-25 11:17:44 +02:00
Asias He	3e615c3a15	repair: Update function name in docs/row_level_repair.md The repair rpc request_* functions are renamed to get_*. The send_row_diff is renamed to put_row_diff.	2019-02-25 15:13:39 +08:00
Asias He	62104902db	repair: Rename send_row_diff to put_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6e4ea1b3c4	repair: Rename request_row_diff to get_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	5b29fb30ac	repair: Rename request_combined_row_hash to get_combined_row_hash Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6f6c4878d5	repair: Rename request_full_row_hashes to get_full_row_hashes Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	02ddfa393e	repair: Rename request_sync_boundary to get_sync_boundary Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Avi Kivity	a0b0db7915	Merge "Fix regression in perf_fast_forward results" from Paweł " After `adcb3ec20c` ("row_cache: read is not single-partition if inter-partition forwarding is enabled") we have noticed a regression in the results of some perf_fast_forward tests. This was caused by those tests not disabling partition-level fast-forwarding even though it was not needed and the commit in question fixed an incorrect optimisation in such cases. However, after solving that issue it has also become apparent that mutation_reader_merger performs worse when the fast-forwarding is disabled. This was attributed to logic responsible for dropping readers as soon as they have reached the end of stream (which cannot be done if fast-forwarding is enabled). This problem was mitigated by avoiding a scan of the list and removing readers in small batches. Fixes #4246. Fixes #4254. Tests: unit(dev) " * tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla: mutation_reader_merger: drop unneeded readers in small batches mutation_reader_merger: track readers by iterators and not pointers tests/perf_fast_forward: disable partition-level fast-forwarding if not needed	2019-02-24 19:24:00 +02:00
Avi Kivity	e3c53ff3ff	Update seastar submodule * seastar 2313dec...ab54765 (10): > Fix C++-17-only uses of static_assert() with a single parameter. > README.md: fix out-of-date explanation of C++ dialect > net: fix tcp load balancer accounting leak while moving socket to other shard > Revert "deleter: prevent early memory free caused by deleter append." > deleter: prevent early memory free caused by deleter append. > Solve seastar.unit.thread failure in debug mode > Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb > build: Fix the required version of `fmt` > app_template: fix use after move in app constructor > build: Rename CMake variable for private flags Fixes #4269.	2019-02-24 16:06:23 +02:00
Avi Kivity	a3a7bea12f	Merge "Clean up preprocessor definitions" from Jesse * 'jhk/define_debug/v1' of https://github.com/hakuch/scylla: build: Remove the `DEBUG_SHARED_PTR` pp variable build: Prefer the Seastar version of a pp variable	2019-02-23 14:04:08 +02:00
Jesse Haber-Kucharsky	f9297895c1	auth: Change the log level for async. retries The log message is benign, but it has caused some users of Scylla to think that an error has occurred. Fixes #3850 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <ba49c38266c0e77c3ed23cfca3c1a082b3060f17.1550777586.git.jhaberku@scylladb.com>	2019-02-23 14:03:16 +02:00
Tomasz Grabiec	3f698701c2	gdb: Drop incorrect throw of StopIteration It is converted into a RuntimeError by python3: https://docs.python.org/3/library/exceptions.html#StopIteration We should just return. Message-Id: <20190221144321.18093-1-tgrabiec@scylladb.com>	2019-02-23 14:02:47 +02:00
Nadav Har'El	0eddf19432	main: add INFO log messages at start, initialization end, and end. Scylla currently prints a welcome message when it starts, with the Scylla version, but this is not printed to the regular log so in some cases (e.g., Jenkins runs) we do not see it in the log. So let's add a regular INFO-level log message with the same information. Also, Scylla currently doesn't print any specific log message when it normally completes its shutdown. In some cases, users may end up wondering whether Scylla hung in the middle of the shutdown, or in fact exited normally. Refs #4238. So in this patch we add a "shutdown complete" message as the very last message in a successful shutdown. We print Scylla's version also in the shutdown message, which may be useful to see in the logs when shutting down one version of Scylla and starting a different version. Finally, we also add a log message when initialization is complete, which may also be useful to understand whether Scylla hung during initialization. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217140659.19512-1-nyh@scylladb.com>	2019-02-22 16:52:31 +01:00
Tomasz Grabiec	b90cb91468	gdb: Introduce 'scylla cache' Prints contents of the row cache for each table on current shard. Message-Id: <20190222144420.19677-1-tgrabiec@scylladb.com>	2019-02-22 14:58:58 +00:00
Paweł Dziepak	b524f96a74	mutation_reader_merger: drop unneeded readers in small batches It was observed that destroying readers as soon as they are not needed negatively affects performance of relatively small reads. We don't want to keep them alive for too long either, since they may own a lot of memory, but deferring the destruction slightly and removing them in batches of 4 seems to solve the problem for the small reads.	2019-02-22 14:43:38 +00:00
Paweł Dziepak	435e24f509	mutation_reader_merger: track readers by iterators and not pointers mutation_reader_merger uses a std::list of mutation_reader to keep them alive while the rest of the logic operates on non-owning pointers. This means that when it is a time to drop some of the readers that are no longer needed, the merger needs to scan the list looking for them. That's not ideal. The solution is to make the logic use iterators to elements in that list, which allows for O(1) removal of an unneeded reader. Iterators to list are just pointers to the node and are not invalidated by unrelated additions and removals.	2019-02-22 14:33:10 +00:00
Paweł Dziepak	5d5777f85e	tests/perf_fast_forward: disable partition-level fast-forwarding if not needed Several of the test cases in perf_fast_forward do not need partition-level fast-forwarding. However, since the defaults are used to construct most of the readers the fast-forwarding is enabled regardless. This showed an apparent regression in the perf_fast_forward results after `adcb3ec20c` ("row_cache: read is not single-partition if inter-partition forwarding is enabled") which disabled an optimisation that was invalid when partition-level fast-forwarding was requested. This patch ensures that all single-partition reads that do not need partition-level fast-forwarding keep it disabled.	2019-02-22 14:28:02 +00:00
Avi Kivity	fdefee696e	Merge "sstables: mc: writer: Avoid large allocations for keeping promoted index entries" from Tomasz " Currently we keep the entries in a circular_buffer, which uses a contiguous storage. For large partitions with many promoted index entries this can cause OOM and sstable compaction failure. A similar problem exists for the offset vector built in write_promoted_index(). This change solves the problem by serializing promoted index entries and the offset vector on the fly directly into a bytes_ostream, which uses fragmented storage. The serialization of the first entry is deferred, so that serialization is avoided if there will be less than 2 entries. Promoted index is not added for such partitions. There still remains a problem that large-enough promoted index can cause OOM. Refs #4217 Tests: - unit (release) - scylla-bench write Branches: 3.0 " * tag 'fix-large-alloc-for-promoted-index-v3' of github.com:tgrabiec/scylla: sstables: mc: writer: Avoid large allocations for maintaining promoted index sstables: mc: writer: Avoid double-serialization of the promoted index	2019-02-22 15:44:51 +02:00
Avi Kivity	177159da75	Merge "delete_atomically recovery" from Benny " The delete_atomically function is required to delete a set of sstables atomically, i.e. either delete all or none of them. Deleting only some sstables in the set might result in data resurrection in case sstable A, holding a tombstone that covers a mutation in sstable B, is deleted while sstable B remains. This patchset introduces a log file holding a list of SSTable TOC files to delete for recovering a partial delete_atomically operation. A new subdirectory called `pending_delete` is created in the sstables dir, holding in-flight logs. The logs are created with a temporary name (using a .tmp suffix) and renamed to the final .log name once ready. This indicates the commit point for the operation. When populating the column family, all files in the pending_delete sub-directory are examined. Temporary log files are just removed, and committed log files are read, replayed, and deleted. Fixes #4082 Tests: unit (dev), database_test (debug) " * 'projects/delete_atomically_recovery/v5' of https://github.com/bhalevy/scylla: tests: database_test: add test_distributed_loader_with_pending_delete distributed_loader: replay and cleanup pending_delete log files distributed_loader: populated_column_family: separate temp sst dirs cleanup phase docs: add sstables-directory-structure.md sstables: commit sstables to delete_atomically into a pending_delete log file sstables: delete_atomically: delete sstables in a thread sstables: component_basename: reuse with sstring component sstables: introduce component_basename database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions sstables: add delete_sstable_and_maybe_large_data_entries sstables: call remove_by_toc_name in dtor if marked_for_deletion	2019-02-22 15:37:17 +02:00
Benny Halevy	1ba88b709f	tests: database_test: add test_distributed_loader_with_pending_delete Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	043673b236	distributed_loader: replay and cleanup pending_delete log files Scan the table's pending_delete sub-directory if it exists. Remove any temporary pending_delete log files to roll back the respective delete_atomically operation. Replay completed pending_delete log files to roll forward the respective delete_atomically operation, and finally delete the log files. Cleanup of temporary sstable directories and pending_delete sstables are done in a preliminary scan phase when populating the column family so that we won't attempt to load the to-be-deleted sstables. Fixes #4082 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	ee3ad75492	distributed_loader: populated_column_family: separate temp sst dirs cleanup phase In preparation for replaying pending_delete log files, we would like to first remove any temporary sst dirs and later handle pending_delete log files, and only then populate the column family. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	f35e4cbac7	docs: add sstables-directory-structure.md Refs #4184 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:08:22 +02:00
Benny Halevy	024d0a6d49	sstables: commit sstables to delete_atomically into a pending_delete log file To facilitate recovery of a delete_atomically operation that crashed mid way, add a replayable log file holding the committed sstables to delete. It will be used by populate_column_family to replay the atomic deletion. 1. Write the toc names of sstables to be deleted into a temporary file. 2. Once flushed and closed, rename the temp log file into the final name and flush the pending_delete directory. 3. delete the sstables. 4. Remove the pending_delete log file and flush the pending_delete directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:37 +02:00
Benny Halevy	70fda0eda0	sstables: delete_atomically: delete sstables in a thread In preparation for implementing a pending_delete log file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:37 +02:00
Benny Halevy	9ac04850a0	sstables: component_basename: reuse with sstring component Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 11:05:10 +02:00
Benny Halevy	a2a9750074	sstables: introduce component_basename component_basename returns just the basename for the component filename without the leading sstdir path. To be used for delete_atomically's pending_delete log file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	13ffda5c31	database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions 1. We would like to be able to call maybe_delete_large_partitions_entry from the sstable destructor path in the future so the sstable might go away while the large data entries are being deleted. 2. We would like the caller to handle any exception on this path, especially in the preparation part, before calling delete_large_partitions_entry(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	ae29db8db6	sstables: add delete_sstable_and_maybe_large_data_entries To be called by delete_atomically, rather than passing a vector to delete_sstables. This way, there is no need to build the `sstables_to_delete_atomically` vector. To be replaced in the future with a sstable method once we provide the large_data_handler upon construction. Handle exceptions from remove_by_toc_name or maybe_delete_large_partitions_entry by merely logging an error. There is nothing else we can do at this point. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Benny Halevy	387f14a874	sstables: call remove_by_toc_name in dtor if marked_for_deletion No need to call delete_sstables which works on a list of sstable (by toc name). Also, add FIXME comment about not calling large_data_handler.maybe_delete_large_partitions_entry on this path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Avi Kivity	34b254381f	sstables: checksummed_file_writer: fix dma alignment checksummed_file_writer does not override allocate_buffer(), so it inherits data_source_impl's default allocate_buffer, which does not care about alignment. The buffer is then passed to the real file_data_sink_impl, and thence to the file itself, which cannot complete the write since it is not properly aligned. This doesn't fail in release mode, since the Seastar allocator will supply a properly aligned buffer even if not asked to do so. The ASAN allocator usually does supply an aligned buffer, but not always, which causes the test to fail. Fix by forwarding the allocate_buffer() function to the underlying data_source. Fixes #4262. Branches: branch-3.0 Message-Id: <20190221184115.6695-1-avi@scylladb.com>	2019-02-21 21:26:56 +01:00
Jesse Haber-Kucharsky	b7b50392ed	build: Remove the `DEBUG_SHARED_PTR` pp variable This definition is exported by Seastar as `SEASTAR_DEBUG_SHARED_PTR` and no code in Scylla uses this definition either way.	2019-02-21 10:45:09 -05:00
Jesse Haber-Kucharsky	f4883a1aea	build: Prefer the Seastar version of a pp variable Seastar defines `SEASTAR_DEFAULT_ALLOCATOR`, and everywhere else in Scylla we use this variable too.	2019-02-21 10:41:42 -05:00
Piotr Sarna	c743617236	cql3: unify max value for row limit and per-partition limit Limits are stored as uint32_t everywhere, but in some places int32_t was used, which created inconsistencies when comparing the value to std::numeric_limits<Type>::max(). In order to solve inconsistencies, the types are unified to uint32_t, and instead of explicitly calling numeric limit max, an already existing constant value query::max_rows is utilized. Fixes #4253 Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>	2019-02-21 13:56:02 +02:00
Tomasz Grabiec	ecff716f40	query-result-set: Give more context on failure We've seen schema application failing with marshal_exception here. That's not enough information to figure out what is the problem. Knowing which table and column is affected would make diagnosis much easier in certain cases. This patch wraps errors in query::deserialization_error with more information. Example output: query::deserialization_error (failed on column system_schema.tables#bloom_filter_fp_chance \ (version: c179c1d7-9503-3f66-a5b3-70e72af3392a, id: 0, index: 0, type: org.apache.cassandra.db.marshal.DoubleType):\ seastar::internal::backtraced<marshal_exception> (marshaling error: read_simple - not enough bytes (expected 8, got 3) Message-Id: <20190221113219.13018-1-tgrabiec@scylladb.com>	2019-02-21 11:35:27 +00:00
Nadav Har'El	f55bdea364	compaction manager: avoid spurious "asked to stop" message at the end of the log This patch removes the log message about "compaction_manager - Asked to stop" at the very end of Scylla runs. This log message is confusing because it only has the "asked to stop" part, without finally a "stopped", and may lead a user to incorrectly fear that the shutdown hung - when it in fact finished just fine. The database object holds a compaction_manager and stop()s it when the database is stop()ed - and that is the very last thing our shutdown does. However, much earlier, as the first shutdown operation (i.e., the last at_exit() in main.cc), we already stop() the compaction manager. The second stop() call does nothing, but unfortunately prints the log message just before checking if it has anything to stop. So this patch just moves the log message to after the check. Fixes #4238. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217142657.19963-1-nyh@scylladb.com>	2019-02-21 12:32:47 +01:00
Rafael Ávila de Espíndola	5a7bff36ca	Simplify sstable::filename No functionality change, but avoids a std::unordered_map. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190221014630.15476-1-espindola@scylladb.com>	2019-02-21 12:40:01 +02:00
Avi Kivity	5520fc37ba	Merge " Fix INSERT JSON with null values" from Piotr " Fixes #4256 This miniseries fixes a problem with inserting NULL values through INSERT JSON interface. Tests: unit (dev) " * 'fix_insert_json_with_null' of https://github.com/psarna/scylla: tests: add test for INSERT JSON with null values cql3: add missing value erasing to json parser	2019-02-21 12:36:09 +02:00
Piotr Sarna	4d211690f9	tests: add test for INSERT JSON with null values	2019-02-21 11:25:14 +01:00
Piotr Sarna	6618191e49	cql3: add missing value erasing to json parser When inserting a null value through INSERT JSON, the column was erroneously not removed from the 'not used' list of columns. Fixes #4256	2019-02-21 11:23:44 +01:00
Tomasz Grabiec	8687666169	schema_tables: Add trace-level logging of schema mutations Can be useful in diagnosing problems with application of schema mutations. do_merge_schema() is called on every change of schema of the local node. create_table_from_mutations() is called on schema merge when a table was altered or created using mutations read from local schema tables after applying the change, or when loading schema on boot. Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>	2019-02-21 12:16:38 +02:00
Tomasz Grabiec	f65d1e649d	schema_mutations: Make printable Message-Id: <20190221093929.8929-1-tgrabiec@scylladb.com>	2019-02-21 12:16:32 +02:00
Avi Kivity	9adfd11374	Merge "Avoid including cryptopp headers" from Rafael " cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. This patch series introduces a single .cc file that has to include cryptopp headers. " * 'avoid-cryptopp-v3' of https://github.com/espindola/scylla: Avoid including cryptopp headers Delete dead code	2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola	fd5ea2df5a	Avoid including cryptopp headers cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. The issue has been reported as https://github.com/weidai11/cryptopp/issues/793 To work around it, this patch uses a pimpl to have a single .cc file that has to include cryptopp headers. While at it, it also reduces the differences and code duplication between the md5 and sha1 hashers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Rafael Ávila de Espíndola	a309f952d2	Delete dead code This code would have to be refactored by the next patch. Since it is commented out, just delete it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Duarte Nunes	4354479985	Merge 'Minimize generated view updates for unselected column updates' from Piotr " This series addresses the issue of redundant view updates, generated for columns that were not selected for given materialized view. Cases covered (quote:) * If a base row has a live row marker, then we can avoid generating view updates if only unselected columns change; * If a base row has no live row marker, then we can avoid generating view updates if unselected columns are updated, unless they are newly created, deleted, or they have a TTL. Additionally, this series includes caching selected columns and is_index information to avoid unnecessary CPU cycles spent on recomputing these two. Fixes #3819 " * 'send_less_view_updates_if_not_necessary_4' of https://github.com/psarna/scylla: tests: add cases for view update generation optimizations view: minimize generated view updates for unselected columns view: cache is_index for view pointer index: make non-pointer overload of is_index function index: avoid copying when checking for is_index	2019-02-20 13:24:44 +00:00
Piotr Sarna	563456e3ac	tests: add cases for view update generation optimizations Test cases that cover avoiding generating view updates when not necessary (e.g. when a column not selected by the view is modified) are added.	2019-02-20 14:05:29 +01:00
Piotr Sarna	bd52e05ae2	view: minimize generated view updates for unselected columns In some cases generating view updates for columns that were not selected in CREATE VIEW statement is redundant - it is the case when the update will not influence row liveness in any way. Currently, these cases are optimized out: - row marker is live and only unselected columns were updated; - row marker is not live and only unselected columns were updated, and in the process nothing was created or deleted and there was no TTL involved;	2019-02-20 14:05:27 +01:00
Piotr Sarna	dbe8491655	view: cache is_index for view pointer It's detrimental to keep querying index manager whether a view is backing a secondary index every time, so this value is cached at construct time. At the same time, this value is not simply passed to view_info when being created in secondary index manager, in order to decouple materialized view logic from secondary indexes as much as possible (the sole existence of is_index() is bad enough).	2019-02-20 12:52:32 +01:00
Piotr Sarna	cb20fc2e4f	index: make non-pointer overload of is_index function Previous interface enforced passing a shared pointer, which might result in calling unneeded shared_from_this().	2019-02-20 12:52:32 +01:00
Piotr Sarna	94db098d39	index: avoid copying when checking for is_index Previously is_index implementation used list_indexes() helper function, which copies data.	2019-02-20 12:52:32 +01:00
Tomasz Grabiec	a8c74bc7ab	gdb: Print LSA/Cache/Memtable memory usage from "scylla memory" Example output: LSA: allocated: 181010432 used: 177209344 free: 3801088 Cache: total: 97255424 used: 60700600 free: 36554824 Memtables: total: 83755008 Regular: real dirty: 79429632 virt dirty: 35168426 System: real dirty: 524288 virt dirty: 466764 Streaming: real dirty: 0 virt dirty: 0 Message-Id: <1550598424-23428-1-git-send-email-tgrabiec@scylladb.com>	2019-02-20 12:53:53 +02:00
Tomasz Grabiec	dafe22dd83	lsa: Fix spurious abort with --enable-abort-on-lsa-bad-alloc allocate_segment() can fail even though we're not out of memory, when it's invoked inside an allocating section with the cache region locked. That section may later succeed when retried after memory reclamation. We should ignore bad_alloc thrown inside allocating section body and fail only when the whole section fails. Fixes #2924 Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>	2019-02-20 12:53:49 +02:00
Avi Kivity	84465c23c4	Merge "Add multi-column restrictions filtering" from Piotr " Fixes #3574 This series adds missing multi-column restrictions filtering to CQL. The underlying infrastructure already allows checking multi-column restrictions in a reasonable way, so this series consists of mostly adding simple interfaces and parameters. Also, unit test cases for multi-column restrictions are provided. Tests: unit (dev) " * 'add_multi_column_restrictions_filtering_3' of https://github.com/psarna/scylla: tests: add multi-column filtering tests cql3: add multi-column restrictions filtering cql3: add specified is_satisfied_by to multi-column restriction cql3: rewrite raw loop in is_satisfied_by to boost::any_of cql3: fix is_satisfied_by for multi-column restrictions cql3: add missing include to multi-column restriction	2019-02-19 14:42:14 +02:00
Piotr Sarna	9432937816	tests: add multi-column filtering tests Refs #3574	2019-02-19 13:24:25 +01:00
Piotr Sarna	4dc0b0672c	cql3: add multi-column restrictions filtering It's now possible to pass multi-column restrictions to queries that require filtering. Fixes #3574	2019-02-19 13:24:25 +01:00
Piotr Sarna	3db526ffe2	cql3: add specified is_satisfied_by to multi-column restriction Multi-column restrictions need only schema, clustering key and query options in order to decide if they are satisfied, so an overloaded function that takes a reduced number of parameters is added.	2019-02-19 13:24:25 +01:00
Piotr Sarna	16dbc917a4	cql3: rewrite raw loop in is_satisfied_by to boost::any_of	2019-02-19 13:24:12 +01:00
Piotr Sarna	0d675e4419	cql3: fix is_satisfied_by for multi-column restrictions Multi-column restriction should be satisfied by the value if any of the ranges contains it, not all of them. Example: SELECT * FROM t WHERE (a,b) IN ((1,2),(1,3)) will operate on two singular ranges: [(1,2),(1,2)] and [(1,3),(1,3)]. It's sufficient for a value to be inside any of these two in order to satisfy the restriction.	2019-02-19 13:10:58 +01:00
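The any-of semantics described in the commit above can be illustrated with a minimal sketch. This is not the Scylla implementation: `key`, `key_range`, and this free-standing `is_satisfied_by` are hypothetical stand-ins, the key is simplified to a pair of integers for `(a, b)`, and `std::any_of` is used in place of `boost::any_of`.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for a two-column clustering key (a, b).
using key = std::pair<int, int>;

// Hypothetical closed range [start, end] over keys.
struct key_range {
    key start;
    key end;
    // Closed-interval membership using lexicographic pair comparison.
    bool contains(const key& k) const { return !(k < start) && !(end < k); }
};

// The restriction is satisfied if ANY of the ranges contains the value,
// not only when all of them do.
bool is_satisfied_by(const std::vector<key_range>& ranges, const key& value) {
    return std::any_of(ranges.begin(), ranges.end(),
                       [&](const key_range& r) { return r.contains(value); });
}
```

For `(a,b) IN ((1,2),(1,3))` the two singular ranges would be `[(1,2),(1,2)]` and `[(1,3),(1,3)]`; an all-of check would reject every value, because no key can lie in both ranges at once.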
Avi Kivity	934ba7ccb2	Merge "tests: introduce test environment and cleanup sstable tests" from Benny " As part of implementing sstables manager and fixing issue related to updating large_data_handler on all delete paths, we want to funnel all sstable creations, loading, and deletions through a manager. The patchset lays out test infrastructure to funnel these operations through class sstables::test_env. In the process, it cleans up numerous call sites in the existing unit tests that evolved over time. Refs #4198 Refs #4149 Tests: unit (dev) " * 'projects/test_env/v3' of https://github.com/bhalevy/scylla: tests: introduce sstables::test_env tests: perf_sstable: rename test_env tests: sstable_datafile_test: use useable_sst tests: sstable_test: add write_and_validate_sst helper tests: sstable_test: add test_using_reusable_sst helper tests: sstable_test: use reusable_sst where possible tests: sstable_test: add test_using_working_sst helper tests: sstable_3_x_test: make_test_sstable tests: run_sstable_resharding_test: use default parameters to make_sstable tests: sstables::test::make_test_sstable: reorder params tests: test_setup: do_with_test_directory is unused tests: move sstable_resharding_strategy_tests to sstable_reharding_test tests: move create_token_from_key helpers to test_services tests: move column_family_for_tests to test_services dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh	2019-02-19 11:26:42 +02:00
Piotr Sarna	4eecb57a0b	cql3: add missing include to multi-column restriction	2019-02-19 10:24:31 +01:00
Tomasz Grabiec	9c6f897731	tools/toolchain/README: Add the "Troubleshooting" section Message-Id: <1550567863-29404-1-git-send-email-tgrabiec@scylladb.com>	2019-02-19 11:21:02 +02:00
Tzach Livyatan	622361bf1a	docs/docker-hub.md: Docker Compose cluster example This adds a simple example of launching a 3-node Scylla cluster with Docker Compose. Signed-off-by: Tzach Livyatan <tzach@scylladb.com> [ penberg: minor edits ] Message-Id: <20190213081003.6401-1-tzach@scylladb.com>	2019-02-19 09:52:20 +02:00
Avi Kivity	e37e095432	build: allow configuring and testing multiple modes Allow the --mode argument to ./configure.py and ./test.py to be repeated. This is to allow continuous integration to configure only debug and release, leaving dev to developers. Message-Id: <20190214162736.16443-1-avi@scylladb.com>	2019-02-18 15:52:25 +00:00
Tomasz Grabiec	08f4a3664e	sstables: mc: writer: Avoid large allocations for maintaining promoted index Currently, we keep the entries in a circular_buffer, which uses a contiguous storage. For large partitions with many promoted index entries this can cause OOM and sstable compaction failure. A similar problem exists for the offset vector built in write_promoted_index(). This change solves the problem by serializing promoted index entries and the offset vector on the fly directly into a bytes_ostream, which uses fragmented storage. The serialization of the first entry is deferred, so that serialization is avoided if there will be less than 2 entries. Promoted index is not added for such partitions. There still remains a problem that large-enough promoted index can cause OOM. Refs #4217	2019-02-18 16:03:07 +01:00
Tomasz Grabiec	4e093bc3a4	sstables: mc: writer: Avoid double-serialization of the promoted index	2019-02-18 16:03:07 +01:00
Duarte Nunes	6e83457b1b	Merge 'Add PER PARTITION LIMIT' from Piotr " This series introduces PER PARTITION LIMIT to CQL. Protocol and storage is already capable of applying per-partition limits, so for nonpaged queries the changes are superficial - a variable is parsed and passed down. For paged queries and filtering the situation is a little bit more complicated due to corner cases: results for one partition can be split over 2 or more pages, filtering may drop rows, etc. To solve these, another variable is added to paging state - the number of rows already returned from last served partition. Note that "last" partition may be stretched over any number of pages, not just the last one, which is a case especially when considering filtering. As a result, per-partition-limiting queries are not eligible for page generator optimization, because they may need to have their results locally filtered for extraneous rows (e.g. when the next page asks for per-partition limit 5, but we already received 4 rows from the last partition, so need just 1 more from last partition key, but 5 from all next ones). Tests: unit (dev) Fixes #2202 " * 'add_per_partition_limit_3' of https://github.com/psarna/scylla: tests: remove superficial ignore_order from filtering tests tests: add filtering with per partition key limit test tests: publish extract_paging_state and count_rows_fetched tests: fix order of parameters in with_rows_ignore_order cql3,grammar: add PER PARTITION LIMIT idl,service: add persistent last partition row count cql3: prevent page generator usage for per-partition limit cql3: add checking for previous partition count to filtering pager: add adjusting per-partition row limit cql3: obey per partition limit for filtering cql3: clean up unneeded limit variables cql3: obey per partition limit for select statement cql3: add get_per_partition_limit cql3: add per_partition_limit to CQL statement	2019-02-18 14:47:11 +00:00
Amnon Heiman	750b76b1de	scylla-housekeeping: Read JSON as UTF-8 string for older Python 3 compatibility Python 3.6 is the first version to accept bytes to the json.loads(), which causes the following error on older Python 3 versions: Traceback (most recent call last): File "/usr/lib/scylla/scylla-housekeeping", line 175, in <module> args.func(args) File "/usr/lib/scylla/scylla-housekeeping", line 121, in check_version raise e File "/usr/lib/scylla/scylla-housekeeping", line 116, in check_version versions = get_json_from_url(version_url + params) File "/usr/lib/scylla/scylla-housekeeping", line 55, in get_json_from_url return json.loads(data) File "/usr/lib64/python3.4/json/__init__.py", line 312, in loads s.__class__.__name__)) TypeError: the JSON object must be str, not 'bytes' To support those older Python versions, convert the bytes read to utf8 strings before calling the json.loads(). Fixes #4239 Branches: master, 3.0 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20190218112312.24455-1-amnon@scylladb.com>	2019-02-18 14:52:32 +02:00
Piotr Sarna	5ad5221ce1	tests: remove superficial ignore_order from filtering tests Testing filtering with LIMIT used with_rows_ignore_order function, while it's better to use simpler with_rows.	2019-02-18 11:06:44 +01:00
Piotr Sarna	5f67a501ec	tests: add filtering with per partition key limit test	2019-02-18 11:06:44 +01:00
Piotr Sarna	a84e237177	tests: publish extract_paging_state and count_rows_fetched These local lambda functions will be reused, so they are promoted to static functions.	2019-02-18 11:06:44 +01:00
Piotr Sarna	824e9dc352	tests: fix order of parameters in with_rows_ignore_order When reporting a failure, expected rows were mixed up with received rows. Also, the message assumed it received more rows, but it may just as well be fewer, so now it reports a "different number" of rows.	2019-02-18 11:06:44 +01:00
Piotr Sarna	3e4f065847	cql3,grammar: add PER PARTITION LIMIT Select statements now allow passing PER PARTITION LIMIT (?) directive which will trim results for each partition accordingly.	2019-02-18 11:06:44 +01:00
Piotr Sarna	acf7bedad4	idl,service: add persistent last partition row count In order to process paged queries with per-partition limits properly, paging state needs to keep additional information: what was the row count of last partition returned in previous run. That's necessary because the end of previous page and the beginning of current one might consist of rows with the same partition key and we need to be able to trim the results to the number indicated by per-partition limit.	2019-02-18 11:06:44 +01:00
Piotr Sarna	3a2b004f02	cql3: prevent page generator usage for per-partition limit Paged queries that induce per-partition limits cannot use page generator optimization, as sometimes the results need to be filtered for extraneous rows on page breaks.	2019-02-18 11:06:44 +01:00
Piotr Sarna	1dadae212a	cql3: add checking for previous partition count to filtering Filtering now needs to take into account per partition limits as well, and for that it's essential to be able to compare partition keys and decide which rows should be dropped - if previous page(s) contained rows with the same partition key, these need to be taken into consideration too.	2019-02-18 11:06:43 +01:00
Piotr Sarna	82a3883575	pager: add adjusting per-partition row limit For filtering pagers, per partition limit should be set to page size every time a query is executed, because some rows may potentially get dropped from results.	2019-02-18 10:55:52 +01:00
Piotr Sarna	b965c3778f	cql3: obey per partition limit for filtering Filtering queries now take into account the limit of rows per single partition provided by the user.	2019-02-18 10:29:34 +01:00
Piotr Sarna	b3aa939cde	cql3: clean up unneeded limit variables Some places extracted a `limit` variable to be captured by lambdas, but they were not used inside them.	2019-02-18 10:29:34 +01:00
Piotr Sarna	cfb6e9c79c	cql3: obey per partition limit for select statement Select statement now takes into account the limit of rows per single partition provided by the user.	2019-02-18 10:29:34 +01:00
Piotr Sarna	41b466246e	cql3: add get_per_partition_limit	2019-02-18 10:29:34 +01:00
Piotr Sarna	93786a9148	cql3: add per_partition_limit to CQL statement Select statements can now accept per_partition_limit variable.	2019-02-18 10:29:34 +01:00
Gleb Natapov	b01a659014	storage_proxy: remove old Cassandra code Part of the code is already implemented (counters and hinted-handoff). Part of the code will probably never be (triggers). And the rest is the code that estimates number of rows per range to determine query parallelism, but we implemented exponential growth algorithms instead. Message-Id: <20190214112226.GE19055@scylladb.com>	2019-02-18 10:34:55 +02:00
Avi Kivity	a1567b0997	Merge "replace get_restricted_ranges() function with generator interface" from Gleb " get_restricted_ranges() is inefficient since it calculates all vnodes that cover the requested key ranges in advance, but callers often use only the first one. Replace the function with a generator interface that generates the requested number of vnodes on demand. " * 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev: storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator storage_proxy: remove old get_restricted_ranges() interface cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface storage_proxy: introduce new query_ranges_to_vnode_generator interface	2019-02-18 10:33:54 +02:00
Avi Kivity	497367f9f7	Revert "build: switch debug mode from -O0 to -Og" This reverts commit `e988521b89`. It triggers a bug in gcc variable tracking, and there are reports it significantly slows down compilation.	2019-02-17 18:32:28 +02:00
Nadav Har'El	05db7d8957	Materialized views: name the "batch_memory_max" constant Give the constant 1024*1024 introduced in an earlier commit a name, "batch_memory_max", and move it from view.cc to view_builder.hh. It now resides next to the pre-existing constant that controlled how many rows were read in each build step, "batch_size". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217100222.15673-1-nyh@scylladb.com>	2019-02-17 13:28:16 +00:00
Avi Kivity	7b411e30a9	Update seastar submodule * seastar 11546d4...2313dec (6): > Deprecate thread_scheduling_group in favor of scheduling_group > Merge "Fixes for Doxygen documentation" from Jesse > future: optionally type-erase future::then() and future::then_wrapped > build: Allow deprecated declarations internally > rpc: fix insertion of server connections into server's container > rpc: split BOOST_REQUIRE with long conditions into multiple	2019-02-16 22:27:34 +02:00
Avi Kivity	03531c2443	fragmented_temporary_buffer: fix read_exactly() during premature end-of-stream read_exactly(), when given a stream that does not contain the amount of data requested, will loop endlessly, allocating more and more memory as it does, until it fails with an exception (at which point it will release the memory). Fix by returning an empty result, like input_stream::read_exactly() (which it replaces). Add a test case that fails without a fix. Affected callers are the native transport, commitlog replay, and internal deserialization. Fixes #4233. Branches: master, branch-3.0 Tests: unit(dev) Message-Id: <20190216150825.14841-1-avi@scylladb.com>	2019-02-16 17:06:19 +00:00
Takuya ASADA	af988a5360	install-dependencies.sh: show description when 'yum-utils' package is installed on Fedora When yum-utils is already installed on Fedora, 'yum install dnf-utils' causes a conflict and fails. We should show a descriptive message instead of just causing a dnf error message. Fixes #4215 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190215221103.2379-1-syuu@scylladb.com>	2019-02-16 17:16:18 +02:00
Pekka Enberg	f7cf04ac4b	tools/toolchain: Clean up DNF cache from Docker image Make sure we call "dnf clean all" to remove the DNF cache, which reduces Docker image size as per the following guidelines: https://github.com/fedora-cloud/Fedora-Dockerfiles/wiki/Guidelines-for-Creating-Dockerfiles A freshly built image is 250 MB smaller than the one on Docker Hub: <none> <none> b8cafc8ff557 16 seconds ago 1.2 GB docker.io/scylladb/scylla-toolchain fedora-29-20190212 d253d45a964c 3 days ago 1.45 GB Message-Id: <20190215142322.12466-1-penberg@scylladb.com>	2019-02-16 17:12:10 +02:00
Botond Dénes	2125e99531	service/storage_service: fix pre-bootstrap wait for schema agreement When bootstrapping, a node should wait to have schema agreement with its peers before it can join the ring. This is to ensure it can immediately accept writes. Failing to reach schema agreement before joining is not fatal, as the node can pull unknown schemas on writes on-demand. However, if such a schema contains references to UDFs, the node will reject writes using it, due to #3760. To ensure that schema agreement is reached before joining the ring, `storage_service::join_token_ring()` has two checks. First it checks that at least one peer was connected previously. For this it compares `database::get_version()` with `database::empty_version`. The (implied) assumption is that this will become something other than `database::empty_version` only after having connected (and pulled schemas from) at least one peer. This assumption doesn't hold anymore, as we now set the version earlier in the boot process. The second check verifies that we have the same schema version as all known, live peers. This check assumes (since `3e415e2`) that we have already "met" all (or at least some) of our peers and if there is just one known node (us) it concludes that this is a single-node cluster, which automatically has schema agreement. It's easy to see how these two checks will fail. The first fails to ensure that we have met our peers, and the second wrongfully concludes that we are a one-node cluster, and hence have schema agreement. To fix this, modify the first check. Instead of relying on the presence of a non-empty database version, supposedly implying that we already talked to our peers, explicitly make sure that we have really talked to at least one other node, before proceeding to the second check, which will now do the correct thing, actually checking the schema versions. Fixes: #4196 Branches: 3.0, 2.3 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>	2019-02-15 15:56:46 +01:00
Rafael Ávila de Espíndola	9cd14f2602	Don't write to system.large_partition during shutdown The included testcase used to crash because during database::stop() we would try to update system.large_partition. There doesn't seem to be an order we can stop the existing services in cql_test_env that makes this possible. This patch then adds another step when shutting down a database: first stop updating system.large_partition. This means that during shutdown any memtable flush, compaction or sstable deletion will not be reflected in system.large_partition. This is hopefully not too bad since the data in the table is TTLed. This seems to impact only tests, since main.cc calls _exit directly. Tests: unit (release,debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190213194851.117692-1-espindola@scylladb.com>	2019-02-15 10:49:10 +01:00
Avi Kivity	e988521b89	build: switch debug mode from -O0 to -Og -Og is advertised as debug-friendly optimization, both in compile time and debug experience. It also cuts sstable_mutation_test run time in half: Changing -O0 to -Og Before: real 16m49.441s user 16m34.641s sys 0m10.490s After: real 8m38.696s user 8m26.073s sys 0m10.575s Message-Id: <20190214205521.19341-1-avi@scylladb.com>	2019-02-15 08:19:48 +02:00
Benny Halevy	c8f239ff2b	tests: introduce sstables::test_env In preparation to adding sstables_manager we want to establish an environment for testing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:37:41 +02:00
Benny Halevy	f9546b23b7	tests: perf_sstable: rename test_env test_env is going to be a class in sstables namespace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:15 +02:00
Benny Halevy	d6cfc1fae5	tests: sstable_datafile_test: use useable_sst Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	2a6b5a7622	tests: sstable_test: add write_and_validate_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	255f05e6c8	tests: sstable_test: add test_using_reusable_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	e11e29a1fc	tests: sstable_test: use reusable_sst where possible Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	9d4989f2e8	tests: sstable_test: add test_using_working_sst helper In preparation for sstables::test_env Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	55aac22b37	tests: sstable_3_x_test: make_test_sstable Reused for making sstables for test cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	3bc1b8b9ff	tests: run_sstable_resharding_test: use default parameters to make_sstable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:22:14 +02:00
Benny Halevy	b0f3f8d766	tests: sstables::test::make_test_sstable: reorder params In preparation for providing a default large_data_handler in a test-standard way. buffer_size parameter reordered and now has a default value same as make_sstable()'s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:36 +02:00
Benny Halevy	bcd3f36a8a	tests: test_setup: do_with_test_directory is unused Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	b39c7bc4ae	tests: move sstable_resharding_strategy_tests to sstable_reharding_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	8801a6da1f	tests: move create_token_from_key helpers to test_services Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:32 +02:00
Benny Halevy	815fd76c25	tests: move column_family_for_tests to test_services And unify multiple copies of column_family_test_config(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:21:10 +02:00
Benny Halevy	b6ad61d2e5	dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh So it can be used by other tests Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-14 22:16:52 +02:00
Nadav Har'El	43c42d608d	materialized views: forbid using "virtual" columns in restrictions For fixing issue #3362 we added in materialized views, in some cases, "virtual columns" for columns which were not selected into the view. Although these columns nominally exist in the view's schema, they must not be visible to the user, and in commit `3f3a76aa8f` we prevented a user from being able to SELECT these columns. In this patch we also prevent the user from being able to use these column names (which shouldn't exist in the view) in WHERE restrictions. Fixes #4216 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212162014.18778-1-nyh@scylladb.com>	2019-02-14 16:08:41 +02:00
Gleb Natapov	0b84b04f97	consistency_level: make it more const correct Message-Id: <20190214122631.GF19055@scylladb.com>	2019-02-14 14:52:51 +02:00
Nadav Har'El	fec562ec8f	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building process (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but still, 128 rows may grow pretty large when we have rows with large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side-effect, this patch also solves another bug: In the worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them do. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com>	2019-02-14 12:04:40 +02:00
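The two-limit batching rule from the commit above can be sketched like this. It is a simplified illustration, not the view builder code: rows are modeled as strings, `split_into_batches` is a hypothetical helper, and closing the batch when the byte count reaches (rather than strictly exceeds) the threshold is an assumption. The constant names `batch_size` and `batch_memory_max` follow the commit messages in this log.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Limits named in the commit messages: 128 rows or 1 MB per batch.
constexpr std::size_t batch_size = 128;
constexpr std::size_t batch_memory_max = 1024 * 1024;

// Splits rows into batches, closing a batch as soon as either the row-count
// limit or the memory limit is reached. Returns the row count of each batch.
std::vector<std::size_t> split_into_batches(const std::vector<std::string>& rows) {
    std::vector<std::size_t> batches;
    std::size_t rows_in_batch = 0;
    std::size_t bytes_in_batch = 0;
    for (const auto& row : rows) {
        ++rows_in_batch;
        bytes_in_batch += row.size();
        if (rows_in_batch >= batch_size || bytes_in_batch >= batch_memory_max) {
            batches.push_back(rows_in_batch);
            rows_in_batch = 0;
            bytes_in_batch = 0;
        }
    }
    if (rows_in_batch > 0) {
        batches.push_back(rows_in_batch); // final partial batch
    }
    return batches;
}
```

With small rows the row-count limit dominates and batches hold 128 rows; with 512 KiB rows the memory limit closes each batch after two rows, which keeps a single batch far below the 32MB mutation limit mentioned above.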
Calle Wilund	e70286a849	db/extensions: Allow schema extensions to turn themselves off Fixes #4222 Iff an extension creation callback returns null (not exception) we treat this as "I'm not needed" and simply ignore it. Message-Id: <20190213124311.23238-1-calle@scylladb.com>	2019-02-13 14:50:51 +02:00
Jesse Haber-Kucharsky	74ac1deee1	build: Fix the build on Ubuntu The way the `pkg-config` executable works on Fedora and Ubuntu is different, since on Fedora `pkg-config` is provided by the `pkgconf` project. In the build directory of Seastar, `seastar.pc` and `seastar-testing.pc` are generated. `seastar` is a requirement of `seastar-testing`. When pkg-config is invoked like this: pkg-config --libs build/release/seastar-testing.pc the version of `pkg-config` on Fedora resolves the reference to `seastar` in `Requires` to the `seastar.pc` in the same directory. However, the version of `pkg-config` on Ubuntu 18.04 does not: Package seastar was not found in the pkg-config search path. Perhaps you should add the directory containing `seastar.pc' to the PKG_CONFIG_PATH environment variable Package 'seastar', required by '/seastar-testing', not found To address the divergent behavior, we set the `PKG_CONFIG_PATH` variable to point to the directory containing `seastar.pc`. With this change, I was able to configure Scylla on both Fedora 29 and Ubuntu 18.04. Fixes #4218 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <d7164bde2790708425ac6761154d517404818ecd.1550002959.git.jhaberku@scylladb.com>	2019-02-13 13:33:50 +02:00
Avi Kivity	2915baeff4	Merge "Move truncation records to separate table" from Calle " Fixes #4083 Instead of sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. Makes query/update easier and easier on the query memory. The code also migrates any existing truncation positions on startup and clears the old data. " * 'calle/truncation' of github.com:scylladb/seastar-dev: truncation_migration_test: Add rudimentary test system_keyspace: Add waitable for trunc. migration cql_test_env: Add separate config w. feature disable cql_test_env: Add truncation migration to init cql_assertions: Add null/non-null tests storage_service: Add features disabling for tests Add system.truncated documentation in docs commitlog_replay: Use dedicated table for truncation storage_service: Add "truncation_table" feature	2019-02-13 11:16:30 +02:00
Calle Wilund	2e320a456c	truncation_migration_test: Add rudimentary test	2019-02-13 09:08:12 +00:00
Calle Wilund	4e657c0633	system_keyspace: Add waitable for trunc. migration For tests. Hooray for separation of concern.	2019-02-13 09:08:12 +00:00
Calle Wilund	b253757b17	cql_test_env: Add separate config w. feature disable	2019-02-13 09:08:12 +00:00
Calle Wilund	859a1d8f36	cql_test_env: Add truncation migration to init	2019-02-13 09:08:12 +00:00
Calle Wilund	fbcbe529ad	cql_assertions: Add null/non-null tests	2019-02-13 09:08:12 +00:00
Calle Wilund	64e8c6f31d	storage_service: Add features disabling for tests	2019-02-13 09:08:12 +00:00
Calle Wilund	7d3867e153	Add system.truncated documentation in docs	2019-02-13 09:08:12 +00:00
Calle Wilund	12ebcf1ec7	commitlog_replay: Use dedicated table for truncation Fixes #4083 Instead of sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. Makes query/update easier and easier on the query memory. The code also migrates any existing truncation positions on startup and clears the old data.	2019-02-13 09:08:12 +00:00
Calle Wilund	ff5e541335	storage_service: Add "truncation_table" feature	2019-02-13 09:08:12 +00:00
Avi Kivity	a3de5581ce	Update seastar submodule * seastar 428f4ac...11546d4 (9): > reactor: Fix an infinite loop caused the by high resolution timer not being monitored > build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE` > build: Unify dependency versions > future-util: optimize parallel_for_each() with single element > core/sharded.hh: fix doxygen for "Multicore" group > build: switch from travis-ci to circleci > perftune.py: fix irqbalance tuning on Ubuntu 18 > build: Make the use of sanitizers transitive > net: ipv6: fix ipv6 detection and tests by binding to loopback	2019-02-12 18:42:07 +02:00
Avi Kivity	c7aa73af51	Merge "Automatically pause shard readers when not used" from Botond " Recently, there has been a series of incidents of the multishard combining reader deadlocking, when the concurrency of reads was severely restricted and there was no timeout for the read. Several fixes have been merged (`414b14a6b`, `21b4b2b9a`, `ee193f1ab`, `170fa382f`) but eliminating all occurrences of deadlocks proved to be a whack-a-mole game. After the last bug report I have decided that instead of trying to plug new holes as we find them, I'll try to make holes impossible to appear in the first place. To translate this into the multishard reader, instead of sprinkling new `reader.pause()` calls all over the place in the multishard reader to solve the newly found deadlocks, make the pausing of readers fully automatic on the shard reader level. Readers are now always kept in a paused state, except when actually used. This eliminates the entire class of deadlock bugs. This patch-set also aims at simplifying the multishard reader code, as well as the code of the existing `lifecycle_policy` implementations. This effort resulted in: * mutation_reader.cc: no change in SLOC, although it now also contains logic that used to be duplicated in every `lifecycle_policy` implementation; * multishard_mutation_query.cc: 150 SLOC removed; * database.cc: 30 SLOC removed; Also the code is now (hopefully) simpler, safer and has a clearer structure. Fixes #4050 (main issue) Fixes #3970 Fixes #3998 (deprecates really) " * 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla: query_mutations_on_all_shards(): make states light-weight query_mutations_on_all_shards(): get rid of read_context::paused_reader query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state query_mutations_on_all_shards(): pause looked-up readers query_mutation_on_all_shards(): remove unnecessary indirection shard_reader: auto pause readers after being used reader_concurrency_semaphore::inactive_read_handle: fix handle semantics shard_reader: make reader creation sync shard_reader: use semaphore directly to pause-resume shard_reader: recreate_reader(): fix empty range case foreign_reader: rip out the now unused private API shard_reader: move away from foreign_reader multishard_combining_reader: make shard_reader a shared pointer multishard_combining_reader: move the shard reader definition out multishard_combining_reader: disentangle shard_reader	2019-02-12 16:22:52 +02:00
Botond Dénes	db106a32c8	query_mutations_on_all_shards(): make states light-weight Previously the different states a reader can be in were all separate structs, and were joined together by a variant. When this was designed this made sense as states were numerous and quite different. By this point however the number of states has been reduced to 4, with 3 of them being almost the same. Thus it makes sense to merge these states into a single struct and keep track of the current state with an enum field. This can theoretically increase the chances of mistakes, but in practice I expect the opposite, due to the simpler (and less) code. Also, all the important checks that verify that a reader is in the state expected by the code are all left in place. A byproduct of this change is that the amount of cross-shard writes is greatly reduced. Whereas previously the whole state object had to be rewritten on state change, now a single enum value has to be updated. Cross shard reads are reduced as well to the read of a few foreign pointers, all state-related data is now kept on the shard where the associated reader lives.	2019-02-12 16:20:51 +02:00
Botond Dénes	65b2eb0939	query_mutations_on_all_shards(): get rid of read_context::paused_reader	2019-02-12 16:20:51 +02:00
Botond Dénes	ec44a4dbb1	query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state These two states are now the same, with the artificial distinction that all readers are promoted to ready_to_save state after the compaction state and the combined buffer is dismantled. From a practical perspective this distinction is meaningless so merge the two states into a single `saving` state.	2019-02-12 16:20:51 +02:00
Botond Dénes	9a1bd24d82	query_mutations_on_all_shards(): pause looked-up readers At the beginning of each page, all saved readers from the previous pages (if any) are looked up, so they can be reused. Some of these saved readers can end up not being used at all for the current page, in which case they will needlessly sit on their permit for the duration of filling the page. Avoid this by immediately pausing all looked-up readers. This also allows a nice unifying of the reader saving logic, as now all readers will be in a paused state when `save_reader()` is called. Previously, looked-up, but not used readers were an exception to this, requiring extra logic to handle both cases. This logic can now be removed.	2019-02-12 16:20:51 +02:00
Botond Dénes	61b9ed7faf	query_mutation_on_all_shards(): remove unnecessary indirection	2019-02-12 16:20:51 +02:00
Botond Dénes	9000626647	shard_reader: auto pause readers after being used Previously it was the responsibility of the layer above (multishard combining reader) to pause readers, which happened via an explicit `pause()` call. This proved to be a very bad design as we kept finding spots where the multishard reader should have paused the reader to avoid potential deadlocks (due to starved reader concurrency semaphores), but didn't. This commit moves the responsibility of pausing the reader into the shard reader. The reader is now kept in a paused state, except when it is actually used (a `fill_buffer()` or `fast_forward_to()` call is executing). This is fully transparent to the layer above. As a side note, the shard reader now also hides when the reader is created. This also used to be the responsibility of the multishard reader, and although it caused no problems so far, it can be considered a leak of internal details. The shard reader now automatically creates the remote reader on the first time it is attempted to be used. The code has been reorganized, such that there is now a clear separation of responsibilities. The multishard combining reader handles the combining of the output of the shard readers, as well as issuing read-aheads. The shard reader handles read-ahead and creating the remote reader when needed, as well as transferring the results of remote reads to the "home" shard. The remote reader (`shard_reader::remote_reader`, new in this patch) handles pausing-resuming as well as recreating the reader after it was evicted. Layers don't access each other's internals (like they used to). After this commit, the reader passed to `destroy_reader()` will always be in paused state.	2019-02-12 16:20:51 +02:00
Botond Dénes	ab5d717052	reader_concurrency_semaphore::inactive_read_handle: fix handle semantics That is: * make it move only; * make moved-from handles null handles; * add (public) default constructor, which constructs a null handle;	2019-02-12 16:20:51 +02:00
Botond Dénes	37006135dc	shard_reader: make reader creation sync Reader creation happens through the `reader_lifecycle_policy` interface, which offers a `create_reader()` method. This method accepts a shard parameter (among others) and returns a future. Its implementation is expected to go to the specified shard and then return with the created reader. The method is expected to be called from the shard where the shard reader (and consequently the multishard reader) lives. This API, while reasonable enough, has a serious flaw. It doesn't make batching possible. For example, if the shard reader issues a call to the remote shard to fill the remote reader's buffer, but finds that it was evicted while paused, it has to come back to the local shard just to issue the recreate call. This makes the code both convoluted and slow. Change the reader creation API to be synchronous, that is, callable from the shard where the reader has to be created, allowing for simple call sites and batching. This change requires that implementations of the lifecycle policy update any per-reader data-structure they have from the remote shard. This is not a problem however, as these data-structures are usually partitioned, such that they can be accessed safely from a remote shard. Another, very pleasant, consequence of this change is that now all methods of the lifecycle interface are sync and thus calls to them cannot overlap anymore. This patch also removes the `test_multishard_combining_reader_destroyed_with_pending_create_reader` unit test, which is not useful anymore. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	57d1f6589c	shard_reader: use semaphore directly to pause-resume The shard reader relies on the `reader_lifecycle_policy` for pausing and resuming the remote reader. The lifecycle policy's API was designed to be as general as possible, allowing for any implementation of pause/resume. However, in practice, we have a single implementation of pause/resume: registering/unregistering the reader with the relevant `reader_concurrency_semaphore`, and we don't expect any new implementations to appear in the future. Thus, the generic API of the lifecycle policy, is needlessly abstract making its implementations needlessly complex. We can instead make this very concrete and have the lifecycle policy just return the relevant semaphore, removing the need for every implementor of the lifecycle policy interface to have a duplicate implementation of the very same logic. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	fae5a2a8c8	shard_reader: recreate_reader(): fix empty range case If the shard reader is created for a singular range (has a single partition), and then it is evicted after reaching EOS, when recreated we would have to create a reader that reads an empty range, since the only partition the range has was already read. Since it is not possible to create a reader with an empty range, we just didn't recreate the reader in this case. This is incorrect however, as the code might still attempt to read from this reader, if only due to a bug, and would trigger a crash. The correct fix is to create an empty reader that will immediately be at EOS.	2019-02-12 16:20:51 +02:00
Botond Dénes	cd807586f6	foreign_reader: rip out the now unused private API Drop all the glue code, needed in the past so the shard reader can be implemented on top of foreign reader. As the shard reader moved away from foreign reader, this glue code is not needed anymore.	2019-02-12 16:20:51 +02:00
Botond Dénes	d80bc3c0a5	shard_reader: move away from foreign_reader In the past, shard reader wrapped a foreign reader instance, adding functionality required by the multishard reader on top. This has worked well to a certain degree, but after the addition of pause-resume of shard reader, the cooperation with foreign reader became more-and-more a struggle. It has now gotten to a point, where it feels like shard reader is fighting foreign reader as much as it reuses it. This manifested itself in the ever growing amount of glue code, and hacks baked into foreign reader (which is supposed to be of general use), specific to the usage in the multishard reader. It is time we don't force this code-reuse anymore and instead implement all the required functionality in shard reader directly.	2019-02-12 16:20:51 +02:00
Botond Dénes	da0c01c68b	multishard_combining_reader: make shard_reader a shared pointer Some members of shard reader have to be accessed even after it is destroyed. This is required by background work that might still be pending when the reader is destroyed. This was solved by creating a special `state` struct, which contained all the members of the shard readers that had to be accessed even after it was destroyed. This state struct was managed through a shared pointer, that each continuation that was expected to outlive the reader, held a copy of. This however created a minefield, where each line of the code had to be carefully audited to access only fields that will be guaranteed to remain valid. Fix this mess by making the whole class a shared pointer, with `enable_shared_from_this`. Now each continuation just has to make sure to keep `this` alive and code can now access all members freely (well, almost).	2019-02-12 16:20:51 +02:00
Botond Dénes	f1c3421eb4	multishard_combining_reader: move the shard reader definition out Shard reader started its life as a very thin layer above foreign reader, with just some convenience methods added. As usual, by now it has grown into a hairy monster, its class definition out-growing even that of the multishard reader itself. It is time shard reader is moved into the top-level scope, improving the readability of both classes.	2019-02-12 16:20:51 +02:00
Botond Dénes	7114b59309	multishard_combining_reader: disentangle shard_reader Currently shard reader has a reference to the owning multishard reader and it freely accesses its members. This resulted in a mess, where it's not clear what exactly shard reader depends on. Disentangle this mess, by making the shard reader self-sufficient, passing all it depends on into its constructor.	2019-02-12 16:20:51 +02:00
Nadav Har'El	85e5791710	tests/view_schema_test: fix flakiness caused by missing eventually() All tests that involve writing to a base table and then reading from the view table must use the eventually() function to account for the fact that the view update is asynchronous, and may be visible only some time after writing the base table. Forgetting an eventually() can cause the test to become flaky and sometimes fail because the expected data is not yet in the view. Botond noticed these failures in practice in two subtests (test_partition_key_filtering_with_slice and test_clustering_key_in_restrictions). This patch fixes both tests, and I also reviewed the entire source file view_schema_test.cc and found additional places missing an eventually() (and also places that unnecessarily used eventually() to read from the base table), and fixed those as well. Fixes #4212 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212121140.14679-1-nyh@scylladb.com>	2019-02-12 16:10:30 +02:00
Paweł Dziepak	eb03cf00f5	sstable: write_components: drop default for encoding stats There is no value in having a default value for encoding_stats parameter of write_components(). If anything it weakens the tests by encouraging not using the real encoding stats which is not what the actual sstable write path in Scylla does. This patch removes the default value and makes most of the tests provide real encoding statistics. The ones that do not are those that have no easy way of obtaining those (and those stats are not that important for the test itself) or there is a reason for not using those (sstable_3_x_test::test_sstable_write_large_row uses row size thresholds based on size with default-constructed encoding_stats). Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>	2019-02-12 16:08:24 +02:00
Calle Wilund	4a52ed7884	commitlog: Accept recycled (not yet re-used) segments in replay Refs #4085 Changes commitlog descriptor to both accept "Recycled-Commitlog..." file names, and preserve said name in the descriptor. This ensures we pick up the not-yet-used recycled segments left from a crash for replay. The replay in turn will simply ignore the recycled files, and post actual replay they will be deleted as needed. Message-Id: <20190129123311.16050-1-calle@scylladb.com>	2019-02-12 12:23:55 +02:00
Nadav Har'El	93baa334ea	create-relocatable-package.py: speed up slow compression create-relocatable-package.py currently (refs #4194) builds a compressed tar file, but does so using a painfully slow Python implementation of gzip, which is a problem considering the huge size (around 2 gigabytes) of Scylla's executable. On my machine, running it for a release build of Scylla takes a whopping 6 minutes. Just replacing the Python compression with a pipe to an external "gzip" process speeds up the run to just 2 minutes. But gzip is still not optimal, using only one thread even when on a many-core machine. If we switch to "pigz", a parallel implementation of "gzip", all cores are used and on my machine the compression speeds up to just 23 seconds - that's 15 times faster than before this patch. So this patch has create-relocatable-package.py use an external pigz process. "pigz" is now required on the build system (if you want to create packages), so is added to install-dependencies.sh. [avi: update toolchain] Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212090333.3970-1-nyh@scylladb.com>	2019-02-12 11:19:04 +02:00
Nadav Har'El	1cf1af1502	scylla_setup: fix non-interactive behavior In commit `ec66dd6562`, in non-interactive runs of scylla_setup all options were unintentionally set to "false", regardless of the options passed on the scylla_setup command line. This can lead to all sorts of wrong behaviors, and in particular one test setup assumed it was enabling the Scylla service (which was previously the default) but after this commit, it no longer did. This patch restores the previous behavior: Non-interactive invocations of scylla_setup adhere to the defaults and the command-line options, rather than blindly choosing "false". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190211214105.32613-1-nyh@scylladb.com>	2019-02-12 10:50:00 +02:00
Gleb Natapov	26e5700819	storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator Do not precalculate too many ranges in advance; it requires a large allocation and usually means that a consumer of the interface is going to do too much work in parallel. Fixes: #3767	2019-02-12 10:45:25 +02:00
Avi Kivity	da9628c6dc	auth: password_authenticator: protect against NULL salted_hash In case salted_hash was NULL, we'd access uninitialized memory when dereferencing the optional in get_as<>(). Protect against that by using get_opt() and failing authentication if we see a NULL. Fixes #4168. Tests: unit (release) Branches: 3.0, 2.3 Message-Id: <20190211173820.8053-1-avi@scylladb.com>	2019-02-11 18:54:03 +01:00
Botond Dénes	c9e00172e9	tests/multishard_mutation_query_test: add fuzzy test "Fuzzy test" executes semi-random range-scans against semi-random data. By doing so we hope to achieve a coverage of edge cases that would be very hard to achieve by "conventional" unit tests. Fuzzy test generates a table with a population of partitions that are a combinations of all of: * Size of static row: none, tiny, small and large; * Number of clustering rows: none, few, several, and lots; * Size of clustering rows: tiny, small and large; * Number of range deletions: few, several and lots; * Number of rows covered by a range deletion: few, several; As well as a partition with extreme large static row, extreme number of rows and rows of extreme size. To avoid writing an excess amount of data, the size limit of pages is reduced to 1KB (from the default 1MB) and the row count limit of pages is reduced to 1000 (from the default of 10000). The test then executes range-scans against this population. For each range scan, a random partition range is generated, that is guaranteed to contain at least one partition (to avoid executing mostly empty scans), as well as a random partition-slice (row ranges). The data returned by the query is then thoroughly validated against the population description returned by the `create_test_table()` function. As this test has a large degree of randomness to it, covering a quasi-infinite input-space, it can (theoretically) fail at any time. As such I took great care in making such failures deterministically reproducible, based on a single random seed, which is logged to the output in case of a failure, together with instructions on how to repeat the particular run. The test also uses extensive logging to aid investigations. For logging, seastar's logging mechanism is used, as `BOOST_TEST_MESSAGE` produces unintelligible output when running with -c > 1. 
Log messages are carefully tagged, so that the test produces the least amount of noise by default, while being very explicit about what's happening when run with `debug` or especially `trace` log levels.	2019-02-11 17:14:47 +02:00
Botond Dénes	4b2cac6f40	tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan() The existing `read_all_partitions_with_paged_scan()` implementation was tailored to the existing, simplistic test cases. Refactor it so that it can be used in much more complex test cases: * Allow specifying the page's `max_size`. * Allow specifying the query range. * Allow specifying the partition slice's ck ranges. * Fix minor bugs in the paging logic. To avoid churn, a backward-compatible overload is added, that retains the old parameter set.	2019-02-11 17:14:47 +02:00
Botond Dénes	542301fdc9	tests/test_table: add advanced `create_test_table()` overload This overload provides a middle ground between the very generic, but hard-to-use "expert version" and the very restrictive and simplistic "beginner version". It allows the user to declaratively describe the to-be-generated population in terms of a bunch of `std::uniform_int_distribution` objects (e.g. number of rows, size of rows, etc.). This allows for generating a random population in a controlled way, with a minimum amount of boiler-plate code on the user side.	2019-02-11 17:14:47 +02:00
Botond Dénes	7e1c1c2e8c	tests/test_table: make `create_test_table()` customizable Allow the user to specify the population of the table in a generic and flexible way. This patch essentially rewrites the `create_test_table()` implementation from scratch, so that it populates the table using the partition generator passed in by the user. Backward compatibility is kept, by providing a `create_test_table()` overload that is identical to the previous API. This overload is now implemented on top of the generic overload.	2019-02-11 17:14:47 +02:00
Gleb Natapov	ecc5230de5	storage_proxy: remove old get_restricted_ranges() interface It is not used any more.	2019-02-11 14:45:43 +02:00
Gleb Natapov	0cd9bbb71d	cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	e6208b1cde	tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	2735a85c8e	storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface	2019-02-11 14:45:43 +02:00
Gleb Natapov	692a0bd000	storage_proxy: introduce new query_ranges_to_vnode_generator interface The get_restricted_ranges() function takes the key ranges provided by a query and divides them on vnode boundaries. It iterates over all ranges and calculates all vnodes, but its users are usually interested in only one vnode, since most likely it will be enough to populate a page. If it is not enough, they will ask for more. This patch replaces the function with a new interface that allows generating vnode ranges on demand instead of precalculating all of them.	2019-02-11 14:45:43 +02:00
Avi Kivity	cb51fcab9d	README: improve dbuild instructions Add a quick start, document more options, and link from the main README. Message-Id: <20190210154606.21739-1-avi@scylladb.com>	2019-02-11 09:25:25 +01:00
Avi Kivity	2724a66a12	docker: don't send .git during "docker build" It's huge and useless during "docker build" operations. Message-Id: <20190208161848.21125-1-avi@scylladb.com>	2019-02-11 09:17:14 +01:00
Glauber Costa	e0bfd1c40a	allow Cassandra SSTables with counters to be imported if they are new enough Right now Cassandra SSTables with counters cannot be imported into Scylla. The reason for that is that Cassandra changed their counter representation in their 2.1 version and kept transparently supporting both representations. We do not support their old representation, nor is there a sane way to figure out by looking at the data which one is in use. For safety, we had made the decision long ago to not import any tables with counters: if a counter was generated in older Cassandra, we would misrepresent them. In this patch, I propose we offer a non-default way to import SSTables with counters: we can gate it with a flag, and trust that the user knows what they are doing when flipping it (at their own peril). Cassandra 2.1 is by now pretty old. Many users can safely say they've never used anything older. While there are tools like sstableloader that can be used to import those counters, there are often situations in which directly importing SSTables is either better, faster, or worse: the only option left. I argue that having a flag that allows us to import them when we are sure it is safe is better than having no option at all. With this patch I was able to successfully import Cassandra tables with counters that were generated in Cassandra 2.1, reshard and compact their SSTables, and read the data back to get the same values in Scylla as in Cassandra. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190210154028.12472-1-glauber@scylladb.com>	2019-02-10 17:50:48 +02:00
Glauber Costa	61ea54eff6	tools: toolchain: dbuild: use host networking This is convenient to test scylla directly by invoking build/dev/scylla. This needs to be done under docker because the shared objects scylla looks for may not exist in the host system. During quick development we may not want to go through the trouble of packaging relocatable scylla every time to test changes. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190209021033.8400-1-glauber@scylladb.com>	2019-02-10 12:16:47 +02:00
Duarte Nunes	d2d885fb93	Merge 'Fix misdetection of remote counter shards' from Paweł " The code reading counter cells from sstables verifies that there are no unsupported local or remote shards. The latter are detected by checking if all shards are present in the counter cell header (only remote shards do not have entries there). However, the logic responsible for doing that was incorrectly computing the total number of counter shards in a cell if the header was larger than a single counter shard. This resulted in incorrect complaints that remote shards are present. Fixes #4206 Tests: unit(release) " * tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla: tests/sstables: test counter cell header with large number of shards sstables/counters: fix remote counter shard detection	2019-02-10 12:16:31 +02:00
Paweł Dziepak	4eeb8eeed5	tests/sstables: test counter cell header with large number of shards The logic responsible for reading counters from sstables was getting confused by large headers. The size of the header depends directly on the number of shards. This test checks that we can handle cells with a large number of counter shards properly.	2019-02-08 17:06:31 +00:00
Paweł Dziepak	df1ac03154	sstables/counters: fix remote counter shard detection Each counter cell has a header with an entry for each local and global shard. The detection of remote shards is done by checking if there are any counter shards that do not have an entry in the header. This is done by computing the number of counter shards in a cell and comparing it to the number of header entries. However, the computation was wrong and included the size taken by the header itself. As a result, if the header was as big as or larger than a single counter shard, Scylla incorrectly complained about remote shards.	2019-02-08 17:04:22 +00:00
Glauber Costa	8ba6b569b1	relocatable python: make sure all shared objects are relocated The interpreter as it is right now has a bug: I incorrectly assumed that all the shared libraries that python dynamically links would be in lib-dynload. That is not true, and at least some of them are in site-packages. With that, we were loading system libraries for some shared objects. The approach taken to fix this is to just check if we're seeing a shared library and relocate everything we see: we will end up relocating the ones in lib64 too, but that not only should be okay, it is probably even more fool-proof. While doing that I noticed that I had forgotten to incorporate a previous piece of feedback from Avi (that we're leaving temporary files behind). So I'm fixing that as well. [avi: update toolchain] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190208115501.7234-1-glauber@scylladb.com>	2019-02-08 18:42:24 +02:00
Glauber Costa	fb742473e2	replace /usr/local as a source of packages in the python relocatable interpreter I was playing with the python3 interpreter trying to get pip to work, just to see how far we can go. We don't really need pip, but I figured it would be a good stress test to make sure that the process is working and robust. And it didn't really work, because although pip will correctly install things into $relocatable_root/local/lib, sys.path will still refer to a hardcoded /usr/local. While this should not affect Scylla, since we expect to have all our modules in our path anyway -- and that path is searched before /usr/local, it is still dangerous to make an absolute reference like this. Unfortunately, /usr/local/ is included unconditionally by site.py, which is executed when the interpreter is started, and there is no environment variable I found to change that (the help string refers to PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever). There is a way to tell site.py not to bother to add user sites, by passing the -s flag, which this patch does. Aside from doing that, we also enhance PYTHONPATH to include a reference to ./local/{lib,lib64}/python<version>/site-packages. After applying this patch, I was able to build an interpreter containing only python3-pip and python3-setuptools, and build the relocatable environment from there. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190206052104.25927-1-glauber@scylladb.com>	2019-02-08 18:41:52 +02:00
Botond Dénes	181bf64858	query: add trim_clustering_row_ranges_to() This algorithm was already duplicated in two places (service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be used in a third place. Instead of triplicating, move it into a function that everybody can use.	2019-02-08 16:30:17 +02:00
Botond Dénes	bc31d8cbcc	tests/test_table: add keyspace and table name params Allow the keyspace and table names to be customizable by the caller.	2019-02-08 16:30:17 +02:00
Botond Dénes	2d885c6453	tests/test_table: s/create_test_cf/create_test_table/ Also move it to the `test` namespace.	2019-02-08 16:30:17 +02:00
Botond Dénes	c2a6ac307f	tests: move create_test_cf() to tests/test_table.{hh,cc} In the next patches `create_test_cf()` will be made much more powerful and as such generally useful. Move it into its own files so other tests can start using it as well.	2019-02-08 16:30:17 +02:00
Botond Dénes	2d3c4f9009	tests/multishard_mutation_query_test: drop many partition test Soon a much better test will be added that will cover many partitions as well and much more.	2019-02-08 16:30:17 +02:00
Botond Dénes	ced0e7ecb3	tests/multishard_mutation_query_test: drop range tombstone test Soon a much better test will be added that will also cover range tombstones and much more.	2019-02-08 16:30:17 +02:00
Paweł Dziepak	64b1a2caf9	tests: modernise tmpdir tmpdir is a helper class representing a temporary directory. Unfortunately, it suffers from some problems such as lack of proper encapsulation and weak typing. This has caused bugs in the past when the user code accidentally modified the member variable with the path to the directory. This patch modernises tmpdir and updates its users. The path is stored in a std::filesystem::path and available read-only to the class users. mkdtemp and boost are replaced by a standard solution. The users are updated to use path more (when it didn't involve too many changes to their code) and stop using lw_shared_ptr to store the tmpdir when it wasn't necessary. tmpdir intentionally doesn't provide any helpers for getting the path as a string in order to discourage weak types. Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>	2019-02-07 20:18:14 +02:00
Avi Kivity	e2e25720c1	Update seastar submodule * seastar c3be06d...428f4ac (13): > build: make the "dist" test respect the build type > Merge 'Add support for docker --cpuset-cpus' from Juliana > Merge "Add support for Coroutines TS" from Paweł > Merge "Modernize dependency management" from Avi > future: propagate broken_promise exception to abandoned continuations > net/inet_address: avoid clang Wmissing-braces > build: Default to the "Release" type if unspecified > rpc: log an exception that may happen while processing an RPC message > Add a --split-dwarf option to configure.py > build: Fix the `StdFilesystem` module > Compress debug info by default > Add an option for building with split dwarf > Dockerfile: install stow	2019-02-07 20:08:15 +02:00
Paweł Dziepak	de2a447576	utils/extremum_tracking: drop default constructor Default constructed extremum_tracker has uninitialised _default_value which basically makes it never correct to do that. Since this class is a mechanism and not a value it doesn't really need to be a regular type, so let's drop the default constructor. Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>	2019-02-07 18:31:25 +02:00
Tomasz Grabiec	7184289015	Merge "Various fixes and improvements for sstables statistics" from Paweł This series contains several fixes and improvements as well as new tests for sstable code dealing with statistics. * https://github.com/pdziepak/scylla.git sstable-stats-fixes/v1-rebased: sstables: compaction: don't access moved-from vector of sstables memtable: move encoding_stats_collector implementation out of header sstables: seal_statistics(): pass encoding_stats by constant reference sstables/mc/writer: don't assume all schema columns are present tests/sstable3: improvements to file compare tests: extract mutation data model tests/data_model: add support for expiring atomic cells tests/data_model: allow specifying timestamp for row markers tests/memtable: test column tracking for encoding stats sstables: use correct source of statistics in get_encoding_stats_for_compaction() utils/extremum_tracking: preserve "not-set" status on merge sstables/metadata_collector: move the default values to the global tracker tests/sstables: test for reading serialisation header tests/sstables: pass encoding stats to write_components() tests/sstable: test merging encoding_stats Fixes #4202.	2019-02-07 12:35:29 +01:00
Paweł Dziepak	67252de195	tests/sstable: test merging encoding_stats	2019-02-07 10:17:06 +00:00
Paweł Dziepak	e25603fbf7	tests/sstables: pass encoding stats to write_components() By default write_components() uses a safe default for encoding_stats which indicates that all columns are present. This may hide some bugs, so let's pass the real thing in the tests where this may matter.	2019-02-07 10:17:06 +00:00
Paweł Dziepak	d44d5ebf86	tests/sstables: test for reading serialisation header	2019-02-07 10:17:06 +00:00
Paweł Dziepak	ebf667fb9c	sstables/metadata_collector: move the default values to the global tracker column_stats is a per-partition tracker, while metadata_collector is the global one. The statistics gathered by column_stats are merged into the metadata_collector. In order to ensure that we get proper default values in case no value of particular kind (e.g. no TTLs) was seen they need to be set on the global tracker, not the per-partition one.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	2680022df0	utils/extremum_tracking: preserve "not-set" status on merge extremum_tracker allows choosing a default value that's going to be used only if no "real" values were provided. Since it is never compared with the actual input values it can be anything. For instance, if the minimum tracker default value is 0 and there was one update with the value 1 the detected minimum is going to be 1 (the default is ignored). However, this doesn't work when the trackers are merged since that process always leaves the destination tracker in the "set" state regardless of whether any of the merged trackers has ever seen any value. This is fixed by this patch, by properly preserving _is_set state on merge.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	84d8ee35d4	sstables: use correct source of statistics in get_encoding_stats_for_compaction() sstable class is responsible for much more things that it should. In particular, it takes care of both writing and reading sstables. The problem that it causes is that it is very easy to confuse those two. This is what has happened in get_encoding_stats_for_compaction(). Originally, it was using _c_stats as a source of the statistics, which is used only during the write and per-partition. Needless to say, the returned encoding_stats were bogus. The correct source of those statistics is get_stats_metadata().	2019-02-07 10:16:50 +00:00
Paweł Dziepak	e315448d0a	tests/memtable: test column tracking for encoding stats	2019-02-07 10:16:50 +00:00
Paweł Dziepak	591d5195a9	tests/data_model: allow specifying timestamp for row markers	2019-02-07 10:16:50 +00:00
Paweł Dziepak	b07cba6a89	tests/data_model: add support for expiring atomic cells	2019-02-07 10:16:50 +00:00
Paweł Dziepak	aab0b7360f	tests: extract mutation data model	2019-02-07 10:16:50 +00:00
Paweł Dziepak	fa216be260	tests/sstable3: improvements to file compare This patch introduces some improvements to file comparison: - exception flags are set so that any error triggers an exception, which guarantees that errors are not silently ignored - std::ios_base::binary flag is passed to open() - istreambuf_iterator is used instead of istream_iterator. It is better suited for comparing binary data.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	bc61471132	sstables/mc/writer: don't assume all schema columns are present The writer constructor prepares lists of present static and regular columns, those should be used for any further checks.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	0132bcc035	sstables: seal_statistics(): pass encoding_stats by constant reference	2019-02-07 10:16:50 +00:00
Paweł Dziepak	341f186933	memtable: move encoding_stats_collector implementation out of header	2019-02-07 10:16:50 +00:00
Paweł Dziepak	6d5c1a9813	sstables: compaction: don't access moved-from vector of sstables	2019-02-07 10:16:50 +00:00
Paweł Dziepak	a8a45a243b	tests/cql_test_env: don't override tmpdir::path The interface tmpdir::path isn't properly encapsulated and its users can modify the path even though they really shouldn't. This can happen accidentally, in cql_test_env a reference to tmpdir::path was created and later assigned to in one of the code paths. This caused tmpdir destructor to remove wrong directory at program exit. This patch solves the problem by avoiding referencing tmpdir::path, a copy is perfectly acceptable considering that this is tests-only code. Message-Id: <20190206173046.26801-1-pdziepak@scylladb.com>	2019-02-06 20:55:40 +02:00
Takuya ASADA	96b1cb97ba	dist/ami: don't cleanup build dir rm -rf build/* was meant to start rpm building from a clean state, but it also deletes the built scylla binaries, so it was not a good idea. Instead of rm -rf build/*, we can check file existence in the cloned directory; if it seems good we can reuse it. We also need to run git pull on each package repo since it may not include the latest commit. Fixes #4189 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190206101755.2056-1-syuu@scylladb.com>	2019-02-06 15:33:09 +02:00
Nadav Har'El	3e7dc7230d	build_deb.sh: fix error message The error message was apparently copied from the RPM script. Fix it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190205162148.20698-1-nyh@scylladb.com>	2019-02-05 18:22:36 +02:00
Avi Kivity	54748ad15b	Merge "Allow non-key IN restrictions" from Piotr " Fixes #4193 Fixes #3795 This series enables handling IN restrictions for regular columns, which is needed by both filtering and indexing mechanisms. Tests: unit (release) " * 'allow_non_key_in_restrictions' of https://github.com/psarna/scylla: tests: add filtering with IN restriction test cql3: remove unused can_have_only_one_value function cql3: allow non-key IN restrictions	2019-02-05 17:30:35 +02:00
Piotr Sarna	45db5da51b	tests: add filtering with IN restriction test Test case for filtering regular columns with IN restriction is added.	2019-02-05 16:04:17 +01:00
Piotr Sarna	36609d1376	cql3: remove unused can_have_only_one_value function	2019-02-05 16:04:17 +01:00
Piotr Sarna	c178ed8b16	cql3: allow non-key IN restrictions Restricting a regular column with IN restriction is a perfectly valid case for filtering and indexing, so it should be allowed. Fixes #4193 Fixes #3795	2019-02-05 15:50:17 +01:00
Rafael Ávila de Espíndola	84542dadfa	sstables: delete_atomically: don't drop futures We still allow the delete of rows from system.large_partition to run in parallel with the sstable deletion, but now we return a future that waits for both. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190205001526.68774-1-espindola@scylladb.com>	2019-02-05 16:47:58 +02:00
Calle Wilund	ba6a8ef35b	tls: Use a default prio string disabling TLS1.0 forcing min 128bits Fixes #4010 Unless the user sets this explicitly, we should try to explicitly avoid deprecated protocol versions. While gnutls should do this for connections initiated thusly, clients such as drivers etc might use obsolete versions. Message-Id: <20190107131513.30197-1-calle@scylladb.com>	2019-02-05 15:34:18 +02:00
Avi Kivity	6c71eae63f	Merge "API: Stream compaction history records" from Amnon " get_compaction_history can return a lot of records which will add up to a big http reply. This series makes sure it will not create large allocations when returning the results. It adds an api to the query_processor to use paged queries with a consumer function that returns a future, this way we can use the http stream after each record. This implementation will prevent large allocations and stalls. Fixes #4152 " * 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev: tests/query_processor_test: add query_with_consumer_test system_keyspace, api: stream get_compaction_history query_processor: query and for_each_cql_result with future	2019-02-05 14:16:36 +02:00
Avi Kivity	ebf179318c	Merge "SI: Add virtual columns to underlying MV" from Duarte " Virtual columns are MV-specific columns that contribute to the liveness of view rows. However, we were not adding those columns when creating an index's underlying MV, causing indexes to miss base rows. Fixes #4144 Branches: master, branch-3.0 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'sec-index/virtual-columns/v1' of https://github.com/duarten/scylla: tests/secondary_index_test: Add reproducer for #4144 index/secondary_index_manager: Add virtual columns to MV	2019-02-05 13:26:45 +02:00
Avi Kivity	367ef8d318	Merge "provide our own, relocatable, python3 interpreter" from Glauber " We would like to deploy Scylla in constrained environments where internet access is not permitted. In those environments it is not possible to acquire the dependencies of Scylla from external repos and the packages have to be sent alongside their dependencies. In older distributions, like CentOS7 there isn't a python3 interpreter available. And while we can package one from EPEL this tends to break in practice when installing the software in older patchlevels (for instance, installing into RHEL7.3 when the latest is RHEL7.5). The reason for that, as we saw in practice, is that EPEL may not respect RHEL patchlevels and have the python interpreter depending on newer versions of some system libraries. virtualenv can be used to create isolated python environments, but it is not designed for full isolation and I hit at least two roadblocks in practice: 1) It doesn't copy the files, linking some instead. There is an --always-copy option but it is broken (for years) in some distributions. 2) Even when the above works, it still doesn't copy some files, relying on the system files instead (one sad example was the subprocess module that was just kept in the system and not moved to the virtualenv) This patch solves that problem by creating a python3 environment in a directory with the modules that Scylla uses, and nothing else. It is essentially doing what virtualenv should do but doesn't. Once this environment is assembled the binaries are then made relocatable the same way the Scylla binary is. One difference (for now) between the Scylla binary relocation process and ours is that we steer away from LD_LIBRARY_PATH: the environment variable is inherited by any child process stemming from the caller, which means that we are unable to use the subprocess module to call system binaries like mkfs (which our scripts do a lot). 
Instead, we rely on RUNPATH to tell the binary where to search for its libraries. Once we generate an archive with the python3 interpreter, we then package it as an rpm with barely any dependencies. The dependencies listed are: $ rpm -qpR scylla-relocatable-python3-3.6.7-1.el7.x86_64.rpm rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(PayloadIsXz) <= 5.2-1 And the total size of that rpm, with all modules scylla needs is 20MB. The Scylla rpm now has a far more modest dependency list: $ rpm -qpR scylla-server-666.development-0.20190121.80b7c7953.el7.x86_64.rpm \| sort \| uniq /bin/sh curl file hwloc kernel >= 3.10.0-514 mdadm pciutils rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(PayloadIsXz) <= 5.2-1 scylla-conf scylla-relocatable-python3 <== our python3 package. systemd-libs util-linux xfsprogs I have tested this end to end by generating RPMs from our master branch, then installing them in a clean CentOS7.3 installation without even using yum, just rpm -Uhv <package_list> Then I called scylla_setup to make sure all python scripts were working and started Scylla successfully. " * 'scylla-python3-v5' of github.com:glommer/scylla: Create a relocatable python3 interpreter spec file: fix python3 dependency list. fixup scripts before installing them to their final location automatically relocate python scripts make scyllatop relocatable use relative paths for installing scylla and iotune binaries	2019-02-05 12:53:34 +02:00
Amnon Heiman	c96c3ce9e8	tests/query_processor_test: add query_with_consumer_test This patch adds a unit test for querying with a consumer function. Query with consumer uses paging; the test covers the scenarios where the number of rows is below and above the page size, and it also tests the option to stop in the middle of reading. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 12:35:53 +02:00
Amnon Heiman	6c7742d616	system_keyspace, api: stream get_compaction_history get_compaction_history can return a big chunk of data. To prevent large memory allocations, get_compaction_history now reads each compaction_history record and uses the http stream to send it. Fixes #4152 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Amnon Heiman	c0e3b7673d	query_processor: query and for_each_cql_result with future query and for_each_cql_result accept a function that reads a row and return a stop_iterator. This implementation of those functions gets a function that returns a future stop_iterator allowing preemption between calls. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Glauber Costa	afed2cddae	Create a relocatable python3 interpreter We would like to deploy Scylla in constrained environments where internet access is not permitted. In those environments it is not possible to acquire the dependencies of Scylla from external repos and the packages have to be sent alongside their dependencies. In older distributions, like CentOS7 there isn't a python3 interpreter available. And while we can package one from EPEL this tends to break in practice when installing the software in older patchlevels (for instance, installing into RHEL7.3 when the latest is RHEL7.5). The reason for that, as we saw in practice, is that EPEL may not respect RHEL patchlevels and have the python interpreter depending on newer versions of some system libraries. virtualenv can be used to create isolated python environments, but it is not designed for full isolation and I hit at least two roadblocks in practice: 1) It doesn't copy the files, linking some instead. There is an --always-copy option but it is broken (for years) in some distributions. 2) Even when the above works, it still doesn't copy some files, relying on the system files instead (one sad example was the subprocess module that was just kept in the system and not moved to the virtualenv) This patch solves that problem by creating a python3 environment in a directory with the modules that Scylla uses, and nothing else. It is essentially doing what virtualenv should do but doesn't. Once this environment is assembled the binaries are then made relocatable the same way the Scylla binary is. One difference (for now) between the Scylla binary relocation process and ours is that we steer away from LD_LIBRARY_PATH: the environment variable is inherited by any child process stemming from the caller, which means that we are unable to use the subprocess module to call system binaries like mkfs (which our scripts do a lot). Instead, we rely on RUNPATH to tell the binary where to search for its libraries. 
In terms of the python interpreter, PYTHONPATH does not need to be set for this to work as the python interpreter will include the lib directory in its PYTHONPATH. To confirm this, we executed the following code: bin/python3 -c "import sys; print('\n'.join(sys.path))" with the interpreter unpacked to both /home/centos/glaubertmp/test/ and /tmp. It yields respectively: /home/centos/glaubertmp/test/lib64/python36.zip /home/centos/glaubertmp/test/lib64/python3.6 /home/centos/glaubertmp/test/lib64/python3.6/lib-dynload /home/centos/glaubertmp/test/lib64/python3.6/site-packages and /tmp/python/lib64/python36.zip /tmp/python/lib64/python3.6 /tmp/python/lib64/python3.6/lib-dynload /tmp/python/lib64/python3.6/site-packages This was tested by moving the .tar.gz generated on my Fedora28 laptop to a CentOS machine without python3 installed. I could then invoke ./scylla_python_env/python3 and use the interpreter to call 'ls' through the subprocess module. I have also tested that we can successfully import all the modules we listed for installation and that we can read a sample yaml file (since PyYAML depends on the system's libyaml, we know that this works) Time to build: real 0m15.935s user 0m15.198s sys 0m0.382s Final archive size (uncompressed): 81MB Final archive size (compressed) : 25MB Signed-off-by: Glauber Costa <glauber@scylladb.com> -- v3: - rewrite in python3 - do not use temporary directories, add directly to the archive. Only the python binary has to be materialized - Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list v2: - do not use yum, resolve dependencies from installed packages instead - move to scripts as Avi wants this not only for old offline CentOS	2019-02-04 18:02:40 -05:00
Glauber Costa	f757b42ba7	spec file: fix python3 dependency list. The dependency list as it was did not reflect the fact that scyllatop is now written in python3. Some packages, like urwid, should use the python3 version. CentOS doesn't really have an urwid package for python3, not even in EPEL. So this officially marks the point in which we can't build packages that will install in CentOS7 anyway. Luckily, we will soon be providing our own python3 interpreter. But for now, as a first step, simplify the dependency list by removing the CentOS/Fedora conditional and listing the full python3 list Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:40 -05:00
Glauber Costa	7052028752	fixup scripts before installing them to their final location Before installing python files to their final location in install.sh, replace them with a thunk so that they can work with our python3 interpreter. The way the thunk works, they will also work without our python3 interpreter so unconditionally fixing them up is always safe. I opt in this patch for fixing up just at install time to simplify developer's life, who won't have to worry about this at all. Note about the rpm .spec file: since we are relying on specific format for the shebangs, we shouldn't let rpmbuild mess with them. Therefore, we need to disable a global variable that controls that behavior (by definition, Fedora rpmbuild will rewrite all shebangs to /usr/bin/python3) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:40 -05:00
Glauber Costa	3869628429	automatically relocate python scripts Given a python script at $DIR/script.py, this copies the script to $DIR/libexec/script.py.bin, fixes its shebang to use /usr/bin/env instead of an absolute path for the interpreter and replaces the original script with a thunk that calls into that script. PYTHONPATH is adjusted so that the original directory containing the script can also serve as a source of modules, as would be originally intended. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 18:02:39 -05:00
Glauber Costa	1bb65a0888	make scyllatop relocatable Right now the binary we distribute with scyllatop calls into /usr/lib/scylla/scyllatop/scyllatop.py unconditionally. Calling that is all that this binary does. This poses a problem to our relocatable process, since we don't want to be referring to absolute paths (And moreover, that is calling python whereas it should be calling python3) The scyllatop.py file includes a python3 shebang and is executable. Therefore, it is best to just create a link to that file and execute it directly Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 16:12:46 -05:00
Glauber Costa	e890b8af09	use relative paths for installing scylla and iotune binaries The answer is yes: if we install them in $root/opt, we should link to $root/opt Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-02-04 14:33:51 -05:00
Piotr Jastrzebski	834bec5cc9	Read shard awareness columns as dropped Without this, a new version of Scylla won't be able to start with system tables inherited from an older version that had shard awareness columns. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>	2019-02-04 18:43:11 +02:00
Rafael Ávila de Espíndola	bbd9dfcba7	Add a --split-dwarf option to configure.py It is off by default as it conflicts with distcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190204002706.15540-1-espindola@scylladb.com>	2019-02-04 18:42:16 +02:00
Benny Halevy	a9e1e0233a	Add a dev build mode to test.py Message-Id: <20190204162112.7471-2-espindola@scylladb.com>	2019-02-04 18:38:23 +02:00
Rafael Ávila de Espíndola	6243443591	Add a dev build mode The build times I got with a clean ccache were: ninja dev 10806.89s user 678.29s system 2805% cpu 6:49.33 total ninja release 28906.37s user 1094.53s system 2378% cpu 21:01.27 total ninja debug 18611.17s user 1405.66s system 2310% cpu 14:26.52 total With this version -gz is not passed to seastar's configure. It should probably be seastar's configure responsibility to do that and I will send a separate patch to do it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190204162112.7471-1-espindola@scylladb.com>	2019-02-04 18:38:22 +02:00
Calle Wilund	9cadbaa96f	commitlog_replayer: Bugfix: finding truncation positions uses local var ref "uuid" was ref:ed in a continuation. Works 99.9% of the time because the continuation is not actually delayed (and assuming we begin the checks with non-truncated (system) cf:s it works). But if we do delay continuation, the resulting cf map will be borked. Fixes #4187. Message-Id: <20190204141831.3387-1-calle@scylladb.com>	2019-02-04 16:51:13 +02:00
Rafael Ávila de Espíndola	15a515a39b	build: Don't link utils/gz/gen_crc_combine_table with seastar It doesn't use seastar, so there is no point in linking with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190203214145.43009-1-espindola@scylladb.com>	2019-02-04 15:43:16 +02:00
Botond Dénes	2a67355ded	multishard_combining_reader: better shard selection algorithm The multishard reader has to combine the output of all shards into a single fragment stream. To do that, each time a `partition_start` is read it has to check if there is another partition, from another shard, that has to be emitted before this partition. Currently for this it uses the partitioner. At every partition start fragment it checks if the token falls into the current shard sub-range. The shard sub-range is the continuous range of tokens, where each token belongs to the same shard. If the partition doesn't belong to the current shard sub-range the multishard reader assumes the following shard sub-range of the next shard will have data and move over to it. This assumption will however only stand on very dense tables, and will fail miserably on less dense tables, resulting in the multishard reader effectively iterating over the shard sub-ranges (4096 in the worst case), only to find data in just a few of them. This resulted in high user-perceived latency when scanning a sparse table. This patch replaces this algorithm with one based on a shard heap. The shards are now organized into a min-heap, by the next token they have data for. When a partition start fragment is read from the current shard, its token is compared to the smallest token in the shard heap. If smaller, we continue to read from the current shard. Otherwise we move to the shard with the smallest token. When constructing the reader, or after fast-forwarding we don't know what first token each reader will produce. To avoid reading in a partition from each reader, we assume each reader will produce the first token from the first shard sub-range that overlaps with the query range. This algorithm performs much better on sparse tables, while also being slightly better on dense tables. I did only a very rough measurement using CQL tracing. 
I populated a table with four rows on a 64 shards machine, then scanned the entire table. Time to scan the table (microseconds): before 27'846 after 5'248 Fixes: #4125 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>	2019-02-04 14:10:23 +02:00
Piotr Sarna	11e6d88ca7	tests: supplement filtering collections with more cases Filtering test cases for collections are supplemented with checking whether CONTAINS works correctly for sets and maps. Message-Id: <4a684152cdcdb65e1415ba5859699cb324312c2b.1548837150.git.sarna@scylladb.com>	2019-02-03 17:19:30 +02:00
Avi Kivity	468f8c7ee7	Merge "Print a warning if a row is too large" from Rafael " This is a first step in fixing #3988. " * 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla: Rename large_partition_handler Print a warning if a row is too large Remove defaut parameter value Rename _threshold_bytes to _partition_threshold_bytes keys: add schema-aware printing for clustering_key_prefix	2019-02-03 13:57:42 +02:00
Nadav Har'El	5a695b8029	Materialized views: fix three error messages Three error messages were supposed to include a column name, but a "{}" was missing in the format so the given column name didn't actually appear in the error message. So this patch adds the missing {}'s. Fixes #4183. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190203112100.13031-1-nyh@scylladb.com>	2019-02-03 12:23:29 +01:00
Tomasz Grabiec	72dd6f54e3	gdb: Print total amount of memory used by small and large allocations Message-Id: <1548956406-7601-2-git-send-email-tgrabiec@scylladb.com>	2019-02-01 13:18:16 +00:00
Tomasz Grabiec	f48fa542fc	gdb: Extend 'scylla memory' to show memory used by large allocations Adds new columns to the "Page spans" table named "large [B]" and "[spans]", which shows how much memory is allocated in spans of given size. Excludes spans used by small pools. Useful in determining what is the size of large allocations which consume the memory. Example output: Page spans: index size [B] free [B] large [B] [spans] 0 4096 4096 4096 1 1 8192 32768 0 0 2 16384 16384 0 0 3 32768 98304 2785280 85 4 65536 65536 1900544 29 5 131072 524288 471597056 3598 ... 31 8796093022208 0 0 0 Large allocations: 484675584 [B] Message-Id: <1548956406-7601-1-git-send-email-tgrabiec@scylladb.com>	2019-02-01 13:18:01 +00:00
Asias He	28d6d117d2	migration_manager: Fix nullptr dereference in maybe_schedule_schema_pull Commit `976324bbb8` changed to use get_application_state_ptr to get a pointer of the application_state. It may return nullptr that is dereferenced unconditionally. In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw: 4 nodes in the tests n1, n2, n3, n4 are started n1 is stopped n1 is changed to use different shard config n1 is restarted ( 2019-01-27 04:56:00,377 ) The backtrace happened on n2 right after n1 restarts: 0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled 1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled 2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled 3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed) 4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status = 5 Segmentation fault on shard 0. 6 Backtrace: 7 0x00000000041c0782 8 0x00000000040d9a8c 9 0x00000000040d9d35 10 0x00000000040d9d83 11 /lib64/libpthread.so.0+0x00000000000121af 12 0x0000000001a8ac0e 13 0x00000000040ba39e 14 0x00000000040ba561 15 0x000000000418c247 16 0x0000000004265437 17 0x000000000054766e 18 /lib64/libc.so.6+0x0000000000020f29 19 0x00000000005b17d9 We do not know when this backtrace happened, but according to the logs from n3 and n4: INFO 2019-01-27 04:56:22,154 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL INFO 2019-01-27 04:56:21,594 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL We can be sure the backtrace on n2 happened before 04:56:21 - 19 seconds (the delay before gossip notices a peer is down), so the abort time is around 04:56:0X. 
The migration_manager::maybe_schedule_schema_pull that triggers the backtrace must be scheduled before n1 is restarted, because it dereferences the application_state pointer after sleeping 60 seconds, so the time maybe_schedule_schema_pull is called is around 04:55:0X which is before n1 is restarted. So my theory is: migration_manager::maybe_schedule_schema_pull is scheduled, at this time n1 has SCHEMA application_state, when n1 restarts, n2 gets new application state from n1 which does not have SCHEMA yet, when migration_manager::maybe_schedule wakes up from the 60-second sleep, n1 has non-empty endpoint_state but empty application_state for SCHEMA. We dereference the nullptr application_state and abort. Fixes: #4148 Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test Message-Id: <9ef33277483ae193a49c5f441486ee6e045d766b.1548896554.git.asias@scylladb.com>	2019-02-01 09:01:08 +02:00
Piotr Jastrzebski	ad217bbdc7	Revert "system_keyspace: add sharding information to local table" This reverts commit `bdce561ada`. Those columns are not used and cause problems with tools. Refs #4112 Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>	2019-01-31 19:06:55 +01:00
Avi Kivity	9adf46b50e	Update seastar submodule * seastar 2f35731...c3be06d (1): > rpc: support closing streaming when only sink or source was created Ref #4124.	2019-01-31 12:39:02 +02:00
Nadav Har'El	7b9b7f8ebc	docs/metrics.md: document syntax for choosing specific instance/shard As another useful example of Prometheus syntax, show the syntax of plotting a graph for one particular node or shard. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190129221607.11813-1-nyh@scylladb.com>	2019-01-31 12:37:30 +02:00
Asias He	9d9ecda619	repair: Log keyspace and table name in repair_cf_range When a repair failed, we saw logs like: repair - Checksum of range (8235770168569320790, 8235957818553794560] on 127.0.0.1 failed: std::bad_alloc (std::bad_alloc) It is hard to tell which keyspace and table has failed. To fix, log the keyspace and table name. It is useful to know when debugging. Fixes #4166 Message-Id: <8424d314125b88bf5378ea02a703b0f82c2daeda.1548818669.git.asias@scylladb.com>	2019-01-31 12:36:46 +02:00
Gleb Natapov	a70374d982	messaging_service: do not forget to close stream when sending it to another side failed Fixes #4124 Message-Id: <20190131091857.GC3172@scylladb.com>	2019-01-31 12:01:56 +02:00
Piotr Jastrzebski	4b47094f30	Prevent undefined behaviour while writing range tombstones in LA/KA Stop calling .remove_suffix on empty string_view. ck_bview can be empty because this function can be called for a half open range tombstone. It is impossible to write such range tombstones to LA/KA SSTables so we should throw a proper exception instead of allowing an undefined behaviour. Refs #4113 Tests: unit(release) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c3738916953e4b10812aed95e645c739b4c29462.1548777086.git.piotr@scylladb.com>	2019-01-31 10:58:19 +01:00
Glauber Costa	94ead559f7	move scylla-housekeeping to dist/common/scripts All of our python scripts are there and they are all installed automatically into /usr/lib/scylla. By keeping scylla-housekeeping separately we are just complicating our build process. This would be just a minor annoyance but this broke the new relocatable process for python3 that I am trying to put together because I forgot to add the new location as a source for the scripts. Therefore, I propose we start being more diligent with this and keeping all scripts together for the future. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190123191732.32126-2-glauber@scylladb.com>	2019-01-31 11:44:34 +02:00
Jesse Haber-Kucharsky	c37aa258c5	build: Fix incremental builds when Seastar changes When a file in the `seastar` directory changes, we want to minimize the amount of Scylla artifacts that are re-built while ensuring that all changes in Seastar are reflected in Scylla correctly. For compiling object files, we change Seastar to be an "order only" dependency so that changes to Seastar don't trigger unnecessary builds. For linking, we add an "implicit" dependency on Seastar so that Scylla is re-linked when Seastar changes. With these changes, modifying a Seastar header file will trigger the recompilation of the affected Scylla object files, and modifying a Seastar source file will trigger linking only. Fixes #4171 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <0ab43d79ce0d41348238465d1819d4c937ac6414.1548906335.git.jhaberku@scylladb.com>	2019-01-31 11:00:40 +02:00
Raphael S. Carvalho	930f8caff9	sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction A fully expired sstable is not added to the compacting set, meaning it's not actually compacted, but it's kept in a list of sstables which incremental compaction uses to check if any sstable can be replaced. Incremental compaction was unconditionally removing the expired sstable from the compacting set, which led to a segfault because the end iterator was given. The fix changes sstable_set::erase() behavior to follow the standard one for erase functions, which works even if the target element is not present. Fixes #4085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com>	2019-01-30 16:32:45 +00:00
Avi Kivity	056b6a4439	Update seastar submodule * seastar 07e1ed3...2f35731 (1): > Merge " Initial seastar ipv6 support" from Calle	2019-01-30 17:41:39 +02:00
Avi Kivity	1224cde871	Merge "Make perf_simple_query produce JSON results" from Paweł " This series enhances perf_simple_query error reporting by adding an option of producing a json file containing the results. The format of that file is very similar to the results produced by perf_fast_forward in order to ease integration with any tools that may want to interpret them. In addition to that perf_simple_query now prints to the standard output median, median absolute deviation, minimum and maximum of the partial results, so that there is no need for external scripts to compute those values. " * tag 'perf_simple_query-json/v1' of https://github.com/pdziepak/scylla: perf_simple_query: produce json results perf_simple_query: calculate and print statistics perf: time_parallel: return results of each iteration perf_simple_query: take advantage of threads in main()	2019-01-30 17:39:19 +02:00
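The statistics named in the merge message above (median, median absolute deviation, minimum and maximum of the partial results) can be sketched as follows. This is a minimal Python illustration, not the C++ code in perf_simple_query; the function name `summarize` and the dictionary keys are assumptions made for this example.

```python
import statistics


def summarize(results):
    """Return median, median absolute deviation (MAD), min and max of per-iteration results.

    `results` is a non-empty list of numbers, e.g. one throughput figure per iteration.
    """
    med = statistics.median(results)
    # MAD: the median of the absolute distances of each sample from the median.
    mad = statistics.median([abs(x - med) for x in results])
    return {"median": med, "mad": mad, "min": min(results), "max": max(results)}


if __name__ == "__main__":
    print(summarize([10, 12, 11, 50, 13]))
```

The median and MAD are less sensitive to a single outlier iteration (50 in the sample above) than a mean and standard deviation would be, which is a plausible reason for reporting them in a benchmark.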
Paweł Dziepak	6a0ee5dbbf	Merge "Simpler fix for the memtable reader's fragment monotonicity violation" from Botond " Recently it was discovered that the memtable reader (partition_snapshot_reader to be more precise) can violate mutation fragment monotonicity, by re-emitting range tombstones when those overlap with more than one ck range of the partition slice. This was fixed by `7049cd9`, however after that fix was merged a much simpler fix was proposed by Tomek, one that doesn't involve nearly as many changes to the partition snapshot reader and hence poses less risk of breaking it. This mini-series reverts the previous fix, then applies the new, simpler one. Refs: #4104 " * 'partition-snapshot-reader-simpler-fix/v2' of https://github.com/denesb/scylla: partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"	2019-01-30 15:24:31 +00:00
Jesse Haber-Kucharsky	b39eac653d	Switch to the CMake-ified Seastar Committer: Avi Kivity <avi@scylladb.com> Branch: next Switch to the CMake-ified Seastar This change allows Scylla to be compiled against the `master` branch of Seastar. The necessary changes: - Add `-Wno-error` to prevent a Seastar warning from terminating the build - The new Seastar build system generates the pkg-config files (for example, `seastar.pc`) at configure time, so we don't need to invoke Ninja to generate them - The `-march` argument is no longer inherited from Seastar (correctly), so it needs to be provided independently - Define `SEASTAR_TESTING_MAIN` so that the definition of an entry point is included for all unit test compilation units - Independently link Scylla against Seastar's compiled copy of fmt in its build directory - All test files use the (now public) Seastar testing headers - Add some missing Seastar headers to source files [avi: regenerate frozen toolchain, adjust seastar submodule] Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>	2019-01-30 11:17:38 +02:00
Botond Dénes	8d59c36165	partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges When entering a new ck range (of the partition-slice), the partition snapshot reader will apply to its range tombstones stream all the tombstones that are relevant to the new ck range. When the partition has range tombstones that overlap with multiple ck ranges, these will be applied to the range tombstone stream when entering any of the ck ranges they overlap with. This will result in the violation of the monotonicity of the mutation fragments emitted by the reader, as these range tombstones will be re-emitted on each ck range, if the ck range has at least one clustering row they apply to. For example, given the following partition: rt{[1,10]}, cr{1}, cr{2}, cr{3}... And a partition-slice with the following ck ranges: [1,2], [3, 4] The reader will emit the following fragment stream: rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ... Note how the range tombstone is emitted twice. In addition to violating the monotonicity guarantee, this can also result in an explosion of the number of emitted range tombstones. Fix by trimming range tombstones to the start of the current ck range, thus ensuring that they will not violate mutation fragment monotonicity guarantees. Refs: #4104 This is a much simpler fix for the above issue, than the already committed one (7049cd937A). The latter is reverted by the previous patch and this patch applies the simpler fix.	2019-01-30 10:01:13 +02:00
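The example in the commit message above can be reproduced with a toy model. The sketch below is Python rather than Scylla's C++ reader, and everything in it is an assumption for illustration: integer clustering keys, closed intervals as tuples, and the hypothetical helpers `emit_fragments` and `is_monotonic`.

```python
def emit_fragments(tombstones, rows, ck_ranges, trim=True):
    """Toy model of a reader emitting fragments per clustering-key (ck) range.

    tombstones: list of (start, end) closed intervals
    rows:       sorted list of integer clustering keys
    ck_ranges:  sorted, non-overlapping list of (start, end) closed intervals
    Returns a list of ("rt", (start, end)) and ("cr", key) fragments.
    """
    out = []
    for lo, hi in ck_ranges:
        in_range = [r for r in rows if lo <= r <= hi]
        if not in_range:
            continue
        for start, end in tombstones:
            if start <= hi and end >= lo:  # tombstone overlaps this ck range
                if trim:
                    # The fix: trim the tombstone to the start of the current range.
                    start = max(start, lo)
                out.append(("rt", (start, end)))
        out.extend(("cr", r) for r in in_range)
    return out


def is_monotonic(fragments):
    """Check that fragment positions never go backwards."""
    positions = [f[1][0] if f[0] == "rt" else f[1] for f in fragments]
    return all(a <= b for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    args = ([(1, 10)], [1, 2, 3], [(1, 2), (3, 4)])
    print(emit_fragments(*args, trim=False))
    print(emit_fragments(*args, trim=True))
```

With `trim=False` the tombstone `[1,10]` appears a second time after `cr{2}`, at a position before it. With `trim=True` the second emission starts at 3, so the stream stays ordered.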
Nadav Har'El	9dd3c59c77	docs/metrics.md: explain Prometheus and Grafana docs/metrics.md so far explained just the REST API for retrieving current metrics from a single Scylla node. In this patch, I add basic explanations on how to use the Prometheus and Grafana tools included in the "scylla-grafana-monitoring" project. It is true that technically, what is being explained here doesn't come with the Scylla project and requires the separate scylla-grafana-monitoring to be installed as well. Nevertheless, most Scylla developers will need this knowledge eventually and surprisingly it appears it was never documented anywhere accessible to newbie developers, and I think metrics.md is the right place to introduce it. In fact, I myself wasn't aware until today that Prometheus actually had its own Web UI on port 9090, and that it is probably more useful for developers than Grafana is. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190129114214.17786-1-nyh@scylladb.com>	2019-01-29 15:46:06 +02:00
Duarte Nunes	35c03f41a4	Merge 'Fix multiple contains for one column' from Piotr " An error in validating CONTAINS restrictions against collections caused only the first restriction to be taken into account due to returning prematurely. This miniseries provides a fix for that as well as a matching test case. Tests: unit (release) Fixes #4161 " * 'fix_multiple_contains_for_one_column' of https://github.com/psarna/scylla: tests: enable CONTAINS tests for filtering cql3: remove premature return from is_satisfied_by cql3: restore indentation	2019-01-29 11:10:13 +00:00
Piotr Sarna	11aae54cca	tests: enable CONTAINS tests for filtering Tests for filtering with CONTAINS restrictions were not enabled, so they are now. Also, another case for having two CONTAINS restrictions for a single column is added. Refs #4161	2019-01-29 11:47:28 +01:00
Piotr Sarna	9595fec2ec	cql3: remove premature return from is_satisfied_by Function which checked whether a CONTAINS restriction is satisfied by a collection erroneously returned prematurely after checking just the first restriction - which works fine for the usual case, but fails if there are multiple CONTAINS restrictions present for a column. Fixes #4161	2019-01-29 11:47:28 +01:00
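The bug pattern described in the commit above can be sketched in a few lines. This is a Python illustration under simplifying assumptions, not the cql3 C++ code: a collection is modelled as a list and each CONTAINS restriction as a single value.

```python
def is_satisfied_by_buggy(collection, contains_values):
    """Returns after checking only the first CONTAINS restriction."""
    for value in contains_values:
        return value in collection  # premature return: later restrictions are ignored
    return True


def is_satisfied_by(collection, contains_values):
    """Every CONTAINS restriction on the column must hold."""
    for value in contains_values:
        if value not in collection:
            return False
    return True


if __name__ == "__main__":
    row = [1, 2, 3]
    # Roughly: WHERE col CONTAINS 1 AND col CONTAINS 5
    print(is_satisfied_by_buggy(row, [1, 5]), is_satisfied_by(row, [1, 5]))
```

With a single restriction both versions agree, which matches the commit's remark that the old code worked in the usual case.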
Piotr Sarna	89af01315d	cql3: restore indentation	2019-01-29 11:47:28 +01:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola	1185138a34	Print a warning if a row is too large Tests: unit (release) Refs #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola	776d5bb9e2	Remove default parameter value The value is already passed by cql_table_large_partition_handler, so the default was just for nop_large_partition_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola	30528fa853	Rename _threshold_bytes to _partition_threshold_bytes A followup patch will add a threshold for rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola	561285488b	keys: add schema-aware printing for clustering_key_prefix For reporting large rows we have to be able to print clustering keys in addition to partition keys. Refs #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:01:54 -08:00
Paweł Dziepak	335dca54a5	perf_simple_query: produce json results	2019-01-28 16:36:06 +00:00
Paweł Dziepak	7d21c9c31f	perf_simple_query: calculate and print statistics	2019-01-28 16:36:06 +00:00
Paweł Dziepak	eb3d80fa2b	perf: time_parallel: return results of each iteration	2019-01-28 16:35:33 +00:00
Pekka Enberg	7bda3abbc6	toolchain/dbuild: Fix permission errors when SELinux is enabled Use the ":z" suffix to tell Docker to relabel file objects on shared volumes. Fixes accessing filesystem via dbuild when SELinux is enabled. Message-Id: <20190128160557.2066-1-penberg@scylladb.com>	2019-01-28 18:16:53 +02:00
Paweł Dziepak	6a1e1e8454	perf_simple_query: take advantage of threads in main()	2019-01-28 13:21:08 +00:00
Paweł Dziepak	11a1f97307	Merge "Fix cleanup of temporary sstable directories" from Benny " Cleanup of temporary sstable directories in distributed_loader::populate_column_family is completely broken and untested. This code path was never executed since populate_column_family doesn't currently list subdirectories at all. This patchset fixes this code path and scans subdirectories in populate_column_family. Also, a unit test is added for testing the cleanup of incomplete (unsealed) sstables. Fixes: #4129 " * 'projects/sst-temp-dir-cleanup/v3' of https://github.com/bhalevy/scylla: tests: add test_distributed_loader_with_incomplete_sstables tests: single_node_cql_env::do_with: use the provided data_file_directories path if available tests: single_node_cql_env::_data_dir is not used distributed_loader: populate_column_family should scan directories too sstables: fix is_temp_dir distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir distributed_loader: remove temporary sstable directories only on shard 0 distributed_loader: push future returned by rmdir into futures vector	2019-01-28 12:23:00 +00:00
Duarte Nunes	ea34e242de	Merge 'Do not use hints for view building' from Piotr " This series prevents view building from falling back to storing hints. Instead, it will try to send hints to an endpoint as if it has consistency level ONE, and in case of failure retry the whole building step. Then, view building will never be marked as finished prematurely (because of pending hints), which will help avoid creating inconsistencies when decommissioning a node from the cluster. Tests: unit (release) dtest (materialized_views_test.py.) Fixes #3857 Fixes #4039 " * 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla: db,view: add updating view_building_paused statistics database: add view_building_paused metrics table: make populate_views not allow hints db,view: add allow_hints parameter to mutate_MV storage_proxy: add allow_hints parameter to send_to_endpoint	2019-01-28 10:31:14 +00:00
Piotr Sarna	9a6261ca27	db,view: add updating view_building_paused statistics Each time view building is paused because of a connection failure, the view_building_paused metric is bumped.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30b0663d6	database: add view_building_paused metrics The metric exposes how many times the view building process was paused, e.g. because the target node was down or overloaded.	2019-01-28 09:38:42 +01:00
Piotr Sarna	5dec6dc6c6	table: make populate_views not allow hints View building uses populate_views to generate and send view updates. This procedure will now not allow hints to be used to acknowledge the write. Instead, the whole building step will be retried on failure. Fixes #3857 Fixes #4039	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV Mutating MV function can now accept a parameter whether hints should be allowed during sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default - cl::ONE.	2019-01-28 09:38:41 +01:00
Rafael Ávila de Espíndola	5332ebd50c	Update the description of compaction_large_partition_warning_threshold_mb Despite the name, this option also controls if a warning is issued during memtable writes. Warning during memtable writes is useful but the option name also exists in cassandra, so probably the best we can do is update the description. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190125020821.72815-1-espindola@scylladb.com>	2019-01-28 09:09:35 +02:00
Takuya ASADA	5c6c008109	dist/ami: follow build script changes on -jmx/-tools/-ami packages We need to follow changes of the rpm package build procedure on -jmx/-tools/-ami packages, since it has been changed when we merged the relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190127204436.13959-1-syuu@scylladb.com>	2019-01-28 09:08:32 +02:00
Takuya ASADA	7db1b45839	reloc: move relocatable libraries from /opt/scylladb/lib to /opt/scylladb/libreloc On Scylla 3rdparty tools, we add /opt/scylladb/lib to LD_LIBRARY_PATH. We use the same directory for relocatable binaries, including libc.so.6. Once we install both the scylla-env package and the relocatable version of the scylla-server package, the loader tries to load libc from /opt/scylladb/lib and then the entire distribution becomes unusable. We may be able to use the Obsoletes or Conflict tag on .rpm/.deb to avoid installing the new Scylla package with scylla-env, but it's better & safer not to share the same directory for different purposes. Fixes #3943 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190128023757.25676-1-syuu@scylladb.com>	2019-01-28 09:04:56 +02:00
Avi Kivity	274f553485	tools: toolchain: run dbuild container with same timezone as host Make it easier to work interactively by not reporting surprising times. There are also reports that dtest fails with incorrect timezones, but those are probably bugs in dtest. Message-Id: <20190127134754.1428-1-avi@scylladb.com>	2019-01-27 22:48:42 +00:00
Duarte Nunes	aafaf840a2	tests/secondary_index_test: Add reproducer for #4144 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-01-27 22:30:34 +00:00
Duarte Nunes	aa476cd6c9	index/secondary_index_manager: Add virtual columns to MV Virtual columns are MV-specific columns that contribute to the liveness of view rows. However, we were not adding those columns when creating an index's underlying MV, causing indexes to miss base rows. Fixes #4144 Branches: master, branch-3.0 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-01-27 22:30:12 +00:00
Benny Halevy	36b6a3ebcf	tests: add test_distributed_loader_with_incomplete_sstables Test removal of sstables with temporary TOC file, with and without temporary sstable directory. Temporary sstable directories may be empty or still have leftover components in them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:48:24 +02:00
Benny Halevy	64a23ea3bc	tests: single_node_cql_env::do_with: use the provided data_file_directories path if available Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	441809094a	tests: single_node_cql_env::_data_dir is not used Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	74ef09a3a2	distributed_loader: populate_column_family should scan directories too To detect and cleanup leftover temporary sstable directories. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	bd85975277	sstables: fix is_temp_dir 1. fs::canonical requires that the path exists, and there is no need for fs::canonical here. 2. fs::path::extension will return the leading dot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	c2a5f3b842	distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir populate_column_family currently lists only regular files, ignoring all directories. A later patch in this series allows it to also list directories so as to clean up the temporary sstable directories, yet valid sub-directories, like staging|upload|snapshots, may still exist and need to be ignored. Other kinds of handling, like validating recognized sub-directories and halting on unrecognized sub-directories, are possible, yet out of scope for this patch(set). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	9bd7b2f4e6	distributed_loader: remove temporary sstable directories only on shard 0 Similar to calling remove_sstable_with_temp_toc later on in populate_column_family(), we need only one thread to do the cleanup work and the existing convention is that it's shard 0. Since lister::rmdir is checking remove_file of all entries (recursively) and the dir itself, doing that concurrently would fail. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-27 14:14:32 +02:00
Benny Halevy	bcfb2e509b	distributed_loader: push future returned by rmdir into futures vector	2019-01-27 14:14:32 +02:00
Asias He	ee0bb0aa94	tests: Drop the unsupported random_read mode in perf_sstable It is not supported. Remove it. Message-Id: <fe31e090574be96a9620b6902ceb843699d558d0.1548403105.git.asias@scylladb.com>	2019-01-25 14:24:40 +00:00
Avi Kivity	85abb13679	Merge "Fix cross shard cf usage" from Piotr " Lambda passed to distribute_reader_and_consume_on_shards shouldn't capture shard local variables. Fixes #4108 Tests: unit(release), dtest(update_cluster_layout_tests.TestLargeScaleCluster.add_50_nodes_test) " * 'haaawk/4108/v2' of github.com:scylladb/seastar-dev: Fix cross shard cf usage in repair Fix cross shard cf usage in streaming	2019-01-24 19:40:44 +02:00
Avi Kivity	d0f9e00e85	Merge " Support 64-bit gc_clock" (fixes) from Benny " Use int64_t in data::cell for expiry / deletion time. Extend time_overflow unit tests in cql_query_test to use select statements with and without bypass cache to access deeper into the system. Refs #3353 " * 'projects/gc_clock_64_fixes/v1' of https://github.com/bhalevy/scylla: tests: extend time_overflow unit tests data::cell: use int64_t for expiry and deletion time	2019-01-24 19:15:12 +02:00
Piotr Jastrzebski	fab1b7a3a2	Fix cross shard cf usage in repair Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 18:13:49 +01:00
Piotr Jastrzebski	1ac7283550	Fix cross shard cf usage in streaming Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 18:13:30 +01:00
Glauber Costa	ec66dd6562	scylla_setup: tell users about the possibility of a non-interactive session From day1, scylla_setup can be run either interactively or through command line parameters. Still, one of the requests we are asked the most from users is whether we can provide them with a version of scylla_setup that they can call from their scripts. This probably happens because once you call a script interactively, it may not be totally obvious that a different mode is available. Even when we do tell users about that possibility, the request number two is then "which flags do I pass?" The solution I am proposing is to just tell users the answers to those questions at the end of an interactive session. After this patch, we print the following message to the console: ScyllaDB setup finished. scylla_setup accepts command line arguments as well! For easily provisioning in a similar environmen than this, type: scylla_setup --no-raid-setup --nic eth0 --no-kernel-check \ --no-verify-package --no-enable-service --no-ntp-setup \ --no-node-exporter --no-fstrim-setup Also, to avoid the time-consuming I/O tuning you can add --no-io-setup and copy the contents of /etc/scylla.d/io* Only do that if you are moving the files into machines with the exact same hardware Notes on the implementation: it is unfortunate for these purposes that all our options are negated. Most conditionals are branching on true conditions, so although I could write this: args.no_option = not interactive_ask_service(...) if not args.no_option: ... I opted in this patch to write: option = interactive_ask_service(...) args.no_option = not option if option: ... There is an extra line and we have to update args separately, but it makes it harder to get confused in the conditional with the double negation. Let me know if there are disagreements here. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190124153832.21140-1-glauber@scylladb.com>	2019-01-24 17:41:26 +02:00
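The two styles contrasted in the implementation notes above can be written out as a small runnable sketch. It assumes a stand-in `interactive_ask_service` that simply returns a canned answer instead of prompting, and a hypothetical `run_step` callback; neither reflects the real scylla_setup code.

```python
from types import SimpleNamespace


def interactive_ask_service(answer):
    """Hypothetical stand-in for the interactive prompt; returns the user's yes/no answer."""
    return answer


def configure_double_negation(args, answer, run_step):
    # First style from the commit message: the branch reads as "if not no_option".
    args.no_option = not interactive_ask_service(answer)
    if not args.no_option:
        run_step()
    return args


def configure_positive(args, answer, run_step):
    # Style chosen in the patch: branch on the positive answer, then record the negated flag.
    option = interactive_ask_service(answer)
    args.no_option = not option
    if option:
        run_step()
    return args


if __name__ == "__main__":
    ran = []
    args = configure_positive(SimpleNamespace(), True, lambda: ran.append("step"))
    print(args.no_option, ran)
```

Both functions behave identically; the second only makes the conditional easier to read, at the cost of one extra line.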
Benny Halevy	6efd85ed01	tests: extend time_overflow unit tests Test also cql select queries with and without bypass cache. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-24 15:55:06 +02:00
Benny Halevy	7373825473	data::cell: use int64_t for expiry and deletion time Ttl may still use int32_t to reduce footprint Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-24 15:55:06 +02:00
Takuya ASADA	597059b4b1	dist/debian: skip stripping libprotobuf.so.15 dh_strip won't be able to strip libprotobuf.so.15, and we actually don't need to strip dependency libraries, so skip it. Fixes #4135 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190123202213.2117-4-syuu@scylladb.com>	2019-01-24 15:51:56 +02:00
Takuya ASADA	aefc18e70d	dist/debian: install /usr/bin/file for dh_strip dh_strip requires /usr/bin/file but it is not automatically installed, so install it in build_deb.sh. Fixes #4134 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190123202213.2117-3-syuu@scylladb.com>	2019-01-24 15:51:53 +02:00
Benny Halevy	fbebd0bb1d	thrift: validate_column_name: fix exception format string It's printing uint32_t rather than char*. Refs #4140 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190124104002.32381-1-bhalevy@scylladb.com>	2019-01-24 12:46:23 +02:00
Avi Kivity	b58b82c9a2	Merge "Cut build dependencies around types.hh" from Piotr " I've recently had to work around types.hh/types.cc files and had a very unpleasant experience with the incremental build on every change to types.hh. It took ~30 min on my machine which is almost as much as the clean build. I looked around and it turns out that types.hh contains the whole hierarchy of the types. At the same time, many places access the types only through abstract_type which is the root of the hierarchy. This patchset extracts user_type_impl, tuple_type_impl, map_type_impl, set_type_impl, list_type_impl and collection_type_impl from types.hh and places each of them in a separate header. The result of this is that a change in user_type_impl now causes an incremental build of ~6 min instead of ~30 min. A change to tuple_type_impl causes an incremental build of ~7.5 min instead of ~30 min and a change to map_type_impl triggers an incremental build that takes ~20 min instead of ~30 min. Tests: unit(release) " * 'haaawk/types_build_speedup_2/rfc/2' of github.com:scylladb/seastar-dev: Stop including types/list.hh in cql3/tuples.hh Stop including types/set.hh into cql3/sets.hh Move collection_type_impl out of types.hh to types/collection.hh Move set_type_impl out of types.hh to types/set.hh Move list_type_impl out of types.hh to types/list.hh Move map_type_impl out of types.hh to types/map.hh Move tuple_type_impl from types.hh to types/tuple.hh Decouple database.hh from types/user.hh Allow to use shared_ptr with incomplete type other than sstable Move user_type_impl out of types.hh to types/user.hh	2019-01-24 11:21:22 +02:00
Piotr Jastrzebski	a3912a35f5	Stop including types/list.hh in cql3/tuples.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:57:19 +01:00
Piotr Jastrzebski	fe8dfc8fdc	Stop including types/set.hh into cql3/sets.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:57:19 +01:00
Piotr Jastrzebski	5a5201a50b	Move collection_type_impl out of types.hh to types/collection.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	ad016a732b	Move set_type_impl out of types.hh to types/set.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b1e1b66732	Move list_type_impl out of types.hh to types/list.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	147cc031db	Move map_type_impl out of types.hh to types/map.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b6b2fdc5be	Move tuple_type_impl from types.hh to types/tuple.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	7666e81b51	Decouple database.hh from types/user.hh This commit declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is an incomplete type so it requires "Allow to use shared_ptr with incomplete type other than sstable" to compile correctly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:55:04 +01:00
Piotr Jastrzebski	316be5c6b5	Allow to use shared_ptr with incomplete type other than sstable When seastar/core/shared_ptr_incomplete.hh is included in a header then it causes problems with all declarations of shared_ptr<T> with incomplete type T that end up in the same compilation unit. The problem happens when we have a compilation unit that includes two headers a.hh and b.hh such that a.hh includes seastar/core/shared_ptr_incomplete.hh and b.hh declares shared_ptr<T> with incomplete type T. At the same time this compilation unit does not use the declared shared_ptr<T> so it should compile and work but it does not because shared_ptr_incomplete.hh is included and it forces instantiation of: template <typename T> T* lw_shared_ptr_accessors<T, void_t<decltype(lw_shared_ptr_deleter<T>{})>>::to_value(lw_shared_ptr_counter_base* counter) { return static_cast<T*>(counter); } for each declared shared_ptr<T> with incomplete type T. Even the ones that are never used. Following commit "Decouple database.hh from types/user.hh" moves user_types_metadata type out of database.hh and instead declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is incomplete. 
Without this commit the compilation of the following one fails with: In file included from ./sstables/sstables.hh:34, from ./db/size_estimates_virtual_reader.hh:38, from db/system_keyspace.cc:77: seastar/include/seastar/core/shared_ptr_incomplete.hh: In instantiation of ‘static T seastar::internal::lw_shared_ptr_accessors<T, seastar::internal::void_t<decltype (seastar::lw_shared_ptr_deleter<T>{})> >::to_value(seastar::lw_shared_ptr_counter_base) [with T = user_types_metadata]’: seastar/include/seastar/core/shared_ptr.hh:243:51: required from ‘static void seastar::internal::lw_shared_ptr_accessors<T, seastar::internal::void_t<decltype (seastar::lw_shared_ptr_deleter<T>{})> >::dispose(seastar::lw_shared_ptr_counter_base) [with T = user_types_metadata]’ seastar/include/seastar/core/shared_ptr.hh:300:31: required from ‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T = user_types_metadata]’ ./database.hh:1004:7: required from ‘static void seastar::internal::lw_shared_ptr_accessors_no_esft<T>::dispose(seastar::lw_shared_ptr_counter_base) [with T = keyspace_metadata]’ seastar/include/seastar/core/shared_ptr.hh:300:31: required from ‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T = keyspace_metadata]’ ./db/size_estimates_virtual_reader.hh:233:67: required from here seastar/include/seastar/core/shared_ptr_incomplete.hh:38:12: error: invalid static_cast from type ‘seastar::lw_shared_ptr_counter_base’ to type ‘user_types_metadata’ return static_cast<T>(counter); ^~~~~~~~~~~~~~~~~~~~~~~~ [131/415] CXX build/release/distributed_loader.o Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:45:25 +01:00
Piotr Jastrzebski	e92b4c3dbc	Move user_type_impl out of types.hh to types/user.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola	f7d1dc16d4	database: Use nop_large_partition_handler to avoid self-reporting Currently nop_large_partition_handler is only used in tests, but it can also be used to avoid self-reporting. Tests: unit(Release) I also tested starting scylla with --compaction-large-partition-warning-threshold-mb=0. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190123205059.39573-1-espindola@scylladb.com>	2019-01-23 21:11:21 +00:00
Avi Kivity	4882f29f82	Merge "Detemplatize primary key restrictions" from Piotr " This series is a first small step towards rewriting CQL restrictions layer. Primary key restrictions used to be a template that accepts either partition_key or clustering_key, but the implementation is already based on virtual inheritance, so in multiple cases these templates need specializations. Refs #3815 " * 'detemplatize_primary_key_restrictions_2' of https://github.com/psarna/scylla: cql3: alias single_column_primary_key_restrictions cql3: remove KeyType template from statement_restrictions cql3: remove template from primary_key_restrictions cql3: remove forwarding_primary_key_restrictions	2019-01-23 17:43:03 +02:00
Piotr Sarna	9982587bea	cql3: alias single_column_primary_key_restrictions In preparation for detemplatizing this class, it's aliased with single_column_partition_key restrictions and single_column_clustering_key_restrictions accordingly.	2019-01-23 17:43:03 +02:00
Piotr Sarna	4663094474	cql3: remove KeyType template from statement_restrictions The code is unfolded into serving partition and clustering key cases separately instead of overloading a template.	2019-01-23 17:43:03 +02:00
Piotr Sarna	4bd0cb8dd9	cql3: remove template from primary_key_restrictions Partition key restrictions and clustering key restrictions currently require virtual function specializations and have lots of distinct code, so there's no value in having primary_key_restrictions<KeyType> template.	2019-01-23 17:43:03 +02:00
Piotr Sarna	bdd8566ea3	cql3: remove forwarding_primary_key_restrictions I presume this header was created during code translation from C*, but it's not used or included anywhere.	2019-01-23 17:43:03 +02:00
Avi Kivity	c83ae62aed	build: fix libdeflate object file corruption during parallel build libdeflate's build places some object files in the source directory, which is shared between the debug and release build. If the same object file (for the two modes) is written concurrently, or if one mode reads it while the other writes it, it will be corrupted. Fix by not building the executables at all. They aren't needed, and we already placed the libraries' objects in the build directory (which is unshared). We only need the libraries anyway. Fixes #4130. Branches: master, branch-3.0 Message-Id: <20190123145435.19049-1-avi@scylladb.com>	2019-01-23 15:32:17 +00:00
Nadav Har'El	76f1fcc346	cql3: really ensure retrieval of columns for filtering Commit `fd422c954e` aimed to fix issue #3803. In that issue, if a query SELECTed only certain columns but did filtering (ALLOW FILTERING) over other unselected columns, the filtering didn't work. The fix involved adding the columns being filtered to the set of columns we read from disk, so they can be filtered. But that commit included an optimization: If you have clustering keys c1 and c2, and the query asks for a specific partition key and c1 < 3 and c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already done as a slice (a contiguous read from disk). The committed code erroneously concluded that both c1 and c2 don't need to be filtered, which was wrong (c2 does need to be read and filtered). In this patch, we fix this optimization. Previously, we used the "prefix length", which in the above example was 2 (both c1 and c2 were filtered) but we need a new and more elaborate function, num_prefix_columns_that_need_not_be_filtered(), to determine we can only skip filtering of 1 (c1) and cannot skip the second. Fixes #4121. This patch also adds a unit test to confirm this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190123131212.6269-1-nyh@scylladb.com>	2019-01-23 15:24:30 +02:00
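The prefix rule described in the commit above can be sketched as follows. This is a simplified Python model, not the C++ function of the same name: restrictions are given as `(column_index, kind)` pairs sorted by clustering column position, and `kind` is limited to `"eq"` or `"slice"`, both of which are assumptions for the example.

```python
def num_prefix_columns_that_need_not_be_filtered(restrictions):
    """Count the leading clustering columns whose restrictions are served by the slice read.

    restrictions: list of (column_index, kind) sorted by column_index,
                  where kind is "eq" or "slice" (an inequality such as c1 < 3).
    """
    count = 0
    expected = 0
    for column, kind in restrictions:
        if column != expected:
            break  # a gap in the prefix: this and later columns must be filtered
        count += 1
        if kind == "slice":
            break  # a slice ends the contiguous read: later columns must be filtered
        expected = column + 1
    return count


if __name__ == "__main__":
    # c1 < 3 AND c2 > 3: only c1 can skip filtering.
    print(num_prefix_columns_that_need_not_be_filtered([(0, "slice"), (1, "slice")]))
```

For the commit's example (`c1 < 3` and `c2 > 3`) the sketch yields 1 where the old "prefix length" was 2, so `c2` is read and filtered.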
Avi Kivity	835ad406de	tools: toolchain: update docker build command to include --no-cache If docker sees the Dockerfile hasn't changed it may reuse an old image, not caring that context files and dependent images have in fact changed. This can happen for us if install-dependencies.sh or the base Fedora image changed. To make sure we always get a correct image, add --no-cache to the build command. Message-Id: <20190122185042.23131-1-avi@scylladb.com>	2019-01-23 10:47:40 +01:00
Glauber Costa	5d754c1d11	install-dependencies.sh: add packages that will be needed by scylla-python3 Done in a separate step so we can update the toolchain first. dnf-utils is used to bring us repoquery, which we will use to derive the list of files in the python packages. patchelf is needed so we can add a DT_RUNPATH section to the interpreter binary. the python modules, as well as the python3 interpreter are taken from the current RPM spec file. Signed-off-by: Glauber Costa <glauber@scylladb.com> [avi: regenerate frozen toolchain image] Message-Id: <20190123011751.14440-1-glauber@scylladb.com>	2019-01-23 10:53:10 +02:00
Avi Kivity	c1dd04986b	Merge "Prepare for the switch to CMake-ified Seastar" from Jesse " This series prepares for the integration of the `master` branch of Seastar back into Scylla. A number of changes to the existing build are necessary to integrate Seastar correctly, and these are detailed in the individual change messages. I tested with and without DPDK, in release and debug mode. The actual switch is a separate patch. " * 'jhk/seastar_cmake/v4' of https://github.com/hakuch/scylla: build: Fix link order for DPDK tests: Split out `sstable_datafile_test` build: Remove unnecessary inclusion tests: Fix use-after-free errors in static vars build: Remove Seastar internals build: Only use Seastar flags from pkg-config build: Query Seastar flags using pkg-config build: Change parameters for `pkg_config` function	2019-01-23 10:33:00 +02:00
Duarte Nunes	88c7c1e851	Merge 'hinted handoff: cache cf mappings' from Vlad " Cache cf mappings when breaking in the middle of a segment sending so that the sender has them the next time it wants to send this segment for where it left off before. Also add the "discard" metric so that we can track hints that are being discarded in the send flow. " Fixes #4122 * 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: cache column family mappings for segments that were not sent out in full hinted handoff: add a "discarded" metric	2019-01-23 00:44:41 +00:00
Jesse Haber-Kucharsky	3d79bd25b2	build: Fix link order for DPDK Without this change, DPDK libraries will not be linked to Scylla correctly when we switch to the new pkg-config support in Seastar.	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	cfb1492a6e	tests: Split out `sstable_datafile_test` Each `*_test.cc` file must be compiled separately so that there is only one definition of `main`. This change correctly defines an independent `sstable_datafile_test` from `sstable_datafile_test.cc` and adds that test to the existing suite.	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	02dd7bcc82	build: Remove unnecessary inclusion	2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky	2a62550002	tests: Fix use-after-free errors in static vars Without these two variables being declared as TLS, executing these two tests in "debug" mode fails AddressSanitizer's checks.	2019-01-22 18:24:52 -05:00
Jesse Haber-Kucharsky	88cc43d5e0	build: Remove Seastar internals We don't need to re-specify Seastar internals in Scylla's build, since everything private to Seastar is managed via pkg-config. We can eliminate all references to ragel and generated ragel header files from Seastar. We can also simplify the dependence on generated Seastar header files by ensuring that all object files depend on Seastar being built first.	2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky	4f44e143be	build: Only use Seastar flags from pkg-config Some Seastar-specific flags were manually specified as Ninja rules, but we want to rely exclusively on Seastar for its necessary flags. The pkg-config file generated by the latest version of Seastar is correct and allows us to do this, but the version generated by Scylla's current check-out of Seastar does not. Therefore, we have to manually adjust the pkg-config results temporarily until we update Seastar.	2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky	8743cff59b	build: Query Seastar flags using pkg-config Previously, we manually parsed the pkg-config file. We now use pkg-config itself to get the correct build flags. This means that we will get the correct behavior for variable expansion, and fields like `Requires`, `Requires.private`, and `Libs.private`. Previously, these fields were ignored.	2019-01-22 18:24:38 -05:00
Vlad Zolotarov	34829b8f81	hinted handoff: cache column family mappings for segments that were not sent out in full We will try to send a particular segment later (in 1s) from the place where we left off if it wasn't sent out in full before. However we may miss some of the column family mappings when we get back to sending this file and start sending from some entry in the middle of it (where we left off) if we didn't save the column family mappings we cached while reading this segment from its beginning. This happens because commitlog doesn't save column family information in every entry but rather once for each unique column family (version) per "cycle" (see commitlog::segment description for more info). Therefore we have to assume that a particular column family mapping appears once in the whole segment (worst case). And therefore, when we decide to resume sending a segment we need to keep the column family mappings we accumulated so far and drop them only after we are done with this particular segment (sent it out in full). Fixes #4122 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 15:24:22 -05:00
Vlad Zolotarov	4516a8cfc4	hinted handoff: add a "discarded" metric Account for the number of hints that were discarded in the send path. This may happen for instance due to a schema change or because a hint is too old. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 14:11:09 -05:00
Avi Kivity	fa0312d0f2	Merge "Support 64-bit gc_clock" from Benny " wrap around on 2038-01-19 03:14:07 UTC. Such dates are valid deletion times starting 2018-01-19 with the 20 years long maximum ttl. This patchset extends gc_clock::duration::rep to int64_t and adds respective unit tests for the max_ttl cases. Fixes #3353 Tests: unit (release) " * 'projects/gc_clock_64/v2' of https://github.com/bhalevy/scylla: tests: cql_query_test add test_time_overflow gc_clock: make 64 bit sstables: mc: use int64_t for local_deletion_time and ttl sstables: add capped_tombstone_deletion_time stats counter sstables: mc: cap partition tombstone local_deletion_time to max sstables: add capped_local_deletion_time stats counter sstables: mc: metadata collector: cap local_deletion_time at max sstables: mc: use proper gc_clock types for local_deletion_time and ttl db: get default_time_to_live as int32_t rather than gc_clock::rep sstables: safely convert ttl and local_deletion_time to int32_t sstables: mc: move liveness_info initialization to members sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m sstables: mc: define expired_liveness_ttl as signed int32_t sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time	2019-01-22 18:21:55 +02:00
Glauber Costa	54bc0ce70d	scylla_setup: make sure it works (again) in interactive mode Commit `019a2e3a27` marked some arguments as required, which improved the usability of scylla_setup. The problem is that when we call scylla_setup in interactive mode, no argument should be required. After the aforementioned commit scylla_setup will either complain that the required arguments were not passed if zero arguments are present, or skip interactive mode if one of the mandatory ones is present. This patch fixes that by checking whether or not we were invoked with no command line arguments and lifting the requirements for mandatory arguments in that case. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190122003621.11156-1-glauber@scylladb.com>	2019-01-22 16:54:55 +02:00
Benny Halevy	7d0854a1e5	tests: cql_query_test add test_time_overflow Test 32-bit time overflow scenarios. Fails without "gc_clock: make 64 bit". Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	93270dd8e0	gc_clock: make 64 bit Fixes: #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	1ccd72f115	sstables: mc: use int64_t for local_deletion_time and ttl In preparation for changing gc_clock::duration::rep to int64_t. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	427d6e6090	sstables: add capped_tombstone_deletion_time stats counter Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	0ec46924bf	sstables: mc: cap partition tombstone local_deletion_time to max deletion_time struct has int32_t deletion_time that cannot hold long time values. Cap local_deletion_time to max_local_deletion_time and log a warning about that. This corresponds to Cassandra's MAX_DELETION_TIME. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	156f9ffa11	sstables: add capped_local_deletion_time stats counter Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	7609a04565	sstables: mc: metadata collector: cap local_deletion_time at max max local_deletion_time_tracker in stats is int32_t so just track the limit of (max int32_t - 1) if time_point is greater than the limit. This corresponds to Cassandra's MAX_DELETION_TIME. Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	bd6861989d	sstables: mc: use proper gc_clock types for local_deletion_time and ttl Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	9878b36895	db: get default_time_to_live as int32_t rather than gc_clock::rep Otherwise, value_cast<> throws std::bad_cast exception when gc_clock::rep is defined as int64_t. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	33314cec3f	sstables: safely convert ttl and local_deletion_time to int32_t Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	9a00c5a763	sstables: mc: move liveness_info initialization to members Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	0aba922b6d	sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m To be consistent with other calls to parse_* methods there. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	6465a673f5	sstables: mc: define expired_liveness_ttl as signed int32_t Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	c4c2133e3e	sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time mc format only writes delta local_deletion_time of tombstones. Conventional deletion_time is written only for the partition header. Restructure the code to pass a tombstone to write_delta_deletion_time rather than struct deletion_time to prepare for using 64-bit deletion times. The tombstone uses gc_clock::time_point while struct deletion_time is limited to int32_t local_deletion_time. Note that for "live" tombstones we encode <api::missing_timestamp, no_deletion_time> as was previously evaluated by to_deletion_time(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Benny Halevy	820906b794	sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 13:36:35 +02:00
Tomasz Grabiec	dbc1894bd5	lsa: Avoid unnecessary compact_and_evict_locked() When the reclaim request was satisfied from the pool there's no need to call compact_and_evict_locked(). This allows us to avoid calling boost::range::make_heap(), which is a tiny performance difference, as well as some confusing log messages. Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>	2019-01-21 20:19:20 +02:00
Jesse Haber-Kucharsky	72da3283b9	build: Change parameters for `pkg_config` function We can invoke pkg-config with multiple options, and we specify the package name first since this is the "target" of the pkg-config query. Supporting multiple options is necessary for querying Seastar's pkg-config file with `--static`, which we anticipate in a future change.	2019-01-21 11:38:25 -05:00
Glauber Costa	ca997b5f60	scylla_setup: warn users on the severity of answering no to IOTune The system won't work properly if IOTune is not run. While it is fair to skip this step because it takes long-- indeed, it is common to provision io.conf manually to be able to skip this step, first time users don't know this and can have the impression that this is just a totally optional step. Except the node won't boot up without it. As a user nicely put recently in our mailing list: "...in this case, it would be even simpler to forbid answering "no" to this not-so-optional step :)" We should not forbid saying no to IOTune, but we should warn the user about the consequences of doing so. Fixes #4120 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190121144506.17121-1-glauber@scylladb.com>	2019-01-21 16:55:50 +02:00
Botond Dénes	4e89dea9ea	database: don't allow access to global semaphores Recently we had a bug (#4096) due to a component (`multishard_mutation_query()`) assuming that all reads used the semaphore obtainable via `database::user_read_concurrency_sem()`. This problem revealed that it is plain wrong to allow access to the shard-global semaphores residing in the database object. Instead all code wishing to access the relevant semaphore for some read, should do so via the relevant `table` object, thus guaranteeing that it will get the correct semaphore, configured for that table. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>	2019-01-21 16:29:02 +02:00
Piotr Sarna	5d76a635ca	distributed_loader: migrate flush_upload_dir to thread Flushing upload dir code suffers from overcomplication, so in order to make it a little bit simpler, it's moved to threaded context. Refs #4118 Message-Id: <232cca077bae7116cfa87de9c9b4ba60efc2a01d.1548077720.git.sarna@scylladb.com>	2019-01-21 15:48:17 +02:00
Gleb Natapov	85cb09294e	storage_service: do not start thrift and cql servers if a node is isolated due to errors Scylla starts doing IO much earlier than it starts cql/thrift servers. The IO may cause an error that will try to stop all servers, but since they are still not running it will do nothing, and the servers will be started later. Fix it by checking that the node is not isolated before starting servers. Message-Id: <20190110152830.GE3172@scylladb.com>	2019-01-21 13:04:23 +00:00
Tomasz Grabiec	e02baabd62	tests: perf_fast_forward: Introduce --with-compression option Message-Id: <1547819062-4369-1-git-send-email-tgrabiec@scylladb.com>	2019-01-21 12:18:31 +00:00
Botond Dénes	ff2884f25b	Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges" A much simpler and more complete fix was found. Let's revert this before applying the simpler fix. This reverts commit `7049cd9374`.	2019-01-21 13:56:56 +02:00
Botond Dénes	f229dff210	auth/service: unregister migration listener on stop() Otherwise any event that triggers notification to this listener would trigger a heap-use-after-free. Refs: #4107 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b6bbd609371a2312aed7571b05119d59c7d103d7.1548067626.git.bdenes@scylladb.com>	2019-01-21 13:06:59 +02:00
Tomasz Grabiec	d7c701d2d1	Merge "Type-erase gratuitous templates with functions" from Avi Many areas of the code are splattered with unneeded templates. This patchset replaces some of them, where the template parameter is a function object, with an std::function or noncopyable_function (with a preference towards the latter; but it is not always possible). As the template is compiled for each instantiation (if the function object is a lambda) while a function is compiled only once, there are significant savings in compile time and bloat. text data bss dec hex filename 85160690 42120 284910 85487720 5187068 scylla.before 84824762 42120 284910 85151792 5135030 scylla.after * https://github.com/avikivity/scylla detemplate/v2: api/commitlog: de-template acquire_cl_metric() database: de-template do_parse_schema_tables database: merge for_all_partitions and for_all_partitions_slow hints: de-template scan_for_hints_dirs() schema_tables: partially de-template make_map_mutation() distributed_loader: de-template tests: commitlog_test: de-template tests: cql_auth_query_test: de-template test: de-template eventually() and eventually_true() tests: flush_queue_test: de-template hint_test: de-template tests: mutation_fragment_test: de-template test: mutation_test: de-template	2019-01-21 11:32:22 +01:00
Avi Kivity	826cf90f3f	Merge "Restore mutating uploaded sstables to level 0" from Piotr " This miniseries fixes the behaviour of distributed loader, which now unconditionally mutates new sstables found in /upload dir to LCS level 0 first, and only after that proceeds with either queueing them for update generation or moving them to data directory. " * 'restore_always_mutating_sstables_level_0' of https://github.com/psarna/scylla: distributed_loader: restore indentation distributed_loader: restore always mutating to level 0	2019-01-20 20:32:15 +02:00
Benny Halevy	844a2de263	sstables: mc: prevent signed integer overflow Fix runtime error: signed integer overflow introduced by `2dc3776407` Delta-encoded values may wrap around if the encoded value is less than the base value. This could happen in two places: In the mc-format serialization header itself, where the base values are implicit Cassandra epoch time, and in the sstables data files, where the base values are taken from the encoding_stats (later written to the serialization_header). In these cases, when the calculation is done using signed integer/long we may see "runtime error: signed integer overflow" messages in debug mode (with -fsanitize=undefined / -fsanitize=signed-integer-overflow). Overflow here is expected and harmless since we do not guarantee that the base values in the serialization header are greater than or equal to Cassandra's epoch, nor that the delta-encoded values are always greater than or equal to the respective base values in the serialization header. To prevent these warnings, the subtraction/addition should be done with unsigned (two's complement) arithmetic and the result converted to the signed type. Note that to keep the code simple where possible, we also rely on implicit conversion of signed integers to unsigned when one of the added values is unsigned and the other is signed. Fixes: #4098 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>	2019-01-20 16:59:46 +02:00
Avi Kivity	1e5c09dbce	test: mutation_test: de-template Replace the with_column_family helper template with an ordinary function, to reduce code bloat.	2019-01-20 15:55:20 +02:00
Avi Kivity	28db56df13	tests: mutation_fragment_test: de-template The for_each_target() template is called four times, so making it a normal function reduces a lot of code generation.	2019-01-20 15:55:20 +02:00
Avi Kivity	401684503d	hint_test: de-template While cl_test is duplicated with commitlog_test, at least deduplicate it internally by converting it to an ordinary function.	2019-01-20 15:55:20 +02:00
Avi Kivity	208b0f80a4	tests: flush_queue_test: de-template The internal test_propagation template is instantiated many times. Replace with an ordinary function to reduce bloat. Call sites adjusted to have a uniform signature.	2019-01-20 15:55:20 +02:00
Avi Kivity	2f36d30572	test: de-template eventually() and eventually_true() These templates are not trivial and called many times. De-template them to reduce code bloat.	2019-01-20 15:55:20 +02:00
Avi Kivity	96a8eacc3c	tests: cql_auth_query_test: de-template Replace the with_user() and verify_unauthorized_then_ok() templates with functions.	2019-01-20 15:55:20 +02:00
Avi Kivity	e0b0e18234	tests: commitlog_test: de-template The cl_test function is called many times, so its contents are bloat. De-template it so it is compiled only once.	2019-01-20 15:55:20 +02:00
Avi Kivity	baf9480c8d	distributed_loader: de-template distributed_loader has several large templates that can be converted to normal function with the help of noncopyable_function<>, reducing code bloat. One of the lambdas used as an actual argument was adjusted, because the de-templated callee only accepts functions returning a future, while the original accepted both functions returning a future and functions returning void (similar to future::then).	2019-01-20 15:55:20 +02:00
Avi Kivity	e0914a080e	schema_tables: partially de-template make_map_mutation() make_map_mutation() is called several times, hopefully with the same Map type parameter. Replace the Func parameter with a noncopyable_function<>.	2019-01-20 15:55:20 +02:00
Avi Kivity	630f841e5b	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-20 15:55:20 +02:00
Avi Kivity	fae4c6c0b6	database: merge for_all_partitions and for_all_partitions_slow for_all_partitions is only used in the implementation of for_all_partitions_slow, so merge them and get rid of a template.	2019-01-20 15:55:20 +02:00
Avi Kivity	9858395c3e	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win. We use std::function instead of noncopyable_function because the function is copied within the parallel_for_each callback. The original code uses a move, which is incorrect, but did not fail because moving the lambdas that were used as the actual arguments is equivalent to a copy.	2019-01-20 15:55:18 +02:00
Tomasz Grabiec	c422bfc2c5	tests: perf_fast_forward: Store results for each dataset in separate sub-directory Otherwise read test results for subsequent datasets will override each other. Also, rename population test case to not include dataset name, which is now redundant. Message-Id: <1547822942-9690-1-git-send-email-tgrabiec@scylladb.com>	2019-01-20 15:38:46 +02:00
Botond Dénes	7049cd9374	partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges When entering a new ck range (of the partition-slice), the partition snapshot reader will apply to its range tombstones stream all the tombstones that are relevant to the new ck range. When the partition has range tombstones that overlap with multiple ck ranges, these will be applied to the range tombstone stream when entering any of the ck ranges they overlap with. This will result in the violation of the monotonicity of the mutation fragments emitted by the reader, as these range tombstones will be re-emitted on each ck range, if the ck range has at least one clustering row they apply to. For example, given the following partition: rt{[1,10]}, cr{1}, cr{2}, cr{3}... And a partition-slice with the following ck ranges: [1,2], [3, 4] The reader will emit the following fragment stream: rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ... Note how the range tombstone is emitted twice. In addition to violating the monotonicity guarantee, this can also result in an explosion of the number of emitted range tombstones. Fix by applying only those range tombstones to the range tombstone stream, that have a position strictly greater than that of the last emitted clustering row (or range tombstone), when entering a new ck range. Fixes: #4104 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <e047af76df75972acb3c32c7ef9bb5d65d804c82.1547916701.git.bdenes@scylladb.com>	2019-01-20 15:38:04 +02:00
Paweł Dziepak	14757d8a83	types: collection_type: drop tombstone if covered by higher-level one At the moment there are inefficiencies in how collection_type_impl::mutation::compact_and_expire() handles tombstones. If there is a higher-level tombstone that covers the collection one (including cases where there is no collection tombstone) it will be applied to the collection tombstone and present in the compaction output. This also means that the collection tombstone is never dropped if fully covered by a higher-level one. This patch fixes both those problems. After the compaction the collection tombstone is either unchanged or removed if covered by a higher-level one. Fixes #4092. Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>	2019-01-20 15:32:34 +02:00
Avi Kivity	e51ef95868	Update seastar submodule * seastar af6b797...7d620e1 (1): > perftune.py: don't let any exception out when connecting to AWS meta server Fixes #4102.	2019-01-20 13:59:09 +02:00
Avi Kivity	32e79fc23b	api/commitlog: de-template acquire_cl_metric() Use std::function instead of a template parameter. Likely doesn't gain anything, because the template was always instantiated with the same type (the result of std::bind() with the same signatures), but still good practice. std::function was used instead of noncopyable_function because sharded::map_reduce0() copies the input function.	2019-01-20 11:58:39 +02:00
Avi Kivity	6e6372e8d2	Revert "Merge "Type-eaese gratuitous templates with functions" from Avi" This reverts commit `31c6a794e9`, reversing changes made to `4537ec7426`. It causes bad_function_calls in some situations: INFO 2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708 INFO 2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema) INFO 2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables INFO 2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)	2019-01-20 11:32:14 +02:00
Paweł Dziepak	e212d37a8a	utils/small_vector: fix leak in copy assignment slow path Fixes #4105. Message-Id: <20190118153936.5039-1-pdziepak@scylladb.com>	2019-01-18 17:49:46 +02:00
Paweł Dziepak	23cfb29fea	Merge "compaction: mc: re-calculate encoding_stats" from Benny " Use input sstables stats metadata to re-calculate encoding_stats. Fixes #3971. " * 'projects/compaction-encoding-stats/v3' of https://github.com/bhalevy/scylla: compaction: mc: re-calculate encoding_stats based on column stats memtable: extract encoding_stats_collector base class to encoding_stats header file	2019-01-18 14:36:17 +00:00
Tomasz Grabiec	7308effb45	tests: flat_mutation_reader_test: Drop unneeded includes Message-Id: <1547819118-4645-1-git-send-email-tgrabiec@scylladb.com>	2019-01-18 13:58:05 +00:00
Tomasz Grabiec	6461e085fe	managed_bytes: Fix compilation on gcc 8.2 The compilation fails on -Warray-bounds, even though the branch is never taken: inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22, inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77, inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57, inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39: /usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds] __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Work around by disabling the diagnostic locally. Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>	2019-01-18 13:48:05 +00:00
Tomasz Grabiec	31c6a794e9	Merge "Type-eaese gratuitous templates with functions" from Avi Many areas of the code are splattered with unneeded templates. This patchset replaces some of them, where the template parameter is a function object, with an std::function or noncopyable_function (with a preference towards the latter; but it is not always possible). As the template is compiled for each instantiation (if the function object is a lambda) while a function is compiled only once, there are significant savings in compile time and bloat. text data bss dec hex filename 85160690 42120 284910 85487720 5187068 scylla.before 84824762 42120 284910 85151792 5135030 scylla.after * https://github.com/avikivity/scylla detemplate/v1: api/commitlog: de-template acquire_cl_metric() database: de-template do_parse_schema_tables database: merge for_all_partitions and for_all_partitions_slow hints: de-template scan_for_hints_dirs() schema_tables: partially de-template make_map_mutation() distributed_loader: de-template tests: commitlog_test: de-template tests: cql_auth_query_test: de-template test: de-template eventually() and eventually_true() tests: flush_queue_test: de-template hint_test: de-template tests: mutation_fragment_test: de-template test: mutation_test: de-template	2019-01-18 11:42:01 +01:00
Piotr Sarna	3d65eb5d4a	distributed_loader: restore indentation	2019-01-18 10:59:37 +01:00
Piotr Sarna	e50e9b5150	distributed_loader: restore always mutating to level 0 When introducing view update generation path for sstables in /upload directory, mutating these sstables was moved to regular path only. It was wrong, because sstables that need view updates generated from them may still need to be downgraded to LCS level 0, so they won't disrupt LCS assumptions after being loaded. Reported-by: Nadav Har'El <nyh@scylladb.com>	2019-01-18 10:35:20 +01:00
Avi Kivity	089931fb56	test: mutation_test: de-template Replace the with_column_family helper template with an ordinary function, to reduce code bloat.	2019-01-17 19:06:42 +02:00
Avi Kivity	53a3db9446	tests: mutation_fragment_test: de-template The for_each_target() template is called four times, so making it a normal function reduces a lot of code generation.	2019-01-17 19:05:48 +02:00
Avi Kivity	4a21de4592	hint_test: de-template While cl_test is duplicated with commitlog_test, at least deduplicate it internally by converting it to an ordinary function.	2019-01-17 19:03:31 +02:00
Avi Kivity	1f02fd3ff6	tests: flush_queue_test: de-template The internal test_propagation template is instantiated many times. Replace with an ordinary function to reduce bloat. Call sites adjusted to have a uniform signature.	2019-01-17 19:02:26 +02:00
Avi Kivity	63077501ed	test: de-template eventually() and eventually_true() These templates are not trivial and called many times. De-template them to reduce code bloat.	2019-01-17 19:00:55 +02:00
Avi Kivity	a5d3254ed3	tests: cql_auth_query_test: de-template Replace the with_user() and verify_unauthorized_then_ok() templates with functions. Some adjustments made to the call site to unify the signatures.	2019-01-17 18:59:30 +02:00
Avi Kivity	8c05debecb	tests: commitlog_test: de-template The cl_test function is called many times, so its contents are bloat. De-template it so it is compiled only once.	2019-01-17 18:57:35 +02:00
Avi Kivity	b6239134c2	distributed_loader: de-template distributed_loader has several large templates that can be converted to normal function with the help of noncopyable_function<>, reducing code bloat.	2019-01-17 18:56:22 +02:00
Avi Kivity	2407c35cc1	schema_tables: partially de-template make_map_mutation() make_map_mutation() is called several times, hopefully with the same Map type parameter. Replace the Func parameter with a noncopyable_function<>.	2019-01-17 18:54:43 +02:00
Avi Kivity	81d004b2c0	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-17 18:51:46 +02:00
Avi Kivity	f61dbc9855	database: merge for_all_partitions and for_all_partitions_slow for_all_partitions is only used in the implementation of for_all_partitions_slow, so merge them and get rid of a template.	2019-01-17 18:50:36 +02:00
Avi Kivity	4568a4e4b0	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win.	2019-01-17 18:48:57 +02:00
Avi Kivity	08bd28942b	api/commitlog: de-template acquire_cl_metric() Use noncopyable_function instead of a template parameter. Likely doesn't gain anything, because the template was always instantiated with the same type (the result of std::bind() with the same signatures), but still good practice.	2019-01-17 18:45:14 +02:00
Botond Dénes	4537ec7426	multishard_mutation_query(): use correct reader concurrency semaphore The multishard mutation query used the semaphore obtained from `database::user_read_concurrency_sem()` to pause-resume shard readers. This presented a problem when `multishard_mutation_query()` was reading from system tables. In this case the readers themselves would obtain their permits from the system read concurrency semaphore. Since the pausing of shard readers used the user read semaphore, pausing failed to fulfill its objective of alleviating pressure on the semaphore the reads obtained their permits from. In some cases this led to a deadlock during system reads. To ensure the correct semaphore is used for pausing-resuming readers, obtain the semaphore from the `table` object. To avoid looking up the table on every pause or resume call, cache the semaphores when readers are created. Fixes: #4096 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com>	2019-01-17 15:19:59 +02:00
Avi Kivity	8e9989685d	scyllatop: complete conversion to python3 `d2dbbba139` converted scyllatop's interpreter to Python 3, but neglected to do the actual conversion. This patch does so, by running 2to3 over all files and adding an additional bytes->string decode step in prometheus.py. Superfluous 2to3 changes to print() calls were removed. Message-Id: <20190117124121.7409-1-avi@scylladb.com>	2019-01-17 12:50:25 +00:00
Duarte Nunes	7505815013	Merge 'Fix filtering with LIMIT and paging' from Piotr " Before this series the limit was applied per page instead of globally, which might have resulted in returning too many rows. To fix that: 1. restrictions filter now has a 'remaining' parameter in order to stop accepting rows after enough of them have already been accepted 2. pager passes its row limit to restrictions filter, so no more rows than necessary will be served to the client 3. results no longer need to be trimmed on select_statement level Tests: unit (release) " * 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla: tests: add filtering+limit+paging test case tests: allow null paging state in filtering tests cql3: fix filtering with LIMIT with regard to paging	2019-01-17 12:50:00 +00:00
Piotr Sarna	ed7328613f	tests: add filtering+limit+paging test case A test case that checks whether a combination of paging and LIMIT clause for filtering queries doesn't return with too many rows. Refs #4100	2019-01-17 13:25:10 +01:00
Piotr Sarna	7d4f994e98	tests: allow null paging state in filtering tests Previously the utility to extract paging state asserted that the state exists, but in future tests it would be useful to be able to call this function even if it would return null.	2019-01-17 13:25:10 +01:00
Piotr Sarna	87c23372fb	cql3: fix filtering with LIMIT with regard to paging Previously the limit was erroneously applied per page instead of being accumulated, which might have caused returning too many rows. As of now, LIMIT is handled properly inside restrictions filter. Fixes #4100	2019-01-17 13:25:09 +01:00
Piotr Sarna	02d88de082	db,view: add consuming units in staging table registration View update generator service can accept sstables even before it starts, but it should still acknowledge the number of waiters in the semaphore. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>	2019-01-16 18:05:17 +00:00
Benny Halevy	1d483bc424	compaction: mc: re-calculate encoding_stats based on column stats When compacting several sstables, get and merge their encoding_stats for encoding the result. Introduce sstable::get_encoding_stats_for_compaction to return encoding_stats based on the sstable's column stats. Use encoding_stats_collector to keep track of the minimum encoding_stats values of all input sstables. Fixes #3971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-16 17:59:59 +02:00
Benny Halevy	e2c4d2d60a	memtable: extract encoding_stats_collector base class to encoding_stats header file To be used also by compaction. Refs #3971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-16 17:59:58 +02:00
Asias He	4b9e1a9f1d	repair: Add row level metrics Number of rows sent and received - tx_row_nr - rx_row_nr Bytes of rows sent and received - tx_row_bytes - rx_row_bytes Number of row hashes sent and received - tx_hashes_nr - rx_hashes_nr Number of rows read from disk - row_from_disk_nr Bytes of rows read from disk - row_from_disk_bytes Message-Id: <d1ee6b8ae8370857fe45f88b6c13087ea217d381.1547603905.git.asias@scylladb.com>	2019-01-16 14:04:57 +02:00
Duarte Nunes	04a14b27e4	Merge 'Add handling staging sstables to /upload dir' from Piotr " This series adds generating view updates from sstables added through /upload directory if their tables have accompanying materialized views. Said sstables are left in /upload directory until updates are generated from them and are treated just like staging sstables from /staging dir. If there are no views for a given table, sstables are simply moved from /upload dir to datadir without any changes. Tests: unit (release) " * 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla: all: rename view_update_from_staging_generator distributed_loader: fix indentation service: add generating view updates from uploaded sstables init: pass view update generator to storage service sstables: treat sstables in upload dir as needing view build sstables,table: rename is_staging to requires_view_building distributed_loader: use proper directory for opening SSTable db,view: make throttling optional for view_update_generator	2019-01-15 18:19:27 +00:00
Duarte Nunes	9b79f0f58b	Merge 'Add stream phasing' from Piotr " This series addresses the problem mentioned in issue 4032, which is a race between creating a view and streaming sstables to a node. Before this patch the following scenario is possible: - sstable X arrives from a streaming session - we decide that view updates won't be generated from an sstable X by the view builder - new view is created for the table that owns sstable X - view builder doesn't generate updates from sstable X, even though the table has accompanying views - which is an inconsistency This race is fixed by making the view builder wait for all ongoing streams, just like it does for reads and writes. It's implemented with a phaser. Tests: unit (release) dtest(not merged yet: materialized_views_test.TestMaterializedViews.stream_from_repair_during_build_process_test) " * 'add_stream_phasing_2' of https://github.com/psarna/scylla: repair: add stream phasing to row level repair streaming: add phasing incoming streams multishard_writer: add phaser operation parameter view: wait for stream sessions to finish before view building table: wait for pending streams on table::stop database: add pending streams phaser	2019-01-15 18:18:40 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	a5d24e40e0	distributed_loader: fix indentation Bad indentation was introduced in the previous commit.	2019-01-15 17:31:37 +01:00
Piotr Sarna	13c8c84045	service: add generating view updates from uploaded sstables SSTables loaded to the system via /upload dir may sometimes be needed to generate view updates from them (if their table has accompanying views). Fixes #4047	2019-01-15 17:31:37 +01:00
Piotr Sarna	46305861c3	init: pass view update generator to storage service Storage service needs to access view update generator in order to register staging sstables from /upload directory.	2019-01-15 17:31:36 +01:00
Piotr Sarna	13f6453350	sstables: treat sstables in upload dir as needing view build In some cases, sstables put in the upload dir should have view updates generated from them. In order to avoid moving them across directories (which then involves handling failure paths), upload dir will also be treated as a valid directory where staging sstables reside. Regular sstables that are not needed for view updates will be immediately moved from upload/ dir as before.	2019-01-15 16:47:01 +01:00
Piotr Sarna	09401e0e71	sstables,table: rename is_staging to requires_view_building A generalized name will be more fitting once we treat uploaded sstables as requiring view building too.	2019-01-15 16:47:01 +01:00
Piotr Sarna	76616f6803	distributed_loader: use proper directory for opening SSTable Previous implementation assumes that each SSTable resides directly in table::datadir directory, while what should actually be used is directory path from SSTable descriptor. This patch prevents a regression when adding staging sstables support for upload/ dir.	2019-01-15 16:47:01 +01:00
Piotr Sarna	beb4836726	db,view: make throttling optional for view_update_generator Currently registering new view updates is throttled by a semaphore, which makes sense during stream sessions in order to avoid overloading the queue. Still, registration also occurs during initialization, where it makes little sense to wait on a semaphore, since view update generator might not have started at all yet.	2019-01-15 16:47:01 +01:00
Paweł Dziepak	635873639b	Merge "Encoding stats enhancements" from Benny " Cleanup various cases related to updating of metadata stats and encoding stats updating in preparation for 64-bit gc_clock (#3353). Fixes #4026 Fixes #4033 Fixes #4035 Fixes #4041 Refs #3353 " * 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla: sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES sstables: mc: use api::timestamp_type in write_liveness_info sstables: mc: sstable_write encoding_stats are const mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time memtable: don't use encoding_stats epochs as default memtable: mc: update min_ttl encoding stats for dead row marker memtable: mc: add comment regarding updating encoding stats of collection tombstones sstables: metadata_collector: add update tombstone stats sstables: assert that delete_time is not live when updating stats sstables: move update_deletion_time_stats to metadata collector sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram sstables: mc: write_liveness_info and write_collection should update tombstone_histogram sstables: update_local_deletion_time for row marker deletion_time and expiration	2019-01-15 16:53:36 +02:00
Tomasz Grabiec	32f711ce56	row_cache: Fix crash on memtable flush with LCS Presence checker is constructed and destroyed in the standard allocator context, but the presence check was invoked in the LSA context. If the presence checker allocates and caches some managed objects, there will be alloc-dealloc mismatch. That is the case with LeveledCompactionStrategy, which uses incremental_selector. Fix by invoking the presence check in the standard allocator context. Fixes #4063. Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>	2019-01-15 16:53:36 +02:00
Piotr Sarna	08a42d47a5	repair: add stream phasing to row level repair In order to allow other services to wait for incoming streams to finish, row level repair uses stream phasing when creating new sstables from incoming data. Fixes scylladb#4032	2019-01-15 10:28:21 +01:00
Piotr Sarna	7e61f02365	streaming: add phasing incoming streams Incoming streams are now phased, which can be leveraged later to wait for all ongoing streams to finish. Refs #4032	2019-01-15 10:28:15 +01:00
Asias He	1cc7e45f44	database: Make log max_vector_size and internal_count debug level It is useful for developers but not useful for users. Make it debug level. Message-Id: <775ce22d6f8088a44d35601509622a7e73ddeb9b.1547524976.git.asias@scylladb.com>	2019-01-15 11:02:30 +02:00
Piotr Sarna	238003b773	multishard_writer: add phaser operation parameter Multishard writer can now accept a phaser operation parameter in order to sustain a phased operation (e.g. a streaming session).	2019-01-15 10:02:22 +01:00
Piotr Sarna	b9203ec4f8	view: wait for stream sessions to finish before view building During streaming, there's a race between streamed sstables and view creation, which might result in some tables not being used to generate view updates, even though they should. That happens when the decision about view update path for a table is done before view creation, but after already receiving some sstables via streaming. These will not be used in view building even though they should. Hence, a phaser is used to make the view builder wait for all ongoing stream sessions for a table to finish before proceeding with build steps. Refs #4032	2019-01-15 09:36:55 +01:00
Piotr Sarna	d3a8fb378c	table: wait for pending streams on table::stop Stream sessions are now phased, so it's possible to wait for existing streams to finish gently before stopping a table.	2019-01-15 09:36:55 +01:00
Piotr Sarna	8a5aaf2839	database: add pending streams phaser This phaser will be used later to wait for all existing stream sessions to finish before proceeding with view building.	2019-01-15 09:36:55 +01:00
Nadav Har'El	9062750089	scylla_util.py: make view_hints_directory setting optional It is optional to set "view_hints_directory", so we shouldn't insist that it is defined in scylla.yaml on upgrade. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190114125225.10794-1-nyh@scylladb.com>	2019-01-14 14:59:20 +02:00
Benny Halevy	238866228f	memtable: rename get_stats to get_encoding_stats For symmetry reasons to similar sstable and compaction methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190113105155.29118-2-bhalevy@scylladb.com>	2019-01-14 14:58:43 +02:00
Avi Kivity	df090a15ff	Merge "Add counters for inactive reads" from Botond " This mini-series adds counters for the inactive reads registered in the reader concurrency semaphore. " * 'reader-concurrency-semaphore-counters/v6' of https://github.com/denesb/scylla: tests/querier_cache: use stats to get the no. of inactive reads reader_concurrency_semaphore: add counters for inactive reads	2019-01-14 11:56:43 +02:00
Rafael Ávila de Espíndola	acd6999ba9	Don't use SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT in scylla The existence of LZ4_compress_default is a property of the lz4 library, not seastar. With this patch scylla does its own configure check instead of depending on the one done by seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190114013737.5395-1-espindola@scylladb.com>	2019-01-14 11:51:20 +02:00
Rafael Ávila de Espíndola	684fb607c4	sstable: handle missing index entry This patch fixes a crash when the index file is corrupted and we get an empty index entry list. Tests: unit (release) Fixes: 2532 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190110202833.29333-1-espindola@scylladb.com>	2019-01-14 10:47:21 +01:00
Avi Kivity	f5ee466a1c	Merge "Cleanup UDT and tuple names creation" from Piotr " Currently the logic is scattered between types.*, cql3_types.* and sstables/mc/writer.cc. This patchset places all the logic in types.* and makes sure we correctly add "frozen<...>" and "FrozenType(...)" to the names of tuples and UDTs. Fixes #4087 Tests: unit(release) " * 'haaawk/4087_v1' of github.com:scylladb/seastar-dev: Add comment explaining tuple type name creation Add "FrozenType(...)" to UDT name only when it's frozen Move "FrozenType(...)" addition to UDT name to user_type_impl Add "frozen<...>" to tuple CQL name only when it's frozen Move "frozen<...>" addition to tuple CQL name to tuple_type_impl Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type Add "frozen<...>" to UDT CQL name only when it's frozen Move "frozen<...>" addition to UDT CQL name to user_type_impl	2019-01-13 15:34:24 +02:00
Benny Halevy	b243852a70	sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	d9e2aa65fc	sstables: mc: use api::timestamp_type in write_liveness_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	7ea96aa778	sstables: mc: sstable_write encoding_stats are const Encoding stats are immutable once statistics are sealed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	5d2d2bf47a	mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time It is actually the local deletion time rather than the ttl Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	2c99eb28d8	memtable: don't use encoding_stats epochs as default Why default to an artificial minimum when you can do better with zero effort? Track the actual minima in the memtable instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	9b78911379	memtable: mc: update min_ttl encoding stats for dead row marker Update min ttl with expired_liveness_ttl (although its value of max int32 is not expected to affect the minimum). Fixes #4041 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	47964d9ddc	memtable: mc: add comment regarding updating encoding stats of collection tombstones When the row flag has_complex_deletion is set, some collection columns may have deletion tombstones and some may not. We don't strictly need to update the stats, as this will not affect the encoding_stats anyway. Fixes #4035 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	75ccd29b6a	sstables: metadata_collector: add update tombstone stats Conditionally update timestamp and local_deletion_time stats based on tombstone Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	0ae85a126a	sstables: assert that delete_time is not live when updating stats Be compatible with Cassandra Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	12e6b503c9	sstables: move update_deletion_time_stats to metadata collector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	2989b986ef	sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram Refs #4026 Refs #4033 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	bcb1fcd402	sstables: mc: write_liveness_info and write_collection should update tombstone_histogram Fixes #4033 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Benny Halevy	0ca4ae658c	sstables: update_local_deletion_time for row marker deletion_time and expiration Fixes #4026 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-13 14:17:45 +02:00
Tomasz Grabiec	f12a3e2066	sstables: index_reader: Rename _promoted_index_size Message-Id: <1547219234-21182-2-git-send-email-tgrabiec@scylladb.com>	2019-01-13 11:29:13 +02:00
Tomasz Grabiec	6c5f8e0eda	sstables: index_reader: Simplify offset calculations Now that continuous_data_consumer::position() is meaningful (since `36dd660`), we can use our position in the stream to calculate offsets instead of duplicating state machine in offset calculations. The value of position() - data.size() always holds the current offset in the stream. Message-Id: <1547219234-21182-1-git-send-email-tgrabiec@scylladb.com>	2019-01-13 11:29:12 +02:00
Avi Kivity	0d52bdcbad	install-dependencies.sh: unwrap long lines Put package names one per line. This makes it easier to review changes, and to backport changes to this file. No content changes. Message-Id: <20190112091024.21878-1-avi@scylladb.com>	2019-01-12 14:23:27 +02:00
Avi Kivity	391d1e0fe0	table: const correctness for table::get_sstables() and related Do not allow write access to the sstable list via this accessor. Luckily there are no violations, and now we enforce it. Message-Id: <20190111151049.16953-1-avi@scylladb.com>	2019-01-11 17:39:17 +01:00
Rafael Ávila de Espíndola	cd9ce18874	sstable: rename the is_boundary predicate The new name makes it clear what is on either side of the boundary. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190110221324.33618-1-espindola@scylladb.com>	2019-01-11 14:36:49 +02:00
Piotr Jastrzebski	96b880f81c	Add comment explaining tuple type name creation To keep format compatibility we never wrap tuple type name into "org.apache.cassandra.db.marshal.FrozenType(...)", even when the tuple is frozen. This patch adds a comment in tuple_type_impl::make_name that explains the situation. For more details see #4087 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:14:26 +01:00
Piotr Jastrzebski	57e655d716	Add "FrozenType(...)" to UDT name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:08:02 +01:00
Piotr Jastrzebski	fc17bd376b	Move "FrozenType(...)" addition to UDT name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:07:47 +01:00
Piotr Jastrzebski	1fdfc461b8	Add "frozen<...>" to tuple CQL name only when it's frozen At the moment Scylla supports only frozen tuples but the code should be able to handle non-frozen tuples as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	749eee2711	Move "frozen<...>" addition to tuple CQL name to tuple_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	7aba17de2c	Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	56060573bb	Add "frozen<...>" to UDT CQL name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	a928c103c2	Move "frozen<...>" addition to UDT CQL name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:09:00 +01:00
Raphael S. Carvalho	1b7cad3531	database: Fix race condition in sstable snapshot Race condition takes place when one of the sstables selected by snapshot is deleted by compaction. Snapshot fails because it tries to link a sstable that was previously unlinked by compaction's sstable deletion. Fixes #4051. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>	2019-01-11 07:53:14 +02:00
Benny Halevy	2dc3776407	sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base Refs #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>	2019-01-10 16:23:20 +02:00
Gleb Natapov	a29182b447	sstable: fix use after free while applying extensions in sstable::open_file sstable_file_io_extensions() returns an array of pointers to extensions, but do_for_each() may defer and the array will be destroyed. The patch keeps it alive until do_for_each completes. Message-Id: <20190110125656.GC3172@scylladb.com>	2019-01-10 15:10:06 +02:00
Avi Kivity	b247ce01c3	table: restore indentation after changes to table::make_sstable_reader Message-Id: <20190109175804.9352-2-avi@scylladb.com>	2019-01-10 13:00:53 +01:00
Avi Kivity	3d6be2f822	table: reduce duplication in table::make_sstable_reader make_sstable_reader needs to deal with single-key and scanning reads, and with restricting and non-restricting (in terms of read concurrency) readers. Right now it does this combinatorically - there are separate cases for restricting single-key reads, non-restricting single-key reads, restricting scans, and non-restricting scans. This makes further changes more complicated, so separate the two concepts. The patch splits the code into two stages; the first selects between a single-key and a scan, and the second selects between a restricting and non-restricting read. This slightly pessimizes non-restricting reads (a mutation_source is created and immediately destroyed), but that's not the common case. Tests: unit(release) Message-Id: <20190109175804.9352-1-avi@scylladb.com>	2019-01-10 13:00:40 +01:00
Benny Halevy	16dda033a5	sstables: row_marker: initialize _expiry compare_row_marker_for_merge compares deletion_time also for row markers that have missing timestamps. This happened to succeed due to implicit initialization to 0. However, we prefer the initialization to be explicit and allow calling row_marker::deletion_time() in all states. Fixes #4068 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190110102949.17896-1-bhalevy@scylladb.com>	2019-01-10 12:45:07 +01:00
Avi Kivity	4a6aeced59	Merge "Fix UDTs representation in serialization header" from Piotr " Tests: unit(release) " Fixes #4073. * commit 'FETCH_HEAD~1': Add test for serialization header with UDT Fix UDT names in serialization header	2019-01-10 12:57:11 +02:00
Piotr Jastrzebski	d4bc5b64cf	Add test for serialization header with UDT Serialization header stores column types for all columns in sstable. If any of them is a UDT then it has to be wrapped into "org.apache.cassandra.db.marshal.FrozenType(...)". This patch adds a test case to verify that. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-10 10:59:01 +01:00
Piotr Jastrzebski	3de85aebc9	Fix UDT names in serialization header Serialization header stores type names of all columns in a table. Including partition key columns, clustering key columns, static columns and regular columns. If one of those types is a user defined type then we need to wrap its name into "org.apache.cassandra.db.marshal.FrozenType(...)". Fixes #4073 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-10 10:58:30 +01:00
Benny Halevy	60323b79d1	sstables: mc: sign-extend delta local_deletion_time and delta ttl Follow Cassandra's encoding so that values that are less than the baseline encoding_stats will wrap-around in 64-bits rather than 32. Fixes #4074 Refs #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>	2019-01-09 21:43:30 +02:00
Rafael Ávila de Espíndola	26ac2c23ef	Change _row_ names that refer to partitions This renames some variables and functions to make it clear that they refer to partitions and not rows. Old versions of sstablemetadata used to refer to a row histogram, but current versions now mention a partition histogram instead. This patch doesn't change the exposed API names. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181229223311.4184-2-espindola@scylladb.com>	2019-01-09 14:53:42 +02:00
Takuya ASADA	f00e9051ea	reloc: show error message when relocatable package doesn't exist Both build_rpm.sh and build_deb.sh fail at the beginning of the script when the relocatable package does not exist; prevent that and show a user-friendly message. Fixes #4071 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190109094353.16690-1-syuu@scylladb.com>	2019-01-09 12:53:08 +02:00
Raphael S. Carvalho	f5301990fc	compaction: release reference of cleaned sstable in compaction manager Compaction manager holds reference to all cleaning sstables till the very end, and that becomes a problem because disk space of cleaned sstables cannot be reclaimed due to respective file descriptors opened. Fixes #3735. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181221000941.15024-1-raphaelsc@scylladb.com>	2019-01-08 14:14:01 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola	51a08c3240	sstable: remove constexpr from run time predicates We never check these predicates at compile time. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190108010055.92042-1-espindola@scylladb.com>	2019-01-08 12:28:42 +02:00
Piotr Sarna	c5346cdf9b	database, table: split table-related code to table.cc All table:: related code is moved to table.cc source file, which splits database.cc size in half and thus allows faster compilation on multiple cores. Refs #1 Message-Id: <28e67f7793ff2147ffce18df5e0b077e14d3b8bd.1546940360.git.sarna@scylladb.com>	2019-01-08 12:02:42 +02:00
Avi Kivity	8ecb528d5a	Update seastar submodule * seastar 67fd967...af6b797 (1): > iotune: Initialize io_rates member variables Fixes #4064	2019-01-08 12:02:42 +02:00
Avi Kivity	d8adbeda11	tests: mutation_source_test: generate valid utf-8 data test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized string to populate an sstable, but this can be invalid utf-8 so that sstable cannot be sstabledumped. Make it valid by using make_random_string(). Fixes #4040. Message-Id: <20190107193240.14409-1-avi@scylladb.com>	2019-01-08 12:02:42 +02:00
Asias He	1de24c8495	repair: Use mf.visit() in fragment_hasher When new fragment type is added, it will fail to compile instead of producing runtime errors. Message-Id: <cf10200e4185c779aad15da3a776a5b79f5323af.1546930796.git.asias@scylladb.com>	2019-01-08 12:02:42 +02:00
Rafael Ávila de Espíndola	67039e942b	Remove the only use of with_alignment from scylla In c++17 there are standard ways of requesting aligned memory, so seastar doesn't need to provide one. This patch is in preparation for removing with_alignment from seastar. Tests: unit (debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190107191019.22295-1-espindola@scylladb.com>	2019-01-07 21:34:47 +02:00
Rafael Ávila de Espíndola	0d4529a5f1	Change timeout to fix tests in a debug build The current timeout is way too small for debug builds. Currently jenkins runs avoid the problem by increasing the timeout by 100x. This patch increases it by 10x, which seems to be sufficient to run the tests on most desktop machines. Message-Id: <20190107191413.22531-1-espindola@scylladb.com>	2019-01-07 21:34:06 +02:00
Avi Kivity	34251f5ea1	tools: toolchain: update image for all-user sudo	2019-01-07 21:22:42 +02:00
Takuya ASADA	3514b185fd	tools: toolchain: allow sudo for all users A non-privileged user may not belong to the "wheel" group; for example, Debian variants use the "sudo" group instead of "wheel". To make sudo work in all environments, we should allow sudo for "ALL" instead of "wheel". Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107173410.23140-1-syuu@scylladb.com>	2019-01-07 20:47:22 +02:00
Benny Halevy	40410465d7	sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE; Fixes #4060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>	2019-01-07 18:41:37 +01:00
Avi Kivity	20b6d00e56	tools: toolchain: support dbuild from subdirectory or parent directory of scylla.git When building something other than Scylla (like scylla-tools-java or scylla-jmx) it is convenient to run it from some other directory. To do that, allow running dbuild from any directory (so we locate tools/toolchain/image relative to the dbuild script rather than use a fixed path) and mount the current directory since it's likely the user will want to access files there. Message-Id: <20190107165824.25164-1-avi@scylladb.com>	2019-01-07 18:35:51 +01:00
Nadav Har'El	f6e0ce02fa	docs/isolation.md: new document Start a new document with an overview of isolation in Scylla, i.e., scheduling groups, I/O priority classes, controllers, etc. As all documents in docs/, this is a document for developers (not users!) who need to understand how isolation between different pieces of Scylla (e.g., queries, compaction, repair, etc.) works, which scheduling groups and I/O classes we have and why, etc. The document is still very partial and includes a lot of TODOs on places where the explanation needs to be expanded. In particular it needs an accurate explanation (and not just a name) of what kind of work is done under each of the groups and classes, and an explanation of how we set up RPC to use which scheduling groups for the code it executes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190103183232.21348-1-nyh@scylladb.com>	2019-01-07 17:48:35 +02:00
Botond Dénes	80affca5f7	tests/querier_cache: use stats to get the no. of inactive reads Now that we added stats for the inactive reads, the tests don't need the `reader_concurrency_semaphore::inactive_reads()` method, instead they can rely on the stats to check the number of inactive reads.	2019-01-07 17:06:26 +02:00
Botond Dénes	e56c26205f	reader_concurrency_semaphore: add counters for inactive reads Add counters that give insight into inactive read related events. Two counters are added: * permit_based_evictions * population	2019-01-07 16:45:49 +02:00
Nadav Har'El	da090a5458	materialized views: move hints to top-level directory While we keep ordinary hints in a directory parallel to the data directory, we decided to keep the materialized view hints in a subdirectory of the data directory, named "view_pending_updates". But during boot, we expect all subdirectories of data/ to be keyspace names, and when we notice this one, we print a warning: WARN: database - Skipping undefined keyspace: view_pending_updates This spurious warning annoyed users. But moreover, we could have bigger problems if the user actually tries to create a keyspace with that name. So in this patch, we move the view hints to a separate top-level directory, which defaults to /var/lib/scylla/view_hints, but as usual can be configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190107142257.16342-1-nyh@scylladb.com>	2019-01-07 16:43:43 +02:00
Takuya ASADA	eddecdd0b5	dist/redhat: drop unused dependencies wget and yum-builddep are not used anymore, don't install them. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-7-syuu@scylladb.com>	2019-01-07 12:56:18 +00:00
Takuya ASADA	40dc62fa98	dist/debian: don't use sudo to rm debian dir sudo is not allowed in dbuild with non-root privileges, and the directory should be owned by the current user anyway, so stop using sudo. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-5-syuu@scylladb.com>	2019-01-07 12:56:18 +00:00
Takuya ASADA	237de20ff9	dist/debian: correct dbuild path /usr/sbin/debuild is a typo; it should be /usr/bin. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190107091148.1590-4-syuu@scylladb.com>	2019-01-07 12:56:17 +00:00
Pekka Enberg	2520c8caac	Merge 'Improve frozen toolchain for continuous integration' from Avi "Add features that are useful for continuous integration pipelines (and also ordinary developers): - sudo support, with and without a tty, as our packaging scripts require it - install ccache package to allow reducing incremental build times - dependencies needed to build scylla-jmx and scylla-tools-java" * tag 'toolchain-ci/v1' of https://github.com/avikivity/scylla: tools: toolchain: update image for ant, maven, ccache, sudo tools: toolchain: dbuild: pass-through supplementary groups tools: toolchain: defeat PAM tools: toolchain: improve sudo support tools: toolchain: break long line in dbuild tools: toolchain: prepare sudoers file tools: toolchain: install ccache install-dependencies.sh: add maven and ant	2019-01-07 12:56:17 +00:00
Pekka Enberg	9b27a3035c	Merge 'Reduce inclusions of "database.hh"' from Avi "This patchset reduces inclusions of database.hh, particularly in header files. It reduces the number of objects depending on database.hh from 166 to 116. Tests: unit(release), playing a little with tracing" * tag 'database.hh/v1' of https://github.com/avikivity/scylla: streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh sstables: writer.hh: add some forward declarations table_helper: remove database.hh include table_helper: de-inline insert() and setup_keyspace() table_helper: de-template setup_keyspace() table_helper: simplify template body of table_helper::insert() schema_tables: remove #include of database.hh cql_type_parser: remove dependency on user_types_metadata thrift: add missing include of sleep.hh cql3: ks_prop_defs: remove #include "database.hh"	2019-01-07 12:56:17 +00:00
Benny Halevy	b017d87a43	tests: mc: add back missing sstable_3_x_test Statistics.db files To be able to verify the golden version with sstabledump. These files were generated by running sstable_3_x_test and keeping its generated output files. Refs #4043 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190103112511.23488-2-bhalevy@scylladb.com>	2019-01-07 12:56:16 +00:00
Benny Halevy	517ad58823	tests: mc: delete empty line from write_static_row/mc-1-big-TOC.txt Refs #4043 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190103112511.23488-1-bhalevy@scylladb.com>	2019-01-07 12:56:16 +00:00
Nadav Har'El	b14616b879	docs/logging.md: improvements Various small improvements to docs/logging.md: 1. Describe the options to log to stdout or syslog and their defaults. 2. Mention the possibility of using nodetool instead of REST API. 3. Additional small tweaks to formatting. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190106111851.26700-1-nyh@scylladb.com>	2019-01-06 13:20:53 +02:00
Nadav Har'El	232e97ad06	docs/logging.md: new document Add a new document about logging in Scylla, and how to change the log levels when running Scylla and during the run. It needs more developer-oriented information (e.g., how to create new logger subsystems in the code) but I think it's a good start. Some of the text is based on Glauber's writeup for the Scylla website on changing log levels at runtime. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190106103606.26032-1-nyh@scylladb.com>	2019-01-06 12:40:14 +02:00
Benny Halevy	2daf81e80f	dist: redhat/debian specs: add dependency on 'file' package Needed by seastar-addr2line Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190101203434.14858-1-bhalevy@scylladb.com>	2019-01-06 12:13:08 +02:00
Avi Kivity	f02c64cadf	streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh This header, which is easily replaced with a forward declaration, introduces a dependency on database.hh everywhere. Remove it and scatter includes of database.hh in source files that really need it.	2019-01-05 17:33:25 +02:00
Avi Kivity	ca93b88cfb	sstables: writer.hh: add some forward declarations This makes the header less dependent on previously-included headers.	2019-01-05 17:04:16 +02:00
Avi Kivity	53a21c7787	table_helper: remove database.hh include	2019-01-05 16:39:26 +02:00
Avi Kivity	7534412071	table_helper: de-inline insert() and setup_keyspace() After previous patches de-templated these functions, we can de-inline them. This helps reduce compile time and prepares to reduce header dependencies.	2019-01-05 16:28:46 +02:00
Avi Kivity	cfedf4ab0f	table_helper: de-template setup_keyspace() This setup function has no reason to be a template and is easily converted. We can then later de-inline it to reduce dependencies.	2019-01-05 16:23:10 +02:00
Avi Kivity	659147cd79	table_helper: simplify template body of table_helper::insert() Move most of the body into a non-template overload to reduce dependencies in the header (and template bloat). The function is not on any fast path, and noncopyable_function will likely not even allocate anything.	2019-01-05 16:22:08 +02:00
Avi Kivity	c3ef99f84f	schema_tables: remove #include of database.hh Distribute in source files (and one header - table_helper.hh) that need it.	2019-01-05 15:43:07 +02:00
Avi Kivity	f43f82d1d2	cql_type_parser: remove dependency on user_types_metadata A default parameter of type T (or lw_shared_ptr<T>) requires that T be defined. Remove the dependency by redefining the default parameter as an overload, for T = user_types_metadata.	2019-01-05 15:40:58 +02:00
Avi Kivity	4ba1d4d1dc	thrift: add missing include of sleep.hh Currently obtained indirectly through database.hh.	2019-01-05 15:39:30 +02:00
Avi Kivity	d24962e16c	cql3: ks_prop_defs: remove #include "database.hh" Replace with forward declaration to reduce rebuilds.	2019-01-05 14:26:03 +02:00
Jesse Haber-Kucharsky	17a5f7acab	build: Link against libatomic Since Scylla uses functions from the `atomic` header in its own source code, we need to explicitly link against the stub library that is provided for hardware architectures that do not have native support for atomic operations. Fixes #4053 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <7d62e762130494d73565ce8c031f53aaf866d3aa.1546645041.git.jhaberku@scylladb.com>	2019-01-05 13:38:57 +02:00
Avi Kivity	36e4e9fb54	Update seastar submodule * seastar 6c8c229...67fd967 (1): > perftune.py: tune only active NVMe HW queues on i3 AWS instances	2019-01-04 13:17:29 +02:00
Avi Kivity	b0980ba7c6	compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads The workload in #3844 has these characteristics: - very small data set size (a few gigabytes per shard) - large working set size (all the data, enough for high cache miss rate) - high overwrite rate (so a compaction results in 12X data reduction) As a result, the compaction backlog controller assigns very few shares to compaction (low data set size -> low backlog), so compaction proceeds very slowly. Meanwhile, we have tons of cache misses, and each cache miss needs to read from a large number of sstables (since compaction isn't progressing). The end result is a high read amplification, and in this test, timeouts. While we could declare that the scenario is very artificial, there are other real-world scenarios that could trigger it. Consider a 100% write load (population phase) followed by 100% read. Towards the end of the last compaction, the backlog will drop more and more until compaction slows to a crawl, and until it completes, all the data (for that compaction) will have to be read from its input sstables, resulting in read amplification. We should probably have read amplification affect the backlog, but for now the simpler solution is to increase the minimum shares to 50 so that compaction always makes forward progress. This will result in higher-than-needed compaction bandwidth in some low write rate scenarios so we will see fluctuations in request rate (what the controller was designed to avoid), but these fluctuations will be limited to 5%. Since the base class backlog_controller has a fixed (0, 0) point, remove it and add it to derived classes (setting it to (0, 50) for compaction). Fixes #3844 (or at least improves it). Message-Id: <20181231162710.29410-1-avi@scylladb.com>	2019-01-04 10:58:43 +01:00
Duarte Nunes	b851cb1a9a	distributed_loader: Forbid uploading MV sstables Instead suggest that the views be re-created. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103142933.35354-1-duarte@scylladb.com>	2019-01-03 16:31:20 +02:00
Avi Kivity	7d3562a403	tools: toolchain: update image for ant, maven, ccache, sudo	2019-01-03 16:16:47 +02:00
Avi Kivity	344468e20d	tools: toolchain: dbuild: pass-through supplementary groups Useful for ccache.	2019-01-03 16:16:47 +02:00
Avi Kivity	11889f5ea9	tools: toolchain: defeat PAM Prevent PAM from enforcing security and preventing sudo from working. This is done by replacing the default configuration (designed for workstations) to one that uses pam_permit for everything.	2019-01-03 16:16:47 +02:00
Avi Kivity	9c258923d8	tools: toolchain: improve sudo support Bind-mount /etc/passwd and /etc/group so sudo doesn't complain, and support sudo without password or tty.	2019-01-03 16:16:47 +02:00
Avi Kivity	05f78df7b9	tools: toolchain: break long line in dbuild	2019-01-03 16:16:47 +02:00
Avi Kivity	f79a300081	tools: toolchain: prepare sudoers file Don't require a tty or passwords, since they won't be available in continuous integration environments.	2019-01-03 16:16:47 +02:00
Avi Kivity	25040824cf	tools: toolchain: install ccache Not strictly necessary, but often useful to reduce rebuild times. The user will need to bind-mount a populated cache.	2019-01-03 16:16:47 +02:00
Avi Kivity	527e3a58ff	install-dependencies.sh: add maven and ant Add tools needed to build scylla-jmx and scylla-tools-java. While not requirements of this repository, it's nicer if a single setup can be used to build and run everything. We also install pystache as it's used by packaging scripts.	2019-01-03 16:16:45 +02:00
Duarte Nunes	3235c13125	utils/fragmented_temporary_buffer: Correctly implement remove_suffix() The current implementation breaks the invariant that _size_bytes = reduce(_fragments, &temporary_buffer::size) In particular, this breaks algorithms that check the individual segment size. Correctly implement remove_suffix() by destroying superfluous temporary_buffer's and by trimming the last one, if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103133523.34937-1-duarte@scylladb.com>	2019-01-03 13:37:01 +00:00
Botond Dénes	021feef513	querier_cache: simplify memory eviction use-after-free fix, add tests Simplify the fix for memory based eviction, introduced by `918d255` so there is no need to massage the counters. Also add a check to `test_memory_based_cache_eviction` which checks for the bug fixed. While at it also add a check to `test_time_based_cache_eviction` for the fix to time based eviction (`e5a0ea3`). Tests: tests/querier_cache:debug Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>	2019-01-03 13:44:08 +02:00
Tomasz Grabiec	1613a623e1	Merge "Fix crash on corrupt sstable" from Rafael * https://github.com/espindola/scylla espindola/invalid_boundary4: sstables: Refactor predicates on bound_kind_m Fix crash on corrupt sstable	2019-01-03 12:02:09 +01:00
Duarte Nunes	42d9ca8266	Merge 'Add staging SSTables support to row level repair' from Piotr " This series adds staging SSTables support to row level repair. It was introduced for streaming sessions before, but since row level repair doesn't leverage sessions at all, it's added separately. Tests: unit (release) dtest (repair_additional_test.py:RepairAdditionalTest, excluding repair_abort_test, which fails for me locally on master) " * 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla: repair: add staging sstables support to row level repair main,repair: add params to row level repair init streaming,view: move view update checks to separate file	2019-01-03 09:40:13 +00:00
Piotr Sarna	a73d9ccf31	service: mark existing views as built before bootstrap When a node is bootstrapping, it will receive data from other nodes via streaming, including materialized views. Regardless of whether these views are built on other nodes or not, building them on newly bootstrapped nodes has no effect - updates were either already streamed completely (if view building has finished) or will be propagated via view building, if the process is still ongoing. So, marking all views as 'built' for the bootstrapped node prevents it from spawning superfluous view building processes. Fixes #4001 Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>	2019-01-03 09:39:33 +00:00
Botond Dénes	e5a0ea390a	querier_cache: unregister queriers evicted due to expired TTL Currently queriers evicted due to their TTL expiring are not unregistered from the `reader_concurrency_semaphore`. This can cause a use-after-free when the semaphore tries to evict the same querier at some later point in time, as the querier entry it has a pointer to is now invalid. Fix by unregistering the querier from the semaphore before destroying the entry. Refs: #4018 Refs: #4031 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>	2019-01-03 10:29:26 +02:00
Piotr Sarna	bc74ac6f09	repair: add staging sstables support to row level repair In some cases, sstables created during row level repair should be enqueued as staging in order to generate view updates from them. Fixes #4034	2019-01-03 08:36:45 +01:00
Piotr Sarna	a0003c52cf	main,repair: add params to row level repair init Row level repair needs references to system distributed keyspace and view update generator in order to enqueue some sstables as staging.	2019-01-03 08:31:41 +01:00
Piotr Sarna	9d46715613	streaming,view: move view update checks to separate file Checking if view update path should be used for sstables is going to be reused in row level repair code, so relevant functions are moved to a separate header.	2019-01-03 08:31:40 +01:00
Avi Kivity	918d255168	querier_cache: unregister querier from reader_concurrency_semaphore during eviction In insert_querier(), we may evict older queriers to make room for the new one. However, we forgot to unregister the evicted queriers from reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore eventually wanted to evict something, it saw an inactive_read_handle that was not connected to a querier_cache::entry, and crashed on use-after-free. Fix by evicting through the inactive_read_handle associated with the querier to be evicted. This removes traces of the querier from both reader_concurrency_semaphore and querier_cache. We also have to massage the statistics since querier_inactive_read::evict() updates different counters. Fixes #4018. Tests: unit(release) Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190102175023.26093-1-avi@scylladb.com>	2019-01-03 09:15:07 +02:00
Rafael Ávila de Espíndola	28c014351f	Fix crash on corrupt sstable The check in consume_range_tombstone was too late. Before getting to it we would fail an assert calling to_bound_kind. This moves the check earlier and adds a testcase. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:52:07 -08:00
Rafael Ávila de Espíndola	3c9178d122	sstables: Refactor predicates on bound_kind_m This moves the predicate functions to the start of the file, renames is_in_bound_kind to is_bound_kind for consistency with to_bound_kind and defines all predicates in a similar fashion. It also uses the predicates to reduce code duplication. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:50:44 -08:00
Avi Kivity	2717bdd301	tools: toolchain: allow adjusting "docker run" command line It is useful to adjust the command line when running the docker image, for example to attach a data volume or a ccache directory. Add a mechanism to do that. Message-Id: <20181228163306.19439-1-avi@scylladb.com>	2019-01-01 21:44:50 +00:00
Avi Kivity	d19660ec0a	Merge "commitlog: Use fragmented buffers for reading entries" from Duarte " Instead of allocating a contiguous temporary_buffer when reading mutations from the commitlog - or hint - replaying, use fragmented buffers instead. Refs #4020 " * 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla: db/commitlog: Use fragmented buffers to read entries db/commitlog: Implement skip in terms of input buffer skipping tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() utils/fragmented_temporary_buffer: Add remove_suffix tests/fragmented_temporary_buffer_test: Add unit test for skip() utils/fragmented_temporary_buffer: Allow skipping in the input stream	2019-01-01 19:08:34 +02:00
Avi Kivity	6641353854	tracing: remove static class_registry Static class_registries hinder librarification by requiring linking with all object files (instead of a library from which objects are linked on demand) and reduce readability by hiding dependencies and by their horrible syntax. Hide them behind a non-static, non-template tracing backend registry. Message-Id: <20181229121000.7885-1-avi@scylladb.com>	2018-12-31 13:24:54 +00:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	0e50a9bc6d	db/commitlog: Implement skip in terms of input buffer skipping This simplifies the code and allows to get rid of the overload of advance() taking a temporary_buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8379ac6189	tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	1a88cd7992	utils/fragmented_temporary_buffer: Add remove_suffix Essentially hide some bytes off the end of the buffer. Needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	50dd8b67b2	tests/fragmented_temporary_buffer_test: Add unit test for skip() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8eab0a3e01	utils/fragmented_temporary_buffer: Allow skipping in the input stream Add fragmented_temporary_buffer::istream::skip(), needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Avi Kivity	49958d5836	tools: toolchain: update for lz4 1.8.3 lz4 1.8.3 was released with a fix for data corruption during compression. While the release notes indicate we aren't vulnerable, be cautious and update anyway. Message-Id: <20181230144716.7238-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Hagit Segev	141fad9c14	Update README.md fix a typo	2018-12-31 13:33:04 +02:00
Asias He	d90836a2d3	streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic Currently, they increase and decrease as the stream sessions are created and destroyed. Make them monotonically increasing Prometheus counters for easier monitoring. Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>	2018-12-30 16:52:17 +02:00
Pekka Enberg	96172b7bca	Merge 'Fixes for the view_update_from_staging_generator' from Duarte "This series contains a couple of fixes to the view_update_from_staging_generator, the object responsible for generating view updates from sstables written through streaming. Fixes #4021" * 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla: db/view/view_update_from_staging_generator: Break semaphore on stop() db/view/view_update_from_staging_generator: Restore formatting db/view/view_update_from_staging_generator: Avoid creating more than one fiber	2018-12-29 18:31:40 +02:00
Duarte Nunes	f41d13f38c	db/view/view_update_from_staging_generator: Break semaphore on stop() This avoids having fibers wait on _registration_sem without ever being notified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:04 +00:00
Duarte Nunes	4974addc5c	db/view/view_update_from_staging_generator: Restore formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:02 +00:00
Duarte Nunes	201196130d	db/view/view_update_from_staging_generator: Avoid creating more than one fiber If view_update_from_staging_generator::maybe_generate_view_updates() is called before view_update_from_staging_generator::start(), as can happen in main.cc, then we can potentially create more than one fiber, which leads to corrupted state and conflicting operations. To avoid this, use just one fiber and be explicit about notifying it that more work is needed, by leveraging a condition-variable. Fixes #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:52:51 +00:00
Duarte Nunes	66113a2d39	Merge 'Replace query_processor's sharded<database> with plain database' from Avi " A sharded<database> is not very useful for accessing data since data is usually distributed across many nodes, while a sharded<database> contains only a single node's view. So it is really only used for accessing replicated metadata, not data. As such only the local shard is accessed. Use that to simplify query_processor a little by replacing sharded<database> with a plain database. We can probably be more ambitious and make all accesses, data and metadata, go through storage_proxy, but this is a start. " * tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla: query_processor: replace sharded<database> with the local shard commitlog_replayer: don't use query_processor client_state: change set_keyspace() to accept a single database shard legacy_schema_migrator: initialize with database reference	2018-12-29 12:14:19 +00:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Avi Kivity	30745eeb72	query_processor: replace sharded<database> with the local shard query_processor uses storage_proxy to access data, and the local database object to access replicated metadata. While it seems strange that the database object is not used to access data, it is logical when you consider that a sharded<database> only contains this node's data, not the cluster data. Take advantage of this to replace sharded<database> with a single database shard.	2018-12-29 11:02:15 +02:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before commitlog, so logically replaying the commitlog shouldn't need it. And in fact the dependency on query_processor can be eliminated; all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	7830086317	client_state: change set_keyspace() to accept a single database shard set_keyspace() only needs one shard (it is checking replicated state, not sharded data) so arrange for it to receive only that one shard.	2018-12-29 10:58:39 +02:00
Avi Kivity	e4233262cf	legacy_schema_migrator: initialize with database reference Provide legacy_schema_migrator with a sharded<database> so it doesn't need to use the one from query_processor. We want to replace query_processor's sharded<database> with just a local database reference in order to simplify it, and this is standing in the way.	2018-12-29 10:58:22 +02:00
Duarte Nunes	bab7e6877b	streaming/stream_session: Only stage sstables for tables with views When streaming, sstables for which we need to generate view updates are placed in a special staging directory. However, we only need to do this for tables that actually have views. Refs #4021 Message-Id: <20181227215412.5632-1-duarte@scylladb.com>	2018-12-28 18:32:24 +02:00
Avi Kivity	feddf0b021	tools: toolchain: patch boost for use-after-free in Boost.Test XML output The version of boost in Fedora 29 has a use-after-free bug that is only exposed when ./test.py is run with the --jenkins flag. To patch it, use a fixed version from the copr repository scylladb/toolchain. Message-Id: <20181228150419.29623-1-avi@scylladb.com>	2018-12-28 16:35:28 +01:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write operations over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	7677d2ba2c	nodetool toppartitions: nodetool-toppartitions script A Python script mimicking the nodetool toppartitions utility, utilizing Scylla REST API. Examples: $ ./nodetool-toppartitions --help usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY] keyspace table duration Samples database reads and writes and reports the most active partitions in a specified table positional arguments: keyspace Name of keyspace table Name of column family duration Query duration in milliseconds optional arguments: -h, --help show this help message and exit -k LIST_SIZE The number of the top partitions to list (default: 10) -s CAPACITY The capacity of stream summary (default: 256) $ ./nodetool-toppartitions ks test1 10000 READ Partition Count 30 2 20 2 10 2 WRITE Partition Count 30 1 20 1 10 1 Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:48:03 +02:00
Rafi Einstein	197f38d4ee	nodetool toppartitions: Toppartitions query REST API An HTTP GET operation starts the query (with args: ks/cf name and duration in ms). It executes synchronously, results are returned as JSON: $ curl -s -X GET http://localhost:10000/column_family/toppartitions/ks:cf1?duration=10000 \| jq { "read": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ], "write": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ] } Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	6b2c21f69b	nodetool toppartitions: Toppartitions query implementation toppartitions_query installs toppartitions_data_listener-s on all database shards, waits for the designated period, uninstalls shards and collects top-k read/write partition keys. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	404f75def5	nodetool toppartitions: fully_qualified_cf_name Encapsulate keyspace:column_family REST API argument parsing into fully_qualified_cf_name class. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	0bffe5f83e	nodetool toppartitions: add data_listeners to database/table Add data_listeners member to database. Adds data_listeners* to table::config, to be used by table methods to invoke listeners. Install on_read() listener in table::make_reader(). Install on_write() listener in database::apply_in_memory(). Tests: Unit (release) Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	08ba115c16	nodetool toppartitions: data listeners Mechanism that interfaces with mutation readers in database and table classes, to allow tracking most frequent partition keys in read and write operation. Basic design is specified in #2811. Tracking top #rows and #bytes will be supported in the future. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	038f8c7988	nodetool toppartitions: refactor table::config constructor Eliminate extra parameters to ctor and deduce them instead from db param. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	eda43b93c9	top_k: support for appending top_k results Allow appending results of one top_k into another. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:56 +02:00
Rafi Einstein	aeebe8e86b	top_k: std::list -> chunked_vector Replaced std::list with chunked_vector. Because chunked_vector requires a noexcept move constructor from its value type, change the bad_boy type in the unit test not to throw in the move constructor. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:07 +02:00
Avi Kivity	8e2f6d0513	Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz " partition_snapshots created in the memtable will keep a reference to the memtable (as region) and to memtable::_cleaner. As long as the reader is alive, the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043` (in >= 3.0-rc1) Fixes #4030. Tests: - mvcc_test (debug) " tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla: tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed tests: mvcc: Introduce mvcc_container::migrate() tests: mvcc: Make mvcc_partition move-constructible tests: mvcc: Introduce mvcc_container::make_not_evictable() tests: mvcc: Allow constructing mvcc_container without a cache_tracker mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup mvcc: partition_snapshot: Introduce migrate() mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-28 12:45:10 +02:00
Tomasz Grabiec	bb1c9cb6f3	tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	4d13dea39a	tests: mvcc: Introduce mvcc_container::migrate()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	676868ed31	tests: mvcc: Make mvcc_partition move-constructible	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	c6798f7872	tests: mvcc: Introduce mvcc_container::make_not_evictable()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	1fa00656ea	tests: mvcc: Allow constructing mvcc_container without a cache_tracker Some test cases will need many containers to simulate memtable -> cache transitions, but there can be only one cache_tracker per shard due to metrics. Allow constructing a container without a cache_tracker (and thus non-evictable).	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	ac49b1def0	mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup partition_snapshots created in the memtable will keep a reference to the memtable (as region*) and to memtable::_cleaner. As long as the reader is alive the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that, nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem, because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043`. Fixes #4030.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	20f5d5d1a1	mvcc: partition_snapshot: Introduce migrate() Snapshots which outlive the memtable will need to have their _region and _cleaner references updated. The snapshot can be destroyed after the memtable when it is queued in the mutation_cleaner.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	67f9afbd1a	mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-27 18:08:50 +01:00
Gleb Natapov	37b4043677	streaming: always read from rpc::source until end-of-stream during mutation sending rpc::source cannot be abandoned until EOS is reached, but current code does not obey it if error code is received, it throws exception instead that aborts the reading loop. Fix it by moving exception throwing out of the loop. Fixes: #4025 Message-Id: <20181227135051.GC29458@scylladb.com>	2018-12-27 16:50:53 +02:00
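The "read until end-of-stream, then report the error" pattern from the streaming fix above can be sketched as follows. This is a minimal synchronous illustration, not the actual Seastar code: `status_source` and `drain_then_report` are hypothetical names, and the real rpc::source is asynchronous.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for an rpc::source: each call yields a status code,
// and std::nullopt signals end-of-stream.
using status_source = std::function<std::optional<int>()>;

// Reads the source until end-of-stream, remembering whether any status
// signalled an error, and throws only after the loop has finished.
void drain_then_report(const status_source& next) {
    bool failed = false;
    while (auto status = next()) {
        if (*status != 0) {
            failed = true;  // remember the error but keep reading
        }
    }
    if (failed) {
        throw std::runtime_error("peer reported an error while streaming");
    }
}
```

Throwing inside the loop would abandon the source before end-of-stream, which is the behaviour the commit removes.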
Asias He	4d3c463536	storage_service: Stop cql server before gossip We saw failure in dtest concurrent_schema_changes_test.py: TestConcurrentSchemaChanges.changes_while_node_down_test test. ====================================================================== ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test self.make_schema_changes(session, namespace='ns2') File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes session.execute('USE ks_%s' % namespace) File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result() File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result raise self._final_exception ConnectionShutdown: Connection to 127.0.0.1 is closed The test: session = self.patient_cql_connection(node2) self.prepare_for_changes(session, namespace='ns2') node1.stop() self.make_schema_changes(session, namespace='ns2') --> ConnectionShutdown exception is thrown The problem is that, after receiving the DOWN event, the python Cassandra driver will call Cluster:on_down which checks if this client has any connections to the node being shut down. If there are any connections, the Cluster:on_down handler will exit early, so the session to the node being shut down will not be removed. If we shut down the cql server first, the connection count will be zero and the session will be removed. Fixes: #4013 Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>	2018-12-27 14:13:43 +02:00
Duarte Nunes	2f69ba2844	lwt: Remove Paxos-related Cassandra code Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227112526.4180-1-duarte@scylladb.com>	2018-12-27 13:30:10 +02:00
Duarte Nunes	66e45469b2	streaming/stream_session: Don't use table reference across defer points When creating a sstable from which to generate view updates, we held on to a table reference across defer points. In case there's a concurrent schema drop, the table object might be destroyed and we will incur a use-after-free. Solve this by holding on to a shared pointer and pinning the table object. Refs #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227105921.3601-1-duarte@scylladb.com>	2018-12-27 13:05:46 +02:00
Avi Kivity	b349e11aba	tools: toolchain: avoid docker-provided /tmp On at least one system, using the container's /tmp as provided by docker results in spurious EINVALs during aio: INFO 2018-12-27 09:54:08,997 [shard 0] gossip - Feature ROW_LEVEL_REPAIR is enabled unknown location(0): fatal error: in "test_write_many_range_tombstones": storage_io_error: Storage I/O error: 22: Invalid argument seastar/tests/test-utils.cc(40): last checkpoint The setup is overlayfs over xfs. To avoid this problem, pass through the host's /tmp to the container. Using --tmpfs would be better, but it's not possible to guess a good size as the amount of temporary space needed depends on build concurrency. Message-Id: <20181227101345.11794-1-avi@scylladb.com>	2018-12-27 10:17:23 +00:00
Avi Kivity	2c4a732735	tools: toolchain: update baseline Fedora packages Image fedora-29-20181219 was broken due to the following chain of events: - we install gnutls, which currently is at version 3.6.5 - gnutls 3.6.5 introduced a dependency on nettle 3.4.1 - the gnutls rpm does not include a version requirement on nettle, so an already-installed nettle will not be upgraded when gnutls is installed - the fedora:29 image which we use as a baseline has nettle installed - docker does not pull the latest tag in FROM statements during "docker build" - my build machine already had a fedora:29 image, with nettle 3.4 installed (the repository's image has 3.4.1, but docker doesn't automatically pull if an image with the required tag exists) As a result, the image ended up having gnutls 3.6.5 and nettle 3.4, which are incompatible. To fix, update all packages after installation to attempt to have a self consistent package set even if dependencies are not perfect, and regenerate the image. Message-Id: <20181226135711.24074-1-avi@scylladb.com>	2018-12-26 14:58:23 +00:00
Avi Kivity	1414837fcc	tools: toolchain: improve dbuild for continuous integration environments The '-t' flag to 'docker run' passes the tty from the caller environment to the container, which is nice for interactive jobs, but fails if there is no tty, such as in a continuous integration environment. Given that, the '-i' flag doesn't make sense either as there isn't any input to pass. Remove both, and replace with --sig-proxy=true which allows SIGTERM to terminate the container instead of leaving it alive. This reduces the chances of the build stopping but leaving random containers around. Message-Id: <20181222105837.22547-1-avi@scylladb.com>	2018-12-26 10:50:34 +00:00
Avi Kivity	bfd8ade914	tools: toolchain: update toolchain for gcc-8.2.1-6 gcc was updated with some important fixes; update the toolchain to include it. Message-Id: <20181219190548.28675-1-avi@scylladb.com>	2018-12-26 10:21:02 +00:00
Benny Halevy	206483e6af	position_in_partition_view: print bound_weight as int Rather than a non-printable char. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181226091115.18530-1-bhalevy@scylladb.com>	2018-12-26 11:19:30 +02:00
Rafael Ávila de Espíndola	f73c60d8cf	sstables: Convert an unreachable throw into an assert in read path The function pending_collection is only called when cdef->is_multi_cell() is true, so the throw is dead. This patch converts it to an assert. Message-Id: <20181207022119.38387-1-espindola@scylladb.com>	2018-12-26 11:10:19 +02:00
Benny Halevy	52188a20fa	HACKING.md: Add details about unit test debug info Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181225133513.20751-1-bhalevy@scylladb.com>	2018-12-25 16:03:24 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges to sub ranges which contain around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory. Find the mismatch and send the mismatch rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. The problem is called set reconciliation problem which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = { row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. 
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === How the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - reqeust_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instance. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the node has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. 
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx= 162 GiB, rx = 90 GiB new: tx= 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the result, we saw the rows on wire were as expected. 
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
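The set reconciliation example in the row level repair description above (Node1 as repair master, Node2 and Node3 as peers) can be sketched as follows. This is an illustration only, under stated assumptions: rows are represented by plain strings rather than row hashes, and `row_set`, `difference`, `repair_plan` and `plan_repair` are hypothetical names that do not come from the repair code.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: a row is identified by a string. The real repair
// compares row hashes rather than row contents.
using row_set = std::set<std::string>;

// Rows present in `a` but absent from `b` (a - b).
row_set difference(const row_set& a, const row_set& b) {
    row_set out;
    for (const auto& row : a) {
        if (b.count(row) == 0) {
            out.insert(row);
        }
    }
    return out;
}

struct repair_plan {
    row_set merged;              // rows held by the master after fetching
    std::vector<row_set> fetch;  // fetch[i]: rows the master pulls from peer i
    std::vector<row_set> send;   // send[i]: rows the master pushes to peer i
};

repair_plan plan_repair(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    plan.merged = master;
    // Step B: pull from each peer only the rows the master does not hold yet.
    for (const auto& peer : peers) {
        row_set missing = difference(peer, plan.merged);
        plan.merged.insert(missing.begin(), missing.end());
        plan.fetch.push_back(std::move(missing));
    }
    // Step C: push to each peer the rows it lacks from the merged set.
    for (const auto& peer : peers) {
        plan.send.push_back(difference(plan.merged, peer));
    }
    return plan;
}
```

With set1 = {row1, row2, row3} as master and set2, set3 as peers, the plan fetches nothing from Node2, fetches row4 from Node3, sends row1 and row4 to Node2, and sends row3 to Node3, matching the example in the commit message.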
Takuya ASADA	b9a06ae552	dist/offline_installer/redhat: support building RHEL 7 offline installer We had an issue building the offline installer on RHEL because of repository differences. This fix enables building the offline installer on both CentOS and RHEL. It also introduces --releasever <ver>, to build the offline installer for a specific minor version of CentOS and RHEL. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181212032129.29515-1-syuu@scylladb.com>	2018-12-25 12:50:09 +02:00
Botond Dénes	3ae77a2587	configure.py: generate ${mode}-objects targets Sometimes one wants to just compile all the source files in the projects, because for example one just moved around code or files and there is no need to link and run anything, just check that everything still compiles. Since linking takes up a considerable amount of time it is worthwhile to have a specific target that caters for such needs. This patch introduces a ${mode}-objects target for each mode (e.g. release-objects) that only runs the compilation step for each source file but does not link anything. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <eaad329bf22dfaa3deff43344f3e65916e2c8aaf.1545045775.git.bdenes@scylladb.com>	2018-12-25 12:40:20 +02:00
Benny Halevy	f104951928	sstable_test: read_file should open the file read-only Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181218145156.12716-1-bhalevy@scylladb.com>	2018-12-25 12:02:46 +02:00
Rafael Ávila de Espíndola	f8c81d4d89	tests: sstables: mc: add tests with incompatible schemas In one test the types in the schema don't match the types in the statistics file. In another a column is missing. The patch also updates the exceptions to have more human readable messages. Tests: unit (release) Part of issue #3960. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181219233046.74229-1-espindola@scylladb.com>	2018-12-25 11:11:54 +02:00
Yibo Cai (Arm Technology China)	422987ab04	utils: add fast ascii string validation Validate ascii string by ORing all bytes and check if 7-th bit is 0. Compared with original std::any_of(), which checks ascii string byte by byte, this new approach validates input in 8 bytes and two independent streams. Performance is much higher for normal cases, though slightly slower when string is very short. See table below. Speed(MB/s) of ascii string validation +---------------+-------------+---------+ \| String length \| std::any_of \| u64 x 2 \| +---------------+-------------+---------+ \| 9 bytes \| 1691 \| 1635 \| +---------------+-------------+---------+ \| 31 bytes \| 2923 \| 3181 \| +---------------+-------------+---------+ \| 129 bytes \| 3377 \| 15110 \| +---------------+-------------+---------+ \| 1039 bytes \| 3357 \| 31815 \| +---------------+-------------+---------+ \| 16385 bytes \| 3448 \| 47983 \| +---------------+-------------+---------+ \| 1048576 bytes \| 3394 \| 31391 \| +---------------+-------------+---------+ Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>	2018-12-24 09:58:08 +02:00
Tomasz Grabiec	419c771791	sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no data is not moved-from if _trust_pi == trust_promoted_index::no, which triggers the assert on data.empty(). We should make it empty unconditionally. Message-Id: <1545408731-14333-1-git-send-email-tgrabiec@scylladb.com>	2018-12-23 12:09:21 +02:00
Tomasz Grabiec	07d153c769	sstables: mc: reader: Use enum class instead of variant variant is an overkill here. Message-Id: <1545409014-16289-1-git-send-email-tgrabiec@scylladb.com>	2018-12-23 12:04:02 +02:00
Duarte Nunes	e6a8883228	service/storage_proxy: Protect against empty mutation when storing hint mutation_holder::get_mutation_for() can return nullptr's, so protect against those when storing a hint. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-2-duarte@scylladb.com>	2018-12-23 11:14:44 +02:00
Duarte Nunes	6c4a34f378	service/storage_proxy: Protect against empty mutation in mutation_holder The per_destination_mutation holder can contain empty mutations, so make sure release_mutation() skips over those. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181221194853.98775-1-duarte@scylladb.com>	2018-12-23 11:14:43 +02:00
Duarte Nunes	5e7d18380d	Merge 'Reduce dependencies on config.hh for extensions access' from Avi " Some files use db/config.hh just to access extensions. Reduce dependencies on this global and volatile file by providing another path to access extensions. Tests: unit(release) " * tag 'unconfig-2/v1' of https://github.com/avikivity/scylla: hints: reduce dependencies on db/config.hh commitlog: reduce dependencies on db/config.hh cql3: reduce dependencies on db/config.hh database: provide accessor to db::extensions	2018-12-21 20:15:44 +00:00
Avi Kivity	eae030b061	hints: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:44 +00:00
Avi Kivity	cc8312a8b9	commitlog: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:43 +00:00
Avi Kivity	d2dae3af86	cql3: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:43 +00:00
Avi Kivity	74c1afad29	database: provide accessor to db::extensions Rather than forcing callers to go through get_config(), provide a direct accessor. This reduces dependencies on config.hh, and will allow separation of extensions from configuration.	2018-12-21 20:15:43 +00:00
Tomasz Grabiec	d2f96a60f6	sstables: mc: index_reader: Handle CK_SIZE split across buffers properly We incorrectly fell through to the next state instead of returning to read more data. This can manifest in a number of ways: an abort, or an incorrect read. Introduced in `917528c` Fixes #4011. Message-Id: <1545402032-4114-1-git-send-email-tgrabiec@scylladb.com>	2018-12-21 16:34:10 +02:00
Tomasz Grabiec	7afe2bad51	sstables: mc: reader: Avoid unnecessary index reads on fast forwarding When the next pending fragments are after the start of the new range, we know there is no need to skip. Caught by perf_fast_forward --datasets large-part-ds3 \ --run-tests=large-partition-slicing Refs #3984 Message-Id: <1545308006-16389-1-git-send-email-tgrabiec@scylladb.com>	2018-12-20 16:21:07 +00:00
Gleb Natapov	393269d34b	streaming: hold to sink while close() is running and call close on error as well Currently if something throws while streaming in mutation sending loop sink is not closed. Also when close() is running the code does not hold onto sink object. close() is async, so sink should be kept alive until it completes. The patch uses do_with() to hold onto sink while close is running and run close() on error path too. Fixes #4004. Message-Id: <20181220155931.GL3075@scylladb.com>	2018-12-20 18:03:37 +02:00
Rafi Einstein	533e46ac72	top_k: map template arguments Added Hash and KeyEqual template arguments to enable unordered_map in top_k implementation. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:40 +02:00
Rafi Einstein	75f21954d4	top_k: whitespace and minor fixes Style and minor logic changes from code review. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:33 +02:00
Tomasz Grabiec	2b55ab8c8e	Merge "Add more extensive test for mutation reader fast-forwarding" from Paweł Mutation readers allow fast-forwarding the ranges from which the data is being read. The main user of this feature is cache which, when reading from the underlying reader, may want to skip some data it already has. Unsurprisingly, this adds more complexity to the implementation of the readers and more edge cases the developers need to take care of. While most of the readers were at least to some extent checked in this area those test usually were quite isolated (e.g. one test doing inter-partition fast-forwarding, another doing intra-partition fast-forwarding) and as a consequence didn't cover many corner cases. This patch adds a generic test for fast-forwarding and slicing that covers more complicated scenarios when those operations are combined. Needless to say that did uncover some problems, but fortunately none of them is user-visible. Fixes #3963. Fixes #3997. Tests: unit(release, debug) * https://github.com/pdziepak/scylla.git test-fast-forwarding/v4.1: tests/flat_mutation_reader_assertions: accumulate received tombstones tests/flat_mutation_reader_assertions: add more test messages tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check tests/mutation_readers: do not ignore streamed_mutation::forwarding Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests" memtable: it is not a single partition read if partition fast-forwaring is enabled sstables: add more tracing in mp_row_consumer_m row_cache: use make_forwardable() to implement streamed_mutation::forwarding row_cache: read is not single-partition if inter-partition forwarding is enabled row_cache: drop support for streamed_mutation::forwarding::yes entirely sstables/mp_row_consumer: position_range end bound is exclusive mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly tests/mutation_reader: reduce sleeping time 
tests/memtable: fix partition_range use-after-free tests/mutation: fix partition range use-after-free flat_mutation_reader_from_mutations: add overload that accepts a slice and partition range flat_mutation_reader_from_mutations: fix empty range case flat_mutation_reader_from_mutations: destroy all remaining mutations tests/mutation_source: drop dropped column handling test tests/mutation_source: add test for complex fast_forwarding and slicing	2018-12-20 15:05:21 +01:00
Paweł Dziepak	3355d16938	tests/mutation_source: add test for complex fast_forwarding and slicing While we already had tests that verified inter- and intra-partition fast-forwarding as well as slicing, they had quite limited scope and didn't combine those operations. The new test is meant to extensively test these cases.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	26a30375b1	tests/mutation_source: drop dropped column handling test Schema changes are now covered by for_each_schema_change() function. Having some additional tests in run_mutation_source_tests() is problematic when it is used to test intermediate mutation readers because schema changes may be irrelevant for them, which makes the test a waste of time (might be a problem in debug mode) and requires those intermediate readers to use a more complex underlying reader that supports schema changes (again, a problem in a very slow debug mode).	2018-12-20 13:27:25 +00:00
Paweł Dziepak	048ed2e3d3	flat_mutation_reader_from_mutations: destroy all remaining mutations If the reader is fast-forwarded to another partition range mutation_ may be left with some partial mutations. Make sure that those are properly destroyed.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	d50cd31eee	flat_mutation_reader_from_mutations: fix empty range case An iterator shall not be dereferenced until it is verified that it is dereferencable.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	93488209de	tests/mutation: fix partition range use-after-free	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e91165d929	tests/memtable: fix partition_range use-after-free	2018-12-20 13:27:25 +00:00
Paweł Dziepak	5db8dacd1f	tests/mutation_reader: reduce sleeping time It is in very bad taste to sleep anywhere in the code. The test should be fixed to explicitly test various orderings between concurrent operations, but before that happens let's at least reduce how much those sleeps slow it down by changing it from milliseconds to microseconds.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	243aade3b2	mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly	2018-12-20 13:27:25 +00:00
Paweł Dziepak	dfa5b3d996	sstables/mp_row_consumer: position_range end bound is exclusive	2018-12-20 13:27:25 +00:00
Paweł Dziepak	df1d438fcd	row_cache: drop support for streamed_mutation::forwarding::yes entirely	2018-12-20 13:27:25 +00:00
Paweł Dziepak	adcb3ec20c	row_cache: read is not single-partition if inter-partition forwarding is enabled	2018-12-20 13:27:25 +00:00
Paweł Dziepak	7ecee197c4	row_cache: use make_forwardable() to implement streamed_mutation::forwarding Implementing intra-partition fast-forwarding adds more complexity to already very-much-not-trivial cache readers and isn't really critical in any way since it is not used outside of the tests. Let's use the generic adapter instead of natively implementing it.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e96a5f96d9	sstables: add more tracing in mp_row_consumer_m	2018-12-20 13:27:25 +00:00
Paweł Dziepak	18825af830	memtable: it is not a single partition read if partition fast-forwarding is enabled Single-partition reader is less expensive than the one that accepts any range of partitions, but it doesn't support fast-forwarding to another partition range properly and therefore cannot be used if that option is enabled.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	bcb5aed1ef	Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests" This reverts commit `b36733971b`. That commit made run_mutation_reader_tests() support mutation_sources that do not implement streamed_mutation::forwarding::yes. This is wrong since mutation_sources are not allowed to ignore or otherwise not support that mode. Moreover, there is absolutely no reason for them to do so since there is a make_forwardable() adapter that can make any mutation_reader a forwardable one (at the cost of performance, but that's not always important).	2018-12-20 13:27:25 +00:00
Paweł Dziepak	8706750b9b	tests/mutation_readers: do not ignore streamed_mutation::forwarding It is wrong to silently ignore streamed_mutation::forwarding option which completely changes how the reader is supposed to operate. The best solution is to use make_forwardable() adapter which changes non-forwardable reader to a forwardable one.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	edf2c71701	tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check Since `41ede08a1d` "mutation_reader: Allow range tombstones with same position in the fragment stream" mutation readers emit fragments in non-decreasing order (as opposed to strictly increasing), so has_monotonic_positions() needs to be updated to allow that.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	787d1ba7b2	tests/flat_mutation_reader_assertions: add more test messages	2018-12-20 13:27:25 +00:00
Paweł Dziepak	593fb936c2	tests/flat_mutation_reader_assertions: accumulate received tombstones Current data model employed by mutation readers doesn't have a unique representation of range tombstones. This complicates testing by making multiple ways of emitting range tombstones and rows equally valid. This patch adds an option to verify mutation readers by checking whether tombstones they emit properly affect the clustered rows regardless of how exactly the tombstones are emitted. The interface of flat_mutation_reader_assertions is extended by adding may_produce_tombstones() that accepts any number of tombstones and accumulates them. Then, produces_row_with_key() accepts an additional argument which is the expected timestamp of the range tombstone that affects that row.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	e6d26a528f	Merge "Optimize slicing sstable readers" from Tomasz " Contains several improvements for fast-forwarding and slicing readers. Mainly for the MC format, but not only: - Exiting the parser early when going out of the fast-forwarding window [MC-format-only] - Avoiding reading of the head of the partition when slicing - Avoiding parsing rows which are going to be skipped [MC-format-only] " * 'sstable-mc-optimize-slicing-reads' of github.com:tgrabiec/scylla: sstables: mc: reader: Skip ignored rows before parsing them sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows sstables: mc: parser: Allow the consumer to skip the whole row sstables: continuous_data_consumer: Introduce skip() sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state() sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row sstables: reader: Do not read the head of the partition when index can be used sstables: mc: mutation_fragment_filter: Check the fast-forward window first sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()	2018-12-20 12:48:22 +00:00
Avi Kivity	b66f59aa3d	Merge "materialized views: Apply backpressure from view replicas" from Duarte " As the amount of pending view updates increases we know that there’s a mismatch between the rate at which the base receives writes and the rate at which the view retires them. We react by applying backpressure to decrease the rate of incoming base writes, allowing the slow view replicas to catch up. We want to delay the client’s next writes to a base replica and we use the base’s backlog of view updates to derive this delay. To validate this approach we tested a 3 node Scylla cluster on GCE, using n1-standard-4 instances with NVMEs. A loader running on a n1-standard-8 instance ran cassandra-stress with 100 threads. With the delay function d(x) set to 1s, we see no base write timeouts. With the delay function as defined in the series, we see that backlogs stabilize at some (arbitrary) point, as predicted, but this stabilization co-exists with base write timeouts. However, the system overall behaves better than the current version, with the 100 view update limit, and also better than the version without such limit or any backpressure. More work is necessary to further stabilize the system. Namely, we want to keep delaying until we see the backlog is decreasing. This will require us to add more delay beyond the stabilization point, which in turn should minimize the base write timeouts, and will also minimize the amount of memory the backlog takes at each base replica. Design document: https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo Fixes #2538 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits) service/storage_proxy: Release mutation as early as possible service/storage_proxy: Delay replica writes based on view update backlog service/storage_proxy: Get the backlog of a particular base replica service/storage_proxy: Add counters for delayed base writes main: Start and stop the view_update_backlog_broker service: Distribute a node's view update backlog service: Advertise view update backlog over gossip service/storage_proxy: Send view update backlog from replicas service/storage_proxy: Prepare to receive replica view update backlog service/storage_proxy: Expose local view update backlog tests/view_schema_test: Add simple test for db::view::node_update_backlog db/view: Introduce node_update_backlog class db/hints: Initialize current backlog database: Add counter for current view backlog database: Expose current memory view update backlog idl: Add db::view::update_backlog db/view: Add view_update_backlog database: Wait on view update semaphore for view building service/storage_proxy: Use near-infinite timeouts for view updates database: generate_and_propagate_view_updates no longer needs a timeout ...	2018-12-20 12:44:51 +02:00
Asias He	bcba6b4f4d	streaming: Futurize estimate_partitions The loop can take a long time if the number of sstables and/or ranges is large. To fix, futurize the loop. Fixes: #4005 Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com>	2018-12-20 12:08:03 +02:00
Amos Kong	385d74db01	redhat/scylla.spec: add python34-setuptools dependency Commit `00476c3946` switched some scripts to python3, it introduced an ImportError: No module named 'pkg_resources'. Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <293c05d9315ec6c9da1f32e8cb3d2fdf8d8d3924.1545272049.git.amos@scylladb.com>	2018-12-20 06:32:36 +02:00
Duarte Nunes	2d7c026d6e	service/storage_proxy: Release mutation as early as possible When delaying a base write, there is no need to hold on to the mutation if all replicas have already replied. We introduce mutation_holder::release_mutation(), which frees the mutations that are no longer needed during the rest of the delay. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	756b601560	service/storage_proxy: Delay replica writes based on view update backlog As the amount of pending view updates increases we know that there’s a mismatch between the rate at which the base receives writes and the rate at which the view retires them. We react by applying backpressure to decrease the rate of incoming base writes, allowing the slow view replicas to catch up. We want to delay the client’s next writes to a base replica. We use the base’s backlog of view updates to derive this delay. If we achieve CL and the backlogs of all replicas involved were last seen to be empty, then we wouldn't delay the client's reply. However, it could be that one of the replicas is actually overloaded, and won't reply for many new such requests. We'll eventually start applying backpressure to the client via the background's write queue, but in the meanwhile we may be dropping view updates. To mitigate this we rely on the backlog being gossiped periodically. Fixes #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	997bdf5d98	service/storage_proxy: Get the backlog of a particular base replica Add a function that returns the view update backlog for a particular replica. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	819b6f3406	service/storage_proxy: Add counters for delayed base writes Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6df32bfb0c	main: Start and stop the view_update_backlog_broker Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	37dfd22619	service: Distribute a node's view update backlog This patch introduces the view_update_backlog_broker class, which is responsible for periodically updating the local gossip state with the current node's view update backlog. It also registers to updates from other nodes, and updates the local coordinator's view of their view update backlogs. We consider the view update backlog received from a peer through the mutation_done verb to be always fresh, but we consider the one received through gossip to be fresh only if it has a higher timestamp than what we currently have recorded. This is because a node only updates its gossip state periodically, and also because a node can transitively receive gossip state about a third node with outdated information. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	8da6a31e75	service: Advertise view update backlog over gossip This lays the groundwork for brokering a node's view update backlog across the whole cluster. This is needed for when a coordinator does not contact a given replica for a long time, and uses a backlog view that is outdated and causes requests to be unnecessarily delayed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	ede5742f9b	service/storage_proxy: Send view update backlog from replicas Change the inter-node protocol so we can propagate the view update backlog from a base replica to the coordinator through the mutation_done and mutation_failed verbs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	34b48e1d98	service/storage_proxy: Prepare to receive replica view update backlog In subsequent patches, replicas will reply to the coordinator with their view update backlog. Before introducing changes to the messaging_service, prepare the storage_proxy to receive and store those backlogs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	776fdd4d1a	service/storage_proxy: Expose local view update backlog The local view update backlog is the max backlog out of the relative memory backlog size and the relative hints backlog size. We leverage the db::view::node_update_backlog class so we can send the max backlog out of the node's shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6662475dd9	tests/view_schema_test: Add simple test for db::view::node_update_backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	2bd76f8fc5	db/view: Introduce node_update_backlog class This class is an atomic view update backlog representation, safe to update from multiple shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	6afbec4685	db/hints: Initialize current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	8d6718b6e4	database: Add counter for current view backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	2174eed640	database: Expose current memory view update backlog Expose the base replica's current memory view update backlog, which is defined in terms of units consumed from the semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	d54ac4961d	idl: Add db::view::update_backlog Add db::view::update_backlog to the newly created view.idl.hh. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	12ce517242	db/view: Add view_update_backlog The view update backlog represents the pending view data that a base replica maintains. It is the maximum of the memory backlog - how much memory pending view updates are consuming - and the disk backlog - how much view hints are consuming. The size of a backlog is relative to its maximum size. We will use this class to represent a base replica's view update backlog at the coordinator. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	fc9176e784	database: Wait on view update semaphore for view building View building sends view updates synchronously, which has natural backpressure. However, they 1) Contribute to the load on the view replicas, and; 2) Add memory pressure to the base replica. They should thus count towards the current view update backlog, and consume units from the view update concurrency semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	e33e187096	service/storage_proxy: Use near-infinite timeouts for view updates View updates are sent with a timeout of 5 minutes, unrelated to any user-defined value and meant as a protection mechanism. During normal operation we don’t benefit from timing out view writes and offloading them to the hinted-handoff queue, since they are an internal, non-real time workload that we already spent resources on. This value should be increased further, but that change depends on Refs #2538 Refs #3826 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	86198060e5	database: generate_and_propagate_view_updates no longer needs a timeout We no longer wait on the semaphore and instead over-subscribe it, so there's no reason to pass a timeout. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	39eda68094	database: Don't generate view updates when node is overloaded We arrive at an overloaded state when we fail to acquire semaphore units in the base replica. This can mean clients are working in interactive mode, we fail to throttle them and consequently should start shedding load. We want to avoid impacting base table availability by running out of memory, so we could offload the memory queue to disk by writing the view updates as hints without attempting to send them. However, the disk is also a limited resource and in extreme cases we won’t be able to write hints. A tension exists between forgetting the view updates, thereby opening up a window for inconsistencies between base and view, or failing the base replica write. The latter can fail the whole user write, or if the coordinator was able to achieve CL, can instead cause inconsistencies between base tables (we wouldn't want to store a hint, because if the base replica is still overloaded, we would redo the whole dance). Between the devil and the deep blue sea, we chose to forget view updates. As a further simplification, we don't even write hints, assuming that if clients can’t be throttled (as we'll attempt to do in future patches), it will only be a matter of time before view updates can’t be offloaded. We also start acquiring the semaphore units using consume(), which is non-blocking, but allows for underflow of the available semaphore units. This is okay, and we expect not to underflow by much, as we stop generating new view updates. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	a3d30ea99a	db/view: Propagate acquired semaphore units to mutate_MV() Propagate acquired semaphore units to mutate_MV() to allow the semaphore to be incrementally signalled as view updates are processed by view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	8c1e6fcee8	db/timeout_clock: Define timeout_semaphore_units Defines the type of semaphore_units<> associated with timeout_semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	11c02c51fe	database: Wait for pending view updates to drain before stopping Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	185a4594af	database: Restore formatting of table::stop() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	f286d2ec34	database: Wait for pending operations in table::stop() A table can be stopped while reads and writes are still in flight. Those operations rely on table state, so we must prevent its destruction before they complete. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	1f1fc36b72	database: Make view update concurrency semaphore memory-based The semaphore currently limiting the amount of view updates a given base replica emits aims to control the load that is imposed on the cluster, to protect view replicas from being overloaded when there are bursts of traffic (especially for degenerate cases like an index with low selectivity). 100 is, however, an arbitrary number. It might allow too much load on the view replicas, and it might also allow too much memory from the base shard to be consumed. Conversely, it might allow for too few updates to be queued in case of a burst, or to absorb updates while a view replica becomes partitioned. To deal with the load that is inflicted on the cluster, future patches will ensure that the rate of base writes obeys the rate at which the slowest view replica can consume the corresponding view updates. To protect the current shard from using too much memory for this queue, we will limit it to 10% of the shard's memory. The goal is to both protect the shard from being overloaded, but also to allow it to absorb bursts of writes resulting in large view mutations. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	bf4277fd8c	service/storage_proxy: Remove unused send_to_endpoint() overloads The send_to_endpoint() overloads that receive a non-frozen mutation are no longer used. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	2753cfee88	db/view: Generate view updates as frozen_mutations Working in terms of frozen_mutations allows us to account more precisely the memory pending view updates consume at the storage_proxy layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	715da6fd6b	db/view: Reserve vector space in mutate_MV() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	5d011eb61f	db/view: Cleanup mutate_MV() In particular, extract out the logic updating the stats in case of a failed update. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	7cfcd21bbb	database: Make lambda in table::populate_views mutable This allows an std::move() in its body to work as intended. Also, make the lambda's argument type explicit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	122737a8ab	Merge seastar upstream * seastar 132e6cd...6c8c229 (3): > reactor: disable nowait aio due to a kernel bug > core/semaphore: Allow combining semaphore_units() > core/shared_ptr: Allow releasing a lw_shared_ptr to a non-const object Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181217153241.67514-2-duarte@scylladb.com>	2018-12-19 12:57:07 +02:00
Duarte Nunes	bf05e59672	seastar: Change the source repository to scylla-seastar Scylla is at the moment incompatible with the Seastar master branch, so in order to allow Scylla commits that depend on Seastar patches, we change the submodule to point to scylla-seastar and use a branch (master-20181217) to hold these dependent commits. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181217153241.67514-1-duarte@scylladb.com>	2018-12-19 12:57:03 +02:00
Rafael Ávila de Espíndola	ff18c837b7	tests: Add missing include in random-utils.hh This file uses std::cout and so should include <iostream>. Found with a patch to seastar that removes some redundant <iostream> includes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181218183816.34504-1-espindola@scylladb.com>	2018-12-19 10:52:19 +00:00
Avi Kivity	dd51c659f7	config: remove "to be removed before release" notice mc sstable config The "enable_sstables_mc_format" config item help text wants to remove itself before release. Since scylla-3.0 did not get enough mc format mileage, we decided to leave it in, so the notice should be removed. Fixes #4003. Message-Id: <20181219082554.23923-1-avi@scylladb.com>	2018-12-19 09:39:29 +00:00
Duarte Nunes	a7456db687	Merge 'Simplify natural endpoint calculation' from Calle " Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5 Modifies network topology calculation, reducing the amount of maps/sets used by applying the knowledge of how many replicas we expect/need per dc and sharing endpoint and rack set (since we cannot have overlaps). Also includes a transposed origin test to ensure new calculation matches the old one. Fixes #2896 " * 'calle/network_topology' of github.com:scylladb/seastar-dev: network_topology_test: Add test to verify new algorithm results equals old network_topology_strategy: Simplify calculate_natural_endpoints token_metadata: Add "get_location" ip to dc+rack accessor sequenced_set: Add "insert" method, following std::set semantics	2018-12-19 09:39:29 +00:00
Rafael Ávila de Espíndola	b93d8d863d	Add a test with mismatched timestamps. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181218035931.3554-1-espindola@scylladb.com>	2018-12-18 11:30:56 +01:00
Tomasz Grabiec	37d9ba68bc	sstables: mc: reader: Skip ignored rows before parsing them Currently filtering happens inside consume_row_end() after the whole row is parsed. It's much faster to skip without parsing. This patch moves filtering and range tombstones splitting to consume_row_start(). _stored_row is no longer needed because in case the filter returns store_and_finish, the consumer exits with retry_later, and the parser will call consume_row_start() again when resumed. Tests: ./build/release/tests/perf/perf_fast_forward_g \ --sstable-format=mc \ --datasets large-part-ds1 \ --run-tests=large-partition-skips Before: read skip time (s) frags frag/s mad f/s max f/s min f/s aio (KiB) 1 4096 1.085142 1953 1800 32 1803 1720 4990 159604 After: read skip time (s) frags frag/s mad f/s max f/s min f/s aio (KiB) 1 4096 0.694560 1953 2812 11 2813 2684 4986 159588	2018-12-18 11:13:52 +01:00
Tomasz Grabiec	e3c3ef2f0e	sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts This way we will later avoid calling clear() for ignored rows.	2018-12-18 11:11:48 +01:00
Tomasz Grabiec	fa126106f8	sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row	2018-12-18 11:11:48 +01:00
Tomasz Grabiec	522a75f761	sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows mp_row_consumer_m::consume_row_marker_and_tombstone() is called for both clustering and static rows, but it dereferences and modifies _in_progress_row, which is only set when inside a clustering row. Fixes #3999.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	9498977a34	sstables: mc: parser: Allow the consumer to skip the whole row The MC format contains row size before the row body, which we can use to skip the row without parsing its contents, which will be much faster.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	b4c3b78082	sstables: continuous_data_consumer: Introduce skip()	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	36dd660507	sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state() Will allow state_processor to know its position in the stream. Currently position() is meaningless inside process_state() because in some cases it points to the position after the buffer and in some cases before it. This patch standardizes on the former. This is more useful than the latter because process_state() trims from the front of the buffer as it consumes, so the position inside the stream can be obtained by subtracting the remaining buffer size from position(), without introducing any new variables.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	e950c8b00a	sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row The size of the bitset is the same for given row kind across the sstable, so we can allocate it once. _columns_selector is moved into row_schema structure, which we have one for each row kind and setup in the constructor.	2018-12-18 11:11:47 +01:00
Tomasz Grabiec	fb15759934	sstables: reader: Do not read the head of the partition when index can be used read_partition() was always called through read_next_partition(), even if we're at the beginning of the read. read_next_partition() is supposed to skip to the next partition. It still works when we're positioned before a partition, it doesn't advance the consumer, but it clears _index_in_current_partition, because it (correctly) assumes it corresponds to the partition we're about to leave, not the one we're about to enter. This means that index lookups we did in the read initializer will be disregarded when reading starts, and we'll always start by reading partition data from the data file. This is suboptimal for reads which are slicing a large partition and don't need to read the front of the partition. Regression introduced in `4b9a34a854`. The fix is to call read_partition() directly when we're positioned at the beginning of the partition. For that purpose a new flag was introduced. test_no_index_reads_when_rows_fall_into_range_boundaries has to be relaxed, because it assumed that slicing reads will read the head of the partition. Refs #3984 Fixes #3992 Tested using: ./build/release/tests/perf/perf_fast_forward_g \ --sstable-format=mc \ --datasets large-part-ds1 \ --run-tests=large-partition-slicing-clustering-keys Before (focus on aio): offset read time (s) frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 4000000 1 0.001378 1 726 5 736 102 6 200 4 2 0 1 1 0 0 0 65.8% After: offset read time (s) frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 4000000 1 0.001290 1 775 6 788 716 2 136 2 0 0 1 1 0 0 0 69.1%	2018-12-18 11:11:37 +01:00
Tomasz Grabiec	385a4c23fd	sstables: mc: mutation_fragment_filter: Check the fast-forward window first Otherwise the parser will keep consuming and dropping fragments needlessly, rather than giving the user a chance to consume end-of-stream condition, and maybe skip again. Refs #3984	2018-12-18 11:11:37 +01:00
Tomasz Grabiec	62a1afaac9	sstables: mc: writer: Avoid calling unsigned_vint::serialized_size() Rather than adding serialized_size() to the body size before serializing the field, we can serialize the field to _tmp_bufs at the beginning and have the body size automatically account for it.	2018-12-18 11:11:36 +01:00
Duarte Nunes	1f578be187	Merge 'Fix evictable shard reader related issues' from Botond " Recently some additional issues were discovered related to recent changes to the way inactive readers are evicted and making shard readers evictable. One such issue is that the `querier_cache` is not prepared for the querier to be immediately evicted by the reader concurrency semaphore, when registered with it as an inactive read (#3987). The other issue is that the multishard mutation query code was not fully prepared for evicted shard readers being re-created, or failing while being re-created (#3991). This series fixes both of these issues and adds a unit test which covers the first one. I am working on a unit test which would cover the second issue, but it's proving to be a difficult one and I don't want to delay the fixes for these issues any longer as they also affect 3.0. Fixes: #3987 Fixes: #3991 Tests: unit(release, debug) " * 'evictable-reader-related-issues/v2' of https://github.com/denesb/scylla: multishard_mutation_query: reset failed readers to inexistent state multishard_mutation_query: handle missing readers when dismantling multishard_mutation_query: add support for keeping stats for discarded partitions multishard_mutation_query: expect evicted reader state when creating reader multishard_mutation_query: pretty-print the reader state in log messages querier_cache: check that the query wasn't evicted during registering reader_concurrency_semaphore: use the correct types in the constructor reader_concurrency_semaphore: add consume_resources() reader_concurrency_semaphore::inactive_read_handle: add operator bool()	2018-12-17 15:36:23 +00:00
Calle Wilund	e353a8633a	network_topology_test: Add test to verify new algorithm results equals old Transposed from origin unit test. Creates a semi-random topology of racks, dcs, tokens and replication factors and verifies endpoint calculation equals old algo.	2018-12-17 13:10:59 +00:00
Calle Wilund	bfc6c89b00	network_topology_strategy: Simplify calculate_natural_endpoints Fixes #2896 (hopefully) Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5 Reduces the amount of maps and sets and general complexity of endpoint calculation by simply mapping dc:s to expected node counts, re-using endpoint sets and iterate thusly. Tested with transposed origin unit test comparing old vs. new algo results. (Next patch)	2018-12-17 13:10:59 +00:00
Botond Dénes	b4c3aab4a7	multishard_mutation_query: reset failed readers to inexistent state When attempting to dismantle readers, some of the to-be-dismantled readers might be in a failed state. The code waiting on the reader to stop is expecting failures, however it didn't do anything besides logging the failure and bumping a counter. Code in the lower layers did not know how to deal with a failed reader and would trigger `std::bad_variant_access` when trying to process (save or cleanup) it. To prevent this, reset the state of failed readers to `inexistent_state` so code in the lower layers doesn't attempt to further process them.	2018-12-17 13:18:08 +02:00
Botond Dénes	9cef043841	multishard_mutation_query: handle missing readers when dismantling When dismantling the combined buffer and the compaction state we are no longer guaranteed to have the reader each partition originated from. The reader might have been evicted and not resumed, or resuming it might have failed. In any case we can no longer assume the originating reader of each partition will be present. If a reader no longer exists, discard the partitions that it emitted.	2018-12-17 13:18:08 +02:00
Botond Dénes	438bef333b	multishard_mutation_query: add support for keeping stats for discarded partitions In the next patches we will add code that will have to discard some of the dismantled partitions/fragments/bytes. Prepare the `dismantle_buffer_stats` struct for being able to track the discarded partitions/fragments/bytes in addition to those that were successfully dismantled.	2018-12-17 13:18:08 +02:00
Botond Dénes	ce52436af4	multishard_mutation_query: expect evicted reader state when creating reader Previously readers were created once, so `make_remote_reader()` had a validation to ensure readers were not attempted at being created more than once. This validation was done by checking that the reader-state is either `inexistent` or `successful_lookup`. However with the introduction of pausing shard readers, it is now possible that a reader will have to be created and then re-created several times, however this validation was not updated to expect this. Update the validation so it also expects the reader-state to be `evicted`, the state the reader will be if it was evicted while paused.	2018-12-17 13:18:08 +02:00
Botond Dénes	1effb1995b	multishard_mutation_query: pretty-print the reader state in log messages	2018-12-17 13:18:08 +02:00
Botond Dénes	5780f2ce7a	querier_cache: check that the query wasn't evicted during registering The reader concurrency semaphore can evict the querier when it is registered as an inactive read. Make the `querier_cache` aware of this so that it doesn't continue to process the inserted querier when this happens. Also add a unit test for this.	2018-12-17 13:18:08 +02:00
Botond Dénes	e1d8237e6b	reader_concurrency_semaphore: use the correct types in the constructor Previously there was a type mismatch for `count` and `memory`, between the actual type used to store them in the class (signed) and the type of the parameters in the constructor (unsigned). Although negative numbers are completely valid for these members, initializing them to negative numbers doesn't make sense, this is why they used unsigned types in the constructor. This restriction can backfire however when someone intends to give these parameters the maximum possible value, which, when interpreted as a signed value will be `-1`. What's worse the caller might not even be aware of this unsigned->signed conversion and be very surprised when they find out. So to prevent surprises, expose the real type of these members, trusting the clients to know what they are doing. Also add a `no_limits` constructor, so clients don't have to make sure they don't overflow internal types.	2018-12-17 13:18:08 +02:00
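The pitfall described in e1d8237e6b can be illustrated with a minimal sketch. The struct names below are hypothetical and are not the actual `reader_concurrency_semaphore` interface; the sketch only assumes a two's-complement platform, where the largest unsigned value converts to `-1` when stored in a signed member of the same width.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in: the constructor takes an unsigned parameter but
// stores it in a signed member, as the commit message describes.
struct unsigned_ctor_limits {
    std::ptrdiff_t memory;
    explicit unsigned_ctor_limits(std::size_t m)
        : memory(static_cast<std::ptrdiff_t>(m)) {}  // silent unsigned->signed conversion
};

// Hypothetical fixed variant: the constructor exposes the real (signed) type
// and offers a helper so callers need not pick a "maximum" value themselves.
struct signed_ctor_limits {
    std::ptrdiff_t memory;
    explicit signed_ctor_limits(std::ptrdiff_t m) : memory(m) {}
    static signed_ctor_limits no_limits() {
        return signed_ctor_limits(std::numeric_limits<std::ptrdiff_t>::max());
    }
};
```

Passing `std::numeric_limits<std::size_t>::max()` to the first constructor leaves `memory` negative, whereas `signed_ctor_limits::no_limits()` yields the largest positive value the member can hold.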
Botond Dénes	dfd649a6b4	reader_concurrency_semaphore: add consume_resources()	2018-12-17 13:18:08 +02:00
Botond Dénes	21b44adbfe	reader_concurrency_semaphore::inactive_read_handle: add operator bool()	2018-12-17 13:18:08 +02:00
Amnon Heiman	571755e117	node-exporter.service: Update command line to fix service startup The upgrade to node_exporter 0.17 commit `09c2b8b48a` ("node_exporter_install: switch to node_exporter 0.17") caused the service to no longer start. Turns out node_exporter broke backwards compatibility of the command line between 0.15 and 0.16. Fix it up. While fixing the command line, all the collectors that are enabled by default were removed. Fixes #3989 Signed-off-by: Amnon Heiman <amnon@scylladb.com> [ penberg@scylladb.com: edit commit message ] Message-Id: <20181213114831.27216-1-amnon@scylladb.com>	2018-12-17 10:22:17 +02:00
Rafael Ávila de Espíndola	4de14e6143	Add tests on broken mc range tombstones. This tests that we diagnose both two consecutive range starts and two consecutive range ends. Message-Id: <20181214212608.95452-1-espindola@scylladb.com>	2018-12-15 13:53:25 +01:00
Avi Kivity	b023e8b45d	Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz " The motivation is to keep code related to each format separate, to make it easier to comprehend and reduce incremental compilation times. Also reduces dependency on sstable writer code by removing writer bits from sstables.hh. The ka/la format writers are still left in sstables.cc, they could be also extracted. " * 'extract-sstable-writer-code' of github.com:tgrabiec/scylla: sstables: Make variadic write() not picked on substitution error sstables: Extract MC format writer to mc/writer.cc sstables: Extract maybe_add_summary_entry() out of components_writer sstables: Publish functions used by writers in writer.hh sstables: Move common write functions to writer.hh sstables: Extract sstable_writer_impl to a header sstables: Do not include writer.hh from sstables.hh sstables: mc: Extract bound_kind_m related stuff into mc/types.hh sstables: types: Extract sstable_enabled_features::all() sstables: Move components_writer to .cc tests: sstable_datafile_test: Avoid dependency on components_writer	2018-12-14 15:05:00 +02:00
Duarte Nunes	224821303c	Merge 'Reduce the dependency on database.hh' from Botond " Working on database.hh or any header that is included in database.hh (of which there are a lot), is a major pain as each change involves the recompilation of half of our compilation units. Reduce the impact by removing the `#include "database.hh"` directive from as many header files as possible. Many headers can make do with just some forward declarations and don't need to include the entire headers. I also found some headers that included database.hh without actually needing it. Results Before: $ touch database.hh $ ninja build/release/scylla [1/154] CXX build/release/gen/cql3/CqlParser.o After: $ touch database.hh $ ninja build/release/scylla [1/107] CXX build/release/gen/cql3/CqlParser.o " * 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla: treewide: remove include database.hh from headers where possible database_fwd.hh: add keyspace fwd declaration service/client_state: de-inline set_keyspace() Move cache_temperature into its own header	2018-12-14 12:24:48 +00:00
Piotr Sarna	63bd43e57e	cql3: add refusing to create an index on static column Secondary indexes on static columns are not yet supported, so creating such index should return an appropriate error. Fixes #3993 Message-Id: <700b0a71e80da52d2d5250edacc12626b55681fa.1544785127.git.sarna@scylladb.com>	2018-12-14 11:15:28 +00:00
Rafael Ávila de Espíndola	f48d54543f	Use read_rows_flat to test broken sstables. The previous code was using mp_row_consumer_k_l to be as close to the tested code as possible. Given that it is testing for an unhandled exception, there is probably more value in moving it to a higher level, easier to use, API. This patch changes it to use read_rows_flat(). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181210235016.41133-1-espindola@scylladb.com>	2018-12-14 10:14:28 +01:00
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Botond Dénes	efe2b2c75d	database_fwd.hh: add keyspace fwd declaration	2018-12-14 08:03:57 +02:00
Tomasz Grabiec	245a0d953a	tests: cql_test_env: Start the compaction manager Broken in `fee4d2e` Not doing this results in compaction requests being ignored. One effect of this is that perf_fast_forward produces many sstables instead of one. Refs #3984 Refs #3983 Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com>	2018-12-13 18:58:50 +02:00
Piotr Sarna	6743af5dbd	cql3: refuse to create index on COMPACT STORAGE with ck To follow C* compatibility, creating an index on COMPACT STORAGE table should be disallowed not only on base primary keys, but also when the base table contains clustering keys. Message-Id: <ab40c39730aff2e164d11ee5159ff62b8ec9e8e8.1544698186.git.sarna@scylladb.com>	2018-12-13 13:39:12 +00:00
Duarte Nunes	f8878238ed	service/storage_proxy: Embed the expire timer in the response handler Embedding the expire timer for a write response in the abstract_write_response_handler simplifies the code as it allows removing the rh_entry type. It will also make the timeout easily accessible inside the handler, for future patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181213111818.39983-1-duarte@scylladb.com>	2018-12-13 14:25:21 +02:00
Tomasz Grabiec	3889b05d7e	Merge "Tests and small fixes for composite markers" from Rafael * https://github.com/espindola/scylla espindola/add-composite-tests: Remove newline from exception messages. Fix end marker exception message. Add tests for broken start and end composite markers.	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	51fd880892	Add tests for broken start and end composite markers.	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	64439f6477	Fix end marker exception message. The code tested the end marker, but the exception mentioned the start marker. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola	cfd07185b7	Remove newline from exception messages. They are inconsistent with other uses of malformed_sstable_exception and incompatible with adding " in sstable ..." to the message. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-13 10:29:44 +01:00
Vlad Zolotarov	7da1ac2c2c	large_partition_handler: fix the message We currently detect large partitions - not rows. So this is what we should be reporting. Fixes #3986 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181212215506.9879-1-vladz@scylladb.com>	2018-12-13 00:11:27 +00:00
Rafael Ávila de Espíndola	894f07f912	Move default case out of two switches. These switches are fully covered, having the default label disables -Wswitch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181212160904.17341-1-espindola@scylladb.com>	2018-12-12 18:20:24 +01:00
Botond Dénes	10336c13fc	service/client_state: de-inline set_keyspace()	2018-12-12 18:14:03 +02:00
Botond Dénes	76fe4ebc18	Move cache_temperature into its own header Some headers need to include database.hh just because of cache_temperature. Move it into its own header so these includes can be removed.	2018-12-12 16:03:45 +02:00
Tomasz Grabiec	0a853b8866	sstables: index_reader: Avoid schema copy in advance_to() Introduced in `7e15e43`. Exposed by perf_fast_forward: running: large-partition-skips on dataset large-part-ds1 Testing scanning large partition with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s (...) 1 0 5.268780 8000000 1518378 1 1 31.695985 4000000 126199 Message-Id: <1544614272-21970-1-git-send-email-tgrabiec@scylladb.com>	2018-12-12 11:33:46 +00:00
Tomasz Grabiec	ff2ad2f6bb	sstables: Make variadic write() not picked on substitution error If write(v, out, x) doesn't match any overload, the variadic write() will be picked, with Rest = {}. The compiler will print error messages about unable to find write(v, out), which totally obscures the original cause of mismatch. Make it picked only when there are at least two write() parameters so that debugging compilation errors is actually possible.	2018-12-12 12:07:31 +01:00
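The idea in ff2ad2f6bb, requiring at least two values before the variadic overload can be selected, can be sketched as follows. The `sink` type and these `write()` overloads are hypothetical simplifications for illustration and are not the actual sstables writer signatures.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sink standing in for the real writer type.
struct sink {
    std::string data;
};

// Single-value overloads.
inline void write(sink& out, std::uint8_t v) { out.data.push_back(static_cast<char>(v)); }
inline void write(sink& out, const std::string& v) { out.data += v; }

// Variadic overload. It names First and Second explicitly, so it is only
// viable when at least two values are passed. A mismatched single-value call
// write(out, x) therefore fails with an error about the single-value
// overloads instead of being routed here with an empty pack.
template <typename First, typename Second, typename... Rest>
void write(sink& out, const First& first, const Second& second, const Rest&... rest) {
    write(out, first);
    write(out, second, rest...);
}
```

With this shape, `write(out, a, b, c)` expands into three single-value writes in order, while a call with one unsupported value is diagnosed directly against the single-value overloads.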
Tomasz Grabiec	a14633c6d0	sstables: Extract MC format writer to mc/writer.cc This moves all MC-related writing code to mc/writer.cc: - m_format_write_helpers.hh is dropped - m_format_write_helpers_impl.hh is dropped - sstable_writer_m is moved out of sstables.cc sstable_writer_m is renamed to sstables::mc::writer	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	2636e6b5ab	sstables: Extract maybe_add_summary_entry() out of components_writer So that it can be used from writer implementations, which don't have access to the definition of the components_writer.	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	577e71478d	sstables: Publish functions used by writers in writer.hh	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	faf0ff1843	sstables: Move common write functions to writer.hh They are common for sstable writers of different formats. Note that writer.hh is supposed to be included only by writer implementations, not writer users.	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	3b4ccc85d0	sstables: Extract sstable_writer_impl to a header	2018-12-12 12:07:31 +01:00
Tomasz Grabiec	6e3c9c3e5e	sstables: Do not include writer.hh from sstables.hh It is only needed by writer implementations.	2018-12-12 12:07:05 +01:00
Tomasz Grabiec	bd7e9ad3ab	sstables: mc: Extract bound_kind_m related stuff into mc/types.hh	2018-12-12 12:06:46 +01:00
Tomasz Grabiec	a4721b4d50	sstables: types: Extract sstable_enabled_features::all()	2018-12-12 12:06:45 +01:00
Tomasz Grabiec	90074d0b75	sstables: Move components_writer to .cc	2018-12-12 12:06:45 +01:00
Tomasz Grabiec	eff47a59ee	tests: sstable_datafile_test: Avoid dependency on components_writer It's LA format specific and it's going to become private to sstable.cc	2018-12-12 12:06:22 +01:00
Avi Kivity	fa96e07e6b	build: pass C compiler configuration in relocatable package build Just like we allow customizing the C++ compiler, we should allow customizing the C compiler. Ref #3978 Message-Id: <20181211172821.30830-1-avi@scylladb.com>	2018-12-12 11:45:13 +01:00
Calle Wilund	707bff563e	token_metadata: Add "get_location" ip to dc+rack accessor	2018-12-12 09:32:05 +00:00
Calle Wilund	66472bc52d	sequenced_set: Add "insert" method, following std::set semantics	2018-12-12 09:32:05 +00:00
Asias He	b9e0db801d	repair: Enable row level repair Finally, enable new row level repair if the cluster supports it. If not, fallback to the old partition level repair. Fixes #3033	2018-12-12 16:49:01 +08:00
Asias He	d372317e99	repair: Add row_level_repair === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges to sub ranges which contains around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows, We read all the rows within the small range into memory. Find the mismatch and send the mismatch rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. The problem is called set reconciliation problem which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = { row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
=== How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.
Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx= 162 GiB, rx = 90 GiB new: tx= 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the result, we saw the rows on wire were as expected. tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes #3033	2018-12-12 16:49:01 +08:00
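The three-node example in d372317e99 can be reproduced with plain set operations. The following is a minimal sketch, not the repair implementation: rows are modelled as strings, and the helper names `union_of` and `difference` are hypothetical. The real code compares row hashes rather than row contents.

```cpp
#include <set>
#include <string>
#include <vector>

using row_set = std::set<std::string>;

// Union of all replicas' rows: what every node should hold after repair.
inline row_set union_of(const std::vector<row_set>& replicas) {
    row_set all;
    for (const auto& r : replicas) {
        all.insert(r.begin(), r.end());
    }
    return all;
}

// Rows present in `from` but absent from `have` (set difference from - have).
inline row_set difference(const row_set& from, const row_set& have) {
    row_set out;
    for (const auto& row : from) {
        if (have.count(row) == 0) {
            out.insert(row);
        }
    }
    return out;
}
```

With set1 = {row1, row2, row3}, set2 = {row2, row3} and set3 = {row1, row2, row4}, `difference(set3, set1)` gives the row Node1 fetches from Node3, and `difference(union_of({set1, set2, set3}), set2)` gives the rows Node1 sends to Node2.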
Asias He	b2b20cd5c0	repair: Add docs for row level repair	2018-12-12 16:49:01 +08:00
Asias He	fab31efae1	repair: Add repair_init_messaging_service_handler This patch implements all the rpc handlers for row level repair.	2018-12-12 16:49:01 +08:00
Asias He	3c80727d51	repair: Add repair_meta This patch introduces repair_meta class that is the core class for the row level repair. For each range to repair, repair_meta objects are created on both repair master and repair slaves. It stores the meta data for the row level repair algorithms, e.g., the current sync boundary, the buffer used to hold the rows the peers are working on, the reader to read data from sstable and the writer to write data to sstable. This patch also implements the RPC verbs for row level repair, for example, REPAIR_ROW_LEVEL_START/REPAIR_ROW_LEVEL_STOP to start/stop row level repair for a range, REPAIR_GET_SYNC_BOUNDARY to get sync boundary peers want to work on, REPAIR_GET_ROW_DIFF to get missing rows from repair slaves and REPAIR_PUT_ROW_DIFF to push missing rows to repair slaves.	2018-12-12 16:49:01 +08:00
Asias He	65099bac85	repair: Add repair_writer repair_writer uses multishard_writer to apply the mutation_fragments to sstable. The repair master needs one such writer for each repair slave. The repair slave needs one writer for the repair master.	2018-12-12 16:49:01 +08:00
Asias He	5b75f64e0e	repair: Add repair_reader repair_reader is used to read data from disk. It is simply a local flat_mutation_reader reader for the repair master. It is more complicated for the repair slave. The repair slaves have to follow what repair master read from disk. For example, Assume repair master has 2 shards and repair slave has 3 shards Repair master on shard 0 asks repair slave on shard 0 to read range [0,100). Repair master on shard 1 asks repair slave on shard 1 to read range [0,100). Repair master on shard 0 will only read the data that belongs to shard 0 within range [0,100). Since master and slave have different shard count, repair slave on shard 0 has to use the multi shard reader to collect data on all the shards. It can not pass range [0, 100) to the multi shard reader, otherwise it will read more data than the repair master. Instead, repair slave uses a sharder using sharding configuration of the repair master, to generate the sub ranges that belong to shard 0 of repair master. If repair master and slave have the same sharding configuration, a simple local reader is enough for repair slave.	2018-12-12 16:49:01 +08:00
Asias He	27128d132d	repair: Add repair_row repair_row is the in-memory representation of "row" that the row level repair works on. It represents a mutation_fragment that is read from the flat_mutation reader. The hash of a repair_row is the combination of the mutation_fragment hash and partition_key hash.	2018-12-12 16:49:01 +08:00
Asias He	3e7b1d2ef4	repair: Add fragment_hasher It is used to calculate the hash of a mutation_fragment.	2018-12-12 16:49:01 +08:00
Asias He	e135871e4a	repair: Add decorated_key_with_hash Represents a decorated_key and the hash for it so that we do not need to calculate more than once if the decorated_key is used more than once.	2018-12-12 16:49:01 +08:00
Asias He	16c1b26937	repair: Add get_random_seed Get a random uint64_t number as the seed for the repair row hashing. The seed is passed to xx_hasher. We add the randomization when hashing rows so that when we run repair for the next time the same row produces different hashing number.	2018-12-12 16:49:01 +08:00
Asias He	54888ac52c	repair: Add get_common_diff_detect_algorithm It is used to find the common difference detection algorithms supported by repair master and repair slaves. It is up to repair master to choose what algorithm to use.	2018-12-12 16:49:01 +08:00
Asias He	0b294d5829	repair: Add shard_config It is used to store the shard configuration.	2018-12-12 16:49:01 +08:00
Asias He	a36b0966cf	repair: Add suportted_diff_detect_algorithms It returns a vector of row level repair difference detection algorithms supported by this node. We are going to implement the "send_full_set" in the following patches.	2018-12-12 16:49:01 +08:00
Asias He	42f2cd8dc5	repair: Add repair_stats to repair_info Also add update_statistics() to update current stats.	2018-12-12 16:49:01 +08:00
Asias He	43c04302f3	repair: Introduce repair_stats It is used by row level repair to track repair statistics.	2018-12-12 16:49:01 +08:00
Asias He	0067d32b47	flat_mutation_reader: Add make_generating_reader Move generating_reader from stream_session.cc to flat_mutation_reader.cc. It will be used by repair code soon. Also introduce a helper make_generating_reader to hide the implementation of generating_reader.	2018-12-12 16:49:01 +08:00
Asias He	fe4afb1aa3	storage_service: Introduce ROW_LEVEL_REPAIR feature With this feature enabled, the node supports row level repair.	2018-12-12 16:49:01 +08:00
Asias He	acc9ff8dce	messaging_service: Add RPC verbs for row level repair This patch adds the RPC verbs that are needed by the row level repair. The usage of those verbs is in the following patches. All the verbs for row level repair are sent by the repair master. Repair master asks repair slaves to create repair meta objects, a.k.a, repair_meta object, to store the repair meta data needed by row level repair algorithm. The repair meta object is identified by the IP address of the repair master and a uint32 number repair_meta_id chosen by repair master. When repair master restarts or is out of the cluster, repair slaves will detect it and remove all existing repair_meta for the repair master. When repair slave restarts, the existing repair_meta on the slave will be gone. The sync boundary used in the verbs is the position_in_partition of the last mutation_fragment. In each repair round, peers work on (last_sync_boundary, current_sync_boundary]	2018-12-12 16:49:01 +08:00
Asias He	8cfdcf435e	repair: Export the repair logger It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	e62aeae2db	repair: Export repair_info It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	6be3b35d52	repair: Export estimate_partitions It will be used by row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	48341a2d4d	idl: Add decorated_key support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1db4e3fd0a	idl: Add row_level_diff_detect_algorithm Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	ccc706559f	idl: Add get_sync_boundary_response Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1173d1dd5a	idl: Add repair_sync_boundary Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	dc223e9216	idl: Add partition_key_and_mutation_fragments Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	5fbbc63676	idl: Add position_in_partition Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	e9fbc27740	idl: Add bound_weight It will be used by the row level repair code.	2018-12-12 16:49:01 +08:00
Asias He	3c39462397	idl: Add partition_region Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	e2b9840e24	idl: Add repair_hash Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1a0bc8acf1	repair: Add struct hash<node_repair_meta_id> for node_repair_meta_id	2018-12-12 16:49:01 +08:00
Asias He	28d090ffda	repair: Add struct hash<repair_hash> for repair_hash	2018-12-12 16:49:01 +08:00
Asias He	ce70225b1c	repair: Introduce row_level_diff_detect_algorithm It specifies the algorithm that is used to find the row difference in repair.	2018-12-12 16:49:01 +08:00
Asias He	e9251df478	repair: Introduce partition_key_and_mutation_fragments Represent a partition_key and frozen_mutation_fragments within the partition_key.	2018-12-12 16:49:01 +08:00
Asias He	5d5a1beaec	repair: Introduce node_repair_meta_id It uses an IP address and a repair_meta_id to identify a repair instance started by the row level repair.	2018-12-12 16:49:01 +08:00
Asias He	edd72e10ac	repair: Introduce get_sync_boundary_response The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used in the row level repair code soon.	2018-12-12 16:49:01 +08:00
Asias He	95b9a889cf	repair: Introduce repair_hash It represents the hash value of a repair row.	2018-12-12 16:49:01 +08:00
Asias He	3e86b7a646	repair: Introduce repair_sync_boundary Represent a position of a mutation_fragment read from a flat mutation reader. Repair nodes negotiate a small sub range identified by two repair_sync_boundary to work on in each round.	2018-12-12 16:49:01 +08:00
Asias He	063dfcda26	messaging_service: Add constructor for msg_addr Which takes the ip address and shard id.	2018-12-12 16:49:01 +08:00
Asias He	8cb3ea98d0	xx_hasher: Allow specifying seed It will be used by row level repair.	2018-12-12 16:49:01 +08:00
Asias He	165d3053b1	position_in_partition: Add get_type, get_bound_weight and get_clustering_key_prefix Needed by the RPC serialization code.	2018-12-12 16:49:01 +08:00
Asias He	4e55d22a8f	position_in_partition: Switch _bound_weight to use enum The _bound_weight in position_in_partition will be sent on wire in rpc. Make it enum instead of int.	2018-12-12 16:49:01 +08:00
Asias He	5bc109e1ee	position_in_partition: Add bound_weight It will be used to change _bound_weight to use enum instead of int8_t.	2018-12-12 16:49:01 +08:00
Asias He	05c663b932	position_in_partition: Use std::optional for clustering_key_prefix The new row level repair code will access clustering_key_prefix and it uses std::optional everywhere. Convert position_in_partition to use std::optional.	2018-12-12 16:49:01 +08:00
Asias He	0b31d7059b	position_in_partition: Make partition_region uint8_t It will be sent over rpc. Make the type explicit.	2018-12-12 16:49:01 +08:00
Asias He	dfd206b3a3	serializer: Add std::optional support	2018-12-12 16:49:01 +08:00
Asias He	3eecdc670f	serializer: Add std::list support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	b540df2819	serializer: Add std::unordered_set support Needed by the row level repair RPC verbs.	2018-12-12 16:49:01 +08:00
Asias He	1367c8c47e	dht: Add make_partitioner Given the name and shard count and the sharding_ignore_msb_bits, make a partitioner. It is used by row level repair.	2018-12-12 16:49:01 +08:00
Asias He	f1a914060b	dht: Add constructor for decorated_key which takes token and partition_key decorated_key(const dht::token& t, const partition_key& k)	2018-12-12 16:49:01 +08:00
Asias He	71c1681f6c	storage_service: Notify NEW_NODE only when a node is new node This is a backport of CASSANDRA-11038. Before this, a restarted node will be reported as new node with NEW_NODE cql notification. To fix, only send NEW_NODE notification when the node was not part of the cluster Fixes: #3979 Tests: pushed_notifications_test.py:TestPushedNotifications.restart_node_test Message-Id: <453d750b98b5af510c4637db25b629f07dd90140.1544583244.git.asias@scylladb.com>	2018-12-12 07:33:49 +02:00
Juliana Oliveira	5eb76c9bc6	compress: add support for Cassandra's compression parameter This patch adds compatibility for Cassandra's "chunk_size_in_kb", as well as it keeps Scylla's "chunk_size_kb" compression parameter. Fixes #3669 Tests: unit (release) v2: use variable instead of array v3: fix committed files Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20181211215840.GA7379@shenzou.localdomain>	2018-12-11 23:33:27 +00:00
Nadav Har'El	a0379209e6	secondary indexes: fail attempts to create a CUSTOM INDEX Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index with a custom implementation. The only custom implementation that Cassandra supports is SASI. But Scylla doesn't support this, or any other custom index implementation. If a CREATE CUSTOM INDEX statement is used, we shouldn't silently ignore the "CUSTOM" tag, we should generate an error. This patch also includes a regression test that "CREATE CUSTOM INDEX" statements with valid syntax fail (before this patch, they succeeded). Fixes #3977 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-2-nyh@scylladb.com>	2018-12-11 23:33:02 +00:00
Nadav Har'El	36db4fba23	Fix typo in error message Interestingly, this typo was copied from the original Cassandra source code :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181211224545.18349-1-nyh@scylladb.com>	2018-12-11 23:32:58 +00:00
Avi Kivity	5b08e91bdb	tools: add SYS_PTRACE capability to dbuild LeakSanitizer uses ptrace, and docker disables ptrace by default. Add it back so tests pass. Message-Id: <20181208112524.19229-1-avi@scylladb.com>	2018-12-11 19:09:12 +00:00
Avi Kivity	34a31a807d	build: build libdeflate with user selected C compiler If the user specified a C compiler, use it to build libdeflate. Fixes #3978. Message-Id: <20181211145604.14847-1-avi@scylladb.com>	2018-12-11 14:58:16 +00:00
Duarte Nunes	89ae3fbf11	db/system_distributed_keyspace: Create the schema with min_timestamp Different nodes can concurrently create the distributed system keyspace on boot, before the "if not exists" clause can take effect. However, the resulting schema mutations will be different since different nodes use different timestamps. This patch forces the timestamps to be the same across all nodes, so we save some schema mismatches. This fixes a bug exposed by `ca5dfdf`, whereby the initialization of the distributed system keyspace is done before waiting for schema agreement. While waiting for schema agreement in storage_service::join_token_ring(), the node still hasn't joined the ring and schemas can't be pulled from it, so nodes can deadlock. A similar situation can happen between a seed node and a non-seed node, where the seed node progresses to a different "wait for schema agreement" barrier, but still can't make progress because it can't pull the schema from the non-seed node still trying to join the ring. Finally, it is assumed that changes to the schema of the current distributed system keyspace tables will be protected by a cluster feature and a subsequent schema synchronization, such that all nodes will be at a point where schemas can be transferred around. Fixes #3976 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181211113407.20075-1-duarte@scylladb.com>	2018-12-11 13:35:48 +01:00
Paweł Dziepak	e3f53542c9	Merge "Optimize sstable writing of large partitions" from Tomasz " This series contains several optimizations of the MC format sstable writer, mainly: - Avoiding output_stream when serializing into memory (e.g. a row) - Faster serialization of primitive types when serializing into memory I measured the improvement in throughput (frag/s) using perf_fast_forward for datasets with a single large partition with many small rows: - 10% for a row with a single cell of 8 bytes - 10% for a row with a single cell of 100 bytes - 9% for a row with a single cell of 1000 bytes - 13% for a row with 6 cells of 100 bytes " * tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla: bytes_ostream: Optimize writing of fixed-size types sstables: mc: Write temporary data to bytes_ostream rather than file_writer sstables: mc: Avoid double-serialization of a range tombstone marker sstables: file_writer: Generalize bytes& writer to accept bytes_view sstables: Templetize write() functions on the writer sstables: Turn m_format_write_helpers.cc into an impl header sstables: De-futurize file_writer bytes_ostream: Implement clear() bytes_ostream: Make initial chunk size configurable	2018-12-11 12:29:24 +00:00
Duarte Nunes	d66bd0100b	Merge 'Simplify db::extensions' from Avi " Carry out simplifications of db::extensions: less magical types, de-inline complex functions, and reduce #include dependencies Tests: unit(release) " * tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla: extensions: remove unneeded includes extensions: deinline extension accessors extensions: return concrete types from the extension accessors extensions: remove dependency on cql layer	2018-12-10 22:00:51 +00:00
Avi Kivity	b251183359	extensions: remove unneeded includes <boost/any.hpp> is not used, and "schema.hh" can be replaced with forward declarations.	2018-12-10 21:34:09 +02:00
Avi Kivity	119a83bf2f	extensions: deinline extension accessors Quite complex code that is not performance sensitive. Move it out of line.	2018-12-10 21:22:56 +02:00
Avi Kivity	e9f5641b64	extensions: return concrete types from the extension accessors Returning "auto" makes it harder to understand what the function is returning, and impossible to de-inline. Return a vector of pointers instead. The caller should iterate immediately, in any case, and since the previous return value was a range of references to const unique_ptrs, nothing else could be done with it anyway.	2018-12-10 21:16:45 +02:00
Tomasz Grabiec	f206ef0038	bytes_ostream: Optimize writing of fixed-size types Inlining write() allows the writing code to be optimized for fixed-size types. In particular, memcpy() calls and loops will be eliminated. Saw 4% improvement in throughput in perf_fast_forward for tiny rows.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	5a35240d47	sstables: mc: Write temporary data to bytes_ostream rather than file_writer Currently temporary data is serialized into a file_writer, because that's what write() functions used to expect, which goes through an output_stream, a data_sink, into an in-memory data sink implementation which collects the temporary_buffers. Going through those abstractions is relatively expensive if we don't write much, because each time we begin to write after a flush() of the file_writer the output stream has to allocate a new buffer, which means a large allocation for small amount of data. We could avoid that and write into bytes_ostream directly, which will keep its buffer across clear(). write() functions which are used both to write directly into the data file and to a temporary arena were templatized to accept a Writer to which both file_writer and bytes_ostream conform.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	c4003b3e79	sstables: mc: Avoid double-serialization of a range tombstone marker	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	9edb9434e5	sstables: file_writer: Generalize bytes& writer to accept bytes_view Note that bytes is implicitly convertible to bytes_view.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	fad4fba4bc	sstables: Templetize write() functions on the writer Will allow writing to both a file_writer, or an in-memory writer like a bytes_ostream.	2018-12-10 20:08:16 +01:00
Tomasz Grabiec	f4016996d3	sstables: Turn m_format_write_helpers.cc into an impl header I need to templatize functions defined in it and want to avoid explicit instantiations. There is only one compilation unit in which this is used (sstables.cc). I think in the long term we should move all those "helpers" into sstables/mc/writer.{cc,hh} together with their only user, the sstable_writer_m class from sstables.cc.	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	13999a4d09	sstables: De-futurize file_writer	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	a1fb441df8	bytes_ostream: Implement clear()	2018-12-10 20:07:43 +01:00
Tomasz Grabiec	7cf5de3d9c	bytes_ostream: Make initial chunk size configurable	2018-12-10 20:07:43 +01:00
Avi Kivity	8e05bcbe71	extensions: remove dependency on cql layer The extensions class reaches into cql's property_definitions class to grab a map<sstring, sstring> type. This generates a few unneeded dependencies. Reduce dependencies by defining the map type ourselves; if cql's property_definitions changes in an incompatible way, it will have to adapt, rather than the extensions class.	2018-12-10 20:55:30 +02:00
Tomasz Grabiec	1dd2bf52ca	Merge "Add a couple of tests of broken sstables" From Rafael These are the current uninteresting cases I found when looking at malformed_sstable_exception. The existing code is working, just not being tested. * https://github.com/espindola/scylla.git espindola/espindola/broken-sst: Add a broken sstable test. Add a test with mismatched schema.	2018-12-10 19:30:58 +01:00
Tomasz Grabiec	538e041f22	Merge "Remove some dependencies on db::config" from Avi db::config is a global class; changes in any module can cause changes in db::config. Therefore, it is a cause of needless recompilation. Remove some of these dependencies by having consumers of db::config declare an intermediate config struct that contains only the configuration of interest to them, and have their caller fill it out (in the case of auth, it already followed this scheme and the patchset only moves the translation function). In addition, some outright pointless inclusions of db/config.hh are removed. The result is somewhat shorter compile times, and fewer needless recompiles. * https://github.com/avikivity/scylla unconfig-1/v1: config: remove inclusions of db/config.hh from header files repair: remove unneeded config.hh inclusion batchlog_manager: remove dependency on db::config auth: remove permissions_cache dependency on db::config auth: remove auth::service dependency on db::config auth: remove unneeded db/config.hh includes	2018-12-10 14:53:14 +01:00
Benny Halevy	ef53ddf3ae	scylla_io_setup: correct units in low space warning GiB -> GB Refs #2676 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>	2018-12-10 13:58:49 +02:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spent most time and while they have a large number of callers storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians: ./perf_before ./perf_after diff read 343086.80 360720.53 5.1% Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Duarte Nunes	a42b2895c2	Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias " This is a backport of CASSANDRA-8236. Before this patch, scylla sends the node UP event to cql client when it sees a new node join the cluster, i.e., when a new node's status becomes NORMAL. The problem is, at this time, the cql server might not be ready yet. Once the client receives the UP event, it tries to connect to the new node's cql port and fails. To fix this, a new application_state::RPC_READY is introduced: a new node sets RPC_READY to false when it first starts gossip and sets RPC_READY to true when the cql server is ready. The RPC_READY is a bad name but I think it is better to follow Cassandra. Nodes with or without this patch are supposed to work together with no problem. Refs #3843 " * 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev: storage_service: Use cql_ready facility storage_service: Handle application_state::RPC_READY storage_service: Add notify_cql_change storage_service: Add debug log in notify_joined storage_service: Add extra check in notify_joined storage_service: Add notify_joined storage_service: Add debug log in notify_up storage_service: Add extra check in notify_up storage_service: Add notify_up storage_service: Make notify_left log debug level storage_service: Introduce notify_left storage_service: Add debug log in notify_down storage_service: Introduce notify_down storage_service: Add set_cql_ready gossip: Add gossiper::is_cql_ready gms: Add endpoint_state::is_cql_ready gms: Add application_state::RPC_READY gms: Introduce cql_ready in versioned_value	2018-12-10 11:37:59 +00:00
Asias He	06dc9b8da0	storage_service: Use cql_ready facility At this point the cql_ready facility is ready. To use it, advertise the RPC_READY application state in the following cases: - When a node boots, set it to false - When cql server is ready, set it to true - When cql server is down, set it to false	2018-12-10 19:20:20 +08:00
Asias He	4761b53035	storage_service: Handle application_state::RPC_READY	2018-12-10 19:20:20 +08:00
Asias He	0e64814206	storage_service: Add notify_cql_change It is called when a RPC_READY gossip application state is received.	2018-12-10 19:20:20 +08:00
Asias He	a1bbd7bcc7	storage_service: Add debug log in notify_joined	2018-12-10 19:20:20 +08:00
Asias He	17d68cb408	storage_service: Add extra check in notify_joined Do not send node joined event if node is not in NORMAL status which means the node has joined the cluster officially.	2018-12-10 19:20:20 +08:00
Asias He	9abb15192f	storage_service: Add notify_joined Add a helper for node joined event.	2018-12-10 19:20:20 +08:00
Asias He	60c74431f7	storage_service: Add debug log in notify_up	2018-12-10 19:20:20 +08:00
Asias He	948d2b6c78	storage_service: Add extra check in notify_up Do not send up event if is_cql_ready is false which means cql server is not ready yet or node is down.	2018-12-10 19:20:20 +08:00
Asias He	48cd31dc1e	storage_service: Add notify_up Add a helper for node up event.	2018-12-10 19:20:20 +08:00
Asias He	03f9c3e7e5	storage_service: Make notify_left log debug level Be consistent with other notification log.	2018-12-10 19:20:20 +08:00
Asias He	a5ec25f28b	storage_service: Introduce notify_left Add a helper for node left event.	2018-12-10 19:20:20 +08:00
Asias He	15d7fce902	storage_service: Add debug log in notify_down	2018-12-10 19:20:19 +08:00
Asias He	f18cb0654d	storage_service: Introduce notify_down Add a helper for node down event.	2018-12-10 19:20:19 +08:00
Asias He	2f3130b36f	storage_service: Add set_cql_ready It is used to set the status of the RPC_READY of this node so it can be advertised by gossip.	2018-12-10 19:20:17 +08:00
Asias He	e07150166a	gossip: Add gossiper::is_cql_ready - A new scylla node always sends application_state::RPC_READY = false when the node boots and sends application_state::RPC_READY = true when the cql server is up - An old scylla node that does not support application_state::RPC_READY never has application_state::RPC_READY in its endpoint_state, so we can only assume its cql server is up; we therefore return true here if application_state::RPC_READY is not present	2018-12-10 19:16:44 +08:00
Asias He	2737654c75	gms: Add endpoint_state::is_cql_ready Return whether the endpoint_state has the RPC_READY application_state.	2018-12-10 19:16:44 +08:00
Asias He	67093324ad	gms: Add application_state::RPC_READY It is used to tell peer nodes that the cql server is ready and can accept clients request. Follow the same name which Cassandra uses.	2018-12-10 19:16:44 +08:00
Asias He	4ed2ef23e9	gms: Introduce cql_ready in versioned_value	2018-12-10 19:16:43 +08:00
Avi Kivity	7c7da0b462	sstables: fix overflow in clustering key blocks header bit access _ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too. Otherwise, a shift in the range 32-63 will produce wrong results. Fix by using a 64-bit mask. Found by Fedora 29's ubsan. Fixes #3973. Message-Id: <20181209120549.21371-1-avi@scylladb.com>	2018-12-10 11:09:25 +00:00
Takuya ASADA	a2d0ebf4d9	dist/offline_installer/redhat: fix missing dependencies Offline installer with Scylla 3.0 causes dependency error on CentOS, added missing packages. Fixes #3969 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181207020711.23055-1-syuu@scylladb.com>	2018-12-10 12:47:10 +02:00
Avi Kivity	904db433d9	Merge "Re-use commitlog segments" from Calle " Refs #3929 Enables re-use of commitlog segments. First, ensures we never succeed playing back a commitlog segment whose name does not match the IDs in the actual file data, by determining the expected id based on the file name. This will also handle partially written re-used files, as each chunk header's CRC is dependent on the ID, and will fail once we hit any left-overs. The second part renames files and puts them into a recycle list instead of actually deleting them when finished. Allocating new files will then prioritize this list before creating a new file. Note that since consumption and release of segments can be somewhat unbalanced, this does not really guarantee we will use recycled files even in all cases when it might be possible, simply because of timing. It does however give a good chance of it. We limit recycled files based on the max disk size setting, thus we can potentially grow disk size more than without depending on timing, but not uncontrolled. While all this theoretically might improve disk writes in some cases, it is far from any magic bullet. No real performance testing has been done yet, only functional. " * 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev: commitlog: Recycle used segments instead of delete + new file commitlog: Terminate all segments with a zero chunk commitlog_replay: Enforce file name based id matching	2018-12-10 11:15:02 +02:00
Calle Wilund	55f10ffc43	commitlog: Recycle used segments instead of delete + new file Refs #3929 When deleting a segment, IFF we have not yet filled up all reserves, instead of actually deleting the file, put it on a "recycle" list. The next segment allocation will, instead of creating a new file, simply rename the recycled segment and reuse the file and its allocated space. We rename the file twice: Once on adding to recycle list, with special prefix so we don't mix up actual replayable segments and these. Second when we actually re-use the file (also to ensure consecutive names). Note that we limit the amount of recyclables, so a really stressed application which somehow fills up the replenish queue might cause us to still drop the segments. We could skip this limit, but would risk getting too many files on disk. Replay should be safe, since all entries are guarded by CRC based on the file ID (i.e. file name). Thus replaying a recycled segment will simply cause a CRC error in the main header and be ignored (see previous patch). Segments that are fully synced will have terminating zero-header (see previous patch) so we know when to stop processing a recycled file. If a file is the result of a mid-write crash, we will generate a CRC processing error as "normally" in this case, when hitting partially written block or coming to an old/new chunk boundary. v2: * Sync dir on rename * auto -> const sstring& * Allow recycling files as long as we're within disk space limits v3: * Use special names for files waiting for reuse	2018-12-10 09:09:07 +00:00
Calle Wilund	b13b6ef6a0	commitlog: Terminate all segments with a zero chunk Writes a final chunk header of zero to the file on close, to mark end-of-segment. This allows us to gracefully stop replay processing of a segment file even if it was not zeroed from the beginning (maybe recycled - hint hint).	2018-12-10 09:09:07 +00:00
Calle Wilund	b35af84599	commitlog_replay: Enforce file name based id matching When reading the header chunk of a commitlog file, check the stored id value against the id derived from the file name, and ignore if mismatched. This is a prerequisite for re-using renamed commitlog files, as we can then fail-fast should one such be left on disk, instead of trying to replay it. We also check said id via the CRC check for each chunk parsed. If we find a chunk with mismatched id, we will get a CRC error for the chunk, and replay will terminate (albeit not gracefully).	2018-12-10 09:09:07 +00:00
Amnon Heiman	09c2b8b48a	node_exporter_install: switch to node_exporter 0.17 The newer version of node_exporter comes with important bug fixes, which is especially important because I3.metal is not supported by the older version of node_exporter. The dashboards can now support both the new and the old version of node_exporter. Fixes #3927 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181210085251.23312-1-amnon@scylladb.com>	2018-12-10 10:54:50 +02:00
Benny Halevy	bcb486b8b9	scylla_io_setup: io_tune should not run when there is less than 10GB of disk space Fixes #2676 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181209174852.3620-1-bhalevy@scylladb.com>	2018-12-10 10:38:33 +02:00
Yibo Cai (Arm Technology China)	6717816a8d	utils/gz: optimize crc_combine for arm64 Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>	2018-12-10 10:31:08 +02:00
Avi Kivity	40677fae37	Merge "Compaction strategy aware major compaction" from Raphael " Make major compaction aware of compaction strategy, by using an optimal approach which suits the strategy needs. Refs #1431. " * 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla: tests: add test for compaction-strategy-aware major compaction compaction: implement major compaction heuristic for leveled strategy compaction: introduce notion of compaction-strategy-aware major compaction	2018-12-10 10:10:22 +02:00
Avi Kivity	d7c7949d43	auth: remove unneeded db/config.hh includes	2018-12-09 20:11:38 +02:00
Avi Kivity	37a681e46d	auth: remove auth::service dependency on db::config auth::service already has its own configuration and a function to create it from db::config; just move it to the caller. This reduces dependencies on the global db::config class.	2018-12-09 20:11:38 +02:00
Avi Kivity	77e6b7a155	auth: remove permissions_cache dependency on db::config permissions_cache already has its own configuration and a function to create it from db::config; just move it to the caller. This reduces dependencies on the global db::config class.	2018-12-09 20:11:38 +02:00
Avi Kivity	89be47e291	batchlog_manager: remove dependency on db::config Extract configuration into a new struct batchlog_manager_config and have the callers populate it using db::config. This reduces dependencies on global objects.	2018-12-09 20:11:38 +02:00
Avi Kivity	85e9b0d78d	repair: remove unneeded config.hh inclusion	2018-12-09 20:11:38 +02:00
Avi Kivity	864f55e745	config: remove inclusions of db/config.hh from header files Instead, distribute those inclusions to .cc files that require them. This reduces rebuilds when config.hh changes, and makes it easier to locate files that need config disaggregation.	2018-12-09 20:11:38 +02:00
Amos Kong	09a3b11c2f	scylla_setup: only ask for nic in interactive mode Currently scylla_setup still asks for the nic even when the nic is already assigned on the command line. Fixes #3908 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com>	2018-12-09 15:29:31 +02:00
Gleb Natapov	9fb79bf379	storage_proxy: fix crash during write timeout callback invocation rh_entry address is captured inside timeout's callback lambda, so the structure should not be moved after it is created. Change the code to create rh_entry in-place instead of moving it into the map. Fixes #3972. Message-Id: <20181206164043.GN25283@scylladb.com>	2018-12-09 10:33:37 +02:00
Vladimir Krivopalov	6a5d8934a6	db: Enable SSTables 'mc' format by default. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>	2018-12-08 11:07:38 +02:00
Tomasz Grabiec	b78d98a358	tests: perf_fast_forward: Fix result_collector::add() for multi-element results The results vector should be populated vertically, not horizontally. Responsible for assertion failure with --cache-enabled: void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed. Introduced in `3fc78a25bf`. Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>	2018-12-07 12:44:32 +00:00
Tomasz Grabiec	10cde9ae50	tests: perf_fast_forward: Fix live_range not being initialized Broken in `470552b7ab` Causes test failure when running with --cache-enabled Message-Id: <1544105835-24530-1-git-send-email-tgrabiec@scylladb.com>	2018-12-07 12:38:01 +00:00
Tomasz Grabiec	bb24d378b2	Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir This patchset fixes several remaining issues found during thorough testing of SSTables 3.x statistics and enriches ~30 unit tests with statistics validation against Cassandra-generated golden copies. * https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1: sstables: Enforce estimated_partitions in generate_summary() to be always positive. sstables: Don't enforce default max_local_deletion_time value for 'mc' files. sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. sstables: Collect statistics when writing RT markers to SSTables 3.x. tests: Return sstable_assertions from validate_read() helper. tests: Introduce helper for validating stats metadata in SSTables 3.x tests. tests: Add stats metadata validation to test_write_static_row. tests: Add stats metadata validation to test_write_composite_partition_key. tests: Add stats metadata validation to test_write_composite_clustering_key. tests: Add stats metadata validation to test_write_wide_partitions. 
tests: Add stats metadata validation to write_ttled_row tests: Add stats metadata validation to write_ttled_column tests: Add stats metadata validation to write_deleted_column tests: Add stats metadata validation to write_deleted_row tests: Add stats metadata validation to write_collection_wide_update tests: Add stats metadata validation to write_collection_incremental_update tests: Add stats metadata validation to write_multiple_partitions tests: Add stats metadata validation to write_multiple_rows tests: Add stats metadata validation to write_missing_columns_large_set tests: Add stats metadata validation to write_different_types tests: Add stats metadata validation to write_empty_clustering_values tests: Add stats metadata validation to write_large_clustering_key tests: Add stats metadata validation to write_compact_table tests: Add stats metadata validation to write_user_defined_type_table tests: Add stats metadata validation to write_simple_range_tombstone tests: Add stats metadata validation to write_adjacent_range_tombstones tests: Add stats metadata validation to write_non_adjacent_range_tombstones tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows tests: Add stats metadata validation to write_range_tombstone_same_start_with_row tests: Add stats metadata validation to write_range_tombstone_same_end_with_row tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.	2018-12-07 12:05:55 +01:00
Vladimir Krivopalov	98ae39f920	tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	dcd639b4d5	tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	d07ab3b3ef	tests: Add stats metadata validation to write_range_tombstone_same_end_with_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	b856cf837e	tests: Add stats metadata validation to write_range_tombstone_same_start_with_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	ba24572fb6	tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	4167c9e51d	tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	fd1c9b84c6	tests: Add stats metadata validation to write_non_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	1a6d613654	tests: Add stats metadata validation to write_adjacent_range_tombstones Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	57d2d1a1c6	tests: Add stats metadata validation to write_simple_range_tombstone Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	bc5d5633dc	tests: Add stats metadata validation to write_user_defined_type_table Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	d9f2829ca0	tests: Add stats metadata validation to write_compact_table Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	3a1e287c6a	tests: Add stats metadata validation to write_large_clustering_key Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	722fc7222a	tests: Add stats metadata validation to write_empty_clustering_values Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	1367243b7e	tests: Add stats metadata validation to write_different_types Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	12b10c0cca	tests: Add stats metadata validation to write_missing_columns_large_set Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	c990c518fc	tests: Add stats metadata validation to write_multiple_rows Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	9bb46f7cc6	tests: Add stats metadata validation to write_multiple_partitions Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	99d3cbd2fc	tests: Add stats metadata validation to write_collection_incremental_update Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	0118b15c06	tests: Add stats metadata validation to write_collection_wide_update Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	85782ed729	tests: Add stats metadata validation to write_deleted_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	66913adcc6	tests: Add stats metadata validation to write_deleted_column Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	997101f105	tests: Add stats metadata validation to write_ttled_column Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	a018388049	tests: Add stats metadata validation to write_ttled_row Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	260dfb3492	tests: Add stats metadata validation to test_write_wide_partitions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	349a73c464	tests: Add stats metadata validation to test_write_composite_clustering_key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	4f14e65d70	tests: Add stats metadata validation to test_write_composite_partition_key. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	a7b85e8009	tests: Add stats metadata validation to test_write_static_row. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	ccb2dec22b	tests: Introduce helper for validating stats metadata in SSTables 3.x tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	5f6240cd7d	tests: Return sstable_assertions from validate_read() helper. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	cc12449646	sstables: Collect statistics when writing RT markers to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	2e5c221865	sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Rafael Ávila de Espíndola	298873d33b	Add a test with mismatched schema. The sstable in the test is fine, but the schema thinks a static column is regular.	2018-12-06 15:38:01 -08:00
Rafael Ávila de Espíndola	d392bc4924	Add a broken sstable test. This sstable has a static column with clustering information.	2018-12-06 15:23:33 -08:00
Raphael S. Carvalho	1ddbbe51e6	tests: add test for compaction-strategy-aware major compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:37:16 -02:00
Raphael S. Carvalho	525ee18560	compaction: implement major compaction heuristic for leveled strategy Major compaction for leveled strategy will now create a run of non-overlapping sstables at the highest level. Until now, a single sstable would be created at level 0 which was very suboptimal because all data would need to climb up the levels again, making it a very expensive I/O process. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:31 -02:00
Raphael S. Carvalho	3d9566e40d	compaction: introduce notion of compaction-strategy-aware major compaction That's only the very first step which introduces the machinery for making major compaction aware of all strategies. By the time being, default implementation is used for them all which only suits size tiered. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:30 -02:00
Vladimir Krivopalov	d2dfa2e15d	sstables: Don't enforce default max_local_deletion_time value for 'mc' files. Commit `cc6c383249` has fixed an issue with incorrectly tracking max_local_deletion_time and the check in validate_max_local_deletion_time was called to work around old files. This fix relaxes conditions for enforcing default max_local_deletion_time so that they don't apply to SSTables in 'mc' format because the original problem was resolved before the 'mc' format was introduced. This is needed to be able to read correct values from Cassandra-generated SSTables that don't have a Scylla.db component. Its presence or absence is used as an indicator of possibly affected files. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Vladimir Krivopalov	0b1e6427ad	sstables: Enforce estimated_partitions in generate_summary() to be always positive. For tiny index files (< 8 bytes long) it could turn to zero and trigger an assertion in prepare_summary(). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Raphael S. Carvalho	ffb00d2118	storage_service: remove outdated comment on ongoing compaction interrupt After commit `5e953b5e47`, compaction manager will forcefully stop ongoing compactions instead of waiting for them to finish. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181206142600.21354-1-raphaelsc@scylladb.com>	2018-12-06 15:43:42 +01:00
Tomasz Grabiec	6012a63660	Merge "Fix window during init where waiting for a feature can be ignored" from Avi storage_service keeps a bunch of "feature" variables, indicating cluster-wide supported features, and has the ability to wait until the entire cluster supports a given feature. The propagation of features depends on gossip, but gossip is initialized after storage_service, so the current code late-initializes the features. However, that means that whoever waits on a feature between storage_service initialization and gossip initialization loses their wait entry. In #3952, we have proof that this in fact happens. Fix this by removing the circular dependency. We now store features in a new service, feature_service, that is started before both gossip and storage_service. Gossip updates feature_service while storage_service reads from it. Fixes #3953. * https://github.com/avikivity/3953/v4.1: storage_service: deinline enable_all_features() gossiper: keep features registered tests/gossip: switch to seastar::thread storage_service: deinline init/deinit functions gossiper: split feature storage into a new feature_service gossiper: maybe enable features after start_gossiping() storage_service: fix gap when feature::when_enabled() doesn't work	2018-12-06 15:42:26 +01:00
Avi Kivity	33a0366ed8	storage_service: fix gap when feature::when_enabled() doesn't work storage_service::register_features() reassigns to feature variables in storage_service. This means that any call to feature::when_enabled() will be orphaned when the feature is assigned. Now that feature lifetimes are not tied to gossip, we can move the feature initialization to the constructor and eliminate the gap. When gossip is started it will evaluate application_states and enable features that the cluster agrees on.	2018-12-06 16:31:05 +02:00
Avi Kivity	587fd9b6c0	gossiper: maybe enable features after start_gossiping() Since we may now start with features already registered, we need to enable features immediately after gossip is started. This case happens in a cluster that already is fully upgraded on startup. Before this series, features were only added after this point.	2018-12-06 16:31:04 +02:00
Avi Kivity	4e553b692e	gossiper: split feature storage into a new feature_service Feature lifetime is tied to storage_service lifetime, but features are now managed by gossip. To avoid circular dependency, add a new feature_service service to manage feature lifetime. To work around the problem, the current code re-initializes features after gossip is initialized. This patch does not fix this problem; it only makes it possible to solve it by untying features from gossip.	2018-12-06 16:31:04 +02:00
Avi Kivity	9b476fc377	storage_service: deinline init/deinit functions Reduces #include dependencies later on.	2018-12-06 16:31:04 +02:00
Avi Kivity	db72a7e8bd	tests/gossip: switch to seastar::thread Much simpler to manage the long initialization chain.	2018-12-06 16:31:04 +02:00
Avi Kivity	1215512e98	gossiper: keep features registered Gossiper unregisters enabled features as an optimization. However that makes decoupling features from gossiper harder. Disable this optimization; since the number of features is small and normal access is to a single feature at a time, there is no significant performance or memory loss.	2018-12-06 16:31:04 +02:00
Paweł Dziepak	9024187222	partition_slice: use small_vector for column_ids	2018-12-06 14:21:04 +00:00
Paweł Dziepak	a014367c5b	mutation_fragment_merger: use small_vector	2018-12-06 14:21:04 +00:00
Paweł Dziepak	142c4a9d84	auth: use small_vector in resource	2018-12-06 14:21:04 +00:00
Paweł Dziepak	edbcac85cb	auth: avoid list-initialisation of vectors List-initialisation forces often completely unnecessary copies of the elements.	2018-12-06 14:21:04 +00:00
Paweł Dziepak	890a5ba8ac	idl: serialiser: add serialiser for utils::small_vector	2018-12-06 14:21:04 +00:00
Paweł Dziepak	abb4953209	idl: serialiser: deduplicate vector serialisers In Scylla we have three implementations of vector-like structures: std::vector, utils::chunked_vector and utils::small_vector. Which one is used is largely an implementation detail and all should be serialised by the IDL infrastructure in exactly the same way. To make sure that it's indeed the case let's make them share the serialiser implementation.	2018-12-06 14:21:04 +00:00
Paweł Dziepak	23d19d21bd	utils: introduce small_vector small_vector is a variation of std::vector<> that reserves a configurable amount of storage internally, without the need for memory allocation. This can bring measurable gains if the expected number of elements is small. The drawback is that moving such small_vector is more expensive and invalidates iterators as well as references which disqualifies it in some cases.	2018-12-06 14:21:04 +00:00
Avi Kivity	21b4b2b9a1	Merge "Fix deadlocking multishard readers" from Botond " Multishard combining readers, running concurrently, with limited concurrency and no timeout may deadlock, due to inactive shard readers sitting on permits. To avoid this we have to make sure that all shard readers belonging to a multishard combining reader, that are not currently active, can be evicted to free up their permits, ensuring that all readers can make progress. Making inactive shard readers evictable is the solution for this problem, however the original series introducing this solution (`414b14a6bd`) did not go all the way and left some loose ends. These loose ends are tied up by this mini-series. Namely, two issues remained: * The last reader to reach EOS was not paused (made evictable). * Readers created/resumed as part of a read-ahead were not paused immediately after finishing the read-ahead. This series fixes both of these. Fixes: #3865 Tests: unit(release, debug) " * 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla: multishard_combining_reader: pause readers after reading ahead multishard_combining_reader: pause all EOS'd readers	2018-12-06 16:08:11 +02:00
Botond Dénes	ee193f1ab4	multishard_combining_reader: pause readers after reading ahead Readers created or resumed just to read ahead should be paused right after, to avoid consuming all available permits on the shards they operate on, causing a deadlock.	2018-12-06 13:20:30 +02:00
Avi Kivity	d4f353d3c8	Merge "normalized python3 compatibility, shebang and encoding" from Alexys " This series of patches ensures that all the Python code base is python3 compliant and consistent by applying the following logic: - python3 classifier on setup.py to explicitly state our python compatibility matrix - add UTF-8 encoding header - correct every shebang to the same /usr/bin/env python3 - shebang is only added on scripts meant to be executed on their own (removed otherwise) - migrate some leftover scripts from python2 to python3 with minimal QA This work is important to prepare for a more drastic change on Python code styling using the black formatter and the setting up of automated QA checks on Python code base. " * 'python3_everywhere' of https://github.com/numberly/scylla: scylla-housekeeping: fix python3 compat and shebang dist/ami/files/scylla_install_ami: python3 shebang dist/docker/redhat/docker-entrypoint.py: add encoding comment fix_system_distributed_tables.py: fix python3 compat and shebang gen_segmented_compress_params.py: add encoding comment idl-compiler.py: python3 shebang scylla-gdb.py: python3 shebang configure.py: python3 shebang tools/scyllatop/: add / normalize python3 shebang scripts/: add / normalize python3 shebang dist/common/scripts: add / normalize python3 shebang test.py: add encoding comment setup.py: add python3 classifiers	2018-12-06 12:16:57 +02:00
Avi Kivity	f073ea5f87	Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir " This patchset extends a number of existing tests to check SSTables statistics for 'mc' format and fixes an issue discovered with the help of one of the tests. Tests: unit {release} " * 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla: tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. tests: Run sstable_tombstone_histogram_test for all SSTables versions. tests: Run min_max_clustering_key_test on all SSTables versions. tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. tests: Run test_sstable_max_local_deletion_time on all SSTables versions. tests: Extend test checking tombstones histogram to cover all SSTables versions. sstables: Properly track row-level tombstones when writing SSTables 3.x. tests: Run min_max_clustering_key_test_2 for all SSTables versions. tests: Make reusable_sst() helper accept SSTables version parameter.	2018-12-06 11:44:33 +02:00
Botond Dénes	170fa382fa	multishard_combining_reader: pause all EOS'd readers Previously the last shard reader to reach EOS wasn't paused. This is a mistake and can contribute to causing deadlocks when the number of concurrently active readers on any shard is limited.	2018-12-06 10:30:43 +02:00
Vladimir Krivopalov	dd769f2b41	tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	a098387e9f	tests: Run sstable_tombstone_histogram_test for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	06a47fc9f9	tests: Run min_max_clustering_key_test on all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	c53afd7bba	tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	cfbde5b89c	tests: Run test_sstable_max_local_deletion_time on all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	9955710cac	tests: Extend test checking tombstones histogram to cover all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	cdae62ec29	sstables: Properly track row-level tombstones when writing SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	0f3fb32028	tests: Run min_max_clustering_key_test_2 for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	c474b0d851	tests: Make reusable_sst() helper accept SSTables version parameter. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Paweł Dziepak	504c586392	intrusive_set_external_comparator: make iterator nothrow move constructible	2018-12-05 20:07:29 +00:00
Paweł Dziepak	402902ac78	mutation_fragment_merger: value-initialise iterator ForwardIterators are default constructible, but they have to be value-initialised to compare equal to other value-initialised instances of that iterator.	2018-12-05 20:07:29 +00:00
Tomasz Grabiec	2c2d202354	tests: perf_fast_forward: Make output directory configurable Message-Id: <1544020034-16340-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	247347058c	tests: perf_fast_forward: Always print to stdout Otherwise errors cannot be made sense of, since errors are always reported to stdout. Without test output we don't know what they're referring to. This change makes the output always go to stdout, in addition to other reporters, if any. Message-Id: <1544020084-16492-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Yibo Cai (Arm Technology China)	6fadba56cc	utils: optimize UTF-8 validation UTF-8 string is now validated by boost::locale::conv::utf_to_utf, it actually does string conversions which is more than necessary. As observed on Arm server, UTF-8 validation can become bottleneck under heavy loads. This patch introduces a brand new SIMD implementation supporting both NEON and SSE, as well as a naive approach to handle short strings. The naive approach is 3x faster than boost utf_to_utf, whilst SIMD method outperforms naive approach 3x ~ 5x on Arm and x86. Details at https://github.com/cyb70289/utf8/. UTF-8 unit test is added to check various corner cases. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	3e70ae1d06	Merge "Improve times to start / stop the nodes" from Glauber If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares in memtable flushes decreases as memory used decreases and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are at this point running alone we improve the user experience by making sure the startup is as fast as it can be. There is a similar problem at the drain level, which is also fixed in this series. Fixes #3958 * git@github.com:glommer/scylla.git faster-restart compaction_manager: delay initialization of the compaction manager. drain: stop compactions early	2018-12-05 21:51:01 +02:00
Asias He	eeeb2da7bb	gossip: Fix race in real_mark_alive and shutdown msg In dtest, we have self.check_rows_on_node(node1, 2000) self.check_rows_on_node(node2, 2000) which introduce the following cluster operations: 1) Initially: - node1 up - node2 up 2) self.check_rows_on_node(node1, 2000) - node2 down - node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots up to mark node1 up) 3) self.check_rows_on_node(node2, 2000) - node1 down (B: node1 will send shutdown gossip message to node2, node2 will mark node1 down) - node1 up (C: when node1 is up, node2 will call gossiper::real_mark_alive) Since there is no guarantee of the order of Operation A and Operation B, it is possible node2 will mark node1 as status=shutdown and mark node1 as UP. In Operation C, node2 will call gossiper::real_mark_alive to mark node1 up, but since node2 might think node1 is already up, node2 will exit early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is now UP, status={}" As a result, dtest fails to see node2 report that node1 is up when it boots node1 and fails the test. TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP'] In the log we can see node1 marked as DOWN and UP almost at the same time on node2: INFO 2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown INFO 2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown Fixes #3940 Tests: dtest with 20 consecutive successful runs Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	edbef7400b	configure.py: Always add a rule for building gen_crc_combine_table Fixes a build failure when only the scylla binary was selected for building like this: ./configure.py --with scylla In this case the rule for gen_crc_combine_table was missing, but it is needed to build crc_combine_table.o Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com>	2018-12-05 21:51:01 +02:00
Botond Dénes	77dbc7d09a	querier: fix evict_one() and evict_all_for_table() Both of these have the same problem. They remove the to-be-evicted entries from `_entries` but they don't unregister the `entry` from the `reader_concurrency_semaphore`. This results in the `reader_concurrency_semaphore` being left with a dangling pointer to the entries, which will trigger a segfault when it tries to evict the associated inactive reads. Also add a unit test for `evict_all_for_table()` to check that it works properly (`evict_one()` is only used in tests, so no dedicated test for it). Fixes: #3962 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com>	2018-12-05 21:51:01 +02:00
Avi Kivity	0be554c337	storage_service: deinline enable_all_features() Next commit wants to make it depend on config, which is best done out-of-line.	2018-12-05 17:30:42 +02:00
Asias He	a5d8b66f2c	gossip: Make favor newly added node log debug level It is not very useful for user to know this. Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>	2018-12-05 10:45:03 +02:00
Avi Kivity	b0cb69ec25	Merge "Make sstable reader fail on unknown column names in MC format" from Piotr " Before the reader was just ignoring such columns but this creates a risk of data loss. Refs #2598 " * 'haaawk/2598/v3' of github.com:scylladb/seastar-dev: sstables: Add test_sstable_reader_on_unknown_column sstables: Exception on sstable's column not present in schema sstables: store column name in column_translation::column_info sstables: Make test_dropped_column_handling test dropped columns	2018-12-05 10:43:29 +02:00
Takuya ASADA	9388f3d626	reloc: drop --jobs from build_deb.sh/build_rpm.sh scripts Since we merged relocatable package, build_deb.sh/build_rpm.sh only does packaging using prebuilt binary taken from relocatable package, won't compile anything. So passing --jobs option to build_deb.sh/build_rpm.sh becomes meaningless, we can drop it. Note that we still can specify --jobs option on reloc/build_reloc.sh, it runs "ninja-build -jN" to compile Scylla, then generate relocatable package. See #3956 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181204205652.25138-1-syuu@scylladb.com>	2018-12-04 21:00:51 +00:00
Glauber Costa	0b7818d2b9	drain: stop compactions early drain suffers from the same problem as startup suffers now: memtables are flushed as part of the drain routine, and because there are no incoming writes the shares the controller assigns to flushes go down over time, slowing down the process of drain. This patch reorders things so that we stop compactions first, and flush later. It guarantees that when flushes do happen they will have the full bandwidth to work with. There is a comment in the code saying we should stop compactions forcefully instead of waiting for them to finish. I consider this orthogonal to this patch therefore I am not touching this. Doing so will make the drain operation even faster but can be done later. Even when we do it, having the flushes proceed alone instead of during compactions will make it faster. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:55:59 -05:00
Glauber Costa	fee4d2eb9b	compaction_manager: delay initialization of the compaction manager. If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares in memtable flushes decreases as memory used decreases and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are at this point running alone we improve the user experience by making sure the startup is as fast as it can be. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:48:42 -05:00
Tomasz Grabiec	b8c405c019	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back	2018-12-04 19:47:28 +01:00
Tomasz Grabiec	9a4c00beb7	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>	2018-12-04 18:17:27 +00:00
Piotr Jastrzebski	fed3b51abe	Add test_all_data_is_read_back This tests that a source after being populated with a mutation returns exactly the same mutation when read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 11:42:08 +01:00
Piotr Sarna	7b0a3fbf8a	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>	2018-12-04 10:41:09 +00:00
Piotr Jastrzebski	75b99838fc	Fix use_row_ttl condition Previous condition was wrong and was using row ttl too often. We also have to change test_dead_row_marker to compare resulting sstable with sstable generated by Origin not by sstableupgrade. This is because sstableupgrade transmits information about deleted row marker automatically to cells in that row. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 10:51:36 +01:00
Avi Kivity	c3e664eec2	Merge "Improve corrupt sstable reporting" from Rafael " This is a small step in fixing issue #2347. It is mostly tests and testing infrastructure, but it does include a fix for a case where we were missing the filename in the malformed_sstable_exception. " * 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla: Add a filename to a malformed_sstable_exception. Try to read the full sst in broken_sst. Convert tests to SEASTAR_THREAD_TEST_CASE. Check the exception message. Move some tests to broken_sstable_test.cc	2018-12-04 10:32:10 +02:00
Avi Kivity	414b14a6bd	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them are running concurrently (thrashing). * May deadlock if enough of them are running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will now not cling onto their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrency_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, although considering that this series will have to be backported to 3.0, I think this present form is better.
Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ...	2018-12-04 10:22:35 +02:00
Botond Dénes	9de4f3a834	tests/multishard_mutation_query_test: test stateless query too In the `test_read_all`, do a stateless read as well to ensure that path works correctly as well.	2018-12-04 08:51:05 +02:00
Botond Dénes	6676ceba7f	tests/querier_cache: fail resource-based eviction test gracefully Currently when this test fails, resources are not released in the correct order, which results in ASAN complaining about use-after-free in debug builds. This is due to the BOOST_REQUIRE macro aborting the test when the predicate fails, not allowing for correct destruction order to take place. To avoid this ugly failure, which adds noise and might send a developer investigating the failure down the wrong path, use the milder BOOST_CHECK family of test macros. These will allow the test to run to completion even when the predicate fails, allowing for the correct destruction of the resources.	2018-12-04 08:51:05 +02:00
Botond Dénes	93e41397f7	tests/querier_cache: simplify resource-based eviction test Now that we have an accessor for all concurrency semaphores, we don't need the tricks of creating a dummy keyspace to get them. Use the accessors instead.	2018-12-04 08:51:05 +02:00
Botond Dénes	dcd2d116a3	tests/mutation_reader_test: add test_multishard_combining_reader_next_partition Test the interaction of the multishard reader with the foreign reader w.r.t next_partition(). next_partition() is a special operation, as its execution is deferred until the next cross-shard operation. Give it some extra stress-testing.	2018-12-04 08:51:05 +02:00
Botond Dénes	20e994e526	tests/mutation_reader_test: restore indentation Left over from the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	a577ff97e9	tests/mutation_reader_test: enrich pause-related multishard reader test Enrich the existing test_multishard_combining_reader_as_mutation_source test case with delaying the pause/resume and eviction of paused readers.	2018-12-04 08:51:05 +02:00
Botond Dénes	22b14d593b	multishard_combining_reader: use pause-resume API Refactor the multishard combining reader to make use of the new pause-resume API to pause inactive shard readers. Make the pause-resume API mandatory to implement, as by now all existing clients have adopted it.	2018-12-04 08:51:05 +02:00
Botond Dénes	77b758707c	query::partition_slice: add clear_ranges() method Allows for clearing any custom partition ranges, effectively resetting them to the default ones. Useful for code that needs to set several different specific partition ranges, one after the other, but doesn't want to remember the last key it set a range for to be able to clear the previous range with `clear_range()`.	2018-12-04 08:51:05 +02:00
Botond Dénes	a594fd39ce	position_in_partition: add region() accessor	2018-12-04 08:51:05 +02:00
Botond Dénes	9601d23e0d	foreign_reader: add pause-resume API Allows for pausing the reader and later resuming it. Pausing the reader waits on the ongoing read ahead (if any), executes any pending `next_partition()` calls and then detaches the shard reader's buffer. The paused shard reader is returned to the client. Resuming the reader consists of getting the previously detached reader back, or one that has the same position as the old reader had. This API allows for making the inactive shard readers of the `multishard_combining_reader` evictable. The API is private, it's only accessible for classes knowing the full definition of the `foreign_reader` (which resides in a .cc file).	2018-12-04 08:51:05 +02:00
Botond Dénes	a12fae366d	tests/mutation_reader_test: implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	f334d3717f	query_mutations_on_all_shards(): implement pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	72ed655ef0	make_multishard_streaming_reader(): implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	bf0d1f4eea	database: add accessors for user and streaming concurrency semaphores These will soon be needed to register inactive user and streaming reads with the respective semaphores.	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the multishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementation for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	6f0e0c4ed7	query_mutations_on_all_shards(): restore indentation The previous patch added half-aligned lines to improve readability of that patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	aa6083a75b	query_mutations_on_all_shards(): simplify the state-machine The `read_context` which handles creating, saving and looking-up the shard readers has to deal with its `destroy_reader()` method being called at any time, even before some other method has finished its work. For example it is valid for a reader to be requested to be destroyed, even before the context finishes creating it. This means that state transitions that take time can be interleaved with another state transition request. To deal with this the read context uses `future_` states, states that mark an ongoing state transition. This allows for state transition requests that arrive in the middle of another state transition to be attached as a continuation to the ongoing transition, and to be executed after that finishes. This however resulted in complex code, that has to handle readers being in all sorts of different states, when the `save_readers()` method is called. To avoid all this complexity, exploit the fact that `destroy_reader()` receives a future<> as its argument, which resolves when all previous state transitions have finished. Use a gate to wait on all these futures to resolve. This way we don't need all those transitional states, instead in `save_readers()` we only need to wait on the gate to close. Thus the number of states `save_readers()` has to consider drops drastically. This has the theoretical drawback of the process of saving the readers having to wait on each of the readers to stop, but in practice the process finishes when the last reader is saved anyway, so I don't expect this to result in any slowdown.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	0a616c899e	multishard_combining_reader: add reader lifecycle policy Currently `multishard_combining_reader` takes two functors, one for creating the readers and optionally one for destroying them. A bag of functors (`std::function`) however makes for a terrible interface, and as we are about to add some more customization points, it's time to use something more formal: policy based design, a well-known design pattern. As well as merging the job of the two functors into a single policy class, also widen the area of responsibility of the policy to include keeping alive any resource the shard readers might need on their home shard. Implementing a proper reader cleanup is now not optional either. This patch only adds the `reader_managing_policy` interface, refactoring the multishard reader to use it will be done in the next patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	301abaca07	multishard_combining_reader: drop unnecessary `reader_promise` member The `reader_promise` member of the `shard_reader` was used to synchronize a foreground request to create the underlying reader with an ongoing background request with the same goal. This is however unnecessary. The underlying reader is created in the background only as part of a read ahead. In this case there is no need for an extra synchronization point, the foreground reader create request can just wait for the read ahead to finish, for which there already exists a means. Furthermore, foreground reader create requests are always followed by a `fill_buffer()` request, so by waiting on the read ahead we ensure that the following `fill_buffer()` call will not block.	2018-12-04 08:51:05 +02:00
Botond Dénes	a73175fdbc	multishard_combining_reader: drop tracking of pending next_partition calls Shard readers used to track pending `next_partition()` calls that they couldn't execute, because their underlying reader wasn't created yet. These pending calls were then executed after the reader was created. However the only situation where a shard reader can receive a `next_partition()` call before its underlying reader was created is when `next_partition()` is called on the multishard reader before a single fragment is read. In this case we know we are at a partition boundary and thus this call has no effect, therefore it is safe to ignore it.	2018-12-04 08:51:05 +02:00
Botond Dénes	ab3e639c3b	foreign_reader: use bool for pending_next_partition Foreign reader doesn't execute `next_partition()` calls straight away, when this would require interaction with the remote reader. Instead these calls are "remembered" and executed on the next occasion the foreign reader has to interact with the remote reader. This was implemented with a counter that counts the number of pending `next_partition()` calls. However when `next_partition()` is called multiple times, without interleaving calls to `operator()()` or `fast_forward_to()`, only the first such call has effect. Thus it doesn't make sense to count these calls, it is enough to just set a flag if there was at least one such call.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	b36733971b	mutation_source_test: add option to skip intra-partition fast-forwarding tests To allow for using this test suite for testing mutation sources that don't support intra-partition fast-forwarding.	2018-12-04 08:51:05 +02:00
Botond Dénes	37f0117747	reader_concurrency_semaphore: refactor eviction mechanism As we are about to add multiple sources of evictable readers, we need a more scalable solution than a single functor being passed that opaquely evicts a reader when called. Add a generic way to register and unregister evictable (inactive) readers to the semaphore. The readers are expected to be registered when they become evictable and are expected to be unregistered when they cease to be evictable. The semaphore might evict any reader that is registered to it, when it sees fit. This also solves the problem of notifying the semaphore when new readers become evictable. Previously there was no such mechanism, and the semaphore would only evict any such new readers when a new permit was requested from it.	2018-12-04 08:51:00 +02:00
Rafael Ávila de Espíndola	21199a7a5c	Add a filename to a malformed_sstable_exception. It is reasonable for parse() to throw when it finds something wrong with the format. This seems to be the best spot to add the filename and rethrow. Also add a testcase to make sure we keep handling this error gracefully. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola	a6e25e4bd0	Try to read the full sst in broken_sst. With this patch we use data_consume_rows to try to read the entire sstable. The patch also adds a test with a particular corruption that would not be found without parsing the file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:47:49 -08:00
Rafael Ávila de Espíndola	b1190c58ec	Convert tests to SEASTAR_THREAD_TEST_CASE. This will simplify future changes to broken_sst. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:26:06 -08:00
Rafael Ávila de Espíndola	e5c5afffc9	Check the exception message. This makes the tests a bit more strict by also checking the message returned by the what() function. This shows that some of the tests are out of sync with which errors they check for. I will hopefully fix this in another pass. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:31:53 -08:00
Rafael Ávila de Espíndola	f9d81bcd43	Move some tests to broken_sstable_test.cc sstable_test.cc was already a bit too big and there is potential for having a lot of tests about broken sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:16:30 -08:00
Rafael Ávila de Espíndola	cf4dc38259	Simplify state machine loop. These loops have the structure : while (true) { switch (state) { case state1: ... break; case state2: if (...) { ... break; } else {... continue; } ... } break; } There are a couple of things I find a bit odd about that structure: * The break refers to the switch, the continue to the loop. * A while (true) loop always hits a break or a continue. This patch uses early returns to simplify the logic to while (true) { switch (state) { case state1: ... return case state2: if (...) { ... return; } ... } } Now there are no breaks or continues. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181126171726.84629-1-espindola@scylladb.com>	2018-12-03 20:34:03 +01:00
Avi Kivity	b098b5b987	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barrett reduction implementation from libdeflate utils: Extract clmul() from crc.hh	2018-12-03 19:02:01 +02:00
Tomasz Grabiec	aa19f98d18	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map entries in unspecified order, the one returned by std::unordered_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>	2018-12-03 16:40:24 +02:00
Avi Kivity	4dc402b53f	Merge "Create sstable in a sub-directory" from Benny " Due to an XFS heuristic, if all files are in one (or a few) directories, then block allocation can become very slow. This is because XFS divides the disk into a few allocation groups (AGs), and each directory allocates preferentially from a single AG. That AG can become filled long before the disk is full. This patchset works around the problem by: - creating sstable component files in their own temporary, per-sstable sub-directory, - moving the files back into the canonical location right after being created, and finally - removing the temp sub-directory when the sstable is sealed. - In addition, any temporary sub-directories that might have been left over if scylla crashed while creating sstables are looked up and removed when populating the table. Fixes: #3167 Tests: unit (release) " * 'issues/3167/v7' of https://github.com/bhalevy/scylla: distributed_loader::populate_column_family: lookup and remove temp sstable directories database: directly use std::experimental::filesystem::path for lister::path database: use std::experimental::filesystem::path for lister::path sstable: use std::experimental::filesystem rather than boost sstable::seal_sstable: fixup indentation sstable: create sstable component files in a subdirectory sstable::new_sstable_component_file: pass component_type rather than filename sstable: cleanup filename related functions sstable: make write_crc, write_digest, and new_sstable_component_file private methods	2018-12-03 16:26:12 +02:00
Tomasz Grabiec	feefb23232	tests: perf_checksum: Test fast_crc32_combine()	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	dda0f9b6eb	tests: Rename libdeflate_test to checksum_utils_test	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	7febdb5a5c	tests: libdeflate: Add more tests for checksum_combine()	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	b22ed75416	tests: libdeflate: Check both libdeflate and default checksummers	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	1eb03b6ff1	sstables: Use fast_crc_combine() in the default checksummer	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	1fb792c547	utils/gz: Add fast implementation of crc32_combine() zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977).	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	cd3d9d357b	utils/gz: Add pre-computed polynomials gen_crc_combine_table.cc will be run during build to produce tables with precomputed polynomials (4 x 256 x u32). The definitions will reside in: build/<mode>/gen/utils/gz/crc_combine_table.cc It takes 20ms to generate on my machine. The purpose of those polynomials will be explained in crc_combine.cc	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	63e0da9e58	utils/gz: Import Barrett reduction implementation from libdeflate	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	bb7d95d6c3	utils: Extract clmul() from crc.hh	2018-12-03 14:36:08 +01:00
Botond Dénes	0cb7c43fb5	reader_concurrency_semaphore: add dedicated .cc file As we are about to extend the functionality of the reader concurrency semaphore, adding more method implementations that need to go to a .cc file, it's time we create a dedicated file, instead of continuing to shove them into unrelated .cc files (mutation_reader.cc).	2018-12-03 13:37:02 +02:00
Avi Kivity	d6a22c50cb	Update libdeflate submodule * libdeflate 17ec6c9...e7e54ea (1): > build: improve out-of-tree build with multiple output trees	2018-12-03 11:18:02 +02:00
Botond Dénes	34c2d67614	reader_concurrency_semaphore: rearrange members Use standard convention of the rest of the code base. Type definitions first, then data members and finally member functions. As we are about to add more members, it's especially important to make the growing class have a familiar member arrangement.	2018-12-03 08:26:10 +02:00
Benny Halevy	9e7125a9de	distributed_loader::populate_column_family: lookup and remove temp sstable directories These may be left over in case we crash while writing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	857ff4f59a	database: directly use std::experimental::filesystem::path for lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	585ac6e641	database: use std::experimental::filesystem::path for lister::path We would like to get rid of boost::filesystem and gradually replace it with std::experimental::filesystem. TODO: using namespace fs = std::experimental::filesystem, use fs::path directly, rather than lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	0b74927757	sstable: use std::experimental::filesystem rather than boost Note: Requires linking with -lstdc++fs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	61d116a1f1	sstable::seal_sstable: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	90118fa9ef	sstable: create sstable component files in a subdirectory When writing the sstable, create a temporary directory for creating all components so that each sstable's files will be assigned a different allocation group on XFS. Files are immediately renamed to their default location after creation. Temp directory is removed when the sstable is sealed. Additional work to be introduced in the following patches: - When populating tables, temp directories need to be looked up and removed. Fixes #3167 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	23d8afb20d	sstable::new_sstable_component_file: pass component_type rather than filename So we can create the file in the sstable directory and then move it into the final location Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	7b170eb0dc	sstable: cleanup filename related functions - use const sstring& params rather than sstring - returning const sstring is superfluous Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	ad5f1e4fbb	sstable: make write_crc, write_digest, and new_sstable_component_file private methods Prepare for per-sstable sub directory. Also, these functions get most of their parameters from the sst at hand so they might as well be first class members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Avi Kivity	2a0a36d48b	tools: update toolchain to fedora-29-20181202 Added: git, sudo, python Message-Id: <20181202185608.14141-1-avi@scylladb.com>	2018-12-02 19:00:55 +00:00
Benny Halevy	d257e5c123	sstable: remove unused get_sstable_key_range Since `024c8ef8a1` db: adjust sstable load to use sstable self-reporting of shard ownership Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181202114523.14296-1-bhalevy@scylladb.com>	2018-12-02 18:32:34 +02:00
Avi Kivity	224c4c0b81	tools: add frozen toolchain support Add a reference to a docker image that contains an "official" toolchain for building Scylla. In addition, add a script that allows easy usage of the image, and some documentation. Message-Id: <20181202120829.21218-1-avi@scylladb.com>	2018-12-02 18:32:34 +02:00
Takuya ASADA	0fdf807f51	install-dependencies.sh: add missing packages to run build in Fedora container git, python, sudo packages are installed by default on a normal Fedora installation but not in the Docker image, so we need to install them with this script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181201020834.24961-1-syuu@scylladb.com>	2018-12-02 12:51:29 +02:00
Avi Kivity	009cbd3dcb	Merge "Fix multiple summary regeneration bugs." from Vladimir " This patchset addresses two recently discovered bugs both triggered by summary regeneration: Tests: unit {release} + Validated with debug build of Scylla (ASAN) that no use-after-free occurs when re-generating Summary.db. " * 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla: tests: Add test reading SSTables in 'mc' format with missing summary. sstables: When loading, read statistics before summary. database: Capture io_priority_class by reference to avoid dangling ref.	2018-12-02 11:56:18 +02:00
Vladimir Krivopalov	d24875b736	tests: Add test reading SSTables in 'mc' format with missing summary. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 11:56:56 -08:00
Vladimir Krivopalov	b0e5404071	sstables: When loading, read statistics before summary. In case the summary is missing and we attempt to re-generate it, statistics must already be read to provide us with values stored in the serialization header to facilitate clustering prefixes deserialization. Fixes #3947 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 11:56:56 -08:00
Vladimir Krivopalov	68458148e7	database: Capture io_priority_class by reference to avoid dangling ref. The original reference points to a thread-local storage object that is guaranteed to outlive the continuation, but copying it makes the subsequent calls point to a local object and introduces a use-after-free bug. Fixes #3948 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 10:43:36 -08:00
Piotr Jastrzebski	329303cae7	sstables: Add test_sstable_reader_on_unknown_column This test checks that sstable reader throws an exception when sstable contains a column that's not present in the schema. It also checks that dropped columns do not cause exceptions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 10:29:47 +01:00
Piotr Jastrzebski	5cc3f904ce	sstables: Exception on sstable's column not present in schema Previously such column was ignored but it's better to be explicit about this situation. Refs #2598 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:13 +01:00
Piotr Jastrzebski	c0ce94c6f9	sstables: store column name in column_translation::column_info This will be used for better diagnostics. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:00 +01:00
Duarte Nunes	1afda28cf3	Merge 'Fix filtering with LIMIT' from Piotr " This series adds proper handling of filtering queries with LIMIT. Previously the limit was erroneously applied before filtering, which leads to truncated results. To avoid that, paged filtering queries now use an enhanced pager, which remembers how many rows were dropped and uses that information to fetch more pages if the limit is not yet reached. For unpaged filtering queries, paging is done internally as in case of aggregations to avoid keeping huge results in memory. Also, previously, all limited queries used the page size counted from max(page size, limit). It's not good for filtering, because with LIMIT 1 we would then query for rows one-by-one. To avoid that, filtered queries ask for the whole page and the results are truncated if need be afterwards. Tests: unit (release) " * 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla: tests: add filtering with LIMIT test tests: split filtering tests from cql_query_test cql3: add proper handling of filtering with LIMIT service/pager: use dropped_rows to adjust how many rows to read service/pager: virtualize max_rows_to_fetch function cql3: add counting dropped rows in filtering pager	2018-11-29 23:07:40 +00:00
Piotr Jastrzebski	654eeb30ac	sstables: Make test_dropped_column_handling test dropped columns Before it was testing missing columns. It's better to test dropped columns because they should be ignored while for missing columns some sources will throw. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-29 16:16:44 +01:00
Avi Kivity	2dba809844	Merge "scylla_io_setup: support multiple devices" from Benny " This patchset adds support to scylla_io_setup for multiple data directories as well as commitlog, hints, and saved_caches directories. Refs #2415 Tests: manual testing with scylla-ccm generated scylla.yaml " * 'projects/multidev/v3' of https://github.com/bhalevy/scylla: scylla_io_setup: assume default directories under /var/lib/scylla scylla_io_setup: add support for commitlog, hints, and saved_caches directory scylla_io_setup: support multiple data directories	2018-11-29 16:44:33 +02:00
Piotr Sarna	7adbdaba0b	tests: add filtering with LIMIT test Refs #3902	2018-11-29 14:53:30 +01:00
Piotr Sarna	5f97c78875	tests: split filtering tests from cql_query_test In order to avoid blowing cql_query_test even more out of proportions, all filtering tests are moved to a separate file.	2018-11-29 14:53:30 +01:00
Piotr Sarna	acf4eadf88	cql3: add proper handling of filtering with LIMIT Previously, limit was erroneously applied before filtering, which might have resulted in truncated results. Now, both paged and unpaged queries are filtered first, and only after that properly trimmed so only X rows are returned for LIMIT X. Fixes #3902	2018-11-29 14:53:30 +01:00
Piotr Sarna	5b052bdae5	service/pager: use dropped_rows to adjust how many rows to read Filtering pager may drop some rows and as a result return less than what was fetched from the replica. To properly adjust how many rows were actually read, dropped_rows variable is introduced.	2018-11-29 14:53:29 +01:00
Piotr Sarna	021caeddf7	service/pager: virtualize max_rows_to_fetch function Regular pagers use max_rows to figure out how many rows to fetch, but filtering pager potentially needs the whole page to be fetched in order to filter the results.	2018-11-29 14:14:37 +01:00
Benny Halevy	5ec191536e	scylla_io_setup: assume default directories under /var/lib/scylla If a specific directory is not configured in scylla.yaml, scylla assumes a default location under /var/lib/scylla. Hard code these locations in scylla_io_setup until we have a better way to probe scylla about it. Be permissive and silently ignore the default directories if they do not exist on disk. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 15:07:29 +02:00
Piotr Sarna	4f5ee3dfcd	cql3: add counting dropped rows in filtering pager Counter for dropped rows is added to the filtering pager. This metric can be used later to implement applying LIMIT to filtering queries properly. Dropped rows are returned on visitor::accept_partition_end.	2018-11-29 14:06:59 +01:00
Benny Halevy	88b85b363a	scylla_io_setup: add support for commitlog, hints, and saved_caches directory Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 10:09:17 +02:00
Benny Halevy	e4382caa4a	scylla_io_setup: support multiple data directories Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-11-29 10:09:17 +02:00
Alexys Jacob	00476c3946	scylla-housekeeping: fix python3 compat and shebang	2018-11-29 00:04:02 +01:00
Alexys Jacob	1cf41760a8	dist/ami/files/scylla_install_ami: python3 shebang	2018-11-29 00:00:41 +01:00
Alexys Jacob	a6447f543c	dist/docker/redhat/docker-entrypoint.py: add encoding comment	2018-11-29 00:00:19 +01:00
Alexys Jacob	9f041158df	fix_system_distributed_tables.py: fix python3 compat and shebang	2018-11-28 23:59:51 +01:00
Alexys Jacob	887322daa2	gen_segmented_compress_params.py: add encoding comment	2018-11-28 23:59:18 +01:00
Alexys Jacob	14e65e1089	idl-compiler.py: python3 shebang	2018-11-28 23:58:38 +01:00
Alexys Jacob	170120a391	scylla-gdb.py: python3 shebang	2018-11-28 23:58:14 +01:00
Alexys Jacob	3902922113	configure.py: python3 shebang	2018-11-28 23:57:54 +01:00
Alexys Jacob	d2dbbba139	tools/scyllatop/: add / normalize python3 shebang	2018-11-28 23:57:03 +01:00
Alexys Jacob	e321b839c7	scripts/: add / normalize python3 shebang	2018-11-28 23:56:35 +01:00
Alexys Jacob	02656fb00e	dist/common/scripts: add / normalize python3 shebang	2018-11-28 23:55:26 +01:00
Alexys Jacob	954da947f8	test.py: add encoding comment	2018-11-28 23:54:41 +01:00
Alexys Jacob	cbd72786dd	setup.py: add python3 classifiers	2018-11-28 23:54:03 +01:00
Dan Yasny	019a2e3a27	scylla_setup: Mark required args Fixes #3945 Message-Id: <20181128220549.3083-1-dyasny@gmail.com>	2018-11-28 22:30:02 +00:00
Avi Kivity	de17150cb2	Update seastar submodule * seastar 1fbb633...132e6cd (2): > scripts: json2code: port to Python 3 > docker/dev/Dockerfile: add c-ares-devel to docker setup	2018-11-28 19:05:21 +02:00
Duarte Nunes	a589dade07	Merge 'Fix checking for multi-column restrictions in filtering' from Piotr " This series fixes #3891 by amending the way restrictions are checked for filtering. Previous implementation that returned false from need_filtering() when multi-column restrictions were present was incorrect. Now, the error is going to be returned from restrictions filter layer, and once multi-column support is implemented for filtering, it will require no further changes. Tests: unit (release) " * 'fix_multi_column_filtering_check_3' of https://github.com/psarna/scylla: tests: add multi-column filtering check cql3: remove incorrect multi-column check cql3: check filtering restrictions only if applicable cql3: add pk/ck_restrictions_need_filtering()	2018-11-28 15:36:37 +00:00
Piotr Sarna	ae0ffa6575	tests: add multi-column filtering check Multi-column restrictions filtering is not supported yet, so a simple case to ensure that is added.	2018-11-28 13:58:16 +01:00
Piotr Sarna	0013929782	cql3: remove incorrect multi-column check need_filtering() incorrectly returned false if multi-column restrictions were present. Instead, these restrictions should be allowed to need filtering. Fixes #3891	2018-11-28 13:58:16 +01:00
Piotr Sarna	65f21cc518	cql3: check filtering restrictions only if applicable Primary key restrictions should be checked only when they need filtering - otherwise it's superfluous, since they were already applied on query level.	2018-11-28 13:58:16 +01:00
Piotr Sarna	f59ddcab52	cql3: add pk/ck_restrictions_need_filtering() These functions return true if partition/clustering key restriction parts of statement restrictions require filtering.	2018-11-28 13:58:16 +01:00
Duarte Nunes	d09d4bbd91	Merge 'Fix checking if system tables need view updates' from Piotr " This miniseries ensures that system tables are not checked for having view updates, because they never do. What's more, distributed system table is used in the process, so it's unsafe to query the table while streaming it. Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test) " * 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla: streaming: don't check view building of system tables database: add is_internal_keyspace streaming: remove unused sstable_is_staging bool class	2018-11-28 10:00:34 +00:00
Piotr Sarna	8e6021dfa1	streaming: don't check view building of system tables System tables will never need view building, and, what's more, are actually used in the process of view build checking. So, checking whether system tables need a view update path is simplified to returning 'false'.	2018-11-28 09:21:56 +01:00
Piotr Sarna	1336b9ee31	database: add is_internal_keyspace Similarly to is_system_keyspace, it will allow checking if a keyspace is created for internal use.	2018-11-28 09:21:56 +01:00
Piotr Sarna	6ad2c39f88	streaming: remove unused sstable_is_staging bool class sstable_is_staging bool class is not used anywhere in the code anymore, so it's removed.	2018-11-28 09:21:56 +01:00
Duarte Nunes	9f639edaa2	Merge 'storage_proxy: fix some bugs in early (due to errors) request completion' from Gleb " The series fixed #3565 and #3566 " * 'gleb/write_failure_fixes' of github.com:scylladb/seastar-dev: storage_proxy: store hint for CL=ANY if all nodes replied with failure storage_proxy: complete write request early if all replicas replied with success or failure storage_proxy: check that write failure response comes from recognized replica storage_proxy: move code executed on write timeout into separate function	2018-11-27 21:44:01 +00:00
Takuya ASADA	52f030806f	install-dependencies.sh: fix dependency issues on Debian variants Sync Debian variants dependencies with dist/debian/control.mustache (before merging relocatable), use scylla 3rdparty packages. Since we use 3rdparty repo on seastar/install-dependencies.sh, drop repo setup part from this script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031122800.11802-1-syuu@scylladb.com>	2018-11-27 21:44:01 +00:00
Gleb Natapov	17197fb005	storage_proxy: store hint for CL=ANY if all nodes replied with failure Current code assumes that request failed if all replicas replied with failure, but this is not true for CL=ANY requests. Take it into account. Fixed: #3565	2018-11-27 15:06:37 +02:00
Gleb Natapov	d1d04eae3c	storage_proxy: complete write request early if all replicas replied with success or failure Currently if write request reaches CL and all replicas replied, but some replied with failures, the request will wait for timeout to be retired. Detect this case and retire request immediately instead. Fixes #3566	2018-11-27 14:49:37 +02:00
Gleb Natapov	76ab3d716b	storage_proxy: check that write failure response comes from recognized replica Before accounting failure response we need to make sure it comes from a replica that participates in the request.	2018-11-27 14:44:49 +02:00
Rafael Ávila de Espíndola	777ea893e6	Delete data_consume_rows_at_once. As far as I can tell the old sstable reading code required reading the data into a contiguous buffer. The function data_consume_rows_at_once implemented the old behavior and incrementally code was moved away from it. Right now the only use is in two tests. The sstables used in those tests are already used in other tests with data_consume_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181127024319.18732-2-espindola@scylladb.com>	2018-11-27 14:11:50 +02:00
Avi Kivity	1ff6b8fb96	Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr " Compression is not deterministic so instead of binary comparing the sstable files we just read data back and make sure everything that was written down is still present. Tests: unit(release) " * 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev: sstables: Remove compressed parameter from get_write_test_path sstables: Remove unused sstable test files sstables: Ensure compare_sstables isn't used for compressed files sstables: Don't binary compare compressed sstables sstables: Remove debug printout from test_write_many_partitions	2018-11-27 14:01:20 +02:00
Duarte Nunes	098dd90bd2	Merge 'Reduce dependencies around consistency_level.hh' from Avi " consistency_level.hh is rather heavyweight in both its contents and what it includes. Reduce the number of inclusion sites and split the file to reduce dependencies. " * tag 'cl-header/v2' of https://github.com/avikivity/scylla: consistency_level: simplify validation API Split consistency_level.hh header database: remove unneeded consistency_level.hh include cql: remove unneeded includes of consistency_level.hh	2018-11-27 11:59:34 +00:00
Piotr Jastrzebski	4366302c4c	sstables: Extract mp_row_cosumer_m::check_schema_mismatch This method will contain common logic used in multiple places and reduce code duplication. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <bbda2f4ea4f9325055f096dc549f63b1bb03d3b6.1543311990.git.piotr@scylladb.com>	2018-11-27 12:45:12 +01:00
Avi Kivity	4676e07400	consistency_level: simplify validation API Remove unused parameters, replace refcounted pointers by references.	2018-11-27 13:41:49 +02:00
Avi Kivity	2c08bff8d5	Split consistency_level.hh header It has two unrelated users: cql for validation, and storage_proxy for complicated calculations. Split the simple stuff into a new header to reduce dependencies.	2018-11-27 13:32:10 +02:00
Avi Kivity	b015f41344	database: remove unneeded consistency_level.hh include	2018-11-27 13:30:56 +02:00
Gleb Natapov	7bc68aa0eb	storage_proxy: move code executed on write timeout into separate function Currently the callback is in lambda, but we will want to call the code not only during timer expiration.	2018-11-27 13:23:30 +02:00
Avi Kivity	9201d22c06	cql: remove unneeded includes of consistency_level.hh Move the includes to .cc to reduce include pollution.	2018-11-27 13:18:33 +02:00
Raphael S. Carvalho	626afa6973	database: conditionally release sstable references from compaction manager Not all compaction operations submitted through compaction manager set a callback for releasing references of exhausted sstables in compaction manager itself. That callback lives in compaction descriptor which is passed to table::compaction(). Let's make the call conditional to avoid bad function call exceptions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>	2018-11-27 12:10:43 +01:00
Avi Kivity	2eaeb3e4eb	Update swagger-ui submodule Updates to version 2.2.10 with a local change (from Amnon) to support our location. Fixes #3942.	2018-11-27 13:01:02 +02:00
Tomasz Grabiec	17a8a9d13d	gdb: Properly parse unique_ptr in 'scylla lsa' There's no _M_t._M_head_impl any more in the standard library. We now have std_unique_ptr wrapper which abstracts this fact away so use that. Message-Id: <20181126174837.11542-1-tgrabiec@scylladb.com>	2018-11-27 12:32:41 +02:00
Tomasz Grabiec	eecda72175	gdb: Adjust 'scylla lsa' for removal of emergency reserve There's no _emergency_reserve any more. Show _free_segments instead. Message-Id: <20181126174837.11542-2-tgrabiec@scylladb.com>	2018-11-27 12:32:37 +02:00
Avi Kivity	5e759b0c07	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). 
SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include	2018-11-26 20:10:46 +02:00
Tomasz Grabiec	f1a35b654a	tests: perf: Introduce perf_checksum	2018-11-26 18:59:43 +01:00
Tomasz Grabiec	5b6e3fb5ed	tests: Add test for libdeflate CRC32 implementation	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	bf0164cdaf	sstables: compress: Use libdeflate for crc32 Improves memtable flush performance by 10% in a CPU-bound case. Unlike the zlib implementation, libdeflate is optimized for modern CPUs. It utilizes the PCLMUL instruction.	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	0ac1905f4f	sstables: compress: Rename crc32_utils to zlib_crc32_checksummer	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	ba141a4852	licenses: Add libdeflate license	2018-11-26 18:59:41 +01:00
Tomasz Grabiec	048d569b45	Integrate libdeflate with the build system	2018-11-26 18:59:09 +01:00
Tomasz Grabiec	f704f7bc19	Add libdeflate submodule	2018-11-26 18:57:51 +01:00
Tomasz Grabiec	743cf43847	sstables: Avoid checksum_combine() for the crc32 checksummer checksum_combine() is much slower than re-feeding the buffer to checksum() for the zlib CRC32 checksummer. Introduce Checksum::prefer_combine() to determine this and select more optimal behavior for given checksummer. Improves performance of memtable flush with compression enabled by 30%.	2018-11-26 18:57:33 +01:00
Avi Kivity	b351a9fee7	db/repair_decision.hh: add missing #include Message-Id: <20181126154948.2453-1-avi@scylladb.com>	2018-11-26 18:49:08 +01:00
Tomasz Grabiec	88cf1c61ba	sstables: compress: Avoid unnecessary checksum_combine()	2018-11-26 14:31:38 +01:00
Tomasz Grabiec	8372cf7bcc	sstables: checksum_utils: Add missing include	2018-11-26 14:31:38 +01:00
Avi Kivity	c6d700279b	class_registry: introduce a non-static variant of class_registry class_registry's staticness has the usual problem of static classes (loss of dependency information) and prevents us from librarifying Scylla since all objects that define a registration must be linked in. Take a first step against this staticness by defining a nonstatic variant. The static class_registry is then redefined in terms of the nonstatic class. After all uses have been converted, the static variant can be retired. Message-Id: <20181126130935.12837-1-avi@scylladb.com>	2018-11-26 13:30:21 +00:00
Paweł Dziepak	62ea153629	Merge "Check for schema mismatch after dropping dead cells" from Piotr " Previously we were checking for schema incompatibility between current schema and sstable serialization header before reading any data. This isn't the best approach because data in sstable may be already irrelevant due to column drop for example. This patchset moves the check after actual data is read and verified that it has a timestamp new enough to classify it as nonobsolete. Fixes #3924 " * 'haaawk/3924/v3' of github.com:scylladb/seastar-dev: sstables: Enable test_schema_change for MC format sstables3: Throw error on schema mismatch only for live cells sstables: Pass column_info to consume_*_column sstables: Add schema_mismatch to column_info sstables: Store column data type in column_info sstables: Remove code duplication in column_translation	2018-11-26 13:10:18 +00:00
Avi Kivity	9a46ee69d4	doc: fix BYPASS CACHE documentation BYPASS CACHE was mistakenly documenting an earlier version of the patch. Correct it to document the committed version. Message-Id: <20181126125810.9344-1-avi@scylladb.com>	2018-11-26 13:04:52 +00:00
Piotr Jastrzebski	dec48dd1e2	sstables: Remove compressed parameter from get_write_test_path This parameter is no longer used. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:46:23 +01:00
Piotr Jastrzebski	92ffccd636	sstables: Remove unused sstable test files Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:15 +01:00
Piotr Jastrzebski	a29c9189cb	sstables: Ensure compare_sstables isn't used for compressed files Binary comparing compressed sstables is wrong because compression is not deterministic. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:15 +01:00
Piotr Jastrzebski	7e263208f0	sstables: Don't binary compare compressed sstables This family of test_write_many_partitions_* tests writes sstables down from memtable using different compressions. Then it compares the resulting file with a blueprint file and reads the data back to check everything is there. Compression is not deterministic so this patch makes the tests not compare resulting compressed sstable file with blueprint file and instead only read data back. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:35:03 +01:00
Piotr Jastrzebski	5c86294a56	sstables: Enable test_schema_change for MC format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:25:23 +01:00
Piotr Jastrzebski	4bdb86c712	sstables3: Throw error on schema mismatch only for live cells Previously we were throwing exception during the creation of column_translation. This wasn't always correct because sometimes column for which the mismatch appeared was already dropped and data present in sstable should be ignored anyway. Fixes #3924 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:25:10 +01:00
Piotr Sarna	6ab8235369	main: fix deinitialization order for view update generator View update generator should be stopped only after drain_on_shutdown() is performed on storage service. Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>	2018-11-26 11:21:37 +00:00
Duarte Nunes	2a371c2689	Merge 'Allow bypassing cache on a per-query basis' from Avi " Some queries are very unlikely to hit cache. Usually this includes range queries on large tables, but other patterns are possible. While the database should adapt to the query pattern, sometimes the user has information the database does not have. By passing this information along, the user helps the database manage its resources more optimally. To do this, this patch introduces a BYPASS CACHE clause to the SELECT statement. A query thus marked will not attempt to read from the cache, and instead will read from sstables and memtables only. This reduces CPU time spent to query and populate the cache, and will prevent the cache from being flooded with data that is not likely to be read again soon. The existing cache disabled path is engaged when the option is selected. Tests: unit (release), manual metrics verification with ccm with and without the BYPASS CACHE clause. Ref #3770. " * tag 'cache-bypass/v2' of https://github.com/avikivity/scylla: doc: document SELECT ... BYPASS CACHE tests: add test for SELECT ... BYPASS CACHE cql: add SELECT ... BYPASS CACHE clause db: add query option to bypass cache	2018-11-26 09:59:40 +00:00
Paweł Dziepak	13385778fd	Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz * tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla: tests: perf_fast_forward: Measure performance of dataset population tests: perf_fast_forward: Record the dataset on which test case was run tests: perf_fast_forward: Introduce the concept of a dataset tests: perf_fast_forward: Introduce make_compaction_disabling_guard() tests: perf_fast_forward: Initialize output manager before population tests: perf_fast_forward: Handle empty test parameter set tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() tests: perf_fast_forward: Factor out access to cfg to a single place per function tests: perf_fast_forward: Extract result_collector tests: perf_fast_forward: Take writes into account in AIO statistics tests: perf_fast_forward: Reorder members tests: perf_fast_forward: Add --sstable-format command line option	2018-11-26 09:45:55 +00:00
Avi Kivity	58033ad3a4	doc: document SELECT ... BYPASS CACHE Add a new cql-extensions.md file and document BYPASS CACHE there.	2018-11-26 11:37:52 +02:00
Avi Kivity	f69401c609	tests: add test for SELECT ... BYPASS CACHE The test verifies that cache read metrics are not incremented during a cache bypass read.	2018-11-26 11:37:52 +02:00
Avi Kivity	ecf3f92ec7	cql: add SELECT ... BYPASS CACHE clause The BYPASS CACHE clause instructs the database not to read from or populate the cache for this query. The new keywords (BYPASS and CACHE) are not reserved.	2018-11-26 11:37:49 +02:00
Takuya ASADA	7740cd2142	dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants We do specify the correct repo for both Red Hat/Debian variants on -daily, but mistakenly don't for -restart, so do the same on -restart. Fixes #3906 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181109224509.27380-1-syuu@scylladb.com>	2018-11-26 11:02:25 +02:00
Rafael Ávila de Espíndola	6746907999	Use fully covered switches in continuous_data_consumer do_process_buffer had two unreachable default cases and a long if-else-if chain. This converts the if-else-if chain to a switch and a helper function. This moves the error checking from run time to compile time. If we were to add a 128 bit integer for example, gcc would complain about it missing from the switch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181125221451.106067-1-espindola@scylladb.com>	2018-11-25 22:52:11 +00:00
Avi Kivity	b4765af790	Merge "Introduce SSTable-run-based compaction" from Raphael " This new compaction approach consists of releasing exhausted fragments[1] of a run[2] as compaction proceeds, considerably decreasing the space requirement. These changes will immediately benefit leveled strategy because it already works with the run concept. [1] fragment is a sstable composing a run; exhausted means sstable was fully consumed by compaction procedure. [2] run is a set of non-overlapping sstables which roughly span the entire token range. Note: Last patch includes an example compaction strategy showing how to work with the interface. unit tests: all modes passing dtests: compaction ones passing " * 'sstable_run_based_compaction_v10' of github.com:raphaelsc/scylla: tests: add example compaction strategy for sstable run based approach sstables/compaction: propagate sstable replacement to all compaction of a CF sstables: store cf pointer in compaction_info tests/sstable_test: add test for compaction replacement of exhausted sstable sstables: add sstable's on closed handling tests/sstables: add test for sstable run based compaction sstables/compaction_manager: prevent partial run from being selected for compaction compaction: use same run identifier for sstables generated by same compaction sstables: introduce sstable run sstables/compaction_manager: release reference to exhausted sstable through callback sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor database: do not keep reference to sstable in selector when done selecting compaction: share sstable set with incremental reader selector sstables/compaction: release space earlier of exhausted input sstables sstables: make partitioned sstable set's incremental selector resilient to changes in the set database: do not store reference to sstable in incremental selector tests/sstables: add run identifier correctness test sstables: use a random uuid for sstables without run identifier 
sstables: add run identifier to scylla metadata	2018-11-25 17:20:24 +02:00
Avi Kivity	b835b93ee6	db: add query option to bypass cache With the option enabled, we bypass the cache unconditionally and only read from memtables+sstables. This is useful for analytics queries.	2018-11-25 16:26:08 +02:00
Piotr Jastrzebski	c2561a2796	sstables: Remove debug printout from test_write_many_partitions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-25 13:29:10 +01:00
Raphael S. Carvalho	3fa70d6b5f	tests: add example compaction strategy for sstable run based approach Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 20:16:54 -02:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, thereby preventing compaction B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and compactions in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
Raphael S. Carvalho	baf89f0df3	tests/sstable_test: add test for compaction replacement of exhausted sstable Make sure that compaction is capable of releasing exhausted sstable space early in the procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:26 -02:00
Raphael S. Carvalho	824c20b76d	sstables: add sstable's on closed handling Motivation is that it will be useful for catching regression on compaction when releasing early exhausted sstables. That's because sstable's space is only released once it's closed. So this will allow us to write a test case and possibly use it for entities holding exhausted sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:25 -02:00
Raphael S. Carvalho	0085e8371d	tests/sstables: add test for sstable run based compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:23 -02:00
Raphael S. Carvalho	e88d1d54b9	sstables/compaction_manager: prevent partial run from being selected for compaction Filter out sstable belonging to a partial run being generated by an ongoing compaction. Otherwise, that could lead to wrong decisions by the compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:22 -02:00
Raphael S. Carvalho	23884fe9f6	compaction: use same run identifier for sstables generated by same compaction SSTables composing the same run will share the same run identifier. Therefore, a new compaction strategy will be able to get all sstables belonging to the same run from sstable_set, which now keeps track of existing runs. Same UUID is passed to writers of a given compaction. Otherwise, a new UUID is picked for every sstable created by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:20 -02:00
Raphael S. Carvalho	4f68cb34a6	sstables: introduce sstable run sstable run is a structure that will hold all sstables that have the same run identifier. All sstables belonging to the same run will not overlap with one another. It can be used by compaction strategy to work on runs instead of individual sstables. sstable_set structure which holds all sstables for a given column family will be responsible for providing to its user an interface to work with runs instead of individual sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:18 -02:00
Raphael S. Carvalho	fc92fb955d	sstables/compaction_manager: release reference to exhausted sstable through callback That's important for the reference to sstable to not be kept throughout the compaction procedure, which would break the goal of releasing space during compaction. Manager passes a callback to compaction which calls it whenever there's sstable replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:16 -02:00
Raphael S. Carvalho	3f309ebba9	sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor Motivation is that we want to release space for exhausted sstable and that will only happen when all references to it are gone and that backlog tracker takes the early replacement into account. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:13 -02:00
Raphael S. Carvalho	3433de3dc0	database: do not keep reference to sstable in selector when done selecting When compacting, we'll create all readers at once and will not select again from incremental selector, meaning the selector will keep all respective sstables in current_sstables, preventing compaction from releasing space as it goes on. The change is about refreshing sstable set's selector such that it will not hold a reference to an exhausted sstable whatsoever. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:12 -02:00
Raphael S. Carvalho	f6df949c1a	compaction: share sstable set with incremental reader selector By doing that, we'll be able to release exhausted sstable from both simultaneously. That's achieved by sharing set containing input sstables with the incremental reader selector and removing exhausted sstables from shared set when the time has come. Step towards reducing disk requirement for compaction by making it delete an sstable whose data is all in a sealed new sstable. For that to happen, all references must be gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:10 -02:00
Raphael S. Carvalho	e5a0b05c15	sstables/compaction: release space earlier of exhausted input sstables Currently, compaction only replaces input sstables at end of compaction, meaning compaction must be finished for all the space of those sstables to be released. What we can do instead is to delete some input sstables earlier under some conditions: 1) SStable data should be committed to a new, sealed output sstable, meaning it's exhausted. 2) Exhausted sstable mustn't overlap with a non-exhausted sstable because a tombstone in the exhausted could have been purged and the shadowed data in non-exhausted could be resurrected if system crashes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:07 -02:00
Raphael S. Carvalho	ace070c8fc	sstables: make partitioned sstable set's incremental selector resilient to changes in the set The motivation is that compaction may remove a sstable from the set while the incremental selector is alive, and for that to work, we need to invalidate the iterators stored by the selector. We could have added a method to notify it, but there will be a case where the one keeping the set cannot forward the notification to the selector. So it's better for the selector to take care of itself. Change counter approach is used which allows the selector to know when to invalidate the iterators. After invalidation, selector will move the iterator back into its right place by looking for lower bound for current pos. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:05 -02:00
Raphael S. Carvalho	8d11b0bbb4	database: do not store reference to sstable in incremental selector Use sstable generation instead to keep track of read sstables. The motivation is that we'll not keep reference to sstables, thus allowing their space on disk to be released as soon as they get exhausted. Generation is used because it guarantees uniqueness of the sstable. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:04 -02:00
Raphael S. Carvalho	edc87014c1	tests/sstables: add run identifier correctness test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:02 -02:00
Raphael S. Carvalho	a66b1954cc	sstables: use a random uuid for sstables without run identifier Older sstables must have an identifier for them to be associated with their own run. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:01 -02:00
Raphael S. Carvalho	62025fa52c	sstables: add run identifier to scylla metadata It identifies a run which a particular sstable belongs to. Existing sstables will have a random uuid associated with it in memory. UUID is the correct choice because it allows sstables to be exported without having conflicts when using identifier generated by different nodes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola	d18bbe9d45	Remove unreachable default cases. These switches are fully covered. We can be sure they will stay this way because of -Werror and gcc's -Wswitch warning. We can also be sure that we never have an invalid enum value since the state machine values are not read from disk. The patch also removes a superfluous ';'. Message-Id: <20181124020128.111083-1-espindola@scylladb.com>	2018-11-24 09:31:51 +00:00
Piotr Jastrzebski	569508158c	sstables: Pass column_info to consume_*_column This will allow checking for schema mismatches and better error messages. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	9ca6877cbd	sstables: Add schema_mismatch to column_info This field is true when there's a mismatch between column type in serialization header and current schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	51fa8e0c94	sstables: Store column data type in column_info This will be used to check schema mismatch and provide informative error message. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Piotr Jastrzebski	99dfb9cc96	sstables: Remove code duplication in column_translation Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-23 21:48:14 +01:00
Raphael S. Carvalho	d29482dce8	sstables: deprecate sstable metadata's ancestors The reason for that is that it's not available in sstable format mc, so we can no longer rely on it in common code for the currently supported formats. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>	2018-11-23 19:38:32 +01:00
Tomasz Grabiec	8e93046abc	tests: perf_fast_forward: Measure performance of dataset population	2018-11-23 19:22:50 +01:00
Tomasz Grabiec	2c95aa4d8d	tests: perf_fast_forward: Record the dataset on which test case was run Now any given test case can potentially run on many different datasets.	2018-11-23 19:22:12 +01:00
Tomasz Grabiec	470552b7ab	tests: perf_fast_forward: Introduce the concept of a dataset A dataset represents a table with data, populated in certain way, with certain characteristics of the schema and data. Before this change, datasets were implicitly defined, with population hard-coded inside the populate() function. This change gathers logic related to datasets into classes, in order to: - make it easier to define new datasets. - be able to measure performance of dataset population in a standardized way. - being able to express constraints on datasets imposed by different test cases. Test cases are matched with possible datasets based on the abstract interface they accept (e.g. clustered_ds, multipartition_ds), and which must be implemented by a compatible dataset. To facilitate this matching, test function is now wrapped into a dataset_acceptor object, with an automatically-generated can_run() virtual method, deduced by make_test_fn(). - be able to select tests to run based on the dataset name. Only tests which are compatible with that dataset will be run.	2018-11-23 19:22:09 +01:00
Tomasz Grabiec	2746f78a9f	tests: perf_fast_forward: Introduce make_compaction_disabling_guard()	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	b00d360281	tests: perf_fast_forward: Initialize output manager before population	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	25dc481030	tests: perf_fast_forward: Handle empty test parameter set	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	38a1b7e87b	tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()	2018-11-23 19:18:10 +01:00
Tomasz Grabiec	a507ca8159	tests: perf_fast_forward: Factor out access to cfg to a single place per function Preparatory change before making n_rows be determined through a dataset object.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	3fc78a25bf	tests: perf_fast_forward: Extract result_collector Extracts the result collection and reporting logic out of run_test_case(). Will be needed in population tests, for which we don't want the looping logic.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	f4a70283ee	tests: perf_fast_forward: Take writes into account in AIO statistics Relevant for population tests. So far all tests were read tests.	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	96f5bd2f46	tests: perf_fast_forward: Reorder members	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	3ac5e8887e	tests: perf_fast_forward: Add --sstable-format command line option	2018-11-23 19:18:09 +01:00
Tomasz Grabiec	564b328b2e	Merge 'Add tests for schema changes' from Paweł This series adds a generic test for schema changes that generates various schema and data before and after an ALTER TABLE operation. It is then used to check correctness of mutation::upgrade() and sstable readers and lead to the discovery of #3924 and #3925. Fixes #3925. * https://github.com/pdziepak/scylla.git schema-change-test/v3.1 schema_builder: make member function names less confusing converting_mutation_partition_applier: fix collection type changes converting_mutation_partition_applier: do not emit empty collections sstable: use format() instead of sprint() tests/random-utils: make functions and variables inline tests: add models for schemas and data tests: generate schema changes tests/mutation: add test for schema changes tests/sstable: add test for schema changes	2018-11-23 15:11:31 +01:00
Paweł Dziepak	09439cd809	tests/sstable: add test for schema changes for_each_schema_change() is used for testing reading an sstable that was written with a different schema. Because of #3924, for now the mc format is not verified this way.	2018-11-23 12:14:06 +00:00
Paweł Dziepak	dc7f9fea5b	tests/mutation: add test for schema changes	2018-11-23 12:14:06 +00:00
Paweł Dziepak	35f9f424e9	tests: generate schema changes This patch adds for_each_schema_change() functions which generate schemas and data before and after some modification to the schema (e.g. adding a column, changing its type). It can be used to test schema upgrades.	2018-11-23 12:14:06 +00:00
Paweł Dziepak	daee4bd3b8	tests: add models for schemas and data This patch introduces a model of Scylla schemas and data, implemented using simple standard library primitives. It can be used for testing the actual schemas, mutation_partitions, etc. used by the schema by comparing the results of various actions. The initial use case for this model was to test schema changes, but there is no reason why in the future it cannot be extended to test other things as well.	2018-11-23 12:14:06 +00:00
Takuya ASADA	cf0d00b81a	dist/ami: fix 'unknown configuration key: "enhanced_networking"' error while building AMI packer 1.3.2 no longer supported enhanced_networking directive, we need to use new directives("sriov_support" and "ena_support") to build with new version. packer provides automatic configuration file fixing tool, so new scylla.json is generated by following command: ./packer/packer fix scylla.json Fixes #3938 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181123053719.32451-1-syuu@scylladb.com>	2018-11-23 08:15:47 +02:00
Paweł Dziepak	91793c0a43	bytes_ostream: drop appending_hash specialisation appending_hash is used for computing hashes that become part of the binary interface. They cannot change between Scylla versions and the same data needs to always result in the same hash. At the moment, appending_hash<bytes_ostream> doesn't fulfil those requirements since it leaks information how the underlying buffer is fragmented. Fortunately, it has no users so it doesn't cause any compatibility issues. Moreover, bytes_ostream is usually used as an output of some serialisation routine (e.g. frozen_mutation_fragment or CQL response). Those serialisation formats do not guarantee that there is a single representation of a given data and therefore are not fit to be hashed by appending_hash. Removing appending_hash<bytes_ostream> may help prevent such incorrect uses. Message-Id: <20181122163823.12759-1-pdziepak@scylladb.com>	2018-11-22 23:53:54 +00:00
Tomasz Grabiec	fb38f0e9f8	Update seastar submodule * seastar b924495...1fbb633 (3): > rpc: Reduce code duplication > tests: perf: Make do_not_optimize() take the argument by const& > doc: Fix import paths in the tutorial	2018-11-22 23:53:54 +00:00
Paweł Dziepak	2a0e929830	tests/random-utils: make functions and variables inline random-utils.hh is a header which may be included in multiple translation units so all members should be non-static inline to avoid any duplication.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	edb5402a73	sstable: use format() instead of sprint() The format message was using the new style formatting markers ("{}") which are understood by format() but not by sprint() (the latter is basically deprecated).	2018-11-22 11:30:31 +00:00
Paweł Dziepak	1fbe33791d	converting_mutation_partition_applier: do not emit empty collections This patch changes the behaviour of the schema upgrade code so that if all cells and the tombstones of a collection are removed during the upgrade the collection is not emitted (as opposed to emitting an empty one). Both behaviours are valid, but the new one makes it more consistent with how atomic cells are upgraded and how schema upgrades work for sstable readers.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	7b12aaa093	converting_mutation_partition_applier: fix collection type changes ALTER TABLE allows changing the type of a collection to a compatible one. This includes changes from a fixed-sized type to a variable-sized one. If that happens the atomic_cells representing collection elements need to be rewritten so that the value size is included. The logic for rewriting atomic cells already exists (for those that are not collection members) and is reused in this patch. Fixes #3925.	2018-11-22 11:30:31 +00:00
Paweł Dziepak	43e0201ec6	schema_builder: make member function names less confusing Right now, schema_builder member functions have names that very poorly convey the actions that are performed for them. This is made even worse by some overloads which drastically change the semantics. For example: schema_builder() .with_column("v1", /* ... */) .without_column("v1", removal_timestamp); Creates a column "v1" and adds information that there was a column with that name that was removed at 'removal_timestamp'. schema_builder() .with_column("v1") .without_column(utf8_type->decompose("v1")); This adds column "v1" and then immediately removes it. In order to clean up this mess the names were changed so that: * with_/without_ functions only add information to the schema (e.g. info that a column was removed, but without removing a column of that name if one exists) * functions whose names start with a verb actually perform that action, e.g. the new remove_column() removes the column (and adds information that it used to exist) as in the second example.	2018-11-22 11:30:31 +00:00
Benny Halevy	dcd18e2b62	remove exec permission from top_k source files This was introduced by `32525f2694` Cc: Rafi Einstein <rafie@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>	2018-11-21 18:38:50 +02:00
Gleb Natapov	b4a8802edc	hints: make hints manager more resilient to unexpected directory content Currently if hints directory contains unexpected directories Scylla fails to start with unhandled std::invalid_argument exception. Make the manager ignore malformed files instead and try to proceed anyway. Message-Id: <20181121134618.29936-2-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Gleb Natapov	9433d02624	hints: add auxiliary function for scanning high level hints directory We scan hints directory in two places: to search for files to replay and to search for directories to remove after resharding. The code that translates directory name to a shard is duplicated. It is simple now, so not a big issue but in case it grows better have it in one place. Message-Id: <20181121134618.29936-1-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Paweł Dziepak	4aa5d83590	Merge "Optimize sstable writing of the MC format" from Tomasz " Tested with perf_fast_forward from: github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1 Using the following command line: build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \ --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \ --datasets small-part The average reported flush throughput was (stdev for the averages is around 4k): - for mc before the series: 367848 frag/s - for lc before the series: 463458 frag/s (= mc.before +25%) - for mc after the series: 429276 frag/s (= mc.before +16%) - for lc after the series: 466495 frag/s (= mc.before +26%) Refs #3874. " * tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla: sstables: mc: Avoid serialization of promoted index when empty sstables: mc: Avoid double serialization of rows tests: sstable 3.x: Do not compare Statistics component utils: Introduce memory_data_sink schema: Optimize column count getters sstables: checksummed_file_data_sink_impl: Bypass output_stream	2018-11-21 13:11:40 +00:00
Tomasz Grabiec	049926bfb8	sstables: mc: Avoid serialization of promoted index when empty calculate_write_size() adds some overhead, even if we're not going to write anything.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	0a9f5b563a	sstables: mc: Avoid double serialization of rows The old code was serializing the row twice. Once to get the size of its block on disk, which is needed to write the block length, and then to actually write the block. This patch avoids this by serializing once into a temporary buffer and then appending that buffer to the data file writer. I measured about 10% improvement in memtable flush throughput with this for the small-part dataset in perf_fast_forward.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	8f686af9af	tests: sstable 3.x: Do not compare Statistics component The Statistics component recorded in the test was generated using a buggy version of Scylla, and is not correct. Exposed by fixing the bug in the way statistics are generated. Rather than comparing binary content, we should have explicit checks for statistics.	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	143fd6e1c2	utils: Introduce memory_data_sink	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	789fac9884	schema: Optimize column count getters	2018-11-21 14:04:27 +01:00
Tomasz Grabiec	8e8b96c6ed	sstables: checksummed_file_data_sink_impl: Bypass output_stream We can avoid the data copying by switching from this: sink -> stream -> sink to this: sink -> sink	2018-11-21 14:04:27 +01:00
Avi Kivity	bb85a21a8f	Merge "compress: Restore lz4 as default compressor" from Duarte " Enables sstable compression with LZ4 by default, which was the long-time behavior until a regression turned off compression by default. Fixes #3926 " * 'restore-default-compression/v2' of https://github.com/duarten/scylla: tests/cql_query_test: Assert default compression options compress: Restore lz4 as default compressor tests: Be explicit about absence of compression	2018-11-21 14:20:39 +02:00
Benny Halevy	76b1c184b7	conf: clean up cassandra references in scylla.yaml Indicate the default scylla directories, rather than Cassandra's. Provide links to Scylla documentation where possible, update links to Cassandra documentation otherwise. Clean up a few typos. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>	2018-11-21 13:04:24 +02:00
Rafael Ávila de Espíndola	7fa7e9716d	Mention scylla-tools-java and scylla-jmx in HACKING.md I struggled a bit finding out why nodetool was not working, so it might be a good idea to expand the documentation a bit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181120233358.25859-1-espindola@scylladb.com>	2018-11-21 12:55:17 +02:00
Tomasz Grabiec	349c9f7a69	HACKING.md: Add a link to the slides about core dump debugging tools Message-Id: <1542793207-1620-1-git-send-email-tgrabiec@scylladb.com>	2018-11-21 11:45:23 +02:00
Michael Munday	53fdde75f6	dht: use little endian byte order explicitly for token hash This avoids a difference between little and big endian systems. We now also calculate a full murmur hash for tokens with less than 8 bytes, however in practice the token size is always 8. Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>	2018-11-21 11:44:29 +02:00
Michael Munday	360374cfde	tests: fix compilation of partitioner_test with boost 1.68 on IBM Z The boost multiprecision library that I am compiling against seems to be missing an overload for the cast to a string. The easy workaround seems to be to call str() directly instead. This also fixes #3922. Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>	2018-11-21 11:43:42 +02:00
Duarte Nunes	9464fffc8c	tests/cql_query_test: Assert default compression options Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:27 +00:00
Duarte Nunes	36dc9e3280	compress: Restore lz4 as default compressor Fixes a regression introduced in `74758c87cd`, where tables started to be created without compression by default (before they were created with lz4 by default). Fixes #3926 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:27 +00:00
Duarte Nunes	5f64e34fcc	tests: Be explicit about absence of compression Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:26 +00:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Takuya ASADA	42baf6a6f7	dist/ami: update packer Update packer to latest version, 1.3.2. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110441.16284-2-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Takuya ASADA	b9a42e83ad	dist/ami: enable AMI build log To make it easier to debug AMI build errors, enable logging. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110441.16284-1-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Takuya ASADA	72411f95cb	reloc/build_reloc.sh: find ninja-build after executing install-dependencies.sh The build environment may not have ninja-build installed before running install-dependencies.sh, so do it after running the script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181031110737.17755-1-syuu@scylladb.com>	2018-11-20 21:29:57 +02:00
Avi Kivity	183c2369f3	Update seastar submodule * seastar a44cedf...d59fcef (10): > dns: Set tcp output stream buffer size to zero explicitly > tests: add libc-ares to travis dependencies > tests: add dns_test to test suite > build: drop bundled c-ares package > prometheus: replace the instance label with an optional one > build: Refactor C++ dialect detection > build: add libatomic to install-depenencies.sh > core: use std::underlying_type for open_flags > core: introduce open_flags::operator& > core: Fix build for `gnu++14`	2018-11-20 21:29:57 +02:00
Tomasz Grabiec	57e25fa0f8	utils: phased_barrier: Make advance_and_await() have strong exception guarantees Currently, when advance_and_await() fails to allocate the new gate object, it will throw bad_alloc and leave the phased_barrier object in an invalid state. Calling advance_and_await() again on it will result in undefined behavior (typically SIGSEGV) because _gate will be disengaged. One place affected by this is table::seal_active_memtable(), which calls _flush_barrier.advance_and_await(). If this throws, subsequent flush attempts will SIGSEGV. This patch rearranges the code so that advance_and_await() has strong exception guarantees. Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>	2018-11-20 16:15:12 +00:00
Glauber Costa	9f403334c8	remove monitor if sstable write failed In (almost) all SSTable write paths, we need to inform the monitor that the write has failed as well. The monitor will remove the SSTable from controller's tracking at that point. Except there is one place where we are not doing that: streaming of big mutations. Streaming of big mutations is an interesting use case, in which it is done in 2 parts: if the writing of the SSTable fails right away, then we do the correct thing. But the SSTables are not committed at that point and the monitors are still kept around with the SSTables until a later time, when they are finally committed. Between those two points in time, it is possible that the streaming code will detect a failure and manually call fail_streaming_mutations(), which marks the SSTable for deletion. At that point we should propagate that information to the monitor as well, but we don't. Fixes #3732 (hopefully) Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181114213618.16789-1-glauber@scylladb.com>	2018-11-20 16:15:12 +00:00
Gleb Natapov	d144e6ceac	messaging_service: enable port load balancing algorithm for RPC server In a homogeneous cluster this will reduce number of internal cross-shard hops per request since RPC calls will arrive to correct shard. Message-Id: <20181118150817.GF2062@scylladb.com>	2018-11-20 16:15:12 +00:00
Michael Munday	b9a2f4a228	dht: fix byte ordered partitioner midpoint calculation New versions of boost saturate the output of the convert_to method so we need to mask the part we want to extract. Updates #3922. Message-Id: <20181116191441.35000-1-mike.munday@ibm.com>	2018-11-16 21:19:06 +02:00
Glauber Costa	c6811bd877	sstables: correctly parse estimated histograms In commit `a33f0d6`, we changed the way we handle arrays during the write and parse code to avoid reactor stalls. Some potentially big loops were transformed into futurized loops, and also some calls to vector resizes were replaced by a reserve + push_back idiom. The latter broke parsing of the estimated histogram. The reason being that the vectors that are used here are already initialized internally by the estimated_histogram object. Therefore, when we push_back, we don't fill the array all the way from index 0, but end up with a zeroed beginning and only push back some of the elements we need. We could revert this array to a resize() call. After all, the reason we are using reserve + push_back is to avoid calling the constructor member for each element, but we don't really expect the integer specialization to do any of that. However, to avoid confusion with future developers that may feel tempted to convert this as well for the sake of consistency, it is safer to just make sure these arrays are zeroed. Fixes #3918 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181116130853.10473-1-glauber@scylladb.com>	2018-11-16 20:52:44 +02:00
Avi Kivity	d708dabab9	doc: add reference to Linux' submitting-patches document Since our development process is a derivative of Linux, almost everything there is pertinent. Message-Id: <20181115184037.5256-1-avi@scylladb.com>	2018-11-16 20:15:40 +02:00
Vladimir Krivopalov	759fbbd5f6	random_mutation_generator: Add row_marker to rows regardless of whether they're deleted. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <f55b91f1349f0e98def6b7ca9755b5ccf4f48a3e.1542308626.git.vladimir@scylladb.com>	2018-11-16 13:17:07 +01:00
Avi Kivity	6548a404b2	Remove patch file committed by mistake	2018-11-15 19:47:55 +02:00
Duarte Nunes	6fbf792777	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com>	2018-11-15 19:41:43 +02:00
Amnon Heiman	25378916bc	API: column_family.hh yield in map_reduce_column_families_locally map_reduce_column_families_locally iterates over all tables (column families) in a shard. If the number of tables is big it can cause latency spikes. This patch replaces the current loop with a do_for_each allowing preemption inside the loop. Fixes #3886 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181115154825.23430-1-amnon@scylladb.com>	2018-11-15 18:58:23 +02:00
Nadav Har'El	45f05b06d2	view_complex_test: fix another ttl In a previous patch I fixed most TTLs in the view_complex_test.cc tests from low numbers to 100 seconds. I missed one. This one never caused problems in practice, but for good form, let's fix it too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115160234.26478-1-nyh@scylladb.com>	2018-11-15 18:03:28 +02:00
Nadav Har'El	78ed7d6d0c	Materialized Views and Secondary Index: no longer experimental After this patch, the Materialized Views and Secondary Index features are considered generally-available and no longer require passing an explicit "--experimental=on" flag to Scylla. The "--experimental=on" flag and the db::config::check_experimental() function remain unused, as we graduated the only two features which used this flag. However, we leave the support for experimental features in the code, to make it easier to add new experimental features in the future. Another reason to leave the command-line parameter behind is so existing scripts that still use it will not break. Fixes #3917 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115144456.25518-1-nyh@scylladb.com>	2018-11-15 17:59:27 +02:00
Vladimir Krivopalov	51afb1d8bd	tests: Generate deleted rows and shadowable tombstones in random_mutation_generator. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <77e956890264023227e07cc6d295df870d0a5af2.1542295208.git.vladimir@scylladb.com>	2018-11-15 16:26:07 +01:00
Avi Kivity	0216f49bb0	Merge "Add filtering support for CONTAINS" from Piotr " This series enables filtering support for CONTAINS restriction. " * 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla: tests: add CONTAINS test case to filtering tests cql3: enable filtering for CONTAINS restriction cql3: add is_satisfied_by(bytes_view) for CONTAINS	2018-11-15 16:49:29 +02:00
Nadav Har'El	4108458b8e	view_complex_test: increase low ttl which may fail test on busy machine Several of the tests in tests/view_complex_test.cc set a cell with a TTL, and then skip time ahead artificially with forward_jump_clocks(), to go past the TTL time and check the cell disappeared as expected. The TTLs chosen for these tests were arbitrary numbers - some had 3 seconds, some 5 seconds, and some 60 seconds. The actual number doesn't matter - it is completely artificial (we move the clock with forward_jump_clocks() and never really wait for that amount of time) and could very well be a million seconds. But low numbers, like the 3 seconds, present a problem on extremely overcommitted test machines. Our eventually() function already allows for the possibility that things can hang for up to 8 seconds, but with a 3 second TTL, we can find ourselves with data being expired and the test failing just after 3 seconds of wall time have passed - while the test intended that the data will expire only when we explicitly call forward_jump_clocks(). So this patch changes all the TTLs in this test to be the same high number - 100 seconds. This hopefully fixes #3918. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115125607.22647-1-nyh@scylladb.com>	2018-11-15 15:34:08 +02:00
Piotr Jastrzebski	411437f320	Fix format string in mutation_partition::operator<< fmt does not allow bool values for :d and previous format string was resulting in: fmt::v5::format_error: invalid type specifier Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <3980a3cdb903263e29689b1c6cd24e3592826fe0.1542284205.git.piotr@scylladb.com>	2018-11-15 12:22:10 +00:00
Yannis Zarkadas	d292d0c78d	dist/redhat: extend docker entrypoint with more cmd flags With the use of Docker image, some extra options needed to be exposed to provide extended functionality when starting the image. The flags added by this commit are: - cluster-name: name of the Scylla cluster. cluster_name option in scylla.yaml. - rpc-address: IP address for client connections (CQL). rpc_address option in scylla.yaml. - endpoint-snitch: The snitch used to discover the cluster topology. endpoint_snitch option in scylla.yaml. - replace-address-first-boot: Replace a Scylla node by its IP. replace_address_first_boot option in scylla.yaml. Signed-off-by: Yannis Zarkadas <yanniszarkadas@gmail.com> [ penberg@scylladb.com: fix up merge conflicts ] Message-Id: <20181108234212.19969-2-yanniszarkadas@gmail.com>	2018-11-15 09:07:52 +02:00
Alexys Jacob	cd9d01cd7e	test.py: coding style fixes test.py:26:1: F401 'signal' imported but unused test.py:27:1: F401 'shlex' imported but unused test.py:28:1: F401 'threading' imported but unused test.py:173:1: E305 expected 2 blank lines after class or function definition, found 1 test.py:181:34: E241 multiple spaces after ',' test.py:183:34: E241 multiple spaces after ',' test.py:209:24: E222 multiple spaces after operator test.py:240:5: E301 expected 1 blank line, found 0 test.py:249:23: W504 line break after binary operator test.py:254:9: E306 expected 1 blank line before a nested definition, found 0 test.py:263:13: F841 local variable 'out' is assigned to but never used test.py:264:33: E128 continuation line under-indented for visual indent test.py:265:33: E128 continuation line under-indented for visual indent test.py:266:33: E128 continuation line under-indented for visual indent test.py:274:64: F821 undefined name 'e' test.py:278:53: F821 undefined name 'e' Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104115255.22547-1-ultrabug@gentoo.org>	2018-11-14 19:25:14 +02:00
Alexys Jacob	e76a1085d3	scylla-gdb.py: coding style fixes scylla-gdb.py:1:11: E401 multiple imports on one line scylla-gdb.py:5:1: F811 redefinition of unused 're' from line 2 scylla-gdb.py:10:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:19:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:24:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:30:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:39:9: E722 do not use bare 'except' scylla-gdb.py:47:33: E711 comparison to None should be 'if cond is None:' scylla-gdb.py:63:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:90:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:115:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:139:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:161:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:184:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:204:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:210:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:214:5: E301 expected 1 blank line, found 0 scylla-gdb.py:221:5: E301 expected 1 blank line, found 0 scylla-gdb.py:224:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:252:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:267:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:284:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:300:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:314:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:318:5: E301 expected 1 blank line, found 0 scylla-gdb.py:322:5: E301 expected 1 blank line, found 0 scylla-gdb.py:337:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:339:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:342:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:345:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:348:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:352:129: E202 whitespace before ')' scylla-gdb.py:361:1: E302 expected 2 blank lines, found 1 
scylla-gdb.py:363:129: E202 whitespace before ')' scylla-gdb.py:371:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:375:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:378:5: E301 expected 1 blank line, found 0 scylla-gdb.py:383:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:386:5: E301 expected 1 blank line, found 0 scylla-gdb.py:393:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:396:5: E301 expected 1 blank line, found 0 scylla-gdb.py:407:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:410:5: E301 expected 1 blank line, found 0 scylla-gdb.py:412:9: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:439:26: E703 statement ends with a semicolon scylla-gdb.py:462:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:500:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:506:5: E722 do not use bare 'except' scylla-gdb.py:516:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:518:18: E271 multiple spaces after keyword scylla-gdb.py:522:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:530:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:533:5: E301 expected 1 blank line, found 0 scylla-gdb.py:537:13: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:547:9: E722 do not use bare 'except' scylla-gdb.py:550:26: E261 at least two spaces before inline comment scylla-gdb.py:568:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:571:5: E301 expected 1 blank line, found 0 scylla-gdb.py:577:13: E128 continuation line under-indented for visual indent scylla-gdb.py:577:39: E226 missing whitespace around arithmetic operator scylla-gdb.py:583:15: E128 continuation line under-indented for visual indent scylla-gdb.py:596:19: E128 continuation line under-indented for visual indent scylla-gdb.py:609:82: E227 missing whitespace around bitwise or shift operator scylla-gdb.py:609:90: E226 missing whitespace around arithmetic operator scylla-gdb.py:609:113: E226 missing whitespace around 
arithmetic operator scylla-gdb.py:613:1: E303 too many blank lines (3) scylla-gdb.py:645:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:659:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:671:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:678:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:679:9: E128 continuation line under-indented for visual indent scylla-gdb.py:680:9: E128 continuation line under-indented for visual indent scylla-gdb.py:681:9: E128 continuation line under-indented for visual indent scylla-gdb.py:682:9: E128 continuation line under-indented for visual indent scylla-gdb.py:708:12: E111 indentation is not a multiple of four scylla-gdb.py:721:13: E128 continuation line under-indented for visual indent scylla-gdb.py:723:13: E128 continuation line under-indented for visual indent scylla-gdb.py:725:13: E128 continuation line under-indented for visual indent scylla-gdb.py:727:13: E128 continuation line under-indented for visual indent scylla-gdb.py:729:13: E128 continuation line under-indented for visual indent scylla-gdb.py:748:33: E261 at least two spaces before inline comment scylla-gdb.py:770:17: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:795:17: E128 continuation line under-indented for visual indent scylla-gdb.py:796:17: E128 continuation line under-indented for visual indent scylla-gdb.py:797:17: E128 continuation line under-indented for visual indent scylla-gdb.py:798:17: E128 continuation line under-indented for visual indent scylla-gdb.py:800:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:807:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:814:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:820:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:823:5: E301 expected 1 blank line, found 0 scylla-gdb.py:845:35: E703 statement ends with a semicolon scylla-gdb.py:865:91: E703 statement ends with a semicolon scylla-gdb.py:896:9: F841 local variable 'segment_size' is 
assigned to but never used scylla-gdb.py:904:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:907:5: E301 expected 1 blank line, found 0 scylla-gdb.py:915:73: E128 continuation line under-indented for visual indent scylla-gdb.py:916:73: E128 continuation line under-indented for visual indent scylla-gdb.py:917:73: E126 continuation line over-indented for hanging indent scylla-gdb.py:922:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:925:5: E301 expected 1 blank line, found 0 scylla-gdb.py:933:13: E128 continuation line under-indented for visual indent scylla-gdb.py:934:13: E128 continuation line under-indented for visual indent scylla-gdb.py:934:49: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:51: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:74: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:934:76: E251 unexpected spaces around keyword / parameter equals scylla-gdb.py:940:13: E128 continuation line under-indented for visual indent scylla-gdb.py:941:13: E128 continuation line under-indented for visual indent scylla-gdb.py:949:17: E128 continuation line under-indented for visual indent scylla-gdb.py:950:17: E128 continuation line under-indented for visual indent scylla-gdb.py:951:17: E128 continuation line under-indented for visual indent scylla-gdb.py:952:21: E128 continuation line under-indented for visual indent scylla-gdb.py:953:21: E128 continuation line under-indented for visual indent scylla-gdb.py:954:21: E128 continuation line under-indented for visual indent scylla-gdb.py:955:21: E128 continuation line under-indented for visual indent scylla-gdb.py:958:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:958:11: E261 at least two spaces before inline comment scylla-gdb.py:959:1: E302 expected 2 blank lines, found 0 scylla-gdb.py:971:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:989:5: E301 expected 1 blank line, found 0 
scylla-gdb.py:993:5: E301 expected 1 blank line, found 0 scylla-gdb.py:995:5: E301 expected 1 blank line, found 0 scylla-gdb.py:997:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1001:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1005:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1029:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1034:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1037:46: E128 continuation line under-indented for visual indent scylla-gdb.py:1057:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1060:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1071:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1076:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1084:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1093:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1096:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1101:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1104:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1116:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1119:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1123:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1126:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1132:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1135:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1138:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1141:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1147:15: E241 multiple spaces after ':' scylla-gdb.py:1148:15: E241 multiple spaces after ':' scylla-gdb.py:1149:15: E241 multiple spaces after ':' scylla-gdb.py:1150:15: E241 multiple spaces after ':' scylla-gdb.py:1151:15: E241 multiple spaces after ':' scylla-gdb.py:1152:15: E241 multiple spaces after ':' scylla-gdb.py:1153:15: E241 multiple spaces after ':' scylla-gdb.py:1154:15: E241 multiple spaces after ':' scylla-gdb.py:1170:20: E221 multiple spaces before operator scylla-gdb.py:1191:40: E226 
missing whitespace around arithmetic operator scylla-gdb.py:1191:59: E226 missing whitespace around arithmetic operator scylla-gdb.py:1225:1: E305 expected 2 blank lines after class or function definition, found 1 scylla-gdb.py:1227:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1233:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1236:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1240:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1278:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1281:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1284:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1287:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1293:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1296:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1320:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1323:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1355:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1362:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1383:1: E302 expected 2 blank lines, found 1 scylla-gdb.py:1386:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1388:9: E306 expected 1 blank line before a nested definition, found 0 scylla-gdb.py:1397:13: F841 local variable 'selector' is assigned to but never used scylla-gdb.py:1446:5: E301 expected 1 blank line, found 0 scylla-gdb.py:1477:5: E301 expected 1 blank line, found 0 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104113603.1111-1-ultrabug@gentoo.org>	2018-11-14 19:25:14 +02:00
Alexys Jacob	e58eb6d6ab	idl-compiler.py: coding style fixes idl-compiler.py:22:1: F401 'json' imported but unused idl-compiler.py:23:1: F401 'sys' imported but unused idl-compiler.py:24:1: F401 're' imported but unused idl-compiler.py:25:1: F401 'glob' imported but unused idl-compiler.py:27:1: F401 'os' imported but unused idl-compiler.py:54:1: F811 redefinition of unused 'reindent' from line 33 idl-compiler.py:57:1: E302 expected 2 blank lines, found 1 idl-compiler.py:61:1: E302 expected 2 blank lines, found 1 idl-compiler.py:66:1: E302 expected 2 blank lines, found 1 idl-compiler.py:96:1: E302 expected 2 blank lines, found 1 idl-compiler.py:160:1: E302 expected 2 blank lines, found 1 idl-compiler.py:163:1: E302 expected 2 blank lines, found 1 idl-compiler.py:166:1: E302 expected 2 blank lines, found 1 idl-compiler.py:172:1: E302 expected 2 blank lines, found 1 idl-compiler.py:176:1: E302 expected 2 blank lines, found 1 idl-compiler.py:176:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:176:49: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:191:24: E203 whitespace before ':' idl-compiler.py:191:43: E203 whitespace before ':' idl-compiler.py:191:67: E203 whitespace before ':' idl-compiler.py:191:84: E202 whitespace before '}' idl-compiler.py:195:1: E302 expected 2 blank lines, found 1 idl-compiler.py:195:45: E203 whitespace before ',' idl-compiler.py:195:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:195:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:198:28: E225 missing whitespace around operator idl-compiler.py:198:40: E225 missing whitespace around operator idl-compiler.py:198:43: E272 multiple spaces before keyword idl-compiler.py:212:25: E203 whitespace before ':' idl-compiler.py:212:45: E203 whitespace before ':' idl-compiler.py:212:100: E203 whitespace before ':' idl-compiler.py:218:1: E302 expected 2 blank lines, found 1 idl-compiler.py:225:1: 
E302 expected 2 blank lines, found 1 idl-compiler.py:226:11: E271 multiple spaces after keyword idl-compiler.py:228:1: E302 expected 2 blank lines, found 1 idl-compiler.py:235:1: E302 expected 2 blank lines, found 1 idl-compiler.py:238:1: E302 expected 2 blank lines, found 1 idl-compiler.py:241:5: E722 do not use bare 'except' idl-compiler.py:243:1: E305 expected 2 blank lines after class or function definition, found 0 idl-compiler.py:245:1: E302 expected 2 blank lines, found 1 idl-compiler.py:250:25: E231 missing whitespace after ',' idl-compiler.py:253:1: E302 expected 2 blank lines, found 1 idl-compiler.py:256:1: E302 expected 2 blank lines, found 1 idl-compiler.py:263:1: E302 expected 2 blank lines, found 1 idl-compiler.py:266:1: E302 expected 2 blank lines, found 1 idl-compiler.py:267:75: E225 missing whitespace around operator idl-compiler.py:269:1: E302 expected 2 blank lines, found 1 idl-compiler.py:272:1: E302 expected 2 blank lines, found 1 idl-compiler.py:275:1: E302 expected 2 blank lines, found 1 idl-compiler.py:278:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:280:1: E302 expected 2 blank lines, found 1 idl-compiler.py:283:1: E302 expected 2 blank lines, found 1 idl-compiler.py:286:1: E302 expected 2 blank lines, found 1 idl-compiler.py:288:1: E302 expected 2 blank lines, found 0 idl-compiler.py:293:1: E302 expected 2 blank lines, found 1 idl-compiler.py:294:20: E203 whitespace before ':' idl-compiler.py:294:22: E241 multiple spaces after ':' idl-compiler.py:294:51: E203 whitespace before ':' idl-compiler.py:294:55: E202 whitespace before '}' idl-compiler.py:296:1: E302 expected 2 blank lines, found 1 idl-compiler.py:298:23: E203 whitespace before ':' idl-compiler.py:300:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:301:1: E302 expected 2 blank lines, found 0 idl-compiler.py:304:1: E302 expected 2 blank lines, found 1 idl-compiler.py:304:45: E251 unexpected 
spaces around keyword / parameter equals idl-compiler.py:304:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:311:67: E202 whitespace before '}' idl-compiler.py:314:74: E241 multiple spaces after ':' idl-compiler.py:316:114: E241 multiple spaces after ':' idl-compiler.py:316:129: E203 whitespace before ':' idl-compiler.py:326:1: E302 expected 2 blank lines, found 1 idl-compiler.py:328:27: E231 missing whitespace after ',' idl-compiler.py:328:34: E225 missing whitespace around operator idl-compiler.py:330:1: E302 expected 2 blank lines, found 1 idl-compiler.py:332:5: F841 local variable 'typ' is assigned to but never used idl-compiler.py:348:63: E202 whitespace before '}' idl-compiler.py:352:1: E302 expected 2 blank lines, found 1 idl-compiler.py:353:21: E231 missing whitespace after ',' idl-compiler.py:368:30: E203 whitespace before ':' idl-compiler.py:374:30: E203 whitespace before ':' idl-compiler.py:411:57: E203 whitespace before ':' idl-compiler.py:413:1: E302 expected 2 blank lines, found 1 idl-compiler.py:413:64: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:66: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:80: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:82: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:98: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:413:100: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:415:51: E225 missing whitespace around operator idl-compiler.py:417:57: E225 missing whitespace around operator idl-compiler.py:448:1: E302 expected 2 blank lines, found 1 idl-compiler.py:448:60: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:62: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:76: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:78: E251 unexpected 
spaces around keyword / parameter equals idl-compiler.py:448:94: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:448:96: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:451:51: E225 missing whitespace around operator idl-compiler.py:453:57: E225 missing whitespace around operator idl-compiler.py:455:30: E231 missing whitespace after ',' idl-compiler.py:477:1: E302 expected 2 blank lines, found 1 idl-compiler.py:477:48: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:50: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:67: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:477:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:484:24: E222 multiple spaces after operator idl-compiler.py:488:74: E203 whitespace before ':' idl-compiler.py:498:20: E222 multiple spaces after operator idl-compiler.py:507:68: E203 whitespace before ':' idl-compiler.py:507:88: E203 whitespace before ':' idl-compiler.py:514:87: E231 missing whitespace after ',' idl-compiler.py:520:14: E211 whitespace before '(' idl-compiler.py:521:15: E703 statement ends with a semicolon idl-compiler.py:523:1: E302 expected 2 blank lines, found 1 idl-compiler.py:540:47: E231 missing whitespace after ':' idl-compiler.py:542:1: E302 expected 2 blank lines, found 1 idl-compiler.py:542:47: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:49: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:542:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:547:24: E222 multiple spaces after operator idl-compiler.py:553:47: E231 missing whitespace after ':' idl-compiler.py:558:43: E231 missing whitespace after ':' idl-compiler.py:560:1: E302 expected 2 blank lines, found 1 idl-compiler.py:564:1: E302 expected 2 blank 
lines, found 1 idl-compiler.py:564:82: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:84: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:105: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:564:107: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:573:21: E222 multiple spaces after operator idl-compiler.py:576:25: E222 multiple spaces after operator idl-compiler.py:577:13: F841 local variable 'sate' is assigned to but never used idl-compiler.py:584:66: E203 whitespace before ':' idl-compiler.py:589:66: E203 whitespace before ':' idl-compiler.py:589:89: E203 whitespace before ':' idl-compiler.py:589:113: E203 whitespace before ':' idl-compiler.py:600:48: E203 whitespace before ':' idl-compiler.py:600:68: E203 whitespace before ':' idl-compiler.py:602:1: E302 expected 2 blank lines, found 1 idl-compiler.py:602:1: F811 redefinition of unused 'add_vector_node' from line 330 idl-compiler.py:604:38: E231 missing whitespace after ',' idl-compiler.py:604:59: E202 whitespace before ')' idl-compiler.py:607:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:609:1: E302 expected 2 blank lines, found 1 idl-compiler.py:615:39: E231 missing whitespace after ',' idl-compiler.py:622:1: E302 expected 2 blank lines, found 1 idl-compiler.py:630:46: E203 whitespace before ':' idl-compiler.py:637:33: E231 missing whitespace after ':' idl-compiler.py:640:90: E203 whitespace before ':' idl-compiler.py:641:13: F841 local variable 'vr' is assigned to but never used idl-compiler.py:642:1: E305 expected 2 blank lines after class or function definition, found 0 idl-compiler.py:644:1: E302 expected 2 blank lines, found 1 idl-compiler.py:657:1: E302 expected 2 blank lines, found 1 idl-compiler.py:657:51: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:657:53: E251 unexpected spaces around keyword / parameter equals 
idl-compiler.py:657:67: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:657:69: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:660:5: E265 block comment should start with '# ' idl-compiler.py:679:16: E272 multiple spaces before keyword idl-compiler.py:692:56: E271 multiple spaces after keyword idl-compiler.py:695:5: F841 local variable 'is_param_vector' is assigned to but never used idl-compiler.py:699:1: E302 expected 2 blank lines, found 1 idl-compiler.py:699:56: E202 whitespace before ')' idl-compiler.py:711:1: E302 expected 2 blank lines, found 1 idl-compiler.py:719:26: E201 whitespace after '{' idl-compiler.py:730:39: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:730:41: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:733:1: E302 expected 2 blank lines, found 1 idl-compiler.py:735:21: E225 missing whitespace around operator idl-compiler.py:738:1: E302 expected 2 blank lines, found 1 idl-compiler.py:747:1: E305 expected 2 blank lines after class or function definition, found 1 idl-compiler.py:749:1: E302 expected 2 blank lines, found 1 idl-compiler.py:767:17: E211 whitespace before '(' idl-compiler.py:767:26: E203 whitespace before ':' idl-compiler.py:770:5: E303 too many blank lines (2) idl-compiler.py:777:20: E211 whitespace before '(' idl-compiler.py:777:29: E203 whitespace before ':' idl-compiler.py:783:28: E203 whitespace before ':' idl-compiler.py:783:44: E203 whitespace before ':' idl-compiler.py:783:82: E203 whitespace before ':' idl-compiler.py:786:1: E302 expected 2 blank lines, found 1 idl-compiler.py:794:28: E203 whitespace before ':' idl-compiler.py:802:33: E203 whitespace before ':' idl-compiler.py:815:21: E126 continuation line over-indented for hanging indent idl-compiler.py:815:28: E203 whitespace before ':' idl-compiler.py:815:50: E203 whitespace before ':' idl-compiler.py:817:82: E203 whitespace before ':' idl-compiler.py:817:104: E203 
whitespace before ':' idl-compiler.py:827:33: E203 whitespace before ':' idl-compiler.py:827:48: E203 whitespace before ':' idl-compiler.py:827:68: E203 whitespace before ':' idl-compiler.py:827:84: E203 whitespace before ':' idl-compiler.py:827:100: E203 whitespace before ':' idl-compiler.py:859:24: E203 whitespace before ':' idl-compiler.py:859:58: E203 whitespace before ':' idl-compiler.py:859:78: E203 whitespace before ':' idl-compiler.py:861:1: E302 expected 2 blank lines, found 1 idl-compiler.py:865:1: E302 expected 2 blank lines, found 1 idl-compiler.py:876:1: E302 expected 2 blank lines, found 1 idl-compiler.py:876:71: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:876:73: E251 unexpected spaces around keyword / parameter equals idl-compiler.py:883:21: E222 multiple spaces after operator idl-compiler.py:884:28: E225 missing whitespace around operator idl-compiler.py:884:46: E225 missing whitespace around operator idl-compiler.py:884:49: E272 multiple spaces before keyword idl-compiler.py:904:86: E203 whitespace before ':' idl-compiler.py:904:107: E203 whitespace before ':' idl-compiler.py:906:81: E203 whitespace before ':' idl-compiler.py:906:106: E203 whitespace before ':' idl-compiler.py:906:124: E203 whitespace before ':' idl-compiler.py:906:143: E203 whitespace before ':' idl-compiler.py:911:49: E203 whitespace before ':' idl-compiler.py:911:69: E203 whitespace before ':' idl-compiler.py:911:93: E203 whitespace before ':' idl-compiler.py:918:85: E203 whitespace before ':' idl-compiler.py:918:108: E203 whitespace before ':' idl-compiler.py:918:151: E203 whitespace before ':' idl-compiler.py:922:62: E203 whitespace before ':' idl-compiler.py:922:90: E203 whitespace before ':' idl-compiler.py:925:82: E203 whitespace before ':' idl-compiler.py:925:110: E203 whitespace before ':' idl-compiler.py:940:70: E203 whitespace before ':' idl-compiler.py:940:128: E203 whitespace before ':' idl-compiler.py:942:110: E203 whitespace before ':' 
idl-compiler.py:942:168: E203 whitespace before ':' idl-compiler.py:948:25: E203 whitespace before ':' idl-compiler.py:948:75: E203 whitespace before ':' idl-compiler.py:954:78: E203 whitespace before ':' idl-compiler.py:954:101: E203 whitespace before ':' idl-compiler.py:954:144: E203 whitespace before ':' idl-compiler.py:957:62: E203 whitespace before ':' idl-compiler.py:957:90: E203 whitespace before ':' idl-compiler.py:969:13: E271 multiple spaces after keyword idl-compiler.py:971:13: E271 multiple spaces after keyword idl-compiler.py:976:1: E302 expected 2 blank lines, found 1 idl-compiler.py:987:1: E302 expected 2 blank lines, found 1 idl-compiler.py:1016:1: E302 expected 2 blank lines, found 1 idl-compiler.py:1023:42: E225 missing whitespace around operator idl-compiler.py:1024:79: E225 missing whitespace around operator idl-compiler.py:1027:1: E305 expected 2 blank lines after class or function definition, found 0 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104112308.19409-1-ultrabug@gentoo.org>	2018-11-14 19:25:13 +02:00
Alexys Jacob	0cf480aad0	gen_segmented_compress_params.py: coding style fixes gen_segmented_compress_params.py:52:47: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:56:64: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:60:36: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:60:48: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:70:35: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:70:48: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:99:43: E226 missing whitespace around arithmetic operator gen_segmented_compress_params.py:106:18: E225 missing whitespace around operator gen_segmented_compress_params.py:120:5: E303 too many blank lines (2) gen_segmented_compress_params.py:200:30: E261 at least two spaces before inline comment gen_segmented_compress_params.py:200:31: E262 inline comment should start with '# ' gen_segmented_compress_params.py:218:76: E261 at least two spaces before inline comment gen_segmented_compress_params.py:219:59: E703 statement ends with a semicolon gen_segmented_compress_params.py:219:60: E261 at least two spaces before inline comment Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104115753.4701-1-ultrabug@gentoo.org>	2018-11-14 19:25:12 +02:00
Alexys Jacob	43a04ad693	fix_system_distributed_tables.py: coding style fixes fix_system_distributed_tables.py:28:20: E203 whitespace before ':' fix_system_distributed_tables.py:29:20: E203 whitespace before ':' fix_system_distributed_tables.py:30:20: E203 whitespace before ':' fix_system_distributed_tables.py:31:20: E203 whitespace before ':' fix_system_distributed_tables.py:33:20: E203 whitespace before ':' fix_system_distributed_tables.py:34:23: E203 whitespace before ':' fix_system_distributed_tables.py:35:23: E203 whitespace before ':' fix_system_distributed_tables.py:39:20: E203 whitespace before ':' fix_system_distributed_tables.py:40:20: E203 whitespace before ':' fix_system_distributed_tables.py:41:20: E203 whitespace before ':' fix_system_distributed_tables.py:42:20: E203 whitespace before ':' fix_system_distributed_tables.py:43:20: E203 whitespace before ':' fix_system_distributed_tables.py:44:20: E203 whitespace before ':' fix_system_distributed_tables.py:45:20: E203 whitespace before ':' fix_system_distributed_tables.py:46:20: E203 whitespace before ':' fix_system_distributed_tables.py:47:20: E203 whitespace before ':' fix_system_distributed_tables.py:48:20: E203 whitespace before ':' fix_system_distributed_tables.py:52:20: E203 whitespace before ':' fix_system_distributed_tables.py:53:20: E203 whitespace before ':' fix_system_distributed_tables.py:54:20: E203 whitespace before ':' fix_system_distributed_tables.py:55:20: E203 whitespace before ':' fix_system_distributed_tables.py:56:20: E203 whitespace before ':' fix_system_distributed_tables.py:57:20: E203 whitespace before ':' fix_system_distributed_tables.py:58:20: E203 whitespace before ':' fix_system_distributed_tables.py:59:20: E203 whitespace before ':' fix_system_distributed_tables.py:60:20: E203 whitespace before ':' fix_system_distributed_tables.py:61:20: E203 whitespace before ':' fix_system_distributed_tables.py:62:20: E203 whitespace before ':' fix_system_distributed_tables.py:66:19: E203 
whitespace before ':' fix_system_distributed_tables.py:67:19: E203 whitespace before ':' fix_system_distributed_tables.py:72:19: E203 whitespace before ':' fix_system_distributed_tables.py:73:19: E203 whitespace before ':' fix_system_distributed_tables.py:74:19: E203 whitespace before ':' fix_system_distributed_tables.py:78:19: E203 whitespace before ':' fix_system_distributed_tables.py:79:19: E203 whitespace before ':' fix_system_distributed_tables.py:80:19: E203 whitespace before ':' fix_system_distributed_tables.py:84:19: E203 whitespace before ':' fix_system_distributed_tables.py:85:19: E203 whitespace before ':' fix_system_distributed_tables.py:89:19: E203 whitespace before ':' fix_system_distributed_tables.py:90:19: E203 whitespace before ':' fix_system_distributed_tables.py:91:19: E203 whitespace before ':' fix_system_distributed_tables.py:95:22: E203 whitespace before ':' fix_system_distributed_tables.py:96:22: E203 whitespace before ':' fix_system_distributed_tables.py:99:1: E302 expected 2 blank lines, found 0 fix_system_distributed_tables.py:103:72: E201 whitespace after '[' fix_system_distributed_tables.py:103:82: E202 whitespace before ']' fix_system_distributed_tables.py:105:43: E201 whitespace after '[' fix_system_distributed_tables.py:105:53: E202 whitespace before ']' fix_system_distributed_tables.py:111:16: E713 test for membership should be 'not in' fix_system_distributed_tables.py:118:20: E713 test for membership should be 'not in' fix_system_distributed_tables.py:135:25: E722 do not use bare 'except' fix_system_distributed_tables.py:138:5: E722 do not use bare 'except' fix_system_distributed_tables.py:144:1: E305 expected 2 blank lines after class or function definition, found 0 fix_system_distributed_tables.py:145:47: E251 unexpected spaces around keyword / parameter equals fix_system_distributed_tables.py:145:49: E251 unexpected spaces around keyword / parameter equals fix_system_distributed_tables.py:160:1: W391 blank line at end of file 
Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104113001.22783-1-ultrabug@gentoo.org>	2018-11-14 19:25:12 +02:00
Alexys Jacob	c9e3b739ae	dist/docker/redhat/scyllasetup.py: coding style fixes dist/docker/redhat/scyllasetup.py:6:1: E302 expected 2 blank lines, found 1 dist/docker/redhat/scyllasetup.py:41:21: E128 continuation line under-indented for visual indent dist/docker/redhat/scyllasetup.py:65:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:65:51: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:67:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:67:45: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:69:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:69:42: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:79:18: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:79:42: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:80:39: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:81:70: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:84:48: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:84:70: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:86:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:86:53: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:86:78: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:89:42: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:89:58: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:92:44: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:92:63: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:95:41: E225 missing whitespace around operator dist/docker/redhat/scyllasetup.py:95:57: E202 whitespace before ']' dist/docker/redhat/scyllasetup.py:98:22: E201 whitespace after '[' dist/docker/redhat/scyllasetup.py:98:42: E202 whitespace before ']' Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: 
<20181104110913.13796-1-ultrabug@gentoo.org>	2018-11-14 19:25:11 +02:00
Alexys Jacob	1585983fc9	dist/docker/redhat: coding style fixes dist/docker/redhat/docker-entrypoint.py:20:1: E722 do not use bare 'except' dist/docker/redhat/commandlineparser.py:13:13: E128 continuation line under-indented for visual indent Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104120134.9598-1-ultrabug@gentoo.org>	2018-11-14 19:25:10 +02:00
Alexys Jacob	c24e0e5599	dist/common/scripts/scylla_util.py: coding style fixes dist/common/scripts/scylla_util.py:388:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:414:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:418:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:453:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:468:5: E722 do not use bare 'except' dist/common/scripts/scylla_util.py:472:1: E302 expected 2 blank lines, found 1 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181104120832.11273-1-ultrabug@gentoo.org>	2018-11-14 19:25:09 +02:00
Vladimir Krivopalov	2c21fb4897	Use coloured test results in test.py script output. With the number of unit tests approaching one hundred, the output of test.py becomes more challenging to read. If some test fails, we will only get the details after all the tests complete, but some tests take way longer than others. With the coloured status, it is much simpler to immediately locate failing tests. The developer can cancel the others and repeat the failing ones. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <63a99a2fb70fdc33fd6eeb8e18fee977a47bd278.1541541184.git.vladimir@scylladb.com>	2018-11-14 19:23:39 +02:00
Piotr Sarna	b04508041d	tests: add CONTAINS test case to filtering tests	2018-11-14 16:08:19 +01:00
Piotr Sarna	0fc7d63842	cql3: enable filtering for CONTAINS restriction With contains::is_satisfied_by(bytes_view) implemented, it's possible to enable filtering support for CONTAINS restriction. Fixes #3573	2018-11-14 14:39:21 +01:00
Piotr Sarna	d8a1693d84	cql3: add is_satisfied_by(bytes_view) for CONTAINS is_satisfied_by that takes a bytes_view parameter is needed for filtering, so it's provided for CONTAINS restriction.	2018-11-14 14:39:21 +01:00
Botond Dénes	9e4276669b	flat_mutation_reader: document next_partition() Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <01fa57c7473c00e4dc891527a8628026b6dccc01.1542180913.git.bdenes@scylladb.com>	2018-11-14 13:38:38 +00:00
Avi Kivity	447f953a2c	Merge "Add DEFAULT UNSET support to JSON" from Piotr " This series adds DEFAULT UNSET and DEFAULT NULL keyword support to INSERT JSON statement, as stated in #3909. Tests: unit (release) " * 'add_json_default_unset_2' of https://github.com/psarna/scylla: tests: add DEFAULT UNSET case to JSON cql tests tests: split JSON part of cql query test cql3: add DEFAULT UNSET to INSERT JSON	2018-11-13 09:14:50 -08:00
Piotr Sarna	fc4ecf9be4	tests: add DEFAULT UNSET case to JSON cql tests A case covering DEFAULT UNSET/DEFAULT NULL params is added to json cql query test suite. Refs #3909	2018-11-13 18:06:15 +01:00
Piotr Sarna	cb6fd6a30d	tests: split JSON part of cql query test JSON part of cql query test is split into another file to make cql_query_test.cc less huge.	2018-11-13 18:06:15 +01:00
Piotr Sarna	e153e590c1	cql3: add DEFAULT UNSET to INSERT JSON When inserting a JSON, additional DEFAULT UNSET or DEFAULT NULL keywords can be appended. With DEFAULT UNSET, values omitted in JSON will not be changed at all. With DEFAULT NULL (default), omitted values will be treated as having a 'null' value. Fixes #3909	2018-11-13 18:05:55 +01:00
Avi Kivity	a089f66755	Merge "ec2_multi_region_snitch: print a proper error message when a Public IP is not available" from Vlad " Fix for #3897 "Ec2MultiRegionSnitch: prints a cryptic error when a Public IP is not available" Ec2MultiRegionSnitch naturally requires a Public IP to be available and therefore it's expected to refuse to work without it. However the error message that is printed today is a total disaster and has to be fixed ASAP to be something much more human readable. This series adds a human readable preamble that will let a poor user understand what he/she should do. " * 'improve-ec2-multi-region-snitch-error-message-when-pulic-address-is-not-available-v2' of https://github.com/vladzcloudius/scylla: locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread	2018-11-13 09:02:55 -08:00
Duarte Nunes	a38f6078fb	Merge 'Generating view updates during streaming' from Piotr During streaming, there are cases when we should invoke the view write path. In particular, if we're streaming because of repair or if a view has not yet finished building and we're bootstrapping a new node. The design constraints are: 1) The streamed writes should be visible to new writes, but the sstable should not participate in compaction, or we would lose the ability to exclude the streamed writes on a restart; 2) The streamed writes must not be considered when generating view updates for them; 3) Resilient to node restarts; 4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges. We achieve this by writing the streamed writes to an sstable in a different folder, call it "staging". We achieve 1) by publishing the sstable to the column family sstable set, but excluding it from compactions. We do these steps upon boot, by looking at the staging directory, thus achieving 3). 
Fixes #3275 * 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits) tests: add materialized views test tests: add view update generator to cql test env main: add registering staging sstables read from disk database: add a check if loaded sstable is already staging database: add get_staging_sstable method streaming: stream tables with views through staging sstables streaming: add system distributed keyspace ref to streaming streaming: add view update generator reference to streaming main: add generating missed mv updates from staging sstables storage_service: move initializing sys_dist_ks before bootstrap db/view: add view_update_from_staging_generator service db/view: add view updating consumer table: add stream_view_replica_updates table: split push_view_replica_updates table: add as_mutation_source_excluding table: move push_view_replica_updates to table.cc database: add populating tables with staging sstables database: add creating /staging directory for sstables database: add sstable-excluding reader table: add move_sstable_from_staging_in_thread function ...	2018-11-13 15:16:31 +00:00
Piotr Sarna	1724ee55c7	tests: add materialized views test Right now materialized_views_test.cc contains view updating tests, but the intention is to move mv-related tests from cql_query_test here and use it for all future unit testing of MV.	2018-11-13 15:21:55 +01:00
Piotr Sarna	056a78bbc7	tests: add view update generator to cql test env Keeping view update generator in cql test env enables generating updates from staging sstables in tests.	2018-11-13 15:04:43 +01:00
Piotr Sarna	16c042039c	main: add registering staging sstables read from disk Staging sstables read from disk are registered to the view update generator right after initializing non system keyspaces. Fixes #3275	2018-11-13 15:04:43 +01:00
Piotr Sarna	de43b4f41d	database: add a check if loaded sstable is already staging Staging sstables are loaded before regular ones. If the process fails midway, an sstable can be linked both in the regular directory and in staging directory. In such cases, the sstable remains in staging and will be moved to the regular directory by view update streamer service.	2018-11-13 15:04:43 +01:00
Piotr Sarna	d7849e6ea4	database: add get_staging_sstable method This method can be used to check if sstable is staging, i.e. it shouldn't be compacted and it will not be used for generating view updates from other staging tables, and return proper shared_sstable pointer if it is.	2018-11-13 15:04:43 +01:00
Piotr Sarna	32c0fe8df2	streaming: stream tables with views through staging sstables While streaming to a table with paired views, staging sstables are used. After the table is written to disk, it's used to generate all required view updates. It's also resistant to restarts as it's stored on a hard drive in staging/ directory. Refs #3275	2018-11-13 15:04:42 +01:00
Piotr Sarna	dc74887ff3	streaming: add system distributed keyspace ref to streaming Streaming code needs system distributed keyspace to check if streamed sstables should be staging, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	7ef5e1b685	streaming: add view update generator reference to streaming Streaming code may need view update generator service to generate and send view updates, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	eb0c507a45	main: add generating missed mv updates from staging sstables If any sstables are found in the staging directory, it means that they missed generating view updates, so it's performed now.	2018-11-13 15:01:53 +01:00
Piotr Sarna	ca5dfdffc6	storage_service: move initializing sys_dist_ks before bootstrap Bootstrapping process may need system distributed keyspace to generate view updates, so initializing sys_dist_ks is moved before the bootstrapping process is launched.	2018-11-13 15:01:53 +01:00
Piotr Sarna	fc7267c797	db/view: add view_update_from_staging_generator service A shardable service for generating mv updates after restarts is added.	2018-11-13 15:01:52 +01:00
Piotr Sarna	ed05d91adc	db/view: add view updating consumer This consumer is used to generate and push view replica updates from read mutations.	2018-11-13 14:54:39 +01:00
Piotr Sarna	348fa3b092	table: add stream_view_replica_updates Generating view replica updates during streaming ignores the staging sstable that is used to generate them.	2018-11-13 14:52:22 +01:00
Piotr Sarna	fed9c59eb8	table: split push_view_replica_updates push_view_replica_updates is split in order to allow different mutation source to be provided.	2018-11-13 14:52:22 +01:00
Piotr Sarna	466d780445	table: add as_mutation_source_excluding A variant of table::as_mutation_source that allows excluding a single sstable is added.	2018-11-13 14:52:22 +01:00
Piotr Sarna	c825a17b9d	table: move push_view_replica_updates to table.cc	2018-11-13 14:52:22 +01:00
Piotr Sarna	a17fcb8d94	database: add populating tables with staging sstables After populating tables with regular sstables, same procedure is performed for staging sstables.	2018-11-13 14:52:22 +01:00
Piotr Sarna	19bf94fa8f	database: add creating /staging directory for sstables staging directory is now created on boot.	2018-11-13 14:52:22 +01:00
Piotr Sarna	e88b85134c	database: add sstable-excluding reader When generating view updates from a staging sstable, this sstable should not be used in the process. Hence, a reader that skips a single sstable is added.	2018-11-13 14:52:22 +01:00
Avi Kivity	a8203ca799	Update seastar submodule * seastar c02150e...a44cedf (5): > build: link against libatomic > dns.cc: Include name/address in resolver error messages > log: Print full error message for std::system_error > tests: test-utils: Add missing include > fstream: Introduce make_file_data_sink() Fixes #3894.	2018-11-13 03:28:16 -08:00
Piotr Sarna	160a6d58d2	table: add move_sstable_from_staging_in_thread function After materialized view updates are generated, the sstable should be moved from staging/ to a regular directory. It's expected to be called from seastar::async thread context.	2018-11-13 11:45:30 +01:00
Piotr Sarna	ff361ca877	sstables: add move_to_new_dir_in_thread function When moving sstables between directories, this helper function will create links and update generation and dir accordingly. It's expected to be called in thread context.	2018-11-13 11:45:30 +01:00
Piotr Sarna	b7977f4790	sstables: add staging directory to regex datadir/staging directory becomes a valid path for an sstable.	2018-11-13 11:45:30 +01:00
Piotr Sarna	e42d97060f	database: provide nonfrozen version of push_view_replica_updates Now it's also possible to pass a mutation to push to view replicas.	2018-11-13 11:45:30 +01:00
Piotr Sarna	642c3ae0e0	database: add subdir param to make_streaming_sstable_for_write This function allows specifying a subfolder to put a newly created sstable in - e.g. staging/ subfolder for streamed base table mutations.	2018-11-13 11:45:30 +01:00
Piotr Sarna	788e03433c	table: init table.cc file This file will be used to move table-related functions to it.	2018-11-13 11:45:30 +01:00
Piotr Sarna	8e053f9efb	database: add staging sstables to a map SSTables that belong to staging/ directory are put in the _sstables_staging map.	2018-11-13 11:45:30 +01:00
Piotr Sarna	3970808294	sstables: add is_staging() method This method returns true if the last part of directory structure is /staging.	2018-11-13 11:45:30 +01:00
Piotr Sarna	3f34312aa6	database: skip staging sstables in compaction Staging sstables are not part of the compaction process to ensure that each sstable can be easily excluded from view generation process that depends on the mentioned sstable.	2018-11-13 11:45:30 +01:00
Piotr Sarna	701d88e39f	database: add staging sstables map In order to keep track of staging sstables (used for mv updates), a map of them is now kept in table class.	2018-11-13 11:45:30 +01:00
Paweł Dziepak	6469a1b451	Merge "Write static rows for all partitions if there are static columns" from Vladimir " It appears that in case when there are any static columns in serialization header, Cassandra would write a (possibly empty) static row to every partition in the SSTables file. This patchset aligns Scylla's logic with that of Cassandra. Note that Scylla optimizes the case when no partition contains a static row because it keeps track of updated columns that Scylla currently does not do - see #3901 for details. Fixes #3900. " * 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla: tests: Test writing empty static rows for partitions in tables with static columns. sstables: Ignore empty static rows on reading. sstables: Write empty static rows when there are static columns in the table.	2018-11-09 12:01:25 -08:00
Raphael S. Carvalho	1c5934c934	sstables: fix procedure to get fully expired sstables with MC format MC format lacks ancestors metadata, so we need to workaround it by using ancestors in metadata collector, which is only available for a sstable written during this instance. It works fine here because we only want to know if a sstable recently compacted has an ancestor which wasn't yet deleted. Fixes #3852. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>	2018-11-06 09:28:37 +02:00
Vladimir Krivopalov	69b453fb69	tests: Test writing empty static rows for partitions in tables with static columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:47:30 -08:00
Vladimir Krivopalov	f767dfbb33	sstables: Ignore empty static rows on reading. Fixes #3900. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:47:30 -08:00
Vladimir Krivopalov	89051d37e3	sstables: Write empty static rows when there are static columns in the table. This is consistent with what Cassandra does. Fixes #3900. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-05 13:28:50 -08:00
Vladimir Krivopalov	2ebab69ce7	mutation_source_test: Use counter and collection columns in static rows. They are legal and should be covered along with atomic columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <a1c0e0f8c0c0f12b68af6df426370511f4e1253b.1541106233.git.vladimir@scylladb.com> [tgrabiec: fixed the patch title]	2018-11-02 10:33:27 +01:00
Vlad Zolotarov	2636395c65	locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved Public IP is required for Ec2MultiRegionSnitch. If it's not available different snitch should be used. This patch would result in a readable error message to be printed instead of just a cryptic message with HTTP response body. Fixes #3897 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-11-01 11:50:58 -04:00
Vlad Zolotarov	c462af5549	locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread Rework ec2_multi_region_snitch::start() on top of seastar::async() in order to simplify the code. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-11-01 10:48:37 -04:00
Paweł Dziepak	1129134a4a	Merge "Convert sprint() calls to fmt" from Avi " The update to libfmt 5.2.1 brought with it a subtle change - calls to sprint("%s", 3) now throw a format_error instead of returning "3". To prevent such hidden (or not so hidden) bugs from lurking, convert all calls to the modern fmt syntax. Such conversion has several benefits: - prevent the bug from biting us - as fmt is being standardized, we can later move to std::format() - commonality with the logger format syntax (indeed, we may move the logger to use libfmt itself) During the conversion, some bugs were caught and fixed. These are presented in individual patches in the patchset. Most of the conversion was scripted, using https://github.com/avikivity/unsprint. Some sprint() calls remain, as they were too complex for the script. They will be converted later. " * tag 'fmt-1/v1' of https://github.com/avikivity/scylla: toplevel: convert sprint() to format() repair: convert sprint() to format() tests: convert sprint() to format() tracing: convert sprint() to format() service: convert sprint() to format() exceptions: convert sprint() to format() index: convert sprint() to format() streaming: convert sprint() to format() streaming: progress_info: fix format string api: convert sprint() to format() dht: convert sprint() to format() thrift: convert sprint() to format() locator: convert sprint() to format() gms: convert sprint() to format() db: convert sprint() to format() transport: convert sprint() to format() utils: convert sprint() to format() sstables: convert sprint() to format() auth: convert sprint() to format() cql3: convert sprint() to format() row_cache: fix bad format string syntax repair: fix bad format string syntax tests: fix bad format string syntax dht: fix bad format string syntax sstables: fix bad format string syntax utils: estimated_histogram: convert generated format strings to fmt tests: perf_fast_forward: rename "format" variable tests: perf_fast_forward: massage result of 
sprint() into std::string utils: i_filter: rename "format" variable system_keyspace: simplify complicated sprint() cql: convert Cql.g sprint()s to fmt types: get rid of PRId64 formatting	2018-11-01 13:16:17 +00:00
Avi Kivity	a71ab365e3	toplevel: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	51ce53738f	repair: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	f70ece9f88	tests: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	239ecec043	tracing: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	bb0eb9dae8	service: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	71fc5fb738	exceptions: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	7ae23d8f9b	index: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	fd513c42ad	streaming: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	8501e2a45d	streaming: progress_info: fix format string We try to escape % as \%, but the correct escape is %%.	2018-11-01 13:16:17 +00:00
Avi Kivity	da17c29bd3	api: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	82818758ca	dht: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	7a125c6634	thrift: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	0c33d13165	locator: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	e096fa2fde	gms: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	d77e044cde	db: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	5f79ff0f54	transport: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	be99101f36	utils: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	455f00e993	sstables: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	eb74fe784d	auth: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	cb7ee5c765	cql3: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	8cca3b2879	row_cache: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	6488b017c3	repair: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	bceff1550c	tests: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	7ff5569ee8	dht: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	738e713edf	sstables: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Avi Kivity	3cf434b863	utils: estimated_histogram: convert generated format strings to fmt Convert printf games to format games. Note that fmt supports specifying the field width as an argument, but that is left to a dedicated change.	2018-11-01 13:16:17 +00:00
Avi Kivity	8ca4b7abea	tests: perf_fast_forward: rename "format" variable The format local variable will soon alias with the format function which we intend to use in the same context. Rename it away to avoid a clash.	2018-11-01 13:16:17 +00:00
Avi Kivity	7908f09148	tests: perf_fast_forward: massage result of sprint() into std::string sprint() returns std::string(), but the new format() returns an sstring. Usually an sstring is wanted but in this case an sstring will fail as it is added to an std::string. Fix the failure (after sprint->format conversion) by converting to an std::string.	2018-11-01 13:16:17 +00:00
Avi Kivity	7726ce23b7	utils: i_filter: rename "format" variable The format variable hides the format function, which we'll soon want to use here. Rename the format variable to unhide the function.	2018-11-01 13:16:17 +00:00
Avi Kivity	04b70a2ff8	system_keyspace: simplify complicated sprint() update_peer_info() uses two sprint()s where one would do, which confuses the sprint-to-fmt translator. Simplify the code by using just one call.	2018-11-01 13:16:17 +00:00
Avi Kivity	23e05a045b	cql: convert Cql.g sprint()s to fmt The only sprint() call had an extra complication due to quoting, which can be removed now.	2018-11-01 13:16:16 +00:00
Avi Kivity	8db8c01fbe	types: get rid of PRId64 formatting It's not needed for our sprint() implementation, and gets in the way of converting all formatting to fmt.	2018-11-01 13:16:16 +00:00
Avi Kivity	f170e3e589	Merge "dist: use perftune.py for disks tuning" from Vlad " Use perftune.py for tuning disks: - Distribute/pin disks' IRQs: - For NVMe drives: evenly among all present CPUs. - For non-NVMe drives: according to chosen tuning mode. - For all disks used by scylla: - Tune nomerges - Tune I/O scheduler. It's important to tune NIC and disks together in order to keep IRQ pinning in the same mode. Disks are detected and tuned based on the current content of /etc/scylla/scylla.yaml configuration file. " Fixes #3831. * 'use_perftune_for_disks-v3' of https://github.com/vladzcloudius/scylla: dist: change the sysconfig parameter name to reflect the new semantics scylla_util.py::sysconfig_parser: introduce has_option() dist: scylla_setup and scylla_sysconfig_setup: change paremeters names to reflect new semantics dist: don't distribute posix_net_conf.sh any more dist: use perftune.py to tune disks and NIC	2018-11-01 13:13:49 +00:00
Avi Kivity	96173e81e0	Update seastar submodule * seastar c1e0e5d...c02150e (5): > prometheus: pass names as query parameter instead of part of the URL > treewide: convert printf() style formatting to fmt > print: add fmt_print() > build: Remove experimental CMake support > Merge "Correct and clean-up `signal_test`" from Jesse	2018-11-01 13:13:48 +00:00
Yibo Cai (Arm Technology China)	79136e895f	utils/crc: calculate crc in parallel It achieves 2.0x speedup on intel E5 and 1.1x to 2.5x speedup on various arm64 microarchitectures. The algorithm cuts data into blocks of 1024 bytes and calculates crc for each block, which is further divided into three subblocks of 336 bytes(42 uint64) each, and 16 remaining bytes(2 uint64). For each iteration, three independent crc are calculated for one uint64 from each subgroup. It increases IPC(instructions per cycle) much. After subblocks are done, three crc and remaining two uint64 are combined using carry-less multiplication to reach the final result for one block of 1024 bytes. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>	2018-11-01 10:19:32 +02:00
Vlad Zolotarov	84d341a12d	dist: change the sysconfig parameter name to reflect the new semantics We tune NIC and disks together now. Change the sysconfig parameter to reflect this new semantics. However if we detect an old parameter name in the scylla-server we would still update it thereby keeping the support for old installations. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:28:13 -04:00
Vlad Zolotarov	7950062a82	scylla_util.py::sysconfig_parser: introduce has_option() has_option() returns TRUE if a given configuration option is set. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	9a5373254a	dist: scylla_setup and scylla_sysconfig_setup: change paremeters names to reflect new semantics Change the name of the corresponding parameter (--setup-nic) to reflect the fact that we tune not just NIC now but rather NIC and disks together. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	c74e1a9368	dist: don't distribute posix_net_conf.sh any more We don't need it since we use perftune.py directly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Vlad Zolotarov	0e47d8bb1d	dist: use perftune.py to tune disks and NIC Tune disks using perftune.py together with NIC. This is needed because disk(s) and NIC tuning has to be performed using the mode (for non-NVMe disks). We tune disks based on the current content of /etc/scylla/scylla.yaml. Don't use scylla-blocktune for optimizing disks' performance any more. Unite the decision to optimize the NIC and disks tuning. Optimize or not optimize them both together. Disable disk tuning for DPDK and "virtio" modes for now. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-31 15:27:00 -04:00
Takuya ASADA	5bf9a03d65	dist/debian: skip running dh_strip_nondeterminism On some Fedora environments, the dh build tries to run dh_strip_nondeterminism and fails since Fedora does not provide such a command. (see: http://jenkins.cloudius-systems.com/view/master/job/scylla-master/job/unified-deb/3/console) To prevent the build error we need to skip it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181030062935.9930-1-syuu@scylladb.com>	2018-10-31 10:23:54 +02:00
Tomasz Grabiec	62c7685b0d	Merge "Proper support for static rows in SSTables 3.x" from Vladimir This patchset addresses two issues with static rows support in SSTables 3.x. ('mc' format): 1. Since collections are allowed in static rows, we need to check for complex deletion, set corresponding flag and write tombstones, if any. 2. Column indices need to be partitioned for static columns the same way they are partitioned for regular ones. * github.com/argenet/scylla.git projects/sstables-30/columns-proper-order-followup/v1: sstables: Partition static columns by atomicity when reading/writing SSTables 3.x. sstables: Use std::reference_wrapper<> instead of a helper structure. sstables: Check for complex deletion when writing static rows. tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns. tests: Add test covering inverleaved atomic and collection cells in static row.	2018-10-30 10:36:46 +01:00
Vladimir Krivopalov	d82ac02fad	tests: Add test covering inverleaved atomic and collection cells in static row. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 15:01:34 -07:00
Vladimir Krivopalov	7bd95399ed	tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 15:00:55 -07:00
Vladimir Krivopalov	6bd738ceb1	sstables: Check for complex deletion when writing static rows. It is possible to have collections in a static row so we need to check for collection-wide tombstones like with clustering rows. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 14:59:19 -07:00
Vladimir Krivopalov	6b7003088a	sstables: Use std::reference_wrapper<> instead of a helper structure. No need to store column_id separately as it can be accessed from the column_definition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 14:58:08 -07:00
Vladimir Krivopalov	8592b834d1	sstables: Partition static columns by atomicity when reading/writing SSTables 3.x. Collections are permitted in static rows so same partitioning as for regular columns is required. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-29 10:32:02 -07:00
Takuya ASADA	2ac14dcf25	dist/redhat: prevent build error on older Fedora/CentOS The current scylla.spec fails to build on Fedora 27, since python2-pystache is a new package name introduced by a rename in Fedora 28. But Fedora 28's python2-pystache has the tag "Provides: pystache", so we can depend on the old package name; this way we can build scylla.spec on both Fedora 27 and 28. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181028175450.31156-1-syuu@scylladb.com>	2018-10-29 11:36:40 +02:00
Yibo Cai (Arm Technology China)	1c48e3fbec	utils/crc: leverage arm64 crc extension It achieves 6.7x to 11x speedup on various arm64 microarchitectures. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>	2018-10-29 10:50:48 +02:00
Nadav Har'El	b8337f8c9d	Materialized views: fix race condition in resharding while view building When a node reshards (i.e., restarts with a different number of CPUs), and is in the middle of building a view for a pre-existing table, the view building needs to find the right token from which to start building on all shards. We ran the same code on all shards, hoping they would all make the same decision on which token to continue. But in some cases, one shard might make the decision, start building, and make progress - all before a second shard goes to make the decision, which will now be different. This resulted, in some rare cases, in the new materialized view missing a few rows when the build was interrupted with a resharding. The fix is to add the missing synchronization: All shards should make the same decision on whether and how to reshard - and only then should start building the view. Fixes #3890 Fixes #3452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181028140549.21200-1-nyh@scylladb.com>	2018-10-28 17:20:10 +00:00
Avi Kivity	75dbff984c	Merge "Re-order columns when reading/writing SSTables 3.x" from Vladimir " In Cassandra, row columns are stored in a BTree that uses the following ordering on them: - all atomic columns go first, then all multi-cell ones - columns of both types (atomic and multi-cell) are lexicographically ordered by name regarding each other Scylla needs to store columns and their respective indices using the same ordering as well as when reading them back. Fixes #3853 Tests: unit {release} + Checked that the following SSTables are dumped fine using Cassandra's sstabledump: cqlsh:sst3> CREATE TABLE atomic_and_collection3 ( pk int, ck int, rc1 text, rc2 list<text>, rc3 text, rc4 list<text>, rc5 text, rc6 list<text>, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''}; cqlsh:sst3> INSERT INTO atomic_and_collection3 (pk, ck, rc1, rc4, rc5) VALUES (0, 0, 'hello', ['beautiful','world'], 'here'); << flush >> sstabledump: [ { "partition" : { "key" : [ "0" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 96, "clustering" : [ 0 ], "liveness_info" : { "tstamp" : "1540599270139464" }, "cells" : [ { "name" : "rc1", "value" : "hello" }, { "name" : "rc5", "value" : "here" }, { "name" : "rc4", "deletion_info" : { "marked_deleted" : "1540599270139463", "local_delete_time" : "1540599270" } }, { "name" : "rc4", "path" : [ "45e22cb0-d97d-11e8-9f07-000000000000" ], "value" : "beautiful" }, { "name" : "rc4", "path" : [ "45e22cb1-d97d-11e8-9f07-000000000000" ], "value" : "world" } ] } ] } ] " * 'projects/sstables-30/columns-proper-order/v1' of https://github.com/argenet/scylla: tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x. sstables: Re-order columns (atomic first, then collections) for SSTables 3.x. sstables: Use a compound structure for storing information used for reading columns.	2018-10-28 10:56:09 +02:00
Rafi Einstein	32525f2694	Space-Saving Top-k algorithm for handling stream summary statistics Based on the following implementation ([2]) for the Space-Saving algorithm from [1]. [1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf [2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java The algorithm keeps a map between keys seen and their counts, keeping a bound on the number of tracked keys. Replacement policy evicts the key with the lowest count while inheriting its count, and recording an estimation of the error which results from that. This error estimation can be later used to prove if the distribution we arrived at corresponds to the real top-K, which we can display alongside the results. Accuracy depends on the number of tracked keys. Introduced as part of 'nodetool toppartition' query implementation. Refs #2811 Message-Id: <20181027220937.58077-1-rafie@scylladb.com>	2018-10-28 10:10:28 +02:00
Vladimir Krivopalov	f3dc2a4927	tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 16:58:34 -07:00
Vladimir Krivopalov	7e56e9fca6	sstables: Re-order columns (atomic first, then collections) for SSTables 3.x. In Cassandra, row columns are stored in a BTree that uses the following ordering on them: - all atomic columns go first, then all multi-cell ones - columns of both types (atomic and multi-cell) are lexicographically ordered by name regarding each other Since schema already has all columns lexicographically sorted by name, we only need to stably partition them by atomicity for that. Fixes #3853 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 15:58:33 -07:00
Vladimir Krivopalov	210507b867	sstables: Use a compound structure for storing information used for reading columns. This representation makes it easier to operate with compound structures instead of separate values that were stored in multiple containers. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-26 11:32:44 -07:00
Tomasz Grabiec	cf2d5c19fb	Merge "Properly write static rows missing columns for SSTables 3.x." from Vladimir Before this fix, write_missing_columns() helper would always deal with regular columns even when writing static rows. This would cause errors on reading those files. Now, the missing columns are written correctly for regular and static rows alike. * github.com/argenet/scylla.git projects/sstables-30/fix-writing-static-missing-columns/v1: schema: Add helper method returning the count of columns of specified kind. sstables: Honour the column kind when writing missing columns in 'mc' format. tests: Add test for a static row with missing columns (SStables 3.x.).	2018-10-26 09:06:01 +02:00
Vladimir Krivopalov	9843343ad8	tests: Add test for a static row with missing columns (SStables 3.x.). This is a test case for #3892. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:16:31 -07:00
Vladimir Krivopalov	44043cfd44	sstables: Honour the column kind when writing missing columns in 'mc' format. Previously, we've been writing the wrong missing columns indices for static rows because write_missing_columns() explicitly used regular columns internally. Now, it takes the proper column kind into account. Fixes #3892 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:09:09 -07:00
Vladimir Krivopalov	399f815a89	schema: Add helper method returning the count of columns of specified kind. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-25 17:07:20 -07:00
Tomasz Grabiec	dcac0ac80c	tests: sstables: Verify no index reads during scans which dont need it Reproducer for https://github.com/scylladb/scylla/issues/3868 Message-Id: <1540459849-27612-2-git-send-email-tgrabiec@scylladb.com>	2018-10-25 16:14:45 +03:00
Tomasz Grabiec	46d0c157ae	tests: sstables: Extract make_sstable_mutation_source() Message-Id: <1540459849-27612-1-git-send-email-tgrabiec@scylladb.com>	2018-10-25 16:14:39 +03:00
Tomasz Grabiec	fe0a0bdf1e	utils/loading_shared_values: Add missing stat update call in one of the cases Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>	2018-10-25 15:15:05 +03:00
Duarte Nunes	e46ef6723b	Merge seastar upstream * seastar d152f2d...c1e0e5d (6): > scripts: perftune.py: properly merge parameters from the command line and the configuration file > fmt: update to 5.2.1 > io_queue: only increment statistics when request is admitted > Adds `read_first_line.cc` and `read_first_line.hh` to CMake. > fstream: remove default extent allocation hint > core/semaphore: Change the access of semaphore_units main ctor Due to a compile-time fight between fmt and boost::multiprecision, a lexical_cast was added to mediate. sprint("%s", var) no longer accepts numeric values, so some sprint()s were converted to format() calls. Since more may be lurking we'll need to remove all sprint() calls. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-25 12:53:30 +03:00
Benny Halevy	2a57c454f2	update_compaction_history: handle execute_cql exception Fixes #3774 Tested using view_schema_test with and without injecting an exception in modification_statement::do_execute for "compaction_history". Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>	2018-10-24 18:39:53 +03:00
Benny Halevy	44e5c2643b	compaction_manager::maybe_stop_on_error: add stop_iteration param some call sites are stopping in any case, regardless of what maybe_stop_on_error returns. Reflect that in the log messages. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-2-bhalevy@scylladb.com>	2018-10-24 18:39:52 +03:00
Avi Kivity	8210f4c982	Merge "Properly writing/reading shadowable deletions with SSTables 3.x." from Vladimir " This patchset addresses two problems with shadowable deletions handling in SSTables 3.x. ('mc' format). Firstly, we previously did not set a flag indicating the presence of extended flags byte with HAS_SHADOWABLE_DELETION bitmask on writing. This would break subsequent reading and cause all types of failures up to crash. Secondly, when reading rows with this extended flag set, we need to preserve that information and create a shadowable_tombstone for the row. Tests: unit {release} + Verified manually with 'hexdump' and using modified 'sstabledump' that second (shadowable) tombstone is written for MV tables by Scylla. + DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test) that originally failed due to this issue has successfully passed locally. " * 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla: tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x. tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x. sstables: Support Scylla-specific extension for writing shadowable tombstones. sstables: Introduce a feature for shadowable tombstones in Scylla.db. memtable: Track regular and shadowable tombstones separately in encoding_stats_collector. sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion. sstables: Support checking row extension flags for Cassandra shadowable deletion.	2018-10-24 18:20:16 +03:00
Tomasz Grabiec	9e756d3863	sstable_mutation_reader: Do not read partition index when scanning Even when we're using a full clustering range, need_skip() will return true when we start a new partition and advance_context() will be called with position_in_partition::before_all_clustered_rows(). We should detect that there is no need to skip to that position before the call to advance_to(*_current_partition_key), which will read the index page. Fixes #3868. Message-Id: <1539881775-8578-1-git-send-email-tgrabiec@scylladb.com>	2018-10-24 15:55:13 +03:00
Avi Kivity	925ef48fce	Merge "Use relocatable package to generate .rpm/.deb" from Takuya " This patchset adds support generating .rpm/.deb from relocatable package. " * 'reloc_rpmdeb_v5' of https://github.com/syuu1228/scylla: configure.py: run create-relocatable-package.py everytime configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target dist/ami: build relocatable .rpm when --localrpm specified dist/debian: use relocatable package to produce .deb dist/redhat: use relocatable package to produce .rpm install-dependencies.sh: add libsystemd as dependencies install.sh: drop hardcoded distribution name, add --target option to specify distribution build: add script to build relocatable package build: compress relocatable package build: add files on relocatable package to support generating .rpm/.deb	2018-10-24 14:44:09 +03:00
Takuya ASADA	59e4900ca7	configure.py: run create-relocatable-package.py everytime Right now we don't have dependencies for dist/, ninja not able to detect changes under the directory. To update relocatable package even only change is under dist/, we need to run create-relocatable-package.py everytime. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	6e1617d71c	configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets To re-generate scylla version files when it removed, since these files required for relocatable package. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	0cb8a4cb0c	configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target It's better to use {mode} to extract fixed path just like other build targets do. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	929f03533d	dist/ami: build relocatable .rpm when --localrpm specified Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	f3c3b9183c	dist/debian: use relocatable package to produce .deb Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	8e2dc9e4f4	dist/redhat: use relocatable package to produce .rpm Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	5fa7ed52e3	install-dependencies.sh: add libsystemd as dependencies Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	ce4067ca02	install.sh: drop hardcoded distribution name, add --target option to specify distribution Allow user to build .rpm for Fedora, need to support specifying distribution. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	6319229020	build: add script to build relocatable package To build relocatable package easier, add build_reloc.sh to build it in one command. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	a502715b29	build: compress relocatable package Since debian packaging system requires source package to compress tar file, so let's use .gz compression. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Takuya ASADA	85fed12c07	build: add files on relocatable package to support generating .rpm/.deb We are missing some files on relocatable package to generate .rpm/.deb, add them. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2018-10-24 11:29:47 +00:00
Paweł Dziepak	637b9a7b3b	atomic_cell_or_collection: make operator<< show cell content After the new in-memory representation of cells was introduced there was a regression in atomic_cell_or_collection::operator<< which stopped printing the content of the cell. This makes debugging more inconvenient and time-consuming. This patch fixes the problem. Schema is propagated to the atomic_cell_or_collection printer and the full content of the cell is printed. Fixes #3571. Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>	2018-10-24 13:29:51 +03:00
Avi Kivity	a9836ad758	thrift: limit message size Limit message size according to the configuration, to avoid a huge message from allocating all of the server's memory. We also need to limit memory used in aggregate by thrift, but that is left to another patch. Fixes #3878. Message-Id: <20181024081042.13067-1-avi@scylladb.com>	2018-10-24 09:57:58 +01:00
Raphael S. Carvalho	c958294991	tests/sstable_perf: fix compaction mode for a multi shard instance Compaction mode fails if more than one shard is used because it doesn't make sure sstables used as input for compaction only contain local keys. Therefore, sstable generated by compaction has less keys than expected because non-local keys are purged out. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181022225153.12029-1-raphaelsc@scylladb.com>	2018-10-24 09:58:34 +03:00
Glauber Costa	fc5635100d	install seastar-addr2line and seastar-cpumap into scylla packages It is very useful for investigations in scylla issues, and we have been moving those scripts manually when needed. Make it officially part of the scylla package. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181023184400.23187-1-glauber@scylladb.com>	2018-10-24 09:52:17 +03:00
Amnon Heiman	6bcde841bd	scyllatop: Nicer error message when fail opening a log file or connecting scyllatop uses a log file, if opening the file fails, the user should get a clear response not an exception trace. The same is true for connecting to scylla After this patch the following: $ scyllatop.py -L /usr/lib/scyllatop.log scyllatop failed opening log file: '/usr/lib/scyllatop.log' With an error: [Errno 13] Permission denied: '/usr/lib/scyllatop.log' Fixes #3860 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20181021065525.22749-1-amnon@scylladb.com>	2018-10-24 09:50:45 +03:00
Vlad Zolotarov	4d1bb719a4	config: enable hinted handoff by default Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181019180401.12400-1-vladz@scylladb.com>	2018-10-24 09:47:36 +03:00
Vladimir Krivopalov	ad599d4342	tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	3dcf0acfc2	tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	759d36a26e	sstables: Support Scylla-specific extension for writing shadowable tombstones. The original SSTables 'mc' format, as defined in Cassandra, does not provide a way to store shadowable deletion in addition to regular row deletion for materialized views. It is essential to store it because of known corner-case issues that otherwise appear. For this to work, we introduce a Scylla-specific extended flag to be set in SSTables in 'mc' format that indicates a shadowable tombstone is written after the regular row tombstone. This is deemed to be safe because shadowable tombstones are specific to materialized views and MV tables are not supposed to be imported or exported. Note that a shadowable tombstone can be written without a regular tombstone as well as along with it. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	e168433945	sstables: Introduce a feature for shadowable tombstones in Scylla.db. This is used to indicate that the SSTables being read may contain a Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set. If the feature is not enabled, we should not honour this flag. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	a95ba2f38a	memtable: Track regular and shadowable tombstones separately in encoding_stats_collector. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	b7d48c1ccd	sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion. This flag can be only set in MV tables that are not supported to be imported to Scylla. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	8f79f76116	sstables: Support checking row extension flags for Cassandra shadowable deletion. This flag can be only used in MV tables that are not supposed to be imported to Scylla. Since Scylla representation of shadowable tombstones differs from that of Cassandra, such SSTables are rejected on read and Scylla never sets this flag on writing. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Avi Kivity	1533487ba8	Merge "hinted handoff: give a sender a low priority" from Vlad " Hinted handoff should not overpower regular flows like READs, WRITEs or background activities like memtable flushes or compactions. In order to achieve this put its sending in the STREAMING CPU scheduling group and its commitlog object into the STREAMING I/O scheduling group. Fixes #3817 " * 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla: db::hints::manager: use "streaming" I/O scheduling class for reads commitlog::read_log_file(): set the a read I/O priority class explicitly db::hints::manager: add hints sender to the "streaming" CPU scheduling group	2018-10-23 16:55:05 +00:00
Raphael S. Carvalho	65e8853e8d	tests: test that sstable cleanup wont get rid of key which token belongs to node Commit `1ce52d54` fixed sort order of local ranges, which is needed for cleanup to work properly because it relies on that to perform a binary search. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181023031322.22763-1-raphaelsc@scylladb.com>	2018-10-23 16:55:05 +00:00
Avi Kivity	d9e0ea6bb0	config: mark range_request_timeout_in_ms and request_timeout_in_ms as Used This makes them available in scylla --help. Fixes #3884. Message-Id: <20181023101150.29856-1-avi@scylladb.com>	2018-10-23 11:52:03 +01:00
Paweł Dziepak	c94d2b6aa6	cql3: restore original timeout behaviour for aggregate queries Commit `1d34ef38a8` "cql3: make pagers use time_point instead of duration" has unintentionally altered the timeout semantics for aggregate queries. Such requests fetch multiple pages before sending a response to the client. Originally, each of those fetches had a timeout-duration to finish, after the problematic commit the whole request needs to complete in a single timeout-duration. This, unsurprisingly, makes some queries that were successful before fail with a timeout. This patch restores the original behaviour. Fixes #3877. Message-Id: <20181022125318.4384-1-pdziepak@scylladb.com>	2018-10-23 12:52:42 +03:00
Takuya ASADA	950dbdb466	dist/common/sysctl.d: add new conf file to set fs.aio-max-nr We need to raise fs.aio-max-nr to a larger value since Seastar may allocate more than 65535 AIO events (= kernel default value) Fixes #3842 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181023030449.15445-1-syuu@scylladb.com>	2018-10-23 11:01:07 +03:00
Tomasz Grabiec	a34e417874	Merge "Stabilise perf_fast_forward results" from Paweł This series attempts to make fragments per second results reported by perf_fast_forward more stable. That includes running each test case multiple times and reporting median, median absolute deviation, maximum and minimum value. That should allow to relatively easily assess how repeatable the presented results are. Moreover, since perf_fast_forward does IO operations it is important that they do not introduce any excessive noise to the results. The location of the data directory is made configurable so that the user can choose a less noisy disk or a ramdisk. * github.com/pdziepak/scylla.git stabilise-perf_fast_forward/v3: tests/perf_fast_forward: make fragments/s measurements more stable tests/perf_fast_forward: make data directory location configurable	2018-10-22 18:33:25 +02:00
Avi Kivity	d5d831f41b	tests: network_topology_strategy_test: remove quadratic complexity network_topology_strategy test creates a ring with hundreds of tokens (and one token per node). Then, for each token, it calls get_primary_ranges(), which in turn walks the token ring. However, because each datacenter occupies a disjoint token range, this walk practically has to walk the entire ring until it collects enough endpoints for each datacenter. The whole thing takes 15 minutes. Speed this up by randomizing the token<->dc relationship. This is more realistic, and switches the algorithm to be O(token count), and now it completes in less than a minute (still not great, but better). Message-Id: <20181022154026.19618-1-avi@scylladb.com>	2018-10-22 17:06:57 +01:00
Paweł Dziepak	63a705dca3	tests/perf_fast_forward: make data directory location configurable perf_fast_forward populates perf_fast_forward_output with some data and then runs performance tests that read it. That makes the disk a significant factor in the final result and may make the results less repeatable. This patch adds a flag that allows setting the location of the data directory so that the user can opt for a less noisy disk or a ramdisk.	2018-10-22 16:52:58 +01:00
Paweł Dziepak	29e872f865	tests/perf_fast_forward: make fragments/s measurements more stable perf_fast_forward performs various operations, many of which involve sstable reads and verifies the metrics that there weren't any unnecessary IO operations. It also provides fragments per second measurements for the tests it runs. However, since some of the tests are very short and involve IO those values vary a lot, which makes them not very useful. This commit attempts to stabilise those results. Each test case is run multiple times (by default for a second, but at least 3 times) and shows median, median absolute deviation, maximum and minimum value. This should allow assessing whether the changes in the results are just noise or a real regression or improvement.	2018-10-22 16:52:58 +01:00
Duarte Nunes	f3a5ec0fd9	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com>	2018-10-22 13:00:00 +02:00
George Kollias	c2343dc841	Make restricting reader fill_buffer more efficient Currently, restricting_mutation_reader::fill_buffer just reads lower-layer reader's fragments one by one without doing any further transformations. This change just swaps the parent-child buffers in a single step, as suggested in #3604, hence removing any possible per-fragment overhead. I couldn't find any test that exercises restricting_mutation_reader as a mutation source, so I added test_restricted_reader_as_mutation_source in mutation_reader_test. Tests: unit (release), though these 4 tests are failing regardless of my changes (they fail on master for me as well): snitch_reset_test, sstable_mutation_test, sstable_test, sstable_3_x_test. Fixes: #3604 Signed-off-by: George Kollias <georgioskollias@gmail.com> Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>	2018-10-22 11:36:54 +03:00
Duarte Nunes	3fe92663d4	Merge 'Fix for a select statement with filtered columns' from Eliran " This patchset fixes #3803. When a select statement with filtering is executed and the column that is needed for the filtering is not present in the select clause, rows that should have been filtered out according to this column will still be present in the result set. Tests: 1. The testcase from the issue. 2. Unit tests (release) including the newly added test from this patchset. " * 'issues/3803/v10' of https://github.com/eliransin/scylla: unit test: add test for filtering queries without the filtered column cql3 unit test: add assertion for the number of serialized columns cql3: ensure retrieval of columns for filtering cql3: refactor find_idx to be part of statement restrictions object cql3: add prefix size common functionality to all clustering restrictions cql3: rename selection metadata manipulation functions	2018-10-21 09:53:37 +01:00
Eliran Sinvani	145f931ae7	unit test: add test for filtering queries without the filtered column Test the usecase where the column that the filtering operates on is not a part of the select clause. The expected result is a set containing the columns of the select clause with the additional columns for filtering marked as non serializable. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
Eliran Sinvani	86637a1d0d	cql3 unit test: add assertion for the number of serialized columns The result sets that the assertions are performed against are result sets before serialization to the user and therefore contain also columns that will not be serialized and sent as the query's final result. The patch adds an assertion on the number of columns that will be present in the final serialized result. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
Eliran Sinvani	fd422c954e	cql3: ensure retrieval of columns for filtering When a query that needs filtering is executed, the columns that the coordinator is filtering by have to be retrieved. The columns should be retrieved even if they are not used for ordering or named in the actual select clause. If the columns are missing from the result set, then any filtering that restricts the missing column will not take place. Fixes #3803 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:41:46 +03:00
Eliran Sinvani	3e036e2c8c	cql3: refactor find_idx to be part of statement restrictions object find_idx calculates the index that will be used in the statement if indexes are to be used. In the static form it requires redundant information (the schema is already contained within the statement restrictions object). In addition find_idx will need to be used for filtering in order not to include redundant selectors in the selection objects. This change refactors find_idx to run under the statement restrictions object and changes its scope from private to public. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:40:24 +03:00
Eliran Sinvani	4496086bf1	cql3: add prefix size common functionality to all clustering restrictions Up until now, knowing the prefix size, which is used to determine if filtering is needed, was implemented only for single column clustering restrictions. The patch adds a function to calculate the prefix size for all types of clustering key restrictions given the schema. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-21 08:39:57 +03:00
Vlad Zolotarov	a87c11bad2	storage_proxy::query_result_local: create a single tracing span on a replica shard Every call of a tracing::global_trace_state_ptr object instead of a tracing::tracing_state_ptr or a call to tracing::global_trace_state_ptr::get() creates a new tracing session (span) object. This should never be done unless query handling moves to a different shard. Fixes #3862 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181018003500.10030-1-vladz@scylladb.com>	2018-10-19 16:47:17 +00:00
Tomasz Grabiec	fc37b80d24	Merge "Correctly handle dropped columns in SSTable 3" from Piotr J. Previously we were making assumptions about missing columns (the size of its value, whether it's a collection or a counter) but they didn't have to be always true. Now we're using column type from serialization header to use the right values. Fixes #3859 * seastar-dev.git haaawk/projects/sstables-30/handling-dropped-columns/v4: sstables 3: Correctly handle dropped columns in column_translation sstables 3: Add test for dropped columns handling	2018-10-19 16:47:17 +00:00
Duarte Nunes	3a53b3cebc	Merge 'hinted handoff: add manager::state and split storing and replaying enablement' from Vlad " Refs #3828 (Probably fixes it) We found a few flaws in a way we enable hints replaying. First of all it was allowed before manager::start() is complete. Then, since manager::start() is called after messaging_service is initialized there was a time window when hints are rejected and this creates an issue for MV. Both issues above were found in the context of #3828. This series fixes them both. Tested {release}: dtest: materialized_views_test.py:TestMaterializedViews.write_to_hinted_handoff_for_views_test dtest: hintedhandoff_additional_test.py " * 'hinted_handoff_dont_create_hints_until_started-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: enable storing hints before starting messaging_service db::hints::manager: add a "started" state db::hints::manager: introduce a _state	2018-10-19 16:47:16 +00:00
Avi Kivity	1ce52d5432	locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order get_ranges() is supposed to return ranges in sorted order. However, `a35136533d` broke this and returned the range that was supposed to be last in the second position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). This broke cleanup, which relied on the sort order to perform a binary search. Other users of the get_ranges() family did not rely on the sort order. Fixes #3872. Message-Id: <20181019113613.1895-1-avi@scylladb.com>	2018-10-19 16:47:12 +00:00
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Vlad Zolotarov	cff4186517	db::hints::manager: add a "started" state Hinting is allowed after "started" before "stopping". Hints that attempted to be stored outside this time frame are going to be dropped. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:36 -04:00
Vlad Zolotarov	fb513a4b23	db::hints::manager: introduce a _state Introduce a multi-bit state field. In this patch it replaces the _stopping boolean. We are going to add more states in the following patches. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:33 -04:00
Piotr Jastrzebski	e94254b563	sstables 3: Add test for dropped columns handling Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-18 19:13:58 +02:00
Piotr Jastrzebski	cafb3dc2ae	sstables 3: Correctly handle dropped columns in column_translation Previously we were making assumptions about missing columns (the size of its value, whether it's a collection or a counter) but they didn't have to be always true. Now we're using column type from serialization header to use the right values. Fixes #3859 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-18 19:13:44 +02:00
Eliran Sinvani	ded3a03356	cql3: rename selection metadata manipulation functions In the past the addition of non serializable columns was being used only for post ordering of result sets. The newly added ALLOW FILTERING feature will need to use these functions for other post-processing operations, i.e. filtering. The renaming accounts for the new and existing uses of the function. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2018-10-18 17:52:04 +03:00
Avi Kivity	472afea6cd	Update seastar submodule * seastar 4669469...d152f2d (5): > build: don't link with libgcc_s explicitly > scheduling: add std::hash<seastar::scheduling_group> > prometheus: Allow preemption between each metric > Merge "improve memory detection in containers" from Juliana > Merge "perf_tests: produce json reports" from Paweł	2018-10-18 14:55:18 +03:00
Duarte Nunes	7610cedc34	Merge "db/hints: Expose current backlog" from Duarte " Hints are stored on disk by a hints::manager, ensuring they are eventually sent. A hints::resource_manager ensures the hints::managers it tracks don't consume more than their allocated resources by monitoring disk space and disabling new hints if needed. This series fixes some bugs related to the backlog calculation, but mainly exposes the backlog through a hints::manager so upper layers can apply flow control. Refs #2538 " * 'hh-manager-backlog/v3' of https://github.com/duarten/scylla: db/hints/manager: Expose current backlog db/hints/manager: Move decision about blocking hints to the manager db/hints/resource_manager: Correctly account resources in space_watchdog db/hints/resource_manager: Replace timer with seastar::thread db/hints/resource_manager: Ensure managers are correctly registered db/hints/resource_manager: Fix formatting db/hints: Disallow moving or copying the managers	2018-10-16 20:35:34 +01:00
Duarte Nunes	624472d16a	db/hints/manager: Expose current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	6dcb7a39d4	db/hints/manager: Move decision about blocking hints to the manager The space_watchdog enables or disables hints for the managers associated with a particular device. We encapsulate this decision inside the hints::managers by introducing the update_backlog() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	207c9c8e38	db/hints/resource_manager: Correctly account resources in space_watchdog A db::hints::resource_manager manages the resources for one or two db::hints::managers. Each of these can be using the same or different devices. The db::hints::space_watchdog periodically checks whether each manager is within their resource allocation, and if not disables it. The watchdog iterates over the managers and accounts for the total size they are using. This is wrong, since it can account in the same variable the size consumed by managers using different devices. We fix this while taking advantage of the fact that on_timer is now called in the context of a seastar::thread, instead of using future combinators. Fixes #3821 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:34:54 +01:00
Duarte Nunes	25d266bdc1	db/hints/resource_manager: Replace timer with seastar::thread Will make on_timer() much simpler to allow fixing a bug in subsequent patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	278aa13bb0	db/hints/resource_manager: Ensure managers are correctly registered Registering a manager for a new device used std::unordered_map::emplace(), which may not insert the specified value if one with the same key has already been added. This could happen if both managers were using the same device and the fiber deferred in-between adding them. Found during code reading. Could cause hints to not be disabled for an overloaded manager. Fixes #3822 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	9e3b09cf48	db/hints/resource_manager: Fix formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	622ac734da	db/hints: Disallow moving or copying the managers Disable the copy and move ctors and assignment operators for both the hints::manager and the hints::resource_manager. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Glauber Costa	7edae5421d	sstables: print sstable path in case of an exception Without that, we don't know where to look for the problems Before: compaction failed: sstables::malformed_sstable_exception (Too big ttl: 3163676957) After: compaction_manager - compaction failed: sstables::malformed_sstable_exception (Too big ttl: 4294967295 in sstable /var/lib/scylla/data/system_traces/events-8826e8e9e16a372887533bc1fc713c25/mc-832-big-Data.db) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181016181004.17838-1-glauber@scylladb.com>	2018-10-16 20:31:20 +01:00
Asias He	7f826d3343	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>	2018-10-15 22:03:28 +01:00
Benny Halevy	7eef527769	handle both special token_kinds in dht::tri_compare Handle the before_all_keys and after_all_keys token_kind at the highest layer before calling into the virtual i_partitioner::tri_compare that is not set up to handle these cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181015165612.29356-1-bhalevy@scylladb.com>	2018-10-15 20:00:54 +03:00
Glauber Costa	51906f7144	compactions: log tokens that we decide not to write down to an SSTable May be important when debugging issues related to cleanups Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181015162643.7834-1-glauber@scylladb.com>	2018-10-15 19:28:00 +03:00
Vladimir Krivopalov	092276b13d	sstables: Reset opened range tombstone when moving to another partition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <f6dc6b0bd88ca44f2ef84c2a8bee43fde82c89cc.1539396572.git.vladimir@scylladb.com>	2018-10-14 11:20:11 +03:00
Vladimir Krivopalov	926b6430fd	sstables: Factor out code resetting values for a new partition. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <83a3a4ce6942b036be447bcfeb66142828e75293.1539396572.git.vladimir@scylladb.com>	2018-10-14 11:20:10 +03:00
Glauber Costa	98332de268	api: use longs instead of ints for snapshot sizes Int types in json will be serialized to int types in C++. They will then only be able to handle 4GB, and we tend to store more data than that. Without this patch, listsnapshots is broken in all versions. Fixes: #3845 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181012155902.7573-1-glauber@scylladb.com>	2018-10-12 21:17:24 +03:00
Tomasz Grabiec	b89556512a	Merge "Enable sstable_mutation_test with SSTables 3.x." from Vladimir Introduce uppermost_bound() method instead of upper_bound() in mutation_fragment_filter and clustering_ranges_walker. For now, this has been only used to produce the final range tombstone for sliced reads inside consume_partition_end(). Usage of the upper bound of the current range causes problems of two kinds: 1. If not all the slicing ranges have been traversed with the clustering range walker, which is possible when the last read mutation fragment was before some of the ranges and reading was limited to a specific range of positions taken from index, the emitted range tombstone will not cover the untraversed slices. 2. At the same time, if all ranges have been walked past, the end bound is set to after_all_clustered_rows and the emitted RT may span more data than it should. To avoid both situations, the uppermost bound is used instead, which refers to the upper bound of the last range in the sequence. * github.com/scylladb/seastar-dev.git haaawk/projects/sstables-30/enable-mc-with-sstable-mutation-test/v2 sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter. tests: Enable sstable_mutation_test for SSTables 'mc' format. Rebased by Piotr J.	2018-10-12 15:14:17 +02:00
Vladimir Krivopalov	5b03fe7982	tests: Enable sstable_mutation_test for SSTables 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-12 14:18:15 +02:00
Vladimir Krivopalov	199dc9d5a7	sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter. For now, this has been only used to produce the final range tombstone for sliced reads inside consume_partition_end(). Usage of the upper bound of the current range causes problems of two kinds: 1. If not all the slicing ranges have been traversed with the clustering range walker, which is possible when the last read mutation fragment was before some of the ranges and reading was limited to a specific range of positions taken from index, the emitted range tombstone will not cover the untraversed slices. 2. At the same time, if all ranges have been walked past, the end bound is set to after_all_clustered_rows and the emitted RT may span more data than it should. To avoid both situations, the uppermost bound is used instead, which refers to the upper bound of the last range in the sequence. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-12 14:18:15 +02:00
Tomasz Grabiec	193efef950	Merge "Make SST3 pass test_clustering_slices test" from Piotr * seastar-dev.git haaawk/sst3/test_clustering_slices/v8: sstables: Extract on_end_of_stream from consume_partition_end sstables: Don't call consume_range_tombstone_end in consume_partition_end sstables: Change the way fragments are returned from consumer	2018-10-12 14:11:51 +02:00
Piotr Jastrzebski	1a6cef80f0	sstables: Change the way fragments are returned from consumer Split range tombstone (if present) on every consume_row_end call and store both range tombstone and row in different fields called _stored_row and _stored_tombstone instead of using single field called _stored. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:51:39 +02:00
Piotr Jastrzebski	3109c94c84	sstables: Don't call consume_range_tombstone_end in consume_partition_end We don't need to check _opened_range_tombstone and _mf_filter again Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:51:28 +02:00
Piotr Jastrzebski	7dcea660e8	sstables: Extract on_end_of_stream from consume_partition_end The new function will be called when the stream of data is finished while old consume_partition_end will be called when partition is finished but stream is not done yet. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-12 13:50:52 +02:00
Piotr Jastrzebski	717cb2a9e7	sstables: Adopt test_clustering_slices test for SST3 Readers for SST3 return a bit more precise range tombstones when reader is slicing. Namely, SST2 readers return whole range tombstones that overlap with slicing range but SST3 trim those range tombstones to slicing range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-11 15:47:47 +02:00
Tomasz Grabiec	a7a14e3af2	Merge "Handle dead row markers when writing to SSTables 3.x" from Vladimir There is a mismatch between row markers used in SSTables 2.x (ka/la) and liveness_info used by SSTables 3.x (mc) in that a row marker can be written as a deleted cell but liveness_info cannot. To handle this, for a dead row marker the corresponding liveness_info is written as expiring liveness_info with a fake TTL set to 1. This approach is adapted from the solution for CASSANDRA-13395 that exercised similar issue during SSTables upgrades. * github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7: sstables: Introduce TTL limitation and special 'expired TTL' value. sstables: Write dead row marker as expired liveness info. tests: Add test covering dead row marker writing to SSTables 3.x.	2018-10-11 10:58:57 +02:00
Gleb Natapov	ceb361544a	stream_session: remove unused capture 'Consumer function' parameter for distribute_reader_and_consume_on_shards() captures schema_ptr (which is a seastar::shared_ptr), but the function is later copied on another shard at which point schema_ptr is also copied and its counter is incremented by the wrong shard. The capture is not even used, so let's just drop it. Fixes #3838 Message-Id: <20181011075500.GN14449@scylladb.com>	2018-10-11 11:10:58 +03:00
Botond Dénes	23f3831aaf	table::make_streaming_reader(): add forwarding parameter The single-range overload, when used by make_multishard_streaming_reader(), has to create a reader that is forwardable. Otherwise the multishard streaming reader will not produce any output as it cannot fast-forward its shard readers to the ranges produced by the generator. Also add a unit test that is based on the real-life purpose the multishard streaming reader was designed for - serving partitions from a shard, according to a sharding configuration that is different than the local one. This is also the scenario that found the bug in the first place. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>	2018-10-11 10:59:18 +03:00
Vlad Zolotarov	5b12ec441d	db::hints::manager: use "streaming" I/O scheduling class for reads Make sure that read I/O in the context of HH sending do not overpower I/O in the context of queries, memtable flushes or compactions. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set a read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	629972d586	db::hints::manager: add hints sender to the "streaming" CPU scheduling group Make sure that HH sends do not overpower (CPU wise) regular WRITEs flow. Fixes #3817 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vladimir Krivopalov	9a04200b03	tests: Add test covering dead row marker writing to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:54 -07:00
Vladimir Krivopalov	9c773fa6cf	sstables: Write dead row marker as expired liveness info. This allows to distinguish expired liveness info from yet-to-expire one and convert it into a dead row marker on read. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
Vladimir Krivopalov	e71cc5ab20	sstables: Introduce TTL limitation and special 'expired TTL' value. This allows to store expired liveness info in SSTables 3.x format without introducing a possible conflict with real TTL values. As per Cassandra, TTL cannot exceed 20 years so taking the maximum value as a special value for indicating expired liveness info is safe. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
Calle Wilund	3cb50c861d	messaging_service: Make rpc streaming sink respect tls connection Fixes #3787 Message service streaming sink was created using direct call to rpc::client::make_sink. This in turn needs a new socket, which it creates completely ignoring what underlying transport is active for the client in question. Fix by retaining the tls credential pointer in the client wrapper, and using this in a sink method to determine whether to create a new tls socket, or just go ahead with a plain one. Message-Id: <20181010003249.30526-1-calle@scylladb.com>	2018-10-10 12:55:28 +03:00
Avi Kivity	1891779e64	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare()	2018-10-09 17:48:18 +03:00
Piotr Sarna	a93d27960c	tests: add secondary index paging unit test case A simple case for SI paging is added to secondary_index_test suite. This commit should be followed by more complex testing and serves as an example on how to extract paging state and use it across CQL queries. Message-Id: <b22bdb5da1ef8df399849a66ac6a1f377e6a650a.1539090350.git.sarna@scylladb.com>	2018-10-09 15:05:20 +01:00
Avi Kivity	cfab7a2be6	Update seastar submodule * seastar ed44af8...4669469 (2): > prometheus: Fix histogram text representation > reactor: count I/O errors Fixes #3827.	2018-10-09 16:36:47 +03:00
Gleb Natapov	319ece8180	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com>	2018-10-09 16:33:53 +03:00
Botond Dénes	d467b518bc	multishard_mutation_query(): don't attempt to stop broken readers Currently, when stopping a reader fails, it simply won't be attempted to be saved, and it will be left in the `_readers` array as-is. This can lead to an assertion failure as the reader state will contain futures that were already waited upon, and that the cleanup code will attempt to wait on again. To prevent this, when stopping a reader fails, reset it to nonexistent state, so that the cleanup code doesn't attempt to do anything with it. Refs: #3830 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>	2018-10-09 15:59:50 +03:00
Gleb Natapov	207b57a892	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. Non zero value means that writes were lost after CL is reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com>	2018-10-09 15:17:07 +03:00
Piotr Sarna	b3685342a6	service/pager: avoid dereferencing null partition key The pager::state() function returns a valid paging object even if the pager itself is exhausted. It may also not contain the partition key, so using it unconditionally was a bug - now, in case there is no partition key present, paging state will contain an empty partition key. Fixes #3829 Message-Id: <28401eb21ab8f12645c0a33d9e92ada9de83e96b.1539074813.git.sarna@scylladb.com>	2018-10-09 12:13:52 +03:00
Botond Dénes	4bb0bbb9e2	database: add make_multishard_streaming_reader() Creates a streaming reader that reads from all shards. Shard readers are created with `table::make_streaming_reader()`. This is needed for the new row-level repair. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4b74c710bed2ef98adf07555a4c841e5b690dd8c.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:47 +03:00
Botond Dénes	3eeb6fbd23	table::make_streaming_reader(): add single-range overload This will be used by the `make_multishard_streaming_reader()` in the next patch. This method will create a multishard combining reader which needs its shard readers to take a single range, not a vector of ranges like the existing overload. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <cc6f2c9a8cf2c42696ff756ed6cb7949b95fe986.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:46 +03:00
Botond Dénes	a56871fab7	tests/multishard_mutation_query_test: test range-tombstones spanning multiple pages Extend the existing range-tombstone test, such that range tombstones span multiple pages worth of rows. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <583aa826ea12118289b08d483b55b5573d27e1ee.1539002810.git.bdenes@scylladb.com>	2018-10-09 10:18:28 +03:00
Vladimir Krivopalov	e9aba6a9c3	sstables: Add missing 'mc' format into format strings map in sstable::filename(). Fixes #3832. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <269421fb2ac8ab389231cbe9ed501da7e7ff936a.1539048008.git.vladimir@scylladb.com>	2018-10-09 10:07:08 +03:00
Asias He	8edf3defdf	range_streamer: Futurize add_ranges It might take a long time for get_all_ranges_with_sources_for and get_all_ranges_with_strict_sources_for to calculate, which causes reactor stalls. To fix, run them in a thread and yield. Those functions are used in the slow path, so it is ok to yield more than needed. Fixes #3639 Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>	2018-10-09 09:46:50 +03:00
Nadav Har'El	b8668dc0f8	materialized views: refuse to filter by non-key column A materialized view can provide a filter so as to pick up only a subset of the rows from the base table. Usually, the filter operates on columns from the base table's primary key. If we use a filter on regular (non-key) columns, things get hairy, and as issue #3430 showed, wrong: merely updating this column in the base table may require us to delete, or resurrect, the view row. But normally we need to do the above when the "new view key column" was updated, when there is one. We use shadowable tombstones with one timestamp to do this, so it cannot take into account the two timestamps from those two columns (the filtered column and the new key column). So in the current code, filtering by a non-key column does not work correctly. In this patch we provide two test cases (one involving TTLs, and one involving only normal updates), which demonstrate vividly that it does not work correctly. With normal updates, trying to resurrect a view row that has previously disappeared fails. With TTLs, things are even worse, and the view row fails to disappear when the filtered column is TTLed. In Cassandra, the same thing doesn't work correctly either (see CASSANDRA-13798 and CASSANDRA-13832) so they decided to refuse creating a materialized view filtering a non-key column. In this patch we also do this - fail the creation of such an unsupported view. For this reason, the two tests mentioned above are commented out in a "#if", with, instead, a trivial test verifying a failure to create such a view. Note that as explained above, when the filtered column and new view key column are different we have a problem. But when they are the same - namely we filter by a non-key base column which actually is a key in the view - we are actually fine. This patch includes additional test cases verifying that this case is really fine and provides correct results. Accordingly, this case is not forbidden in the view creation code. Fixes #3430. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181008185633.24616-1-nyh@scylladb.com>	2018-10-08 20:37:11 +01:00
Avi Kivity	0fa60660b8	Merge "Fix mutation fragments clobbering on fast_forward" from Vladimir " This patchset fixes a bug in SSTables 3.x reading when fast-forwarding is enabled. It is possible that a mutation fragment, row or RT marker, is read and then stored because it falls outside the current fast-forwarding range. If the reader is further fast-forwarded but the row still falls outside of it, the reader would still continue reading and get the next fragment, if any, that would clobber the currently stored one. With this fix, the reader does not attempt to read on after storing the current fragment. Tests: unit {release} " * 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla: tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x. sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment. sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.	2018-10-08 20:18:42 +03:00
Vladimir Krivopalov	07d61683b6	tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-08 09:09:33 -07:00
Botond Dénes	d0eb443913	result_memory_accounter: drop state_for_another_shard() This is not used since range-scans were refactored (`e49a14e30`) as part of making them stateful. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <589f30163e29299e840750457919214a26f0da93.1539005336.git.bdenes@scylladb.com>	2018-10-08 14:29:48 +01:00
Duarte Nunes	48ebe6552c	Merge 'Fix issues with endpoint state replication to other shards' from Tomasz Fixes #3798 Fixes #3694 Tests: unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test) * tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla: gms/gossiper: Replicate endpoint states in add_saved_endpoint() gms/gossiper: Make reset_endpoint_state_map() have effect on all shards gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards gms/gossiper: Always override states from older generations	2018-10-08 14:19:19 +01:00
Avi Kivity	4b16867bd7	cql: relax writetime/ttl selections of collections writetime() or ttl() selections of frozen collections can work, as they are single cells. Relax the check to allow them, and only forbid non-frozen collections. Fixes #3825. Tests: cql_query_test (release). Message-Id: <20181008123920.27575-1-avi@scylladb.com>	2018-10-08 14:07:01 +01:00
Duarte Nunes	56e36ee14b	flat_mutation_reader: Use std::move(range) in move_buffer_content_to() Instead of open coding it. Tests: unit(release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181008104328.13164-1-duarte@scylladb.com>	2018-10-08 13:57:13 +03:00
Avi Kivity	474bb4e44f	cql: functions: implement min/max/count for bytes type Uncomment existing declare() calls and implement tests. Because the data_value(bytes) constructor is explicit, we add explicit conversion to data_value in impl_min_function_for<> and impl_max_function_for<>. Fixes #3824. Message-Id: <20181008084127.11062-1-avi@scylladb.com>	2018-10-08 10:48:30 +01:00
Takuya ASADA	d89114d1fc	dist/debian: install GPG key for cross-building We found that on some Debian environments the Ubuntu .deb build fails with a gpg error because the Ubuntu GPG key is missing, so we need to install it before starting pbuilder. Likewise, on Ubuntu we need to install the Debian GPG key. Fixes #3823 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181008072246.13305-1-syuu@scylladb.com>	2018-10-08 10:43:25 +03:00
Botond Dénes	b01050e28c	HACKING.md: add link to the scylla-dev mailing list Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <9a5d967f791d7a0db584864f68f93bbc68f52372.1538977773.git.bdenes@scylladb.com>	2018-10-08 10:06:50 +03:00
Duarte Nunes	74d809f8be	db/hints/manager: Use frozen_mutation instead of mutation Instead of unfreezing a mutation from the commitlog and then freezing it again to send, just keep the read frozen mutation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	6eec9748fc	db/hints/manager: Use database::find_schema() Instead of using find_column_family() and repeatedly asking for column_family::schema(), use database::find_schema() instead. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	5b3d08defc	db/commitlog/commitlog_entry: Allow moving the contained mutation Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	3b6d2286e9	service/storage_proxy: send_to_endpoint overload accepting frozen_mutation Add an overload to send_to_endpoint() which accepts a frozen_mutation. The motivation is to allow better accounting of pending view updates, but this change also allows some callers to avoid unfreezing already frozen mutations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:37:39 +01:00
Duarte Nunes	c7639f53e0	service/storage_proxy: Build a shared_mutation from a frozen_mutation Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Duarte Nunes	9e14412528	service/storage_proxy: Lift frozen_mutation_and_schema Lift frozen_mutation_and_schema to frozen_mutation.hh, since other subsystems using frozen_mutations will likely want to pass it around together with the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Duarte Nunes	2c739f36cc	service/storage_proxy: Allow non-const ranges in mutate_prepare() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Avi Kivity	1cc81d1492	Update seastar submodule * seastar 71e914e...ed44af8 (4): > Merge "Add semaphore_units<>::split() function" from Duarte > scheduling: introduce destroy_scheduling_group() > tls: include "api.hh" for listen_options > rpc: connection-level resource isolation	2018-10-07 20:45:49 +03:00
Duarte Nunes	4162bff37a	Merge 'cql3: allow adding or dropping multiple columns in ALTER TABLE statement' from Benny " This patchset implements ALTER TABLE ADD/DROP for multiple columns. Fixes: #2907 Fixes: #3691 Tests: schema_change_test " * 'projects/cql3/alter-table-multi/v3' of https://github.com/bhalevy/scylla: cql3: schema_change_test: add test_multiple_columns_add_and_drop cql3: allow adding or dropping multiple columns in ALTER TABLE statement cql3: alter_table_statement: extract add/alter/drop per-column code into functions cql3: testing for MVs for alter_table_statement::type::drop is not per column cql3: schema_change_test: add test_static_column_is_dropped	2018-10-07 17:30:09 +01:00
Benny Halevy	0f350f5d59	cql3: schema_change_test: add test_multiple_columns_add_and_drop Add a unit test for adding or dropping multiple columns. See https://github.com/scylladb/scylla/issues/2907 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 19:14:29 +03:00
Benny Halevy	23fecc7e5e	cql3: allow adding or dropping multiple columns in ALTER TABLE statement Fixes #2907 Fixes #3691 See Cassandra reference: https://apache.googlesource.com/cassandra/+/cassandra-3.6/src/antlr/Parser.g /** * ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>; * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; \| ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>,<column1> <newtype1>..... <column n> <newtype n>) * ALTER COLUMN FAMILY <CF> DROP <column>; \| ALTER COLUMN FAMILY <CF> DROP ( <column>,<column1>.....<column n>) * ALTER COLUMN FAMILY <CF> WITH <property> = <value>; * ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>; / alterTableStatement returns [shared_ptr<alter_table_statement> expr] @init { alter_table_statement::type type; auto props = make_shared<cql3::statements::cf_prop_defs>(); std::vector<alter_table_statement::column_change> column_changes; std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames; } : K_ALTER K_COLUMNFAMILY cf=columnFamilyName ( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; column_changes.emplace_back(id, v); } \| K_ADD { type = alter_table_statement::type::add; } ( id1=cident v1=comparatorType b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); } \| '(' id1=cident v1=comparatorType b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); } (',' idn=cident vn=comparatorType bn=cfisStatic { column_changes.emplace_back(idn, vn, bn); } ) ')' ) \| K_DROP id=cident { type = alter_table_statement::type::drop; column_changes.emplace_back(id); } \| K_WITH properties[props] { type = alter_table_statement::type::opts; } \| K_RENAME { type = alter_table_statement::type::rename; } id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); } ( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )* ) { $expr = ::make_shared<alter_table_statement>(std::move(cf), type, 
std::move(column_changes), std::move(props), std::move(renames)); } ; Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 19:14:26 +03:00
Benny Halevy	3fa6d3d3a8	cql3: alter_table_statement: extract add/alter/drop per-column code into functions In preparation to supporting ALTER TABLE with multiple columns (#3691) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 18:57:06 +03:00
Alexys Jacob	eebbae066a	dist/common/scripts/scylla_setup: fix gentoo linux installed package detection return code is expected to be 0 when installed package was found Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123433.4702-1-ultrabug@gentoo.org>	2018-10-07 16:46:02 +03:00
Alexys Jacob	850d046551	dist/common/scripts/scylla_ntp_setup: fix gentoo linux systemd service name fix typo as ntpd package systemd service is named ntpd, not sntpd Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123802.5576-1-ultrabug@gentoo.org>	2018-10-07 16:46:01 +03:00
Alexys Jacob	54151d2039	dist/common/scripts/scylla_cpuscaling_setup: fix file open mode for writing gentoo linux part tries to open the configuration file without the write flag, leading to an exception Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20181002123957.6010-1-ultrabug@gentoo.org>	2018-10-07 16:46:00 +03:00
Avi Kivity	700994a4f2	Merge "Add GDB commands for examining gossiper and RPC state" from Tomasz * 'gdb-gms-netw' of github.com:tgrabiec/scylla: gdb: Introduce 'scylla netw' command gdb: Introduce 'scylla gms' command gdb: Add sharded service wrapper gdb: Add unique_ptr wrapper gdb: Add list_unordered_set() gdb: Make std_vector wrapper indexable gdb: Add wrapper for std_map	2018-10-07 16:42:52 +03:00
Vlad Zolotarov	7cbe5f2983	service: priority_manager.hh: add #pragma once Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20181005040552.2183-3-vladz@scylladb.com>	2018-10-07 16:04:26 +03:00
Duarte Nunes	30d6ed8f92	service/storage_proxy: Consider target liveness in send_to_endpoint() So we don't attempt to send mutations to unreachable endpoints and instead store a hint for them, we now check the endpoint status and populate dead_endpoints accordingly in storage_proxy::send_to_endpoint(). Fixes #3820 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181007100640.2182-1-duarte@scylladb.com>	2018-10-07 16:04:26 +03:00
Benny Halevy	581b9006d4	cql3: testing for MVs for alter_table_statement::type::drop is not per column No column can be dropped from a table with materialized views so the respective exception can ignore and omit the dropped column name. In preparation for refactoring the respective code, moving the per-column code to member functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 15:16:32 +03:00
Benny Halevy	8d298064b1	cql3: schema_change_test: add test_static_column_is_dropped Test dropping of static column defined in CREATE TABLE, and adding and dropping of a static column using ALTER TABLE. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-07 14:34:28 +03:00
Duarte Nunes	a69d468101	service/storage_proxy: Fix formatting of send_to_endpoint() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181006204756.32232-1-duarte@scylladb.com>	2018-10-07 11:05:32 +03:00
Vladimir Krivopalov	9db124c6e5	sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment. Without it, the reader will attempt to read further and may clobber the stored fragment with the next one read, if any. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-05 19:09:09 -07:00
Vladimir Krivopalov	8e004684e9	sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value. Instead of blindly proceeding, use whatever the call to maybe_push_*() has returned. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-05 19:05:03 -07:00
Duarte Nunes	b839f551cf	cql3/statements/select_statement: Don't double count unpaged queries Unpaged queries are those for which the client didn't enable paging, and we already account for them in indexed_table_select_statement::do_execute(). Remove the second increment in read_posting_list(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181003121811.11750-1-duarte@scylladb.com>	2018-10-05 17:36:39 +02:00
Nadav Har'El	e4ef7fc40a	materialized views: enable two tests in view_schema_test We had two commented out tests based on Cassandra's MV unit tests, for the case that the view's filter (the "SELECT" clause used to define the view) filtered by a non-primary-key column. These tests used to fail because of problems we had in the filtering code, but they now succeed, so we can enable them. This patch also adds some comments about what the tests do, and adds a few more cases to one of the tests. Refs #3430. However, note that the success of these tests does not really prove that the non-PK-column filtering feature works fully correctly and that issue forbidding it, as explained in https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably fix this feature with our "virtual cells" mechanism, but will need to add a test to confirm the possible problem and its (probably needed fix). We do not add such a test in this patch. In the meantime, issue #3430 should remain open: we still allow users to create MV with such a filter, and, as the tests in this patch show, this "mostly" works correctly. We just need to prove and/or fix what happens with the complex row liveness issues a la issue #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181004213637.32330-1-nyh@scylladb.com>	2018-10-04 22:43:38 +01:00
Tomasz Grabiec	3c7de9fee9	gms/gossiper: Replicate endpoint states in add_saved_endpoint()	2018-10-04 12:54:00 +02:00
Tomasz Grabiec	ddf3a61bcf	gms/gossiper: Make reset_endpoint_state_map() have effect on all shards	2018-10-04 12:53:56 +02:00
Tomasz Grabiec	9e3f744603	gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards Lack of this may result in non-zero shards on some nodes still seeing STATUS as NORMAL for a node which shut down, in some cases. mark_as_shutdown() is invoked in reaction to an RPC call initiated by the node which is shutting down. Another way a node can learn about another node shutting down is via gossiping with a node which knows this. In that case, the states will be replicated to non-zero shards. The node which learnt via mark_as_shutdown() may also eventually propagate this to non-zero shards, e.g. when it gossips about it with other nodes, and its local version number at the time of mark_as_shutdown() was smaller than the one used to set the STATE by the shutting down node.	2018-10-04 12:51:42 +02:00
Tomasz Grabiec	c4ec81e126	gms/gossiper: Always override states from older generations Application states of each node are versioned per-node with a pair of generation number (more significant) and value version. Generation number uniquely identifies the life time of a scylla process. Generation number changes after restart. Value versions start from 0 on each restart. When a node gets updates for application states, it merges them with its view on given node. Value updates with older versions are ignored. Gossiper processes updates only on shard 0, and replicates value updates to other shards. When it sees a value with a new generation, it correctly forgets all previous values. However, non-zero shards don't forget values from previous generations. As a result, replication will fail to override the values on non-zero shards when generation number changes until their value version exceeds the version prior to the restart. This will result in incorrect STATUS for non-seed nodes on non-zero shards. When restarting a non-seed node, it will do a shadow gossip round before setting its STATUS to NORMAL. In the shadow round it will learn from other nodes about itself, and set its STATUS to shutdown on all shards with a high value version. Later, when it sets its status to NORMAL, it will override it only on shard 0, because on other shards the version of STATUS is higher. This will cause CQL truncate to skip current node if the coordinator runs on non-zero shards. The fix is to override the entries on remote shards in the same way we do on shard 0. All updates to endpoint states should be already serialized on shard 0, and remote shards should see them in the same order. Introduced in `2d5fb9d` Fixes #3798 Fixes #3694	2018-10-04 12:47:27 +02:00
Piotr Sarna	a5570cb288	tests: add missing get() calls in threaded context One test case missed a few get() calls in order to wait for continuations, which only accidentally worked, because it was followed by 'eventually()' blocks. Message-Id: <69c145575ac81154c4b5f500d01c6b045a267088.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Piotr Sarna	8a2abd45fb	tests: add collections test for secondary indexing Test case regarding creating indexes on collection columns is added to the suite. Refs #3654 Refs #2962 Message-Id: <1b6844634b6e9a353028545813571647c92fb330.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Piotr Sarna	2d355bdf47	cql3: prevent creation of indexes on non-frozen collections Until indexes for non-frozen collections is implemented, creating such indexes should be disallowed to prevent unnecessary errors on insertions/selections. Fixes #3653 Refs #2962 Message-Id: <218cf96d5e38340806fb9446b8282d2296ba5f43.1536839959.git.sarna@scylladb.com>	2018-10-04 10:55:45 +01:00
Duarte Nunes	959559d568	cql3/statements/select_statement: Remove outdated comment Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181003193033.13862-1-duarte@scylladb.com>	2018-10-04 09:45:17 +03:00
Eliran Sinvani	20f49566a2	cql3 : add workaround to antlr3 null dereference bug The Antlr3 exception class has a null dereference bug that crashes the system when trying to extract the exception message using ANTLR_Exception<...>::displayRecognitionError(...) function. When a parsing error occurs the CqlParser throws an exception which is in turn processed for some special cases in scylla to generate a custom message. The default case however, creates the message using displayRecognitionError, causing the system to crash. The fix is a simple workaround, making sure the pointer is not null before the call to the function. A "proper" fix can't be implemented because the exception class itself is implemented outside scylla in antlr headers that reside on the host machine os. Tested manually with 2 test cases: a typo causing scylla to crash, and a cql comment without a newline at the end, which also caused scylla to crash. Ran unit tests (release). Fixes #3740 Fixes #3764 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>	2018-10-03 18:30:06 +03:00
Tomasz Grabiec	9c57abcce7	gossiper: Fix shutdown_announce_in_ms not being respected shutdown_announce_in_ms specifies a period of time that a node which is shutting down waits to allow its state to propagate to other nodes. However, we were setting _enabled to false before waiting, which will make the current node ignore gossip messages. Message-Id: <1538576996-26283-1-git-send-email-tgrabiec@scylladb.com>	2018-10-03 15:43:00 +01:00
Tomasz Grabiec	fda8e271e3	gdb: Introduce 'scylla netw' command Prints information about the state of the messaging service layer. Example: (gdb) scylla netw Dropped messages: {0 <repeats 25 times>} Outgoing connections: IP: 127.0.0.2, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x6000051cd220: stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 23, wait_reply = 0, timeout = 0} outstanding: 0 Server: resources={_count = 85899345, _ex = {_M_exception_object = 0x0}, _wait_list = {_list = {_front_chunk = 0x0, _back_chunk = 0x0, _nchunks = 0, _free_chunks = 0x0, _nfree_chunks = 0}, _on_expiry = {<No data fields>}, _size = 0}} Incoming connections: 127.0.0.1:28071: {replied = 0, pending = 0, exception_received = 0, sent_messages = 2, wait_reply = 0, timeout = 0}	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	cf07cda08f	gdb: Introduce 'scylla gms' command Prints gossiper state. Example: (gdb) scylla gms 127.0.0.2: (gms::endpoint_state) 0x6010050c0550 ({_generation = 1538568389, _version = 2147483647}) gms::application_state::STATUS: {version=18, value="NORMAL,968364964011550971"} gms::application_state::LOAD: {version=267, value="494510"} gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"} gms::application_state::DC: {version=10, value="datacenter1"} gms::application_state::RACK: {version=11, value="rack1"} gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"} gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.2"} gms::application_state::NET_VERSION: {version=1, value="0"} gms::application_state::HOST_ID: {version=2, value="ee281b83-1acb-4aa3-927c-985a7d9a7c6f"} 127.0.0.1: (gms::endpoint_state) 0x6010051422b0 ({_generation = 1538557402, _version = 0}) gms::application_state::STATUS: {version=18, value="NORMAL,9176584852507611499"} gms::application_state::LOAD: {version=22521, value="409817"} gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"} gms::application_state::DC: {version=10, value="datacenter1"} gms::application_state::RACK: {version=11, value="rack1"} gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"} gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.1"} gms::application_state::NET_VERSION: {version=1, value="0"} gms::application_state::HOST_ID: {version=2, value="88ff543f-e9b8-42eb-a876-c0f917078a31"}	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	8c6f8b1773	gdb: Add sharded service wrapper	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	4adfed9dba	gdb: Add unique_ptr wrapper	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	e29e302272	gdb: Add list_unordered_set()	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	272bc88699	gdb: Make std_vector wrapper indexable	2018-10-03 15:05:22 +02:00
Tomasz Grabiec	b436759d49	gdb: Add wrapper for std_map	2018-10-03 15:05:22 +02:00
Pekka Enberg	de48966abc	cql3: Move as_json_function class to separate file The as_json_function class is not registered as a function, but we can still keep it cql3/functions, as per its namespace, to reduce the size of select_statement.cc. Message-Id: <20181002132637.30233-1-penberg@scylladb.com>	2018-10-03 13:30:08 +01:00
Piotr Sarna	4a23297117	cql3: add asking for pk/ck in the base query Base query partition and clustering keys are used to generate paging state for an index query, so they always need to be present when a paged base query is processed. Message-Id: <f3bf69453a6fd2bc842c8bdbd602d62c91cf9218.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Piotr Sarna	50d3de0693	cql3: add checking for may_need_paging when executing base query It's not sufficient to check for positive page_size when preparing a base query for indexed select statement - may_need_paging() should be called as well. Message-Id: <d435820019e4082a64ca9807541f0c9ad334e6a8.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Piotr Sarna	11b8831c04	cql3: move base query command creation to a separate function Message-Id: <6b48b8cbd6312da4a17bfd3c85af628b4215e9f4.1538568953.git.sarna@scylladb.com>	2018-10-03 13:26:51 +01:00
Avi Kivity	7c8143c3c4	Revert "compaction: demote compaction start/end messages to DEBUG level" This reverts commit `b443a9b930`. The compaction history table doesn't have enough information to be a replacement for this log message yet.	2018-10-03 13:13:37 +03:00
Avi Kivity	b9702222f8	Merge "Handle simple column type schema changes in SST3" from Piotr " This patchset enables very simple column type conversions. It covers only handling variable and fixed size type differences. Two types still have to be compatible on bits level to be able to convert a field from one to the other. " * 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev: Fix check_multi_schema to actually check the column type change Handle very basic column type conversions in SST3 Enable check_multi_schema for SST3	2018-10-03 13:12:10 +03:00
Piotr Jastrzebski	3a60eac1d5	Fix check_multi_schema to actually check the column type change Field 'e' was supposed to be read as blob but the test had a bug and the read schema was treating that field as int. This patch changes that and makes the test really check column type change. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:40 +02:00
Piotr Jastrzebski	3cecb61ac1	Handle very basic column type conversions in SST3 After this change very simple schema changes of column type will work. This change makes sure that variable size and fixed size types can be converted to each other but only if their bit representation can be automatically converted between those types. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:40 +02:00
Piotr Jastrzebski	c117a6b3c8	Enable check_multi_schema for SST3 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-10-03 10:56:39 +02:00
Nadav Har'El	bebe5b5df2	materialized views: add view_updates_pending statistic We are already maintaining a statistic of the number of pending view updates sent but not yet completed by view replicas, so let's expose it. As with all per-table statistics, this one will only be exposed if the "--enable-keyspace-column-family-metrics" option is on. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-10-02 20:44:58 +01:00
Nadav Har'El	1d5f8d0015	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view update to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic which counts the number of ongoing writes. In the case that the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com>	2018-10-02 20:44:58 +01:00
Duarte Nunes	40a30d4129	db/schema_tables: Diff tables using ID instead of name Currently we diff schemas based on table/view name, and if the names match, then we detect altered schemas by comparing the schema mutations. This fails to detect transitions which involve dropping and recreating a schema with the same name, if a node receives these notifications simultaneously (for example, if the node was temporarily down or partitioned). Note that because the ID is persisted and created when executing a create_table_statement, then even if a schema is re-created with the exact same structure as before, we will still consider it altered because the mutations will differ. This also stops schema pulling from working, since it relies on schema merging. The solution is to diff schemas using their ID, and not their name. Keyspaces and user types are also susceptible to this, but in their case it's fine: these are values with no identity, and are just metadata. Dropping and recreating a keyspace can be viewed as dropping all tables from the keyspace, altering it, and eventually adding new tables to the keyspace. Note that this solution doesn't apply to tables dropped and created with the same ID (using the `WITH ID = {}` syntax). For that, we would need to detect deltas instead of applying changes and then reading the new state to find differences. However, this solution is enough, because tables are usually created with ID = {} for very specific, peculiar reasons. The original motivation meant for the new table to be treated exactly as the old, so the current behavior is in fact the desired one. Tests: unit(release), dtests(schema_test, schema_management_test) Fixes #3797 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230932.47153-2-duarte@scylladb.com>	2018-10-02 20:15:46 +02:00
Duarte Nunes	e404f09a23	db/schema_tables: Drop tables before creating new ones Doing it by the inverse order doesn't support dropping and creating a schema with the same name. Refs #3797 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230932.47153-1-duarte@scylladb.com>	2018-10-02 20:15:32 +02:00
Avi Kivity	aaab8a3f46	utils: crc32: mark power crc32 assembly as not requiring an executable stack The linker uses an opt-in system for non-executable stack: if all object files opt into a non-executable stack, the binary will have a non-executable stack, which is very desirable for security. The compiler cooperates by opting into a non-executable stack whenever possible (always for our code). However, we also have an assembly file (for fast power crc32 computations). Since it doesn't opt into a non-executable stack, we get a binary with executable stack, which Gentoo's build system rightly complains about. Fix by adding the correct incantation to the file. Fixes #3799. Reported-by: Alexys Jacob <ultrabug@gmail.com> Message-Id: <20181002151251.26383-1-avi@scylladb.com>	2018-10-02 18:48:23 +01:00
Avi Kivity	53a4b8ae86	Update seastar submodule * seastar 5712816...71e914e (12): > Merge "rpc shard to shard connection" from Gleb > Merge "Fix memory leaks when stopping memcached" from Tomasz > scripts: perftune.py: prioritize I/O schedulers > alien: fix the size of local item[] > seastar-addr2line: don't invoke addr2line multiple times > reactor: use labels for different io_priority_class:s > util/spinlock: fix bad namespacing of <xmmintrin.h> > Merge "scripts: perftune.py: support different I/O schedulers" from Vlad > timer: Do not require callback to be copyable > core/reactor: Fix hang on shutdown with long task quota > build: use 'ppa:scylladb/ppa' instead of URL for sourceline > net/dns: add net::dns::get_srv_records() helper	2018-10-02 18:48:23 +01:00
Avi Kivity	7322ac105c	Merge "sstables_stats" from Benny " This patchset adds sstable partition/row read/write/seek statistics. Tests: dtest sstable_generation_loading_test.py stress_tool_test.py Fixes: #251 " * 'projects/sstables-stats/v5' of https://github.com/bhalevy/scylla: sstables stats: row reads sstables stats: partition seeks sstables stats: partition reads sstables stats: flat mutation reads sstables stats: cell/cell_tombstone writes sstables stats: partition/row/tombstone writes sstables_stats: writer_impl: move common members to base class	2018-10-02 15:05:10 +03:00
Duarte Nunes	7ba944a243	service/migration_manager: Validate duplicate ID in time We allow tables to be created with the ID property, mostly for advanced recovery cases. However, we need to validate that the ID doesn't match an existing one. We currently do this in database::add_column_family(), but this is already too late in the normal workflow: if we allow the schema change to go through, then it is applied to the system tables and loaded the next time the node boots, regardless of us throwing from database::add_column_family(). To fix this, we perform this validation when announcing a new table. Note that the check wasn't removed from database::add_column_family(); it's there since 2015 and there might have been other reasons to add it that are not related to the ID property. Refs #2059 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230142.46743-1-duarte@scylladb.com>	2018-10-02 13:40:40 +03:00
Calle Wilund	2996b8154f	storage_proxy: Add missing re-throw in truncate_blocking Iff truncation times out, we want to log it, but the exception should not be swallowed, but re-thrown. Fixes #3796. Message-Id: <20181001112325.17809-1-calle@scylladb.com>	2018-10-01 19:07:04 +02:00
Paweł Dziepak	ad4a50dab6	Merge "multi range reader: add support for range generating functor" from Botond " This series adds support for range generator functors to multi range reader. A range generator functor can lazily generate an unknown amount of ranges on-the-fly for the reader to read. The range generator support was added by refactoring `flat_multi_range_mutation_reader` to work in terms of a generator functor. The existing overload taking a `dht::partition_range_vector` is adapted to the generator interface behind the scenes. " * 'multi-range-reader-generator/v9' of https://github.com/denesb/scylla: tests/flat_mutation_reader_test: extend multi-range reader tests make_flat_multi_range_reader: add documentation make_flat_multi_range_reader: add generator overload flat_multi_range_reader: refactor to work in terms of generator make_flat_multi_range_reader(): better handle the 0 range case flat_mutation_reader: add move_buffer_content_to() flat_multi_range_mutation_reader: drop fwd_mr ctor parameter	2018-10-01 12:53:31 +01:00
Benny Halevy	bd6533f471	sstables stats: row reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	192c1949a3	sstables stats: partition seeks Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	edb3c23125	sstables stats: partition reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	e9dffa56c8	sstables stats: flat mutation reads Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:43 +03:00
Benny Halevy	4ccdc1115d	sstables stats: cell/cell_tombstone writes Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:15:41 +03:00
Benny Halevy	2f48f72d5c	sstables stats: partition/row/tombstone writes Introduce per-thread sstables stats infrastructure Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:01:14 +03:00
Benny Halevy	6853c1677d	sstables_stats: writer_impl: move common members to base class To be used by sstable_writer for stats collection. Note that this patch is factored out so it can be verified with no other change in functionality. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-10-01 13:01:00 +03:00
Duarte Nunes	e6630c627b	Merge 'Add secondary index paging' from Piotr " Indexed select statement consists of two queries - the view query used to extract base keys and the base query that uses those keys to return base rows. The main idea of this series is to replace raw proxy.query() call during the view query to one that uses a pager. Additionally, paging info from the view query needs to be returned to the client, in order to be used later for requesting new pages. " * 'paging_indexes_7' of https://github.com/psarna/scylla: tests: add test for secondary index with paging cql3: remove execute(primary_keys) from select statement cql3: add incremental base queries to index query storage_proxy: make get_restricted_ranges public cql3: add base query handling function to indexed statement cql3: add generating base key from index keys cql3: add paging state generation function cql3: move getting index view schema to prepare stage pager: make state() defined for exhausted pagers cql3: add maybe_set_paging_state function cql3: rename set_has_more_pages to set_paging_state pager: add setters for partition/clustering keys cql3: add paging to read_posting_list cql3: add non-const get_result_metadata method cql3: make find_index_* functions return paging state cql3: make read_posting_list return future<rows> cql3: make pagers use time_point instead of duration	2018-10-01 10:42:21 +01:00
Avi Kivity	900ffad979	config: re-add murmur3_ignore_msb_bits to scylla.yaml Commit `d6b0c4dda4` changed the built-in default murmur3_ignore_msb_bits to 12 (from 0) and removed the scylla.yaml default. Removal of the scylla.yaml default was a mistake for two reasons: - if someone downgrades a cluster, keeping scylla.yaml derived from the master branch, they will experience resharding since the built-in default, which has changed, will take effect. While that scenario is not supported, it already happened and caused much consternation. - if, in the future, we wish to change the default, we will cause resharding again. Embedding the default in scylla.yaml allows us to change the default for new clusters while allowing upgraded clusters to retain older values. Therefore, this patch restores murmur3_ignore_msb_bits in scylla.yaml. Future changes to the configuration item should change both scylla.yaml and the built-in default. Message-Id: <20180930090053.21136-1-avi@scylladb.com>	2018-10-01 10:01:36 +03:00
Takuya ASADA	0a471c32cb	dist/ami/files/scylla_install_ami: enable ssh_deletekeys For some reason the upstream AMI disables the 'ssh_deletekeys' feature of cloud-init, but generating SSH host keys is important for public AMI images, so enable it again. See: https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=ssh_deletekeys#ssh Fixes scylladb/scylla-ami#31 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180927122816.27809-1-syuu@scylladb.com>	2018-09-30 16:29:46 +03:00
Paweł Dziepak	2bcaf4309e	utils/reusable_buffer: do not warn about large allocations Reusable buffers are meant to be used when protocol or third-party library limitations force us to allocate large contiguous buffers. There isn't much that can be done about this so there is little point in warning about that. Fixes #3788. Message-Id: <20180928085141.6469-1-pdziepak@scylladb.com>	2018-09-30 11:12:23 +03:00
Asias He	91dae0149d	token_metadata: Invalidate cached ring in update_normal_tokens In commit `4a0b561376`, "storage_service: Get rid of moving operation", we removed remove_from_moving() in update_normal_tokens(). However, remove_from_moving() calls invalidate_cached_rings(). We should call invalidate_cached_rings() in update_normal_tokens(), otherwise we will get wrong token range to address map in the token_metadata cache. This issue exists in master only. It is not in any of the releases. Message-Id: <c03f2ed478cfdb84494f36dce9a8cfc05ed9e0cd.1538288364.git.asias@scylladb.com>	2018-09-30 11:06:46 +03:00
Alexys Jacob	6d6764133b	dist/common/scripts: coding style fixes dist/common/scripts/scylla_blocktune.py:24:10: E401 multiple imports on one line dist/common/scripts/scylla_blocktune.py:27:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_blocktune.py:35:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_blocktune.py:48:1: E305 expected 2 blank lines after class or function definition, found 1 dist/common/scripts/scylla_blocktune.py:52:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_blocktune.py:59:5: E306 expected 1 blank line before a nested definition, found 0 dist/common/scripts/scylla_blocktune.py:74:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_blocktune.py:81:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_blocktune.py:87:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_config_get.py:26:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_config_get.py:43:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_config_get.py:53:1: E305 expected 2 blank lines after class or function definition, found 1 dist/common/scripts/scylla_util.py:18:22: E401 multiple imports on one line dist/common/scripts/scylla_util.py:19:22: E401 multiple imports on one line dist/common/scripts/scylla_util.py:24:1: F401 'string' imported but unused dist/common/scripts/scylla_util.py:32:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:50:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:61:30: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:75:53: E703 statement ends with a semicolon dist/common/scripts/scylla_util.py:79:32: E272 multiple spaces before keyword dist/common/scripts/scylla_util.py:80:25: E703 statement ends with a semicolon dist/common/scripts/scylla_util.py:85:32: E201 whitespace after '[' dist/common/scripts/scylla_util.py:85:51: E202 whitespace before ']' 
dist/common/scripts/scylla_util.py:130:34: E201 whitespace after '[' dist/common/scripts/scylla_util.py:130:65: E202 whitespace before ']' dist/common/scripts/scylla_util.py:170:1: E266 too many leading '#' for block comment dist/common/scripts/scylla_util.py:172:11: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:174:10: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:178:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:181:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:184:17: E201 whitespace after '[' dist/common/scripts/scylla_util.py:184:50: E202 whitespace before ']' dist/common/scripts/scylla_util.py:186:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:193:16: E201 whitespace after '[' dist/common/scripts/scylla_util.py:193:76: E202 whitespace before ']' dist/common/scripts/scylla_util.py:195:18: E201 whitespace after '{' dist/common/scripts/scylla_util.py:195:27: E203 whitespace before ':' dist/common/scripts/scylla_util.py:195:41: E203 whitespace before ':' dist/common/scripts/scylla_util.py:195:48: E202 whitespace before '}' dist/common/scripts/scylla_util.py:203:25: E201 whitespace after '[' dist/common/scripts/scylla_util.py:203:54: E202 whitespace before ']' dist/common/scripts/scylla_util.py:204:76: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:208:27: E703 statement ends with a semicolon dist/common/scripts/scylla_util.py:217:27: E201 whitespace after '[' dist/common/scripts/scylla_util.py:217:62: E202 whitespace before ']' dist/common/scripts/scylla_util.py:238:25: E201 whitespace after '[' dist/common/scripts/scylla_util.py:238:87: E202 whitespace before ']' dist/common/scripts/scylla_util.py:257:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:258:11: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:259:11: E225 missing whitespace 
around operator dist/common/scripts/scylla_util.py:268:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:277:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:280:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:283:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:286:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:297:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:302:5: E722 do not use bare except' dist/common/scripts/scylla_util.py:305:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:325:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:329:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:335:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:338:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:341:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:343:81: E231 missing whitespace after ',' dist/common/scripts/scylla_util.py:352:1: E305 expected 2 blank lines after class or function definition, found 1 dist/common/scripts/scylla_util.py:352:21: E231 missing whitespace after ':' dist/common/scripts/scylla_util.py:352:41: E231 missing whitespace after ':' dist/common/scripts/scylla_util.py:352:65: E231 missing whitespace after ':' dist/common/scripts/scylla_util.py:353:1: E302 expected 2 blank lines, found 0 dist/common/scripts/scylla_util.py:358:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:360:22: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:365:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:367:11: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:370:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:373:15: E225 missing 
whitespace around operator dist/common/scripts/scylla_util.py:374:14: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:375:14: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:376:20: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:385:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:388:9: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:389:9: E225 missing whitespace around operator dist/common/scripts/scylla_util.py:393:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:396:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:399:1: E302 expected 2 blank lines, found 1 dist/common/scripts/scylla_util.py:432:1: E302 expected 2 blank lines, found 1 Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180918213707.6069-1-ultrabug@gentoo.org>	2018-09-30 11:00:37 +03:00
Botond Dénes	eba8d68313	tests/flat_mutation_reader_test: extend multi-range reader tests Add unit tests for the generator version and extend existing ones with tests for the corner cases (0 and 1 range).	2018-09-28 14:27:55 +03:00
Botond Dénes	bb7447bbe4	make_flat_multi_range_reader: add documentation	2018-09-28 14:27:55 +03:00
Botond Dénes	39bfd5d1df	make_flat_multi_range_reader: add generator overload Allows creating a multi range reader from an arbitrary callable that returns std::optional<dht::partition_range>. The callable is expected to return a new range on each call, such that passing each successive range to `flat_mutation_reader::fast_forward_to` is valid. When exhausted the callable is expected to return std::nullopt.	2018-09-28 14:27:55 +03:00
Botond Dénes	8c5387890d	flat_multi_range_reader: refactor to work in terms of generator Instead of working with a dht::partition_range_vector directly, work with an abstract generator that returns a pointer to the next range on each invocation. When exhausted it returns nullptr. This opens up the possibility to create multi range readers from a generator functor that creates ranges lazily. This is indeed what the next patch does.	2018-09-28 14:27:55 +03:00
Botond Dénes	f3bf2e83dd	make_flat_multi_range_reader(): better handle the 0 range case Previously, when the passed in range of partition ranges contained 0 ranges, an empty reader was returned. This means that the returned reader was forwardable or not depending on the number of passed in ranges. This is inconsistent and can lead to nasty surprises. To solve this problem add `forwardable_empty_mutation_reader`, a specialized reader that delays creating the underlying reader until fast_forward_to() is called on it, and thus a range is available. When `make_flat_multi_range_mutation_reader()` is called with `mutation_reader::forwarding::no` a simple empty reader is created, like before.	2018-09-28 14:27:55 +03:00
Botond Dénes	03be9510a7	flat_mutation_reader: add move_buffer_content_to() `move_buffer_content_to()` makes it possible to implement more efficient wrapping readers, readers that wrap another flat mutation reader but do no transformation to the underlying fragment stream. These readers, when filling their buffers, can simply fill the underlying reader's buffer, then move its content into their own. When the reader's own buffer is empty, this is very efficient, as it can be done by simply swapping the buffers, avoiding the work of moving the fragments one-by-one.	2018-09-28 14:27:54 +03:00
Botond Dénes	68b6c83ee8	flat_multi_range_mutation_reader: drop fwd_mr ctor parameter The factory function creating this reader ensures that the passed-in ranges vector has more than one range, which effectively makes the `fwd_mr` constructor parameter have no effect. The underlying reader will always be created with `mutation_reader::forwarding::yes` as it has to be able to fast-forward between the ranges.	2018-09-28 14:25:03 +03:00
Duarte Nunes	b8749a61dc	tests/aggregate_fcts_test: Fix formatting of create_table() And drop the template. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180927223315.28254-1-duarte@scylladb.com>	2018-09-28 09:45:27 +02:00
Duarte Nunes	17578c3579	tests/aggregate_fcts_test: Add test case for wrapped types Provide a test case which checks a type being wrapped in a reverse_type plays no role in assignment. Refs #3789 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180927223201.28152-2-duarte@scylladb.com>	2018-09-28 07:09:08 +03:00
Duarte Nunes	5e7bb20c8a	cql3/selection/selector: Unwrap types when validating assignment When validating assignment between two types, it's possible one of them is wrapped in a reverse_type, if it comes, for example, from the type associated with a clustering column. When checking for weak assignment the types are correctly unwrapped, but not when checking for an exact match, which this patch fixes. Technically, the receiver is never a reversed_type for the current callers, but this is the morally correct implementation, as the type being reversed or not plays no role in assignment. Tests: unit(release) Fixes #3789 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180927223201.28152-1-duarte@scylladb.com>	2018-09-28 07:08:19 +03:00
Piotr Sarna	da3821c598	tests: add test for secondary index with paging A test case with enough rows to have multiple pages is added to secondary_index_test suite.	2018-09-27 15:29:28 +02:00
Piotr Sarna	4b4f57747a	cql3: remove execute(primary_keys) from select statement Right now, with specialized execute() that takes primary keys for indexed_table_select_statement, the original execute() method implemented in select_statement is not used anywhere, so it's removed.	2018-09-27 15:29:28 +02:00
Piotr Sarna	9e0b3cad1e	cql3: add incremental base queries to index query Base queries that are part of index queries are allowed to be short, which can result in wasted work - e.g. when we query all replicas in parallel, but have to discard most of the result, since the first one (in token order) resulted in a short read. Thus, we start by querying 1 range, check if the read is short, and if not, continue by querying 2x more ranges than before. Refs #2960	2018-09-27 15:29:28 +02:00
Piotr Sarna	c41e0ade6c	storage_proxy: make get_restricted_ranges public This function is useful for splitting ranges in indexed queries.	2018-09-27 15:29:28 +02:00
Piotr Sarna	5b16aeb395	cql3: add base query handling function to indexed statement Handling a base query during the indexed statement execution may require updating its paging state.	2018-09-27 15:29:28 +02:00
Piotr Sarna	bce7232555	cql3: add generating base key from index keys A function that computes base partition/clustering key from index view primary key is provided.	2018-09-27 15:29:28 +02:00
Piotr Sarna	2f085848d8	cql3: add paging state generation function For indexed queries, the paging state needs to be updated based on the results of base query when the read was short.	2018-09-27 15:29:28 +02:00
Piotr Sarna	f21bcbefdf	cql3: move getting index view schema to prepare stage Searching for index view schema for an indexed statement can be done once in prepare stage, so it's moved to indexed_table_select_statement prepare method.	2018-09-27 15:29:28 +02:00
Piotr Sarna	b6d90b2869	pager: make state() defined for exhausted pagers If service::pager is exhausted, state() function used to return a nullptr instead of a pointer to a valid paging state and the documented return type in this case was 'unspecified'. Sometimes a paging state may be needed anyway, even if the pager is already exhausted - thus, state() return value becomes defined after this commit. Exhausted pagers will return a valid object to a state with _remaining field set to 0.	2018-09-27 15:29:28 +02:00
Piotr Sarna	c1be660c3a	cql3: add maybe_set_paging_state function set_paging_state is split into its unconditional variant and a maybe_ one in order to avoid double checks.	2018-09-27 15:29:28 +02:00
Piotr Sarna	744ac3bf7b	cql3: rename set_has_more_pages to set_paging_state This function's primary goal is to set the paging state passed as a parameter, so its name is changed to match the semantics better.	2018-09-27 15:29:28 +02:00
Glauber Costa	c3f27784de	database: guarantee a minimum amount of shares when manual operations are requested. We have found issues when a flush is requested outside the usual memtable flush loop and because there is not a lot of data the controller will not have a high amount of shares. To prevent this, this patch guarantees some minimum amount of shares when extraneous operations (nodetool flush, commitlog-driven flush, etc) are requested. Another option would be to add shares instead of guarantee a minimum. But in my view the approach I am taking here has two main advantages: 1) It won't cause spikes when those operations are requested 2) It is cumbersome to add shares in the current infrastructure, as just adding backlog can cause shares to spike. Consider this example: Backlog is within the first range of very low backlog (~0.2). Shares for this would be around ~20. If we want to add 200 shares, that is equivalent to a backlog of 0.8. Once we add those two backlogs together, we end up with 1 (max backlog). Fixes #3761 Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180927131904.8826-1-glauber@scylladb.com>	2018-09-27 15:20:31 +02:00
Piotr Sarna	336cc70438	pager: add setters for partition/clustering keys	2018-09-27 15:18:06 +02:00
Piotr Sarna	7c1e4c2deb	cql3: add paging to read_posting_list Instead of a single query, paging is used in order to query an index.	2018-09-27 15:18:06 +02:00
Piotr Sarna	b83aa69a2e	cql3: add non-const get_result_metadata method	2018-09-27 15:18:06 +02:00
Piotr Sarna	430a49f91a	cql3: make find_index_* functions return paging state In order to implement secondary index paging, intermediary query functions now also return paging state for the view query.	2018-09-27 15:18:06 +02:00
Piotr Sarna	c3dd1775c8	cql3: make read_posting_list return future<rows> Instead of returning a coordinator result and making a caller parse it later, read_posting_list now extracts rows by itself. This change is later needed when querying is replaced with a pager.	2018-09-27 15:18:06 +02:00
Piotr Sarna	1d34ef38a8	cql3: make pagers use time_point instead of duration A standard way for passing a timeout parameter is specifying a time_point, while pagers used to take a duration in order to compute time points on the fly. This patch adds a timeout parameter, which is a time_point, to fetch_page().	2018-09-27 15:18:06 +02:00
Tomasz Grabiec	78d9205a50	Merge "Multiple fixes to tests/normalizing_reader" from Vladimir This patchset addresses multiple errors in normalizing_reader implementation found during review. I have decided to not make a clustering key full inside before_key()/after_key() helpers. The reason is that for this they would need schema to be passed as another parameter so existing methods don't suit. OTOH, introducing new members for a class using for testing purposes only seems an overkill. * github.com/argenet/scylla.git projects/sstables-30/normalizing_reader_fixes/v1: range_tombstone: Add constructor accepting position_in_partition_views for range bounds. tests: Make sure range tombstone is properly split over rows with non-full keys. tests: Multiple fixes for draining and clearing range tombstones in normalizing_reader.	2018-09-27 12:51:47 +02:00
Vladimir Krivopalov	653fb37ea5	range_tombstone: Remove code that duplicates logic. The actions performed by the call to set_start() were duplicated by the immediately following code lines that are removed with this patch. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20eaa1338c1719ded34f5c9ada69ec03907936f5.1537989044.git.vladimir@scylladb.com>	2018-09-27 12:05:25 +02:00
Vladimir Krivopalov	b74706a8f5	tests: Multiple fixes for draining and clearing range tombstones in normalizing_reader. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-26 19:24:10 -07:00
Vladimir Krivopalov	26d4d276e9	tests: Make sure range tombstone is properly split over rows with non-full keys. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-26 17:19:43 -07:00
Vladimir Krivopalov	fbccae0d15	range_tombstone: Add constructor accepting position_in_partition_views for range bounds. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-26 17:17:18 -07:00
Avi Kivity	e0b34003b5	tests: sstable_mutation_test: await background jobs We only wait from the last test case, so if an individual test is executed, a memory leak may be reported. Fix by waiting from all test cases. Message-Id: <20180926203723.18026-1-avi@scylladb.com>	2018-09-26 21:48:32 +01:00
Eliran Sinvani	44d93b4d4c	cql3: fix incorrect results returned from prepared select with an IN clause When executing a prepared select statement with a multicolumn IN, the system returned incorrect results due to a memory violation (a bytes view referring to an out of scope bytes object). Added test for the prepared statement results correctness. Tests: 1. unit (release) with the new test. 2. Python script. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <36c9cf9ed3fe72e3b4801e3cd120678429ce218a.1537947897.git.eliransin@scylladb.com>	2018-09-26 15:23:41 +03:00
Eliran Sinvani	22ad5434d1	cql3 : fix a crash upon preparing select with an IN restriction due to memory violation When preparing a select query with a multicolumn in restriction, the node crashed due to using a parameter after using a move on it. Tests: 1. UnitTests (release) 2. Preparing a select statement that crashed the system before, and verify it is not crashing. Fixes #3204 Fixes #3692 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <7ebd210cd714a460ee5557ac612da970cee03270.1537947897.git.eliransin@scylladb.com>	2018-09-26 15:23:38 +03:00
Avi Kivity	8f5e80e61a	Revert "setup: add the lazytime XFS version" This reverts commit `f828fe0d59`. It causes scylla_raid_setup to fail on CentOS 7. Fixes #3784.	2018-09-26 11:10:07 +01:00
Avi Kivity	e8d988caf8	Merge "Enable existing SSTables unit tests for 'mc' format" from Vladimir and Piotr " This patchset fixes several issues in SSTables 3.x ('mc') writing and parsing and extends existing SSTables unit tests to cover the new format. The only test enabled temporarily is check_multi_schema because it turned out that reading SSTables 3.x with a different schema has not been implemented in full. This will be addressed in a separate patchset. This patchset depends on the "Support SSTables 3.x in Scylla runtime" patchset. Tests: unit {release} " * 'projects/sstables-30/unit-tests/v3' of https://github.com/argenet/scylla: (25 commits) tests: Enable existing SSTables tests for 'mc' format. tests: Fix test_wrong_range_tombstone_order for 'mc' format. tests: Extend reader assertions to check clustering keys made full. tests: Disable test_old_format_non_compound_range_tombstone_is_read for 'mc' format. tests: Disable check_multi_schema for 'mc' format. tests: Fix test_promoted_index_read for 'mc' format by using normalizing_reader. tests: Fix promoted_index_read to not rely on a specific index length tests: Add 'mc' files for test_wrong_range_tombstone_order tests: Add 'mc' files for test_wrong_counter_shard_order tests: Add 'mc' files for summary_test tests: Add 'mc' files for test_promoted_index_read tests: Add 'mc' files for test_partition_skipping tests: Add 'mc' files for large_partition tests (promoted_index_read, sub_partition_read, sub_partitions_read tests: Add 'mc' files for test_counter_read tests: Add 'mc' files for test_broken_promoted_index_is_skipped tests: SSTables 'mc' files for sliced_mutation_reads_test. tests: Introduce normalizing_reader helper for SSTables tests. mutation_fragment: Add range_tombstone_stream::empty() method. sstables: Make key full when setting a range tombstone start from end open marker. sstables: For 'mc' format, use excl_start when split an RT over a row with a full key. ...	2018-09-26 11:10:07 +01:00
Avi Kivity	337ee6153a	Merge "Support SSTables 3.x in Scylla runtime" from Vladimir and Piotr " This patchset makes it possible to use SSTables 'mc' format, commonly referred to as 'SSTables 3.x', when running Scylla instance. Several bugs found on this way are fixed. Also, a configuration option is introduced to allow running Scylla either with 'mc' or 'la' format as default. Tests: unit {release} + tested Scylla with both 'la' and 'mc' formats to work fine: cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; [3/1890] cqlsh> USE test; cqlsh:test> CREATE TABLE cfsst3 (pk int, ck int, rc int, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''}; cqlsh:test> INSERT INTO cfsst3 (pk, ck, rc) VALUES ( 4, 7, 8); <<flush>> cqlsh:test> DELETE from cfsst3 WHERE pk = 4 and ck> 3 and ck < 8; <<flush>> cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 2, 3); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 4, 6); cqlsh:test> SELECT * FROM cfsst3 ; pk \| ck \| rc ----+----+------ 2 \| 3 \| null 4 \| 6 \| null (2 rows) <<Scylla restart>> cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 5, 7); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 6, 8); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 7, 9); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 8, 10); cqlsh:test> SELECT * from cfsst3 ; pk \| ck \| rc ----+----+------ 5 \| 7 \| null 8 \| 10 \| null 2 \| 3 \| null 4 \| 6 \| null 7 \| 9 \| null 6 \| 8 \| null (6 rows) " * 'projects/sstables-30/try-runtime/v8' of https://github.com/argenet/scylla: database: Honour enable_sstables_mc_format configuration option. sstables: Support SSTables 'mc' format as a feature. db: Add configuration option for enabling SSTables 'mc' format. tests: Add test for reading a complex column with zero subcolumns (SST3). sstables: Fix parsing of complex columns with zero subcolumns. sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding. 
sstables: Use parser_type instead of abstract_type::parse_type in column_translation. bytes: Add helper for turning bytes_view into sstring_view. sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists. sstables: Fix string formatting for exception messages in m_format_read_helpers. sstables: Don't validate timestamps against the max value on parsing. sstables: Always store only min bases in serialization_header. sstables: Support 'mc' version parsing from filename. SST3: Make sure we call consume_partition_end	2018-09-26 11:10:07 +01:00
Vladimir Krivopalov	38c8d1ce05	tests: Enable existing SSTables tests for 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	c33e0f3f15	tests: Fix test_wrong_range_tombstone_order for 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	ad2b9e44ee	tests: Extend reader assertions to check clustering keys made full. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	9239195473	tests: Disable test_old_format_non_compound_range_tombstone_is_read for 'mc' format. This test is not applicable to the 'mc' format as it covers a backward compatibility case which may only occur with SSTables generated by older Scylla versions in 'ka' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	952536c9f5	tests: Disable check_multi_schema for 'mc' format. Altering types in schema has been disabled in Origin (see CASSANDRA-12443). We do the same. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	86aae36e04	tests: Fix test_promoted_index_read for 'mc' format by using normalizing_reader. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	5422203714	tests: Fix promoted_index_read to not rely on a specific index length Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	be5fe11f22	tests: Add 'mc' files for test_wrong_range_tombstone_order Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	3dd6e6f899	tests: Add 'mc' files for test_wrong_counter_shard_order Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	f08a2b35da	tests: Add 'mc' files for summary_test Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	7e40947a80	tests: Add 'mc' files for test_promoted_index_read Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	20f3edba61	tests: Add 'mc' files for test_partition_skipping Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	8c37801ae5	tests: Add 'mc' files for large_partition tests (promoted_index_read, sub_partition_read, sub_partitions_read Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	28c32a353a	tests: Add 'mc' files for test_counter_read Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	60c9a25b38	tests: Add 'mc' files for test_broken_promoted_index_is_skipped Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	24342dc27d	tests: SSTables 'mc' files for sliced_mutation_reads_test. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	4393233a86	tests: Introduce normalizing_reader helper for SSTables tests. This is a helper flat_mutation_reader that wraps another reader and splits range tombstones over rows before emitting them. It is used to produce the same mutation streams for both old (ka/la) and new (mc) SSTables formats in unit tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	7a5c4f0a63	mutation_fragment: Add range_tombstone_stream::empty() method. The method checks if the underlying range_tombstone_list is empty. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	eddf846c8a	sstables: Make key full when setting a range tombstone start from end open marker. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	fa48a78d71	sstables: For 'mc' format, use excl_start when split an RT over a row with a full key. This fixes the monotonicity issue as otherwise the range tombstone emitted after such clustering row has a start position that should be ordered before that of the row. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	45082ef18c	sstables: Don't write promoted index consisting of a single block in 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Piotr Jastrzebski	8f5ac1d86f	SST3: Make sure we emit range tombstone when slicing/fft If we go past the slice to be read with a range tombstone being opened we need to emit an RT corresponding to this slice. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-09-25 17:55:52 -07:00
Piotr Jastrzebski	ade8027960	Add mutation_fragment_filter::upper_bound This method returns end of current position range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-09-25 17:55:52 -07:00
Piotr Jastrzebski	82ff29cde8	Add clustering_ranges_walker::upper_bound This method returns end of current position range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-09-25 17:55:52 -07:00
Piotr Jastrzebski	bff49345cd	Add position_in_partition_view::as_end_bound_view This will be used in sstables 3. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-09-25 17:55:52 -07:00
Vladimir Krivopalov	cd80d6ff65	database: Honour enable_sstables_mc_format configuration option. Only enable SSTables 'mc' format if the entire cluster supports it and it is enabled in the configuration file. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	c98937e04c	sstables: Support SSTables 'mc' format as a feature. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	650b245657	db: Add configuration option for enabling SSTables 'mc' format. This flag will only be used for testing purposes until Scylla 3.0 release and will be removed once SSTables 'mc' testing is completed. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	0edd3c57a9	tests: Add test for reading a complex column with zero subcolumns (SST3). The files are generated by Scylla as a compaction_history table. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	24590fe88c	sstables: Fix parsing of complex columns with zero subcolumns. Before this fix, a complex column with zero subcolumns would be incorrectly parsed as it would re-read the deletion time twice. Now, this case is handled properly. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	be3613bdb6	sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding. This avoids noisy warnings like "signed value overflow" when ASAN is turned on. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	0048f4814e	sstables: Use parser_type instead of abstract_type::parse_type in column_translation. abstract_type::parse_type() only deals with simple types and fails to parse wrapped types such as org.apache.cassandra.db.marshal.FrozenType(org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.UTF8Type)) Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	0f298113c7	bytes: Add helper for turning bytes_view into sstring_view. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	9166badebe	sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists. It may happen that we hit the end of partition and then get fast_forward_to() called in which case we attempt to call it from an already destroyed object. We need to check the _mf_filter value before doing so. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	fc901eb700	sstables: Fix string formatting for exception messages in m_format_read_helpers. Before this fix, the code was a potential undefined behaviour and crash because it would add a large value to a const char* and try to create a std::string out of it. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	84341821b1	sstables: Don't validate timestamps against the max value on parsing. Internally, timestamps are represented as signed integers (int64_t) but stored as unsigned ones. So it is quite possible to store data with timestamp that is represented as a number larger than the max value of int64_t type. One such example is api::min_timestamp() that is used when generating system schema tables ("keyspaces"). When cast to uint64_t, it turns into a large value. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	bdca27ae41	sstables: Always store only min bases in serialization_header. There previously was an inconsistency in treating min values stored in a serialization_header. They are written to or read from a Statistics.db as deltas against fixed bases, but when we parse timeouts from the data file, we need the full bases, not just deltas. This inconsistency causes wrong timestamp values if we write an sstable and then read from it using one and the same sstable object because we turn min values into bases on write and then don't adjust them back because we already have them in memory. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	057c26f894	sstables: Support 'mc' version parsing from filename. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Piotr Jastrzebski	d8e6d1ed98	SST3: Make sure we call consume_partition_end even when we slice and fast forward to. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-09-25 17:23:40 -07:00
Raphael S. Carvalho	745e35fa82	database: Fix sstable resharding for mc format SStable format mc doesn't write ancestors to metadata, so resharding will not work with this new format because it relies on ancestors to replace new unshared sstables with old shared ones. Fix is about not relying on ancestors metadata for this operation. Fixes #3777. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180922211933.1987-1-raphaelsc@scylladb.com>	2018-09-25 18:37:48 +03:00
Nadav Har'El	05f8ed270b	Add docs/metrics.md - documentation on metrics Today I realised that although we have per-table metrics, they are not really available by default. I was surprised to find that we don't have (as far as I can tell) a document explaining why it is so, or how to enable them anyway. Moreover, the more I investigated this issue, the more I realised how little I know on Scylla's metrics - how they are calculated, how they are collected, their different types, and so on. So I sat down to figure out everything I wanted to learn about Scylla metrics, and then wrote it all down in a new document, docs/metrics.md. There are some missing pieces in this document marked by TODO, and probably additional missing pieces that I'm not aware of, but I think this is already a good start and can be (and should be) improved-on later. We really need to have more of these documents describing various Scylla subsystems to new developers - what each subsystem does, why it does what it does, where is the code, and so on. I am facing these problems every day as a seasoned developer - I can't even imagine what our new developers face when trying to understand a subsystem they are not yet familiar with. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180920131103.20590-1-nyh@scylladb.com>	2018-09-25 17:51:20 +03:00
Paweł Dziepak	a3746d3b05	paging: make may_need_paging() more conservative There is a bad interaction between may_need_paging() and query result size limiter. The former is trying to avoid the complexity of paged queries when the number of returned rows is going to be smaller than the page size. The latter uses the fact that paged queries need not return all requested rows to limit the size of a query results. Since may_need_paging() may turn a paged query into non-paged one as a side effect it disables the oversized result protection. This patch limits the cases when may_need_paging() disables paging to the situations when we know for sure that query result size limiter won't be needed, i.e.: the result is not going to contain more than one row. If the client knows for sure that the paging is not needed and the performance impact is worthwhile it can disable paging on its side. Otherwise, let's default to the safer behaviour. Fixes #3620. Message-Id: <20180925134431.24329-1-pdziepak@scylladb.com>	2018-09-25 17:01:04 +03:00
Avi Kivity	c6f651ead4	Merge "Use fragmented buffers in commitlog writes" from Paweł " This series changes commitlog write path so that it uses fragmented buffers and therefore avoids large allocations. This is done by first switching the code to use seastar memory_output_stream interface, which can handle fragmented buffer without any additional actions from the user code needed and then making it use buffers of fixed size 128 kB. Tests: unit(release, debug) dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup commitlog_test.py:TestCommitLog.test_commitlog_replay_with_alter_table) " * tag 'fragmented-commitlog-writes/v3' of https://github.com/pdziepak/scylla: commitlog: switch to fragmented buffers commitlog: drop buffer pools commitlog: drop recovery from bad alloc utils: drop data_output commitlog: use memory_output_stream serialization_visitors: add support for memory_output_stream utils: fragmented_temporary_buffer::view: add remove_prefix() utils: fragmented_temporary_buffer: add empty() and size_bytes() utils: fragmented_temporary_buffer: add get_ostream() idl: serializer: don't assume Iterator::value_type is bytes_view idl: serializer: create buffer view from streams utils: crc: accept FragmentRange	2018-09-25 12:43:06 +03:00
Avi Kivity	8276ada1c4	tests: sstable_3_x_test: await sstable background tasks When an sstable is deleted, this work is done as a background task since it cannot be done from the destructor. If we don't wait for that background task, it is detected as a leak by ASAN. Fix by waiting for background tasks in every test. A more complete fix would involve having a factory class create sstables and assume the responsibility for background tasks, and something similar to with_cql_test_env(), but that is deferred until later. Tests: sstable_3_x_test (debug). Message-Id: <20180923111745.8313-1-avi@scylladb.com>	2018-09-24 10:43:58 +02:00
Takuya ASADA	21a12aa458	dist/redhat: specify correct repo file path on scylla-housekeeping services Currently, both scylla-housekeeping-daily/-restart services mistakenly specify the repo file path as "@@REPOFILES@@", which is copied from the .in template and needs to be replaced with the actual path. Fixes #3776 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180921031605.9330-1-syuu@scylladb.com>	2018-09-23 11:38:26 +03:00
Glauber Costa	f828fe0d59	setup: add the lazytime XFS version Starting with kernel 4.17 XFS will support the lazytime mount option. That will be beneficial for Scylla as updating times synchronously is one of our current sources of stalls. Fortunately, older kernels are able to parse the option and just ignore it. We verified that to be the case in a 4.15 kernel on ubuntu. Therefore, just add the option unconditionally. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180920170017.13215-1-glauber@scylladb.com>	2018-09-20 20:12:44 +03:00
Gleb Natapov	0bf9a78c78	sstables: wrap file into checked file after applying extensions File extensions can also produce errors that checked file wants to intercept and act upon. The patch changes the order in which files are wrapped to make checked file the outermost wrapper, so that it can handle exceptions generated by all inner wrappers. Message-Id: <20180920124430.GD2326@scylladb.com>	2018-09-20 15:57:38 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt-out from passing a timeout by passing `db::no_timeout` (the previous default value) but this will be now explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader, where I couldn't I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Asias He	de05df216f	streaming: Use rpc::source on the shard where it is created rpc::source can only work on the shard where it is created, thus we cannot apply the load distribution optimization. Disable it and let the multishard_writer forward the data to the correct shard. Fixes #3731. Message-Id: <0d1b4d3e7adcfdc4e392b83aeb2544b95f3f46dd.1537430162.git.asias@scylladb.com>	2018-09-20 12:29:24 +03:00
Avi Kivity	8b2bf73c6f	Merge "Fix compaction metadata read/write for SSTables 3.x" from Vladimir " In SSTables 3.x, the 'ancestors' field of compaction metadata is no longer stored in the Statistics.db file. The newly added test has previously failed due to this inconsistency. Tests: unit {release} " * 'projects/sstables-30/empty_clustering_key/v1' of https://github.com/argenet/scylla: tests: Add test for reading table with empty clustering key from SSTables 3.x. tests: Update Statistics.db files for SSTables 3.x write tests. sstables: Do not parse ancestors from compaction metadata for SSTables 3.x	2018-09-20 09:53:46 +03:00
Vladimir Krivopalov	bf351c4a4f	tests: Add test for reading table with empty clustering key from SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-19 20:57:23 -07:00
Vladimir Krivopalov	3bbb013ecd	tests: Update Statistics.db files for SSTables 3.x write tests. Those files have been generated with 'ancestors' field in compaction metadata and so were invalid. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-19 20:57:23 -07:00
Vladimir Krivopalov	48fa088ec6	sstables: Do not parse ancestors from compaction metadata for SSTables 3.x Ancestors array has been removed starting from 'ma' format (CASSANDRA-7066). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-19 17:11:43 -07:00
Vlad Zolotarov	043ced243e	fix_system_distributed_tables.sh: adjust newly added 'request_size' and 'response_size' columns Adjust the script to the new schema of system_traces.sessions. Two new columns have been added: - request_size: int - response_size: int Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20180919005504.12498-1-vladz@scylladb.com>	2018-09-19 15:46:11 +01:00
Paweł Dziepak	4469f76e7c	commitlog: switch to fragmented buffers So far commitlog was using contiguous buffers for storing the data that is about to be written to disk. It was able to coalesce small writes so that multiple small mutations would use the same buffer, but if a mutation was large the commitlog would attempt to allocate a single, appropriately large buffer. This excessively stresses the memory allocator and may cause memory fragmentation to become an issue. The solution is to use fixed-size buffers of 128 kB, which is the standard buffer size in Scylla, and keep large values fragmented.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	7c1add6769	commitlog: drop buffer pools Buffer pools were added in `7191a130bb` "Commitlog: recycle buffers to reduce fragmentation." They introduce a lot of complexity and will become unnecessary once the code is switched to use fixed-size 128kB buffers.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	9fee8b8d76	commitlog: drop recovery from bad alloc If a node cannot allocate a 128 kB buffer, it is already in very bad shape, so there isn't much value in trying to recover by attempting smaller allocations, and it just adds more complexity to the segment allocation. It actually may be better to let some requests fail and give the node a chance to recover rather than trying to use every last byte of free memory and end up with bad_alloc in a noexcept context.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	2e5b375309	utils: drop data_output	2018-09-18 17:22:59 +01:00
Paweł Dziepak	fe48aaae46	commitlog: use memory_output_stream memory_output_stream deals with all required pointer arithmetic and allows easy transition to fragmented buffers.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	b9ab058834	serialization_visitors: add support for memory_output_stream	2018-09-18 17:22:59 +01:00
Paweł Dziepak	cbe2ef9e5c	utils: fragmented_temporary_buffer::view: add remove_prefix()	2018-09-18 17:22:59 +01:00
Alexys Jacob	24b90ef527	configure.py: coding style fixes configure.py:23:10: E401 multiple imports on one line configure.py:39:61: W291 trailing whitespace configure.py:47:1: E302 expected 2 blank lines, found 1 configure.py:53:16: W291 trailing whitespace configure.py:55:1: E302 expected 2 blank lines, found 1 configure.py:62:1: E302 expected 2 blank lines, found 1 configure.py:63:53: E251 unexpected spaces around keyword / parameter equals configure.py:63:55: E251 unexpected spaces around keyword / parameter equals configure.py:63:68: E251 unexpected spaces around keyword / parameter equals configure.py:63:70: E251 unexpected spaces around keyword / parameter equals configure.py:63:92: E251 unexpected spaces around keyword / parameter equals configure.py:63:94: E251 unexpected spaces around keyword / parameter equals configure.py:64:33: E251 unexpected spaces around keyword / parameter equals configure.py:64:35: E251 unexpected spaces around keyword / parameter equals configure.py:65:54: E251 unexpected spaces around keyword / parameter equals configure.py:65:56: E251 unexpected spaces around keyword / parameter equals configure.py:65:69: E251 unexpected spaces around keyword / parameter equals configure.py:65:71: E251 unexpected spaces around keyword / parameter equals configure.py:65:94: E251 unexpected spaces around keyword / parameter equals configure.py:65:96: E251 unexpected spaces around keyword / parameter equals configure.py:66:33: E251 unexpected spaces around keyword / parameter equals configure.py:66:35: E251 unexpected spaces around keyword / parameter equals configure.py:68:1: E302 expected 2 blank lines, found 1 configure.py:72:18: E712 comparison to True should be 'if cond is True:' or 'if cond:' configure.py:80:1: E302 expected 2 blank lines, found 1 configure.py:83:1: E302 expected 2 blank lines, found 1 configure.py:87:1: E302 expected 2 blank lines, found 1 configure.py:87:33: E251 unexpected spaces around keyword / parameter equals 
configure.py:87:35: E251 unexpected spaces around keyword / parameter equals configure.py:87:45: E251 unexpected spaces around keyword / parameter equals configure.py:87:47: E251 unexpected spaces around keyword / parameter equals configure.py:88:56: E251 unexpected spaces around keyword / parameter equals configure.py:88:58: E251 unexpected spaces around keyword / parameter equals configure.py:90:1: E302 expected 2 blank lines, found 1 configure.py:94:1: E302 expected 2 blank lines, found 1 configure.py:94:42: E251 unexpected spaces around keyword / parameter equals configure.py:94:44: E251 unexpected spaces around keyword / parameter equals configure.py:94:54: E251 unexpected spaces around keyword / parameter equals configure.py:94:56: E251 unexpected spaces around keyword / parameter equals configure.py:104:42: E251 unexpected spaces around keyword / parameter equals configure.py:104:44: E251 unexpected spaces around keyword / parameter equals configure.py:105:42: E251 unexpected spaces around keyword / parameter equals configure.py:105:44: E251 unexpected spaces around keyword / parameter equals configure.py:110:1: E302 expected 2 blank lines, found 1 configure.py:114:29: E251 unexpected spaces around keyword / parameter equals configure.py:114:31: E251 unexpected spaces around keyword / parameter equals configure.py:114:61: E251 unexpected spaces around keyword / parameter equals configure.py:114:63: E251 unexpected spaces around keyword / parameter equals configure.py:116:1: E302 expected 2 blank lines, found 1 configure.py:123:26: E251 unexpected spaces around keyword / parameter equals configure.py:123:28: E251 unexpected spaces around keyword / parameter equals configure.py:123:49: E251 unexpected spaces around keyword / parameter equals configure.py:123:51: E251 unexpected spaces around keyword / parameter equals configure.py:123:84: E251 unexpected spaces around keyword / parameter equals configure.py:123:86: E251 unexpected spaces around keyword / 
parameter equals configure.py:129:1: E302 expected 2 blank lines, found 1 configure.py:135:1: E302 expected 2 blank lines, found 1 configure.py:137:35: E251 unexpected spaces around keyword / parameter equals configure.py:137:37: E251 unexpected spaces around keyword / parameter equals configure.py:137:53: E251 unexpected spaces around keyword / parameter equals configure.py:137:55: E251 unexpected spaces around keyword / parameter equals configure.py:137:83: E251 unexpected spaces around keyword / parameter equals configure.py:137:85: E251 unexpected spaces around keyword / parameter equals configure.py:143:1: E302 expected 2 blank lines, found 1 configure.py:148:1: E302 expected 2 blank lines, found 1 configure.py:152:5: E301 expected 1 blank line, found 0 configure.py:159:5: E301 expected 1 blank line, found 0 configure.py:161:5: E301 expected 1 blank line, found 0 configure.py:163:5: E301 expected 1 blank line, found 0 configure.py:165:5: E301 expected 1 blank line, found 0 configure.py:168:1: E302 expected 2 blank lines, found 1 configure.py:169:5: F841 local variable 'mach' is assigned to but never used configure.py:175:1: E302 expected 2 blank lines, found 1 configure.py:178:5: E301 expected 1 blank line, found 0 configure.py:183:5: E301 expected 1 blank line, found 0 configure.py:185:5: E301 expected 1 blank line, found 0 configure.py:187:5: E301 expected 1 blank line, found 0 configure.py:189:5: E301 expected 1 blank line, found 0 configure.py:192:1: E305 expected 2 blank lines after class or function definition, found 1 configure.py:329:5: E123 closing bracket does not match indentation of opening bracket's line configure.py:335:5: E123 closing bracket does not match indentation of opening bracket's line configure.py:340:41: E251 unexpected spaces around keyword / parameter equals configure.py:340:43: E251 unexpected spaces around keyword / parameter equals configure.py:340:60: E251 unexpected spaces around keyword / parameter equals configure.py:340:62: 
E251 unexpected spaces around keyword / parameter equals configure.py:340:85: E251 unexpected spaces around keyword / parameter equals configure.py:340:87: E251 unexpected spaces around keyword / parameter equals configure.py:341:30: E251 unexpected spaces around keyword / parameter equals configure.py:341:32: E251 unexpected spaces around keyword / parameter equals configure.py:342:29: E251 unexpected spaces around keyword / parameter equals configure.py:342:31: E251 unexpected spaces around keyword / parameter equals configure.py:343:38: E251 unexpected spaces around keyword / parameter equals configure.py:343:40: E251 unexpected spaces around keyword / parameter equals configure.py:343:54: E251 unexpected spaces around keyword / parameter equals configure.py:343:56: E251 unexpected spaces around keyword / parameter equals configure.py:344:29: E251 unexpected spaces around keyword / parameter equals configure.py:344:31: E251 unexpected spaces around keyword / parameter equals configure.py:345:37: E251 unexpected spaces around keyword / parameter equals configure.py:345:39: E251 unexpected spaces around keyword / parameter equals configure.py:345:52: E251 unexpected spaces around keyword / parameter equals configure.py:345:54: E251 unexpected spaces around keyword / parameter equals configure.py:346:29: E251 unexpected spaces around keyword / parameter equals configure.py:346:31: E251 unexpected spaces around keyword / parameter equals configure.py:349:43: E251 unexpected spaces around keyword / parameter equals configure.py:349:45: E251 unexpected spaces around keyword / parameter equals configure.py:349:59: E251 unexpected spaces around keyword / parameter equals configure.py:349:61: E251 unexpected spaces around keyword / parameter equals configure.py:349:84: E251 unexpected spaces around keyword / parameter equals configure.py:349:86: E251 unexpected spaces around keyword / parameter equals configure.py:350:29: E251 unexpected spaces around keyword / parameter 
equals configure.py:350:31: E251 unexpected spaces around keyword / parameter equals configure.py:351:44: E251 unexpected spaces around keyword / parameter equals configure.py:351:46: E251 unexpected spaces around keyword / parameter equals configure.py:351:60: E251 unexpected spaces around keyword / parameter equals configure.py:351:62: E251 unexpected spaces around keyword / parameter equals configure.py:351:86: E251 unexpected spaces around keyword / parameter equals configure.py:351:88: E251 unexpected spaces around keyword / parameter equals configure.py:352:29: E251 unexpected spaces around keyword / parameter equals configure.py:352:31: E251 unexpected spaces around keyword / parameter equals configure.py:353:43: E251 unexpected spaces around keyword / parameter equals configure.py:353:45: E251 unexpected spaces around keyword / parameter equals configure.py:353:59: E251 unexpected spaces around keyword / parameter equals configure.py:353:61: E251 unexpected spaces around keyword / parameter equals configure.py:353:79: E251 unexpected spaces around keyword / parameter equals configure.py:353:81: E251 unexpected spaces around keyword / parameter equals configure.py:354:29: E251 unexpected spaces around keyword / parameter equals configure.py:354:31: E251 unexpected spaces around keyword / parameter equals configure.py:355:45: E251 unexpected spaces around keyword / parameter equals configure.py:355:47: E251 unexpected spaces around keyword / parameter equals configure.py:355:61: E251 unexpected spaces around keyword / parameter equals configure.py:355:63: E251 unexpected spaces around keyword / parameter equals configure.py:355:78: E251 unexpected spaces around keyword / parameter equals configure.py:355:80: E251 unexpected spaces around keyword / parameter equals configure.py:356:29: E251 unexpected spaces around keyword / parameter equals configure.py:356:31: E251 unexpected spaces around keyword / parameter equals configure.py:359:45: E251 unexpected 
spaces around keyword / parameter equals configure.py:359:47: E251 unexpected spaces around keyword / parameter equals configure.py:359:61: E251 unexpected spaces around keyword / parameter equals configure.py:359:63: E251 unexpected spaces around keyword / parameter equals configure.py:359:83: E251 unexpected spaces around keyword / parameter equals configure.py:359:85: E251 unexpected spaces around keyword / parameter equals configure.py:360:29: E251 unexpected spaces around keyword / parameter equals configure.py:360:31: E251 unexpected spaces around keyword / parameter equals configure.py:361:48: E251 unexpected spaces around keyword / parameter equals configure.py:361:50: E251 unexpected spaces around keyword / parameter equals configure.py:361:69: E251 unexpected spaces around keyword / parameter equals configure.py:361:71: E251 unexpected spaces around keyword / parameter equals configure.py:361:87: E251 unexpected spaces around keyword / parameter equals configure.py:361:89: E251 unexpected spaces around keyword / parameter equals configure.py:362:29: E251 unexpected spaces around keyword / parameter equals configure.py:362:31: E251 unexpected spaces around keyword / parameter equals configure.py:363:48: E251 unexpected spaces around keyword / parameter equals configure.py:363:50: E251 unexpected spaces around keyword / parameter equals configure.py:363:64: E251 unexpected spaces around keyword / parameter equals configure.py:363:66: E251 unexpected spaces around keyword / parameter equals configure.py:363:89: E251 unexpected spaces around keyword / parameter equals configure.py:363:91: E251 unexpected spaces around keyword / parameter equals configure.py:364:29: E251 unexpected spaces around keyword / parameter equals configure.py:364:31: E251 unexpected spaces around keyword / parameter equals configure.py:365:46: E251 unexpected spaces around keyword / parameter equals configure.py:365:48: E251 unexpected spaces around keyword / parameter equals 
configure.py:365:62: E251 unexpected spaces around keyword / parameter equals configure.py:365:64: E251 unexpected spaces around keyword / parameter equals configure.py:365:82: E251 unexpected spaces around keyword / parameter equals configure.py:365:84: E251 unexpected spaces around keyword / parameter equals configure.py:365:97: E251 unexpected spaces around keyword / parameter equals configure.py:365:99: E251 unexpected spaces around keyword / parameter equals configure.py:366:29: E251 unexpected spaces around keyword / parameter equals configure.py:366:31: E251 unexpected spaces around keyword / parameter equals configure.py:367:48: E251 unexpected spaces around keyword / parameter equals configure.py:367:50: E251 unexpected spaces around keyword / parameter equals configure.py:367:70: E251 unexpected spaces around keyword / parameter equals configure.py:367:72: E251 unexpected spaces around keyword / parameter equals configure.py:368:1: E101 indentation contains mixed spaces and tabs configure.py:368:1: W191 indentation contains tabs configure.py:368:4: E128 continuation line under-indented for visual indent configure.py:368:8: E251 unexpected spaces around keyword / parameter equals configure.py:368:10: E251 unexpected spaces around keyword / parameter equals configure.py:369:48: E251 unexpected spaces around keyword / parameter equals configure.py:369:50: E251 unexpected spaces around keyword / parameter equals configure.py:369:73: E251 unexpected spaces around keyword / parameter equals configure.py:369:75: E251 unexpected spaces around keyword / parameter equals configure.py:370:1: E101 indentation contains mixed spaces and tabs configure.py:370:13: E128 continuation line under-indented for visual indent configure.py:370:17: E251 unexpected spaces around keyword / parameter equals configure.py:370:19: E251 unexpected spaces around keyword / parameter equals configure.py:371:47: E251 unexpected spaces around keyword / parameter equals configure.py:371:49: 
E251 unexpected spaces around keyword / parameter equals configure.py:371:71: E251 unexpected spaces around keyword / parameter equals configure.py:371:73: E251 unexpected spaces around keyword / parameter equals configure.py:372:13: E128 continuation line under-indented for visual indent configure.py:372:17: E251 unexpected spaces around keyword / parameter equals configure.py:372:19: E251 unexpected spaces around keyword / parameter equals configure.py:373:50: E251 unexpected spaces around keyword / parameter equals configure.py:373:52: E251 unexpected spaces around keyword / parameter equals configure.py:373:76: E251 unexpected spaces around keyword / parameter equals configure.py:373:78: E251 unexpected spaces around keyword / parameter equals configure.py:374:13: E128 continuation line under-indented for visual indent configure.py:374:17: E251 unexpected spaces around keyword / parameter equals configure.py:374:19: E251 unexpected spaces around keyword / parameter equals configure.py:375:52: E251 unexpected spaces around keyword / parameter equals configure.py:375:54: E251 unexpected spaces around keyword / parameter equals configure.py:375:68: E251 unexpected spaces around keyword / parameter equals configure.py:375:70: E251 unexpected spaces around keyword / parameter equals configure.py:375:94: E251 unexpected spaces around keyword / parameter equals configure.py:375:96: E251 unexpected spaces around keyword / parameter equals configure.py:375:109: E251 unexpected spaces around keyword / parameter equals configure.py:375:111: E251 unexpected spaces around keyword / parameter equals configure.py:376:29: E251 unexpected spaces around keyword / parameter equals configure.py:376:31: E251 unexpected spaces around keyword / parameter equals configure.py:377:43: E251 unexpected spaces around keyword / parameter equals configure.py:377:45: E251 unexpected spaces around keyword / parameter equals configure.py:377:59: E251 unexpected spaces around keyword / parameter 
equals configure.py:377:61: E251 unexpected spaces around keyword / parameter equals configure.py:377:79: E251 unexpected spaces around keyword / parameter equals configure.py:377:81: E251 unexpected spaces around keyword / parameter equals configure.py:378:29: E251 unexpected spaces around keyword / parameter equals configure.py:378:31: E251 unexpected spaces around keyword / parameter equals configure.py:379:30: E251 unexpected spaces around keyword / parameter equals configure.py:379:32: E251 unexpected spaces around keyword / parameter equals configure.py:379:46: E251 unexpected spaces around keyword / parameter equals configure.py:379:48: E251 unexpected spaces around keyword / parameter equals configure.py:379:62: E251 unexpected spaces around keyword / parameter equals configure.py:379:64: E251 unexpected spaces around keyword / parameter equals configure.py:380:30: E251 unexpected spaces around keyword / parameter equals configure.py:380:32: E251 unexpected spaces around keyword / parameter equals configure.py:380:44: E251 unexpected spaces around keyword / parameter equals configure.py:380:46: E251 unexpected spaces around keyword / parameter equals configure.py:380:58: E251 unexpected spaces around keyword / parameter equals configure.py:380:60: E251 unexpected spaces around keyword / parameter equals configure.py:395:36: E251 unexpected spaces around keyword / parameter equals configure.py:395:38: E251 unexpected spaces around keyword / parameter equals configure.py:395:76: E251 unexpected spaces around keyword / parameter equals configure.py:395:78: E251 unexpected spaces around keyword / parameter equals configure.py:398:18: E127 continuation line over-indented for visual indent configure.py:424:32: W291 trailing whitespace configure.py:649:18: E124 closing bracket does not match visual indentation configure.py:650:17: E127 continuation line over-indented for visual indent configure.py:650:17: W503 line break before binary operator configure.py:651:17: 
W503 line break before binary operator configure.py:652:17: E124 closing bracket does not match visual indentation configure.py:784:8: E713 test for membership should be 'not in' configure.py:790:45: W291 trailing whitespace configure.py:819:32: E261 at least two spaces before inline comment configure.py:832:5: E123 closing bracket does not match indentation of opening bracket's line configure.py:836:35: E251 unexpected spaces around keyword / parameter equals configure.py:836:37: E251 unexpected spaces around keyword / parameter equals configure.py:836:49: E251 unexpected spaces around keyword / parameter equals configure.py:836:51: E251 unexpected spaces around keyword / parameter equals configure.py:845:45: E251 unexpected spaces around keyword / parameter equals configure.py:845:47: E251 unexpected spaces around keyword / parameter equals configure.py:845:59: E251 unexpected spaces around keyword / parameter equals configure.py:845:61: E251 unexpected spaces around keyword / parameter equals configure.py:848:43: E251 unexpected spaces around keyword / parameter equals configure.py:848:45: E251 unexpected spaces around keyword / parameter equals configure.py:869:1: E302 expected 2 blank lines, found 1 configure.py:879:1: E305 expected 2 blank lines after class or function definition, found 1 configure.py:965:118: E225 missing whitespace around operator configure.py:967:18: E124 closing bracket does not match visual indentation configure.py:969:27: F821 undefined name 'python' configure.py:969:73: E251 unexpected spaces around keyword / parameter equals configure.py:969:75: E251 unexpected spaces around keyword / parameter equals configure.py:976:7: E201 whitespace after '{' configure.py:976:12: E203 whitespace before ':' configure.py:976:73: E202 whitespace before '}' configure.py:981:58: E251 unexpected spaces around keyword / parameter equals configure.py:981:60: E251 unexpected spaces around keyword / parameter equals configure.py:987:10: E222 multiple spaces 
after operator configure.py:1001:17: E124 closing bracket does not match visual indentation configure.py:1026:29: E251 unexpected spaces around keyword / parameter equals configure.py:1026:31: E251 unexpected spaces around keyword / parameter equals configure.py:1100:82: W291 trailing whitespace configure.py:1110:29: E251 unexpected spaces around keyword / parameter equals configure.py:1110:31: E251 unexpected spaces around keyword / parameter equals configure.py:1110:49: E251 unexpected spaces around keyword / parameter equals configure.py:1110:51: E251 unexpected spaces around keyword / parameter equals configure.py:1111:64: E251 unexpected spaces around keyword / parameter equals configure.py:1111:66: E251 unexpected spaces around keyword / parameter equals configure.py:1112:13: E128 continuation line under-indented for visual indent configure.py:1112:22: E251 unexpected spaces around keyword / parameter equals configure.py:1112:24: E251 unexpected spaces around keyword / parameter equals configure.py:1140:106: W291 trailing whitespace configure.py:1149:86: E127 continuation line over-indented for visual indent configure.py:1191:116: E251 unexpected spaces around keyword / parameter equals configure.py:1191:118: E251 unexpected spaces around keyword / parameter equals configure.py:1191:139: E251 unexpected spaces around keyword / parameter equals configure.py:1191:141: E251 unexpected spaces around keyword / parameter equals configure.py:1197:83: E231 missing whitespace after ',' configure.py:1200:76: E231 missing whitespace after ',' configure.py:1215:99: W291 trailing whitespace configure.py:1242:31: E251 unexpected spaces around keyword / parameter equals configure.py:1242:33: E251 unexpected spaces around keyword / parameter equals Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180917155438.12410-1-ultrabug@gentoo.org>	2018-09-18 13:49:23 +03:00
Avi Kivity	e5e59ea9cf	Merge "More SSTables 3.x write tests enriched with read after write." from Vladimir " Some of the write tests were missing the read after write validation which has now been added for better coverage. Tests: unit {release} " * 'projects/sstables-30/more-enriched-tests/v1' of https://github.com/argenet/scylla: tests: Enrich test_write_adjacent_range_tombstones_with_rows with read after write tests: Enrich test_write_many_range_tombstones with read after write tests: Enrich test_write_mixed_rows_and_range_tombstones with read after write tests: Enrich test_write_non_adjacent_range_tombstones with read after write tests: Enrich test_write_adjacent_range_tombstones with read after write tests: Enrich test_write_simple_range_tombstone with read after write. tests: Enrich test_write_deleted_column with read after write.	2018-09-18 13:45:52 +03:00
Paweł Dziepak	e464ad4f5d	utils: fragmented_temporary_buffer: add empty() and size_bytes()	2018-09-18 11:29:37 +01:00
Paweł Dziepak	f4bb219a8b	utils: fragmented_temporary_buffer: add get_ostream()	2018-09-18 11:29:37 +01:00
Paweł Dziepak	196c5a5eee	idl: serializer: don't assume Iterator::value_type is bytes_view	2018-09-18 11:29:36 +01:00
Paweł Dziepak	953942b256	idl: serializer: create buffer view from streams	2018-09-18 11:29:36 +01:00
Paweł Dziepak	252cf0c681	utils: crc: accept FragmentRange	2018-09-18 11:29:36 +01:00
Avi Kivity	9d90ba470b	Merge "Fix deleted counters handling in SSTables 3.x" from Vladimir " This patchset fixes the bug in SSTables 3.x parser that did not properly handle deleted counter cells. A write test is enriched to validate read after write so that this case is covered. Tests: unit {release} " * 'projects/sstables-30/fix-deleted-counters-read/v1' of https://github.com/argenet/scylla: tests: Read after write in test_write_counter_table. sstables: Fix deleted counter cells processing in SSTables 3.x parser.	2018-09-18 12:20:54 +03:00
Vladimir Krivopalov	8c08ccbd3b	tests: Enrich test_write_adjacent_range_tombstones_with_rows with read after write Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:06:24 -07:00
Vladimir Krivopalov	f0966a935e	tests: Enrich test_write_many_range_tombstones with read after write Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:06:10 -07:00
Vladimir Krivopalov	262874a90c	tests: Enrich test_write_mixed_rows_and_range_tombstones with read after write Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:05:56 -07:00
Vladimir Krivopalov	6fbf4d3589	tests: Enrich test_write_non_adjacent_range_tombstones with read after write Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:05:42 -07:00
Vladimir Krivopalov	4bf9c87a1a	tests: Enrich test_write_adjacent_range_tombstones with read after write Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:05:26 -07:00
Vladimir Krivopalov	5b087daf91	tests: Enrich test_write_simple_range_tombstone with read after write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:04:57 -07:00
Vladimir Krivopalov	e63d960b8e	tests: Enrich test_write_deleted_column with read after write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 11:04:25 -07:00
Eliran Sinvani	83628f5881	cql3: maintain correctness of multicolumn restriction on mixed order columns When a query with multicolumn inequality is issued on clustering columns having mixed order (ASC and DESC together), if the ranges are not broken into non-overlapping lexicographically monotonic ones, the node returns incorrect rows. This is due to the search nature (prefix comparison). The solution is to break the range imposed by the restriction into several single column restrictions OR-ed together that will be logically equivalent and preserve the monotonicity assumption. This commit also fixes incorrect results returned by a multicolumn query on all-descending columns. A unit test has been added to account for both issues fixed. Fixes #2050 Tests: Unit test, manual tests of the use case in the issue. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <3b96620a3bd8b0614359a3b0757f324d45189dbb.1536478193.git.eliransin@scylladb.com>	2018-09-17 20:35:55 +03:00
Vladimir Krivopalov	e796fa2b02	tests: Read after write in test_write_counter_table. This covers the case of deleted counter cells. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 10:11:48 -07:00
Vladimir Krivopalov	79ccce147c	sstables: Fix deleted counter cells processing in SSTables 3.x parser. Deleted counter cells should be processed the same way as regular deleted cells. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-17 10:10:57 -07:00
Alexys Jacob	cd74dfebfb	scripts: coding style fixes scripts/create-relocatable-package.py:24:1: F401 'shutil' imported but unused scripts/create-relocatable-package.py:24:1: F401 'tempfile' imported but unused scripts/create-relocatable-package.py:24:16: E401 multiple imports on one line scripts/create-relocatable-package.py:26:1: E302 expected 2 blank lines, found 1 scripts/create-relocatable-package.py:47:1: E305 expected 2 blank lines after class or function definition, found 1 scripts/create-relocatable-package.py:93:6: E225 missing whitespace around operator Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180917152520.5032-1-ultrabug@gentoo.org>	2018-09-17 18:40:23 +03:00
Alexys Jacob	c80d7b97cc	scyllatop: more coding style fixes tools/scyllatop/metric.py:2:1: F401 're' imported but unused tools/scyllatop/metric.py:53:20: E221 multiple spaces before operator tools/scyllatop/metric.py:69:20: E221 multiple spaces before operator Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180917153308.7240-1-ultrabug@gentoo.org>	2018-09-17 18:39:53 +03:00
Raphael S. Carvalho	5bc028f78b	database: fix 2x increase in disk usage during cleanup compaction Don't hold reference to sstables cleaned up, so that file descriptors for their index and data files will be closed and consequently disk space released. Fixes #3735. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180914194047.26288-1-raphaelsc@scylladb.com>	2018-09-17 17:26:46 +03:00
Alexys Jacob	46d101c1f2	scyllatop: coding style fixes tools/scyllatop/prometheus.py:3:1: F401 'sys' imported but unused tools/scyllatop/prometheus.py:7:1: E302 expected 2 blank lines, found 1 tools/scyllatop/prometheus.py:12:5: E301 expected 1 blank line, found 0 tools/scyllatop/prometheus.py:17:1: W293 blank line contains whitespace tools/scyllatop/prometheus.py:22:82: E225 missing whitespace around operator Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180914110847.1862-1-ultrabug@gentoo.org>	2018-09-17 15:45:43 +03:00
Botond Dénes	a84c26799d	tests/mutation_reader_test: fix flaky restricted reader timeout test The test in question is `restricted_reader_timeout`. Use `eventually_true()` instead of `sleep()` to wait on the timeout expiring, making the test more robust on overloaded machines. Also fix graceful failing, another longstanding issue with this test. The readers created for the test need different destruction logic depending on whether the test failed or succeeded. Previously this was dealt with by using the logic that worked in case of success and using asserts to abort when the test failed, thus avoiding developers investigating the invalid memory accesses happening due to the wrong destruction logic. The solution is to use the BOOST_CHECK() macro in the check that validates whether the timeout works as expected. This allows execution to continue even if the test failed, and thus allows running the proper cleanup code even when the test failed. Fixes: #3719 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <911921dffc924f1b0a3e86408757467e9be2b65b.1537169933.git.bdenes@scylladb.com>	2018-09-17 09:40:45 +01:00
Nadav Har'El	0006e21c4d	tests/view_complex_test: add missing timestamp test_partial_delete_selected_column() does a long string of various updates and deletes, each specifies a different timestamp. In one of these updates, the timestamp was forgotten. This means that the server picks the current time, a large number. As the test is currently written, it doesn't matter which timestamp was chosen, the test would still succeed (if timestamp >= 15, and it must be since the timestamp is the time from the epoch). But the intention was probably to use timestamp = 15, so let's make this intention clear. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180905095552.11883-2-nyh@scylladb.com>	2018-09-17 00:38:55 +01:00
Nadav Har'El	2ae4ed151e	tests/view_complex_test - add test passpoints We recently saw a failure in test_partial_delete_selected_column() but this is a very long test doing many operations and comparisons of their results, and without BOOST_TEST_PASSPOINT() we can't know which of them really failed. So let's sprinkle BOOST_TEST_PASSPOINT() calls between the different parts of test_partial_delete_selected_column(). If this test ever fails again, we'll know where. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180905095552.11883-1-nyh@scylladb.com>	2018-09-17 00:38:55 +01:00
Jesse Haber-Kucharsky	9d27045c76	auth: Shorten `random_device` instance life-span On Fedora 28, creating an instance of `std::random_device` opens a file descriptor for `/dev/urandom` (observed via `strace`). By declaring static thread-local instances of `std::random_device`, these descriptors will be open (barring optimization by the compiler) for the entire duration of the Scylla process's life. However, the `std::random_device` instance is only necessary for initializing the `RandomNumberEngine` for generating salts. With this change, the file-descriptor is closed immediately after the engine is initialized. I considered generalizing this pattern of initialization into a function, but with only two uses (and simple ones) I think this would only obscure things. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Tests: unit (release) Message-Id: <f1b985d99f66e5e64d714fd0f087e235b71557d2.1536697368.git.jhaberku@scylladb.com>	2018-09-12 12:14:21 +01:00
Botond Dénes	dfad223ea2	multishard_mutation_reader: shard_reader: don't do concurrent read-aheads multishard_mutation_reader starts read-aheads on the shards-to-be-read-soon. When doing this it didn't check whether the respective shards had an ongoing read-ahead already. This led to a single shard executing multiple concurrent read-aheads. This is damaging for multiple reasons: * Can lead to concurrent access of the remote reader's data members. * The `shard_reader` was designed around a single read-ahead and thus will synchronise foreground reads with only the last one. The practical implication of this seen so far was that queries reading a large number of rows (large enough to reliably trigger the bug) would stop the read early, due to the `combined_mutation_reader`'s internal accounting being messed up by concurrent access. Also add a unit test. Instead of coming up with a very specific, and very contrived unit test, use the test-case that detected this bug in the first place: count(*) on a table with lots of rows (>1000). This unit-test should serve well for detecting any similar bugs in the future. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <ff1c49be64e2fb443f9aa8c5c8d235e682442248.1536746388.git.bdenes@scylladb.com>	2018-09-12 11:43:18 +01:00
Botond Dénes	6a07b8ae83	multishard_mutation_reader: update shard_reader's comment The `abandoned` member was renamed to `stopped`. Update the comment accordingly. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1d655785f28fe1e5fa041f2f49852f0ad88be53e.1536743950.git.bdenes@scylladb.com>	2018-09-12 11:32:08 +02:00
Botond Dénes	d9a2ffad84	mutation_partition: don't move tracing_state early Currently the `trace_state` is moved into the `querier` object's constructor when one has to be created. Since the `trace_state` is used below this line, this had the effect that on the first page of the query, when a querier object has to be created, tracing would not work inside the `querier_cache`, which received a moved-from `trace_state` (a nullptr effectively). Change the move to a copy so the other half of the function doesn't use a moved-from `trace_state`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4987419781aa287141aa9dc8ce99c5068b564c84.1536739052.git.bdenes@scylladb.com>	2018-09-12 11:32:08 +02:00
Botond Dénes	49704755b0	combined_mutation_reader: propagate timeout in fill_buffer() All user reads go through the combined reader. Not propagating the timeout down from there means that the storage layer's timeout functionality is effectively disabled. Spotted while reading the code. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <7fc10eca1c231dd04ac433913d9e6a51b6b17139.1536657041.git.bdenes@scylladb.com>	2018-09-11 15:44:28 +02:00
Botond Dénes	99ab43a1cc	flat_mutation_reader: add timeout parameter to operator()() For consistency with fast_forward_to() and fill_buffer(), and for correctness: operator()() calls fill_buffer() and thus should provide a timeout for the storage layer. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <6e97552ac2372e5846c955d94400b5315dbd2a89.1536657041.git.bdenes@scylladb.com>	2018-09-11 15:44:12 +02:00
Tomasz Grabiec	eb321a0830	Merge "Enrich SSTables 3.x write tests with subsequent read" from Vladimir As our support for reading SSTables 3.x rows is nearly complete, the write tests can be extended to read data after write. This patchset adds reading to a handful of write tests. * https://github.com/argenet/scylla/tree/projects/sstables-30/enrich-write-tests/v6: tests: Factor out the helper building SSTables path for write tests. tests: Add validate_read() helper to use in SSTables 3.x write tests. tests: Preserve tmpdir in SSTables 3.x write tests upon comparison. tests: Read SSTables for write_static_row test after validating write. tests: Read SSTables for write_composite_partition_key test after validating write. tests: Read SSTables for write_composite_clustering_key test after validating write. tests: Read SSTables for write_wide_partitions test after validating write. tests: Read SSTables for write_ttled_column test after validating write. tests: Read SSTables for write_collection_wide_update test after validating write. tests: Read SSTables for write_collection_incremental_update test after validating write. tests: Read SSTables for write_missing_columns_large_set test after validating write. tests: Read SSTables for write_multiple_partitions test after validating write. tests: Read SSTables for write_multiple_rows test after validating write. tests: Read SSTables for write_different_types test after validating write. tests: Read SSTables for write_empty_clustering_values test after validating write. tests: Read SSTables for write_large_clustering_keys test after validating write. tests: Read SSTables for write_user_defined_type_table test after validating write. tests: Read SSTables for write_deleted_row test after validating write. sstables: Fix SSTables 3.x parsing: check use_row_ttl() for TTLed columns. tests: Read SSTables for write_ttled_row test after validating write. Read SSTables for write_compact_table test after validating write. 
tests: Read SSTables for tests of many partitions after validating write.	2018-09-11 15:42:43 +02:00
Duarte Nunes	3f0643f34f	Merge 'Misc improvements to stateful range scans' from Botond " This series contains miscellaneous improvements to the stateful range scans. These improvements are either things that I forgot to include in the original series (tracing), were requested by other developers (comments) or were discovered while reading the code (lockup and cleanup). " * 'multishard_mutation_query_fixes/v1' of https://github.com/denesb/scylla: multishard_mutation_query: add some tracing multishard_mutation_query: add comment to `read_context` multishard_mutation_query: always cleanup readers properly multishard_mutation_query: fix possible deadlock when creating a reader fails	2018-09-11 10:26:05 +01:00
Botond Dénes	7d71b42651	multishard_mutation_query: add some tracing Add tracing for the following events: 1) Dismantling of the combined buffer. 2) Dismantling of the compaction state. 3) Cleaning up the readers. (1) and (2) can possibly have adverse effects on the performance of the query and hence it is important that details about the dismantled fragments are exposed in the tracing data. (3) is less critical but it is still good to know how many readers were created by the read (in case they aren't saved). Since normally (in stateful queries) this will always be 0, only trace this when it is non-zero (and is interesting).	2018-09-11 08:18:16 +03:00
Botond Dénes	b41be7c8e5	multishard_mutation_query: add comment to `read_context` Explain the purpose of the class and its intended usage and any gotchas the reader/modifier of the code has to keep in mind.	2018-09-11 08:18:16 +03:00
Botond Dénes	b6e1a8f32d	multishard_mutation_query: always cleanup readers properly Currently the reader cleanup code, which ensures the readers and their dependent objects are destroyed in the correct order and in a single smp::submit_to() message, is only run when the readers are attempted to be saved. However proper cleanup is needed not only then, but also when the query is not stateful. Rename the current `cleanup()` method to `stop()`, make it public and call it from a `finally()` block after the page is finalized to ensure readers are properly cleaned up at all times. Also make sure that failures in `stop()` are never propagated so that a failure in the cleanup doesn't fail the read itself.	2018-09-11 08:18:16 +03:00
Vladimir Krivopalov	c4a4ef6e3c	tests: Read SSTables for tests of many partitions after validating write. This covers five tests, including three for compressed tables: - write_many_partitions_deflate - write_many_partitions_lz4 - write_many_partitions_snappy - write_many_live_partitions - write_many_deleted_partitions Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	f1214bfceb	Read SSTables for write_compact_table test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	a39638c0ba	tests: Read SSTables for write_ttled_row test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	bcae761d72	sstables: Fix SSTables 3.x parsing: check use_row_ttl() for TTLed columns. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	9b55f06456	tests: Read SSTables for write_deleted_row test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	8869f1a591	tests: Read SSTables for write_user_defined_type_table test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	dae49358d8	tests: Read SSTables for write_large_clustering_keys test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	8c2bc4a16a	tests: Read SSTables for write_empty_clustering_values test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	6f23446962	tests: Read SSTables for write_different_types test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	4865f2f5a3	tests: Read SSTables for write_multiple_rows test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	3594b887df	tests: Read SSTables for write_multiple_partitions test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	eee775dab7	tests: Read SSTables for write_missing_columns_large_set test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	2d764da415	tests: Read SSTables for write_collection_incremental_update test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	88a3b05210	tests: Read SSTables for write_collection_wide_update test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	abdae2dd9e	tests: Read SSTables for write_ttled_column test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	cdf148dc67	tests: Read SSTables for write_wide_partitions test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	5b1a4686eb	tests: Read SSTables for write_composite_clustering_key test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	e908d07fe7	tests: Read SSTables for write_composite_partition_key test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	aa5dc16dbb	tests: Read SSTables for write_static_row test after validating write. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	42ab8ed3cd	tests: Preserve tmpdir in SSTables 3.x write tests upon comparison. It can be used to do other checks on written files, like reading them back. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	bc16304e99	tests: Add validate_read() helper to use in SSTables 3.x write tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Vladimir Krivopalov	6cddd7500a	tests: Factor out the helper building SSTables path for write tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-10 17:28:48 -07:00
Botond Dénes	b3f1fe14e8	multishard_mutation_query: fix possible deadlock when creating a reader fails Failing to create a reader (`do_make_remote_reader()`) can lead to a deadlock if the reader is in any of the future_*_state states, as the `then()` block is not executed and hence the promise of the first future in the chain is not set. Avoid this by changing the `then()` to a `then_wrapped()` and using `set_exception()` and `set_value()` accordingly, such that the future is resolved on both the happy and error path.	2018-09-10 16:41:13 +03:00
Avi Kivity	4553238653	messaging: fix unbounded allocation in TLS RPC server The non-TLS RPC server has an rpc::resource_limits configuration that limits its memory consumption, but the TLS server does not. That means a many-node TLS configuration can OOM if all nodes gang up on a single replica. Fix by passing the limits to the TLS server too. Fixes #3757. Message-Id: <20180907192607.19802-1-avi@scylladb.com>	2018-09-10 12:11:16 +01:00
Gleb Natapov	9e438933a2	mutation_query_test: add test for result size calculation Check that digest only and digest+data query calculate result size to be the same. Message-Id: <20180906153800.GK2326@scylladb.com>	2018-09-06 20:54:57 +03:00
Gleb Natapov	d7674288a9	mutation_partition: accurately account for result size in digest only queries When measuring_output_stream is used to calculate result's element size it incorrectly takes into account not only serialized element size, but a placeholder that ser::qr_partition__rows/qr_partition__static_row__cells constructors put in the beginning. Fix it by taking starting point in a stream before element serialization and subtracting it afterwards. Fixes #3755 Message-Id: <20180906153609.GJ2326@scylladb.com>	2018-09-06 20:52:44 +03:00
Takuya ASADA	2136479012	dist/debian: delete mounts.conf on scylla-server.postrm Since we added mounts.conf in `687372bc48`, we need to delete the file when uninstalling the package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180905204631.9265-1-syuu@scylladb.com>	2018-09-06 16:50:14 +03:00
Gleb Natapov	98092353df	mutation_partition: correctly measure static row size when doing digest calculation The code uses incorrect output stream in case only digest is requested and thus getting incorrect data size. Failing to correctly account for static row size while calculating digest may cause digest mismatch between digest and data query. Fixes #3753. Message-Id: <20180905131219.GD2326@scylladb.com>	2018-09-06 13:09:41 +03:00
Takuya ASADA	ab361e9897	dist/redhat: add mounts.conf to ghost file Since we added mounts.conf in `687372bc48`, we need to delete the file when uninstalling the package. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180905191037.1570-1-syuu@scylladb.com>	2018-09-05 22:14:48 +03:00
Jesse Haber-Kucharsky	682805b22c	auth: Use finite time-out for all QUORUM reads Commit `e664f9b0c6` transitioned internal CQL queries in the auth. sub-system to be executed with finite time-outs instead of infinite ones. It should have also modified the functions in `auth/roles-metadata.cc` to have finite time-outs. This change fixes some previously failing dtests, particularly around repair. Without this change, the QUORUM query fails to terminate when the necessary consistency level cannot be achieved. Fixes #3736. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <e244dc3e731b4019f3be72c52a91f23ee4bb68d1.1536163859.git.jhaberku@scylladb.com>	2018-09-05 21:55:26 +03:00
Tomasz Grabiec	82270c8699	storage_proxy: Fix misqualification of reads as foreground or background in some cases The foreground reads metric is derived from the number of live read executors minus the number of background reads. Background reads are counted down when their resolver times out. However, a read executor may still be around for a while, resulting in such reads being accounted as foreground. Usually, the gap in which this happens is short, because executor reference holders timeout quickly as well. It's not always the case though. For instance, local read executor doesn't time out quickly when the target shard has an overloaded CPU, and it takes a while before the request goes through all the queues, even if IO is not involved. Observed in #3628. Fixes #3734. Another problem is that all reads which received CL responses are accounted as background, until all replicas respond, but if such read needs reconciliation, it's still practically a foreground read and should be accounted as such. Found during code review. Fixes #3745. This patch fixes both issues by rearranging accounting to track foreground reads instead of background reads, and considering all reads as foreground until the resulting promise is resolved. Message-Id: <1535999620-25784-1-git-send-email-tgrabiec@scylladb.com>	2018-09-05 20:42:51 +03:00
Avi Kivity	c168805ca6	Merge "Filtering and fast-forwarding of range tombstones in SSTables 3.x" from Vladimir " This patchset adds proper support for sliced reads of partitions containing range tombstones. Given the SSTables 3.x representation of range tombstones by separate start and end markers, we refer to the index for the information about the currently opened range tombstone, if any, when skipping to the next promoted index block. Note that for this we have to take the promoted index block immediately preceding the one we are jumping to. Tests: unit {release} " * 'projects/sstables-30/range-tombstones-slicing/v3' of https://github.com/argenet/scylla: tests: Test filtering and forwarding on a partition with interleaved rows and RTs. tests: Add tests for reading wide partitions with range tombstones. sstables: Support slicing for range tombstones. sstables: Set/reset range tombstone start from end open marker. sstables: Fix end_open_marker population in promoted index blocks. sstables: Add need_skip() helper to data_consume_context. sstables: For end_open_marker, return both position in partition and deletion time.	2018-09-05 20:38:39 +03:00
Vladimir Krivopalov	3d13ee3909	tests: Test filtering and forwarding on a partition with interleaved rows and RTs. In this test, rows lie inside range tombstones so we split them on reading. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	d39e58a97a	tests: Add tests for reading wide partitions with range tombstones. Test the case where rows lie outside range tombstones. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	ec2047e1e6	sstables: Support slicing for range tombstones. Both filtering on queried ranges and fast-forwarding are supported. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	d57380f44c	sstables: Set/reset range tombstone start from end open marker. When we skip through a wide partition using promoted index, we may land at a position that lies in the middle of a range tombstone so we need to be aware of it. For this, we check if the previous promoted block has an end open marker and either set the range tombstone start using it or reset if missing. Note several things about the implementation. Firstly, we have to peek back at the previous promoted index block for the end open marker, and so we have to always preserve one more promoted index block when we read the next batch so that we can still access it. Secondly, we use the previous promoted block end position to build position in partition for the range tombstone start. Lastly, we don't have a notion of end open marker in older consumers that work with SSTables of ka/la formats so we only call the corresponding methods if the consumer supports them. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	939e4893ef	sstables: Fix end_open_marker population in promoted index blocks. We should not access the internal object stored in std::optional when passing the end_open_marker, especially since it can be disengaged. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	84bff86fbc	sstables: Add need_skip() helper to data_consume_context. This method tells whether we will need to skip to reach the input position or not. It can be used for skipping with index when reading SSTables 3.x because we only want to set/reset the open range tombstone bound when we actually move to another promoted index block. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Tomasz Grabiec	cd201d1987	db/batchlog_manager: Do not return a value from timer callback Timer callbacks are std::function<void()>. Exposed by changing callback_t to noncopyable_function<>. Message-Id: <1536138045-29209-1-git-send-email-tgrabiec@scylladb.com>	2018-09-05 12:32:21 +03:00
Asias He	89b769a073	storage_service: Wait for range setup before announcing join status When a joining node announces its join status through gossip, other existing nodes will send writes to the joining node. At this time, it is possible that the joining node hasn't learnt the tokens of other nodes, which causes errors like the ones below: token_metadata - sorted_tokens is empty in first_token_index! storage_proxy - Failed to apply mutation from 127.0.4.1#0: std::runtime_error (sorted_tokens is empty in first_token_index!) To fix, wait for the token range setup before announcing the join status. Fixes: #3382 Tests: 60 run of materialized_views_test.py:TestMaterializedViews.add_dc_during_mv_update_test Message-Id: <01abb21ae3315ae275297e507c5956e5774557ef.1536128531.git.asias@scylladb.com>	2018-09-05 10:51:43 +03:00
Vlad Zolotarov	dae70e1166	tests: loading_cache_test: configure a validity timeout in test_loading_cache_loading_different_keys to a greater value Change the validity timeout from 1s to 1h in order to avoid false alarms on busy systems: for a short value there is a chance that (loading_cache.size() == num_loaders) check is going to run after some elements of the cache have already been evicted. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20180904193026.7304-1-vladz@scylladb.com>	2018-09-05 10:19:59 +03:00
Vladimir Krivopalov	ac0c71bdc1	sstables: For end_open_marker, return both position in partition and deletion time. Prior to this fix, the end_open_marker has been only accessible as a plain deletion_time structure. Now it also contains the start position of a promoted index block so that it can be used for setting range tombstone open bound. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-04 18:16:21 -07:00
Piotr Sarna	f494d03c3f	tests: add test case for filtering with DESC clustering order Refs #3741 Message-Id: <1b8eab8d668eb000b306686c15324e6acde8e616.1535981852.git.sarna@scylladb.com>	2018-09-04 16:05:19 +03:00
Piotr Sarna	8e52b66516	cql3: fix filtering with descending clustering order When slice::is_satisfied_by() restriction check is performed on raw data represented as bytes, it should always use a regular type comparator, not a reversed one. Reversed types are used to preserve descending clustering order, but comparison with constants should be used with a regular underlying type comparator (for x < 1 to actually mean 'lesser than 1' instead of 'bigger than 1, because the clustering order is reversed'). Fixes #3741 Message-Id: <3e25fc66688c9253287f2c4f31ede8339b9bbe23.1535981852.git.sarna@scylladb.com>	2018-09-04 16:05:15 +03:00
Piotr Sarna	5b5c9f2707	cql3: fix a 'pratition_key' typo partition_key got misspelled with 'pratition_key' typo in the original series. Message-Id: <de59fe6161df5442b19d8ba4336e2f828b7ede32.1535981852.git.sarna@scylladb.com>	2018-09-04 16:05:09 +03:00
Takuya ASADA	bd8a5664b8	dist/common/scripts/scylla_raid_setup: create scylla-server.service.d when it doesn't exist When /etc/systemd/system/scylla-server.service.d/capabilities.conf is not installed, we don't have /etc/systemd/system/scylla-server.service.d/, need to create it. Fixes #3738 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180904015841.18433-1-syuu@scylladb.com>	2018-09-04 10:12:32 +03:00
Tomasz Grabiec	4fb3f7e8eb	managed_vector: Make external_memory_usage() ignore reserved space This ensures that row::external_memory_usage() is invariant to insertion order of cells. It should be so, so that accounting of a clustering_row, merged from multiple MVCC versions by the partition_snapshot_flat_reader on behalf of a memtable flush, doesn't give a greater result than what is used by the memtable region. Overaccounting leads to assertion failure in ~flush_memory_accounter. Fixes #3625 (hopefully). Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>	2018-09-03 17:09:54 +03:00
Takuya ASADA	d78762d627	dist/debian: fix broken debian/changelog It also need $MUSTACHE_DIST. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180903094558.3862-1-syuu@scylladb.com>	2018-09-03 14:04:01 +03:00
Duarte Nunes	e49a14e308	Merge 'Stateful range scans' from Botond " This series extends the query statefulness, introduced by `f8613a841` to point queries, to range scans as well. This means that queriers will be saved and reused for range scans too. This series builds heavily on the infrastructure introduced by stateful point queries, namely the querier object and the querier_cache. It also builds on another critical piece of infrastructure, the multishard_combining_reader, introduced by `2d126a79b`. To make the range scan on a given node suspendable and resumable we move away from the current code in `storage_proxy::query_nonsingular_mutations_locally()` and use a multishard_combining_reader to execute the read. When the page is filled this reader is dismantled and its shard readers are saved in the querier cache. There are of course a lot more details to it but this is the gist of it. Tests: unit(release, debug), dtest(paging_test.py, paging_additional_test.py) " * '1865/range-scans/v7.1' of https://github.com/denesb/scylla: (33 commits) query_pagers: generate query_uuid for range-scans as well storage_proxy: use preferred/last replicas storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent db::consistency_level::filter_for_query() add preferred_endpoints storage_proxy: use query_mutations_from_all_shards() for range scans tests: add unit test for multishard_mutation_query() tests/mutation_assertions.hh: add missing include multishard_mutation_query: add badness counters database: add query_mutations_on_all_shards() mutation_compactor: add detach_state() flat_mutation_reader: add unpop_mutation_fragment() Move reconcilable_result_builder declaration to mutation_query.hh mutation_source_test: add an additional REQUIRE() mutation: add missing assert to mutation from reader querier: add shard_mutation_querier querier: prepare for multi-ranges tests/querier_cache: add tests specific for multiple entry-types querier: split querier into separate data and mutation querier types querier: move consume_page logic into a free function querier: move all matching related logic into free functions ...	2018-09-03 09:09:17 +01:00
Botond Dénes	cd49c23a66	query_pagers: generate query_uuid for range-scans as well And thus enable stateful range scans.	2018-09-03 10:31:44 +03:00
Botond Dénes	6486d6c8bd	storage_proxy: use preferred/last replicas	2018-09-03 10:31:44 +03:00
Botond Dénes	577a06ce1b	storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent	2018-09-03 10:31:44 +03:00
Botond Dénes	6e59cee244	db::consistency_level::filter_for_query() add preferred_endpoints To the second overload (the one without read-repair related params) too.	2018-09-03 10:31:44 +03:00
Botond Dénes	2f66bde26f	storage_proxy: use query_mutations_from_all_shards() for range scans	2018-09-03 10:31:44 +03:00
Botond Dénes	6779b63dfe	tests: add unit test for multishard_mutation_query()	2018-09-03 10:31:44 +03:00
Botond Dénes	c678b665b4	tests/mutation_assertions.hh: add missing include	2018-09-03 10:31:44 +03:00
Botond Dénes	253407bdc8	multishard_mutation_query: add badness counters Add badness counters that allow tracking problems. The following counters are added: 1) multishard_query_unpopped_fragments 2) multishard_query_unpopped_bytes 3) multishard_query_failed_reader_stops 4) multishard_query_failed_reader_saves The first pair of counters observe the amount of work range scan queries have to undo on each page. It is normal for these counters to be non-zero, however sudden spikes in their values can indicate problems. This undoing of work is needed for stateful range-scans to work. When stateful queries are enabled the `multishard_combining_reader` is dismantled and all unconsumed fragments in its and any of its intermediate reader's buffers are pushed back into the originating shard reader's buffer (via `unpop_mutation_fragment()`). This also includes the `partition_start`, the `static_row` (if there is one) and all extracted and active `range_tombstone` fragments. This together can amount to a substantial amount of fragments. (1) counts the amount of fragments moved back, while (2) counts the number of bytes. Monitoring size and quantity separately allows for detecting edge cases like moving many small fragments or just a few huge ones. The counters count the fragments/bytes moved back to readers located on the shard they belong to. The second pair of counters are added to detect any problems around saving readers. Since the failure to save a reader will not fail the read itself, it is necessary to add visibility to these failures by other means. (3) counts the number of times stopping a shard reader (waiting on pending read-aheads and next-partitions) failed while (4) counts the number of times inserting the reader into the `querier_cache` failed. Contrary to the first two counters, which will almost certainly never be zero, these latter two counters should always be zero. Any other value indicates problems in the respective shards/nodes.	2018-09-03 10:31:44 +03:00
Botond Dénes	97364c7ad9	database: add query_mutations_on_all_shards() This method allows for querying a range or ranges on all shards of the node. Under the hood it uses the multishard_combining_reader for executing the query. It supports paging and stateful queries (saving and reusing the readers between pages). All this is transparent to the client, who only needs to supply the same query::read_command::query_uuid through the pages of the query (and supply correct start positions on each page, that match the stop position of the last page).	2018-09-03 10:31:44 +03:00
Botond Dénes	33d72efa49	mutation_compactor: add detach_state() Allow the state of the compaction to be detached. The detached state is a set of mutation fragments, which if replayed through a new compactor object will result in the latter being in the same state as the previous one was. This allows for storing the compaction state in the compacted reader by using `unpop_mutation_fragment()` to push back the fragments that comprise the detached state into the reader. This way, if a new compaction object is created it can just consume the reader and continue where the previous compaction left off.	2018-09-03 10:31:44 +03:00
Botond Dénes	48054ed810	flat_mutation_reader: add unpop_mutation_fragment() This is the inverse of `pop_mutation_fragment()`. Allow fragments to be pushed back into the buffer of the reader to undo a previous consumption of the fragments.	2018-09-03 10:31:44 +03:00
Botond Dénes	3bcd577907	Move reconcilable_result_builder declaration to mutation_query.hh It will be used by code outside of mutation_partition.cc so it needs to be public. The definition remains in mutation_partition.cc.	2018-09-03 10:31:44 +03:00
Botond Dénes	b8b34223a4	mutation_source_test: add an additional REQUIRE() test_streamed_mutation_forwarding_is_consistent_with_slicing already has a REQUIRE() for the mutation read with the slicing reader. Add another one for the forwarding reader. This makes it more consistent and also helps finding problems with either the forwarding or slicing reader.	2018-09-03 10:31:44 +03:00
Botond Dénes	d347866664	mutation: add missing assert to mutation from reader read_mutation_from_flat_mutation_reader's internal adapter can build a single mutation only and hence can consume only a single partition. If more than one partition is pushed down from the producer the adapter will very likely crash. To avoid unnecessary investigations add an assert() to fail early and make it clear what the real problem is. All other consume_ methods have an assert() already for their invariants so this is just following suit.	2018-09-03 10:31:44 +03:00
Botond Dénes	ecb1e79bcc	querier: add shard_mutation_querier The querier to be used for saving shard readers belonging to a multishard range scan. This querier doesn't provide a `consume_page` method as it doesn't support reading from it directly. It is more of a storage to allow caching the reader and any objects it depends on.	2018-09-03 10:31:44 +03:00
Botond Dénes	07cdf766c5	querier: prepare for multi-ranges In the next patch a querier will be added that reads multiple ranges as opposed to a single range that data and mutation queriers read. To keep `querier_cache` code seamless regarding this difference change all range-matching logic to work in terms of `dht::partition_ranges_view`. This allows for cheap and seamless way of having a single code-base for the insert/lookup logic. Code actually matching ranges is updated to be able to handle both singular and multi-ranges while maintaining backward compatibility.	2018-09-03 10:31:44 +03:00
Botond Dénes	88a7effd8d	tests/querier_cache: add tests specific for multiple entry-types	2018-09-03 10:31:44 +03:00
Botond Dénes	c12008b8cb	querier: split querier into separate data and mutation querier types Instead of hiding what compaction method the querier uses (and only exposing it via rejecting `can_be_used_for_page()`) make it very explicit that these are really two different queriers. This allows using different indexes for the two queriers in `querier_cache` and eliminating the possibility of picking up a querier with the wrong compaction method (read kind). This also makes it possible to add new querier type(s) that suit the multishard-query's needs without making a confusing mess of `querier` by making it a union of all querying logic. Splitting the queriers this way changes what happens when a lookup finds a querier of the wrong kind (e.g. emit_only_live::yes for an emit_only_live::no command). As opposed to dropping the found (but wrong) querier the querier will now simply not be found by the lookup. This is a result of using separate search indexes for the different mutation kinds. This change should have no practical implications. Splitting is done by making querier templated on `emit_only_live_rows`. It doesn't make sense to duplicate the entire querier as the two share 99% of the code.	2018-09-03 10:31:44 +03:00
Botond Dénes	e46251ebf6	querier: move consume_page logic into a free function In preparation of the now single querier being split into multiple more specialized ones. Make it possible for the multiple queriers sharing the same implementation. Also, the code can now be reused by outside code as well, not just queriers.	2018-09-03 10:31:44 +03:00
Botond Dénes	c53f17ddb8	querier: move all matching related logic into free functions So that they can be used for multiple querier classes easily, without inheritance. The functions are not visible from the header. Also update the comments on `querier` w.r.t. the removed checking functions. Change the language to be more general. In practice these checks are never done by client code, instead they are done by the `querier_cache`.	2018-09-03 10:31:44 +03:00
Botond Dénes	43f464c52d	querier: inline querier::current_position() and make it public	2018-09-03 10:31:44 +03:00
Botond Dénes	86a61ded7d	querier: s/position/position_view/ Also treat it as a view, that is take it by value in functions, instead of reference.	2018-09-03 10:31:44 +03:00
Botond Dénes	6e4ec53679	querier: move position outside of querier In preparation for having multiple querier types that can share code without inheritance.	2018-09-03 10:31:44 +03:00
Botond Dénes	a172dfec4e	querier: move clustering_position_tracker outside of querier In preparation for having multiple querier types that can share code without inheritance.	2018-09-03 10:31:44 +03:00
Botond Dénes	7bd955e993	querier_cache: move insert/lookup related logic into free functions In preparation for introducing support for multiple entry types in the querier_cache move all insert/lookup related logic into free functions. Later these functions will be templated so they can handle multiple entry types with the same code.	2018-09-03 10:31:44 +03:00
Botond Dénes	cded477b94	querier: return std::optional<querier> instead of using create_fun() Requiring the caller of lookup() to pass in a `create_fun()` was not such a good idea in hindsight. It leads to awkward call sites and even more awkward code when trying to find out whether the lookup was successful or not. Returning an optional gives calling code much more flexibility and makes the code cleaner.	2018-09-03 10:31:44 +03:00
Botond Dénes	5f726e9a89	querier: move all to query namespace To avoid name clashes.	2018-09-03 10:31:44 +03:00
Botond Dénes	867f69b9d1	dht::i_partitioner: add partition_ranges_view	2018-09-03 10:31:44 +03:00
Botond Dénes	a011a9ebf2	mutation_reader: multishard_combining_reader: support custom dismantler Add a dismantler functor parameter. When the multishard reader is destroyed this functor will be called for each shard reader, passing a future to a `stopped_foreign_reader`. This future becomes available when the shard reader is stopped, that is, when it finished all in-progress read-aheads and/or pending next partition calls. The intended use case for the dismantler functor is a client that needs to be notified when readers are destroyed and/or has to have access to any unconsumed fragments from the foreign readers wrapping the shard readers.	2018-09-03 10:31:44 +03:00
Botond Dénes	f13b878a94	mutation_reader: pass all standard reader params to `remote_reader_factory` Extend `remote_reader_factory` interface so that it accepts all standard mutation reader creation parameters. This allows factory lambdas to be truly stateless, not having to capture any standard parameters that are needed for creating the reader. Standard parameters are those accepted by `mutation_source::make_reader()`.	2018-09-03 10:31:44 +03:00
Botond Dénes	e67c6d9f39	flat_mutation_reader::impl: add protected buffer() member To allow implementations to access the buffer in a read-only way.	2018-09-03 10:31:44 +03:00
Botond Dénes	8915293257	multishard_combining_reader: fix incorrect comment	2018-09-03 10:31:44 +03:00
Botond Dénes	75d60b0627	docs: add paged-queries.md design doc	2018-09-03 10:31:44 +03:00
Duarte Nunes	6593226849	Merge branch 'loading_cache: fix a consistency of size() and iterators APIs' from Vlad " After we fixed reloading flow it enabled situations when items are no longer cached but still held in the underlying loading_shared_values object. Since loading_cache::size() returns the size of its loading_shared_values object and loading_cache::begin()/end()/find() are returning iterators based on loading_shared_values iterators these APIs may return very weird values, e.g. size() may return the same value after one of the items has been removed using remove(key) API. This series fixes this by switching the above-mentioned APIs to work on top of lru_list object instead of loading_shared_values. " * 'loading_cache_fix_api_semantics-v1' of https://github.com/vladzcloudius/scylla: loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values' loading_cache: make size() return the size of lru_list instead of loading_shared_values	2018-09-01 11:05:28 +01:00
Avi Kivity	fd8eae50db	build: add relocatable package target A relocatable package contains the Scylla (and iotune) executables (in a bin/ directory), any libraries they may need (lib/), the configuration file defaults (conf/) and supporting scripts (dist/). The libraries are picked up from the host, including libc and the dynamic linker (ld.so). We also provide a thunk script that forces the library path (LD_LIBRARY_PATH) to point at our libraries, and overrides the interpreter to point at our ld.so. With these files, it is possible to run a fully functional Scylla instance on any Linux distribution. This is similar to chroot or containers, except that we run in the same namespace as the host. The packages are created by running ninja build/release/scylla-package.tar or ninja --mode debug build/debug/scylla-package.tar Message-Id: <20180828065352.30730-1-avi@scylladb.com>	2018-08-31 23:14:42 +01:00
Vlad Zolotarov	945d26e4ee	loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values' Reloading may hold value in the underlying loading_shared_values while the corresponding cache values have already been deleted. This may create weird situations like this: <populate cache with 10 entries> cache.remove(key1); for (auto& e : cache) { std::cout << e << std::endl; } <all 10 entries are printed, including the one for "key1"> In order to avoid such situations we are going to make the loading_cache::iterator a transform_iterator of lru_list::iterator instead of loading_shared_values::iterator because lru_list contains entries only for cached items. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 20:56:44 -04:00
Vlad Zolotarov	1e56c7dd58	loading_cache: make size() return the size of lru_list instead of loading_shared_values reloading flow may hold the items in the underlying loading_shared_values after they have been removed (e.g. via remove(key) API) thereby loading_shared_values.size() doesn't represent the correct value for the loading_cache. lru_list.size() on the other hand - does. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 15:55:30 -04:00
Paweł Dziepak	dbbd664600	Update seastar submodule * seastar 12f18ce...5712816 (6): > tests: add signal_test to test list > Merge "Enhancements for memory_output_stream" from Paweł > seastar-addr2line: don't print an empty line between backtrace lines > seastar-addr2line: add --verbose option > seastar-addr2line: make prefix matching non-greedy > future: make available() const	2018-08-30 11:41:27 +01:00
Glauber Costa	8dea1b3c61	database: fix directory for information when loading new SSTables from upload dir When we load new SSTables, we use the directory information from the entry descriptor to build information about those SSTables. When the descriptor is created by flush_upload_dir, the sstable directory used in the descriptor contains the `upload` part. Therefore, we will try to load SSTables that are in the upload directory when we already moved them out and fail. Since the generation also changes, we have been historically fixing the generation manually, but not the SSTable directory. The reason for that is that up until recently, the SSTable directory was passed statically to open_sstables, ignoring whatever the entry descriptor said. Now that the sstable directory is also derived from the entry descriptor, we should fix that too. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180829165326.12183-1-glauber@scylladb.com>	2018-08-30 10:34:25 +03:00
Nadav Har'El	2f02d006b3	materialized views: more tests Additional tests for cases surrounding issue #3362, where base rows disappear (or not) and view rows need to disappear (or not) as well. These new tests focus on checking that view_updates::do_delete_old_entry() is correct. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180829131914.16042-2-nyh@scylladb.com>	2018-08-29 14:33:48 +01:00
Nadav Har'El	16a6f76873	materialized views: simplify do_delete_old_entry() In previous patches, we gave up on an old (and broken) attempt to track the timestamps of many unselected base-table columns through one row marker in the view table - and replaced them by "virtual cells", one per unselected cell. The do_delete_old_entry() function still contains old code which maintained that row marker, and is no longer needed. That old code is not only no longer needed, it also no longer did anything because all columns now appear in the view (as virtual columns) so the code ignored them when calculating the row marker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180829131914.16042-1-nyh@scylladb.com>	2018-08-29 14:33:41 +01:00
Duarte Nunes	79d796e710	Merge 'Materialized Views: row liveness correction' from Nadav " When a view's partition key contains only columns from the base's partition key (and not an additional one), the liveness - existence or disappearance - of a view-table row is tied to the liveness of the base table row. And that, in turn, depends not only on selected columns (base-table columns SELECTed to also appear in the view) but also on unselected columns. This means that we may need to keep a view row alive even without data, just because some unselected column is alive in the base table. Before this patch set we tried to build a single "row marker" in the view column which tried to summarize the liveness information in all unselected columns. But this proved unworkable, as explained in issue #3362 and as will be demonstrated in unit tests at the end of this series. Because we can't replace several unselected cells by one row marker, what we do in this series is to add for each of the unselected cells a "virtual cell" which contains the cell's liveness information (timestamp, deletion, ttl) but not its value. For collections, we can't represent the entire collection by one virtual cell, and rather need a collection of virtual cells. Fixes #3362 " * 'virtual-cols-v3' of https://github.com/nyh/scylla: Materialized Views: test that virtual columns are not visible Materialized Views: unit test reproducing fixed issue #3362 Materialized Views: no need for elaborate row marker calculations Materialized Views: add unselected columns as virtual columns Materialized Views: fill virtual columns Do not allow selecting a virtual column schema: persist "view virtual" columns to a separate system table schema: add "view virtual" flag to schema's column_definition Add "empty" type name to CQL parser, but only for internal parsing	2018-08-29 14:32:38 +01:00
Paweł Dziepak	6f1c3e6945	Merge "Convert more execution_stages to inherit scheduling_groups" from Avi " Previous work (`71471bb322`) converted the CQL layer to inheriting execution stages, paving the way to multiple users sharing the front-end. This patchset does the same thing to the back-end, converting more execution stages to preserve the caller's scheduling_group. Since RPC now (`8c993e0728`) assigns the correct scheduling group within the replica, we can extend that work so a statement is executed with the same scheduling group all the way to sstable parsing, even if we cross nodes in the process. This improves performance isolation and paves the way to multi-user SLA guarantees. " * tag 'inherit-sched_group/v1' of https://github.com/avikivity/scylla: database: make database's mutation apply stage inherit its scheduling group from the caller database: make database::_mutation_query_stage inherit the scheduling group database: make database::_data_query_stage inheriting its caller's scheduling_group storage_proxy: make _mutate_stage inherit its caller's scheduling_group	2018-08-28 13:49:31 +01:00
Duarte Nunes	f6aadd8077	Merge 'utils::loading_cache: improve reload() robustness' from Vlad "This series introduces a few improvements related to a reload flow. From now on the callback may assume that the "key" parameter value is kept alive till the end of its execution in the reloading flow. It may also safely evict as many items from the cache as needed." Fixes #3606 * 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla: utils::loading_cache: hold a shared_value_ptr to the value when we reload utils::loading_cache::on_timer(): remove not needed capture of "this" utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload	2018-08-28 10:52:20 +01:00
Piotr Sarna	aa2bfc0a71	tests: add multi-column pk test to INSERT JSON case Refs #3687 Message-Id: <6ba1328549ed701691ca7cbdacc7d6fa72f2c3de.1534171422.git.sarna@scylladb.com>	2018-08-28 11:34:13 +03:00
Piotr Sarna	fa72422baa	cql3: fix handling multi-column partition key in INSERT JSON Multiple column partition keys were previously handled incorrectly, now the implementation is based on from_exploded instead of from_singular. Fixes #3687 Message-Id: <09e0bdb0f1c18d49b9e67c21777d93ba1545a13c.1534171422.git.sarna@scylladb.com>	2018-08-28 11:34:11 +03:00
Avi Kivity	1fd9974b6b	Merge "tests/loading_cache_test: Fix flakiness" from Duarte " Fix loading_cache_test flakiness by retrying assertions. Tests: unit(loading_cache_test(debug, release)) Fixes #3723 " * 'loading-cache-test-flake/v4' of https://github.com/duarten/scylla: tests/loading_cache_test: Unflake test_loading_cache_loading_reloading tests/loading_cache_test: Use eventually() instead of open-coding it tests/mutation_reader_test: Extract eventually_true() to eventually.hh tests/cql_test_env: Lift eventually() to its own header file	2018-08-28 09:35:09 +03:00
Takuya ASADA	4a5157857a	dist/debian: support package renaming on build script To automatically rename packages on enterprise release, added package name prefix as a variable on build_deb.sh. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180828010445.11920-1-syuu@scylladb.com>	2018-08-28 09:25:07 +03:00
Avi Kivity	22396d57c2	Update seastar submodule * seastar 9bb1611...12f18ce (17): > correctly configure I/O Scheduler for usage with the YAML file > Added support for user-defined signal handlers > Added reactor method to modify blocked_reactor_notify_ms > configure.py: Use the user-specified compiler for dialect detection > seastar-addr2line: clear current trace when omitting already seen trace > seastar-addr2line: fix redirecting output to a file > seastar-addr2line: don't require a space before the addresses > tests: Ensure test thread is always joined > README.md: Add cute badges > iotune: adjust num-io-queues recommendation > dns: add SRV record lookup > reactor: define max_aio_per_queue for C++14 > reactor,alien: silence GCC warnings > core,json,net: silence GCC warnings > fstream: "using data_sink_impl::put" to silence gcc warning > Merge 'Ensure Seastar compiles in C++14 mode' from Jesse > Revert "foreign_ptr: allow waiting for the destruction of the managed ptr"	2018-08-28 09:10:14 +03:00
Tomasz Grabiec	75cde85349	Merge "Support reading range tombstones" from Piotr and Vladimir Implement and test support for reading range tombstones in SSTables 3. Does not yet support reads which are using slicing or fast forwarding. From github.com/scylladb/seastar-dev.git haaawk/sstables3/tombstones_v11: Piotr Jastrzebski (5): sstables: Add consumer_m::consume_range_tombstone sstables: Support null columns in ck sstables: Support reading range_tombstones sstables: Test reading range_tombstones sstables: Add test for RT with non-full key Vladimir Krivopalov (2): sstables: Add operator<< overload for bound_kind_m. keys: Add clustering_key_prefix::make_full helper.	2018-08-27 20:43:38 +02:00
Duarte Nunes	40044c0460	tests/loading_cache_test: Unflake test_loading_cache_loading_reloading The `loading_cache_test::test_loading_cache_loading_reloading` test case is flaky, and fails in both debug and release mode. In an over-provisioned environment, it's possible that when the reactor runs, the timers for the `sleep()` and for reloading the `loading_cache` are both expired, and continuations are scheduled with an arbitrary order, causing the test to fail. Fixes #3723 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-27 19:24:05 +01:00
Duarte Nunes	0cb03b966d	tests/loading_cache_test: Use eventually() instead of open-coding it Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-27 19:24:05 +01:00
Duarte Nunes	b89fa0d67b	tests/mutation_reader_test: Extract eventually_true() to eventually.hh Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-27 19:24:05 +01:00
Duarte Nunes	636c5ded6c	tests/cql_test_env: Lift eventually() to its own header file Retrying is needed everywhere. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-27 19:24:00 +01:00
Avi Kivity	5792a59c96	migration_manager: downgrade frightening "Can't send migration request" ERROR This error is transient, since as soon as the node is up we will be able to send the migration request. Downgrade it to a warning to reduce anxiety among people who actually read the logs (like QA). The message is also badly worded as no one can guess what a migration request is, but that is left to another patch. Fixes #3706. Message-Id: <20180821070200.18691-1-avi@scylladb.com>	2018-08-27 14:49:36 +02:00
Takuya ASADA	10b67c7934	dist/ami: package scylla-ami as rpm Now that scylla-ami is no longer a submodule of the scylla repo, it works as an independent repository just like scylla-jmx and scylla-tools, and provides an .rpm package to install AMI scripts on the AMI. Most files are gone from dist/ami/files, but scylla_install_ami is copied from scylla-ami: since it needs to install the scylla .rpms, it cannot be packaged in the scylla-ami rpm. In scylla_install_ami, we dropped the ixgbevf/ena driver code; we will provide 'scylla-ixgbevf' and 'scylla-ena' DKMS .rpms instead, which automatically build kernel modules for the current kernel. A repo of the driver packages is at https://copr.fedorainfracloud.org/coprs/scylladb/scylla-ami-drivers/ Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180821201101.4631-1-syuu@scylladb.com>	2018-08-27 11:48:52 +03:00
Avi Kivity	62750eb517	Merge "Prepare for removing Iterator from simple_memory_input_stream" from Paweł " Right now, simple_memory_input_stream takes Iterator as a template parameter. That iterator is supposed to point to fragments in an underlying fragmented buffer. This makes no sense, since simple streams deal only with contiguous buffers. This series removes any assumption that simple_memory_input_stream has iterator_type member from Scylla so that it can be removed. " * tag 'prepare-simple-stream-no-iterator/v1' of https://github.com/pdziepak/scylla: idl: deserialized_bytes_proxy do not assume presence of iterator_type idl-compiler: specify return type of with_serialized_stream() lambdas	2018-08-26 16:29:06 +03:00
Avi Kivity	16478355be	Merge "Refactor password handling" from Jesse " This series is a refactor of password management, motivated by a combination of correctness bugs, improving testability, improving clarity, and adding documentation. Tests: unit (release) " * 'jhk/passwords_refactor/v2' of https://github.com/hakuch/scylla: auth: Clean up implementation comments auth: Remove unnecessary local variable auth: Allow different random engines for salt auth: Correct modulo bias in salt generation auth: Extract random byte generation for salt auth: Split out test for best supported scheme auth: Rename function to use full words auth: Add domain-specific exception for passwords auth: Document passwords interface auth: Move password stuff to its own namespace auth: Identify password hashing errors correctly auth: Add unit tests for password handling auth: Move password handling to its own files auth: Construct `std::random_device` instances once	2018-08-26 11:18:31 +03:00
Tomasz Grabiec	2afce13967	database: Avoid OOM when soft pressure but nothing to flush There could be soft pressure, but soft-pressure flusher may not be able to make progress (Refs #3716). It will keep trying to flush empty memtables, which block on earlier flushes to complete, and thus allocate continuations in memory. Those continuations accumulate in memory and can cause OOM. flush will take longer to complete. Due to scheduling group isolation, the soft-pressure flusher will keep getting the CPU. This causes bad_alloc and crashes of dtest: limits_test.py:TestLimits.max_cells_test Fixes #3717 Message-Id: <1535102520-23039-1-git-send-email-tgrabiec@scylladb.com>	2018-08-26 11:03:58 +03:00
Tomasz Grabiec	1e50f85288	database: Make soft-pressure memtable flusher not consider already flushed memtables The flusher picks the memtable list which contains the largest region according to region_impl::evictable_occupancy().total_space(), which follows region::occupancy().total_space(). But only the latest memtable in the list can start flushing. It can happen that the memtable corresponding to the largest region was already flushed to an sstable (flush permit released), but not yet fsynced or moved to cache, so it's still in the memtable list. The latest memtable in the winning list may be small, or empty, in which case the soft pressure flusher will not be able to make much progress. There could be other memtable lists with non-empty (flushable) latest memtables. This can lead to writes unnecessarily blocking on dirty. I observed this for the system memtable group, where it's easy for the memtables to overshoot small soft pressure limits. The flusher kept trying to flush empty memtables, while the previous non-empty memtable was still in the group. The CPU scheduler makes this worse, because it runs memtable_to_cache in a separate scheduling group, so it further defers in time the removal of the flushed memtable from the memtable list. This patch fixes the problem by making regions corresponding to memtables which started flushing report evictable_occupancy() as 0, so that they're picked by the flusher last. Fixes #3716. Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>	2018-08-26 11:02:34 +03:00
Tomasz Grabiec	364418b5c5	logalloc: Make evictable_occupancy() indicate no free space Doesn't fix any bug, but it's closer to the truth that all segments are used rather than none is used. Message-Id: <1535040132-11153-1-git-send-email-tgrabiec@scylladb.com>	2018-08-26 11:02:32 +03:00
Avi Kivity	54ac334f4b	Update scylla-ami submodule * dist/ami/files/scylla-ami c7e5a70...b7db861 (2): > scylla-ami-setup.service: run only on first startup > Use fstab to mount RAID volume on every reboot	2018-08-26 10:57:32 +03:00
Takuya ASADA	ff55e3c247	dist/common/scripts/scylla_raid_setup: refuse to start scylla-server.service when RAID volume is not mounted Since a Linux system aborts booting when it fails to mount fstab entries, the user may not be able to see an error message when we use fstab to mount /var/lib/scylla on the AMI. Instead of aborting the boot, we can just refuse to start scylla-server.service when the RAID volume is not mounted, using the RequiresMountsFor directive of the systemd unit file. See #3640 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180824185511.17557-1-syuu@scylladb.com>	2018-08-26 10:55:34 +03:00
Avi Kivity	37f9a3c566	database: make database's mutation apply stage inherit its scheduling group from the caller Like the two preceding patches, convert the mutation apply stage to an inheriting_concrete_scheduling_group. This change has two added benefits: we get rid of a thread_local, and we drop a with_scheduling_group() inside an execution stage which just creates a bunch of continuations and somewhat undoes the benefit of the execution stage.	2018-08-24 19:04:49 +03:00
Avi Kivity	ebff1cfc37	database: make database::_mutation_query_stage inherit the scheduling group Like the preceding patch and for the same reasons, adjust database::_mutation_query_stage to inherit the scheduling group from its caller.	2018-08-24 19:04:49 +03:00
Avi Kivity	596fb6f2f7	database: make database::_data_query_stage inherit its caller's scheduling_group Now (`8c993e0728`) that replica-side operations run under the correct scheduling group, we can inherit the scheduling_group for _data_query_stage from the caller. By itself this doesn't do much, but it will later allow us to have multiple groups for statement executions.	2018-08-24 19:04:49 +03:00
Avi Kivity	908e497f3d	storage_proxy: make _mutate_stage inherit its caller's scheduling_group Right now, storage_proxy's mutate_stage violates isolation by running in a plain execution_stage without a scheduling_group. This means do_mutate() will run under the main scheduling_group, at least until we reach the database apply execution stage, which is correct. Fix by moving to an inheriting execution stage; this works because the messaging service will tell RPC to set the correct execution stage for us. We could explicitly specify statement_scheduling_group, but inheriting the scheduling group allows us to have multiple statement scheduling groups, later.	2018-08-24 19:04:49 +03:00
Paweł Dziepak	4ca991ea65	idl: deserialized_bytes_proxy do not assume presence of iterator_type deserialized_bytes_proxy assumes that the provided input stream has iterator_type that represents the iterator pointing to the next fragment of the fragmented underlying buffer. This makes little sense if the input stream is a contiguous one (i.e. simple_memory_input_stream) so let's not make such assumptions.	2018-08-24 16:19:40 +01:00
Paweł Dziepak	3b7579aa0e	idl-compiler: specify return type of with_serialized_stream() lambdas IDL-generated code uses with_serialized_stream() to optimise for cases when the underlying buffer is not fragmented. The provided lambda will be called with either a simple or a fragmented stream as an argument. The consequence of this is that both instantiations of the generic lambda need to return the same type. This is a problem if the type is deduced and depends on the provided input stream (e.g. different type for fragmented and simple streams). The solution is to explicitly specify the return type as the type returned by deserialising general utils::input_stream. This way each instantiation of the lambda can return whatever it wants as long as it is convertible to the type that the serialiser would return if utils::input_stream was given.	2018-08-24 16:07:20 +01:00
Tomasz Grabiec	10f6b125c8	database: Run system table flushes in the main scheduling group memtable flushes for system and regular region groups run under the memtable_scheduling_group, but the controller adjusts shares based on the occupancy of the regular region group. It can happen that regular is not under pressure, but system is. In this case the controller will incorrectly assign low shares to the memtable flush of system. This may result in high latency and low throughput for writes in the system group. I observed writes to the system keyspace timing out (on scylla-2.3-rc2) in the dtest: limits_test.py:TestLimits.max_cells_test, which went away after this. Fixes #3717. Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>	2018-08-23 15:07:05 +03:00
Piotr Sarna	94262cf5d0	tests: add null collection test scenario to INSERT JSON Refs #3664 Message-Id: <a34b9f5e8b9d7e3dd8906b559957220d74734b41.1534848313.git.sarna@scylladb.com>	2018-08-23 11:22:07 +03:00
Piotr Sarna	465045368f	cql3: add proper setting of empty collections in INSERT JSON Previously empty collections were incorrectly added as dead cells, which resulted in serialization errors later. Fixes #3664 Message-Id: <a9c90d66c6737641cafe40edb779df490ada0309.1534848313.git.sarna@scylladb.com>	2018-08-23 11:22:05 +03:00
Duarte Nunes	36a293bb23	cell_locking: Use xxhash instead of fnv1a Being the single user of fnv1a, this allows us to get rid of it. As the TODO inside fnv1a_hasher.hh indicates, and judging by any independent benchmark, fnv1a is very slow. As we have added xx_hash since then, and we know it to be fast, use it instead. Tests: unit(release/cell_locker_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180823081715.26089-1-duarte@scylladb.com>	2018-08-23 11:21:00 +03:00
Piotr Jastrzebski	2997fda1b1	sstables: Add test for RT with non-full key Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-08-22 18:28:11 +02:00
Piotr Jastrzebski	c50929233f	sstables: Test reading range_tombstones Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-08-22 18:28:11 +02:00
Piotr Jastrzebski	7434be348c	sstables: Support reading range_tombstones Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-08-22 18:27:41 +02:00
Piotr Jastrzebski	d19a108d87	sstables: Support null columns in ck Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-08-22 14:32:10 +02:00
Piotr Jastrzebski	3636697663	sstables: Add consumer_m::consume_range_tombstone Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-08-22 12:53:15 +02:00
Vladimir Krivopalov	8acf4ddb8e	keys: Add clustering_key_prefix::make_full helper. This method fills non-full clustering key with trailing empty values to make it full. This can be used for clustering keys of rows in a compact table as, unlike in regular tables, they can be non-full. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-22 12:13:23 +02:00
Amnon Heiman	ab207356a5	API: storage_service stream endpoints This patch changes how the list of tokens is returned from the storage_service API. Instead of creating a vector and constructing a json object from it, use the streaming capabilities of the http server. This is important for large clusters and prevents large allocations. Fixes #3701 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20180820195631.26792-1-amnon@scylladb.com>	2018-08-22 11:24:38 +03:00
Takuya ASADA	e4f38b7c22	dist/redhat: support package renaming on build script To automatically rename packages on enterprise release, added package name prefix as a variable on build_rpm.sh. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180822072105.9420-1-syuu@scylladb.com>	2018-08-22 11:03:39 +03:00
Piotr Sarna	4a274ee7e2	tests: add parsing varint from JSON string test Refs #3666 Message-Id: <f4205e9484f5385796fade7986e3e38dcbc65bac.1534845398.git.sarna@scylladb.com>	2018-08-21 11:20:11 +01:00
Piotr Sarna	37a5c38471	types: enable deserializing varint from JSON string Previously deserialization failed because the JSON string representing a number was unnecessarily quoted. Fixes #3666 Message-Id: <a0a100dbac7c151d627522174303657d1da05c27.1534845398.git.sarna@scylladb.com>	2018-08-21 11:20:11 +01:00
Tomasz Grabiec	6937cc2d1c	Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte Fixes a regression introduced in `9e88b60ef5`, which broke the lookup for prefetched values of lists when a clustering key is specified. This is the code that was removed from some list operations: std::experimental::optional<clustering_key> row_key; if (!column.is_static()) { row_key = clustering_key::from_clustering_prefix(params._schema, prefix); } ... auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column); Put it back, in the form of common code in the update_parameters class. Fixes #3703 https://github.com/duarten/scylla cql-list-fixes/v1: tests/cql_query_test: Test multi-cell static list updates with ckeys cql3/lists: Fix multi-cell static list updates in the presence of ckeys keys: Add factory for an empty clustering_key_prefix_view	2018-08-21 12:14:30 +02:00
Vladimir Krivopalov	c8422c9a91	sstables: Add operator<< overload for bound_kind_m. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-20 16:22:53 -07:00
Duarte Nunes	ff7304b190	tests/cql_query_test: Test multi-cell static list updates with ckeys Refs #3703 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-20 21:39:37 +01:00
Duarte Nunes	05731cb5ad	cql3/lists: Fix multi-cell static list updates in the presence of ckeys This patch fixes a regression introduced in `9e88b60ef5`, which broke the lookup for prefetched values of lists when a clustering key is specified. This is the code that was removed from some list operations: std::experimental::optional<clustering_key> row_key; if (!column.is_static()) { row_key = clustering_key::from_clustering_prefix(*params._schema, prefix); } ... auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column); Put it back, in the form of common code in the update_parameters class. Fixes #3703 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-20 21:39:37 +01:00
Duarte Nunes	ce461b06d7	keys: Add factory for an empty clustering_key_prefix_view Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-20 21:39:37 +01:00
Avi Kivity	231174cda9	build: auto-detect g++ -gz support Older combinations of g++/binutils don't support -gz, so auto-detect its presence. Fixes #3697. Message-Id: <20180817161113.2287-1-avi@scylladb.com>	2018-08-20 18:48:18 +02:00
Tomasz Grabiec	c31dff8211	Merge 'Skip inside wide partitions using index (rows only)' from Vladimir This patchset adds support for skipping inside wide partitions using index for sliced queries. This can significantly reduce disk I/O for queries that only need to read a small amount of data from a wide partition. Other changes include general code clean-up and simplification. * github.com/argenet/scylla.git tree/projects/sstables-30/skip_using_index/v6: sstables: Support resetting data_consume_rows_context_m to indexable_element::cell. tests: Add tests to cover skipping with index through SSTables 3.x. sstables: Support skipping inside wide partitions using index. to_string: Add operator<< overload for std::optional. sstables: Use std::optional instead of std::experimental::optional.	2018-08-20 18:39:51 +02:00
Avi Kivity	e605cd4ff8	multishard_writer_test: reduce mutation count in release mode We see occasional bad_alloc failures in release mode; this is due to the random mutation generator generating large mutations. Reduce the mutation count to 300. I tested 100 runs and all passed, so it reduces the false positive rate to < 1%.	2018-08-20 16:53:05 +03:00
Gleb Natapov	7277ee2939	storage_proxy: do not fail read without speculation on connection error After `ac27d1c93b`, if a read executor has just enough targets to achieve the request's CL and a connection to one of them is dropped during execution, a ReadFailed error is returned immediately and the client does not have a chance to issue a speculative read (retry). The patch changes the code to not return the ReadFailed error immediately, but to wait for the timeout instead, giving the client a chance to issue a speculative read in case the read executor does not have additional targets to send speculative reads to by itself. Fixes #3699. Message-Id: <20180819131646.GK2326@scylladb.com>	2018-08-20 10:12:31 +03:00
Vladimir Krivopalov	f1b9f82ff5	sstables: Use std::optional instead of std::experimental::optional. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 18:20:05 -07:00
Vladimir Krivopalov	7b1d4915a1	to_string: Add operator<< overload for std::optional. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 18:20:05 -07:00
Vladimir Krivopalov	3e92434eed	sstables: Support skipping inside wide partitions using index. This fix adds proper support for skipping inside wide partitions using index for sliced reads. This significantly reduces disk I/O for filtered queries. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 18:20:04 -07:00
Vladimir Krivopalov	ec78fb9f13	tests: Add tests to cover skipping with index through SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 18:19:22 -07:00
Vladimir Krivopalov	4bf1e9de3f	sstables: Support resetting data_consume_rows_context_m to indexable_element::cell. Set the proper parsing state when resetting to indexable_element::cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 10:09:19 -07:00
Eliran Sinvani	f5f6cf2096	cql3: remove rejection of an IN relation if not on last partition KEY The constraint is no longer relevant, since Cassandra removed it in version 2.2. In addition the mechanism for handling this case is already implemented and is identical in case of clustering keys with single column EQ,= and IN relations. (Cartesian product of singular ranges). A unit test for this test case was added. Fixes #1735 Tests: 1. Unit Tests. 2. Manual testing with the case described in the issue. 3. dtest: ql_additional_tests.py:TestCQL.composite_row_key_test Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <83b43fdc1ca0e0cc287f66f11816fc71b8bd2925.1534430405.git.eliransin@scylladb.com>	2018-08-16 19:32:43 +01:00
Eliran Sinvani	d743ceae76	cql3: ignore LIMIT in select statement with aggregate LIMIT should restrict the output result and not the query whose result set is aggregated. When using an aggregate, the output is guaranteed to be only one row long. Since LIMIT accepts only non-negative numbers, it has no effect and can be ignored. Fixes #2028 Tests: the test case described in the issue, unit tests. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <6c235376c81f052020e2ed23d0a3d071b36d4415.1534416997.git.eliransin@scylladb.com>	2018-08-16 19:31:56 +01:00
Nadav Har'El	8c604921ac	Materialized Views: test that virtual columns are not visible In the previous patches, we added "virtual columns" to materialized views to solve row liveness issues (issue #3362). Here we add a test that confirms that although these virtual columns exist in the view, they should not be visible to the user. They cannot be explicitly SELECTed from the view table, and a "SELECT *" will skip them. Refs #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:51:46 +03:00
Nadav Har'El	5ca974547a	Materialized Views: unit test reproducing fixed issue #3362 This patch includes several tests reproducing issue #3362 - the effect of unselected columns on view-table row liveness - and confirming that it was fixed. We found two example scenarios to demonstrate the bug. One scenario, test_3362_with_ttls(), involves an unselected column with a TTL. The other, test_3362_no_ttls() demonstrates the same bug without using TTL, and using explicit updates and deletions instead. These two tests are heavily commented, to explain what they test, and why. In addition to these two basic tests, we also include similar tests involving multiple items in a collection column, instead of multiple separate columns, which demonstrate the same problem exists there (and why, unfortunately, the "virtual columns" we add in that case need to be collections too). We also test that the virtual columns - and the problems they fix - work not only on columns originally created with the view, but also with unselected columns added later with ALTER TABLE on the base table. Refs #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:48:07 +03:00
Nadav Har'El	6c00341383	Materialized Views: no need for elaborate row marker calculations Now that we have separate virtual cells to represent unselected columns in a materialized view, we no longer need the elaborate row-marker liveness calculations which aimed (but failed) to do the same thing. So that code can be removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:45:41 +03:00
Nadav Har'El	30f721afab	Materialized Views: add unselected columns as virtual columns When a view's partition key contains only columns from the base's partition key (and not an additional one), the liveness (existence or disappearance) of a view-table row is tied to the liveness of the base table row - and that depends not only on selected columns (base-table columns SELECTed to also appear in the view) but also on unselected columns. This means that we may need to keep a view row alive even without data, just because some unselected column is alive in the base table. Before this patch we tried to build a single "row marker" in the view column which summarizes the liveness information in all unselected columns, but this proved unworkable, as explained in issue #3362 and as will be demonstrated in unit tests in a later patch. Because we can't replace several unselected cells by one row marker, what we do in this patch is to add for each unselected cell a "virtual cell" which contains the cell's liveness information (timestamp, deletion, ttl) but not its value. For collections, we can't represent the entire collection by one virtual cell, and rather need a collection of virtual cells. This patch just adds the virtual columns to the view schema. Code in the previous patch, when it notices the virtual columns in the view's schema, added the appropriate content into these columns. We may need to add virtual columns to a view when first created, but also when an unselected column is added to the base table with "ALTER TABLE", so both are supported in this patch. Fixes #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:42:22 +03:00
Nadav Har'El	782baa44ef	Materialized Views: fill virtual columns The add_cells_to_view() function usually adds selected cells from the base table to the view mutation. For issue #3362, we sometimes want to also add unselected cells as "virtual" cells - truncated versions of the base-table cells just without the values. This patch contains the code to fill the virtual columns' data using the regular columns from the base table. This patch does not yet actually add any virtual columns to the schema, so until that is done (in the next patch), this patch will not yet cause any behavior change. This is important for bisectability. Refs #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:38:27 +03:00
Nadav Har'El	3f3a76aa8f	Do not allow selecting a virtual column For issue #3362, we will need to add to a materialized view also unselected base-table columns as "virtual columns". We need these columns to exist to keep view rows alive, but we don't want the user to be able to see them. In this patch we prevent SELECTing the virtual columns of the view, and also exclude the virtual columns from a "SELECT *" on a view. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:34:22 +03:00
Nadav Har'El	36a657fc10	schema: persist "view virtual" columns to a separate system table In the previous patch, we added a "view virtual" flag on columns. In this patch we add persistence to this flag: I.e., writing it to the on-disk schema table and reading it back on startup. But the implementation is not as simple as adding a flag: In the on-disk system tables, we have a "columns" table listing all the columns in the database and their types. Cqlsh's "DESCRIBE MATERIALIZED VIEW" works by reading this "columns" table, and listing all of the requested view's columns. Therefore, we cannot add "virtual columns" - which are columns not added by the user and not intended to be seen - to this list. We therefore need to create in this patch a separate list for virtual columns, in a new table "view_virtual_columns". This table is essentially identical to the existing "columns" table, just separate. We need to write each column to the appropriate table (columns with the view_virtual flag to "view_virtual_columns", columns without it to the old "columns"), read from both on startup, and remember to delete columns from both when a table is dropped. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:30:06 +03:00
Nadav Har'El	0a1d93138d	schema: add "view virtual" flag to schema's column_definition In this patch we add a flag, "view virtual", that we can mark on a column defined in a schema. In following patches, we will add such virtual columns to materialized views to allow view rows to remain alive despite having no data (refs #3362). After this patch, the "view virtual" flag exists in our in-memory representation of the schema, but is not persisted to disk - we will fix this in the next patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:23:09 +03:00
Nadav Har'El	b4fc711903	Add "empty" type name to CQL parser, but only for internal parsing Even before this patch, Scylla supported the "empty" type (a column with no content) but only internally - i.e., in code but not in CQL syntax. The "empty" type was used in dense tables without regular columns, and a special optimization in db::cql_type_parser::parse() allowed this type name to be parsed when reading the schema tables, without allowing the "empty" type to be used by users in CQL statements. However, parse() only supported "empty" itself, and more complex types like list<empty> were not recognized by parse(). In the following patches, we plan to add virtual columns to materialized views, with types empty, list<empty> or map<something, empty>. We need all these types to work, and before this patch, they don't. But we want all of these types to only work internally - when Scylla's code creates these hidden columns; we do not want to add the "empty" type to CQL's syntax. This is what we do in this patch: The CQL parser's comparator_type rule now has a parameter, "internal", used to differentiate internal calls via db::cql_type_parser::parse() from calls from CQL query parsing. If a user tries something like: CREATE TABLE e (pk empty PRIMARY KEY); He will get the error: Invalid (reserved) user type name empty Note that here, as usual, unknown types are treated as "user types", and "empty" is not allowed as a user type name - we "reserve" it in case one day in the future we will want to allow users a direct syntax to create empty columns. We already have, following Cassandra, a bunch of other names reserved from being user type names, including "byte", "complex", and others (see _reserved_type_names()), and using "empty" as a type name will result in a similar error message.
Just like all other type names, the name "empty" is not a reserved keyword in other senses: a user can create a table or a column with the name "empty", just like he can create one with the name "int". Refs #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:12:27 +03:00
Duarte Nunes	a4355fe7e7	cql3/query_options: Use _value_views in prepare() _value_views is the authoritative data structure for the client-specified values. Indeed, the ctor called transport::request::read_options() leaves _values completely empty. In query_options::prepare() we were, however, using _values to associate values with the client-specified column names, and not _value_views. Fix this by using _value_views instead. As for the reasons we didn't see this bug earlier, I assume it's because very few drivers set the 0x04 query options flag, which means column names are omitted. This is the right thing to do since most drivers have enough information to correctly position the values. Fixes #3688 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814234605.14775-1-duarte@scylladb.com>	2018-08-15 10:38:09 +01:00
Duarte Nunes	8751a58a2b	cql3/query_options: Preserve unset values when building value_views A raw value can be in one of three states: a valid value, an unset value, a null value. When translating raw_values to their views, we were treating both unset and null values as null raw_value_views. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814231051.14385-1-duarte@scylladb.com>	2018-08-15 10:37:29 +01:00
Duarte Nunes	805ce6e019	cql3/query_processor: Validate presence of statement values timeously We need to validate before calling query_options::prepare() whether the set of prepared statement values sent in the query matches the amount of names we need to bind, otherwise we risk an out-of-bounds access if the client also specified names together with the values. Refs #3688 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814225607.14215-1-duarte@scylladb.com>	2018-08-15 10:37:13 +01:00
Eliran Sinvani	d734d316a6	cql3: ensure repeated values in IN clauses don't return repeated rows When the list of values in the IN list of a single column contains duplicates, multiple executors are activated since the assumption is that each value in the IN list corresponds to a different partition. This results in the same row appearing in the result a number of times corresponding to the duplication of the partition value. Added queries for the in restriction unit test and fixed with a bad result check. Fixes #2837 Tests: Queries as in the use case from the GitHub issue in both forms, prepared and plain (using python driver), unit test. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>	2018-08-15 10:21:22 +01:00
Duarte Nunes	a025bf6a7d	Merge seastar upstream Seastar introduced a "compat" namespace, which conflicts with Scylla's own "compat" namespaces. The merge thus includes changes to scope uses of Scylla's "compat" namespaces. * seastar 8ad870f...9bb1611 (5): > util/variant_utils: Ensure variant_cast behaves well with rvalues > util/std-compat: Fix infinite recursion > doc/tutorial: Undo namespace changes > util/variant_utils: Add cast_variant() > Add compatibility with C++17's library types Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-14 13:07:09 +01:00
Duarte Nunes	25a0a0f83d	tests/cql_test_env: Increase eventually() attempts The current value has proved to be insufficient for our CI infrastructure. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814112201.8595-1-duarte@scylladb.com>	2018-08-14 12:37:32 +01:00
Duarte Nunes	495a92c5b6	tests/gossip_test: Use RAII for orderly destruction Change the test so that services are correctly torn down, in the correct order (e.g., storage_service accesses the messaging_service when stopping). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814112111.8521-2-duarte@scylladb.com>	2018-08-14 12:27:14 +01:00
Duarte Nunes	3956a77235	tests/gossip_test: Don't bind address to avoid conflicts Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180814112111.8521-1-duarte@scylladb.com>	2018-08-14 12:27:02 +01:00
Piotr Sarna	310d0a74b9	cql3: throw proper request exception for INSERT JSON JSON code is amended in order to return proper "Missing mandatory PRIMARY KEY part" message instead of generic "Attempt to access value of a disengaged optional object". Fixes #3665 Message-Id: <69157d659d51ce5a2d408614ce3ba7bf8e3a5d88.1534161127.git.sarna@scylladb.com>	2018-08-13 23:57:37 +01:00
Piotr Sarna	b73669c329	tests: add parsing numeric values from string Numeric values (ints, doubles) should accept string representation when passed in INSERT JSON statement. Refs #3666 Message-Id: <586fea8fd08fe01f7a133f82f517e26d08d7cb76.1534153955.git.sarna@scylladb.com>	2018-08-13 23:57:37 +01:00
Piotr Sarna	b3f438bfec	types: enable parsing numeric JSON values from string In order to be Cassandra-compatible, JSON values passed in INSERT JSON statement should accept string parameters for numeric types - int, double, etc. Fixes #3666 Message-Id: <4da9a2f68de31492a2e9432493663a62b138c2f2.1534153955.git.sarna@scylladb.com>	2018-08-13 23:57:37 +01:00
Duarte Nunes	5de02ab98c	tracing: Pass string_view instead of string to add_query This resulted in superfluous copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180812085326.6260-1-duarte@scylladb.com>	2018-08-13 23:57:37 +01:00
Jesse Haber-Kucharsky	b95bbb2e72	auth: Clean up implementation comments	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	9519a03351	auth: Remove unnecessary local variable The variable could be declared `const`, but removing it outright seems more clear and this way we don't have to come up with a name.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	52d3ff057a	auth: Allow different random engines for salt This makes the function useable in more contexts due to flexibility (including in tests), since the state is not captured and the characteristics of salt generation can be customized to the caller's needs.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	836fd954e1	auth: Correct modulo bias in salt generation Instead of reducing the large value via `%`, which can produce non-uniformly distributed values when the range is small, we specify the range in the distribution, which is uniform by construction.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	fe58a0b207	auth: Extract random byte generation for salt	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	fd60d61ebf	auth: Split out test for best supported scheme The `generate_salt` function invokes this function internally now. This change means that `generate_salt` is now thread-safe and therefore does not have to be invoked by a single thread only when starting the `password_authenticator`. This further means that `generate_salt` does not need to be part of the public interface of the module, and can be moved to the implementation file.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	adf058bd1f	auth: Rename function to use full words	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	9b8cbb8542	auth: Add domain-specific exception for passwords	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	dbea3f5a01	auth: Document passwords interface	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	b272d622f8	auth: Move password stuff to its own namespace For clarity and nicer function names.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	de01aaf181	auth: Identify password hashing errors correctly See `fce10f2c6e` for reference.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	c10fcbf7a5	auth: Add unit tests for password handling This will mean we can make changes more confidently.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	2a40bcb281	auth: Move password handling to its own files While the `password_authenticator` is a complex component with lots of dependencies, password hashing and checking itself is a process with limited logical state and dependencies, which makes it easy to isolate and test.	2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky	03cf57db62	auth: Construct `std::random_device` instances once `std::random_device` has a lot of implementation-specific behavior, and as a result we cannot assume much about its performance characteristics. We initialize thread-specific static instances of `std::random_device` once so that we don't have the overhead of invoking the ctor during every invocation of `gensalt`.	2018-08-13 13:24:45 -04:00
Duarte Nunes	f86811a3c9	Merge seastar upstream * seastar d40faff...8ad870f (9): > reactor: switch indentation > properly configure I/O Scheduler when --max-io-requests is passed > IOTune: tell users that the evaluation will take a while > exceptions: fix compilation with static libstdc++ > apps/iotune: print out which config file updated > foreign_ptr: allow waiting for the destruction of the managed ptr > Merge "Improve UX for backtraces read from stdin" from Botond Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-12 14:01:36 +01:00
Avi Kivity	183d5ba178	build: compress debug sections Compressing debug sections reduces build size by 30% with no significant increase in build time. Results on a 4-core system (ninja release, size in MB): before: 18056 build real 59m43.138s user 229m3.180s sys 6m49.460s after: 12387 build real 60m30.112s user 232m8.962s sys 6m49.364s Presumably, the difference in debug mode is even greater. Message-Id: <20180811180444.30578-1-avi@scylladb.com>	2018-08-11 19:41:55 +01:00
Takuya ASADA	2ef1b094d7	dist/common/scripts/scylla_setup: don't proceed with RAID setup until user types 'done' Need to wait for user confirmation before running RAID setup. See #3659 Fixes #3681 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180810194507.1115-1-syuu@scylladb.com>	2018-08-11 18:48:05 +03:00
Takuya ASADA	b7cf3d7472	dist/common/scripts/scylla_setup: don't mention interactive mode prompt when running in non-interactive mode Skip showing the message in non-interactive mode. Fixes #3674 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180810191945.32693-1-syuu@scylladb.com>	2018-08-11 18:48:03 +03:00
Takuya ASADA	ef9475dd3c	dist/common/scripts/scylla_setup: check existence of housekeeping.cfg before asking to run version check Skip asking to run version check when housekeeping.cfg already exists. Fixes #3657 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180807232313.15525-1-syuu@scylladb.com>	2018-08-11 18:48:02 +03:00
Takuya ASADA	f30b701872	dist/debian: fix install scylla-server.service In a previous commit we moved debian/scylla-server.service to debian/scylla-server.scylla-server.service to explicitly specify the subpackage name, but it doesn't work for dh_installinit without the '--name' option. As a result, the current scylla-server .deb package is missing scylla-server.service, so we need to rename the service back to its original file name. Fixes #3675 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180810221944.24837-1-syuu@scylladb.com>	2018-08-11 15:07:37 +03:00
Duarte Nunes	1521dc56ae	Merge 'Pass query options to restrictions filter' from Piotr " This miniseries fixes ALLOW FILTERING support for prepared statements by passing correct query options to the filter instead of empty ones. " * 'pass_query_options_to_restrictions_filter' of https://github.com/psarna/scylla: tests: add testing prepared statements with ALLOW FILTERING cql3: pass query options to restrictions filter	2018-08-09 18:15:18 +01:00
Duarte Nunes	95677877c2	Merge 'JSON support fixes' from Piotr " This series addresses SELECT/INSERT JSON support issues, namely handling null values properly and parsing decimals from strings. It also comes with updated cql tests. Tests: unit (release) " * 'json_fixes_3' of https://github.com/psarna/scylla: cql3: remove superfluous null conversions in to_json_string tests: update JSON cql tests cql3: enable parsing decimal JSON values from string cql3: add missing return for dead cells cql3: simplify parsing optional JSON values cql3: add handling null value in to_json cql3: provide to_json_string for optional bytes argument	2018-08-09 18:05:34 +01:00
Piotr Sarna	9ba218c161	cql3: remove superfluous null conversions in to_json_string Some types checked when passed bytes argument was empty, and if so, returned "null" as a JSON string. Now, with to_json_string(bytes_opt) it's not needed anymore. Also, some types returned "null" instead of signaling a deserialization error.	2018-08-09 18:07:12 +02:00
Piotr Sarna	fc187fa31e	tests: update JSON cql tests Tests are updated to check for recently fixed issues, i.e. * proper handling of null values * parsing decimal values from string Refs #3664 Refs #3666 Refs #3667	2018-08-09 18:07:12 +02:00
Piotr Sarna	957cc712b6	cql3: enable parsing decimal JSON values from string In order to be Cassandra-compatible, decimal type should be parsable from both numeric values and strings. Fixes #3666	2018-08-09 18:07:12 +02:00
Piotr Sarna	f962b85fa3	cql3: add missing return for dead cells Fixes #3664	2018-08-09 18:07:12 +02:00
Piotr Sarna	cdbeed4e3b	cql3: simplify parsing optional JSON values With new to_json_string implementation that accepts bytes_opt, parsing optional values can be simplified to remove explicit branching.	2018-08-09 18:07:12 +02:00
Piotr Sarna	e4396e17cb	cql3: add handling null value in to_json Previously to_json function would fail with null passed as a parameter. Fixes #3667	2018-08-09 18:07:12 +02:00
Piotr Sarna	52052b53a8	cql3: provide to_json_string for optional bytes argument In order to handle optional arguments in a neat way, a wrapper for to_json_string is provided.	2018-08-09 18:07:07 +02:00
Piotr Sarna	4a9014675f	tests: add testing prepared statements with ALLOW FILTERING ALLOW FILTERING support for prepared statements was buggy, so a test case for prepared statements is added to cql test suite.	2018-08-09 18:06:09 +02:00
Piotr Sarna	8c18aaa511	cql3: pass query options to restrictions filter Query options may contain bound values needed for checking filtering restrictions. Previously, empty query_options{} were used, which caused prepared statements to fail. Fixes #3677	2018-08-09 17:44:45 +02:00
Eliran Sinvani	3f2bb07599	cql3: Count unpaged select queries If the counter goes up this can be a possible reason for slowdown in queries (since it means that potentially a large amount of data will be sent to the client at once). Fixes #2478 Tests: cqlsh with PAGING OFF and ON and validating with a print. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <01253cee0b8c1110aaee3da41d1f434ca798b430.1533817568.git.eliransin@scylladb.com>	2018-08-09 13:53:44 +01:00
Tomasz Grabiec	024b3c9fd9	mutation_partition: Fix exception safety of row::apply_monotonically() When emplace_back() fails, value is already moved-from into a temporary, which breaks monotonicity expected from apply_monotonically(). As a result, writes to that cell will be lost. The fix is to avoid the temporary by in-place construction of cell_and_hash. To do that, appropriate cell_and_hash constructor was added. Found by mutation_test.cc::test_apply_monotonically_is_monotonic with some modifications to the random mutation generator. Introduced in `99a3e3a`. Fixes #3678. Message-Id: <1533816965-27328-1-git-send-email-tgrabiec@scylladb.com>	2018-08-09 15:29:10 +03:00
Tomasz Grabiec	fd543603dd	tests: random_mutation_generator: Use collection_member::yes for collection cells Caused assert failure when collection cells were so large as to require fragmentation. Currently collection cells are not fragmented, and deserialization asserts that. Message-Id: <1533817077-27583-1-git-send-email-tgrabiec@scylladb.com>	2018-08-09 15:27:20 +03:00
Vladimir Krivopalov	55d2fdee9a	clustering_key_filter_ranges: Fix move assignment to avoid undefined behaviour. Get rid of the new(this) trick that results in undefined behaviour because the class contains a const reference member. Use std::reference_wrapper instead to ease the transition. Refs #3032. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <5642bf79659231627dd7f8693c17cb46f274bcda.1533765105.git.vladimir@scylladb.com>	2018-08-09 00:53:17 +01:00
Takuya ASADA	ad7bc313f7	dist/common/scripts: pass format variables to colorprint() When we use str.format() to pass variables in the message, it always raises an exception like "KeyError: 'red'", since the message contains color variables that are not passed to str.format(). To avoid the error we need to pass all format variables to colorprint() and run str.format() inside the function. Fixes #3649 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180803015216.14328-1-syuu@scylladb.com>	2018-08-08 18:37:50 +03:00
Avi Kivity	d6b0c4dda4	config: default murmur3_ignore_msb_bits to 12 even if not specified in scylla.yaml When murmur3_ignore_msb_bits was introduced in 1.7, we set its default to zero (to avoid resharding on upgrade) and set it to 12 in the scylla.yaml template (to make sure we get the right value for new clusters). Now, however, things have changed: - clusters installed before 1.7 are a small minority - they should have resharded long ago - resharding is much better these days - we have more migrations from Cassandra compared to old clusters To allow clusters that migrated using their cassandra.yaml, and to clean up the default scylla.yaml, make the default 12. Users upgrading from pre-1.7 clusters will need to update their scylla.yaml, or to reshard (which is a good idea anyway). Fixes #3670. Message-Id: <20180808063003.26046-1-avi@scylladb.com>	2018-08-08 13:46:06 +02:00
Asias He	d47d46e1a8	streaming: Use streaming_write_priority for the sstable writer Use the streaming io priority otherwise it uses the default io priority. Message-Id: <e1836a9a93e7204d4bc9bba9c841d57c8b24aff8.1533715438.git.asias@scylladb.com>	2018-08-08 11:08:06 +03:00
Takuya ASADA	15825d8bf1	dist/common/scripts/scylla_setup: print message when EC2 instance is optimized for Scylla Currently scylla_ec2_check exits silently when the EC2 instance is optimized for Scylla, so the result of the check is not clear; we need to output a message. Note that this change affects the AMI login prompt too. Fixes #3655 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180808024256.9601-1-syuu@scylladb.com>	2018-08-08 10:17:52 +03:00
Takuya ASADA	652eb5ae0e	dist/common/scripts/scylla_setup: fix typo on interactive setup Scylls -> Scylla Fixes #3656 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180808002443.1374-1-syuu@scylladb.com>	2018-08-08 09:15:13 +03:00
Vladimir Krivopalov	7f77087caa	tests: Add tests performing compaction on SSTables 3.x. These tests check the correctness of resulting compacted SSTables based on the files produced by compacting input files with Cassandra. Note that output files are not identical to those generated by Cassandra because Scylla compaction does not yet optimise delta-encoded values using serialization header. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <3fa05ce72352292d1026ce80ac87552889d10d96.1533667535.git.vladimir@scylladb.com>	2018-08-08 08:50:41 +03:00
Rafi Einstein	c7f41c988f	Add a counter to count large partition warning in compaction Fixes #3562 Tests: dtest(compaction_test.py) Message-Id: <20180807190324.82014-1-rafie@scylladb.com>	2018-08-07 20:15:09 +01:00
Avi Kivity	c9caaa8e6e	docker: adjust for script conversion to Python Since our scripts were converted to Python, we can no longer source them from a shell. Execute them directly instead. Also, we now need to import configuration variables ourselves, since scylla_prepare, being an independent process, won't do it for us. Fixes #3647 Message-Id: <20180802153017.11112-1-avi@scylladb.com>	2018-08-07 15:34:03 +01:00
Takuya ASADA	a300926495	dist/common/scripts/scylla_setup: use specified NIC ifname correctly Interactive NIC selection prompt always returns 'eth0' as selected NIC name mistakenly, need to fix. Fixes #3651 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180803020724.15155-1-syuu@scylladb.com>	2018-08-06 20:59:19 +03:00
Amnon Heiman	80b1ef0f47	storage_service: Add nodes_status related metrics This patch adds a metric for a node's own operation mode. The operation_mode metric represents the enum modes as gauge values according to: UNKNOWN = 0, STARTING = 1, JOINING = 2, NORMAL = 3, LEAVING = 4, DECOMMISSIONED = 5, DRAINING = 6, DRAINED = 7, MOVING = 8 Fixes: #3482 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20180806142706.23579-1-amnon@scylladb.com>	2018-08-06 18:19:56 +03:00
Tomasz Grabiec	88053b3bc9	tests: sstables: Replace sleep with accurate synchronization Message-Id: <1533545829-31109-1-git-send-email-tgrabiec@scylladb.com>	2018-08-06 10:09:39 +01:00
Avi Kivity	13b729bf71	Merge "tracing: store request and response sizes" from Vlad " Store sizes of the request and the response for each traces query. In the example below I traced the cassandra-stress write workload with a default schema using the probabilistic tracing. Here is an entry created for one of queries: cassandra@cqlsh> SELECT parameters FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000; parameters -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {'consistency_level': 'LOCAL_ONE', 'page_size': '5000', 'param[0]': 'f749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa13531...', 'param[1]': '845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463...', 'param[2]': 'd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de1...', 'param[3]': 'be77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845f...', 'param[4]': '32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee...', 'param[5]': '50503850374d34323330', 'query': 'UPDATE "standard1" SET "C0" = ?,"C1" = ?,"C2" = ?,"C3" = ?,"C4" = ? 
WHERE KEY=?', 'serial_consistency_level': 'SERIAL'} (1 rows) cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000; request_size \| response_size --------------+--------------- 239 \| 4 (1 rows) Now let's try to read the same keyspace1.standard1 entry (based on the "key" value in "param[5]") from cqlsh and trace it using TRACING ON. cassandra@cqlsh> TRACING ON Now Tracing is enabled cassandra@cqlsh> SELECT * from keyspace1.standard1 where key=0x50503850374d34323330; key \| C0 \| C1 \| C2 \| C3 \| C4 ------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+- ----------------------------------------------------------------------- 0x50503850374d34323330 \| 0xf749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa135315430 \| 0x845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463e10e \| 0xd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de12924 \| 0xbe77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845fb8a2 \| 0x32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee5dde (1 rows) Tracing session: 639ca0a0-96bb-11e8-8a97-000000000000 activity \| timestamp \| source \| source_elapsed ------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+---------------- Execute CQL3 query \| 2018-08-02 21:20:20.906000 \| 192.168.1.138 \| 0 Parsing a statement [shard 0] \| 2018-08-02 21:20:20.906358 \| 192.168.1.138 \| -- Processing a statement [shard 0] \| 2018-08-02 21:20:20.906405 \| 192.168.1.138 \| 47 Creating read executor for token -5698461774438220979 with all: 
{192.168.1.138} targets: {192.168.1.138} repair decision: NONE [shard 0] \| 2018-08-02 21:20:20.906445 \| 192.168.1.138 \| 87 read_data: querying locally [shard 0] \| 2018-08-02 21:20:20.906448 \| 192.168.1.138 \| 90 Start querying the token range that starts with -5698461774438220979 [shard 0] \| 2018-08-02 21:20:20.906452 \| 192.168.1.138 \| 94 Querying is done [shard 0] \| 2018-08-02 21:20:20.906509 \| 192.168.1.138 \| 151 Done processing - preparing a result [shard 0] \| 2018-08-02 21:20:20.906533 \| 192.168.1.138 \| 175 Request complete \| 2018-08-02 21:20:20.906186 \| 192.168.1.138 \| 186 cassandra@cqlsh> TRACING OFF Disabled Tracing. cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=639ca0a0-96bb-11e8-8a97-000000000000; request_size \| response_size --------------+--------------- 82 \| 369 (1 rows) " * 'tracing_request_response_size-v2' of https://github.com/vladzcloudius/scylla: tracing: move all tracing related API functions to a cold path tracing: store a query response size tracing: store request size	2018-08-05 18:26:29 +03:00
Jesse Haber-Kucharsky	fce10f2c6e	auth: Don't use unsupported hashing algorithms In previous versions of Fedora, the `crypt_r` function returned `nullptr` when a requested hashing algorithm was not supported. This is consistent with the documentation of the function in its man page. As of Fedora 28, the function's behavior changes so that the encrypted text is not `nullptr` on error, but instead the string "*0". The info pages for `crypt_r` clarify somewhat (and contradict the man pages): Some implementations return `NULL` on failure, and others return an _invalid_ hashed passphrase, which will begin with a `*` and will not be the same as SALT. Because of this change of behavior, users running Scylla on a Fedora 28 machine which was upgraded from a previous release would not be able to authenticate: an unsupported hashing algorithm would be selected, producing encrypted text that did not match the entry in the table. With this change, unsupported algorithms are correctly detected and users should be able to continue to authenticate themselves. Fixes #3637. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>	2018-08-05 08:57:36 +03:00
Vlad Zolotarov	896c1822b5	tracing: move all tracing related API functions to a cold path This patch completes what was started in `a4282c2c6e`. Make trace_state_ptr a wrapper class around lw_shared_ptr<trace_state> that hints that bool(trace_state_ptr) is likely to return FALSE. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-03 12:32:54 -04:00
Vlad Zolotarov	6db90a2e63	tracing: store a query response size Add a new "response_size" column to system_traces.sessions and store a size of an uncompressed response for a traced query. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-03 12:29:36 -04:00
Vlad Zolotarov	05020921bb	tracing: store request size Add a new column "request_size" to system_traces.sessions and store the uncompressed request frame data size. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-03 12:29:36 -04:00
Avi Kivity	3b42fcfeb2	Merge "Fix exception safety in imr::utils::object" from Paweł " There is an exception safety problem in imr::utils::object. If multiple memory allocations are needed and one of them fails the main object is going to be freed (as expected). However, at this stage it is not constructed yet, so when LSA asks its migrator for the size it may get a meaningless value. The solution is to remember the size until object is fully created and use sized deallocation in case of failures. Fixes #3618. Tests: unit(release, debug/imr_test) "	2018-08-02 12:10:24 +03:00
Takuya ASADA	1bb463f7e5	dist/debian: install .service on correct subpackage We were mistakenly installing scylla-housekeeping-.service to the scylla-conf package; all *.service files should explicitly specify the subpackage name. Fixes #3642 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180801233042.307-1-syuu@scylladb.com>	2018-08-02 11:39:52 +03:00
Paweł Dziepak	fd44d13145	tests/imr: add test for exception safety in imr::utils::object::make()	2018-08-01 16:50:58 +01:00
Paweł Dziepak	7ec906e657	imr: detect lsa migrator mismatch Each IMR type needs its own LSA migrator. It is possible that user will provide a migrator for a different type than the one which instance is being created. This patch adds compile-time detection of that bug.	2018-08-01 16:50:58 +01:00
Benny Halevy	6b179b0183	HACKING.md: update ./install-dependencies.sh filename Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20180801150813.25408-1-bhalevy@scylladb.com>	2018-08-01 18:09:29 +03:00
Paweł Dziepak	6fbf2d72e9	imr::utils::object_context: fix context_for for backpointer Each member of a structure may require different deserialisation context. They are provided by context_for<Tag>() method of the context used to deserialise the structure itself. imr::utils::object needs to add backpointer to the structure it manages so that it can be used in the LSA memory. This is done by creating a structure that has two members: the backpointer and the actual structure that imr::utils::object is to manage. imr::utils::object_context creates appropriate deserialisation context for it. context_for() is called for each member of a structure. object_context implementation of context_for() always created a deserialisation context for the underlying structure regardless which member that was, so it was done also for backpointer. This is wrong since the context may read the object on its creation. The fix is to use no_context_t for the backpointer.	2018-08-01 15:17:25 +01:00
Paweł Dziepak	61749019cb	imr::utils::object: fix exception safety if allocation fails imr::utils::object::make() handles creation of IMR objects. They are created in three phases: 1. The size of the object and all additional needed memory allocations is determined 2. All needed buffers are allocated 3. Data is written to the allocated space When IMR objects are deallocated LSA asks their migrator for the size. Migrator may read some parts of the object to figure out its size. This is a problem if there is allocation failure in make() at point 2. If one of required allocations fails, the buffers that were already acquired need to be freed. However, since the object hasn't been fully created yet migrator won't return a valid value. The solution for this is to remember object size until all allocations are completed. This way the LSA won't need to ask migrators for it in case of failure. imr::alloc::object_allocator already does that but imr::utils::object doesn't. This patch fixes that.	2018-08-01 15:17:13 +01:00
Piotr Sarna	156888fb44	docs: fix system.large_partitions doc entry For some reason the doc entry for large_partitions was outdated. It contained incorrect ORDERING information and wrong usage example, since large_partitions' schema changed multiple times during the reviewing process. Message-Id: <1910f270419536ebccffde163ec1bfc67d273306.1533128957.git.sarna@scylladb.com>	2018-08-01 16:12:39 +03:00
Asias He	95849371aa	range_streamer: Remove unordered_multimap usage We need the mapping between dht::token_range to std::vector<inet_address> and inet_address to dht::token_range_vector in various places. Currently, we use std::unordered_multimap and convert to std::unordered_map. It is better to use std::unordered_map in the first place. The changes are as follows: - Change from std::unordered_multimap<dht::token_range, inet_address> to std::unordered_map<dht::token_range, std::vector<inet_address>> - Change from std::unordered_multimap<inet_address, dht::token_range> to std::unordered_map<inet_address, dht::token_range_vector> Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>	2018-08-01 13:01:41 +03:00
Gleb Natapov	44a6afad8c	cache_hitrate_calculator: fix race when new table is added during calculations The calculation consists of several parts with preemption point between them, so a table can be added while calculation is ongoing. Do not assume that table exists in intermediate data structure. Fixes #3636 Message-Id: <20180801093147.GD23569@scylladb.com>	2018-08-01 12:45:03 +03:00
Avi Kivity	620e950fc8	Merge "No infinite time-outs for internal distributed queries" from Jesse " This series replaces infinite time-outs in internal distributed (non-local) CQL queries with finite ones. The implementation of tracing, which also performs internal queries, already has finite time-outs, so it is unchanged. Fixes #3603. " * 'jhk/finite_time_outs/v2' of https://github.com/hakuch/scylla: Use finite time-outs for internal auth. queries Use finite query time-outs for `system_distributed`	2018-08-01 11:23:42 +03:00
Asias He	4a0b561376	storage_service: Get rid of moving operation The moving operation changes a node's token to a new token. It is supported only when a node has one token. The legacy moving operation was useful in the early days before vnodes were introduced, when a node had only one token. I don't think it is useful anymore. In the future, we might support adjusting the number of vnodes to rebalance the token range each node owns. Removing it simplifies the cluster operation logic and code. Fixes #3475 Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>	2018-08-01 11:18:17 +03:00
Asias He	02befb6474	gossip: Log seeds seen It is useful for debugging bootstrap issues, especially for large clusters. Also do not use `_seeds` as the set_seeds function parameter since there is a class member called _seeds. Refs #3417 Message-Id: <15e6bdf06376949ced1bdb845f810da09266783d.1532474820.git.asias@scylladb.com>	2018-08-01 10:57:56 +03:00
Takuya ASADA	2cd99d800b	dist/common/scripts/scylla_ntp_setup: fix typo Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1533070539-2147-1-git-send-email-syuu@scylladb.com>	2018-08-01 10:31:07 +03:00
Avi Kivity	2c9b886b6d	logalloc: reindent No functional changes. Message-Id: <20180731125116.32009-1-avi@scylladb.com>	2018-08-01 00:35:54 +01:00
Jesse Haber-Kucharsky	e664f9b0c6	Use finite time-outs for internal auth. queries	2018-07-31 11:38:16 -04:00
Jesse Haber-Kucharsky	ca44f4de3c	Use finite query time-outs for `system_distributed`	2018-07-31 11:38:15 -04:00
Paweł Dziepak	b20a15bdda	Merge "Prevent scheduling leaks when out of memtable space" from Avi " When we are out of memtable space (real or virtual), lsa will defer running our mutation application and run it later when memory is in fact available. However, it will run it in the main group, giving the write more shares than it would otherwise get. This patchset fixes the problem by running those deferred mutation applications in the correct scheduling group. Fixes #3638 " * tag '3638/v2' of https://github.com/avikivity/scylla: database: tag dirty memory managers with scheduling groups logalloc: run releaser() in user-provided scheduling group	2018-07-31 11:55:19 +01:00
Avi Kivity	2d311c26b3	database: tag dirty memory managers with scheduling groups dirty memory managers run code on behalf of their callers in a background fiber, so provide that background fiber with the scheduling group appropriate to their caller. - system: main (we want to let system writes through quickly) - dirty: statement (normal user writes) - streaming: streaming (streaming writes)	2018-07-31 13:18:21 +03:00
Paweł Dziepak	98217f0d66	Update seastar submodule * seastar 6b97e00...d40faff (10): > tutorial: update build as needed for newer pandoc > core: fix __libc_free return type signature > future-utils: when_all: avoid calling member function on an uninitialized data member > future-util: reduce continuations in when_all (variadic version) > future-utils: remove allocation in when_all() if all futures are available > future: reduce allocations in when_all() > future: fill missing futurize::from_tuple() functions > future: expose more types in continuation_base > log: predict logger::is_enabled() as false > README: add Resources section with infomation about the mailing list etc.	2018-07-31 10:12:52 +01:00
Avi Kivity	0fc54aab98	logalloc: run releaser() in user-provided scheduling group Let the user specify which scheduling group should run the releaser, since it is running functions on the user's behalf. Perhaps a cleaner interface is to require the user to call a long-running function for the releaser, and so we'd just inherit its scheduling group, but that's a much bigger change.	2018-07-31 11:57:58 +03:00
Avi Kivity	f258df099a	Update ami submodule * dist/ami/files/scylla-ami d53834f...c7e5a70 (1): > ds2_configure.py: uncomment 'cluster_name' when it's commented out	2018-07-31 09:34:33 +03:00
Avi Kivity	e7ae4beef0	main: run prometheus and API servers under streaming group Both the Prometheus and the API servers are used for maintenance operations, similarly to streaming. Run them under the streaming scheduling group to prevent them from impacting normal operations, and rename the streaming scheduling group to reflect the more generic role. This helps to prevent spikes from Prometheus or API requests from interfering with the normal workload. Using an existing group is preferable to creating a new group because in the worst case, all the non-main-workload groups compete with the main workload. Consolidating them allows us to give them significant shares in total without increasing competition in the worst case. The group's label is unchanged to preserve compatibility with dashboards. A nice side effect is that repair, which is initiated by API calls, gets placed into the maintenance group naturally. Compaction tasks which are run by compaction manager are not changed. Message-Id: <20180714160723.23655-1-avi@scylladb.com>	2018-07-30 15:07:33 +01:00
Avi Kivity	a4282c2c6e	tracing: move tracing code to cold path Most queries run without tracing (and those that run with tracing are not sensitive to a few cycles), so mark the tracing paths as cold. Message-Id: <20180723133000.30482-1-avi@scylladb.com>	2018-07-30 15:05:57 +01:00
Rafi Einstein	123f2c2a1c	Add a counter for reverse queries Fixes #3492 Tests: dtest(cql_additional_tests.py) Message-Id: <20180729202615.22459-1-rafie@scylladb.com>	2018-07-30 12:34:43 +03:00
Takuya ASADA	032b26deeb	dist/common/scripts/scylla_ntp_setup: fix typo Comment on Python is "#" not "//". Fixes #3629 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180730091022.4512-1-syuu@scylladb.com>	2018-07-30 12:30:53 +03:00
Avi Kivity	04d88e8ff7	scripts: add a script to compute optimal number of compile jobs This will allow continuous integration to use the optimal number of compiler jobs, without having to resort to complex calculations from its scripting environment. Message-Id: <20180722172050.13148-1-avi@scylladb.com>	2018-07-30 10:15:11 +03:00
Avi Kivity	a4c9330bfc	Merge "Optimise paged queries" from Paweł " This series adds some optimisations to the paging logic, that attempt to close the performance gap between paged and not paged queries. The former are more complex so always are going to be slower, but the performance loss was unacceptably large. Fixes #3619. Performance with paging: ./perf_paging_before ./perf_paging_after diff read 271246.13 312815.49 15.3% Without paging: ./perf_nopaging_before ./perf_nopaging_after diff read 343732.17 342575.77 -0.3% Tests: unit(release), dtests(paging_test.py, paging_additional_test.py) " * tag 'optimise-paging/v1' of https://github.com/pdziepak/scylla: cql3: select statement: don't copy metadata if not needed cql3: query_options: make simple getter inlineable cql3: metadata: avoid copying column information query_pager: avoid visiting result_view if not needed query::result_view: add get_last_partition_and_clustering_key() query::result_reader: fix const correctness tests/uuid: add more tests including make_randm_uuid() utils: uuid: don't use std::random_device()	2018-07-26 19:24:03 +03:00
Nadav Har'El	25bd139508	cross-tree: clean up use of std::random_device() std::random_device() uses the relatively slow /dev/urandom, and we rarely if ever intend to use it directly - we normally want to use it to seed a faster random_engine (a pseudo-random number generator). In many places in the code, we first created a random_device variable, and then used it to create a random_engine variable. However, this practice created the risk of a programmer accidentally using the random_device object instead of the random_engine object, because both have the same API; this hurts performance. This risk materialized in just two places in the code, utils/uuid.cc and gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is not included in this patch, and the fix for gossiper.{cc,hh} is included here. To avoid risking the same mistake in the future, this patch switches across the code to an idiom where the random_device object is not named, so cannot be accidentally used. We use the following idiom: std::default_random_engine _engine{std::random_device{}()}; Here std::random_device{}() creates the random device (/dev/urandom) and pulls a random integer from it. It then uses this seed to create the random_engine (the pseudo-random number generator). The std::random_device{} object is temporary and unnamed, and cannot be unintentionally used directly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180726154958.4405-1-nyh@scylladb.com>	2018-07-26 16:54:58 +01:00
Takuya ASADA	8e4d1350c9	dist/common/scripts/scylla_ntp_setup: ignore ntpdate error Even if ntpdate fails to adjust the clock, ntpd may be able to recover it later, so ignore the ntpdate error and keep running the script. Fixes #3629 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180726080206.28891-1-syuu@scylladb.com>	2018-07-26 14:44:53 +03:00
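A minimal sketch of the seeding idiom described in the commit above. The class name `backoff_picker` and its `pick()` method are hypothetical and exist only for illustration; only the `_engine{std::random_device{}()}` initializer comes from the commit message.

```cpp
#include <cassert>
#include <random>

// Hypothetical class, used only to illustrate the seeding idiom.
class backoff_picker {
    // The random_device temporary is created, asked for one seed value,
    // and destroyed. No named random_device is left around to be misused.
    std::default_random_engine _engine{std::random_device{}()};
public:
    // Returns a pseudo-random value in [lo, hi], drawn from the fast engine.
    int pick(int lo, int hi) {
        assert(lo <= hi);
        std::uniform_int_distribution<int> dist(lo, hi);
        return dist(_engine);
    }
};
```

Because the device is never bound to a name, every later draw necessarily goes through `_engine`, which is the point of the idiom.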
Paweł Dziepak	3e32245bb8	cql3: select statement: don't copy metadata if not needed	2018-07-26 12:37:20 +01:00
Paweł Dziepak	15775c958a	cql3: query_options: make simple getter inlineable	2018-07-26 12:37:06 +01:00
Paweł Dziepak	ef0c999742	cql3: metadata: avoid copying column information The column-related metadata is shared by all requests done with the same prepared query. However, the metadata class also contains some additional flags and paging state which may differ. This patch allows sharing column information among multiple instances of the metadata class.	2018-07-26 12:17:04 +01:00
Paweł Dziepak	757d9e3b5d	query_pager: avoid visiting result_view if not needed query::result_visitor provides get_last_partition_and_clustering_key() which allows getting those without iterating through the whole result. Moreover, row count may be precomputed in the result, if it isn't there is query::result_view::count_partitions_and_rows() for getting it.	2018-07-26 12:14:48 +01:00
Paweł Dziepak	9b6dc52255	query::result_view: add get_last_partition_and_clustering_key() Paging needs to get last partition and clustering key (if the latter exists). Previously, this was done by result_view visitor but that is suboptimal. Let's add a direct getter for those.	2018-07-26 12:12:08 +01:00
Paweł Dziepak	b5ed4c8806	query::result_reader: fix const correctness	2018-07-26 12:11:27 +01:00
Paweł Dziepak	495df277f9	tests/uuid: add more tests including make_randm_uuid()	2018-07-26 12:03:37 +01:00
Paweł Dziepak	b485deb124	utils: uuid: don't use std::random_device() std::random_device() is extremely slow. This patch modifies make_rand_uuid() so that it requires only two invocations of the PRNG.	2018-07-26 12:02:32 +01:00
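A rough sketch of how a random (version 4) UUID can be built from exactly two PRNG draws, as the commit above describes. This is not the code from utils/uuid.cc: the `uuid_bits` alias, the function name, and the choice of `std::mt19937_64` are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical stand-in for a UUID: {most significant, least significant} 64 bits.
using uuid_bits = std::array<std::uint64_t, 2>;

// Hypothetical name; builds a version 4 UUID from two 64-bit PRNG outputs.
uuid_bits make_random_uuid_sketch() {
    // Seeded once per thread from the slow device; later draws are cheap.
    static thread_local std::mt19937_64 engine{std::random_device{}()};
    std::uint64_t msb = engine();  // first PRNG invocation
    std::uint64_t lsb = engine();  // second PRNG invocation
    // RFC 4122: set the version nibble (bits 12..15 of the high word) to 4.
    msb = (msb & 0xFFFFFFFFFFFF0FFFULL) | 0x0000000000004000ULL;
    // RFC 4122: set the variant (top two bits of the low word) to binary 10.
    lsb = (lsb & 0x3FFFFFFFFFFFFFFFULL) | 0x8000000000000000ULL;
    return {msb, lsb};
}
```

The slow `std::random_device` is touched only once per thread, for seeding; each UUID afterwards costs two engine calls and two mask operations.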
Avi Kivity	b167647bf6	dist: redhat: fix up bad file ownership of rpms/srpms mock outputs files owned by root. This causes attempts by scripts that want to junk the working directory (typically continuous integration) to fail on permission errors. Fixup those permissions after the fact. Message-Id: <20180719163553.5186-1-avi@scylladb.com>	2018-07-26 08:20:42 +03:00
Avi Kivity	bea1f715dc	storage_proxy: count cross-shard operations Count operations which were started on one shard and were performed on another, due to non-shard-aware driver and/or RPC. Message-Id: <20180723155118.8545-1-avi@scylladb.com>	2018-07-25 16:21:04 +01:00
Avi Kivity	d6ef74fe36	Merge "Fix JSON string quoting" from Piotr " This mini-series covers a regression caused by newest versions of jsoncpp library, which changed the way of quoting UTF-8 strings. Tests: unit (release) " * 'add_json_quoting_3' of https://github.com/psarna/scylla: tests: add JSON unit test types: use value_to_quoted_string in JSON quoting json: add value_to_quoted_string helper function Ref #3622. Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2018-07-25 17:49:55 +03:00
Piotr Sarna	b367cff05d	tests: add JSON unit test Since value_to_quoted_string now has an internal implementation, a unit test is provided to check if strings are quoted and escaped properly.	2018-07-25 13:16:06 +02:00
Piotr Sarna	d307b5712c	types: use value_to_quoted_string in JSON quoting In order to avoid regressions caused by external libraries, our own value_to_quoted_string implementation is used. Fixes #3622	2018-07-25 13:16:06 +02:00
Piotr Sarna	783762a958	json: add value_to_quoted_string helper function After open-source-parsers/jsoncpp@42a161f commit jsoncpp's version of valueToQuotedString no longer fits our needs, because too many UTF-8 characters are unnecessarily escaped. To remedy that, this commit provides our own string quoting implementation. Reported-by: Nadav Har'El <nyh@scylladb.com> Refs #3622	2018-07-25 13:16:00 +02:00
Piotr Sarna	f66aace685	cql3: fix INSERT JSON grammar Previously CQL grammar wrongfully required INSERT JSON queries to provide a list of columns, even though they are already present in JSON itself. Unfortunately, tests were written with this false assumption as well, so they are updated. Message-Id: <33b496cba523f0f27b6cbf5539a90b6feb20269e.1532514111.git.sarna@scylladb.com>	2018-07-25 11:36:59 +01:00
Avi Kivity	b443a9b930	compaction: demote compaction start/end messages to DEBUG level Compactions start and end all the time, especially with many shards, and don't contribute much to understanding what is going on these days. Compaction throughput is available through the metrics and other information is available via the compaction history table. Demote compaction start and end messages to DEBUG level to keep the log clean. Cleaning and resharding compactions are kept as INFO, at least for now, since they are manual operations and therefore rarer. Message-Id: <20180724132859.14109-1-avi@scylladb.com>	2018-07-25 09:53:39 +01:00
Takuya ASADA	58f094e06d	dist/debian: fix ImportError on pystache It seems that pystache is not provided as a dependency, so we need to install it in build_deb.sh. Fixes #3627 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180724164852.16094-1-syuu@scylladb.com>	2018-07-25 07:42:19 +03:00
Avi Kivity	e2ad45c3db	Merge "Add clustering prefix logic to indexes and filtering" from Piotr " This series follows up ALLOW FILTERING support series and depends on this one: https://groups.google.com/d/msg/scylladb-dev/Qxt3_MP03jI/5ZhRTJ3gBwAJ The following optimizations regarding clustering key prefix and filtering are applied: * if clustering key restrictions require filtering, but they still contain any part of the prefix, this prefix can be used to narrow down the query by using it in computing clustering bounds * if an indexed query has partition key restrictions and any clustering key restrictions that form a prefix, then from now on this prefix will be used to narrow down the index query " Ref #3611. * 'use_prefix_with_filtering_and_si_4' of https://github.com/psarna/scylla: tests: add prefix cases to indexed filtered queries tests cql3: use ck prefix in filtered queries cql3: use clustering key prefix in index queries cql3: add conversion to ck longest prefix restrictions cql3: add prefix_size method to ck restrictions	2018-07-23 15:28:50 +03:00
Piotr Sarna	517a5b66ba	tests: add prefix cases to indexed filtered queries tests More cases related to querying clustering key prefix in an indexed query are added to secondary index test suite.	2018-07-23 14:10:52 +02:00
Piotr Sarna	8523c24576	cql3: use ck prefix in filtered queries If a filtering query has restrictions that include any clustering prefix, the longest prefix will be used to narrow down the query. Fixes #3611	2018-07-23 14:10:52 +02:00
Piotr Sarna	6cc8ccc771	cql3: use clustering key prefix in index queries If an indexed query has partition+clustering key restrictions as well and at least some of these restrictions create a prefix, this prefix is used in the index query to narrow down the number of rows read. Refs #3611	2018-07-23 14:10:52 +02:00
Piotr Sarna	ab74f75727	cql3: add conversion to ck longest prefix restrictions For optimization purposes it's sometimes useful to extract the longest prefix of clustering key restrictions in order to narrow down queries.	2018-07-23 14:10:52 +02:00
Piotr Sarna	2e4c493870	cql3: add prefix_size method to ck restrictions Clustering key restrictions are usually set for at least part of the clustering key prefix. A method of extracting the longest prefix's size is added.	2018-07-23 14:10:52 +02:00
Vladimir Krivopalov	ec7f853f49	sstables: Do not pass liveness_info to consume_row_end(). The liveness_info is unconditionally added to the _in_progress_row as of commit `cbfc741d70` so no need to pass it to consume_row_end() and add conditionally. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <7cd3e599817cbd4b857c3295153602cd2b9a6ef1.1532311852.git.vladimir@scylladb.com>	2018-07-23 13:10:36 +03:00
Avi Kivity	bb79eccf55	tests: sstable_mutation_test: hack around leak during sstable close sstable close is an asynchronous operation launched in the background, so we can't wait for it. If the test ends before all operations are complete, the background operations are detected as leaks. We need either a proper close(), or maybe a sstables::quiesce() that waits until there are no sstables alive on the shard, but until then, a hack.	2018-07-23 12:40:46 +03:00
Avi Kivity	af6ce47082	Merge "Support filtering and fast-forwarding with SSTables 3.x" from Piotr and Vladimir " This patchset authored by Piotr fixes ck filtering and fast forwarding in SSTables 3.x. For now only clustering rows are supported and range tombstones will come next. Test: unit {release} " * 'projects/sstables-30/filtering/v5' of https://github.com/argenet/scylla: sstables: Minor clean-up and renaming to clustering_ranges_walker. sstables: Add test for filtering and forwarding sstables: Fix schema for static row tests sstables: Fix ck filtering and fast forwarding sstables: Introduce mutation_fragment_filter	2018-07-22 21:11:51 +03:00
Avi Kivity	761931659a	Merge "Do not linearise incoming CQL3 requests" from Paweł " This series changes the native CQL3 protocol layer so that it works with fragmented buffers instead of a single temporary_buffer per request. The main part is fragmented_temporary_buffer which represents a fragmented buffer consisting of multiple temporary_buffers. It provides helpers for reading a fragmented buffer from an input_stream and for interpreting the data in the fragmented buffer, as well as views that satisfy the FragmentRange concept. There are still situations where a fragmented buffer is linearised. That includes decompressing client requests (this uses reusable buffers in a similar way to the code that sends compressed responses), CQL statement restrictions and values that are hard-coded in prepared statements (hopefully, the values in those cases will be small), value validation in some cases (blobs are not validated, irrelevant for many fixed-size small types, but may be a problem for large text cells) as well as operations on collections.
Tests: unit(release), dtests(cql_prepared_test.py, cql_tests.py, cql_additional_tests.py) " * tag 'fragmented-cql3-receive/v1' of https://github.com/pdziepak/scylla: (23 commits) types: bytes_view: override fragmented validate() cql3: value_view: switch to fragmented_temporary_buffer::view types: add validate that accepts fragmented_temporary_buffer::view cql3 query_options: add linearize() cql3: query_options: use bytes_ostream for temporaries cql3: operation: make make_cell accept fragmented_temporary_buffer::view atomic_cell: accept fragmented_temporary_buffer::view values cql3: avoid ambiguity in a call to update_parameters::make_cell() transport: switch to fragmented_temporary_buffer transport: extract compression buffers from response class tests/reusable_buffer: test fragmented_temporary_buffer support utils: reusable_buffer: support fragmented_temporary_buffer tests: add test for fragmented_temporary_buffer util fragment_range: add general linearisation functions utils: add fragmented_temporary_buffer tests: add basic test for transport requests and responses tests/random-utils: print seed tests/random-utils: generate sstrings cql3: add value_view printer and equality comparison transport: move response outside of cql_server class ...	2018-07-22 19:40:37 +03:00
Avi Kivity	30cddd4531	Merge "Support reading promoted index from SSTables 3.x" from Vladimir and Piotr " This patchset adds support for reading Index.db files written in SSTables 3.x ('mc') format. Note that the offsets map introduced in SSTables 3.x is neither used nor read yet. It is located last in promoted index and so current parsers just ignore it for the time being. Later it should be used to perform binary search of a desired promoted index block in large partition, thus reducing the complexity from linear to logarithmic. Tests: unit {release} " * 'projects/sstables-30/index_reader/v5' of https://github.com/argenet/scylla: sstables: Add getter for end_open_marker to index_reader. tests: Add test reading index for a partition comprised of RT markers of boundary types. tests: Add test for reading index of a partition comprised of only range tombstones. tests: Use std::adjacent_find in index_reader_assertions::has_monotonic_positions() tests: Read rows only index sstables: Do not seek through the promoted index for static row positions. sstables: Read promoted index stored in SSTables 3.x ('mc') format. sstables: Make promoted_index_block support clustering keys for both ka/la and mc formats. utils: Add overloaded_functor helper. position_in_partition: Add a constructor from range_tag_t{}, bound_kind and clustering_key_prefix. sstables: Support reading signed vints in continuous_data_consumer. sstables: Factor out the code building a vector of fixed clustering values lengths. sstables: Remove unused includes from index_entry.hh tests: Add test for reading SSTables 3.x index file with empty promoted index. tests: Rename sstable_assertions.hh -> tests/index_reader_assertions.hh sstables: Support parsing index entries from SSTables 3.x format. sstables: move bound_kind_m to header	2018-07-22 16:15:41 +03:00
Vladimir Krivopalov	df1a151f75	sstables: Minor clean-up and renaming to clustering_ranges_walker. - Renamed _current to _current_range to better reflect its nature as there are other similarly named members (_current_start and _current_end). - Don't use a temporary variable for incrementing the change counter. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 16:34:37 -07:00
Piotr Jastrzebski	01611f2083	sstables: Add test for filtering and forwarding Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-07-20 16:34:37 -07:00
Piotr Jastrzebski	3466dc2368	sstables: Fix schema for static row tests	2018-07-20 16:34:37 -07:00
Piotr Jastrzebski	abf3fc1b98	sstables: Fix ck filtering and fast forwarding Both were broken before this change. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 16:34:37 -07:00
Piotr Jastrzebski	564bcfa4d0	sstables: Introduce mutation_fragment_filter This class encapsulates the logic related to clustering key filtering and fast forwarding. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 16:19:07 -07:00
Vladimir Krivopalov	4d3467d793	sstables: Add getter for end_open_marker to index_reader. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	c7285abc9e	tests: Add test reading index for a partition comprised of RT markers of boundary types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	91f96d7d2b	tests: Add test for reading index of a partition comprised of only range tombstones. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	fc051954c2	tests: Use std::adjacent_find in index_reader_assertions::has_monotonic_positions() Not only this is easier to read and understand, but it also doesn't force the promoted_index_block class to support copying which is heavyweight and otherwise not needed. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
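A small sketch of the `std::adjacent_find` approach mentioned in the commit above. The `index_block` type is a hypothetical move-only stand-in for promoted_index_block, and treating "monotonic" as strictly increasing is an assumption; the real assertion helper may compare positions differently.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical move-only block type; copying is deliberately disabled.
struct index_block {
    int start;
    explicit index_block(int s) : start(s) {}
    index_block(const index_block&) = delete;
    index_block& operator=(const index_block&) = delete;
    index_block(index_block&&) = default;
    index_block& operator=(index_block&&) = default;
};

// True when every block starts strictly after the previous one.
bool has_monotonic_positions(const std::vector<index_block>& blocks) {
    // adjacent_find hands the predicate references to neighbouring elements,
    // so no block is ever copied while checking the order.
    auto out_of_order = std::adjacent_find(blocks.begin(), blocks.end(),
        [](const index_block& a, const index_block& b) {
            return !(a.start < b.start);
        });
    return out_of_order == blocks.end();
}
```

The predicate flags the first adjacent pair that is out of order, so a result equal to `end()` means the whole sequence is ordered.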
Vladimir Krivopalov	d4e0fa96e3	tests: Read rows only index Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	5561c713d9	sstables: Do not seek through the promoted index for static row positions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	917528c427	sstables: Read promoted index stored in SSTables 3.x ('mc') format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	86d14f8166	sstables: Make promoted_index_block support clustering keys for both ka/la and mc formats. This is a pre-requisite for parsing promoted index blocks written in SSTables 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:51:13 -07:00
Vladimir Krivopalov	79c2f0095c	utils: Add overloaded_functor helper. The overloaded_functor class template can be used to encompass multiple lambdas accepting different types into a single callable object that can be used with any of those types. One application is visitors for std::variant where different handling is required for different types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
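For readers unfamiliar with the pattern, here is a sketch of how a helper like the one in the commit above is commonly written in C++17 and used as a `std::variant` visitor. The definition shown is the well-known "overloaded" idiom and is assumed, not copied from the Scylla tree; `describe()` is a hypothetical usage example.

```cpp
#include <string>
#include <variant>

// Inherits from every lambda and exposes all of their call operators,
// so overload resolution picks the lambda matching the argument type.
template <typename... Fs>
struct overloaded_functor : Fs... {
    using Fs::operator()...;
};

// Deduction guide: lets overloaded_functor{f, g} deduce its template arguments.
template <typename... Fs>
overloaded_functor(Fs...) -> overloaded_functor<Fs...>;

// Hypothetical usage: a different handler per alternative of the variant.
std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded_functor{
        [](int i) { return "int:" + std::to_string(i); },
        [](const std::string& s) { return "str:" + s; }
    }, v);
}
```

With this sketch, `describe(42)` yields `"int:42"` and `describe(std::string("ck"))` yields `"str:ck"`.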
Vladimir Krivopalov	593d8faf7d	position_in_partition: Add a constructor from range_tag_t{}, bound_kind and clustering_key_prefix. This facilitates position_in_partition creation when parsing range tombstones bounds from SSTables files. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	997ebaaa14	sstables: Support reading signed vints in continuous_data_consumer. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	540dfcc9bf	sstables: Factor out the code building a vector of fixed clustering values lengths. This code will be re-used in promoted_index_blocks_parser to parse clustering key prefixes from SSTables 3.x format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	741d5f3b5d	sstables: Remove unused includes from index_entry.hh Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	b29b948872	tests: Add test for reading SSTables 3.x index file with empty promoted index. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	054eb2df66	tests: Rename sstable_assertions.hh -> tests/index_reader_assertions.hh The previous name of the file is moreover confusing as we have several sstable_assertions classes throughout tests but this header only contains a class for index reader assertions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Vladimir Krivopalov	f50ffa267f	sstables: Support parsing index entries from SSTables 3.x format. With this patch, index_reader is capable of reading index_entries from both 'ka'/'la' and 'mc' formats. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Piotr Jastrzebski	d0f8c71e28	sstables: move bound_kind_m to header and add helper methods. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-07-20 13:50:17 -07:00
Duarte Nunes	6bd087facb	Merge 'Make indexed queries with pk restrictions non-filtering' from Piotr " Queries that use secondary index and have a full partition key restriction or full primary key restriction should not require filtering - it's sufficient to add these restrictions to the index query. This also adds secondary index tests to cover this case. Tests: unit (release) " * 'si_and_pk_restrictions_2' of https://github.com/psarna/scylla: tests: add index + partition key test cql3: make index+primary key restrictions filtering-independent cql3: use primary key restrictions in filtering index queries cql3: add is_all_eq to primary key restrictions cql3: add explicit conversion between key restrictions cql3: add apply_to() method to single column restriction cql3: make primary key restrictions' values unambiguous	2018-07-19 16:54:43 +01:00
Tomasz Grabiec	d5534d6a77	Merge "Improve categorization of messaging verbs into connections" from Avi Now that verb categorizations also affect scheduling, getting them correct is more important. The first three patches in this series improve the infrastructure a little, and the fourth fixes some categorization errors wrt. repair/streaming verbs. * https://github.com/avikivity/scylla msg-idx-sanity/v1: messaging: choose connection index via a look-up table messaging: convert do_get_rpc_client_idx into a switch messaging: remove default when computing rpc client index messaging: categorize more streaming/repair verbs as streaming	2018-07-19 15:03:15 +02:00
Tomasz Grabiec	ef4fb1f91d	sstables: mp_row_consumer_m: Add trace-level logging Very useful for debugging. The old mp_row_consumer_k_l had this. Message-Id: <1532000326-28649-1-git-send-email-tgrabiec@scylladb.com>	2018-07-19 14:58:00 +03:00
Asias He	1f06ee3960	range_streamer: Limit nr of nodes to stream in parallel For example, to bootstrap a 50th node in a cluster [shard 0] range_streamer - Bootstrap with [127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44, 127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30, 127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28, 127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3, 127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25, 127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1, 127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32, 127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35, 127.0.0.46] for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49 the new node will get data from 49 existing nodes. Currently, it will stream from all the 49 existing nodes at the same time. It is not a good idea to stream from all the nodes in parallel, which can overwhelm the bootstrap node, i.e., 49 nodes sending, 1 node receiving. To fix this, limit the nr of nodes to stream in parallel. We should have better control over the memory usage and parallelism. But for now, limit the nr of nodes to a maximum of 16 as a starter. With this limit, each shard can work with as many as 16 remote nodes in parallel; I think this has enough parallelism for streaming in terms of performance. This change affects the bootstrap/decommission/removenode node operations and does not affect repair. Refs #2782 Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>	2018-07-19 11:44:05 +03:00
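One simple way to picture the cap described in the commit above is to split the peer list into groups of at most 16 and stream from one group at a time. The sketch below does only the splitting; the function name is hypothetical, and the actual range_streamer code may instead keep a sliding window of 16 in-flight peers rather than fixed batches.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the peers into consecutive batches of at most
// max_parallel entries. Returns no batches when max_parallel is 0.
std::vector<std::vector<std::string>>
split_into_batches(const std::vector<std::string>& nodes, std::size_t max_parallel) {
    std::vector<std::vector<std::string>> batches;
    if (max_parallel == 0) {
        return batches;
    }
    for (std::size_t i = 0; i < nodes.size(); i += max_parallel) {
        std::size_t end = std::min(nodes.size(), i + max_parallel);
        batches.emplace_back(nodes.begin() + static_cast<std::ptrdiff_t>(i),
                             nodes.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

For the 49-node example in the commit message and a cap of 16, this gives four batches of 16, 16, 16 and 1 peers, so the bootstrapping node never receives from more than 16 senders at once.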
Avi Kivity	31d4d37161	Merge "Reduce continuous memory usage in gossip" from Asias" " Use chunked_vector instead of vector. It won't have compatibility issues because chunked_vector and vector have the same on wire format. Refs #278 " * 'asias/gossip_memory_v2' of github.com:scylladb/seastar-dev: gossip: Reduce continuous memory usage to_string: Add std::list and utils::chunked_vector support serializer: Add chunked_vector support	2018-07-19 09:12:09 +03:00
Tomasz Grabiec	9a0548397c	tests: row_cache: Add test for eviction from invalidated partitions Message-Id: <1531933216-28026-1-git-send-email-tgrabiec@scylladb.com>	2018-07-18 21:06:36 +03:00
Piotr Sarna	82c049692b	tests: add index + partition key test Tests covering querying both index and partition keys are added - it's checked that such queries do not require filtering.	2018-07-18 18:45:08 +02:00
Piotr Sarna	0c85bdcdc2	cql3: make index+primary key restrictions filtering-independent If full partition key (or full primary key) is used in an indexed query, it should not require filtering, because queries like that can be efficiently narrowed down with stricter index restrictions.	2018-07-18 18:45:08 +02:00
Piotr Sarna	2542630a18	cql3: use primary key restrictions in filtering index queries If both index and partition key is used in a query, it should not require filtering, because indexed query can be narrowed down with partition key information. This commit appends partition key restrictions to index query.	2018-07-18 18:45:08 +02:00
Piotr Sarna	27590816f0	cql3: add is_all_eq to primary key restrictions is_all_eq is later needed to decide if restrictions can be used in an indexed query.	2018-07-18 18:45:08 +02:00
Piotr Sarna	20a349777e	cql3: add explicit conversion between key restrictions Partition and clustering key restrictions sometimes need to be converted and this commit provides a way to do that.	2018-07-18 18:45:08 +02:00
Piotr Sarna	f1357defd6	cql3: add apply_to() method to single column restriction This method allows copying single column restriction, possibly with a new column definition.	2018-07-18 18:44:38 +02:00
Tomasz Grabiec	dc453d4f5d	tests: flat_mutation_reader: Use fluent assertions for better error messages Message-Id: <1531908313-29810-2-git-send-email-tgrabiec@scylladb.com>	2018-07-18 13:52:23 +01:00
Tomasz Grabiec	604c8baed8	tests: flat_mutation_reader_assertions: Introduce produces(mutation_fragment) Message-Id: <1531908313-29810-1-git-send-email-tgrabiec@scylladb.com>	2018-07-18 13:52:23 +01:00
Tomasz Grabiec	c46813717c	tests: sstables: Check that reading large index pages does not cause large allocations Reproducer of #3597. Message-Id: <1531914040-5427-1-git-send-email-tgrabiec@scylladb.com>	2018-07-18 14:56:41 +03:00
Piotr Sarna	30f9924ad5	cql3: make primary key restrictions' values unambiguous using directive must be used to disambiguate the overridden method.	2018-07-18 13:28:37 +02:00
Paweł Dziepak	a0c1c0c921	types: bytes_view: override fragmented validate() The default implementation linearises the buffer and calls validate(bytes_view). This is bad and not needed for bytes_type which doesn't do any validation anyway.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	0b9eed72f4	cql3: value_view: switch to fragmented_temporary_buffer::view	2018-07-18 12:28:06 +01:00
Paweł Dziepak	0551efee3b	types: add validate that accepts fragmented_temporary_buffer::view	2018-07-18 12:28:06 +01:00
Paweł Dziepak	8f4cb36ef2	cql3 query_options: add linearize() Some code in the CQL3 layer requires bytes_view and it is fairly reasonable to assume that it won't deal with large buffers (e.g. statement restrictions). query_options already has make_temporary() which takes ownership of a cql3::raw_value so that the rest of the code can use cql3::raw_value_view. This patch adds a similar linearize() function which, if necessary, linearises a cql3::raw_value_view and returns a bytes_view with lifetime tied to the lifetime of query_options.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	3810045f8f	cql3: query_options: use bytes_ostream for temporaries bytes_ostream is going to be more efficient than std::vector<std::vector<char>> since it can put multiple small values in a single buffer thus reducing the number of memory allocations.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	dff6cd3e2f	cql3: operation: make make_cell accept fragmented_temporary_buffer::view	2018-07-18 12:28:06 +01:00
Paweł Dziepak	cc87263bd8	atomic_cell: accept fragmented_temporary_buffer::view values	2018-07-18 12:28:06 +01:00
Paweł Dziepak	7d7910aa4d	cql3: avoid ambiguity in a call to update_parameters::make_cell() Using initializer lists in calls like foo({}) is ambiguous if foo() has multiple overloads with more than one accepting a type that is default-constructible. update_parameters::make_cell() is about to get an overload that accepts fragmented_temporary_buffer::view as a value, so let's make sure its call site won't be ambiguous.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	8c6e544fec	transport: switch to fragmented_temporary_buffer The logic responsible for reading requests was operating on temporary_buffer<char> and bytes_view. This required all request messages to be linearised to a contiguous buffer, possibly causing large allocations. Changing to fragmented_temporary_buffer mostly alleviates this problem unless the reader code explicitly asks for a contiguous bytes_view.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	f95bb21d99	transport: extract compression buffers from response class Both compression and decompression code is going to reuse the same pair of reusable buffers.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	a8c4f41a0b	tests/reusable_buffer: test fragmented_temporary_buffer support	2018-07-18 12:28:06 +01:00
Paweł Dziepak	32ba47fb87	utils: reusable_buffer: support fragmented_temporary_buffer reusable_buffer already supports bytes_ostream which is often used for handling data sent from Scylla. This patch adds support for fragmented_temporary_buffer which is going to be mainly used for data received by Scylla.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	166c9a3b8c	tests: add test for fragmented_temporary_buffer	2018-07-18 12:28:06 +01:00
Paweł Dziepak	b152aafd67	util fragment_range: add general linearisation functions All FragmentRange implementations can be linearised in the same way, so let's provide linearized() and with_linearized() functions for all of them.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	fc484f0819	utils: add fragmented_temporary_buffer Seastar output_streams produce temporary_buffer<char>s. fragmented_temporary_buffer represents a single fragmented buffer that consists of, possibly multiple, temporary_buffer<char>s.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	b5a72a880b	tests: add basic test for transport requests and responses	2018-07-18 12:28:06 +01:00
Paweł Dziepak	054d39b8f7	tests/random-utils: print seed Knowing the seed will make it easier to investigate failures in randomised tests.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	9445ce3f84	tests/random-utils: generate sstrings	2018-07-18 12:28:06 +01:00
Paweł Dziepak	46acd76cc8	cql3: add value_view printer and equality comparison BOOST_CHECK_*() expect compared objects to be equality-comparable and printable.	2018-07-18 12:28:06 +01:00
Paweł Dziepak	24929fd2ce	transport: move response outside of cql_server class	2018-07-18 12:28:06 +01:00
Paweł Dziepak	5986e7a383	transport: drop request_reader::read_value()	2018-07-18 12:28:06 +01:00
Paweł Dziepak	72450e2f7f	transport: extract request reading to request_reader	2018-07-18 12:28:06 +01:00
Paweł Dziepak	1eeef4383c	transport: fix use-after-free in read_name_and_value_list()	2018-07-18 12:28:06 +01:00
Avi Kivity	31151cadd4	Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz " The problem happens under the following circumstances: - we have a partially populated partition in cache, with a gap in the middle - a read with no clustering restrictions trying to populate that gap - eviction of the entry for the lower bound of the gap concurrent with population The population may incorrectly mark the range before the gap as continuous. This may result in temporary loss of writes in that clustering range. The problem heals by clearing cache. Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been failing sporadically. The problem is in ensure_population_lower_bound(), which returns true if current clustering range covers all rows, which means that the populator has a right to set continuity flag to true on the row it inserts. This is correct only if the current population range actually starts since before all clustering rows. Otherwise, we're populating since _last_row and should consult it. Fixes #3608. " * 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla: row_cache: Fix violation of continuity on concurrent eviction and population position_in_partition: Introduce is_before_all_clustered_rows()	2018-07-18 10:11:34 +03:00
Asias He	506eed325a	dht: Fix typo in boot_strapper.cc Eror -> Error Message-Id: <ab1050c526f6e70c3a365595376acde7706d86e9.1531877929.git.asias@scylladb.com>	2018-07-18 10:00:27 +03:00
Tomasz Grabiec	894961006b	Merge "db/view/view_builder: Fixes to bookkeeping" from Duarte This series contains a couple of fixes to the bookkeeping of the view build process, which could cause data to be left behind in the system tables. * git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1: Duarte Nunes (3): db/system_keyspace: Add function to remove view build status of a shard db/view: Don't have shard 0 clear other shard's status on drop db/view: Restrict writes to the distributed system keyspace to shard 0	2018-07-17 18:01:28 +02:00
Tomasz Grabiec	25d09e51ac	Merge "db/view/build_progress_virtual_reader: Fixes to clustering key adjusts" from Duarte This series contains a couple of fixes to the adjusting of clustering keys in the build_progress_virtual_reader, some of which could potentially cause heap overflows when querying the legacy system table. * git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1: Duarte Nunes (3): db/view/build_progress_virtual_reader: Use correct schema to adjust ck db/view/build_progress_virtual_reader: Fix full ck detection db/view/build_progress_virtual_reader: Also adjust end RT bound	2018-07-17 18:00:30 +02:00
Avi Kivity	9ffa6b9ad6	Merge "Fix leaks and corruption of continuity in cache in case of bad_alloc from key linearization" from Tomasz " This series fixes two issues related to bad_allocs and keys which require linearization (larger than 12.8 KiB). With such keys, comparators may throw if memory allocation fails. This may cause lookups in partition and rows trees to fail with bad_alloc. The first issue (#3583) was that partition version merging (mutation_partition::apply_monotonically()) was not taking into account that lookups may fail. If we fail, the partition which is being applied may be incorrectly left with the clustering range from the beginning of the range up to the current row marked as continuous, if the current row has the continuity flag set, because we've moved all of the preceding rows into the target, and the correct lower bound row is no longer there in the source. This may mark some discontinuous ranges as continuous. Merging is retried by allocating_section, and there will be no problem if it eventually succeeds: the original continuity will be reflected in the sum. The problem will persist if it doesn't eventually succeed, when we're really out of memory. The user-perceivable effect of this would be temporary loss of writes in the clustering range which was marked as continuous but shouldn't have been. Introduced in 2.2-rc1. The second issue (#3585) is that the code which inserts partitions in memtable and cache will leak the entry if boost::intrusive::set::insert() throws. This will also cause SIGSEGV when cache tries to evict from such a leaked entry. 
" * tag 'tgrabiec/fix-bad-continuity-on-oom-in-apply-v2' of github.com:tgrabiec/scylla: managed_bytes: Mark read_linearize() as an allocation point tests: Relax expectation about continuity after failed merging tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging tests: Switch to seastar's allocation failure injector mutation_partition: Introduce set_continuity() clustering_interval_set: Introduce contained_in() clustering_interval_set: Introduce add() overload accepting another interval set mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc mutation_partition: Preserve continuity in case row merging with no tracker throws memtable, cache: Fix exception safety of partition entry insertions	2018-07-17 18:19:37 +03:00
Tomasz Grabiec	477d7b439b	row_cache: Fix violation of continuity on concurrent eviction and population ensure_population_lower_bound() returned true if current clustering range covers all rows, which means that the populator has a right to set continuity flag to true on the row it inserts. This is correct only if the current population range actually starts since before all clustering rows. Otherwise we're populating since _last_row, and should consult it. The fix introduces a new flag, set when starting to populate, which indicates if we're populating from the beginning of the range or not. We cannot simply check if _last_row is set in ensure_population_lower_bound() because _last_row can be set and then become empty again. Fixes #3608	2018-07-17 16:43:21 +02:00
Tomasz Grabiec	8d47d21149	position_in_partition: Introduce is_before_all_clustered_rows()	2018-07-17 16:43:21 +02:00
Tomasz Grabiec	612b223819	managed_bytes: Mark read_linearize() as an allocation point	2018-07-17 16:39:43 +02:00
Tomasz Grabiec	be678a81ee	tests: Relax expectation about continuity after failed merging Currently we check that the sum of continuities is exactly the same as expected on failure. Relax this to require that continuity is not broader, since in some bad_alloc scenarios, or preemption, we will have to mark some ranges as discontinuous.	2018-07-17 16:39:43 +02:00
Tomasz Grabiec	f366ac76e8	tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	d9db79a85d	tests: Switch to seastar's allocation failure injector It catches more allocation sites.	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	6b1fe6cbe5	mutation_partition: Introduce set_continuity()	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	ac772cbd81	clustering_interval_set: Introduce contained_in()	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	d24ebe8565	clustering_interval_set: Introduce add() overload accepting another interval set	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	c6c54021a8	mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc When clustering keys are larger than 12.8 KiB they may get fragmented and key comparator will need to linearize them on comparison. This may cause lookups in the rows tree to fail with bad_alloc. Partition version merging (mutation_partition::apply_monotonically()) was not taking this into account. If we fail on lookup, the partition which is being applied may be incorrectly left with the clustering range from the beginning up to the current row marked as continuous, if the current row has the continuity flag set, because we've moved all of the preceding rows into the target, and the correct lower bound row is no longer there in the source. This may mark some discontinuous ranges as continuous. Merging is retried by allocating_section, and there will be no problem if it eventually succeeds: the original continuity will be reflected in the sum. The problem will persist if it doesn't eventually succeed, when we're really out of memory. To protect against this, we could reset the continuity flag of the current row in the source when exiting on exception. Fixes #3583	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	de5c52f422	mutation_partition: Preserve continuity in case row merging with no tracker throws Example: p: row{key=A, cont=0} row{key=C, cont=1} this: row{key=C, cont=0} When we get to processing key=C, key=A was already moved to this, so p has stale continuity on key=C, which marks (-inf,C) as continuous, whereas it should mark only (A, C). That's not a problem if merging succeeds, but if exception happens at this point, we will violate the invariant which says that the sum of p and this should yield the same logical partition. It wouldn't because continuity of the sum is calculated as a set union, and (-inf, A) would be incorrectly turned into a continuous range. This is not a problem currently because continuity is always full when there is no tracker (memtables), so won't change anyway, and when there is a tracker (cache) we never merge but overwrite instead, so there is no memory allocation and thus no possibility for failure. But better be safe.	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	567da3e063	memtable, cache: Fix exception safety of partition entry insertions boost::intrusive::set::insert() may throw if keys require linearization and that fails, in which case we will leak the entry. When this happens in cache, we will also violate the invariant for entry eviction, which assumes all tracked entries are linked, and cause a SEGFAULT. Use the non-throwing and faster insert_before() instead. Where we can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure that entry is deallocated on insert failure. Fixes #3585.	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	c82c0be0be	tests: mutation_diff: Ignore differences in memory addresses Differences in memory addresses are not necessarily differences in values. Refs #3571 Message-Id: <1531824919-12737-1-git-send-email-tgrabiec@scylladb.com>	2018-07-17 16:32:04 +03:00
Amos Kong	0fcdab8538	scylla_setup: nic setup dialog is only for interactive mode The current code raises the dialog even in non-interactive mode, when we pass options while executing scylla_setup. This blocked the automatic artifact-test. Fixes #3549 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <58f90e1e2837f31d9333d7e9fb68ce05208323da.1531824972.git.amos@scylladb.com>	2018-07-17 16:31:18 +03:00
Paweł Dziepak	422d1eaeb9	Merge "Improve usability of pkeys in system.large_partitions table" from Avi " Partition keys are currently stored in serialized form in the system.large_partitions table. This is an obstacle to operators who usually can't deserialize partition keys in their heads. Improve the situation by deserializing the partition key for them. " * tag 'pkey-print/v1' of https://github.com/avikivity/scylla: large_partition_handler: output friendly partition key keys: schema-aware printing of a partition_key	2018-07-17 13:51:22 +01:00
Avi Kivity	002ac87aac	Update seastar submodule * seastar aac6cf1...6b97e00 (5): > Merge "changes to fix travis CI builds" from Kefu > tls.cc: Make "close" timeout delay exception proof > core/sharded: mark foreign_ptr::get_owner_shard() const > core/memory: Expose counter of large allocations > tests: add test for multi-fragmented net::packet Fixes #3461. Ref scylladb/seastar#474.	2018-07-17 15:43:01 +03:00
Tomasz Grabiec	3f509ee3a2	mutation_partition: Fix exception-safety of row copy constructor In case population of the vector throws, the vector object would not be destroyed. It's a managed object, so in addition to causing a leak, it would corrupt memory if later moved by the LSA, because it would try to fixup forward references to itself. Caused sporadic failures and crashes of row_cache_test, especially with allocation failure injector enabled. Introduced in `27014a23d7`. Message-Id: <1531757764-7638-1-git-send-email-tgrabiec@scylladb.com>	2018-07-17 13:21:21 +01:00
Asias He	fd71c5718f	gossip: Reduce continuous memory usage Gossip SYN and ACK uses std::vector to store a list of gossip_digest, the larger the cluster, the more continuous memory is needed. To reduce the memory pressure which might cause std::bad_alloc, switch the std::vector to chunked_vector. In addition, change add_local_application_state to use std::list instead of std::vector. Refs #2782	2018-07-17 20:15:32 +08:00
Avi Kivity	acb3163639	large_partition_handler: output friendly partition key Use abstract_type::to_string() to prettify partition key components. Manually tested by setting --compaction-large-partition-warning-threshold-mb to zero and inspecting the output for compound and non-compound partition keys.	2018-07-17 14:44:52 +03:00
Avi Kivity	bfd14b4123	keys: schema-aware printing of a partition_key Add a with_schema() helper to decorate a partition key with its schema for pretty-printing purposes, and matching operator<<. This is useful to print partition keys where the operator, who may not be familiar with the encoding, may see them.	2018-07-17 14:43:12 +03:00
Tomasz Grabiec	d94c7c07a3	lsa: Disable alloc failure injector inside the LSA sanitizer Message-Id: <1531814822-30259-1-git-send-email-tgrabiec@scylladb.com>	2018-07-17 11:27:56 +01:00
Asias He	77018b7304	to_string: Add std::list and utils::chunked_vector support It will be used by the gossip code.	2018-07-17 16:14:31 +08:00
Asias He	e4802d2fe3	serializer: Add chunked_vector support It will be used by the gossip SYN and ACK message soon.	2018-07-17 16:12:50 +08:00
Botond Dénes	cc4acb6e26	storage_proxy: use the original row limits for the final results merging `query_partition_key_range()` does the final result merging and trimming (if necessary) to make sure we don't send more rows to the client than requested. This merging and trimming is done by a continuation attached to the `query_partition_key_range_concurrent()` which does the actual querying. The continuation captures by value the `row_limit` and `partition_limit` fields of the `query::read_command` object of the query. This has an unexpected consequence. The lambda object is constructed after the call to `query_partition_key_range_concurrent()` returns. If this call doesn't defer, any modifications made to the read command object by `query_partition_key_range_concurrent()` will be visible to the lambda. This is undesirable because `query_partition_key_range_concurrent()` updates the read command object directly as the vnodes are traversed which in turn will result in the lambda doing the final trimming according to a decremented `row_limits`, which will cause the paging logic to declare the query as exhausted prematurely because the page will not be full. To avoid all this make a copy of the relevant limit fields before `query_partition_key_range_concurrent()` is called and pass these copies to the continuation, thus ensuring that the final trimming will be done according to the original page limits. Spotted while investigating a dtest failure on my 1865/range-scans/v2 branch. On that branch the way range scans are executed on replicas is completely refactored. These changes apparently reduce the number of continuations in the read path to the point where an entire page can be filled without deferring and thus causing the problem to surface. Fixes #3605. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>	2018-07-16 16:54:50 +03:00
Takuya ASADA	9479ff6b1e	dist/common/scripts/scylla_prepare: fix error when /etc/scylla/ami_disabled exists On this part shell command wasn't converted to python3, need to fix. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180715075015.13071-1-syuu@scylladb.com>	2018-07-16 09:29:38 +03:00
Avi Kivity	c4013f6fe1	messaging: categorize more streaming/repair verbs as streaming Since the messaging service will assign a scheduling group based on the client index, it's more important now to get the verbs categorized correctly. Re-categorize REPLICATION_FINISHED, REPAIR_CHECKSUM_RANGE, and most importantly STREAM_MUTATION_FRAGMENTS to the repair/streaming oriented connections so we get the correct scheduling.	2018-07-15 15:44:10 +03:00
Avi Kivity	ff3d7839ab	messaging: remove default when computing rpc client index A default means that when adding new verbs, we may forget to categorize a verb correctly. Without the default, the compiler will complain due to -Wswitch.	2018-07-15 15:40:29 +03:00
Avi Kivity	fe2db68be8	messaging: convert do_get_rpc_client_idx into a switch A switch is more readable for multiple choice with no clearly preferred choice.	2018-07-15 15:26:50 +03:00
Avi Kivity	3b1e04091c	messaging: choose connection index via a look-up table Looking up is faster than a bunch of if()s.	2018-07-15 15:21:06 +03:00
Takuya ASADA	1511d92473	dist/redhat: drop scylla_lib.sh from .rpm Since we dropped scylla_lib.sh at `58e6ad22b2`, we need to remove it from the RPM spec file too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180712155129.17056-1-syuu@scylladb.com>	2018-07-15 14:46:22 +03:00
Avi Kivity	ef9b36376c	Merge "database: support multiple data directories" from Glauber " While Cassandra supports multiple data directories, we have been historically supporting just one. The one-directory model suits us better because of the I/O Scheduler and so far we have seen very few requests -- if any, to support this. Still, the infrastructure needed to support multiple directories can be beneficial so I am trying to bring this in. For simplicity, we will treat the first directory in the list as the main directory. By being able to still associate one singular directory with a table, most of the code doesn't have to change and we don't have to worry about how to distribute data between the directories. In this design: - We scan all data directories for existing data. - resharding only happens within a particular data directory. - snapshot details are accumulated with data for all directories that host snapshots for the tables we are examining - snapshots are created with files in its own directories, but the manifest file goes to the main directory. For this one, note that in Cassandra the same thing happens, except that there is no "main" directory. Still the manifest file is still just in one of them. - SSTables are flushed into the main directory. - Compactions write data into the main directory Despite the restrictions, one example of usage of this is recovery. If we have network attached devices for instance, we can quickly attach a network device to an existing node and make the data immediately available as it is compacted back to main storage. 
Tests: unit (release) " * 'multi-data-file-v2' of github.com:glommer/scylla: database: change ident database: support multiple data directories database: allow resharing to specify a directory database: support multiple directories in get_snapshot_details database: move get_snapshot_info into a seastar::thread snapshots: always create the snapshot directory sstables: pass sstable dir with entry descriptor database: make nodetool listsnapshots print correct information sstables: correctly create descriptors for snapshots	2018-07-15 13:31:04 +03:00
Avi Kivity	8ee807321f	Merge "scylla streaming with rpc streaming" from Asias " This work is on top of Gleb's rpc streaming which is merged recently. What this series does is to replace scylla streaming service's data plane to use the new rpc streaming instead of the old rpc verb to send the mutations for scylla streaming. Other parts of scylla streaming, the control plane, are not changed. In my test, to bootstrap a new node to the existing one node cluster, smp 2, scylla stores data on ramdisk to minimize disk io impact. I saw a 2x improvement in streaming bandwidth. Before: [shard 0] stream_session - [Stream #2ae92320-5fc8-11e8-911a-000000000000] Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1570312 KiB, 109521.02 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 14.338 seconds After: [shard 0] stream_session - [Stream #e5589ac0-5fc7-11e8-b463-000000000000] Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1546875 KiB, 220415.36 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 7.018 seconds Tests: dtest update_cluster_layout_tests.py Fixes: #3591 " * tag 'asias/scylla_streaming_with_rpc_streaming_v8' of github.com:scylladb/seastar-dev: streaming: Add rpc streaming support storage_service: Introduce STREAM_WITH_RPC_STREAM feature streaming: Add estimate_partitions to send_info messaging_service: Add streaming with rpc streaming support messaging_service: Add streaming_domain database: Add add_sstable_and_update_cache database: Add make_streaming_sstable_for_write	2018-07-15 12:36:52 +03:00
Vlad Zolotarov	235520292e	utils::loading_cache: hold a shared_value_ptr to the value when we reload This allows to remove the requirement to hold the key value inside the _load callback if its value is needed in the asynchronous continuation inside the callback in the context of a reload. This also resolves the use-after-free issue when a _load() callback removes the item for a given key. See `a9b72db34d`.1528794135.git.bdenes%40scylladb.com for a discussion about this. In addition this patch makes the loading_cache more robust for any existing and potential situations when cached entries are being removed from inside the callback. This is achieved by extending the idea implemented by Duarte in the "utils/loading_cache: Avoid using invalidated iterators" by capturing timestamped_val_ptr (which is essentially a lw_shared_ptr to an intrusive set entry which holds both the key and the cached value) instead of a naked pointer. Tests {debug, release}: - Unit tests: - loading_cache_test - view_build_test - auth_test - auth_resource_test - dtest: - auth_test.py Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-13 11:27:58 -04:00
Vlad Zolotarov	b44ad5677a	utils::loading_cache::on_timer(): remove not needed capture of "this" Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-13 11:27:43 -04:00
Vlad Zolotarov	4aa0e5914b	utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload The list of elements that needs to be reloaded may be rather large. Use chunked_vector in order to make the allocator's life easier. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-13 09:53:59 -04:00
Avi Kivity	8c993e0728	messaging: tag RPC services with scheduling groups Assign a scheduling_group for each RPC service. Assignment is done by connection (get_rpc_client_idx()) - all verbs on the same connection are assigned the same group. While this may seem arbitrary, it avoids priority inversion; if two verbs on the same connection have different scheduling groups, the verb with the low shares may cause a backlog and stall the connection, including following requests from verbs that ought to have higher shares. The scheduling_group parameters are encapsulated in different classes as they are passed around to avoid adding dependencies. Message-Id: <20180708140433.6426-1-avi@scylladb.com>	2018-07-13 13:57:08 +02:00
Vladimir Krivopalov	cf7b42619d	clustering_ranges_walker: Improve class consistency and readability. This patch addresses several issues. 1. The class no longer uses placement-new trick for move-assignment. It was incorrect to use because the class contains const references and re-initializing the same region of memory would result in undefined behaviour on accessing these members. 2. Use boost::iterator_range for tracking the current range of cr_ranges. It is easier to deal with and avoids possible bugs like assigning only one of two iterators Message-Id: <4096182c4ee2fb1157e135c487c41012b266ba69.1531440684.git.vladimir@scylladb.com>	2018-07-13 11:23:33 +02:00
Asias He	deff5e7d60	streaming: Add rpc streaming support This patch changes scylla streaming to use the recently added rpc streaming feature provided by seastar to send mutation fragments for scylla streaming instead of the rpc verbs. It also changes the receiver to write to the sstable file directly, skipping writing to memtable.	2018-07-13 08:36:47 +08:00
Asias He	71e22fe981	storage_service: Introduce STREAM_WITH_RPC_STREAM feature With this feature, the node supports scylla streaming using the rpc streaming.	2018-07-13 08:36:47 +08:00
Asias He	faa6769cdb	streaming: Add estimate_partitions to send_info The sender needs to estimate the number of partitions to send, because the receiver needs this to prepare the sstables.	2018-07-13 08:36:46 +08:00
Asias He	ddfb4590ce	messaging_service: Add streaming with rpc streaming support Preparation for adding rpc streaming in scylla streaming. - register_stream_mutation_fragments is used to register the rpc streaming verb - make_sink_and_source_for_stream_mutation_fragments is used to get the sink and source object for the sender - make_sink_for_stream_mutation_fragments is used to get a sink object for the receiver	2018-07-13 08:36:46 +08:00
Asias He	671e1b08fe	messaging_service: Add streaming_domain The rpc streaming needs a streaming_domain id for the same logical server. Chose one for our messaging service.	2018-07-13 08:36:46 +08:00
Asias He	6540051f77	database: Add add_sstable_and_update_cache Since we can write mutations to sstable directly in streaming, we need to add those sstables to the system so they can be seen by the query. Also we need to update the cache so the query reflects the latest data.	2018-07-13 08:36:45 +08:00
Asias He	dfc2739625	database: Add make_streaming_sstable_for_write This will be used to create sstable for streaming receiver to write the mutations received from network to sstable file instead of writing to memtable.	2018-07-13 08:36:45 +08:00
Takuya ASADA	ee61660b76	dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2 Since some AMIs use consistent network device naming, the primary NIC ifname is not always 'eth0'. But we hardcoded the NIC name as 'eth0' in scylla_ec2_check, so we need to add a --nic option to specify a custom NIC ifname. Fixes #3584 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180712142446.15909-1-syuu@scylladb.com>	2018-07-12 18:22:28 +03:00
Tomasz Grabiec	b17f7257a9	sstables: index_reader: Reduce size of index_entry by indirecting promoted_index Reduces size of index_entry from 384 bytes to 64 bytes by using indirection for the optional promoted index instead of embedding it. Improves query time from 9ms to 4ms in a micro benchmark with a very large index page. Message-Id: <1531406354-10089-1-git-send-email-tgrabiec@scylladb.com>	2018-07-12 17:46:58 +03:00
Tomasz Grabiec	101dcdbb48	gdb: Fix scylla heapprof command Type of _frames was changed to static_vector<> Message-Id: <1531233685-20786-2-git-send-email-tgrabiec@scylladb.com>	2018-07-12 16:51:30 +03:00
Tomasz Grabiec	059133ffa8	gdb: Introduce iteration wrapper for static_vector Message-Id: <1531233685-20786-1-git-send-email-tgrabiec@scylladb.com>	2018-07-12 16:51:30 +03:00
Duarte Nunes	63b63b0461	utils/loading_cache: Avoid using invalidated iterators When periodically reloading the values in the loading_cache, we would iterate over the list of entries and call the load() function for those which need to be reloaded. For some concrete caches, load() can remove the entry from the LRU set, and can be executed inline from the parallel_for_each(). This means we could potentially keep iterating using an invalidated iterator. Fix this by using a temporary container to hold those entries to be reloaded. Spotted when reading the code. Also use if constexpr and fix the comment in the function containing the changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180712124143.13638-1-duarte@scylladb.com>	2018-07-12 13:59:09 +01:00
Botond Dénes	2e7bf9c6f9	loading_cache::reload(): obtain key before calling _load() The continuation attached to _load() needs the key of the loaded entry to check whether it was disposed during the load. However if _load() invalidates the entry the continuation's capture line will access invalid memory while trying to obtain the key. To avoid this save a copy of the key before calling _load() and pass it to both _load() and the continuation. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>	2018-07-12 13:42:42 +01:00
Avi Kivity	a4a2f743a8	Merge "Avoid large allocations when reading sstable index pages" from Tomasz " If there is a lot of partitions in the index page, index_list may grow large and require large contiguous blocks of memory, because it's based on std::vector. That puts pressure on the memory allocator, and if memory is fragmented, may not be possible to satisfy without a lot of eviction. Switch to chunked_vector to avoid this. Refs #3597 " * 'tgrabiec/avoid-large-alloc-in-index-reader' of github.com:tgrabiec/scylla: sstables: Switch index_list to chunked_vector to avoid large allocations utils: chunked_vector: Do not require T to be default-constructible for clear() utils: chunked_vector: Implement front()	2018-07-12 15:30:18 +03:00
Duarte Nunes	1fb3b924f4	utils/loading_cache: Remove superfluous continuation Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180712122031.13424-1-duarte@scylladb.com>	2018-07-12 15:22:35 +03:00
Takuya ASADA	8f80d23b07	dist/common/scripts/scylla_util.py: fix typo Fix typo, and rename get_mode_cpu_set() to get_mode_cpuset(), since the term 'cpuset' is written without '_' elsewhere. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180711141923.12675-1-syuu@scylladb.com>	2018-07-12 10:14:55 +03:00
Tomasz Grabiec	8c85b01ad3	gdb: Fix scylla lsa-segment on python 3 Referring to a function parameter via "global" no longer works on python 3. We should be using "nonlocal", which is absent on python 2 though. To make the script work on both, inline next(). Message-Id: <1531317984-29224-1-git-send-email-tgrabiec@scylladb.com>	2018-07-12 10:14:22 +03:00
Duarte Nunes	a7fdf4fc49	Merge 'ALLOW FILTERING for indexed queries' from Piotr " Previous series on ALLOW FILTERING introduced it for regular queries, but it's also possible to have an indexed query which requires filtering. This series contains minor fixes that allow treating indexed+filtered queries properly. The most important part is having more selective approach of extracting values from restrictions in read_posting_list() helper function. Before ALLOW FILTERING, restrictions contained only a single entry that matched the indexed column, but it's not the case with filtering (and it won't be the case with multiple indexing support). This series also comes with test cases for indexed+filtered queries. Tests: unit (release) " * 'allow_filtering_and_si_3' of https://github.com/psarna/scylla: tests: add filtering indexed queries tests cql3: use single restriction value in index creation cql3: add secondary index condition to need_filtering cql3: add value_for method cql3: add missing inline declarations to restrictions cql3: make index detection more specific index: add target_column getter to index	2018-07-12 00:17:36 +01:00
Duarte Nunes	55caaec411	db/view/build_progress_virtual_reader: Also adjust end RT bound Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	eda6b88b0e	db/view/build_progress_virtual_reader: Fix full ck detection As an optimization, the virtual reader doesn't change the underlying key if it is not full, and hence doesn't include the extra clustering key. However, this detection is broken because it checked for 3 clustering columns, instead of 2. This patch fixes that by obtaining the clustering key size from the underlying schema instead of hardcoding the size. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	ff3a0d437a	db/view/build_progress_virtual_reader: Use correct schema to adjust ck The virtual reader adjusts clustering keys obtained from the underlying, scylla-specific schema, and potentially sheds the extra clustering key that's absent from the Cassandra-compatible schema. This patch ensures we use the correct schema to iterate over the key. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	df66d7db59	db/view: Restrict writes to the distributed system keyspace to shard 0 Writing to the distributed system keyspace should be confined to a single shard of each host, namely shard 0. We were violating this constraint by having all shards set the host status to "started". This could be problematic when the build finishes quickly or there's a concurrent view drop, such that a write done by shard 0 can have a smaller timestamp than one done by some other shard. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Duarte Nunes	e683c1367f	db/view: Don't have shard 0 clear other shard's status on drop Shard 0 can clear the in-progress build status of all shards when a view finishes building, because we are ensured all writes to the system table have completed with earlier timestamps. This is not the case when dropping a view. A drop can happen concurrently with the build, in which case shard 0 may process the notification before another shard receives it, and before that shard writes to the system table. Fix this by ensuring each shard clears its own status on drop. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Duarte Nunes	2fa7f10429	db/system_keyspace: Add function to remove view build status of a shard This patch adds a function that clears the view build in-progress status for the current shard, similar to the existing one that clears it across all shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:27:39 +01:00
Piotr Sarna	fcfbc804e4	tests: add filtering indexed queries tests Tests covering ALLOW FILTERING usage while using secondary indexes as well are added to cql_query_test. Tests are based on Cassandra's test suite for filtering secondary indexes + some more simple cases.	2018-07-11 18:06:21 +02:00
Piotr Sarna	7d9715db27	cql3: use single restriction value in index creation ALLOW FILTERING support caused index-related restrictions to possibly have more values. In order to remain correct, only those restrictions which match the indexed columns should be used.	2018-07-11 18:06:21 +02:00
Piotr Sarna	1d75035672	cql3: add secondary index condition to need_filtering A query that restricts a partition key and an indexed column needs filtering (after reading an index) and it wasn't properly detected before.	2018-07-11 18:06:21 +02:00
Piotr Sarna	80ce9b72a1	cql3: add value_for method In order to extract value from a restriction for just one column, value_for(column_name, options) method is implemented. It's needed because once ALLOW FILTERING support was introduced, index-related restrictions may contain more than 1 value.	2018-07-11 18:06:21 +02:00
Piotr Sarna	c1ad28f28e	cql3: add missing inline declarations to restrictions In order to prevent future compilation errors, externally defined class methods from single column primary key restrictions are explicitly marked inline.	2018-07-11 18:06:21 +02:00
Piotr Sarna	02811d8996	cql3: make index detection more specific Conditions that detect if restrictions need an indexed query weren't specific enough to work properly with mixed index-filtering queries, because they would too eagerly assume that partition/clustering key restrictions have a backing index.	2018-07-11 18:06:21 +02:00
Piotr Sarna	372644c909	index: add target_column getter to index Target column for an index is later needed to find matching restrictions.	2018-07-11 18:06:21 +02:00
Tomasz Grabiec	3b2890e1db	sstables: Switch index_list to chunked_vector to avoid large allocations If there is a lot of partitions in the index page, index_list may grow large and require large contiguous blocks of memory. That puts pressure on the memory allocator, and if memory is fragmented, may not be possible to satisfy without a lot of eviction.	2018-07-11 16:55:20 +02:00
Tomasz Grabiec	b0f5df10d2	utils: chunked_vector: Do not require T to be default-constructible for clear() resize(), used by clear(), requires T to be default-constructible in case the vector is expanded. It's not actually needed for clearing, and there will be users which use clear() with non-default-constructible T, so implement clear() without using resize().	2018-07-11 16:55:20 +02:00
Tomasz Grabiec	03832dab97	utils: chunked_vector: Implement front() std::vector<> has it, so should this, for easy migration.	2018-07-11 16:55:20 +02:00
Piotr Sarna	dcdd8be59c	cql3: make index-related tests less timing dependent Indexes and materialized views take time to build, so checks that rely on that are now wrapped with 'eventually' blocks. Message-Id: <6d3def2bc49b76dda11d7a1c9974a8b3d221003f.1531312518.git.sarna@scylladb.com>	2018-07-11 15:45:52 +03:00
Takuya ASADA	58e6ad22b2	dist/common/scripts: drop scylla_lib.sh Drop scylla_lib.sh since all bash scripts that depended on the library have already been converted to python3, and all scylla_lib.sh features are implemented in scylla_util.py. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180711114756.21823-1-syuu@scylladb.com>	2018-07-11 14:54:56 +03:00
Avi Kivity	83d72f3755	Update scylla-ami submodule * dist/ami/files/scylla-ami 5200f3f...d53834f (1): > Merge "AMI scripts python3 conversion" from Takuya	2018-07-11 13:16:08 +03:00
Avi Kivity	693cf77022	Merge "more conversion from bash to python3" from Takuya "Converted more scripts to python3." * 'script_python_conversion2_v2' of https://github.com/syuu1228/scylla: dist/common/scripts/scylla_util.py: make run()/out() functions shorter dist/ami: install python34 to run scylla_install_ami dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance dist/common/scripts: drop class concolor, use colorprint() dist/ami/files/.bash_profile: convert almost all lines to python3 dist/common/scripts: convert node_exporter_install to python3 dist/common/scripts: convert scylla_stop to python3 dist/common/scripts: convert scylla_prepare to python3	2018-07-11 13:14:23 +03:00
Tomasz Grabiec	1de5177175	tests: row_cache: Fix use-after-scope on partition_range passed to readers The partition_range must outlive the reader. Message-Id: <1531301583-15476-1-git-send-email-tgrabiec@scylladb.com>	2018-07-11 12:39:30 +03:00
Avi Kivity	28621066e6	observable: allow an observable to disconnect() twice without penalty Message-Id: <20180711070754.13286-1-avi@scylladb.com>	2018-07-11 10:15:01 +01:00
Avi Kivity	1895483781	observable: add comments explaining the purpose and use of the mechanism Message-Id: <20180710133706.8791-1-avi@scylladb.com>	2018-07-11 10:15:01 +01:00
Avi Kivity	99d3f0a1b1	tests: add observable_test to test suite Message-Id: <20180711071131.13702-1-avi@scylladb.com>	2018-07-11 10:15:01 +01:00
Tomasz Grabiec	fde4a312db	gdb: Replace long() with int() Python 3 doesn't have 'long' anymore, so commands using it fail with newer GDB. long on python2 is the same as int on python3, both are arbitrary-precision. On python2 int is fixed-precision, but seems to be still enough (64 bit), so use that instead. Message-Id: <1531215600-31899-1-git-send-email-tgrabiec@scylladb.com>	2018-07-10 15:05:02 +03:00
Nadav Har'El	5e47061438	repair: fix small error-handling logic mistake As noticed by Tomasz Grabiec, we test a future's available() after having already waited for it with when_all(), which is pointless. The code after the wrong if() exchanges the contents of a token-range between this node and several other live neighbors; We can't do this exchange if either this node is broken or there is no other live neighbor. So this is what we needed to test. so !available() should have been failed(). Also the test for live_neighbors_checksum.empty() added in commit `7c873f0d1f` is unnecessary - we build live_neighbors and live_neighbors_checksum together, so if one of them is empty, so is the other. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180710114940.26027-1-nyh@scylladb.com>	2018-07-10 15:04:03 +03:00
Piotr Sarna	559439b6ea	tests: add more ALLOW FILTERING tests More test cases are added to cql_query_test in order to check ALLOW FILTERING clauses more accurately. Message-Id: <4c59c1f3eb01558be992d0596e5423c276087387.1531220558.git.sarna@scylladb.com>	2018-07-10 14:44:33 +03:00
Piotr Sarna	aadbfc6b84	cql3: throw instead of log for collection filtering Original series that introduced filtering logged a warning when collection restrictions appeared. Instead, an exception should be thrown until collection restrictions are supported for ALLOW FILTERING clauses. Message-Id: <ddaf342d4d6766fadb756f66e5afa0b99ce054f8.1531220558.git.sarna@scylladb.com>	2018-07-10 14:44:29 +03:00
Avi Kivity	7db394ce50	observable: switch to noncopyable_function std::function's move constructor is not noexcept, so observer's move constructor and assignment operator also cannot be. Switch to Seastar's noncopyable_function which provides better guarantees. Tests: observer_tests (release) Message-Id: <20180710073628.30702-1-avi@scylladb.com>	2018-07-10 09:42:49 +01:00
Avi Kivity	0a2c9387e8	Merge "Support reading deleted cells" from Piotr " Implement and test support for reading deleted cells in SSTables 3. " * 'haaawk/sstables3/read-deleted-cells-v2' of ssh://github.com/scylladb/seastar-dev: sstables: Test reading deleted cells from SST3 sstables: Support deleted cells in reading SST3 test_uncompressed_compound_ck_read: fix comment utils: add observer/observable templates	2018-07-10 11:21:00 +03:00
Piotr Jastrzebski	0abdd919c8	sstables: Test reading deleted cells from SST3 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-07-10 10:03:29 +02:00
Piotr Jastrzebski	54fc6dde35	sstables: Support deleted cells in reading SST3 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-07-10 10:03:29 +02:00
Piotr Jastrzebski	f64901fdac	test_uncompressed_compound_ck_read: fix comment Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-07-10 10:03:14 +02:00
Avi Kivity	96737d140f	utils: add observer/observable templates An observable is used to decouple an information producer from a consumer (in the same way as a callback), while allowing multiple consumers (called observers) to coexist and to manage their lifetime separately. Two classes are introduced: observable: a producer class; when an observable is invoked all observers receive the information observer: a consumer class; receives information from a observable Modelled after boost::signals2, with the following changes - all signals return void; information is passed from the producer to the consumer but not back - thread-unsafe - modern C++ without preprocessor hacks - connection lifetime is always managed rather than leaked by default - renamed to avoid the funky "slot" name Message-Id: <20180709172726.5079-1-avi@scylladb.com>	2018-07-09 18:48:44 +01:00
Paweł Dziepak	00a63663d6	bytes_ostream: increase max chunk size to 128 kB 128 kB is the size of the LSA segment and therefore the default size of any kind of chunks, fragments and buffers. Message-Id: <20180709155615.22500-1-pdziepak@scylladb.com>	2018-07-09 19:59:51 +03:00
Tomasz Grabiec	1336744a05	mutation_fragment: Fix clustering_row::equal() using incorrect column kind Incorrect column_kind was passed, which may cause wrong type to be used for comparison if schema contains static columns. Affects only tests. Spotted during code review. Message-Id: <1531144991-2658-1-git-send-email-tgrabiec@scylladb.com>	2018-07-09 15:25:17 +01:00
Avi Kivity	ed7855a8a6	Update seastar submodule * seastar 216d499...aac6cf1 (5): > reactor: pollable_fd: limit fragment count to IOV_MAX > tests: silence more "-Werror=sign-compare" warnings > reactor: include <boost/next_prior.hpp> > Use `#pragma once` everywhere > .gitignore: adds __pycache__ directory	2018-07-09 17:01:44 +03:00
Gleb Natapov	617666efb0	storage_proxy: use logger's exception printer to report read failure Use existing exception pretty printer since it handles nested exceptions. Message-Id: <20180709122826.GT28899@scylladb.com>	2018-07-09 15:31:14 +03:00
Duarte Nunes	156817e00e	db/size_estimates_virtual_reader: Use left-exclusive token ranges We were considering the token ranges in the size_estimates system table to be inclusive, which is incorrect and incompatible with Cassandra. While we ignore the inclusiveness of the partition_range bounds when selecting sstables, we do take it into account in estimated_keys_for_range(). We would thus select the correct sstables, but could over-estimate the range size nonetheless. Tests: virtual_reader_test(release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180709115919.5106-1-duarte@scylladb.com>	2018-07-09 15:26:32 +03:00
Takuya ASADA	1a5a40e5f6	dist/common/scripts/scylla_util.py: use os.open(O_EXCL) to verify disk is unused To simplify is_unused_disk(), just try to open the disk instead of checking multiple block subsystems. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180709102729.30066-1-syuu@scylladb.com>	2018-07-09 13:29:15 +03:00
Avi Kivity	7d0df2a06d	Update scylla-ami submodule * dist/ami/files/scylla-ami 67293ba...5200f3f (1): > Add custom script options to AMI user-data	2018-07-09 13:21:30 +03:00
Gleb Natapov	ac27d1c93b	storage_proxy: fix rpc connection failure handling by read operation Currently rpc::closed_error is not counted towards replica failure during read and thus read operation waits for timeout even if one of the nodes dies. Fix this by counting rpc::closed_error towards failed attempts. Fixes #3590. Message-Id: <20180708123522.GC28899@scylladb.com>	2018-07-09 10:05:31 +03:00
Avi Kivity	2f8537b178	database: demote "Setting compaction strategy" log message to debug level It's not very helpful in normal operation, and generates much noise, especially when there are many tables. Message-Id: <20180708070051.8508-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Takuya ASADA	929ba016ed	dist/common/scripts/scylla_util.py: strip double quote from sysconfig parameter The current sysconfig_parser.get() returns the parameter including double quotes, which causes problems when appending text using sysconfig_parser.set(). Fixes #3587 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180706172219.16859-1-syuu@scylladb.com>	2018-07-08 10:47:41 +03:00
Duarte Nunes	1beed0ca16	Merge 'hinted handoff: add rebalancing and unmark as experimental' from Vlad " This series adds the last missing part of the HH feature list (as in the design doc) - rebalancing; and finally removes the "experimental" tag from the HH. " * 'hinted_handoff_rebalance-v3' of https://github.com/vladzcloudius/scylla: main: remove the "experimental" tag from the hinted handoff feature db::hints::manager: implement rebalance() method	2018-07-07 20:38:07 +01:00
Takuya ASADA	a98b4b705c	dist/common/scripts/scylla_util.py: make run()/out() functions shorter Refactored these functions to make them simpler.	2018-07-08 01:13:36 +09:00
Takuya ASADA	e2a032f7ea	dist/ami: install python34 to run scylla_install_ami Since we switched scylla_install_ami to python3, need to install python3 before launching the script.	2018-07-08 01:13:36 +09:00
Takuya ASADA	4e04fb7d68	dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance There is duplicated code on both scylla_ec2_check and class aws_instance on scylla_util.py, so drop these code from scylla_ec2_check and use class aws_instance.	2018-07-08 01:13:36 +09:00
Takuya ASADA	99d5ca03e7	dist/common/scripts: drop class concolor, use colorprint() To print colored console output with simpler code, drop class concolor and use colorprint() instead.	2018-07-08 01:13:36 +09:00
Takuya ASADA	14d117363b	dist/ami/files/.bash_profile: convert almost all lines to python3 Since it's .bash_profile we cannot turn the file into a python3 script, but almost all lines are rewritten in python3; .bash_profile just launches it.	2018-07-08 01:13:35 +09:00
Takuya ASADA	25c3249d8d	dist/common/scripts: convert node_exporter_install to python3 Convert bash script to python3.	2018-07-08 01:13:35 +09:00
Takuya ASADA	505fcc92f7	dist/common/scripts: convert scylla_stop to python3 Convert bash script to python3.	2018-07-08 01:13:35 +09:00
Takuya ASADA	eb369942bd	dist/common/scripts: convert scylla_prepare to python3 Convert bash script to python3.	2018-07-08 01:13:35 +09:00
Vlad Zolotarov	7495c8e56d	dist: scylla_lib.sh: get_mode_cpu_set: split the declaration and assignment to the local variable In bash local variable declaration is a separate operation with its own exit status (always 0) therefore constructs like local var=`cmd` will always result in the 0 exit status ($? value) regardless of the actual result of "cmd" invocation. To overcome this we should split the declaration and the assignment to be like this: local var var=`cmd` Fixes #3508 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1529702903-24909-3-git-send-email-vladz@scylladb.com>	2018-07-07 18:04:19 +03:00
Vlad Zolotarov	f3ca17b1a1	dist: scylla_lib.sh: get_mode_cpu_set: don't let the error messages out References #3508 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1529702903-24909-2-git-send-email-vladz@scylladb.com>	2018-07-07 18:04:18 +03:00
Avi Kivity	e79fccdf7b	Update seastar submodule * seastar d7f35d7...216d499 (10): > temporary_buffer: Add clone method() > temporary_buffer: Make move-assignment operator noexcept. > deleter: Make move-assignment operator noexcept. > reactor: don't become inefficient when max_task_backlog is exceeded > reactor: switch cumulative time metrics resolution from nanoseconds to milliseconds > preempt: annotate for branch prediction > tests: silence "-Werror=sign-compare" warnings > Merge "Support one I/O Scheduler per device" from Glauber > rpc: make rpc server scheduling aware > Add SEASTAR_USER_CFLAGS and SEASTAR_ENABLE_WERROR	2018-07-07 17:48:25 +03:00
Vlad Zolotarov	c65a110839	main: remove the "experimental" tag from the hinted handoff feature Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:19:40 -04:00
Vlad Zolotarov	83ba6d84a1	db::hints::manager: implement rebalance() method Rebalance hints segments that need to be sent among all present shards. Ensure that after rebalancing the difference between the number of segments of any two shards is not greater than 1. Try to minimize the amount of "file rename" operations in order to achieve the needed result. Note: "Resharding" is a particular case of rebalancing. Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:18:46 -04:00
Piotr Sarna	77aa97f62a	cql3: fix ALLOW FILTERING iterator In original series cell iterator for regular cells was erroneously taken by copy instead of by reference, which will result in iterating over the first value indefinitely. Also, the same iterator was not updated for collections, which is fixed too. Message-Id: <83297adf8121de4fd37257c87f250d61ea9ec80b.1530892191.git.sarna@scylladb.com>	2018-07-06 17:23:12 +01:00
Duarte Nunes	0ec3ff0611	Merge 'Add ALLOW FILTERING metrics' from Piotr " This series addresses issue #3575 by adding 3 ALLOW FILTERING related metrics to help profile queries: * number of read request that required filtering * total number of rows read that required filtering * number of rows read that required filtering and matched Tests: unit (release) " * 'allow_filtering_metrics_4' of https://github.com/psarna/scylla: cql3: publish ALLOW FILTERING metrics cql3: add updating ALLOW FILTERING metrics cql3: define ALLOW FILTERING metrics	2018-07-06 11:19:37 +01:00
Piotr Sarna	4a435e6f66	cql3: publish ALLOW FILTERING metrics ALLOW FILTERING related metrics are registered and published. Fixes #3575	2018-07-06 12:00:37 +02:00
Piotr Sarna	03f2f8633b	cql3: add updating ALLOW FILTERING metrics Metrics related to ALLOW FILTERING queries are now properly updated on read requests.	2018-07-06 12:00:29 +02:00
Piotr Sarna	8cb242ab0b	cql3: define ALLOW FILTERING metrics The following metrics are defined for ALLOW FILTERING: * number of read requests that required filtering * total number of rows read that required filtering * number of rows read that required filtering and matched	2018-07-06 10:43:18 +02:00
Glauber Costa	82f7f7b36d	database: change indent Previous patches have used reviewer-oriented indentation. Re-indent. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 17:11:01 -04:00
Glauber Costa	99c8a1917f	database: support multiple data directories While Cassandra supports multiple data directories, we have been historically supporting just one. The one-directory model suits us better because of the I/O Scheduler and so far we have seen very few requests -- if any, to support this. Still, the infrastructure needed to support multiple directories can be beneficial so I am trying to bring this in. For simplicity, we will treat the first directory in the list as the main directory. By being able to still associate one singular directory with a table, most of the code doesn't have to change and we don't have to worry about how to distribute data between the directories. In this design: - We scan all data directories for existing data. - resharding only happens within a particular data directory. - snapshot details are accumulated with data for all directories that host snapshots for the tables we are examining - snapshots are created with files in its own directories, but the manifest file goes to the main directory. For this one, note that in Cassandra the same thing happens, except that there is no "main" directory. Still the manifest file is still just in one of them. - SSTables are flushed into the main directory. - Compactions write data into the main directory Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:58:39 -04:00
Glauber Costa	3b46984a1e	database: allow resharding to specify a directory resharding assumes that all SSTables will be in cf->dir(), but in reality we will soon have tables in other places. We can specify a directory in get_all_shared_sstables and specify that directory from the resharding process. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:58:08 -04:00
Glauber Costa	c8b2d441a8	database: support multiple directories in get_snapshot_details Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:58:08 -04:00
Glauber Costa	a8ccf4d1e6	database: move get_snapshot_info into a seastar::thread I am about to add another level of indentation and this code already shifts right too much. It is not performance critical, so let's use a thread for that. seastar::threads did not exist when this was first written. Also remove one unused continuation from inside the inner scan, simplifying its code. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:58:08 -04:00
Glauber Costa	919c7d6bb9	snapshots: always create the snapshot directory We currently don't always create the snapshot directory as an optimization. We have a test at sync time handling this use case. This works well when all SSTables are created in the same directory, but if we have more than one data directory then it may not work if we don't have SSTables in all data directories. We can fix it by unconditionally creating the directory. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:58:08 -04:00
Glauber Costa	86239e4e22	sstables: pass sstable dir with entry descriptor We have been assuming that all SSTables for a table will be in the same directory, and we pass the directory name to make_descriptor only because that's the way in ka and la to find out the keyspace and table names. However, SSTables for a given column family could be spread into multiple directories. So let's pass it down with the descriptor so we can load from the right place. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:45:26 -04:00
Glauber Costa	25a02c61d6	database: make nodetool listsnapshots print correct information nodetool listsnapshots is currently printing zero sizes for all snapshots The reason for that is that we are moving the snapshot directory name in the capture list, which can be evaluated by the compiler to happen before we use it as the function parameter. Fixes #3572 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:20:07 -04:00
Glauber Costa	4a62866104	sstables: correctly create descriptors for snapshots Our regular expression for parsing SSTable files tests for the directory for the la file format, since that file format does not include the ks/cf pair in the file name itself. However, the regular expression does not cover the case in which the SSTable files are coming from snapshots. This patch extends the regex so they are also covered. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-07-05 16:19:09 -04:00
Raphael S. Carvalho	dfd1e1229e	sstables/compaction_manager: fix typo in function name to reevaluate postponed compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180702185343.26682-1-raphaelsc@scylladb.com>	2018-07-05 18:54:14 +03:00
Takuya ASADA	4df982fe07	dist/common/scripts/scylla_sysconfig_setup: fix typo Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180705133313.16934-1-syuu@scylladb.com>	2018-07-05 16:38:14 +03:00
Avi Kivity	7a1bcd9ad3	Merge "Improve mutation printing in GDB" from Tomasz " This is a series of patches which make it possible for a human to examine contents of cache or memtables from GDB. " * 'tgrabiec/gdb-cache-printers' of github.com:tgrabiec/scylla: gdb: Add pretty printer for managed_vector gdb: Add pretty printer for rows gdb: Add mutation_partition pretty printer gdb: Add pretty printer for partition_entry gdb: Add pretty printer for managed_bytes gdb: Add iteration wrapper for intrusive_set_external_comparator gdb: Add iteration wrapper for boost intrusive set	2018-07-05 14:08:58 +03:00
Avi Kivity	f55a2fe3a7	main: improve reporting of dns resolution errors A report that C-Ares returned some errors tells the user nothing. Improve the error message by including the name of the configuration variable and its value. Message-Id: <20180705084959.10872-1-avi@scylladb.com>	2018-07-05 10:24:41 +01:00
Duarte Nunes	c126b00793	Merge 'ALLOW FILTERING support' from Piotr " The main idea of this series is to provide a filtering_visitor as a specialised result_set_builder::visitor implementation that keeps restriction info and applies it on query results. Also, since allow_filtering checking is not correct now (e.g. #2025) on select_statement level, this series tries to fix any issues related to it. Still in TODO: * handling CONTAINS relation in single column restriction filtering * handling multi-column restrictions - especially EQ, which can be split into multiple single-column restrictions * more tests - it's never enough; especially esoteric cases like filtering queries which also use secondary indexes, paging tests, etc. Tests: unit (release) " * 'allow_filtering_6' of https://github.com/psarna/scylla: tests: add allow_filtering tests to cql_query_test cql3: enable ALLOW FILTERING service: add filtering_pager cql3: optimize filtering partition keys and static rows cql3: add filtering visitor cql3: move result_set_builder functions to header cql3: amend need_filtering() cql3: add single column primary key restrictions getters cql3: expose single column primary key restrictions cql3: add needs_filtering to primary key restrictions cql3: add simpler single_column_restriction::is_satisfied_by	2018-07-05 10:18:08 +01:00
Piotr Sarna	a7dd02309f	tests: add allow_filtering tests to cql_query_test Test cases for ALLOW FILTERING are added to cql_query_test suite.	2018-07-05 10:50:43 +02:00
Piotr Sarna	27bf20aa3f	cql3: enable ALLOW FILTERING Enables 'ALLOW FILTERING' queries by transferring control to result_set_builder::filtering_visitor. Both regular and primary key columns are allowed, but some things are left unimplemented: - multi-column restrictions - CONTAINS queries Fixes #2025	2018-07-05 10:50:43 +02:00
Piotr Sarna	7b018f6fd6	service: add filtering_pager For paged results of an 'ALLOW FILTERING' query, a filtering pager is provided. It's based on a filtering_visitor for result_builder.	2018-07-05 10:50:43 +02:00
Piotr Sarna	a08fba19e3	cql3: optimize filtering partition keys and static rows If any restriction on partition key or static row part fails, it will be so for every row that belongs to a partition. Hence, full check of the rest of the rows is skipped.	2018-07-05 10:50:43 +02:00
Piotr Sarna	2a0b720102	cql3: add filtering visitor In order to filter results of an 'ALLOW FILTERING' query, a visitor that can take optional filter for result_builder is provided. It defaults to nop_filter, which accepts all rows.	2018-07-05 10:50:43 +02:00
Piotr Sarna	1cf5653f89	cql3: move result_set_builder functions to header Moving function definitions to header is a preparation step before turning result_set_builder into a template.	2018-07-05 10:50:43 +02:00
Piotr Sarna	4d3d32f465	cql3: amend need_filtering() Previous implementation of need_filtering() was too eager to assume that index query should be used, whereas sometimes a query should just be filtered.	2018-07-05 10:50:39 +02:00
Avi Kivity	dd083122f9	Update scylla-ami submodule * dist/ami/files/scylla-ami 0fd9d23...67293ba (1): > scylla_install_ami: fix broken argument parser Fixes #3578.	2018-07-05 09:48:06 +03:00
Avi Kivity	f4caa418ff	Merge "Fix the "LCS data-loss bug"" from Botond " This series fixes the "LCS data-loss bug" where full scans (and everything that uses them) would miss some small percentage (> 0.001%) of the keys. This could easily lead to permanent data-loss as compaction and decommission both use full scans. `aeffbb673` worked around this bug by disabling the incremental reader selectors (the class identified as the source of the bug) altogether. This series fixes the underlying issue and reverts `aeffbb673`. The root cause of the bug is that the `incremental_reader_selector` uses the current read position to poll for new readers using `sstable_set::incremental_selector::select()`. This means that when the currently open sstables contain no partitions that would intersect with some of the yet unselected sstables, those sstables would be ignored. Solve the problem by not calling `select()` with the current read position and always pass the `next_position` returned in the previous call. This means that the traversal of the sstable-set happens at a pace defined by the sstable-set itself and this guarantees that no sstable will be jumped over. When asked for new readers the `incremental_reader_selector` will now iteratively call `select()` using the `next_position` from the previous `select()` call until it either receives some new, yet unselected sstables, or `next_position` surpasses the read position (in which case `select()` will be tried again later). The `sstable_set::incremental_selector` was not suitable in its present state to support calling `select()` with the `next_position` from a previous call as in some cases it could not make progress due to inclusiveness related ambiguities. So in preparation to the above fix `sstable_set` was updated to work in terms of ring-position instead of tokens. Ring-position can express positions in a much more fine-grained way than token, including positions after/before tokens and keys. 
This allows for a clear expression of `next_position` such that calling `select()` with it guarantees forward progress in the token-space. Tests: unit(release, debug) Refs: #3513 " * 'leveled-missing-keys/v4' of https://github.com/denesb/scylla: tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE tests/mutation_reader_test: refactor combined_mutation_reader_test tests/mutation_reader_test: fix reader_selector related tests Revert "database: stop using incremental selectors" incremental_reader_selector: don't jump over sstables mutation_reader: reader_selector: use ring_position instead of token sstables_set::incremental_selector: use ring_position instead of token compatible_ring_position: refactor to compatible_ring_position_view dht::ring_position_view: use token_bound from ring_position i_partitioner: add free function ring-position tri comparator mutation_reader_merger::maybe_add_readers(): remove early return mutation_reader_merger: get rid of _key	2018-07-05 09:33:12 +03:00
Takuya ASADA	3bcc123000	dist/ami: hardcode target for scylla_current_repo since we don't have --target option anymore We broke build_ami.sh when we dropped Ubuntu support: the scylla_current_repo command does not finish because an argument is missing ('--target' with no distribution name, since $TARGET is always blank now). It needs to be hardcoded as centos. Fixes #3577 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180705035251.29160-1-syuu@scylladb.com>	2018-07-05 09:31:43 +03:00
Paweł Dziepak	07a429e837	test.py: do not disable human-readable format with --jenkins flag When test.py is run with --jenkins flag Boost UTF is asked to generate an XML file with the test results. This automatically disables the human-readable output printed to stdout. There is no real reason to do so and it is actually less confusing when the Boost UTF messages are in the test output together with Scylla logger messages. Message-Id: <20180704172913.23462-1-pdziepak@scylladb.com>	2018-07-05 09:31:15 +03:00
Raphael S. Carvalho	7d6af5da3a	sstables/compaction_manager: properly reevaluate postponed compactions for leveled strategy Function to reevaluate postponed compaction was called too early for strategies that don't allow parallel compaction (only leveled strategy (LCS) at this moment). Such strategies must first have the ongoing compaction deregistered before reevaluating the postponed ones. Manager uses task list of ongoing compaction to decide if there's ongoing compaction for a given column family. So compaction could stop making progress at all if and only if we stop flushing new data. So it could happen that a column family would be left with lots of pending compaction, leading the user to think all compacting is done, but after reboot, there will be lots of compaction activity. We'll both improve method to detect parallel compaction here and also add a call to reevaluate postponed compaction after compaction is done. Fixes #3534. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180702185327.26615-1-raphaelsc@scylladb.com>	2018-07-04 16:30:21 +01:00
Botond Dénes	b32f94d31e	tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE	2018-07-04 17:42:37 +03:00
Botond Dénes	77ad085393	tests/mutation_reader_test: refactor combined_mutation_reader_test Make combined_mutation_reader_test more interesting: * Set the levels on the sstables * Arrange the sstables so that they test for the "jump over sstables" bug. * Arrange the sstables so that they test for the "gap between sstables". While at it also make the code more compact.	2018-07-04 17:42:37 +03:00
Botond Dénes	4b57fc9aea	tests/mutation_reader_test: fix reader_selector related tests Don't assume the partition keys use lexical ordering. Add some additional checks.	2018-07-04 17:42:37 +03:00
Botond Dénes	a9c465d7d2	Revert "database: stop using incremental selectors" The data-loss bug is fixed, the incremental selector can be used again. This reverts commit `aeffbb6732`.	2018-07-04 17:42:37 +03:00
Botond Dénes	c37aff419e	incremental_reader_selector: don't jump over sstables Passing the current read position to the `incremental_selector::select()` can lead to "jumping" through sstables. This can happen when the currently open sstables have no partition that intersects with a yet unselected sstable that has an intersecting range nevertheless, in other words there is a gap in the selected sstables that this unselected one completely fits into. In this case the unselected sstable will be completely omitted from the read. The solution is to avoid calling `select()` with a position that is larger than the `next_position` returned from the previous `select()` call. Instead, call `select()` repeatedly with the `next_position` from the previous call, until either at least one new sstable is selected or the current read position is surpassed. This guarantees that no sstables will be jumped over. In other words, advance the incremental selector in a pace defined by itself thus guaranteeing that no sstable will be jumped over.	2018-07-04 17:42:37 +03:00
Botond Dénes	81a03db955	mutation_reader: reader_selector: use ring_position instead of token sstable_set::incremental selector was migrated to ring position, follow suit and migrate the reader_selector to use ring_position as well. Above correctness this also improves efficiency in case of dense tables, avoiding prematurely selecting sstables that share the token but start at different keys, although one could argue that this is a niche case.	2018-07-04 17:42:37 +03:00
Botond Dénes	a8e795a16e	sstables_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for several reasons. The sub-range sstables cover from the token-space is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. The usage of token for providing the next position where the intersecting sstables change [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop as a token cannot express inclusiveness/exclusiveness and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness e.g. [t1, t2](t2, t3]. To solve these problems update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space among sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated key of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys so it can also include information about inclusiveness/exclusiveness of the next interval guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
Duarte Nunes	33d7de0805	Merge 'Expose sharding information to connections' from Avi " In the same way that drivers can route requests to a coordinator that is also a replica of the data used by the request, we can allow drivers to route requests directly to the shard. This patchset adds and documents a way for drivers to know which shard a connection is connected to, and how to perform this routing. " * tag 'shard-info-alt/v1' of https://github.com/avikivity/scylla: doc: documented protocol extension for exposing sharding transport: expose more information about sharding via the OPTIONS/SUPPORTED messages dht: add i_partitioner::sharding_ignore_msb()	2018-07-04 13:01:21 +01:00
Botond Dénes	8084ce3a8e	query_pager: use query::is_single_partition() to check for singular range Use query::is_single_partition() to check whether the queried ranges are singular or not. The current method of using `dht::partition_range::is_singular()` is incorrect, as it is possible to build a singular range that doesn't represent a single partition. `query::is_single_partition()` correctly checks for this so use it instead. Found during code-review. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>	2018-07-04 10:04:50 +01:00
Takuya ASADA	3cb7ddaf68	dist/debian/build_deb.sh: make build_deb.sh more simplified Use is_debian()/is_ubuntu() to detect target distribution, also install pystache by path since package name is different between Fedora and CentOS. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180703193224.4773-1-syuu@scylladb.com>	2018-07-04 11:12:26 +03:00
Takuya ASADA	ed1d0b6839	dist/ami/files/.bash_profile: drop Ubuntu support Drop Ubuntu support on login prompt, too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180703192813.4589-1-syuu@scylladb.com>	2018-07-04 11:12:26 +03:00
Piotr Sarna	f42eaff75e	cql3: add single column primary key restrictions getters Getters for single column partition/clustering key restrictions are added to statement_restrictions.	2018-07-04 09:48:32 +02:00
Piotr Sarna	a99acbc376	cql3: expose single column primary key restrictions Underlying single_column_restrictions are exposed for single_column_primary_key_restrictions via a const method.	2018-07-04 09:48:32 +02:00
Piotr Sarna	f7a2f15935	cql3: add needs_filtering to primary key restrictions Primary key restrictions sometimes require filtering. These functions return true if ALLOW FILTERING needs to be enabled in order to satisfy these restrictions.	2018-07-04 09:48:32 +02:00
Piotr Sarna	6aec9e711f	cql3: add simpler single_column_restriction::is_satisfied_by Currently restriction::is_satisfied_by() accepts only keys and rows as arguments. In this commit, a version that only takes bytes of data is provided. This simpler version applies to single_column_restriction only, because it compares raw bytes underneath anyway. For other restriction types, simplified is_satisfied_by is not defined.	2018-07-04 09:48:32 +02:00
Botond Dénes	bf2645c616	compatible_ring_position: refactor to compatible_ring_position_view compatible_ring_position's sole purpose is to allow creating boost::icl::interval_map with dht::ring_position as the key and list of sstables as the value. This function is served equally well if compatible_ring_position wraps a `dht::ring_position_view` instead of a `dht::ring_position` with the added benefit of not having to copy the possibly heavy `dht::decorated_key` around. It also makes it possible to do lookups with `dht::ring_position_view` which is much more versatile and allows avoiding copies just to make lookups. The only downside is that `dht::ring_position_view` requires the lifetime of the "viewed" object to be taken care of. This is not a concern however, as so long as an interval is present in the map the represented sstable is guaranteed to be alive too, as the interval map participates in the ownership of the stored sstables. Rename compatible_ring_position to compatible_ring_position_view to reflect the changes. While at it upgrade the std::experimental::optional to std::optional.	2018-07-04 08:19:39 +03:00
Botond Dénes	48b07ba5d3	dht::ring_position_view: use token_bound from ring_position Currently dht::ring_position_view's dht::token constructor takes the token bound in the form of a raw `uint8_t`. This allows for passing a weight of "0" which is illegal as a single token does not represent a single ring position but an interval, as an arbitrary number of keys can have the same token. dht::ring_position uses an enum in its dht::token constructor. Import that same enum into the dht::ring_position_view scope and take a `token_bound` instead of `uint8_t`. This is especially important as in later patches the internal weight of the ring_position_view will be exposed and illegal values can cause all sorts of problems.	2018-07-04 08:19:34 +03:00
Alexys Jacob	8c03c1e2ce	Support Gentoo Linux on node_health_check script. Gentoo Linux was not supported by the node_health_check script which resulted in the following error message displayed: "This s a Non-Supported OS, Please Review the Support Matrix" This patch adds support for Gentoo Linux while adding a TODO note to add support for authenticated clusters which the script does not support yet. Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20180703124458.3788-1-ultrabug@gentoo.org>	2018-07-03 20:18:13 +03:00
Tomasz Grabiec	2ffb621271	Merge "Fix atomic_cell_or_collection::external_memory_usage()" from Paweł After the transition to the new in-memory representation in `aab6b0ee27` 'Merge "Introduce new in-memory representation for cells" from Paweł' atomic_cell_or_collection::external_memory_usage() stopped accounting for the externally stored data. Since it wasn't covered by the unit tests, the bug remained unnoticed until now. This series fixes the memory usage calculation and adds proper unit tests. * https://github.com/pdziepak/scylla.git fix-external-memory-usage/v1: tests/mutation: properly mark atomic_cells that are collection members imr::utils::object: expose size overhead data::cell: expose size overhead of external chunks atomic_cell: add external chunks and overheads to external_memory_usage() tests/mutation: test external_memory_usage()	2018-07-03 14:58:10 +02:00
Botond Dénes	c236a96d7d	tests/cql_query_test: add unit test for querying empty ranges test A bug was found recently (#3564) in the paging logic, where the code assumed the queried ranges list is non-empty. This assumption is incorrect as there can be valid (if rare) queries that can result in the ranges list to be empty. Add a unit test that executes such a query with paging enabled to detect any future bugs related to assumptions about the ranges list being non-empty. Refs: #3564 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>	2018-07-03 13:43:17 +01:00
Botond Dénes	59a30f0684	query_pager: be prepared to _ranges being empty do_fetch_page() checks in the beginning whether there is a saved query state already, meaning this is not the first page. If there is not, it checks whether the query is for a singular partition or a range scan to decide whether to enable the stateful queries or not. This check assumed that there is at least one range in _ranges which will not hold under some circumstances. Add a check for _ranges being empty. Fixes: #3564 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>	2018-07-03 11:05:01 +01:00
Avi Kivity	eafd16266d	tests: reduce multishard_mutation_test runtime in debug mode Debug mode is so slow that generating 1000 mutations is too much for it. High memory use can also confuse the sanitizers that track each allocation. Reduce mutation count from 1000 to 10 in debug mode.	2018-07-03 12:01:44 +03:00
Avi Kivity	a36b1f1967	Merge "more scylla_setup fixes" from Takuya " Added NIC / Disk existence check, --force-raid mode on scylla_raid_setup. " * 'scylla_setup_fix4' of https://github.com/syuu1228/scylla: dist/common/scripts/scylla_raid_setup: verify specified disks are unused dist/common/scripts/scylla_raid_setup: add --force-raid to construct raid even only one disk is specified dist/common/scripts/scylla_setup: don't accept disk path if it's not block device dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device dist/common/scripts/scylla_sysconfig_setup: verify NIC existence	2018-07-03 11:03:08 +03:00
Takuya ASADA	d0f39ea31d	dist/common/scripts/scylla_raid_setup: verify specified disks are unused Currently only scylla_setup interactive mode verifies selected disks are unused, on non-interactive mode we get mdadm/mkfs.xfs program error and python backtrace when disks are busy. So we should verify disks are unused also on scylla_raid_setup, print out simpler error message.	2018-07-03 14:50:34 +09:00
Takuya ASADA	3289642223	dist/common/scripts/scylla_raid_setup: add --force-raid to construct raid even only one disk is specified User may want to start RAID volume with only one disk, add an option to force constructing RAID even only one disk specified.	2018-07-03 14:50:34 +09:00
Takuya ASADA	e0c16c4585	dist/common/scripts/scylla_setup: don't accept disk path if it's not block device Need to ignore input when specified path is not block device.	2018-07-03 14:50:34 +09:00
Takuya ASADA	24ca2d85c6	dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device Verify disk paths are block device, exit with error if not.	2018-07-03 14:50:34 +09:00
Takuya ASADA	99b5cf1f92	dist/common/scripts/scylla_sysconfig_setup: verify NIC existence Verify NIC existence before writing sysconfig file to prevent causing error while running scylla. See #2442	2018-07-03 14:50:34 +09:00
Takuya ASADA	084c824d12	scripts: merge scylla_install_pkg to scylla-ami scylla_install_pkg was initially written for the one-liner installer, but now it is only used for creating the AMI, and it is just a few lines of code, so it should be merged into the scylla_install_ami script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180612150106.26573-2-syuu@scylladb.com>	2018-07-02 13:20:09 +03:00
Takuya ASADA	fafcacc31c	dist/ami: drop Ubuntu AMI support Drop Ubuntu AMI since it's not maintained for a long time, and we have no plan to officially provide it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180612150106.26573-1-syuu@scylladb.com>	2018-07-02 13:20:08 +03:00
Avi Kivity	677991f353	Update scylla-ami submodule * dist/ami/files/scylla-ami 36e8511...0fd9d23 (2): > scylla_install_ami: merge scylla_install_pkg > scylla_install_ami: drop Ubuntu AMI	2018-07-02 13:19:34 +03:00
Botond Dénes	01bd34d117	i_partitioner: add free function ring-position tri comparator Having to create an object just to compare two ring positions (or views) is annoying and unnecessary. Provide a free function version as well.	2018-07-02 11:41:09 +03:00
Botond Dénes	78ecf2740a	mutation_reader_merger::maybe_add_readers(): remove early return It's unnecessary (doesn't prevent anything). The code without it expresses intent better (and is shorter by two lines).	2018-07-02 11:41:09 +03:00
Botond Dénes	d26b35b058	mutation_reader_merger: get rid of _key `_key` is only used in a single place and this does not warrant storing it in a member. Also get rid of current_position() which was used to query `_key`.	2018-07-02 11:40:43 +03:00
Avi Kivity	0b148d0070	Merge "scylla_setup fixes" from Takuya " I found problems on previously submitted patchset 'scylla_setup fixes' and 'more fixes for scylla_setup', so fixed them and merged into one patchset. Also added few more patches. " * 'scylla_setup_fix3' of https://github.com/syuu1228/scylla: dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified dist/common/scripts/scylla_raid_setup: fix module import dist/common/scripts/scylla_setup: check disk is used in MDRAID dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup dist/common/scripts/scylla_setup: use print() instead of logging.error() dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used dist/common/scripts/scylla_setup: add a disk to selected list correctly dist/common/scripts/scylla_setup: fix wrong indent dist/common/scripts: sync instance type list for detect NIC type to latest one dist/common/scripts: verify systemd unit existence using 'systemctl cat'	2018-07-02 10:21:49 +03:00
Avi Kivity	a45c3aa8c7	Merge "Fix handling of stale write replies in storage_proxy" from Gleb " If a coordinator sends write requests with ID=X and restarts it may get a reply to the request after it restarts and sends another request with the same ID (but to different replicas). This condition will trigger an assert in a coordinator. Drop the assertion in favor of a warning and initialize handler id in a way to make this situation less likely. Fixes: #3153 " * 'gleb/write-handler-id' of github.com:scylladb/seastar-dev: storage_proxy: initialize write response id counter from wall clock value storage_proxy: drop virtual from signal(gms::inet_address) storage_proxy: do not assert on getting an unexpected write reply	2018-07-01 17:59:54 +03:00
Gleb Natapov	19e7493d5b	storage_proxy: initialize write response id counter from wall clock value Initializing write response id to the same value on each reboot may cause a stale id to be taken for an active one if a node restarts after sending only a couple of write requests and before receiving replies. On next reboot it will start assigning ids from the same value and receiving old replies will confuse it. Mitigate this by assigning initial id to wall clock value in milliseconds. It will not solve the problem completely, but will mitigate it.	2018-07-01 17:24:40 +03:00
Nadav Har'El	3194ce16b3	repair: fix combination of "-pr" and "-local" repair options When nodetool repair is used with the combination of the "-pr" (primary range) and "-local" (only repair with nodes in the same DC) options, Scylla needs to define the "primary ranges" differently: Rather than assign one node in the entire cluster to be the primary owner of every token, we need one node in each data-center - so that a "-local" repair will cover all the tokens. Fixes #3557. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180701132445.21685-1-nyh@scylladb.com>	2018-07-01 16:39:33 +03:00
Gleb Natapov	569437aaa5	storage_proxy: drop virtual from signal(gms::inet_address) The function is not overridden, so should not be virtual.	2018-07-01 16:35:59 +03:00
Gleb Natapov	5ee09e5f3b	storage_proxy: do not assert on getting an unexpected write reply In theory we should not get a write reply from a node we did not send a write to, but in practice a stale reply can be received if a node reboots between sending a write and getting a reply. Do not assert, but log a warning instead and ignore the reply. Fixes: #3153	2018-07-01 16:35:09 +03:00
Tomasz Grabiec	b464b66e90	row_cache: Fix memtable reads concurrent with cache update missing writes Introduced in `5b59df3761`. It is incorrect to erase entries from the memtable being moved to cache if partition update can be preempted because a later memtable read may create a snapshot in the memtable before memtable writes for that partition are made visible through cache. As a result the read may miss some of the writes which were in the memtable. The code was checking for presence of snapshots when entering the partition, but this condition may change if update is preempted. The fix is to not allow erasing if update is preemptible. This also caused SIGSEGVs because we were assuming that no such snapshots will be created and hence were not invalidating iterators on removal of the entries, which results in undefined behavior when such snapshots are actually created. Fixes SIGSEGV in dtest: limits_test.py:TestLimits.max_cells_test Fixes #3532 Message-Id: <1530129009-13716-1-git-send-email-tgrabiec@scylladb.com>	2018-07-01 15:36:05 +03:00
Avi Kivity	f3da043230	Merge "Make in-memory partition version merging preemptable" from Tomasz " Partition snapshots go away when the last read using the snapshot is done. Currently we will synchronously attempt to merge partition versions on this event. If partitions are large, that may stall the reactor for a significant amount of time, depending on the size of newer versions. Cache update on memtable flush can create especially large versions. The solution implemented in this series is to allow merging to be preemptable, and continue in the background. Background merging is done by the mutation_cleaner associated with the container (memtable, cache). There is a single merging process per mutation_cleaner. The merging worker runs in a separate scheduling group, introduced here, called "mem_compaction". When the last user of a snapshot goes away the snapshot is slid to the oldest unreferenced version first so that the version is no longer reachable from partition_entry::read(). The cleaner will then keep merging preceding (newer) versions into it, until it merges a version which is referenced. The merging is preemptable. If the initial merging is preempted, the snapshot is enqueued into the cleaner, the worker woken up, and merging will continue asynchronously. When memtable is merged with cache, its cleaner is merged with cache cleaner, so any outstanding background merges will be continued by the cache cleaner without disruption. This reduces scheduling latency spikes in tests/perf_row_cache_update for the case of large partition with many rows. For -c1 -m1G I saw them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench shows a similar improvement. 
" * tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla: tests: perf_row_cache_update: Test with an active reader surviving memtable flush memtable, cache: Run mutation_cleaner worker in its own scheduling group mutation_cleaner: Make merge() redirect old instance to the new one mvcc: Use RAII to ensure that partition versions are merged mvcc: Merge partition version versions gradually in the background mutation_partition: Make merging preemtable tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots	2018-07-01 15:32:51 +03:00
Avi Kivity	8eba27829a	doc: documented protocol extension for exposing sharding Document a protocol extension that exposes the sharding algorithm to drivers, and recommend how to use it to achieve connection-per-core.	2018-07-01 15:26:30 +03:00
Avi Kivity	28d064e7c0	transport: expose more information about sharding via the OPTIONS/SUPPORTED messages Provide all information needed for a connection pool to set up a connection per shard.	2018-07-01 15:26:28 +03:00
Botond Dénes	5fd9c3b9d4	tests/mutation_reader_test: require min shard-count for multishard tests Tests testing different aspects of `foreign_reader` and `multishard_combining_reader` are designed to run with a certain minimum shard count. Running them with any shard count below this minimum makes them useless at best but can even fail them. Refuse to run these tests when the shard count is below the required minimum to avoid an accidental and unnecessary investigation into a false-positive test failure. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d24159415b6a9d74eafb8355b6e3fba98c1ff7ff.1530274392.git.bdenes@scylladb.com>	2018-07-01 12:44:41 +03:00
Avi Kivity	f73340e6f8	Merge "Index reader and associated types clean-up." from Vladimir " This patchset paves way to support for reading SSTables 3.x index files. It aims at streamlining and tidying up the existing index_reader and helpers and brings no functional or high-level changes. In v3: - do not capture 'found' and just return 'true' in the continuation inside advance_and_check_if_present() - split code that makes the use of advance_upper_past() internal-only into two commits for better readability GitHub URL: https://github.com/argenet/scylla/tree/projects/sstables-30/index_reader_cleanup/v3 Tests: unit {release} Performance tests (perf_fast_forward) did not reveal any noticeable changes. The complete output is below. ======================================== Original code (before the patchset) ======================================== running: large-partition-skips Testing scanning large partition with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 1 0 0.336514 1000000 2971642 1000 126956 35 0 0 0 0 0 0 0 99.5% 1 1 1.411239 500000 354299 993 127056 2 0 0 1 1 0 0 0 99.9% 1 8 0.464468 111112 239224 993 127056 2 0 0 1 1 0 0 0 99.8% 1 16 0.330490 58824 177990 993 127056 12 0 0 1 1 0 0 0 99.7% 1 32 0.257010 30304 117910 993 127056 15 0 0 1 1 0 0 0 99.7% 1 64 0.213650 15385 72010 997 127072 268 0 0 3 3 0 0 0 99.5% 1 256 0.159498 3892 24402 993 127056 245 0 0 1 1 0 0 0 95.5% 1 1024 0.088678 976 11006 993 127056 347 0 0 1 1 0 0 0 63.4% 1 4096 0.082627 245 2965 649 22452 389 252 0 1 1 0 0 0 20.0% 64 1 0.411080 984616 2395191 1059 127056 57 1 0 1 1 0 0 0 99.1% 64 8 0.390130 888896 2278461 993 127056 2 0 0 1 1 0 0 0 99.8% 64 16 0.369033 800000 2167828 993 127056 3 0 0 1 1 0 0 0 99.8% 64 32 0.338126 666688 1971714 993 127056 10 0 0 1 1 0 0 0 99.7% 64 64 0.297335 500032 1681711 997 127072 18 0 0 3 3 0 0 0 99.7% 64 256 0.199420 
200000 1002910 993 127056 211 0 0 1 1 0 0 0 99.5% 64 1024 0.113953 58880 516704 993 127056 284 0 0 1 1 0 0 0 64.1% 64 4096 0.094596 15424 163051 687 23684 415 248 0 1 1 0 0 0 23.7% running: large-partition-slicing Testing slicing of large partition: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000586 1 1706 3 164 2 1 0 1 1 0 0 0 9.0% 0 32 0.000587 32 54539 3 164 2 1 0 1 1 0 0 0 9.9% 0 256 0.000688 256 372343 4 196 2 1 0 1 1 0 0 0 20.7% 0 4096 0.004320 4096 948185 19 676 10 1 0 1 1 0 0 0 36.7% 500000 1 0.000882 1 1134 5 228 3 2 0 1 1 0 0 0 14.3% 500000 32 0.000881 32 36321 5 228 3 2 0 1 1 0 0 0 14.3% 500000 256 0.000961 256 266386 6 260 3 2 0 1 1 0 0 0 21.9% 500000 4096 0.003127 4096 1309805 21 740 14 2 0 1 1 0 0 0 54.0% running: large-partition-slicing-clustering-keys Testing slicing of large partition using clustering keys: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000639 1 1564 3 164 2 0 0 1 1 0 0 0 13.9% 0 32 0.000626 32 51154 3 164 2 0 0 1 1 0 0 0 15.3% 0 256 0.000716 256 357560 4 168 2 0 0 1 1 0 0 0 23.1% 0 4096 0.003681 4096 1112743 16 680 8 1 0 1 1 0 0 0 38.5% 500000 1 0.000966 1 1035 4 424 3 2 0 1 1 0 0 0 12.4% 500000 32 0.000911 32 35121 5 296 3 1 0 1 1 0 0 0 13.1% 500000 256 0.000978 256 261645 5 296 3 1 0 1 1 0 0 0 19.1% 500000 4096 0.003155 4096 1298139 11 744 6 1 0 1 1 0 0 0 44.5% running: large-partition-slicing-single-key-reader Testing slicing of large partition, single-partition reader: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000756 1 1323 4 484 2 0 0 1 1 0 0 0 11.3% 0 32 0.000625 32 51174 3 164 2 0 0 1 1 0 0 0 15.5% 0 256 0.000705 256 363337 4 196 2 0 0 1 1 0 0 0 24.3% 0 4096 0.003603 4096 1136829 16 900 8 1 0 1 1 0 0 0 44.4% 500000 1 0.000880 1 1136 5 228 3 3 0 1 1 0 0 0 12.6% 500000 32 0.000882 32 36268 5 228 3 1 0 1 1 0 0 0 14.0% 
500000 256 0.000965 256 265178 6 260 3 1 0 1 1 0 0 0 20.8% 500000 4096 0.003098 4096 1322024 21 740 14 2 0 1 1 0 0 0 54.6% running: large-partition-select-few-rows Testing selecting few rows from a large partition: stride rows time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 1000000 1 0.000631 1 1585 3 164 2 2 0 1 1 0 0 0 15.2% 500000 2 0.000873 2 2291 5 228 3 2 0 1 1 0 0 0 13.2% 250000 4 0.001404 4 2850 9 356 5 4 0 1 1 0 0 0 11.9% 125000 8 0.002878 8 2779 21 740 13 8 0 1 1 0 0 0 15.5% 62500 16 0.005184 16 3087 41 1380 25 16 0 1 1 0 0 0 19.3% 2 500000 0.948899 500000 526926 1040 127056 39 0 0 1 1 0 0 0 99.9% running: large-partition-forwarding Testing forwarding with clustering restriction in a large partition: pk-scan time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu yes 0.001813 2 1103 11 1380 3 8 0 1 1 0 0 0 18.5% no 0.000922 2 2170 5 228 3 1 0 1 1 0 0 0 14.1% running: small-partition-skips Testing scanning small partitions with skips. 
Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 1.023396 1000000 977139 1104 139668 12 0 0 2 2 0 0 0 99.7% -> 1 1 2.176794 500000 229696 6200 177660 5109 0 0 5108 7679 0 0 0 69.9% -> 1 8 1.130179 111112 98314 6200 177660 5109 0 0 5108 9647 0 0 0 41.5% -> 1 16 0.972022 58824 60517 6200 177660 5109 0 0 5108 9913 0 0 0 32.0% -> 1 32 0.880783 30304 34406 6201 177664 5110 0 0 5108 10057 0 0 0 25.2% -> 1 64 0.829019 15385 18558 6199 177656 5108 0 0 5107 10135 0 0 0 20.4% -> 1 256 2.248487 3892 1731 5028 168948 3937 0 0 3936 7801 0 0 0 4.6% -> 1 1024 0.342806 976 2847 2076 146948 985 105 0 984 1955 0 0 0 9.3% -> 1 4096 0.088605 245 2765 739 18152 492 246 0 247 490 0 0 0 11.1% -> 64 1 1.796715 984616 548009 6274 177660 5120 0 0 5108 5187 0 0 0 63.1% -> 64 8 1.688994 888896 526287 6200 177660 5109 0 0 5108 5674 0 0 0 61.2% -> 64 16 1.593196 800000 502135 6200 177660 5109 0 0 5108 6143 0 0 0 58.7% -> 64 32 1.438651 666688 463412 6200 177660 5109 0 0 5108 6807 0 0 0 56.5% -> 64 64 1.290205 500032 387560 6200 177660 5109 0 0 5108 7660 0 0 0 49.2% -> 64 256 2.136466 200000 93613 5252 170616 4161 0 0 4160 6267 0 0 0 13.8% -> 64 1024 0.388871 58880 151413 2317 148784 1226 107 0 1225 1844 0 0 0 23.4% -> 64 4096 0.107253 15424 143809 807 19100 562 244 0 321 482 0 0 0 24.2% running: small-partition-slicing Testing slicing small partitions: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.002773 1 361 3 68 2 0 0 1 1 0 0 0 10.5% 0 32 0.002905 32 11015 3 68 2 0 0 1 1 0 0 0 11.6% 0 256 0.003170 256 80764 4 104 2 0 0 1 1 0 0 0 17.8% 0 4096 0.008125 4096 504095 20 616 11 1 0 1 1 0 0 0 54.1% 500000 1 0.002914 1 343 3 72 2 0 0 1 2 0 0 0 10.7% 500000 32 0.002967 32 10786 3 72 2 0 0 1 2 0 0 0 12.6% 500000 256 0.003338 256 76685 5 112 3 0 0 2 2 0 0 0 17.4% 500000 4096 0.008495 4096 
482141 21 624 12 1 0 2 2 0 0 0 52.3% ======================================== With the patchset ======================================== running: large-partition-skips Testing scanning large partition with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 1 0 0.340110 1000000 2940229 1000 126956 42 0 0 0 0 0 0 0 97.5% 1 1 1.401352 500000 356798 993 127056 2 0 0 1 1 0 0 0 99.9% 1 8 0.463124 111112 239918 993 127056 2 0 0 1 1 0 0 0 99.8% 1 16 0.330050 58824 178228 993 127056 11 0 0 1 1 0 0 0 99.7% 1 32 0.255981 30304 118384 993 127056 8 0 0 1 1 0 0 0 99.7% 1 64 0.215160 15385 71505 997 127072 263 0 0 3 3 0 0 0 99.4% 1 256 0.159702 3892 24370 993 127056 239 0 0 1 1 0 0 0 95.6% 1 1024 0.094403 976 10339 993 127056 298 0 0 1 1 0 0 0 58.9% 1 4096 0.082501 245 2970 649 22452 391 252 0 1 1 0 0 0 20.1% 64 1 0.415227 984616 2371272 1059 127056 52 1 0 1 1 0 0 0 99.3% 64 8 0.391556 888896 2270166 993 127056 2 0 0 1 1 0 0 0 99.8% 64 16 0.372075 800000 2150102 993 127056 4 0 0 1 1 0 0 0 99.7% 64 32 0.337454 666688 1975641 993 127056 15 0 0 1 1 0 0 0 99.7% 64 64 0.296345 500032 1687333 997 127072 21 0 0 3 3 0 0 0 99.7% 64 256 0.199221 200000 1003911 993 127056 204 0 0 1 1 0 0 0 99.4% 64 1024 0.118224 58880 498037 993 127056 275 0 0 1 1 0 0 0 61.8% 64 4096 0.095098 15424 162191 687 23684 417 248 0 1 1 0 0 0 23.7% running: large-partition-slicing Testing slicing of large partition: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000585 1 1709 3 164 2 1 0 1 1 0 0 0 10.7% 0 32 0.000589 32 54353 3 164 2 1 0 1 1 0 0 0 10.0% 0 256 0.000688 256 372293 4 196 2 1 0 1 1 0 0 0 20.7% 0 4096 0.004336 4096 944562 19 676 10 1 0 1 1 0 0 0 36.9% 500000 1 0.000877 1 1140 5 228 3 2 0 1 1 0 0 0 13.6% 500000 32 0.000883 32 36222 5 228 3 2 0 1 1 0 0 0 14.4% 500000 256 0.000963 256 265804 6 260 3 2 
0 1 1 0 0 0 22.0% 500000 4096 0.003008 4096 1361779 21 740 17 2 0 1 1 0 0 0 56.7% running: large-partition-slicing-clustering-keys Testing slicing of large partition using clustering keys: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000623 1 1604 3 164 2 0 0 1 1 0 0 0 13.9% 0 32 0.000624 32 51261 3 164 2 0 0 1 1 0 0 0 14.7% 0 256 0.000714 256 358484 4 168 2 0 0 1 1 0 0 0 22.6% 0 4096 0.003687 4096 1110990 16 680 8 1 0 1 1 0 0 0 38.6% 500000 1 0.000973 1 1028 4 424 3 2 0 1 1 0 0 0 12.1% 500000 32 0.000914 32 35022 5 296 3 1 0 1 1 0 0 0 12.8% 500000 256 0.000986 256 259646 5 296 3 1 0 1 1 0 0 0 19.7% 500000 4096 0.003155 4096 1298122 11 744 6 1 0 1 1 0 0 0 44.5% running: large-partition-slicing-single-key-reader Testing slicing of large partition, single-partition reader: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.000766 1 1305 4 484 2 0 0 1 1 0 0 0 12.2% 0 32 0.000626 32 51111 3 164 2 0 0 1 1 0 0 0 15.2% 0 256 0.000710 256 360563 4 196 2 0 0 1 1 0 0 0 25.2% 0 4096 0.003963 4096 1033440 16 900 8 1 0 1 1 0 0 0 40.2% 500000 1 0.000877 1 1141 5 228 3 1 0 1 1 0 0 0 12.7% 500000 32 0.000882 32 36272 5 228 3 1 0 1 1 0 0 0 14.2% 500000 256 0.000959 256 266937 6 260 3 1 0 1 1 0 0 0 21.1% 500000 4096 0.003103 4096 1319992 21 740 14 2 0 1 1 0 0 0 53.9% running: large-partition-select-few-rows Testing selecting few rows from a large partition: stride rows time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 1000000 1 0.000631 1 1586 3 164 2 2 0 1 1 0 0 0 13.8% 500000 2 0.000872 2 2295 5 228 3 2 0 1 1 0 0 0 13.4% 250000 4 0.001483 4 2698 9 356 5 4 0 1 1 0 0 0 11.2% 125000 8 0.002894 8 2764 21 740 13 8 0 1 1 0 0 0 15.6% 62500 16 0.005182 16 3087 41 1380 25 16 0 1 1 0 0 0 19.5% 2 500000 0.942943 500000 530255 1040 127056 38 0 0 1 1 0 0 0 99.9% running: large-partition-forwarding Testing forwarding 
with clustering restriction in a large partition: pk-scan time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu yes 0.001807 2 1107 11 1380 3 8 0 1 1 0 0 0 18.9% no 0.000924 2 2165 5 228 3 1 0 1 1 0 0 0 14.1% running: small-partition-skips Testing scanning small partitions with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 1.009953 1000000 990145 1104 139668 11 0 0 2 2 0 0 0 99.7% -> 1 1 2.213846 500000 225851 6200 177660 5109 0 0 5108 7679 0 0 0 70.3% -> 1 8 1.150029 111112 96617 6200 177660 5109 0 0 5108 9647 0 0 0 42.3% -> 1 16 0.989438 58824 59452 6200 177660 5109 0 0 5108 9913 0 0 0 33.2% -> 1 32 0.891590 30304 33989 6201 177664 5110 0 0 5108 10057 0 0 0 26.4% -> 1 64 0.840952 15385 18295 6199 177656 5108 0 0 5107 10135 0 0 0 21.6% -> 1 256 2.247875 3892 1731 5028 168948 3937 0 0 3936 7801 0 0 0 5.0% -> 1 1024 0.345917 976 2821 2076 146948 985 105 0 984 1955 0 0 0 10.0% -> 1 4096 0.088806 245 2759 739 18152 492 246 0 247 490 0 0 0 11.6% -> 64 1 1.821995 984616 540406 6274 177660 5119 0 0 5108 5187 0 0 0 63.9% -> 64 8 1.715052 888896 518291 6200 177660 5109 0 0 5108 5674 0 0 0 61.9% -> 64 16 1.620385 800000 493710 6200 177660 5109 0 0 5108 6143 0 0 0 59.4% -> 64 32 1.464497 666688 455233 6200 177660 5109 0 0 5108 6807 0 0 0 56.9% -> 64 64 1.311386 500032 381300 6200 177660 5109 0 0 5108 7660 0 0 0 50.0% -> 64 256 2.153954 200000 92853 5252 170616 4161 0 0 4160 6267 0 0 0 14.3% -> 64 1024 0.350275 58880 168097 2317 148784 1226 107 0 1225 1844 0 0 0 27.5% -> 64 4096 0.107498 15424 143482 807 19100 562 244 0 321 482 0 0 0 24.5% running: small-partition-slicing Testing slicing small partitions: offset read time (s) frags frag/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu 0 1 0.002872 1 348 3 68 2 0 0 1 1 0 0 0 10.2% 0 32 0.002833 32 11297 3 68 
2 0 0 1 1 0 0 0 12.1% 0 256 0.003145 256 81404 4 104 2 0 0 1 1 0 0 0 17.9% 0 4096 0.008110 4096 505079 20 616 12 1 0 1 1 0 0 0 54.4% 500000 1 0.002934 1 341 3 72 2 1 0 1 2 0 0 0 10.6% 500000 32 0.002871 32 11145 3 72 2 0 0 1 2 0 0 0 12.0% 500000 256 0.003216 256 79598 5 112 3 0 0 2 2 0 0 0 18.3% 500000 4096 0.008557 4096 478692 21 624 12 1 0 2 2 0 0 0 51.9% " * 'projects/sstables-30/index_reader_cleanup/v3' of https://github.com/argenet/scylla: sstables: Remove "lower_" from index_reader public methods. sstables: Make index_reader::advance_upper_past() method private. sstables: Stop using index_reader::advance_upper_past() outside the class. sstables: Move promoted_index_block from types.hh to index_entry.hh. sstables: Factor out promoted index into a separate class. sstables: Use std::optional instead of std::experimental optional in index_reader.	2018-07-01 12:30:29 +03:00
Botond Dénes	da53ea7a13	tests.py: add --jobs command line parameter Allowing for setting the number of jobs to use for running the tests. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d58d6393c6271bffc37ab3b5edc37b00ef485d9c.1529433590.git.bdenes@scylladb.com>	2018-07-01 12:26:41 +03:00
Avi Kivity	db2c029f7a	dht: add i_partitioner::sharding_ignore_msb() While the sharding algorithm is exposed (as cpu_sharding_algorithm_name()), the ignore_msb parameter is not. Add a function to do that.	2018-07-01 12:17:35 +03:00
Vladimir Krivopalov	b24eb5c11d	sstables: Remove "lower_" from index_reader public methods. The index_reader class public interface has been amended to only deal with the upper bound cursor along with advancing the lower bound. Since the class users can only explicitly operate with the lower bound cursor (take data file position, advance to the next partition, etc), it no longer makes sense to specify that the method operates on the lower bound cursor in its name. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-29 11:48:33 -07:00
Vladimir Krivopalov	30109a693b	sstables: Make index_reader::advance_upper_past() method private. No changes made to the code except that it is moved around. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-29 11:47:48 -07:00
Vladimir Krivopalov	80d1d5017f	sstables: Stop using index_reader::advance_upper_past() outside the class. The only case when it needs to be called is when an index_reader is advanced to a specific partition as part of sstable_reader initialisation. Instead, we're passing an optional upper_bound parameter that is used to call advance_upper_past() internally if partition is found. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-29 11:47:20 -07:00
Duarte Nunes	0db5419ec5	Merge 'Avoid copies when unfreezing frozen_mutation' from Paweł " When frozen mutation gets deserialised current implementation copies its value 3 times: from IDL buffer to bytes object, from bytes object to atomic_cell and then atomic_cell is copied again. Moreover, the value gets linearised which may cause a large allocation. All of that is very wasteful. This patch devirtualises and reworks IDL reading code so that when used with partition_builder the cell value is copied only once and without linearisation: from the IDL buffer to the final atomic_cell. perf_simple_query -c4, medians of 30 results: ./perf_before ./perf_after diff read 310576.54 316273.90 1.8% write 359913.15 375579.44 4.4% microbenchmark, perf_idl: BEFORE test iterations median mad min max frozen_mutation.freeze_one_small_row 2142435 462.431ns 0.125ns 462.306ns 467.659ns frozen_mutation.unfreeze_one_small_row 1640949 601.422ns 0.082ns 601.340ns 605.279ns frozen_mutation.apply_one_small_row 1538969 645.993ns 0.405ns 645.588ns 656.510ns AFTER test iterations median mad min max frozen_mutation.freeze_one_small_row 2139548 455.525ns 0.631ns 454.894ns 456.707ns frozen_mutation.unfreeze_one_small_row 1760139 566.157ns 0.003ns 566.153ns 584.339ns frozen_mutation.apply_one_small_row 1582050 610.951ns 0.060ns 610.891ns 613.044ns Tests: unit(release) " * tag 'avoid-copy-unfreeze/v2' of https://github.com/pdziepak/scylla: mutation_partition_view: use column_mapping_entry::is_atomic() schema: column_mapping_entry: cache abstract_type::is_atomic() schema: column_mapping_entry: reduce logic duplication mutation_partition_view: do not linearise or copy cell value atomic_cell: allow passing value via ser::buffer_view mutation_partition_view: pass cell by value to visitor mutation_partition_view: devirtualise accept() storage_proxy: use mutation_partition_view::{first, last}_row_key() mutation_partition_view: add last_row_key() and first_row_key() getters	2018-06-28 22:55:20 +01:00
Paweł Dziepak	c45e291084	mutation_partition_view: use column_mapping_entry::is_atomic()	2018-06-28 22:16:42 +01:00
Paweł Dziepak	6c54a97320	schema: column_mapping_entry: cache abstract_type::is_atomic() IDL deserialisation code calls is_atomic() for each cell. An additional indirection and a virtual call can be avoided by caching that value in column_mapping_entry. There is already very similar optimisation done for column_definitions.	2018-06-28 22:16:42 +01:00
Paweł Dziepak	2bfdc2d781	schema: column_mapping_entry: reduce logic duplication User-defined constructors often make it more likely that a careless developer will forget to update one of them when adding a new member to a structure. The risk of that happening can be reduced by reducing code duplication with delegating constructors.	2018-06-28 22:16:42 +01:00
Paweł Dziepak	199f9196e9	mutation_partition_view: do not linearise or copy cell value	2018-06-28 22:11:19 +01:00
Paweł Dziepak	92700c6758	atomic_cell: allow passing value via ser::buffer_view	2018-06-28 22:11:19 +01:00
Paweł Dziepak	bf330a99f0	mutation_partition_view: pass cell by value to visitor mutation_partition_view needs to create an atomic_cell from IDL-serialised data. Then that cell is passed to the visitor. However, because the generic mutation_partition_visitor interface was used, the cell was passed by constant reference, which forced the visitor to needlessly copy it. This patch takes advantage of the fact that mutation_partition_view is devirtualised now and adjusts the interfaces of its visitors so that the cell can be passed without copying.	2018-06-28 22:11:19 +01:00
Paweł Dziepak	569176aad1	mutation_partition_view: devirtualise accept() There are only two types of visitors used and only one of them appears in the hot path. They can be devirtualised without too much effort, which also enables future custom interface specialisations specific to mutation_partition_views and its users, not necessarily in the scope of the more general mutation_partition_visitor.	2018-06-28 22:11:19 +01:00
Paweł Dziepak	6bd71015e7	storage_proxy: use mutation_partition_view::{first, last}_row_key()	2018-06-28 22:11:19 +01:00
Paweł Dziepak	2259eee97c	mutation_partition_view: add last_row_key() and first_row_key() getters Some users (e.g. reconciliation code) need only to know the clustering key of the first or the last row in the partition. This was done with a full visitor visiting every single cell of the partition, which is very wasteful. This patch adds direct getters for the needed information.	2018-06-28 22:11:19 +01:00
Vladimir Krivopalov	a497edcbda	sstables: Move promoted_index_block from types.hh to index_entry.hh. It is only being used by index_reader internally and never exposed so should not be listed in commonly used types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-28 12:28:59 -07:00
Vladimir Krivopalov	81fba73e9d	sstables: Factor out promoted index into a separate class. An index entry may or may not have a promoted index. All the optional fields are better scoped under the same class to avoid lots of separate optional fields and give better representation. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-28 12:28:59 -07:00
Asias He	bb4d361cf6	storage_service: Limit number of REPLICATION_FINISHED verb can retry In the removenode operation, if the message servicing is stopped, e.g., due to disk io error isolation, the node can keep retrying the REPLICATION_FINISHED verb infinitely. A Scylla log full of such messages was observed: [shard 0] storage_service - Fail to send REPLICATION_FINISHED to $IP:0: seastar::rpc::closed_error (connection is closed) To fix, limit the number of retries. Tests: update_cluster_layout_tests.py Fixes #3542 Message-Id: <638d392d6b39cc2dd2b175d7f000e7fb1d474f87.1529927816.git.asias@scylladb.com>	2018-06-28 19:54:01 +01:00
Paweł Dziepak	e9dffc753c	tests/mutation: test external_memory_usage()	2018-06-28 19:20:23 +01:00
Paweł Dziepak	8153df7684	atomic_cell: add external chunks and overheads to external_memory_usage()	2018-06-28 19:20:23 +01:00
Paweł Dziepak	2dc78a6ca2	data::cell: expose size overhead of external chunks	2018-06-28 18:01:17 +01:00
Paweł Dziepak	6adc78d690	imr::utils::object: expose size overhead	2018-06-28 18:01:17 +01:00
Paweł Dziepak	e69f2c361c	tests/mutation: properly mark atomic_cells that are collection members	2018-06-28 18:00:39 +01:00
Takuya ASADA	972ce88601	dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt Allow "/dev/sda1,/dev/sdb1" style input on RAID disk prompt.	2018-06-29 01:37:19 +09:00
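A minimal sketch of how comma-separated input such as "/dev/sda1,/dev/sdb1" could be turned into a list of disk paths. The helper name `parse_disk_input` is hypothetical and is not taken from scylla_setup; it only illustrates the idea described in the commit message above.

```python
def parse_disk_input(text):
    """Split a comma-separated prompt answer into a list of disk paths.

    Hypothetical helper for illustration; surrounding whitespace and
    empty entries (e.g. from a trailing comma) are dropped.
    """
    return [part.strip() for part in text.split(",") if part.strip()]
```

A single path without commas yields a one-element list, so the same code path could serve both the single-disk and the multi-disk case.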
Takuya ASADA	a83c66b402	dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified When only one disk is specified, create XFS directly on the disk instead of creating a RAID0 volume on it.	2018-06-29 01:37:19 +09:00
Takuya ASADA	99fb754221	dist/common/scripts/scylla_raid_setup: fix module import sys module was missing, import it. Fixes #3548	2018-06-29 01:37:19 +09:00
Takuya ASADA	f2132c61bd	dist/common/scripts/scylla_setup: check disk is used in MDRAID Check whether a disk is used in MDRAID by reading /proc/mdstat.	2018-06-29 01:37:19 +09:00
Takuya ASADA	daccc10a06	dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup Currently, enabling scylla-fstrim.timer is part of 'enable-service', so it is enabled even when --no-fstrim-setup is specified (or 'No' is entered at the interactive setup prompt). To honor --no-fstrim-setup, we need to enable scylla-fstrim.timer in scylla_fstrim_setup instead of in the enable-service part of scylla_setup. Fixes #3248	2018-06-29 01:37:19 +09:00
Takuya ASADA	fa6db21fea	dist/common/scripts/scylla_setup: use print() instead of logging.error() Align with the other scripts, use print().	2018-06-29 01:37:19 +09:00
Takuya ASADA	2401115e14	dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux Implement Gentoo Linux support on scylla_setup.	2018-06-29 01:37:19 +09:00
Takuya ASADA	9d537cb449	dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink Since shutil.rmtree() raises an exception when run on a symlink, we need to check whether the path is a symlink and run os.remove() when it is. Fixes #3544	2018-06-29 01:37:19 +09:00
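The symlink handling described above can be sketched as follows. This is an illustrative example, not the code from scylla_coredump_setup; the function name `remove_dir_or_symlink` is an assumption made for the sketch.

```python
import os
import shutil


def remove_dir_or_symlink(path):
    """Delete a directory tree, or only the link itself if path is a symlink.

    shutil.rmtree() refuses to operate on a symbolic link and raises
    OSError, so symlinks are removed with os.remove() instead.
    (Hypothetical helper, for illustration only.)
    """
    if os.path.islink(path):
        os.remove(path)        # removes the link, not its target
    elif os.path.isdir(path):
        shutil.rmtree(path)    # removes a real directory recursively
```

Checking `os.path.islink()` first matters because `os.path.isdir()` follows symlinks and would return true for a link that points at a directory.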
Takuya ASADA	5b4da4d4bd	dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions In the current implementation, we check whether a partition is mounted, but a disk that contains the partition is still marked as unused. To avoid the problem, we should skip any disk that contains partitions. Fixes #3545	2018-06-29 01:37:19 +09:00
Takuya ASADA	83bc72b0ab	dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used There is no need to run the remaining checks once the disk has been detected as used.	2018-06-29 01:37:19 +09:00
Takuya ASADA	1650d37dae	dist/common/scripts/scylla_setup: add a disk to selected list correctly When a disk path is typed at the RAID setup prompt, the script mistakenly splits the input into individual characters, like ['/', 'd', 'e', 'v', '/', 's', 'd', 'b']. To fix the issue we need to use selected.append() instead of selected +=. See #3545	2018-06-29 01:37:19 +09:00
Takuya ASADA	4b5826ff5a	dist/common/scripts/scylla_setup: fix wrong indent list_block_devices() should return 'devices' whether or not re.match() matches.	2018-06-29 01:37:19 +09:00
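The difference between `selected +=` and `selected.append()` can be reproduced in isolation. The two functions below are hypothetical stand-ins written for this illustration, not excerpts from scylla_setup.

```python
def add_disk_buggy(selected, path):
    # "+=" on a list extends it with each element of the right-hand
    # iterable; a str is an iterable of characters, so the path is
    # split into single characters.
    selected += path
    return selected


def add_disk_fixed(selected, path):
    # append() adds the whole string as one list element.
    selected.append(path)
    return selected
```

With the input "/dev/sdb", the first function produces the per-character list quoted in the commit message, while the second produces a one-element list containing the full path.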
Takuya ASADA	f828c5c4f3	dist/common/scripts: sync instance type list for detect NIC type to latest one Current instance type list is outdated, sync with latest table from: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking Fixes #3536	2018-06-29 01:37:19 +09:00
Takuya ASADA	6cffb164d6	dist/common/scripts: verify systemd unit existence using 'systemctl cat' Verify unit existence by running 'systemctl cat {}' silently, raise exception if the unit doesn't exist.	2018-06-29 01:37:19 +09:00
Vladimir Krivopalov	82f76b0947	Use std::reference_wrapper instead of a plain reference in bound_view. The presence of a plain reference prohibits the bound_view class from being copyable. The trick employed to work around that was to use 'placement new' for copy-assigning bound_view objects, but this approach is ill-formed and causes undefined behaviour for classes that have const and/or reference members. The solution is to use a std::reference_wrapper instead. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <a0c951649c7aef2f66612fc006c44f8a33713931.1530113273.git.vladimir@scylladb.com>	2018-06-28 11:24:06 +01:00
Avi Kivity	c87a961667	Merge "Add multishard_writer support" from Asias " We need a multishard_writer which gets mutation fragments from a producer (e.g., from the network using the rpc streaming) and consumes the mutation fragments with a consumer (e.g., write to sstable). The multishard_writer will take care of the mutation fragments that do not belong to the current shard. This multishard_writer will be used in the new scylla streaming. " * 'asias/multishard_writer_v10.1' of github.com:scylladb/seastar-dev: tests: Add multishard_writer_test to test.py tests: Add test for multishard_writer multishard_writer: Introduce multishard_writer tests: Allow random_mutation_generator to generate mutations belong to remote shard	2018-06-28 12:36:55 +03:00
Asias He	fd8b7efb99	tests: Add multishard_writer_test to test.py For multishard_writer class testing.	2018-06-28 17:20:29 +08:00
Asias He	4050a4b24e	tests: Add test for multishard_writer	2018-06-28 17:20:29 +08:00
Asias He	f4b406cce1	multishard_writer: Introduce multishard_writer The multishard_writer class gets mutation_fragments generated from flat_mutation_reader and consumes the mutation_fragments with multishard_writer::_consumer. If the mutation_fragment does not belong to the shard multishard_writer is on, it will forward the mutation_fragment to the correct shard. Future returned by multishard_writer() becomes ready when all the mutation_fragments are consumed. Tests: tests/multishard_writer_test.cc Tests: dtest update_cluster_layout_tests.py Fixes #3497	2018-06-28 17:20:28 +08:00
Asias He	8eccff1723	tests: Allow random_mutation_generator to generate mutations belong to remote shard - make_local_keys returns keys of current shard - make_keys returns keys of current or remote shard	2018-06-28 17:20:28 +08:00
Asias He	27cb41ddeb	range_streamer: Use float for time took for stream It is useful when the total time to stream is small, e.g., 2.0 seconds and 2.9 seconds. Showing the time as an integer number of seconds is not accurate in such cases. Message-Id: <d801b57279981c72acb907ad4b0190ba4d938a3d.1530175052.git.asias@scylladb.com>	2018-06-28 11:39:14 +03:00
Vladimir Krivopalov	fc629b9ca6	sstables: Use std::optional instead of std::experimental optional in index_reader. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-27 16:47:53 -07:00
Tomasz Grabiec	0a1aec2bd6	tests: perf_row_cache_update: Test with an active reader surviving memtable flush Exposes latency issues caused by mutation_cleaner lifetime issues, fixed by earlier commits.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	074be4d4e8	memtable, cache: Run mutation_cleaner worker in its own scheduling group The worker is responsible for merging MVCC snapshots, which is similar to merging sstables, but in memory. The new scheduling group will be therefore called "memory compaction". We should run it in a separate scheduling group instead of main/memtables, so that it doesn't disrupt writes and other system activities. It's also nice for monitoring how much CPU time we spend on this.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	6c6ffaee71	mutation_cleaner: Make merge() redirect old instance to the new one If memtable snapshot goes away after memtable started merging to cache, it would enqueue the snapshots for cleaning on the memtable's cleaner, which will have to clean without deferring when the memtable is destroyed. That may stall the reactor. To avoid this, make merge() cause the old instance of the cleaner to redirect to the new instance (owned by cache), like we do for regions. This way the snapshots mentioned earlier can be cleaned after memtable is destroyed, gracefully.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	450985dfee	mvcc: Use RAII to ensure that partition versions are merged Before this patch, maybe_merge_versions() had to be manually called before partition snapshot goes away. That is error prone and makes client code more complicated. Delegate that task to a new partition_snapshot_ptr object, through which all snapshots are published now.	2018-06-27 21:51:04 +02:00
Avi Kivity	e1efda8b0c	Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz " With DateTiered and TimeWindow, there is a read optimization enabled which excludes sstables based on overlap with recorded min/max values of clustering key components. The problem is that it doesn't take into account partition tombstones and static rows, which should still be returned by the reader even if there is no overlap in the query's clustering range. A read which returns no clustering rows can mispopulate cache, which will appear as partition deletion or writes to the static row being lost until node restart or eviction of the partition entry. There is also a bad interaction between cache population on read and that optimization. When the clustering range of the query doesn't overlap with any sstable, the reader will return no partition markers for the read, which leads cache populator to assume there is no partition in sstables and it will cache an empty partition. This will cause later reads of that partition to miss prior writes to that partition until it is evicted from cache or node is restarted. Disable until a more elaborate fix is implemented. Fixes #3552 Fixes #3553 " * tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla: tests: Add test for slicing a mutation source with date tiered compaction strategy tests: Check that database conforms to mutation source database: Disable sstable filtering based on min/max clustering key components	2018-06-27 14:28:27 +03:00
Calle Wilund	054514a47a	sstables::compress: Ensure unqualified compressor name if possible Fixes #3546 Both older origin and scylla writes "known" compressor names (i.e. those in origin namespace) unqualified (i.e. LZ4Compressor). This behaviour was not preserved in the virtualization change. But probably should be. Message-Id: <20180627110930.1619-1-calle@scylladb.com>	2018-06-27 14:16:50 +03:00
Tomasz Grabiec	d1e8c32b2e	gdb: Add pretty printer for managed_vector	2018-06-27 13:07:28 +02:00
Tomasz Grabiec	b0e8547569	gdb: Add pretty printer for rows	2018-06-27 13:07:28 +02:00
Tomasz Grabiec	da19508317	gdb: Add mutation_partition pretty printer	2018-06-27 13:07:28 +02:00
Tomasz Grabiec	d485e1c1d8	gdb: Add pretty printer for partition_entry	2018-06-27 13:07:28 +02:00
Tomasz Grabiec	b51c70ef69	gdb: Add pretty printer for managed_bytes	2018-06-27 13:07:28 +02:00
Tomasz Grabiec	d76cfa77b1	gdb: Add iteration wrapper for intrusive_set_external_comparator	2018-06-27 13:07:24 +02:00
Tomasz Grabiec	aa0b41f0b2	gdb: Add iteration wrapper for boost intrusive set	2018-06-27 13:04:47 +02:00
Tomasz Grabiec	c26a304fbb	mvcc: Merge partition version versions gradually in the background When snapshots go away, typically when the last reader is destroyed, we used to merge adjacent versions atomically. This could induce reactor stalls if partitions were large. This is especially true for versions created on cache update from memtables. The solution is to allow this process to be preempted and move to the background. mutation_cleaner keeps a linked list of such unmerged snapshots and has a worker fiber which merges them incrementally and asynchronously with regards to reads. This reduces scheduling latency spikes in tests/perf_row_cache_update for the case of large partition with many rows. For -c1 -m1G I saw them dropping from 23ms to 2ms.	2018-06-27 12:48:30 +02:00
Tomasz Grabiec	4d3cc2867a	mutation_partition: Make merging preemtable	2018-06-27 12:48:30 +02:00
Tomasz Grabiec	4995a8c568	tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots Preparation for switching to background merging.	2018-06-27 12:48:30 +02:00
Piotr Sarna	03753cc431	database: make drop_column_family wait on reads in progress drop_column_family now waits for both writes and reads in progress. It solves possible liveness issues with row cache, when column_family could be dropped prematurely, before the read request was finished. Phaser operation is passed inside database::query() call. There are other places where reading logic is applied (e.g. view replicas), but these are guarded with different synchronization mechanisms, while _pending_reads_phaser applies to regular reads only. Fixes #3357 Reported-by: Duarte Nunes <duarte@scylladb.com> Signed-off-by: Piotr Sarna <sarna@scylladb.com> Message-Id: <d58a5ee10596d0d62c765ee2114ac171b6f087d2.1529928323.git.sarna@scylladb.com>	2018-06-27 10:02:56 +01:00
Piotr Sarna	e1a867cbe3	database: add phaser for reads Currently drop_column_family waits on write_in_progress phaser, but there's no such mechanism for reads. This commit adds a corresponding reads phaser. Refs #3357 Reported-by: Duarte Nunes <duarte@scylladb.com> Signed-off-by: Piotr Sarna <sarna@scylladb.com> Message-Id: <70b5fdd44efbc24df61585baef024b809cabe527.1529928323.git.sarna@scylladb.com>	2018-06-27 10:02:56 +01:00
Tomasz Grabiec	b4879206fb	tests: Add test for slicing a mutation source with date tiered compaction strategy Reproducer for https://github.com/scylladb/scylla/issues/3552	2018-06-26 18:54:44 +02:00
Tomasz Grabiec	826a237c2e	tests: Check that database conforms to mutation source	2018-06-26 18:54:44 +02:00
Tomasz Grabiec	19b76bf75b	database: Disable sstable filtering based on min/max clustering key components With DateTiered and TimeWindow, there is a read optimization enabled which excludes sstables based on overlap with recorded min/max values of clustering key components. The problem is that it doesn't take into account partition tombstones and static rows, which should still be returned by the reader even if there is no overlap in the query's clustering range. A read which returns no clustering rows can mispopulate cache, which will appear as partition deletion or writes to the static row being lost. Until node restart or eviction of the partition entry. There is also a bad interaction between cache population on read and that optimization. When the clustering range of the query doesn't overlap with any sstable, the reader will return no partition markers for the read, which leads cache populator to assume there is no partition in sstables and it will cache an empty partition. This will cause later reads of that partition to miss prior writes to that partition until it is evicted from cache or node is restarted. Disable until a more elaborate fix is implemented. Fixes #3552 Fixes #3553	2018-06-26 18:54:44 +02:00
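
The background merging described in commit c26a304fbb above (merging in small, preemptible steps instead of atomically) can be illustrated with a minimal Python sketch. This is not Scylla's implementation: `BackgroundMerger`, `enqueue`, `step`, and `rows_per_step` are hypothetical names chosen for illustration, and plain dictionaries stand in for partition versions.

```python
from collections import deque


class BackgroundMerger:
    """Hypothetical stand-in for a cleaner that merges versions in small steps."""

    def __init__(self, rows_per_step=2):
        self.rows_per_step = rows_per_step
        # Each entry is (remaining rows of the newer version, older version).
        self.pending = deque()

    def enqueue(self, newer, older):
        # Queue the newer version (dict of key -> value) to be folded into older.
        self.pending.append((deque(sorted(newer.items())), older))

    def step(self):
        """Merge at most rows_per_step rows, then return to the caller.

        Returns True while unmerged work remains, so a worker loop can
        yield between steps instead of stalling on a large partition.
        """
        if not self.pending:
            return False
        rows, target = self.pending[0]
        for _ in range(min(self.rows_per_step, len(rows))):
            key, value = rows.popleft()
            target[key] = value  # the newer version wins on conflict
        if not rows:
            self.pending.popleft()
        return bool(self.pending)
```

A worker would call `step()` repeatedly and yield to other tasks between calls; the bound on rows per step is what keeps any single call short.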

3240 changed files with 134111 additions and 47444 deletions

3

.dockerignore Normal file


@@ -0,0 +1,3 @@
 .git
 build
 seastar/build

									
										4

.github/PULL_REQUEST_TEMPLATE.md
									
										vendored
									
											
				@@ -1,4 +0,0 @@

				Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.

				See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.

				If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

5

.gitignore vendored


@@ -19,3 +19,8 @@ CMakeLists.txt.user
 __pycache__CMakeLists.txt.user
 .gdbinit
 resources
 .pytest_cache
 /expressions.tokens
 tags
 testlog/*
 test/*/*.reject

11

.gitmodules vendored


@@ -1,14 +1,17 @@
 [submodule "seastar"]
 	path = seastar
 	url = ../seastar
 	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
 	url = ../scylla-swagger-ui
 	ignore = dirty
 [submodule "dist/ami/files/scylla-ami"]
 	path = dist/ami/files/scylla-ami
 	url = ../scylla-ami
 [submodule "xxHash"]
 	path = xxHash
 	url = ../xxHash
 [submodule "libdeflate"]
 	path = libdeflate
 	url = ../libdeflate
 [submodule "zstd"]
 	path = zstd
 	url = ../zstd

									
										3

CMakeLists.txt
									
												
				@@ -97,7 +97,7 @@ scan_scylla_source_directories(

				          service

				          sstables

				          streaming

				          tests

				          test

				          thrift

				          tracing

				          transport

				@@ -138,4 +138,5 @@ target_include_directories(scylla PUBLIC

				        ${SEASTAR_INCLUDE_DIRS}

				        ${Boost_INCLUDE_DIRS}

				        xxhash

				        libdeflate

				        build/release/gen)

									
										2

CONTRIBUTING.md
									
												
				@@ -1,6 +1,6 @@

				# Asking questions or requesting help

				Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.

				Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.

				# Reporting an issue

									
										97

HACKING.md
									
												
				@@ -20,11 +20,22 @@ $ git submodule update --init --recursive

				Scylla depends on the system package manager for its development dependencies.

				Running `./install_dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.

				Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.

				On Ubuntu and Debian based Linux distributions, some packages

				required to build Scylla are missing in the official upstream:

				- libthrift-dev and libthrift

				- antlr3-c++-dev

				Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,

				and get the missing packages from it.

				### Build system

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native

				thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1 is

				required.

				Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.

				@@ -43,11 +54,9 @@ The full suite of options for project configuration is available via

				$ ./configure.py --help

				```

				The most important options are:

				The most important option is:

				- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.

				- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.

				- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.

				Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.

				@@ -55,6 +64,30 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also

				```bash

				$ ninja-build build/release/tests/schema_change_test

				$ ninja-build build/release/service/storage_proxy.o

				```

				You can also specify a single mode. For example

				```bash

				$ ninja-build release

				```

				Will build everything in release mode. The valid modes are

				* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)

				  and other sanity checks. It has no optimizations, which allows for debugging with tools like

				  GDB. Debugging builds are generally slower and generate much larger object files than release builds.

				* Release: Fewer checks and more optimizations. It still has debug info.

				* Dev: No optimizations or debug info. The objective is to compile and link as fast as possible.

				  This is useful for the first iterations of a patch.

				Note that by default unit tests binaries are stripped so they can't be used with gdb or seastar-addr2line.

				To include debug information in the unit test binary, build the test binary with a `_g` suffix. For example,

				```bash

				$ ninja-build build/release/tests/schema_change_test_g

				```

				### Unit testing

				@@ -83,7 +116,7 @@ The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread

				### Preparing patches

				All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.

				All changes to Scylla are submitted as patches to the public [mailing list](mailto:scylladb-dev@googlegroups.com). Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.

				Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:

				@@ -112,6 +145,8 @@ The usual is "Tests: unit (release)", although running debug tests is encouraged

				5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.

				6. The Linux kernel's [Submitting Patches](https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html) document offers excellent advice on how to prepare patches and patchsets for review. Since the Scylla development process is derived from the kernel's, almost all of the advice there is directly applicable.

				### Finding a person to review and merge your patches

				You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:

				@@ -164,6 +199,29 @@ On a development machine, one might run Scylla as

				$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes

				```

				To interact with scylla it is recommended to build our versions of

				cqlsh and nodetool. They are available at

				https://github.com/scylladb/scylla-tools-java and can be built with

				```bash

				$ sudo ./install-dependencies.sh

				$ ant jar

				```

				cqlsh should work out of the box, but nodetool depends on a running

				scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be built

				with

				```bash

				$ mvn package

				```

				and must be started with

				```bash

				$ ./scripts/scylla-jmx

				```

				### Branches and tags

				Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.

				@@ -254,7 +312,7 @@ In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine wil

				When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.

				One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section speeding up this process.

				One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next sections speeding up this process.

				### Using the `gold` linker

				@@ -264,6 +322,24 @@ Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds

				$ sudo alternatives --config ld

				```

				### Using split dwarf

				With debug info enabled, most of the link time is spent copying and

				relocating it. It is possible to leave most of the debug info out of

				the link by writing it to a side .dwo file. This is done by passing

				`-gsplit-dwarf` to gcc.

				Unfortunately just `-gsplit-dwarf` would slow down `gdb` startup. To

				avoid that the gold linker can be told to create an index with

				`--gdb-index`.

				More info at https://gcc.gnu.org/wiki/DebugFission.

				Both options can be enabled by passing `--split-dwarf` to configure.py.

				Note that distcc is *not* compatible with it, but icecream

				(https://github.com/icecc/icecream) is.

				### Testing changes in Seastar with Scylla

				Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.

				@@ -277,3 +353,8 @@ $ git remote add local /home/tsmith/src/seastar

				$ git remote update

				$ git checkout -t local/my_local_seastar_branch

				```

				### Core dump debugging

				Slides:

				2018.11.20: https://www.slideshare.net/tomekgrabiec/scylla-core-dump-debugging-tools

31

MAINTAINERS


@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
 ---
 AUTH
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 R: Vlad Zolotarov <vladz@scylladb.com>
 R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
 CACHE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 R: Piotr Jastrzebski <piotr@scylladb.com>
 F: row_cache*
 F: *mutation*
 F: tests/mvcc*
 COMMITLOG / BATCHLOGa
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 F: db/commitlog/*
 F: db/batch*
 COORDINATOR
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Gleb Natapov <gleb@scylladb.com>
 F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
 F: cql3/*
 COUNTERS
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: counters*
 F: tests/counter_test*
 GOSSIP
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
 LSA
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: utils/logalloc*
 MATERIALIZED VIEWS
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 R: Duarte Nunes <duarte@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 F: db/view/*
 F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
 REPAIR
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: repair/*
 SCHEMA MANAGEMENT
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: db/schema_tables*
 F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
 SECONDARY INDEXES
 M: Pekka Enberg <penberg@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 R: Pekka Enberg <penberg@scylladb.com>
 F: db/index/*
 F: cql3/statements/*index*
 SSTABLES
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
 STREAMING
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: streaming/*
 F: service/storage_service.*
 THRIFT TRANSPORT LAYER
 M: Duarte Nunes <duarte@scylladb.com>
 F: thrift/*
 ALTERNATOR
 M: Nadav Har'El <nyh@scylladb.com>
 F: alternator/*
 F: alternator-test/*
 THE REST
 M: Avi Kivity <avi@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Nadav Har'El <nyh@scylladb.com>
 F: *

									
										29

README-DPDK.md
									
											
				@@ -1,29 +0,0 @@

				Seastar and DPDK

				================

				Seastar uses the Data Plane Development Kit to drive NIC hardware directly.  This

				provides an enormous performance boost.

				To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a

				run-time parameter.  This will use the DPDK package provided as a git submodule with the

				seastar sources.

				To use your own self-compiled DPDK package, follow this procedure:

				1. Setup host to compile DPDK:

				   - Ubuntu 

				     `sudo apt-get install -y build-essential linux-image-extra-$(uname -r)` 

				2. Prepare a DPDK SDK:

				   - Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`

				   - Untar it.

				   - Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.

				   - For DPDK 1.7.x: edit config/common_linuxapp: 

				     - Set CONFIG_RTE_LIBRTE_PMD_BOND  to 'n'.

				     - Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.

				     - Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.

				   - Start the tools/setup.sh script as root.

				   - Compile a linuxapp target (option 9).

				   - Install IGB_UIO module (option 11).

				   - Bind some physical port to IGB_UIO (option 17).

				   - Configure hugepage mappings (option 14/15).

				3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.

									
										43

README.md
									
												
				@@ -2,17 +2,23 @@

				## Quick-start

				To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)

				which would build and run Scylla using a pre-configured Docker image.

				Using the frozen toolchain will also isolate all of the installed

				dependencies in a Docker container.

				Assuming you have met the toolchain prerequisites, which is running

				Docker in user mode, building and running is as easy as:

				```bash

				$ git submodule update --init --recursive

				$ sudo ./install-dependencies.sh

				$ ./configure.py --mode=release

				$ ninja-build -j4 # Assuming 4 system threads.

				$ ./build/release/scylla

				$ # Rejoice!

				```

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1

				 ```

				Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.

				**Note**: GCC >= 8.1.1 is required to compile Scylla.

				## Running Scylla

				* Run Scylla

				@@ -21,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev

				```

				* run Scylla with one CPU and ./tmp as data directory

				* run Scylla with one CPU and ./tmp as work directory

				```

				./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1

				./build/release/scylla --workdir tmp --smp 1

				```

				* For more run options:

				@@ -32,6 +38,24 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev

				./build/release/scylla --help

				```

				## Scylla APIs and compatibility

				By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and

				Thrift. There is also experimental support for the API of Amazon DynamoDB,

				but being experimental it needs to be explicitly enabled to be used. For more

				information on how to enable the experimental DynamoDB compatibility in Scylla,

				and the current limitations of this feature, see

				[Alternator](docs/alternator/alternator.md) and

				[Getting started with Alternator](docs/alternator/getting-started.md).

				## Documentation

				Documentation can be found in [./docs](./docs) and on the

				[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear

				definition of what goes where, so when looking for something be sure to check

				both.

				Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).

				User documentation can be found [here](https://docs.scylladb.com/).

				## Building Fedora RPM

				As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:

				@@ -75,4 +99,5 @@ docker run -p $(hostname -i):9042:9042 -i -t <image name>

				## Contributing to Scylla

				[Hacking howto](HACKING.md)

				[Guidelines for contributing](CONTRIBUTING.md)

4

SCYLLA-VERSION-GEN


@@ -1,6 +1,7 @@
 #!/bin/sh
 VERSION=666.development
 PRODUCT=scylla
 VERSION=3.3.4
 if test -f version
 then
@@ -22,3 +23,4 @@ echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
 mkdir -p build
 echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE
 echo "$SCYLLA_RELEASE" > build/SCYLLA-RELEASE-FILE
 echo "$PRODUCT" > build/SCYLLA-PRODUCT-FILE

									
										78

alternator-test/README.md
									
										Normal file
									
												
				@@ -0,0 +1,78 @@

				Tests for Alternator that should also pass, identically, against DynamoDB.

				Tests use the boto3 library for AWS API, and the pytest framework

				(both are available from Linux distributions, or with "pip install").

				To run all tests against the local installation of Alternator on

				http://localhost:8000, just run `pytest`.

				Some additional pytest options:

				* To run all tests in a single file, do `pytest test_table.py`.

				* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.

				* Additional useful pytest options, especially useful for debugging tests:

				  * -v: show the names of each individual test running instead of just dots.

				  * -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)

				Add the `--aws` option to test against AWS instead of the local installation.

				For example - `pytest --aws test_item.py` or `pytest --aws`.

				If you plan to run tests against AWS and not just a local Scylla installation,

				the file ~/.aws/credentials should be configured with your AWS key:

				```

				[default]

				aws_access_key_id = XXXXXXXXXXXXXXXXXXXX

				aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

				```

				and ~/.aws/config with the default region to use in the test:

				```

				[default]

				region = us-east-1

				```

				## HTTPS support

				In order to run tests with HTTPS, run pytest with the `--https` parameter. Note that the Scylla cluster needs to be provided

				with the alternator\_https\_port configuration option in order to initialize an HTTPS server.

				Moreover, running an instance of an HTTPS server requires a certificate. Here's how to easily generate

				a key and a self-signed certificate, which is sufficient to run `--https` tests:

				```

				openssl genrsa 2048 > scylla.key

				openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt

				```

				If this pair is put into `conf/` directory, it will be enough

				to allow the alternator HTTPS server to think it's been authorized and properly certified.

				Still, boto3 library issues warnings that the certificate used for communication is self-signed,

				and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.

				## Authorization

				By default, boto3 prepares a properly signed Authorization header with every request.

				In order to confirm the authorization, the server recomputes the signature by using

				user credentials (user-provided username + a secret key known by the server),

				and then checks if it matches the signature from the header.

				Early alternator code did not verify signatures at all, which is also allowed by the protocol.

				A partial implementation of the authorization verification can be enabled by providing a Scylla

				configuration parameter:

				```yaml

				  alternator_enforce_authorization: true

				```

				The implementation is currently coupled with Scylla's system\_auth.roles table,

				which means that an additional step needs to be performed when setting up Scylla

				as the test environment. Tests will use the following credentials:

				Username: `alternator`

				Secret key: `secret_pass`

				With CQLSH, it can be achieved by executing this snippet:

				```bash

				cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"

				```

				Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`

				turned off. However, test cases from `test_authorization.py` may require this option to be turned on,

				so it's advised.
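
The signature check described in the Authorization section above (the server recomputes the signature from the user's secret key and compares it with the one in the header) can be sketched as follows. This is a minimal illustration, not Alternator's code. It assumes the AWS Signature Version 4 key-derivation chain that boto3 uses when signing DynamoDB requests, and it treats the "string to sign" as an opaque input, since building it from the canonical request is outside the scope of this sketch. The function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    # One link of the HMAC-SHA256 chain.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Signature Version 4 derivation: secret -> date -> region -> service -> "aws4_request".
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def compute_signature(secret_key: str, date_stamp: str, region: str,
                      service: str, string_to_sign: str) -> str:
    # Hex signature over an already-built string to sign (assumed given).
    key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def signature_matches(expected_hex: str, received_hex: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected_hex, received_hex)
```

With the test credentials listed above, a server-side check would call `compute_signature('secret_pass', ...)` for the `alternator` user and compare the result with the signature taken from the request's Authorization header.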

									
										179

alternator-test/conftest.py
									
										Normal file
									
												
				@@ -0,0 +1,179 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# This file contains "test fixtures", a pytest concept described in

				# https://docs.pytest.org/en/latest/fixture.html.

				# A "fixture" is some sort of setup which an individual test requires to run.

				# The fixture has setup code and teardown code, and if multiple tests

				# require the same fixture, it can be set up only once - while still allowing

				# the user to run individual tests and automatically set up the fixtures they need.

				import pytest

				import boto3

				from util import create_test_table

				# Test that the Boto libraries are new enough. These tests want to test a

				# large variety of DynamoDB API features, and to do this we need a new-enough

				# version of the Boto libraries (boto3 and botocore) so that they can

				# access all these API features.

				# In particular, the BillingMode feature was added in botocore 1.12.54.

				import botocore

				import sys

				from distutils.version import LooseVersion

				if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):

				    pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n    sudo pip{} install --upgrade boto3".format(sys.version_info[0]))

				# By default, tests run against a local Scylla installation on localhost:8000/.

				# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1

				# region.

				def pytest_addoption(parser):

				    parser.addoption("--aws", action="store_true",

				        help="run against AWS instead of a local Scylla installation")

				    parser.addoption("--https", action="store_true",

				        help="communicate via HTTPS protocol on port 8043 instead of HTTP when"

				            " running against a local Scylla installation")

				# "dynamodb" fixture: set up client object for communicating with the DynamoDB

				# API. Currently this chooses either Amazon's DynamoDB in the default region

				# or a local Alternator installation on http://localhost:8000 - depending on the

				# existence of the "--aws" option. In the future we should provide options

				# for choosing other Amazon regions or local installations.

				# We use scope="session" so that all tests will reuse the same client object.

				@pytest.fixture(scope="session")

				def dynamodb(request):

				    if request.config.getoption('aws'):

				        return boto3.resource('dynamodb')

				    else:

				        # Even though we connect to the local installation, Boto3 still

				        # requires us to specify dummy region and credential parameters,

				        # otherwise the user is forced to properly configure ~/.aws even

				        # for local runs.

				        local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'

				        # Disable verifying in order to be able to use self-signed TLS certificates

				        verify = not request.config.getoption('https')

				        # Silencing the 'Unverified HTTPS request warning'

				        if request.config.getoption('https'):

				            import urllib3

				            urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

				        return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,

				            region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')

				# "test_table" fixture: Create and return a temporary table to be used in tests

				# that need a table to work on. The table is automatically deleted at the end.

				# We use scope="session" so that all tests will reuse the same table.

				# This "test_table" creates a table which has a specific key schema: both a

				# partition key and a sort key, and both are strings. Other fixtures (below)

				# can be used to create different types of tables.

				#

				# TODO: Although we are careful about deleting temporary tables when the

				# fixture is torn down, in some cases (e.g., interrupted tests) we can be left

				# with some tables not deleted, and they will never be deleted. Because all

				# our temporary tables have the same test_table_prefix, we can actually find

				# and remove these old tables with this prefix. We can have a fixture, which

				# test_table will require, which on teardown will delete all remaining tables

				# (possibly from an older run). Because the table's name includes the current

				# time, we can also remove just tables older than a particular age. Such a

				# mechanism will allow running tests in parallel, without the risk of deleting

				# a parallel run's temporary tables.

				@pytest.fixture(scope="session")

				def test_table(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ])

				    yield table

				    # We get back here when this fixture is torn down. We ask Dynamo to delete

				    # this table, but not wait for the deletion to complete. The next time

				    # we create a test_table fixture, we'll choose a different table name

				    # anyway.

				    table.delete()

				# The following fixtures test_table_* are similar to test_table but create

				# tables with different key schemas.

				@pytest.fixture(scope="session")

				def test_table_s(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],

				        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])

				    yield table

				    table.delete()

				@pytest.fixture(scope="session")

				def test_table_b(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],

				        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])

				    yield table

				    table.delete()

				@pytest.fixture(scope="session")

				def test_table_sb(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])

				    yield table

				    table.delete()

				@pytest.fixture(scope="session")

				def test_table_sn(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])

				    yield table

				    table.delete()

				# "filled_test_table" fixture:  Create a temporary table to be used in tests

				# that involve reading data - GetItem, Scan, etc. The table is filled with

				# 328 items - each consisting of a partition key, clustering key and two

				# string attributes. 164 of the items are in a single partition (with the

				# partition key 'long') and the 164 other items are each in a separate

				# partition. Finally, a 329th item is added with different attributes.

				# This table is supposed to be read from, not updated nor overwritten.

				# This fixture returns both a table object and the description of all items

				# inserted into it.

				@pytest.fixture(scope="session")

				def filled_test_table(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ])

				    count = 164

				    items = [{

				        'p': str(i),

				        'c': str(i),

				        'attribute': "x" * 7,

				        'another': "y" * 16

				    } for i in range(count)]

				    items = items + [{

				        'p': 'long',

				        'c': str(i),

				        'attribute': "x" * (1 + i % 7),

				        'another': "y" * (1 + i % 16)

				    } for i in range(count)]

				    items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    yield table, items

				    table.delete()
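The TODO above the test_table fixture proposes cleaning up leftover temporary tables by prefix and age. A minimal sketch of the selection step follows. It assumes that each temporary table is named `<prefix><creation time in milliseconds>`; the naming scheme actually used by create_test_table is not shown in this file, so `stale_test_tables`, its parameters, and the example prefix are hypothetical.

```python
import time

def stale_test_tables(table_names, prefix, max_age_seconds, now=None):
    """Return names of leftover test tables older than max_age_seconds.

    Assumption (hypothetical naming scheme): each temporary table is named
    prefix + <creation time in milliseconds since the epoch>.
    """
    now_ms = int((time.time() if now is None else now) * 1000)
    stale = []
    for name in table_names:
        if not name.startswith(prefix):
            continue  # not one of our temporary tables
        suffix = name[len(prefix):]
        if not suffix.isdecimal():
            continue  # same prefix, but not the assumed naming scheme
        if now_ms - int(suffix) > max_age_seconds * 1000:
            stale.append(name)  # old enough that no parallel run owns it
    return stale
```

In a teardown fixture, the names could come from `[t.name for t in dynamodb.tables.all()]`, and each returned name could be removed with `dynamodb.Table(name).delete()`. Filtering by age is what keeps a parallel run's fresh tables safe.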

									
										74

alternator-test/test_authorization.py
									
										Normal file
									
												
				@@ -0,0 +1,74 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for authorization

				import pytest

				import botocore

				from botocore.exceptions import ClientError

				import boto3

				import requests

				# Test that trying to perform an operation signed with a wrong key

				# will not succeed

				def test_wrong_key_access(request, dynamodb):

				    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")

				    url = dynamodb.meta.client._endpoint.host

				    with pytest.raises(ClientError, match='UnrecognizedClientException'):

				        if url.endswith('.amazonaws.com'):

				            boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()

				        else:

				            verify = not url.startswith('https')

				            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()

				# A similar test, but this time the user is expected to exist in the database (for local tests)

				def test_wrong_password(request, dynamodb):

				    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")

				    url = dynamodb.meta.client._endpoint.host

				    with pytest.raises(ClientError, match='UnrecognizedClientException'):

				        if url.endswith('.amazonaws.com'):

				            boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()

				        else:

				            verify = not url.startswith('https')

				            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()

				# A test ensuring that expired signatures are not accepted

				def test_expired_signature(dynamodb, test_table):

				    url = dynamodb.meta.client._endpoint.host

				    print(url)

				    headers = {'Content-Type': 'application/x-amz-json-1.0',

				               'X-Amz-Date': '20170101T010101Z',

				               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',

				               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'

				    }

				    response = requests.post(url, headers=headers, verify=False)

				    assert not response.ok

				    assert "InvalidSignatureException" in response.text and "Signature expired" in response.text

				# A test ensuring that signatures that exceed current time too much are not accepted.

				# Watch out - this test is valid only for roughly the next 1000 years; it will need to be updated after that.

				def test_signature_too_futuristic(dynamodb, test_table):

				    url = dynamodb.meta.client._endpoint.host

				    print(url)

				    headers = {'Content-Type': 'application/x-amz-json-1.0',

				               'X-Amz-Date': '30200101T010101Z',

				               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',

				               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'

				    }

				    response = requests.post(url, headers=headers, verify=False)

				    assert not response.ok

				    assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text
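The two tests above expect a server to reject an X-Amz-Date that is too far in the past or in the future. The sketch below shows one way such a check could look. It is not Alternator's implementation: the function name `check_amz_date` and the 15-minute tolerance are assumptions made for illustration; only the two error strings are taken from the tests.

```python
from datetime import datetime, timedelta, timezone

def check_amz_date(amz_date, now, tolerance=timedelta(minutes=15)):
    """Return an error message if amz_date is outside the accepted window.

    amz_date: string in the X-Amz-Date format, e.g. '20170101T010101Z'.
    now: timezone-aware datetime representing the server's current time.
    tolerance: accepted clock skew (assumed value, not from the source).
    """
    stamp = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ")
    stamp = stamp.replace(tzinfo=timezone.utc)
    if stamp < now - tolerance:
        return "Signature expired"
    if stamp > now + tolerance:
        return "Signature not yet current"
    return None  # timestamp is within the accepted window
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.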

									
										253

alternator-test/test_batch.py
									
										Normal file
									
												
				@@ -0,0 +1,253 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for batch operations - BatchWriteItem, BatchGetItem.

				# Note that various other tests in other files also use these operations,

				# so they are actually tested by other tests as well.

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string, full_scan, full_query, multiset

				# Test ensuring that items inserted by a batched statement can be properly extracted

				# via GetItem. Schema has both hash and sort keys.

				def test_basic_batch_write_item(test_table):

				    count = 7

				    with test_table.batch_writer() as batch:

				        for i in range(count):

				            batch.put_item(Item={

				                'p': "batch{}".format(i),

				                'c': "batch_ck{}".format(i),

				                'attribute': str(i),

				                'another': 'xyz'

				            })

				    for i in range(count):

				        item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']

				        assert item['p'] == "batch{}".format(i)

				        assert item['c'] == "batch_ck{}".format(i)

				        assert item['attribute'] == str(i)

				        assert item['another'] == 'xyz'

				# Test batch write to a table with only a hash key

				def test_batch_write_hash_only(test_table_s):

				    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]

				    with test_table_s.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for item in items:

				        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item

				# Test batch delete operation (DeleteRequest): We create a bunch of items, and

				# then delete them all.

				def test_batch_write_delete(test_table_s):

				    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]

				    with test_table_s.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for item in items:

				        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item

				    with test_table_s.batch_writer() as batch:

				        for item in items:

				            batch.delete_item(Key={'p': item['p']})

				    # Verify that all items are now missing:

				    for item in items:

				        assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)

				# Test a single batch that includes both a write and a delete. Should be fine.

				def test_batch_write_and_delete(test_table_s):

				    p1 = random_string()

				    p2 = random_string()

				    test_table_s.put_item(Item={'p': p1})

				    assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)

				    assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)

				    with test_table_s.batch_writer() as batch:

				        batch.put_item({'p': p2})

				        batch.delete_item(Key={'p': p1})

				    assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)

				    assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)

				# It is forbidden to update the same key twice in the same batch.

				# DynamoDB says "Provided list of item keys contains duplicates".

				def test_batch_write_duplicate_write(test_table_s, test_table):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table_s.batch_writer() as batch:

				            batch.put_item({'p': p})

				            batch.put_item({'p': p})

				    c = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table.batch_writer() as batch:

				            batch.put_item({'p': p, 'c': c})

				            batch.put_item({'p': p, 'c': c})

				    # But it is fine to touch items with one component the same, but the other not.

				    other = random_string()

				    with test_table.batch_writer() as batch:

				        batch.put_item({'p': p, 'c': c})

				        batch.put_item({'p': p, 'c': other})

				        batch.put_item({'p': other, 'c': c})

				def test_batch_write_duplicate_delete(test_table_s, test_table):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table_s.batch_writer() as batch:

				            batch.delete_item(Key={'p': p})

				            batch.delete_item(Key={'p': p})

				    c = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table.batch_writer() as batch:

				            batch.delete_item(Key={'p': p, 'c': c})

				            batch.delete_item(Key={'p': p, 'c': c})

				    # But it is fine to touch items with one component the same, but the other not.

				    other = random_string()

				    with test_table.batch_writer() as batch:

				        batch.delete_item(Key={'p': p, 'c': c})

				        batch.delete_item(Key={'p': p, 'c': other})

				        batch.delete_item(Key={'p': other, 'c': c})

				def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table_s.batch_writer() as batch:

				            batch.delete_item(Key={'p': p})

				            batch.put_item({'p': p})

				    c = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*duplicates'):

				        with test_table.batch_writer() as batch:

				            batch.delete_item(Key={'p': p, 'c': c})

				            batch.put_item({'p': p, 'c': c})

				    # But it is fine to touch items with one component the same, but the other not.

				    other = random_string()

				    with test_table.batch_writer() as batch:

				        batch.delete_item(Key={'p': p, 'c': c})

				        batch.put_item({'p': p, 'c': other})

				        batch.put_item({'p': other, 'c': c})

				# Test that BatchWriteItem's PutRequest completely replaces an existing item.

				# It shouldn't merge it with a previously existing value. See also the same

				# test for PutItem - test_put_item_replace().

				def test_batch_put_item_replace(test_table_s, test_table):

				    p = random_string()

				    with test_table_s.batch_writer() as batch:

				        batch.put_item(Item={'p': p, 'a': 'hi'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}

				    with test_table_s.batch_writer() as batch:

				        batch.put_item(Item={'p': p, 'b': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}

				    c = random_string()

				    with test_table.batch_writer() as batch:

				        batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}

				    with test_table.batch_writer() as batch:

				        batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}

				# Test that if one of the batch's operations is invalid, because a key

				# column is missing or has the wrong type, the entire batch is rejected

				# before any write is done.

				def test_batch_write_invalid_operation(test_table_s):

				    # test key attribute with wrong type:

				    p1 = random_string()

				    p2 = random_string()

				    items = [{'p': p1}, {'p': 3}, {'p': p2}]

				    with pytest.raises(ClientError, match='ValidationException'):

				        with test_table_s.batch_writer() as batch:

				            for item in items:

				                batch.put_item(item)

				    for p in [p1, p2]:

				        assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)

				    # test missing key attribute:

				    p1 = random_string()

				    p2 = random_string()

				    items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]

				    with pytest.raises(ClientError, match='ValidationException'):

				        with test_table_s.batch_writer() as batch:

				            for item in items:

				                batch.put_item(item)

				    for p in [p1, p2]:

				        assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)

				# Basic test for BatchGetItem, reading several entire items.

				# Schema has both hash and sort keys.

				def test_batch_get_item(test_table):

				    items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    keys = [{k: x[k] for k in ('p', 'c')} for x in items]

				    # We use the low-level batch_get_item API for lack of a more convenient

				    # API. At least it spares us the need to encode the key's types...

				    reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})

				    print(reply)

				    got_items = reply['Responses'][test_table.name]

				    assert multiset(got_items) == multiset(items)

				# Same, but with a schema that has just a hash key.

				def test_batch_get_item_hash(test_table_s):

				    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]

				    with test_table_s.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    keys = [{k: x[k] for k in ('p',)} for x in items]

				    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})

				    got_items = reply['Responses'][test_table_s.name]

				    assert multiset(got_items) == multiset(items)

				# Test what we get if we try to read two *missing* values in addition to

				# an existing one. It turns out the missing items are simply not returned,

				# with no sign they are missing.

				def test_batch_get_item_missing(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p})

				    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})

				    got_items = reply['Responses'][test_table_s.name]

				    assert got_items == [{'p' : p}]

				# If all the keys requested from a particular table are missing, we still

				# get a response array for that table - it's just empty.

				def test_batch_get_item_completely_missing(test_table_s):

				    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})

				    got_items = reply['Responses'][test_table_s.name]

				    assert got_items == []

				# Test BatchGetItem with AttributesToGet

				def test_batch_get_item_attributes_to_get(test_table):

				    items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    keys = [{k: x[k] for k in ('p', 'c')} for x in items]

				    for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:

				        reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})

				        got_items = reply['Responses'][test_table.name]

				        expected_items = [{k: item[k] for k in wanted if k in item} for item in items]

				        assert multiset(got_items) == multiset(expected_items)

				# Test BatchGetItem with ProjectionExpression (just a simple one, with

				# top-level attributes)

				def test_batch_get_item_projection_expression(test_table):

				    items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    keys = [{k: x[k] for k in ('p', 'c')} for x in items]

				    for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:

				        reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})

				        got_items = reply['Responses'][test_table.name]

				        expected_items = [{k: item[k] for k in wanted if k in item} for item in items]

				        assert multiset(got_items) == multiset(expected_items)
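The duplicate-key tests in this file rely on the rule that a batch may not touch the same key twice, whether through two puts, two deletes, or a put and a delete. The sketch below illustrates that rule as a standalone check. The helper `find_duplicate_keys` is hypothetical and is not taken from Alternator or boto3; it only shows why items sharing one key component but not the other are accepted.

```python
def find_duplicate_keys(requests, key_names):
    """Return the key tuples that appear more than once in a batch.

    requests: list of dicts, one per PutRequest item or DeleteRequest key,
              each holding at least the table's key attributes.
    key_names: the table's key attribute names, e.g. ('p',) or ('p', 'c').
    """
    seen = set()
    duplicates = []
    for req in requests:
        # The full key is the combination of all key attributes, so two
        # items with the same 'p' but different 'c' are distinct keys.
        key = tuple(req[name] for name in key_names)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

A non-empty result corresponds to the case where the tests expect a ValidationException mentioning duplicates.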

1106

alternator-test/test_condition_expression.py Normal file

File diff suppressed because it is too large

									
										49

alternator-test/test_describe_endpoints.py
									
										Normal file
									
												
				@@ -0,0 +1,49 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Test for the DescribeEndpoints operation

				import boto3

				# Test that the DescribeEndpoints operation works as expected: that it

				# returns one endpoint (it may return more, but it never does this in

				# Amazon), and this endpoint can be used to make more requests.

				def test_describe_endpoints(request, dynamodb):

				    endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']

				    # It is not strictly necessary that only a single endpoint be returned,

				    # but this is what Amazon DynamoDB does today (and so does Alternator).

				    assert len(endpoints) == 1

				    for endpoint in endpoints:

				        assert 'CachePeriodInMinutes' in endpoint.keys()

				        address = endpoint['Address']

				        # Check that the address is a valid endpoint by checking that we can

				        # send it another describe_endpoints() request ;-) Note that the

				        # address does not include the "http://" or "https://" prefix, and

				        # we need to choose one manually.

				        prefix = "https://" if request.config.getoption('https') else "http://"

				        verify = not request.config.getoption('https')

				        url = prefix + address

				        if address.endswith('.amazonaws.com'):

				            boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()

				        else:

				            # Even though we connect to the local installation, Boto3 still

				            # requires us to specify dummy region and credential parameters,

				            # otherwise the user is forced to properly configure ~/.aws even

				            # for local runs.

				            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()

				        # Nothing to check here - if the above call failed with an exception,

				        # the test would fail.
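The test above turns the scheme-less address returned by DescribeEndpoints into arguments for a new client. The same decisions can be sketched as a small helper. The name `endpoint_client_kwargs` is hypothetical; the dummy region and credentials are the ones used in the test, and skipping certificate verification for HTTPS assumes a self-signed certificate, as in conftest.py.

```python
def endpoint_client_kwargs(address, use_https):
    """Build keyword arguments for boto3.client('dynamodb', ...).

    address: endpoint address without a scheme, as returned in the
             'Address' field of a DescribeEndpoints response.
    use_https: whether to talk HTTPS to the endpoint.
    """
    scheme = "https://" if use_https else "http://"
    kwargs = {
        "endpoint_url": scheme + address,
        # Verification is disabled for HTTPS so that self-signed
        # certificates are accepted (same choice as in the test).
        "verify": not use_https,
    }
    if not address.endswith(".amazonaws.com"):
        # Local installation: boto3 still needs a region and credentials.
        kwargs.update(
            region_name="us-east-1",
            aws_access_key_id="alternator",
            aws_secret_access_key="secret_pass",
        )
    return kwargs
```

The result would be used as `boto3.client('dynamodb', **endpoint_client_kwargs(address, use_https))`.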

									
										169

alternator-test/test_describe_table.py
									
										Normal file
									
												
				@@ -0,0 +1,169 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the DescribeTable operation.

				# Some attributes used only by a specific major feature will be tested

				# elsewhere:

				#  1. Tests for describing tables with global or local secondary indexes

				#     (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)

				#     are in test_gsi.py and test_lsi.py.

				#  2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,

				#     StreamSpecification) will be in the tests devoted to the stream

				#     feature.

				#  3. Tests for describing a restored table (RestoreSummary, TableId)

				#     will be together with tests devoted to the backup/restore feature.

				import pytest

				from botocore.exceptions import ClientError

				import re

				import time

				from util import multiset

				# Test that DescribeTable correctly returns the table's name and state

				def test_describe_table_basic(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert got['TableName'] == test_table.name

				    assert got['TableStatus'] == 'ACTIVE'

				# Test that DescribeTable correctly returns the table's schema, in

				# AttributeDefinitions and KeySchema attributes

				def test_describe_table_schema(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    expected = { # Copied from test_table()'s fixture

				        'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        'AttributeDefinitions': [

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ]

				    }

				    assert got['KeySchema'] == expected['KeySchema']

				    # The list of attribute definitions may be arbitrarily reordered

				    assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])

				# Test that DescribeTable correctly returns the table's billing mode,

				# in the BillingModeSummary attribute.

				def test_describe_table_billing(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'

				    # The BillingModeSummary should also contain a

				    # LastUpdateToPayPerRequestDateTime attribute, which is a date.

				    # We don't know what date this is supposed to be, but something we

				    # do know is that the test table was created already with this billing

				    # mode, so the table creation date should be the same as the billing

				    # mode setting date.

				    assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']

				    assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']

				# Test that DescribeTable correctly returns the table's creation time.

				# We don't know what this creation time is supposed to be, so this test

				# cannot be very thorough... We currently just test against something we

				# know to be wrong - returning the *current* time, which changes on every

				# call.

				@pytest.mark.xfail(reason="DescribeTable does not return table creation time")

				def test_describe_table_creation_time(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'CreationDateTime' in got

				    time1 = got['CreationDateTime']

				    time.sleep(1) 

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    time2 = got['CreationDateTime']

				    assert time1 == time2

				# Test that DescribeTable returns the table's estimated item count

				# in the ItemCount attribute. Unfortunately, there's not much we can

				# really test here... The documentation says that the count can be

				# delayed by six hours, so the number we get here may have no relation

				# to the current number of items in the test table. The attribute should exist,

				# though. This test does NOT verify that ItemCount isn't always returned as

				# zero - such a stub implementation will pass this test.

				@pytest.mark.xfail(reason="DescribeTable does not return table item count")

				def test_describe_table_item_count(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'ItemCount' in got

				# Similar test for estimated size in bytes - TableSizeBytes - which again,

				# may reflect the size as of up to six hours ago.

				@pytest.mark.xfail(reason="DescribeTable does not return table size")

				def test_describe_table_size(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'TableSizeBytes' in got

				# Test the ProvisionedThroughput attribute returned by DescribeTable.

				# This is a very partial test: Our test table is configured without

				# provisioned throughput, so obviously it will not have interesting settings

				# for it. DynamoDB returns zeros for some of the attributes, even though

				# the documentation suggests missing values should have been fine too.

				@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")

				def test_describe_table_provisioned_throughput(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0

				    assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0

				    assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0

				# This is a silly test for the RestoreSummary attribute in DescribeTable -

				# it should not exist in a table not created by a restore. When testing

				# the backup/restore feature, we will have more meaningful tests for the

				# value of this attribute in that case.

				def test_describe_table_restore_summary(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'RestoreSummary' not in got

				# This is a silly test for the SSEDescription attribute in DescribeTable -

				# by default, a table is encrypted with AWS-owned keys, not using client-

				# owned keys, and the SSEDescription attribute is not returned at all.

				def test_describe_table_encryption(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'SSEDescription' not in got

				# This is a silly test for the StreamSpecification attribute in DescribeTable -

				# when there are no streams, this attribute should be missing.

				def test_describe_table_stream_specification(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'StreamSpecification' not in got

				# Test that the table has an ARN, a unique identifier for the table which

				# includes which zone it is on, which account, and of course the table's

				# name. The ARN format is described in

				# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns

				@pytest.mark.xfail(reason="DescribeTable does not return ARN")

				def test_describe_table_arn(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'TableArn' in got and got['TableArn'].startswith('arn:')

				# Test that the table has a TableId.

				# TODO: Figure out what this TableId is supposed to be. Is it just a

				# unique id that is created with the table and never changes? Or something

				# else?

				@pytest.mark.xfail(reason="DescribeTable does not return TableId")

				def test_describe_table_id(test_table):

				    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']

				    assert 'TableId' in got

				# DescribeTable error path: trying to describe a non-existent table should

				# result in a ResourceNotFoundException.

				def test_describe_table_non_existent_table(dynamodb):

				    with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:

				        dynamodb.meta.client.describe_table(TableName='non_existent_table')

				    # As one of the first error-path tests that we wrote, let's test in more

				    # detail that the error reply has the appropriate fields:

				    response = einfo.value.response

				    print(response)

				    err = response['Error']

				    assert err['Code'] == 'ResourceNotFoundException'

				    assert re.match('Requested resource not found: Table: non_existent_table not found', err['Message'])
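
The last assertion above relies on re.match, whose signature is re.match(pattern, string). Below is a minimal sketch of a literal message check, assuming the expected text should match the start of the server's message. The helper name message_matches is hypothetical and not part of the test suite.

```python
import re

def message_matches(expected_text, actual_message):
    """Return True if actual_message starts with expected_text, taken literally."""
    # re.match(pattern, string): the expected text is the pattern and the
    # server's message is the string under test. re.escape() ensures that
    # punctuation in the expected text is not interpreted as regex syntax.
    return re.match(re.escape(expected_text), actual_message) is not None

# Stand-in for err['Message'] from a real error response (assumption).
expected = 'Requested resource not found: Table: non_existent_table not found'
print(message_matches(expected, expected))
```

Because re.match anchors only at the start of the string, a message with extra trailing text would still pass. re.fullmatch would be the stricter choice.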

1079

alternator-test/test_expected.py Normal file

File diff suppressed because it is too large

									
874

alternator-test/test_gsi.py Normal file
												
				@@ -0,0 +1,874 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests of GSI (Global Secondary Indexes)

				#

				# Note that many of these tests are slower than usual, because many of them

				# need to create new tables and/or new GSIs of different types, operations

				# which are extremely slow in DynamoDB, often taking minutes (!).

				import pytest

				import time

				from botocore.exceptions import ClientError, ParamValidationError

				from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables

				# GSIs only support eventually consistent reads, so tests that involve

				# writing to a table and then expect to read something from it cannot be

				# guaranteed to succeed without retrying the read. The following utility

				# functions make it easy to write such tests.

				# Note that in practice, these repeated reads are almost never necessary:

				# Amazon claims that "Changes to the table data are propagated to the global

				# secondary indexes within a fraction of a second, under normal conditions"

				# and indeed, in practice, the tests here almost always succeed without a

				# retry.

				def assert_index_query(table, index_name, expected_items, **kwargs):

				    for i in range(3):

				        if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):

				            return

				        print('assert_index_query retrying')

				        time.sleep(1)

				    assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))

				def assert_index_scan(table, index_name, expected_items, **kwargs):

				    for i in range(3):

				        if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):

				            return

				        print('assert_index_scan retrying')

				        time.sleep(1)

				    assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
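
The two helpers above share one retry pattern. A minimal, generic sketch of that pattern follows, assuming the caller supplies a zero-argument fetch callable. The name retry_until_equal and its parameters are hypothetical and not part of util.py or the test suite.

```python
import time

def retry_until_equal(fetch, expected, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns expected, sleeping between attempts.

    Returns the last value fetched, so the caller can assert on it and
    get a meaningful failure message.
    """
    result = fetch()
    for _ in range(attempts):
        if result == expected:
            break
        # An eventually consistent read may lag: wait and read again.
        sleep(delay)
        result = fetch()
    return result

# Usage with a fake, eventually consistent source (no real DynamoDB needed).
responses = iter([[], [], ['item']])
got = retry_until_equal(lambda: next(responses), ['item'], sleep=lambda s: None)
print(got)
```

Like the helpers above, this performs at most one initial read plus three retries. Passing sleep as a parameter is only a convenience so that the sketch can be exercised without real delays.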

				# Although quite silly, it is actually allowed to create an index which is

				# identical to the base table.

				def test_gsi_identical(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],

				        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    items = [{'p': random_string(), 'x': random_string()} for i in range(10)]

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Scanning the entire table directly or via the index yields the same

				    # results (in different order).

				    assert multiset(items) == multiset(full_scan(table))

				    assert_index_scan(table, 'hello', items)

				    # We can't scan a nonexistent index

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_scan(table, IndexName='wrong')

				    table.delete()

				# One of the simplest forms of a non-trivial GSI: The base table has a hash

				# and sort key, and the index reverses those roles. Other attributes are just

				# copied.

				@pytest.fixture(scope="session")

				def test_table_gsi_1(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'c', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'p', 'KeyType': 'RANGE' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ],

				        )

				    yield table

				    table.delete()

				def test_gsi_simple(test_table_gsi_1):

				    items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]

				    with test_table_gsi_1.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    c = items[0]['c']

				    # The index allows a query on just a specific sort key, which isn't

				    # allowed on the base table.

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})

				    expected_items = [x for x in items if x['c'] == c]

				    assert_index_query(test_table_gsi_1, 'hello', expected_items,

				        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})

				    # Scanning the entire table directly or via the index yields the same

				    # results (in different order).

				    assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))

				def test_gsi_same_key(test_table_gsi_1):

				    c = random_string()

				    # All these items have the same sort key 'c' but different hash key 'p'

				    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]

				    with test_table_gsi_1.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    assert_index_query(test_table_gsi_1, 'hello', items,

				        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})

				# Check we get an appropriate error when trying to read a non-existing index

				# of an existing table. Although the documentation specifies that a

				# ResourceNotFoundException should be returned if "The operation tried to

				# access a nonexistent table or index", in fact in the specific case that

				# the table does exist but an index does not - we get a ValidationException.

				def test_gsi_missing_index(test_table_gsi_1):

				    with pytest.raises(ClientError, match='ValidationException.*wrong_name'):

				        full_query(test_table_gsi_1, IndexName='wrong_name',

				            KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})

				    with pytest.raises(ClientError, match='ValidationException.*wrong_name'):

				        full_scan(test_table_gsi_1, IndexName='wrong_name')

				# Nevertheless, if the table itself does not exist, a query should return

				# a ResourceNotFoundException, not ValidationException:

				def test_gsi_missing_table(dynamodb):

				    with pytest.raises(ClientError, match='ResourceNotFoundException'):

				        dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})

				    with pytest.raises(ClientError, match='ResourceNotFoundException'):

				        dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')

				# Verify that strongly-consistent reads on GSI are *not* allowed.

				@pytest.mark.xfail(reason="GSI strong consistency not checked")

				def test_gsi_strong_consistency(test_table_gsi_1):

				    with pytest.raises(ClientError, match='ValidationException.*Consistent'):

				        full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)

				    with pytest.raises(ClientError, match='ValidationException.*Consistent'):

				        full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)

				# Verify that a GSI is correctly listed in describe_table

				@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")

				def test_gsi_describe(test_table_gsi_1):

				    desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)

				    assert 'Table' in desc

				    assert 'GlobalSecondaryIndexes' in desc['Table']

				    gsis = desc['Table']['GlobalSecondaryIndexes']

				    assert len(gsis) == 1

				    gsi = gsis[0]

				    assert gsi['IndexName'] == 'hello'

				    assert 'IndexSizeBytes' in gsi     # actual size depends on content

				    assert 'ItemCount' in gsi

				    assert gsi['Projection'] == {'ProjectionType': 'ALL'}

				    assert gsi['IndexStatus'] == 'ACTIVE'

				    assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},

				                                {'KeyType': 'RANGE', 'AttributeName': 'p'}]

				    # TODO: check also ProvisionedThroughput, IndexArn

				# When a GSI's key includes an attribute not in the base table's key, we

				# need to remember to add its type to AttributeDefinitions.

				def test_gsi_missing_attribute_definition(dynamodb):

				    with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):

				        create_test_table(dynamodb,

				            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				            AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],

				            GlobalSecondaryIndexes=[

				                {   'IndexName': 'hello',

				                    'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],

				                    'Projection': { 'ProjectionType': 'ALL' }

				                }

				            ])

				# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another

				# case where the index doesn't involve non-key attributes. Again the base

				# table has a hash and sort key, but in this case the index has *only* a

				# hash key (which is the base's hash key). In the materialized-view-based

				# implementation, we need to remember the other part of the base key as a

				# clustering key.

				@pytest.fixture(scope="session")

				def test_table_gsi_1_hash_only(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'c', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ],

				        )

				    yield table

				    table.delete()

				def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):

				    # Test with items with different 'c' values:

				    items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]

				    with test_table_gsi_1_hash_only.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    c = items[0]['c']

				    expected_items = [x for x in items if x['c'] == c]

				    assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,

				        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})

				    # Test items with the same sort key 'c' but different hash key 'p'

				    c = random_string()

				    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]

				    with test_table_gsi_1_hash_only.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    assert_index_query(test_table_gsi_1_hash_only, 'hello', items,

				        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})

				    # Scanning the entire table directly or via the index yields the same

				    # results (in different order).

				    assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))

				# A second scenario of GSI. Base table has just hash key, Index has a

				# different hash key - one of the non-key attributes from the base table.

				@pytest.fixture(scope="session")

				def test_table_gsi_2(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'x', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    yield table

				    table.delete()

				def test_gsi_2(test_table_gsi_2):

				    items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]

				    x1 = items1[0]['x']

				    x2 = random_string()

				    items2 = [{'p': random_string(), 'x': x2} for i in range(10)]

				    items = items1 + items2

				    with test_table_gsi_2.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [i for i in items if i['x'] == x1]

				    assert_index_query(test_table_gsi_2, 'hello', expected_items,

				        KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

				    expected_items = [i for i in items if i['x'] == x2]

				    assert_index_query(test_table_gsi_2, 'hello', expected_items,

				        KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})

				# Test that when a table has a GSI, if the indexed attribute is missing, the

				# item is added to the base table but not the index.

				def test_gsi_missing_attribute(test_table_gsi_2):

				    p1 = random_string()

				    x1 = random_string()

				    test_table_gsi_2.put_item(Item={'p':  p1, 'x': x1})

				    p2 = random_string()

				    test_table_gsi_2.put_item(Item={'p':  p2})

				    # Both items are now in the base table:

				    assert test_table_gsi_2.get_item(Key={'p':  p1})['Item'] == {'p': p1, 'x': x1}

				    assert test_table_gsi_2.get_item(Key={'p':  p2})['Item'] == {'p': p2}

				    # But only the first item is in the index: It can be found using a

				    # Query. A scan of the index won't find the second item (but a scan of

				    # the base table will).

				    assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],

				        KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

				    assert any([i['p'] == p1 for i in full_scan(test_table_gsi_2)])

				    # Note: with eventually consistent read, we can't really be sure that

				    # an item will "never" appear in the index. We do this test last,

				    # so if we had a bug and such an item did appear, hopefully we had enough

				    # time for the bug to become visible. At least sometimes.

				    assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])

				# Test that when a table has a GSI, if the indexed attribute has the wrong

				# type, the update operation is rejected, and the item is added to neither

				# the base table nor the index. This is different from the case of a

				# *missing* attribute, where the item is added to the base table but not the index.

				# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}

				# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.

				def test_gsi_wrong_type_attribute_put(test_table_gsi_2):

				    # PutItem with wrong type for 'x' is rejected, item isn't created even

				    # in the base table.

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*mismatch'):

				        test_table_gsi_2.put_item(Item={'p':  p, 'x': 3})

				    assert 'Item' not in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)

				def test_gsi_wrong_type_attribute_update(test_table_gsi_2):

				    # An UpdateItem with wrong type for 'x' is also rejected, but naturally

				    # if the item already existed, it remains as it was.

				    p = random_string()

				    x = random_string()

				    test_table_gsi_2.put_item(Item={'p':  p, 'x': x})

				    with pytest.raises(ClientError, match='ValidationException.*mismatch'):

				        test_table_gsi_2.update_item(Key={'p':  p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})

				    assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}

				def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):

				    # In a BatchWriteItem, if any update is forbidden, the entire batch is

				    # rejected, and none of the updates happen at all.

				    p1 = random_string()

				    p2 = random_string()

				    p3 = random_string()

				    items = [{'p': p1, 'x': random_string()},

				             {'p': p2, 'x': 3},

				             {'p': p3, 'x': random_string()}]

				    with pytest.raises(ClientError, match='ValidationException.*mismatch'):

				        with test_table_gsi_2.batch_writer() as batch:

				            for item in items:

				                batch.put_item(item)

				    for p in [p1, p2, p3]:

				        assert 'Item' not in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)

				# A third scenario of GSI. Index has a hash key and a sort key, both are

				# non-key attributes from the base table. This scenario may be very

				# difficult to implement in Alternator because Scylla's materialized-views

				# implementation only allows one new key column in the view, and here

				# we need two (which, also, aren't actual columns, but map items).

				@pytest.fixture(scope="session")

				def test_table_gsi_3(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'a', 'AttributeType': 'S' },

				                    { 'AttributeName': 'b', 'AttributeType': 'S' }

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'a', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    yield table

				    table.delete()

				def test_gsi_3(test_table_gsi_3):

				    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]

				    with test_table_gsi_3.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    assert_index_query(test_table_gsi_3, 'hello', [items[3]],

				        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

				def test_gsi_update_second_regular_base_column(test_table_gsi_3):

				    items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]

				    with test_table_gsi_3.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    items[3]['b'] = 'updated'

				    test_table_gsi_3.update_item(Key={'p':  items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})

				    assert_index_query(test_table_gsi_3, 'hello', [items[3]],

				        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

				# Test that when a table has a GSI, if the indexed attribute is missing, the

				# item is added to the base table but not the index.

				# This is the same feature we already tested in test_gsi_missing_attribute()

				# above, but on a different table: In that test we used test_table_gsi_2,

				# with one indexed attribute, and in this test we use test_table_gsi_3 which

				# has two base regular attributes in the view key, and more possibilities

				# of which value might be missing. Reproduces issue #6008.

				def test_gsi_missing_attribute_3(test_table_gsi_3):

				    p = random_string()

				    a = random_string()

				    b = random_string()

				    # First, add an item with a missing "a" value. It should appear in the

				    # base table, but not in the index:

				    test_table_gsi_3.put_item(Item={'p':  p, 'b': b})

				    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p, 'b': b}

				    # Note: with eventually consistent read, we can't really be sure that

				    # an item will "never" appear in the index. We hope that if a bug exists

				    # and such an item did appear, sometimes the delay here will be enough

				    # for the unexpected item to become visible.

				    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])

				    # Same thing for an item with a missing "b" value:

				    test_table_gsi_3.put_item(Item={'p':  p, 'a': a})

				    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p, 'a': a}

				    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])

				    # And for an item missing both:

				    test_table_gsi_3.put_item(Item={'p':  p})

				    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p}

				    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])

				# A fourth scenario of GSI. Two GSIs on a single base table.

				@pytest.fixture(scope="session")

				def test_table_gsi_4(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'a', 'AttributeType': 'S' },

				                    { 'AttributeName': 'b', 'AttributeType': 'S' }

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello_a',

				                'KeySchema': [

				                    { 'AttributeName': 'a', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            },

				            {   'IndexName': 'hello_b',

				                'KeySchema': [

				                    { 'AttributeName': 'b', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    yield table

				    table.delete()

				# Test that a base table with two GSIs updates both as expected.

				def test_gsi_4(test_table_gsi_4):

				    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]

				    with test_table_gsi_4.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],

				        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})

				    assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],

				        KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

				# Verify that describe_table lists the two GSIs.

				def test_gsi_4_describe(test_table_gsi_4):

				    desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)

				    assert 'Table' in desc

				    assert 'GlobalSecondaryIndexes' in desc['Table']

				    gsis = desc['Table']['GlobalSecondaryIndexes']

				    assert len(gsis) == 2

				    assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])

				# A scenario for GSI in which the table has both hash and sort key

				@pytest.fixture(scope="session")

				def test_table_gsi_5(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'x', 'KeyType': 'RANGE' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    yield table

				    table.delete()

				def test_gsi_5(test_table_gsi_5):

				    items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]

				    p1, x1 = items1[0]['p'], items1[0]['x']

				    p2, x2 = random_string(), random_string()

				    items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]

				    items = items1 + items2

				    with test_table_gsi_5.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]

				    assert_index_query(test_table_gsi_5, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                       'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

				    expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]

				    assert_index_query(test_table_gsi_5, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},

				                       'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})

				# Verify that DescribeTable correctly returns the schema of both base-table

				# and secondary indexes. KeySchema is given for each of the base table and

				# indexes, and AttributeDefinitions is merged for all of them together.

				def test_gsi_5_describe_table_schema(test_table_gsi_5):

				    got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']

				    # Copied from test_table_gsi_5 fixture

				    expected_base_keyschema = [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' } ]

				    expected_gsi_keyschema = [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'x', 'KeyType': 'RANGE' } ]

				    expected_all_attribute_definitions = [

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' } ]

				    assert got['KeySchema'] == expected_base_keyschema

				    gsis = got['GlobalSecondaryIndexes']

				    assert len(gsis) == 1

				    assert gsis[0]['KeySchema'] == expected_gsi_keyschema

				    # The list of attribute definitions may be arbitrarily reordered

				    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)

				# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity

				# in that table is that the base table has only a hash key p, and the index

				# has only a hash key x. Now, while internally Scylla needs to add "p" as a

				# clustering key in the materialized view (in Scylla the view key always

				# contains the base key), when describing the table, "p" shouldn't be

				# returned as a range key, because the user didn't ask for it.

				# This test reproduces issue #5320.

				@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")

				def test_gsi_2_describe_table_schema(test_table_gsi_2):

				    got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']

				    # Copied from test_table_gsi_2 fixture

				    expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]

				    expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]

				    expected_all_attribute_definitions = [

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' } ]

				    assert got['KeySchema'] == expected_base_keyschema

				    gsis = got['GlobalSecondaryIndexes']

				    assert len(gsis) == 1

				    assert gsis[0]['KeySchema'] == expected_gsi_keyschema

				    # The list of attribute definitions may be arbitrarily reordered

				    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)

				# All tests above involved "ProjectionType: ALL". This test checks how

				# "ProjectionType: KEYS_ONLY" works. We note that it projects both

				# the index's key, *and* the base table's key. So items which had different

				# base-table keys cannot suddenly become the same item in the index.

				@pytest.mark.xfail(reason="GSI not supported")

				def test_gsi_projection_keys_only(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'x', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'KEYS_ONLY' }

				            }

				        ])

				    items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    wanted = ['p', 'x']

				    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				    assert_index_scan(table, 'hello', expected_items)

				    table.delete()

				# Test for "ProjectionType: INCLUDE". The secondary index includes

				# its own and the base's keys (as in KEYS_ONLY) plus the extra attributes

				# given in NonKeyAttributes.

				@pytest.mark.xfail(reason="GSI not supported")

				def test_gsi_projection_include(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'x', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'INCLUDE',

				                                'NonKeyAttributes': ['a', 'b'] }

				            }

				        ])

				    # Some items have the projected attributes a,b and some don't:

				    items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]

				    items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    wanted = ['p', 'x', 'a', 'b']

				    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				    assert_index_scan(table, 'hello', expected_items)

				    print(len(expected_items))

				    table.delete()
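# A minimal sketch (not part of the original tests) of how the expected
# result of a KEYS_ONLY or INCLUDE projection can be computed locally.
# The helper name project_items() is hypothetical; it only packages the
# dict comprehension used by the two projection tests above.
def project_items(items, wanted):
    # Keep only the wanted attributes; an attribute missing from an item
    # is skipped rather than filled in with a placeholder.
    return [{k: item[k] for k in wanted if k in item} for item in items]

# Usage, with made-up items: 'y' is never projected, and the second item
# simply lacks 'a' and 'b'.
sample_items = [
    {'p': 'p1', 'x': 'x1', 'a': 'a1', 'b': 'b1', 'y': 'y1'},
    {'p': 'p2', 'x': 'x2', 'y': 'y2'},
]
projected = project_items(sample_items, ['p', 'x', 'a', 'b'])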

				# DynamoDB's documentation says the "Projection" argument of GlobalSecondaryIndexes is

				# mandatory, and indeed Boto3 enforces that it must be passed. The

				# documentation then goes on to claim that the "ProjectionType" member of

				# "Projection" is optional - and Boto3 allows it to be missing. But in

				# fact, it is not allowed to be missing: DynamoDB complains: "Unknown

				# ProjectionType: null".

				@pytest.mark.xfail(reason="GSI not supported")

				def test_gsi_missing_projection_type(dynamodb):

				    with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):

				        create_test_table(dynamodb,

				            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],

				            AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],

				            GlobalSecondaryIndexes=[

				                {   'IndexName': 'hello',

				                    'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],

				                    'Projection': {}

				                }

				            ])

				# update_table() for creating a GSI is an asynchronous operation.

				# The table's TableStatus changes from ACTIVE to UPDATING for a short while

				# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as

				# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.

				# During the CREATING phase, at some point the Backfilling attribute also

				# appears, until it eventually disappears. We need to wait until all three

				# markers indicate completion.

				# Unfortunately, while boto3 has a client.get_waiter('table_exists') to

				# wait for a table to exist, there is no such function to wait for an

				# index to come up, so we need to code it ourselves.

				def wait_for_gsi(table, gsi_name):

				    start_time = time.time()

				    # Surprisingly, even for tiny tables this can take a very long time

				    # on DynamoDB - often many minutes!

				    for i in range(300):

				        time.sleep(1)

				        desc = table.meta.client.describe_table(TableName=table.name)

				        table_status = desc['Table']['TableStatus']

				        if table_status != 'ACTIVE':

				            print('%d Table status still %s' % (i, table_status))

				            continue

				        index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]

				        assert len(index_desc) == 1

				        index_status = index_desc[0]['IndexStatus']

				        if index_status != 'ACTIVE':

				            print('%d Index status still %s' % (i, index_status))

				            continue

				        # When the index is ACTIVE, this must be after backfilling completed

				        assert 'Backfilling' not in index_desc[0]

				        print('wait_for_gsi took %d seconds' % (time.time() - start_time))

				        return

				    raise AssertionError("wait_for_gsi did not complete")

				# Similarly to how wait_for_gsi() waits for a GSI to finish adding,

				# this function waits for a GSI to be finally deleted.

				def wait_for_gsi_gone(table, gsi_name):

				    start_time = time.time()

				    for i in range(300):

				        time.sleep(1)

				        desc = table.meta.client.describe_table(TableName=table.name)

				        table_status = desc['Table']['TableStatus']

				        if table_status != 'ACTIVE':

				            print('%d Table status still %s' % (i, table_status))

				            continue

				        if 'GlobalSecondaryIndexes' in desc['Table']:

				            index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]

				            if len(index_desc) != 0:

				                index_status = index_desc[0]['IndexStatus']

				                print('%d Index status still %s' % (i, index_status))

				                continue

				        print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))

				        return

				    raise AssertionError("wait_for_gsi_gone did not complete")
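# A minimal sketch (not part of the original tests): wait_for_gsi() and
# wait_for_gsi_gone() share the same poll-and-retry loop, which could be
# factored into one helper. The name wait_until() and its parameters are
# hypothetical, chosen only for illustration.
import time

def wait_until(predicate, attempts=300, interval=1.0, what="condition"):
    # Call predicate() up to `attempts` times, sleeping `interval` seconds
    # before each call, and return as soon as it returns a true value.
    for _ in range(attempts):
        time.sleep(interval)
        if predicate():
            return
    raise AssertionError("%s did not complete" % what)

# Usage with a stand-in predicate that becomes true on the third poll.
# In a real test the predicate would call describe_table() and inspect
# TableStatus and IndexStatus, as the two functions above do.
polls = []
def third_time_lucky():
    polls.append(1)
    return len(polls) >= 3
wait_until(third_time_lucky, attempts=5, interval=0.01, what="demo")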

				# All tests above involved creating a new table with a GSI up-front. This

				# test will test creating a base table *without* a GSI, putting data in

				# it, and then adding a GSI with the UpdateTable operation. This starts

				# a backfilling stage - where data is copied to the index - and when this

				# stage is done, the index is usable. Items whose indexed column contains

				# the wrong type are silently ignored and not added to the index (it would

				# not have been possible to add such items if the GSI was already configured

				# when they were added).

				@pytest.mark.xfail(reason="GSI not supported")

				def test_gsi_backfill(dynamodb):

				    # First create, and fill, a table without GSI. The items in items1

				    # will have the appropriate string type for 'x' and will later get

				    # indexed. Items in items2 have no value for 'x', and in items3 'x' is

				    # not a string, so the items in items2 and items3 will be missing

				    # in the index we'll create later.

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])

				    items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]

				    items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]

				    items3 = [{'p': random_string(), 'x': i} for i in range(10)]

				    items = items1 + items2 + items3

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    assert multiset(items) == multiset(full_scan(table))

				    # Now use UpdateTable to create the GSI

				    dynamodb.meta.client.update_table(TableName=table.name,

				        AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],

				        GlobalSecondaryIndexUpdates=[ {  'Create':

				            {  'IndexName': 'hello',

				                'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }}])

				    # update_table is an asynchronous operation. We need to wait until it

				    # finishes and the table is backfilled.

				    wait_for_gsi(table, 'hello')

				    # As explained above, only items in items1 got copied to the gsi,

				    # and Scan on them works as expected.

				    # Note that we don't need to retry the reads here (i.e., use the

				    # assert_index_scan() or assert_index_query() functions) because after

				    # we waited for backfilling to complete, we know all the pre-existing

				    # data is already in the index.

				    assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))

				    # We can also use Query on the new GSI, to search on the attribute x:

				    assert multiset([items1[3]]) == multiset(full_query(table,

				        IndexName='hello',

				        KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))

				    # Let's also test that we cannot add another index with the same name

				    # that already exists

				    with pytest.raises(ClientError, match='ValidationException.*already exists'):

				        dynamodb.meta.client.update_table(TableName=table.name,

				            AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],

				            GlobalSecondaryIndexUpdates=[ {  'Create':

				                {  'IndexName': 'hello',

				                    'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],

				                    'Projection': { 'ProjectionType': 'ALL' }

				                }}])

				    table.delete()

				# Test deleting an existing GSI using UpdateTable

				@pytest.mark.xfail(reason="GSI not supported")

				def test_gsi_delete(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'x', 'KeyType': 'HASH' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    items = [{'p': random_string(), 'x': random_string()} for i in range(10)]

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # So far, we have the index for "x" and can use it:

				    assert_index_query(table, 'hello', [items[3]],

				        KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})

				    # Now use UpdateTable to delete the GSI for "x"

				    dynamodb.meta.client.update_table(TableName=table.name,

				        GlobalSecondaryIndexUpdates=[{  'Delete':

				            { 'IndexName': 'hello' } }])

				    # update_table is an asynchronous operation. We need to wait until it

				    # finishes and the GSI is removed.

				    wait_for_gsi_gone(table, 'hello')

				    # Now index is gone. We cannot query using it.

				    with pytest.raises(ClientError, match='ValidationException.*hello'):

				        full_query(table, IndexName='hello',

				            KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})

				    table.delete()

				# Utility function for creating a new table with a GSI with the given name,

				# and, if creation was successful, delete it. Useful for testing which

				# GSI names work.

				def create_gsi(dynamodb, index_name):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],

				        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': index_name,

				                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    # Verify that the GSI wasn't just ignored, as Scylla originally did ;-)

				    assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']

				    table.delete()

				# Like table names (tested in test_table.py), index names must also

				# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test

				# is similar to test_create_table_unsupported_names(), but for GSI names.

				# Note that Scylla is actually more limited in the length of the index

				# names, because both table name and index name, together, have to fit in

				# 221 characters. But we don't verify here this specific limitation.

				def test_gsi_unsupported_names(dynamodb):

				    # Unfortunately, the boto library tests for names shorter than the

				    # minimum length (3 characters) immediately, and failure results in

				    # ParamValidationError. But the other invalid names are passed to

				    # DynamoDB, which returns an HTTP response code, which results in a

				    # ClientError exception.

				    with pytest.raises(ParamValidationError):

				        create_gsi(dynamodb, 'n')

				    with pytest.raises(ParamValidationError):

				        create_gsi(dynamodb, 'nn')

				    with pytest.raises(ClientError, match='ValidationException.*nnnnn'):

				        create_gsi(dynamodb, 'n' * 256)

				    with pytest.raises(ClientError, match='ValidationException.*nyh'):

				        create_gsi(dynamodb, 'nyh@test')

				# On the other hand, names following the above rules should be accepted. Even

				# names which the Scylla rules forbid, such as a name starting with .

				def test_gsi_non_scylla_name(dynamodb):

				    create_gsi(dynamodb, '.alternator_test')

				# Index names with 255 characters are allowed in Dynamo. In Scylla, the

				# limit is different - the sum of both table and index length cannot

				# exceed 221 characters. So we test a much shorter limit.

				# (compare test_create_and_delete_table_very_long_name()).

				def test_gsi_very_long_name(dynamodb):

				    #create_gsi(dynamodb, 'n' * 255)   # works on DynamoDB, but not on Scylla

				    create_gsi(dynamodb, 'n' * 190)
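# A minimal sketch (not part of the original tests) of the DynamoDB naming
# rule described above: 3-255 characters, all matching [a-zA-Z0-9._-].
# The helper name is hypothetical, and it deliberately ignores Scylla's
# additional limit on the combined table and index name length.
import re

_NAME_RE = re.compile(r'[a-zA-Z0-9._-]{3,255}')

def is_valid_dynamodb_index_name(name):
    # fullmatch() requires the whole name to match, so one bad character
    # anywhere, or a length outside 3-255, makes the name invalid.
    return _NAME_RE.fullmatch(name) is not None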

				# Verify that ListTables does not list materialized views used for indexes.

				# This is hard to test, because we don't really know which table names

				# should be listed beyond those we created, and don't want to assume that

				# no other test runs in parallel with us. So the method we chose is to use a

				# unique random name for an index, and check that no table contains this

				# name. This assumes that materialized-view names are composed using the

				# index's name (which is currently what we do).

				@pytest.fixture(scope="session")

				def test_table_gsi_random_name(dynamodb):

				    index_name = random_string()

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': index_name,

				                'KeySchema': [

				                    { 'AttributeName': 'c', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'p', 'KeyType': 'RANGE' },

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ],

				        )

				    yield [table, index_name]

				    table.delete()

				def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):

				    table, index_name = test_table_gsi_random_name

				    # Check that the random "index_name" isn't a substring of any table name:

				    tables = list_tables(dynamodb)

				    for name in tables:

				        assert index_name not in name

				    # But of course, the table's name should be in the list:

				    assert table.name in tables

									

alternator-test/test_health.py
									
												
				@@ -0,0 +1,35 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the health check

				import requests

				# Test that a health check can be performed with a GET request

				def test_health_works(dynamodb):

				    url = dynamodb.meta.client._endpoint.host

				    response = requests.get(url)

				    assert response.ok

				    assert response.content.decode('utf-8').strip()  == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))
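# A minimal sketch (not part of the original tests) of building the expected
# health-check body with urllib.parse instead of chained replace() calls.
# The helper name expected_health_body() is hypothetical; the "healthy: "
# prefix is the one asserted in test_health_works() above.
from urllib.parse import urlparse

def expected_health_body(url):
    # netloc is the "host:port" part of the URL, without the scheme.
    return 'healthy: {}'.format(urlparse(url).netloc)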

				# Test that a health check only works for the root URL ('/')

				def test_health_only_works_for_root_path(dynamodb):

				    url = dynamodb.meta.client._endpoint.host

				    for suffix in ['/abc', '/-', '/index.htm', '/health']:

				        print(url + suffix)

				        response = requests.get(url + suffix, verify=False)

				        assert response.status_code in range(400, 405)

									

alternator-test/test_item.py
									
												
				@@ -0,0 +1,402 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem

				import pytest

				from botocore.exceptions import ClientError

				from decimal import Decimal

				from util import random_string, random_bytes

				# Basic test for creating a new item with a random name, and reading it back

				# with strong consistency.

				# Only the string type is used for keys and attributes. None of the various

				# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,

				# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,

				# ExpressionAttributeNames, ExpressionAttributeValues) are used, and

				# for GetItem strong consistency is requested as well as all attributes,

				# but no other optional features (AttributesToGet, ReturnConsumedCapacity,

				# ProjectionExpression, ExpressionAttributeNames)

				def test_basic_string_put_and_get(test_table):

				    p = random_string()

				    c = random_string()

				    val = random_string()

				    val2 = random_string()

				    test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item['p'] == p

				    assert item['c'] == c

				    assert item['attribute'] == val

				    assert item['another'] == val2

				# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of

				# PutItem. Because the item does not yet exist, it should work the same.

				def test_basic_string_update_and_get(test_table):

				    p = random_string()

				    c = random_string()

				    val = random_string()

				    val2 = random_string()

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item['p'] == p

				    assert item['c'] == c

				    assert item['attribute'] == val

				    assert item['another'] == val2

				# Test put_item and get_item of various types for the *attributes*,

				# including both scalars as well as nested documents, lists and sets.

				# The full list of types tested here:

				#    number, boolean, bytes, null, list, map, string set, number set,

				#    binary set.

				# The keys are still strings.

				# Note that only top-level attributes are written and read in this test -

				# this test does not attempt to modify *nested* attributes.

				# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html

				# on how to pass these various types to Boto3's put_item().

				def test_put_and_get_attribute_types(test_table):

				    key = {'p': random_string(), 'c': random_string()}

				    test_items = [

				        Decimal("12.345"),

				        42,

				        True,

				        False,

				        b'xyz',

				        None,

				        ['hello', 'world', 42],

				        {'hello': 'world', 'life': 42},

				        {'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},

				        set(['hello', 'world', 'hi']),

				        set([1, 42, Decimal("3.14")]),

				        set([b'xyz', b'hi']),

				    ]

				    item = { str(i) : test_items[i] for i in range(len(test_items)) }

				    item.update(key)

				    test_table.put_item(Item=item)

				    got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']

				    assert item == got_item

				# The test_empty_* tests below verify support for empty items, with no

				# attributes except the key. This is a difficult case for Scylla, because

				# for an empty row to exist, Scylla needs to add a "CQL row marker".

				# There are several ways to create empty items - via PutItem, UpdateItem

				# and deleting attributes from non-empty items, and we need to check them

				# all, in several test_empty_* tests:

				def test_empty_put(test_table):

				    p = random_string()

				    c = random_string()

				    test_table.put_item(Item={'p': p, 'c': c})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item == {'p': p, 'c': c}

				def test_empty_put_delete(test_table):

				    p = random_string()

				    c = random_string()

				    test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item == {'p': p, 'c': c}

				def test_empty_update(test_table):

				    p = random_string()

				    c = random_string()

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item == {'p': p, 'c': c}

				def test_empty_update_delete(test_table):

				    p = random_string()

				    c = random_string()

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item == {'p': p, 'c': c}

				# Test error handling of UpdateItem passed a bad "Action" field.

				def test_update_bad_action(test_table):

				    p = random_string()

				    c = random_string()

				    val = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})

				# A more elaborate UpdateItem test, updating different attributes at different

				# times. Includes PUT and DELETE operations.

				def test_basic_string_more_update(test_table):

				    p = random_string()

				    c = random_string()

				    val1 = random_string()

				    val2 = random_string()

				    val3 = random_string()

				    val4 = random_string()

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})

				    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})

				    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']

				    assert item['p'] == p

				    assert item['c'] == c

				    assert item['a1'] == val3

				    assert item['a2'] == val2

				    assert 'a3' not in item

				# Test that item operations on a nonexistent table name fail with the correct

				# error code.

				def test_item_operations_nonexistent_table(dynamodb):

				    with pytest.raises(ClientError, match='ResourceNotFoundException'):

				        dynamodb.meta.client.put_item(TableName='non_existent_table',

				            Item={'a':{'S':'b'}})

				# Fetching a nonexistent item. According to the DynamoDB doc, "If there is no

				# matching item, GetItem does not return any data and there will be no Item

				# element in the response."

				def test_get_item_missing_item(test_table):

				    p = random_string()

				    c = random_string()

				    assert not "Item" in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)

				# Test that if we have a table with string hash and sort keys, we can't read

				# or write items with other key types to it.

				def test_put_item_wrong_key_type(test_table):

				    b = random_bytes()

				    s = random_string()

				    n = Decimal("3.14")

				    # Should succeed (correct key types)

				    test_table.put_item(Item={'p': s, 'c': s})

				    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}

				    # Should fail (incorrect hash key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'p': b, 'c': s})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'p': n, 'c': s})

				    # Should fail (incorrect sort key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'p': s, 'c': b})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'p': s, 'c': n})

				    # Should fail (missing hash key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'c': s})

				    # Should fail (missing sort key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.put_item(Item={'p': s})

				def test_update_item_wrong_key_type(test_table, test_table_s):

				    b = random_bytes()

				    s = random_string()

				    n = Decimal("3.14")

				    # Should succeed (correct key types)

				    test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})

				    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}

				    # Should fail (incorrect hash key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})

				    # Should fail (incorrect sort key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})

				    # Should fail (missing hash key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'c': s}, AttributeUpdates={})

				    # Should fail (missing sort key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': s}, AttributeUpdates={})

				    # Should fail (spurious key columns)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.update_item(Key={'p': s, 'c': s, 'spurious': s}, AttributeUpdates={})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})

				def test_get_item_wrong_key_type(test_table, test_table_s):

				    b = random_bytes()

				    s = random_string()

				    n = Decimal("3.14")

				    # Should succeed (correct key types) but have empty result

				    assert "Item" not in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)

				    # Should fail (incorrect hash key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': b, 'c': s})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': n, 'c': s})

				    # Should fail (incorrect sort key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': s, 'c': b})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': s, 'c': n})

				    # Should fail (missing hash key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'c': s})

				    # Should fail (missing sort key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': s})

				    # Should fail (spurious key columns)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': s, 'c': s})

				def test_delete_item_wrong_key_type(test_table, test_table_s):

				    b = random_bytes()

				    s = random_string()

				    n = Decimal("3.14")

				    # Should succeed (correct key types)

				    test_table.delete_item(Key={'p': s, 'c': s})

				    # Should fail (incorrect hash key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': b, 'c': s})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': n, 'c': s})

				    # Should fail (incorrect sort key types)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': s, 'c': b})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': s, 'c': n})

				    # Should fail (missing hash key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'c': s})

				    # Should fail (missing sort key)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': s})

				    # Should fail (spurious key columns)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': s, 'c': s})

				# Most of the tests here arbitrarily used a table with both hash and sort keys

				# (both strings). Let's check that a table with *only* a hash key works ok

				# too, for PutItem, GetItem, and UpdateItem.

				def test_only_hash_key(test_table_s):

				    s = random_string()

				    test_table_s.put_item(Item={'p': s, 'hello': 'world'})

				    assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}

				    test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})

				    assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}

				# Tests for item operations in tables with non-string hash or sort keys.

				# These tests focus only on the type of the key - everything else is as

				# simple as we can (string attributes, no special options for GetItem

				# and PutItem). These tests also focus on individual items only, and

				# not about the sort order of sort keys - this should be verified in

				# test_query.py, for example.

				def test_bytes_hash_key(test_table_b):

				    # Bytes values are passed using base64 encoding, which has weird cases

				    # depending on len%3 and len%4. So let's try various lengths.

				    for length in range(10, 18):

				        p = random_bytes(length)

				        val = random_string()

				        test_table_b.put_item(Item={'p': p, 'attribute': val})

				        assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
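
# The base64 remark in test_bytes_hash_key can be checked in isolation. The
# sketch below is not part of the test suite; base64_padding is a hypothetical
# helper that only shows how the '=' padding varies with the input length
# modulo 3, which is why the test loops over several lengths.

```python
import base64

def base64_padding(length):
    """Return how many '=' padding characters base64 adds for `length` bytes."""
    encoded = base64.b64encode(bytes(length))  # `length` zero bytes
    return len(encoded) - len(encoded.rstrip(b'='))

# Lengths 10..17 hit every remainder modulo 3 at least twice.
for length in range(10, 18):
    print(length, length % 3, base64_padding(length))
```

# The padding cycles through 2, 1 and 0 characters, so a range of eight
# consecutive lengths exercises all three encodings.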

				def test_bytes_sort_key(test_table_sb):

				    p = random_string()

				    c = random_bytes()

				    val = random_string()

				    test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})

				    assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}

				# Tests for using a large binary blob as hash key, sort key, or attribute.

				# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,

				# and binary sort key to 1024 bytes, and refuses anything larger. The total

				# size of an item is limited to 400KB, which also limits the size of the

				# largest attributes. For more details on these limits, see

				# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

				# Alternator currently does *not* have these limitations, and can accept much

				# larger keys and attributes, but what we do in the following tests is to verify

				# that items up to DynamoDB's maximum sizes also work well in Alternator.

				def test_large_blob_hash_key(test_table_b):

				    b = random_bytes(2048)

				    test_table_b.put_item(Item={'p': b})

				    assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}

				def test_large_blob_sort_key(test_table_sb):

				    s = random_string()

				    b = random_bytes(1024)

				    test_table_sb.put_item(Item={'p': s, 'c': b})

				    assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}

				def test_large_blob_attribute(test_table):

				    p = random_string()

				    c = random_string()

				    b = random_bytes(409500)  # a bit less than 400KB

				    test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}

				# Checks that a single UpdateItem request is not allowed to use both

				# old-style AttributeUpdates and new-style UpdateExpression.

				def test_update_item_two_update_methods(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},

				            UpdateExpression='SET b = :val1',

				            ExpressionAttributeValues={':val1': 4})

				# Verify that having neither AttributeUpdates nor UpdateExpression is

				# allowed, and results in creation of an empty item.

				def test_update_item_no_update_method(test_table_s):

				    p = random_string()

				    assert "Item" not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)

				    test_table_s.update_item(Key={'p': p})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}

				# Test GetItem with the AttributesToGet parameter. Result should include the

				# selected attributes only - if one wants the key attributes as well, one

				# needs to select them explicitly. When no key attributes are selected,

				# some items may have *none* of the selected attributes. Those items are

				# returned too, as empty items - they are not outright missing.

				def test_getitem_attributes_to_get(dynamodb, test_table):

				    p = random_string()

				    c = random_string()

				    item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}

				    test_table.put_item(Item=item)

				    for wanted in [ ['a'],             # only non-key attribute

				                    ['c', 'a'],        # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # Our item doesn't have this

				                   ]:

				        got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']

				        expected_item = {k: item[k] for k in wanted if k in item}

				        assert expected_item == got_item
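
# The expected result in test_getitem_attributes_to_get is a plain projection
# of the stored item. A minimal standalone sketch of that projection follows;
# the name `project` and the sample values are illustrative only.

```python
def project(item, wanted):
    """Keep only the attributes named in `wanted` that the item actually has."""
    return {k: item[k] for k in wanted if k in item}

item = {'p': 'x', 'c': 'y', 'a': 'hello', 'b': 'hi'}
print(project(item, ['a']))            # only a non-key attribute
print(project(item, ['c', 'a']))       # sort key plus a non-key attribute
print(project(item, ['nonexistent']))  # attribute the item does not have
```

# Asking only for an attribute the item lacks yields an empty dict, which
# matches the "returned too, as empty items" behavior described above.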

				# Basic test for DeleteItem, with hash key only

				def test_delete_item_hash(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p})

				    assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)

				    test_table_s.delete_item(Key={'p': p})

				    assert 'Item' not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)

				# Basic test for DeleteItem, with hash and sort key

				def test_delete_item_sort(test_table):

				    p = random_string()

				    c = random_string()

				    key = {'p': p, 'c': c}

				    test_table.put_item(Item=key)

				    assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)

				    test_table.delete_item(Key=key)

				    assert 'Item' not in test_table.get_item(Key=key, ConsistentRead=True)

				# Test that PutItem completely replaces an existing item. It shouldn't merge

				# it with a previously existing value, as UpdateItem does!

				# We test for a table with just hash key, and for a table with both hash and

				# sort keys.

				def test_put_item_replace(test_table_s, test_table):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}

				    test_table_s.put_item(Item={'p': p, 'b': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}

				    c = random_string()

				    test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}

				    test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
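
# The replace-versus-merge distinction tested by test_put_item_replace can be
# modeled with ordinary dicts. This is only a toy model under the assumption
# that a table is a dict from key to item; put_item and update_item here are
# hypothetical stand-ins, not the boto3 calls used by the tests.

```python
def put_item(table, key, item):
    """PutItem semantics: the stored item is replaced wholesale."""
    table[key] = dict(item)

def update_item(table, key, updates):
    """UpdateItem (PUT action) semantics: new attributes merge into the old item."""
    table.setdefault(key, {}).update(updates)

table = {}
put_item(table, 'k', {'p': 'k', 'a': 'hi'})
put_item(table, 'k', {'p': 'k', 'b': 'hello'})  # drops attribute 'a'
replaced = dict(table['k'])

update_item(table, 'k', {'a': 'hi'})            # keeps attribute 'b'
merged = dict(table['k'])
```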

									

alternator-test/test_lsi.py
									
												
				@@ -0,0 +1,365 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests of LSI (Local Secondary Indexes)

				#

				# Note that many of these tests are slower than usual, because many of them

				# need to create new tables and/or new LSIs of different types, operations

				# which are extremely slow in DynamoDB, often taking minutes (!).

				import pytest

				import time

				from botocore.exceptions import ClientError, ParamValidationError

				from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables

				# Currently, Alternator's LSIs only support eventually consistent reads, so tests

				# that involve writing to a table and then expect to read something from it cannot

				# be guaranteed to succeed without retrying the read. The following utility

				# functions make it easy to write such tests.

				def assert_index_query(table, index_name, expected_items, **kwargs):

				    for i in range(3):

				        if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):

				            return

				        print('assert_index_query retrying')

				        time.sleep(1)

				    assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))

				def assert_index_scan(table, index_name, expected_items, **kwargs):

				    for i in range(3):

				        if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):

				            return

				        print('assert_index_scan retrying')

				        time.sleep(1)

				    assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
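
# assert_index_query and assert_index_scan share one retry pattern. A generic
# sketch of that pattern is shown below. It is not part of util.py:
# assert_eventually_equal and its parameters are hypothetical, and a Counter
# over sorted item tuples stands in for the multiset helper, whose
# implementation is not shown here.

```python
import time
from collections import Counter

def assert_eventually_equal(fetch, expected_items, retries=3, delay=1.0):
    """Retry fetch() until its result matches expected_items as a multiset."""
    def as_multiset(items):
        # Order-insensitive comparison that still counts duplicate items.
        return Counter(tuple(sorted(item.items())) for item in items)

    expected = as_multiset(expected_items)
    for _ in range(retries):
        if as_multiset(fetch()) == expected:
            return
        time.sleep(delay)
    # Final attempt: a mismatch now fails with an ordinary assertion error.
    assert as_multiset(fetch()) == expected
```

# A caller would pass a closure such as
# lambda: full_query(table, IndexName=index_name, **kwargs) as `fetch`.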

				# Although quite silly, it is actually allowed to create an index which is

				# identical to the base table.

				def test_lsi_identical(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],

				        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],

				        LocalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    items = [{'p': random_string(), 'c': random_string()} for i in range(10)]

				    with table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Scanning the entire table directly or via the index yields the same

				    # results (in different order).

				    assert multiset(items) == multiset(full_scan(table))

				    assert_index_scan(table, 'hello', items)

				    # We can't scan a nonexistent index

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_scan(table, IndexName='wrong')

				    table.delete()

				# Checks that an LSI whose hash key differs from the base table's is not

				# allowed, and neither is one with duplicated keys or with no sort key at all.

				def test_lsi_wrong(dynamodb):

				    with pytest.raises(ClientError, match='ValidationException.*'):

				        table = create_test_table(dynamodb,

				            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				            AttributeDefinitions=[

				                        { 'AttributeName': 'p', 'AttributeType': 'S' },

				                        { 'AttributeName': 'a', 'AttributeType': 'S' },

				                        { 'AttributeName': 'b', 'AttributeType': 'S' }

				            ],

				            LocalSecondaryIndexes=[

				                {   'IndexName': 'hello',

				                    'KeySchema': [

				                        { 'AttributeName': 'b', 'KeyType': 'HASH' },

				                        { 'AttributeName': 'p', 'KeyType': 'RANGE' }

				                    ],

				                    'Projection': { 'ProjectionType': 'ALL' }

				                }

				            ])

				        table.delete()

				    with pytest.raises(ClientError, match='ValidationException.*'):

				        table = create_test_table(dynamodb,

				            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				            AttributeDefinitions=[

				                        { 'AttributeName': 'p', 'AttributeType': 'S' },

				                        { 'AttributeName': 'a', 'AttributeType': 'S' },

				                        { 'AttributeName': 'b', 'AttributeType': 'S' }

				            ],

				            LocalSecondaryIndexes=[

				                {   'IndexName': 'hello',

				                    'KeySchema': [

				                        { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                        { 'AttributeName': 'p', 'KeyType': 'RANGE' }

				                    ],

				                    'Projection': { 'ProjectionType': 'ALL' }

				                }

				            ])

				        table.delete()

				    with pytest.raises(ClientError, match='ValidationException.*'):

				        table = create_test_table(dynamodb,

				            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],

				            AttributeDefinitions=[

				                        { 'AttributeName': 'p', 'AttributeType': 'S' },

				                        { 'AttributeName': 'a', 'AttributeType': 'S' },

				                        { 'AttributeName': 'b', 'AttributeType': 'S' }

				            ],

				            LocalSecondaryIndexes=[

				                {   'IndexName': 'hello',

				                    'KeySchema': [

				                        { 'AttributeName': 'p', 'KeyType': 'HASH' }

				                    ],

				                    'Projection': { 'ProjectionType': 'ALL' }

				                }

				            ])

				        table.delete()

				# A simple scenario for LSI. The base table has hash and sort keys, and the index

				# replaces the sort key with one of the base table's non-key attributes (b).

				@pytest.fixture(scope="session")

				def test_table_lsi_1(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'b', 'AttributeType': 'S' },

				        ],

				        LocalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            }

				        ])

				    yield table

				    table.delete()

				def test_lsi_1(test_table_lsi_1):

				    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]

				    p1, b1 = items1[0]['p'], items1[0]['b']

				    p2, b2 = random_string(), random_string()

				    items2 = [{'p': p2, 'c': p2, 'b': b2}]

				    items = items1 + items2

				    with test_table_lsi_1.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]

				    assert_index_query(test_table_lsi_1, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})

				    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]

				    assert_index_query(test_table_lsi_1, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
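
# The KeyConditions argument used throughout these tests always has the same
# shape: one EQ condition per key attribute. A small sketch of a builder for
# that shape follows. eq_key_conditions is a hypothetical helper, not
# something defined in util.py.

```python
def eq_key_conditions(**attrs):
    """Build a KeyConditions dict requiring each attribute to equal its value."""
    return {name: {'AttributeValueList': [value], 'ComparisonOperator': 'EQ'}
            for name, value in attrs.items()}

# Same structure as the literal dicts passed to assert_index_query above.
conditions = eq_key_conditions(p='some-hash', b='some-sort')
print(conditions)
```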

				# A second scenario for LSI. The base table has both hash and sort keys, and

				# a local index is created on each of the non-key attributes x1 through x4.

				@pytest.fixture(scope="session")

				def test_table_lsi_4(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x1', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x2', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x3', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x4', 'AttributeType': 'S' },

				        ],

				        LocalSecondaryIndexes=[

				            {   'IndexName': 'hello_' + column,

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': column, 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'ALL' }

				            } for column in ['x1','x2','x3','x4']

				        ])

				    yield table

				    table.delete()

				def test_lsi_4(test_table_lsi_4):

				    items1 = [{'p': random_string(), 'c': random_string(),

				               'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]

				    i_values = items1[0]

				    i5 = random_string()

				    items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]

				    items = items1 + items2

				    with test_table_lsi_4.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for column in ['x1', 'x2', 'x3', 'x4']:

				        expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]

				        assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,

				            KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},

				                           column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})

				        expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]

				        assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,

				            KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},

				                           column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})

				def test_lsi_describe(test_table_lsi_4):

				    desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)

				    assert 'Table' in desc

				    assert 'LocalSecondaryIndexes' in desc['Table']

				    lsis = desc['Table']['LocalSecondaryIndexes']

				    assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])

				    # TODO: check projection and key params

				    # TODO: check also ProvisionedThroughput, IndexArn

				# A table with selective projection - only keys are projected into the index

				@pytest.fixture(scope="session")

				def test_table_lsi_keys_only(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'b', 'AttributeType': 'S' }

				        ],

				        LocalSecondaryIndexes=[

				            {   'IndexName': 'hello',

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'KEYS_ONLY' }

				            }

				        ])

				    yield table

				    table.delete()

				# Check that it's possible to extract a non-projected attribute from the index,

				# as the documentation promises

				def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):

				    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]

				    p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']

				    p2, b2, d2 = random_string(), random_string(), random_string()

				    items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]

				    items = items1 + items2

				    with test_table_lsi_keys_only.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]

				    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},

				        Select='ALL_ATTRIBUTES')

				    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]

				    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},

				        Select='ALL_ATTRIBUTES')

				    expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]

				    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},

				        Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])

				# Check that only projected attributes can be extracted

				@pytest.mark.xfail(reason="LSI in Alternator currently implements only full projections")

				def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):

				    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]

				    p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']

				    p2, b2, d2 = random_string(), random_string(), random_string()

				    items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]

				    items = items1 + items2

				    with test_table_lsi_keys_only.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]

				    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})

				# Check that strongly consistent reads are allowed for LSI

				def test_lsi_consistent_read(test_table_lsi_1):

				    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]

				    p1, b1 = items1[0]['p'], items1[0]['b']

				    p2, b2 = random_string(), random_string()

				    items2 = [{'p': p2, 'c': p2, 'b': b2}]

				    items = items1 + items2

				    with test_table_lsi_1.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]

				    assert_index_query(test_table_lsi_1, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},

				        ConsistentRead=True)

				    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]

				    assert_index_query(test_table_lsi_1, 'hello', expected_items,

				        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},

				                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},

				        ConsistentRead=True)

				# A table with both a GSI and an LSI present

				@pytest.fixture(scope="session")

				def test_table_lsi_gsi(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],

				        AttributeDefinitions=[

				                    { 'AttributeName': 'p', 'AttributeType': 'S' },

				                    { 'AttributeName': 'c', 'AttributeType': 'S' },

				                    { 'AttributeName': 'x1', 'AttributeType': 'S' },

				        ],

				        GlobalSecondaryIndexes=[

				            {   'IndexName': 'hello_g1',

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'x1', 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'KEYS_ONLY' }

				            }

				        ],

				        LocalSecondaryIndexes=[

				            {   'IndexName': 'hello_l1',

				                'KeySchema': [

				                    { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                    { 'AttributeName': 'x1', 'KeyType': 'RANGE' }

				                ],

				                'Projection': { 'ProjectionType': 'KEYS_ONLY' }

				            }

				        ])

				    yield table

				    table.delete()

				# Test that GSI and LSI can coexist, even if they're identical

				def test_lsi_and_gsi(test_table_lsi_gsi):

				    desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)

				    assert 'Table' in desc

				    assert 'LocalSecondaryIndexes' in desc['Table']

				    assert 'GlobalSecondaryIndexes' in desc['Table']

				    lsis = desc['Table']['LocalSecondaryIndexes']

				    gsis = desc['Table']['GlobalSecondaryIndexes']

				    assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])

				    assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])

				    items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]

				    p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']

				    with test_table_lsi_gsi.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for index in ['hello_g1', 'hello_l1']:

				        expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]

				        assert_index_query(test_table_lsi_gsi, index, expected_items,

				            KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},

				                           'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

									
										60

alternator-test/test_nested.py
									
										Normal file
									
												
				@@ -0,0 +1,60 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Test for operations on items with *nested* attributes.

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string

				# Test that we can write a top-level attribute that is a nested document, and

				# read it back correctly.

				def test_nested_document_attribute_write(test_table_s):

				    nested_value = {

				        'a': 3,

				        'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},

				    }

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': nested_value})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}

				# Test that if we have a top-level attribute that is a nested document (i.e.,

				# a dictionary), updating this attribute will replace it entirely by a new

				# nested document - it does not merge the new content into the old content.

				def test_nested_document_attribute_overwrite(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}

				    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}

				# Moreover, we can overwrite an entire nested document by, say, a string,

				# and that's also fine.

				def test_nested_document_attribute_overwrite_2(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}

				    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}

				# Verify that AttributeUpdates cannot be used to update a nested attribute -

				# trying to use a dot in the name of the attribute will just create one with

				# an actual dot in its name.

				def test_attribute_updates_dot(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
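# The three tests above describe AttributeUpdates with Action=PUT as a
# whole-value replacement of a top-level attribute whose name is taken
# verbatim. A minimal in-memory sketch of that behaviour follows. The helper
# apply_put_updates is hypothetical, written only for illustration, and is
# not part of Alternator or boto3.

```python
def apply_put_updates(item, attribute_updates):
    """Return a copy of item with each PUT update applied at the top level."""
    updated = dict(item)
    for name, update in attribute_updates.items():
        if update.get('Action', 'PUT') != 'PUT':
            raise ValueError('only PUT is modeled in this sketch')
        # The name is used verbatim: 'a.b' is one attribute, not a path,
        # and the old value (nested or not) is discarded, never merged.
        updated[name] = update['Value']
    return updated


item = {'p': 'key', 'a': {'b': 3, 'c': 4}, 'd': 5}
replaced = apply_put_updates(item, {'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
dotted = apply_put_updates({'p': 'key'}, {'a.b': {'Value': 3, 'Action': 'PUT'}})
```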

									
										201

alternator-test/test_projection_expression.py
									
										Normal file
									
												
				@@ -0,0 +1,201 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the various operations (GetItem, Query, Scan) with a

				# ProjectionExpression parameter.

				#

				# ProjectionExpression is an extension of the legacy AttributesToGet

				# parameter. Both parameters request that only a subset of the attributes

				# be fetched for each item, instead of all of them. But while AttributesToGet

				# was limited to top-level attributes, ProjectionExpression can request also

				# nested attributes.

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string, full_scan, full_query, multiset

				# Basic test for ProjectionExpression, requesting only top-level attributes.

				# Result should include the selected attributes only - if one wants the key

				# attributes as well, one needs to select them explicitly. When no key

				# attributes are selected, an item may have *none* of the selected

				# attributes, and is then returned as an empty item.

				def test_projection_expression_toplevel(test_table):

				    p = random_string()

				    c = random_string()

				    item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}

				    test_table.put_item(Item=item)

				    for wanted in [ ['a'],             # only non-key attribute

				                    ['c', 'a'],        # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # Our item doesn't have this

				                   ]:

				        got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']

				        expected_item = {k: item[k] for k in wanted if k in item}

				        assert expected_item == got_item

				# Various simple tests for ProjectionExpression's syntax, using only top-level

				# attributes.

				def test_projection_expression_toplevel_syntax(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a  ,   b  ')['Item'] == {'a': 'hello', 'b': 'hi'}

				    # Missing or unused names in ExpressionAttributeNames are errors:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item'] == {'a': 'hello'}

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item'] == {'a': 'hello'}

				    # It is not allowed to fetch the same top-level attribute twice (or in

				    # general, list two overlapping attributes). We get an error like

				    # "Invalid ProjectionExpression: Two document paths overlap with each

				    # other; must remove or rewrite one of these paths; path one: [a], path

				    # two: [a]".

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']

				    # A comma with nothing after it is a syntax error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']

				    # An empty ProjectionExpression is not allowed. DynamoDB recognizes its

				    # syntax, but then writes: "Invalid ProjectionExpression: The expression

				    # can not be empty".

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
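# The syntax rules exercised above (whitespace is ignored, "#name"
# placeholders must be defined and all used, no repeated attribute, no empty
# list element, no empty expression) can be summarized by a small validator.
# This is only a sketch for top-level attributes: parse_toplevel_projection
# and is_rejected are hypothetical names, ValueError stands in for the
# ValidationException returned by the server, and this is not how Alternator
# parses expressions.

```python
def parse_toplevel_projection(expression, names=None):
    """Return the list of top-level attribute names, or raise ValueError."""
    names = dict(names or {})
    if not expression.strip():
        raise ValueError('the expression can not be empty')
    used = set()
    result = []
    for part in expression.split(','):
        token = part.strip()
        if not token:
            # Covers 'a,', ',a' and 'a,,b'.
            raise ValueError('syntax error: empty list element')
        if token.startswith('#'):
            if token not in names:
                raise ValueError('undefined name ' + token)
            used.add(token)
            token = names[token]
        if token in result:
            raise ValueError('two document paths overlap: ' + token)
        result.append(token)
    unused = set(names) - used
    if unused:
        raise ValueError('unused names: ' + ', '.join(sorted(unused)))
    return result


def is_rejected(expression, names=None):
    """True if the sketch validator refuses the expression."""
    try:
        parse_toplevel_projection(expression, names)
    except ValueError:
        return True
    return False
```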

				# The following two tests are similar to test_projection_expression_toplevel()

				# which tested the GetItem operation - but these test Scan and Query.

				# Both test ProjectionExpression with only top-level attributes.

				def test_projection_expression_scan(filled_test_table):

				    table, items = filled_test_table

				    for wanted in [ ['another'],       # only non-key attributes (one item doesn't have it!)

				                    ['c', 'another'],  # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # none of the items have this attribute!

				                   ]:

				        got_items = full_scan(table,  ProjectionExpression=",".join(wanted))

				        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				        assert multiset(expected_items) == multiset(got_items)

				def test_projection_expression_query(test_table):

				    p = random_string()

				    items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for wanted in [ ['a'],             # only non-key attributes

				                    ['c', 'a'],        # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # none of the items have this attribute!

				                   ]:

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))

				        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				        assert multiset(expected_items) == multiset(got_items)
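# Both tests above build the expected result by projecting each item onto
# the wanted attributes, then compare the two lists without regard to order.
# The sketch below shows one way to do that. as_multiset is a hypothetical
# stand-in for util.multiset, whose implementation is not shown here, and
# freeze exists only because dicts cannot be counted directly.

```python
from collections import Counter


def freeze(value):
    """Convert nested dicts and lists into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


def as_multiset(items):
    """Count items so that order is ignored but duplicates still matter."""
    return Counter(freeze(item) for item in items)


def project_toplevel(item, wanted):
    """Keep only the wanted top-level attributes that the item has."""
    return {k: item[k] for k in wanted if k in item}


items = [{'p': 'x', 'c': str(i), 'a': str(i * 10)} for i in range(3)]
items.append({'p': 'x', 'c': '3'})  # this item has no 'a' attribute
expected = [project_toplevel(item, ['a']) for item in items]
```

# Note that the item lacking 'a' still contributes an empty dict to
# expected, matching the "returned as an empty item" behaviour described
# earlier in this file.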

				# The previous tests all fetched only top-level attributes. They could all

				# be written using AttributesToGet instead of ProjectionExpression (and,

				# in fact, we do have similar tests with AttributesToGet in other files),

				# but the previous tests checked that the alternative syntax works correctly.

				# The following test checks fetching more elaborate attribute paths from

				# nested documents.

				@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")

				def test_projection_expression_path(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={

				        'p': p,

				        'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},

				        'b': 'hello' 

				        })

				    # Fetching the entire nested document "a" works, of course:

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}

				    # If we fetch a.b, we get only the content of b - but it's still inside

				    # the a dictionary:

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}

				    # Similarly, fetching a.b[0] gives us a one-element array in a dictionary.

				    # Note that [0] is the first element of an array.

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}

				    # Trying to read any sort of nonexistent attribute returns an empty item.

				    # This includes a nonexistent top-level attribute, an attempt to read

				    # beyond the end of an array or a nonexistent member of a dictionary, as

				    # well as paths which begin with a nonexistent prefix.

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}

				    # We can read multiple paths - the results are merged into one object

				    # structured the same way as in the original item:

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}

				    # It is not allowed to read the same path multiple times. The error from

				    # DynamoDB looks like: "Invalid ProjectionExpression: Two document paths

				    # overlap with each other; must remove or rewrite one of these paths;

				    # path one: [a, b, [0]], path two: [a, b, [0]]".

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']

				    # Two paths are considered to "overlap" if the content of one path

				    # contains the content of the second path. So requesting both "a" and

				    # "a.b[0]" is not allowed.

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
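# The results asserted in test_projection_expression_path can be modeled in
# plain Python. This is a sketch of the expected semantics only, not of
# Alternator's implementation (the test is still marked xfail). As an
# assumption made for brevity, paths are given pre-split as tuples of string
# keys and integer indexes, so 'a.b[2].y' becomes ('a', 'b', 2, 'y'). The
# names build_selection, project and project_item are hypothetical.

```python
_MISSING = object()


def build_selection(paths):
    """Merge paths into a nested dict; None marks 'take this whole value'."""
    root = {}
    for path in paths:
        node = root
        for i, step in enumerate(path):
            last = i == len(path) - 1
            # Overlap: an earlier path ends here, or this path ends on a
            # prefix of (or exactly on) an earlier path.
            if step in node and (node[step] is None or last):
                raise ValueError('two document paths overlap')
            if last:
                node[step] = None
            else:
                node = node.setdefault(step, {})
    return root


def project(value, selection):
    """Return the selected part of value, or _MISSING if nothing matches."""
    if selection is None:
        return value
    if isinstance(value, dict):
        out = {}
        for key, sub in selection.items():
            if isinstance(key, str) and key in value:
                got = project(value[key], sub)
                if got is not _MISSING:
                    out[key] = got
        return out if out else _MISSING
    if isinstance(value, list):
        out = []
        # Selected elements keep their relative order but are compacted.
        for index in sorted(k for k in selection if isinstance(k, int)):
            if index < len(value):
                got = project(value[index], selection[index])
                if got is not _MISSING:
                    out.append(got)
        return out if out else _MISSING
    return _MISSING


def project_item(item, paths):
    result = project(item, build_selection(paths))
    return {} if result is _MISSING else result


item = {'p': 'k', 'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}, 'b': 'hello'}
```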

				@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")

				def test_query_projection_expression_path(test_table):

				    p = random_string()

				    items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")

				    expected_items = [{'a': {'x': x['a']['x']}} for x in items]

				    assert multiset(expected_items) == multiset(got_items)

				@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")

				def test_scan_projection_expression_path(test_table):

				    # This test is similar to test_query_projection_expression_path above,

				    # but uses a scan instead of a query. The scan will also return unrelated

				    # partitions created by other tests (hopefully not too many...) that we

				    # need to ignore. We also need to ask for "p" too, so we can filter by it.

				    p = random_string()

				    items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]

				    expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]

				    assert multiset(expected_items) == multiset(got_items)

				# It is not allowed to use both ProjectionExpression and its older cousin,

				# AttributesToGet, together. If trying to do this, DynamoDB produces an error

				# like "Can not use both expression and non-expression parameters in the same

				# request: Non-expression parameters: {AttributesToGet} Expression

				# parameters: {ProjectionExpression}".

				def test_projection_expression_and_attributes_to_get(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})

				    with pytest.raises(ClientError, match='ValidationException.*both'):

				        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']

				    with pytest.raises(ClientError, match='ValidationException.*both'):

				        full_scan(test_table_s,  ProjectionExpression='a', AttributesToGet=['a'])

				    with pytest.raises(ClientError, match='ValidationException.*both'):

				        full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])

									
										516

alternator-test/test_query.py
									
										Normal file
									
												
				@@ -0,0 +1,516 @@

				# -*- coding: utf-8 -*-

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the Query operation

				import random

				import pytest

				from botocore.exceptions import ClientError, ParamValidationError

				from decimal import Decimal

				from util import random_string, random_bytes, full_query, multiset

				from boto3.dynamodb.conditions import Key, Attr

				# Test that Query works with the stock boto3 paginator, for each of the

				# comparison operators allowed in KeyConditions

				def test_query_basic_restrictions(dynamodb, filled_test_table):

				    test_table, items = filled_test_table

				    paginator = dynamodb.meta.client.get_paginator('query')

				    # EQ

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)

				    # LT

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)

				    # LE

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)

				    # GT

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)

				    # GE

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)

				    # BETWEEN

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)

				    # BEGINS_WITH

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
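# Each block in test_query_basic_restrictions pairs a KeyConditions entry
# with a Python expression computing the expected items. The sketch below
# collects those pairings in one place. The matches helper is hypothetical
# and only mirrors the list comprehensions above for string sort keys; it
# says nothing about how the server evaluates conditions.

```python
_COMPARISONS = {
    'EQ': lambda value, args: value == args[0],
    'LT': lambda value, args: value < args[0],
    'LE': lambda value, args: value <= args[0],
    'GT': lambda value, args: value > args[0],
    'GE': lambda value, args: value >= args[0],
    # BETWEEN is inclusive on both ends.
    'BETWEEN': lambda value, args: args[0] <= value <= args[1],
    'BEGINS_WITH': lambda value, args: value.startswith(args[0]),
}


def matches(value, condition):
    """Evaluate one KeyConditions entry against a string key value."""
    compare = _COMPARISONS[condition['ComparisonOperator']]
    return compare(value, condition['AttributeValueList'])


less_than_12 = {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
between = {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
begins_with_11 = {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
```

# Because the sort key is a string, comparisons are lexicographic: '100'
# satisfies "LT '12'" while '2' does not.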

				# Test that KeyConditionExpression parameter is supported

				@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")

				def test_query_key_condition_expression(dynamodb, filled_test_table):

				    test_table, items = filled_test_table

				    paginator = dynamodb.meta.client.get_paginator('query')

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)

				def test_begins_with(dynamodb, test_table):

				    paginator = dynamodb.meta.client.get_paginator('query')

				    items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # TODO(sarna): Once bytes type is supported, the \xFF character should be tested

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])

				def test_begins_with_wrong_type(dynamodb, test_table_sn):

				    paginator = dynamodb.meta.client.get_paginator('query')

				    with pytest.raises(ClientError, match='ValidationException'):

				        for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={

				                'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				                'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}

				                }):

				            pass

				# Items returned by Query should be sorted by the sort key. The following

				# tests verify that this is indeed the case, for the three allowed key types:

				# strings, binary, and numbers. These tests test not just the Query operation,

				# but inherently that the sort-key sorting works.

				def test_query_sort_order_string(test_table):

				    # Insert a lot of random items in one new partition:

				    # str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.

				    p = random_string()

				    items = [{'p': p, 'c': str(i)} for i in range(128)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    assert len(items) == len(got_items)

				    # Extract just the sort key ("c") from the items

				    sort_keys = [x['c'] for x in items]

				    got_sort_keys = [x['c'] for x in got_items]

				    # Verify that got_sort_keys are already sorted (in string order)

				    assert sorted(got_sort_keys) == got_sort_keys

				    # Verify that got_sort_keys are a sorted version of the expected sort_keys

				    assert sorted(sort_keys) == got_sort_keys
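# The remark that "100" comes before "2" is easy to check locally. The
# following sketch uses the same str(i) keys as the test and contrasts
# string order with numeric order; it involves no table at all.

```python
sort_keys = [str(i) for i in range(128)]

# Lexicographic order, which is what a string sort key is expected to use.
in_string_order = sorted(sort_keys)

# Numeric order, shown only for contrast.
in_numeric_order = sorted(sort_keys, key=int)
```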

				def test_query_sort_order_bytes(test_table_sb):

				    # Insert a lot of random items in one new partition:

				    # We arbitrarily use random_bytes of length 10.

				    p = random_string()

				    items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]

				    with test_table_sb.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    assert len(items) == len(got_items)

				    sort_keys = [x['c'] for x in items]

				    got_sort_keys = [x['c'] for x in got_items]

				    # Boto3's "Binary" objects are sorted as if bytes are signed integers.

				    # This isn't the order that DynamoDB itself uses (byte 0 should be first,

				    # not byte -128). Sorting the byte array ".value" works.

				    assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys

				    assert sorted(sort_keys) == got_sort_keys
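# The comment above distinguishes ordering bytes as unsigned values (byte 0
# first) from ordering them as signed values (byte -128 first). The sketch
# below illustrates the difference on plain Python bytes. signed_key is a
# hypothetical helper; nothing here depends on boto3.

```python
def signed_key(data):
    """Reinterpret each byte as a signed 8-bit integer for sorting."""
    return [b - 256 if b >= 128 else b for b in data]


keys = [b'\x7f', b'\xff', b'\x00', b'\x80']

# Python compares bytes objects as sequences of unsigned values.
unsigned_order = sorted(keys)

# Treating bytes as signed moves 0x80..0xff in front of 0x00.
signed_order = sorted(keys, key=signed_key)
```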

				def test_query_sort_order_number(test_table_sn):

				    # This is a list of numbers, sorted in correct order, and each suitable

				    # for accurate representation by Alternator's number type.

				    numbers = [

				        Decimal("-2e10"),

				        Decimal("-7.1e2"),

				        Decimal("-4.1"),

				        Decimal("-0.1"),

				        Decimal("-1e-5"),

				        Decimal("0"),

				        Decimal("2e-5"),

				        Decimal("0.15"),

				        Decimal("1"),

				        Decimal("1.00000000000000000000000001"),

				        Decimal("3.14159"),

				        Decimal("3.1415926535897932384626433832795028841"),

				        Decimal("31.4"),

				        Decimal("1.4e10"),

				    ]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Finally, verify that we get back exactly the same numbers (with identical

				    # precision), and in their original sorted order.

				    got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers
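# The list in test_query_sort_order_number includes values that differ only
# far beyond double precision, which is why the test uses Decimal. A short
# local sketch of that point, using a subset of the same values:

```python
import random
from decimal import Decimal

numbers = [
    Decimal("-2e10"),
    Decimal("-1e-5"),
    Decimal("0"),
    Decimal("1"),
    Decimal("1.00000000000000000000000001"),
    Decimal("3.14159"),
    Decimal("3.1415926535897932384626433832795028841"),
    Decimal("1.4e10"),
]

# Shuffle, then sort: Decimal comparison restores the original order.
shuffled = random.sample(numbers, len(numbers))
resorted = sorted(shuffled)

# The same two neighbours collapse to one value as binary floats.
collapsed_as_float = float(numbers[3]) == float(numbers[4])
```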

				def test_query_filtering_attributes_equality(filled_test_table):

				    test_table, items = filled_test_table

				    query_filter = {

				        "attribute" : {

				            "AttributeValueList" : [ "xxxx" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)

				    query_filter = {

				        "attribute" : {

				            "AttributeValueList" : [ "xxxx" ],

				            "ComparisonOperator": "EQ"

				        },

				        "another" : {

				            "AttributeValueList" : [ "yy" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)

				# Test that FilterExpression works as expected

				@pytest.mark.xfail(reason="FilterExpression not supported yet")

				def test_query_filter_expression(filled_test_table):

				    test_table, items = filled_test_table

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)

				# QueryFilter can only contain non-key attributes, in order to be compatible with DynamoDB

				def test_query_filtering_key_equality(filled_test_table):

				    test_table, items = filled_test_table

				    with pytest.raises(ClientError, match='ValidationException'):

				        query_filter = {

				            "c" : {

				                "AttributeValueList" : [ "5" ],

				                "ComparisonOperator": "EQ"

				            }

				        }

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				        print(got_items)

				    with pytest.raises(ClientError, match='ValidationException'):

				        query_filter = {

				            "attribute" : {

				                "AttributeValueList" : [ "x" ],

				                "ComparisonOperator": "EQ"

				            },

				            "p" : {

				                "AttributeValueList" : [ "5" ],

				                "ComparisonOperator": "EQ"

				            }

				        }

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				        print(got_items)

				# Test Query with the AttributesToGet parameter. Result should include the

				# selected attributes only - if one wants the key attributes as well, one

				# needs to select them explicitly. When no key attributes are selected,

				# some items may have *none* of the selected attributes. Those items are

				# returned too, as empty items - they are not outright missing.

				def test_query_attributes_to_get(dynamodb, test_table):

				    p = random_string()

				    items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for wanted in [ ['a'],             # only non-key attributes

				                    ['c', 'a'],        # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # none of the items have this attribute!

				                   ]:

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)

				        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				        assert multiset(expected_items) == multiset(got_items)
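
The AttributesToGet expectation above can be reproduced without a server. This is a minimal sketch of the projection semantics the test assumes; the `project` helper and the sample items are hypothetical and are not part of the test suite. Note how an item that has none of the wanted attributes becomes an empty item rather than disappearing.

```python
def project(item, wanted):
    """Keep only the requested attributes that the item actually has."""
    return {k: item[k] for k in wanted if k in item}


# Hypothetical sample data: the second item has no 'a' attribute.
items = [
    {'p': 'x', 'c': '0', 'a': '0'},
    {'p': 'x', 'c': '1'},
]

only_a = [project(item, ['a']) for item in items]
key_and_a = [project(item, ['c', 'a']) for item in items]
missing = [project(item, ['nonexistent']) for item in items]

print(only_a)
print(key_and_a)
print(missing)
```

The result list always has one entry per stored item, which is why the test compares against a projection of every item in `items`.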

				# Test which keys we can Query by in a table with both a hash key and a sort

				# key: we can Query by the hash key, or by a combination of both hash and

				# sort keys, but *cannot* query by just the sort key, and obviously not

				# by any non-key column.

				def test_query_which_key(test_table):

				    p = random_string()

				    c = random_string()

				    p2 = random_string()

				    c2 = random_string()

				    item1 = {'p': p, 'c': c}

				    item2 = {'p': p, 'c': c2}

				    item3 = {'p': p2, 'c': c}

				    for i in [item1, item2, item3]:

				        test_table.put_item(Item=i)

				    # Query by hash key only:

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    expected_items = [item1, item2]

				    assert multiset(expected_items) == multiset(got_items)

				    # Query by hash key *and* sort key (this is basically a GetItem):

				    got_items = full_query(test_table, KeyConditions={

				        'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				        'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				    })

				    expected_items = [item1]

				    assert multiset(expected_items) == multiset(got_items)

				    # Query by sort key alone is not allowed. DynamoDB reports:

				    # "Query condition missed key schema element: p".

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # Query by a non-key isn't allowed, for the same reason - that the

				    # actual hash key (p) is missing in the query:

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # If we try both p and a non-key we get a complaint that the sort

				    # key is missing: "Query condition missed key schema element: c"

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # If we try p, c and another key, we get an error that

				    # "Conditions can be of length 1 or 2 only".

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				# Test the "Select" parameter of Query. The default Select mode,

				# ALL_ATTRIBUTES, returns items with all their attributes. Other modes

				# allow returning just specific attributes or just counting the results

				# without returning items at all.

				@pytest.mark.xfail(reason="Select not supported yet")

				def test_query_select(test_table_sn):

				    numbers = [Decimal(i) for i in range(10)]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Verify that we get back the numbers in their sorted order. By default,

				    # query returns all attributes:

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers

				    got_x_attributes = [x['x'] for x in got_items]

				    assert got_x_attributes == numbers

				    # Select=ALL_ATTRIBUTES does exactly the same as the default - return

				    # all attributes:

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers

				    got_x_attributes = [x['x'] for x in got_items]

				    assert got_x_attributes == numbers

				    # Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it

				    # is just for indexes, when IndexName is specified)

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')

				    # Select=SPECIFIC_ATTRIBUTES requires that either AttributesToGet or

				    # ProjectionExpression appears, but then adds nothing beyond them:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']

				    expected_items = [{'x': i} for i in numbers]

				    assert got_items == expected_items

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']

				    assert got_items == expected_items

				    # Select=COUNT just returns a count - not any items

				    got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')

				    assert got['Count'] == len(numbers)

				    assert not 'Items' in got

				    # Check that without Select=COUNT we still get a count, but in that

				    # case we also get the items:

				    got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    assert got['Count'] == len(numbers)

				    assert 'Items' in got

				    # Select with some unknown string generates a validation exception:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')

				# Test that the "Limit" parameter can be used to return only some of the

				# items in a single partition. The items returned are the first in the

				# sorted order.

				def test_query_limit(test_table_sn):

				    numbers = [Decimal(i) for i in range(10)]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Verify that we get back the numbers in their sorted order.

				    # First, no Limit so we should get all numbers (we have few of them, so

				    # it all fits in the default 1MB limitation)

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers

				    # Now try a few different Limit values, and verify that the query

				    # returns exactly the first Limit sorted numbers.

				    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:

				        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']

				        assert len(got_items) == min(limit, len(numbers))

				        got_sort_keys = [x['c'] for x in got_items]

				        assert got_sort_keys == numbers[0:limit]

				    # Unfortunately, the boto3 library forbids a Limit of 0 on its own,

				    # before even sending a request, so we can't test how the server responds.

				    with pytest.raises(ParamValidationError):

				        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)

				# In test_query_limit we only tested that Limit stops the result

				# after the right number of items. Here we test that such a stopped result

				# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.

				def test_query_limit_paging(test_table_sn):

				    numbers = [Decimal(i) for i in range(20)]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Verify that full_query() returns all these numbers, in sorted order.

				    # full_query() will do a query with the given limit, and resume it again

				    # and again until the last page.

				    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:

				        got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)

				        got_sort_keys = [x['c'] for x in got_items]

				        assert got_sort_keys == numbers
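
The LastEvaluatedKey/ExclusiveStartKey resumption that this test relies on can be sketched as follows. This is an illustration under stated assumptions, not the actual `full_query()` from `util`: `FakeTable` is a hypothetical in-memory stand-in for a boto3 Table holding one partition, and `paged_query` is a hypothetical name for the loop.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 Table (one partition)."""

    def __init__(self, sort_keys):
        self.sort_keys = sorted(sort_keys)

    def query(self, Limit=None, ExclusiveStartKey=None, **kwargs):
        start = 0
        if ExclusiveStartKey is not None:
            # Resume right after the last key returned by the previous page.
            start = self.sort_keys.index(ExclusiveStartKey['c']) + 1
        end = len(self.sort_keys) if Limit is None else start + Limit
        page = self.sort_keys[start:end]
        response = {'Items': [{'c': c} for c in page], 'Count': len(page)}
        if end < len(self.sort_keys):
            # More items remain, so tell the caller where to resume.
            response['LastEvaluatedKey'] = {'c': page[-1]}
        return response


def paged_query(table, **kwargs):
    """Collect all pages of a query by following LastEvaluatedKey."""
    response = table.query(**kwargs)
    items = list(response['Items'])
    while 'LastEvaluatedKey' in response:
        response = table.query(
            ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items


table = FakeTable(range(20))
for limit in [1, 3, 7, 100]:
    got = [item['c'] for item in paged_query(table, Limit=limit)]
    print(limit, got == list(range(20)))
```

Whatever the page size, the concatenated pages should equal the full sorted result, which is the property the test asserts for each `limit`.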

				# Test that the ScanIndexForward parameter works, and can be used to

				# return items sorted in reverse order. Combining this with Limit can

				# be used to return the last items instead of the first items of the

				# partition.

				@pytest.mark.xfail(reason="ScanIndexForward not supported yet")

				def test_query_reverse(test_table_sn):

				    numbers = [Decimal(i) for i in range(20)]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Verify that we get back the numbers in their sorted order or reverse

				    # order, depending on the ScanIndexForward parameter being True or False.

				    # First, no Limit so we should get all numbers (we have few of them, so

				    # it all fits in the default 1MB limitation)

				    reversed_numbers = list(reversed(numbers))

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers

				    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == reversed_numbers

				    # Now try a few different Limit values, and verify that the query

				    # returns exactly the first Limit sorted numbers - in regular or

				    # reverse order, depending on ScanIndexForward.

				    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:

				        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']

				        assert len(got_items) == min(limit, len(numbers))

				        got_sort_keys = [x['c'] for x in got_items]

				        assert got_sort_keys == numbers[0:limit]

				        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']

				        assert len(got_items) == min(limit, len(numbers))

				        got_sort_keys = [x['c'] for x in got_items]

				        assert got_sort_keys == reversed_numbers[0:limit]
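
The expected results in this test follow a simple rule: order the sort keys forward or backward first, then cut at Limit. A minimal sketch of that rule, where `expected_page` is a hypothetical helper and not part of the test suite:

```python
def expected_page(sorted_keys, limit=None, forward=True):
    """Model of the first page: choose a direction, then apply the limit."""
    ordered = list(sorted_keys) if forward else list(reversed(sorted_keys))
    return ordered if limit is None else ordered[:limit]


numbers = list(range(20))
first_three = expected_page(numbers, limit=3, forward=True)
last_three = expected_page(numbers, limit=3, forward=False)
print(first_three)
print(last_three)
```

Because the direction is applied before the limit, `ScanIndexForward=False` with a small Limit yields the last items of the partition rather than a reversed copy of the first ones.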

				# Test that paging also works properly with reverse order

				# (ScanIndexForward=false), i.e., reverse-order queries can be resumed

				@pytest.mark.xfail(reason="ScanIndexForward not supported yet")

				def test_query_reverse_paging(test_table_sn):

				    numbers = [Decimal(i) for i in range(20)]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    reversed_numbers = list(reversed(numbers))

				    # Verify that with ScanIndexForward=False, full_query() returns all

				    # these numbers in reversed sorted order - getting pages of Limit items

				    # at a time and resuming the query.

				    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:

				        got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)

				        got_sort_keys = [x['c'] for x in got_items]

				        assert got_sort_keys == reversed_numbers

									
										226

alternator-test/test_returnvalues.py
									
												
				@@ -0,0 +1,226 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the ReturnValues parameter for the different update operations

				# (PutItem, UpdateItem, DeleteItem).

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string

				# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem

				# and DeleteItem - test that "NONE" works (and changes nothing), while a

				# completely unsupported value gives an error.

				# This test is useful to check that before the ReturnValues parameter is fully

				# implemented, it returns an error when a still-unsupported ReturnValues

				# option is attempted in the request - instead of simply being ignored.

				def test_trivial_returnvalues(test_table_s):

				    # PutItem:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')

				    assert not 'Attributes' in ret

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')

				    # UpdateItem:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert not 'Attributes' in ret

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',

				            UpdateExpression='SET b = :val',

				            ExpressionAttributeValues={':val': 'cat'})

				    # DeleteItem:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')

				    assert not 'Attributes' in ret

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')

				# Test the ReturnValues parameter on a PutItem operation. Only two settings

				# are supported for this parameter for this operation: NONE (the default)

				# and ALL_OLD.

				@pytest.mark.xfail(reason="ReturnValues not supported")

				def test_put_item_returnvalues(test_table_s):

				    # By default, the previous value of an item is not returned:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})

				    assert not 'Attributes' in ret

				    # Using ReturnValues=NONE is the same:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')

				    assert not 'Attributes' in ret

				    # With ReturnValues=ALL_OLD, the old value of the item is returned

				    # in an "Attributes" attribute:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')

				    assert ret['Attributes'] == {'p': p, 'a': 'hi'}

				    # Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,

				    # are supported by other operations but not by PutItem:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')

				    # Also, obviously, an unsupported setting "DOG" results in an error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')

				    # The ReturnValues value is case sensitive, so while "NONE" is supported

				    # (and tested above), "none" isn't:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')

				# Test the ReturnValues parameter on a DeleteItem operation. Only two settings

				# are supported for this parameter for this operation: NONE (the default)

				# and ALL_OLD.

				@pytest.mark.xfail(reason="ReturnValues not supported")

				def test_delete_item_returnvalues(test_table_s):

				    # By default, the previous value of an item is not returned:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.delete_item(Key={'p': p})

				    assert not 'Attributes' in ret

				    # Using ReturnValues=NONE is the same:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')

				    assert not 'Attributes' in ret

				    # With ReturnValues=ALL_OLD, the old value of the item is returned

				    # in an "Attributes" attribute:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi'})

				    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')

				    assert ret['Attributes'] == {'p': p, 'a': 'hi'}

				    # Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,

				    # are supported by other operations but not by DeleteItem:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')

				    # Also, obviously, an unsupported setting "DOG" results in an error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')

				    # The ReturnValues value is case sensitive, so while "NONE" is supported

				    # (and tested above), "none" isn't:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.delete_item(Key={'p': p}, ReturnValues='none')

				# Test the ReturnValues parameter on an UpdateItem operation. All five

				# settings are supported for this parameter for this operation: NONE

				# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.

				@pytest.mark.xfail(reason="ReturnValues not supported")

				def test_update_item_returnvalues(test_table_s):

				    # By default, the previous value of an item is not returned:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert not 'Attributes' in ret

				    # Using ReturnValues=NONE is the same:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert not 'Attributes' in ret

				    # With ReturnValues=ALL_OLD, the entire old value of the item (even

				    # attributes we did not modify) is returned in an "Attributes" attribute:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}

				    # With ReturnValues=UPDATED_OLD, only the overwritten attributes of the

				    # old item are returned in an "Attributes" attribute:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',

				        UpdateExpression='SET b = :val, c = :val2',

				        ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})

				    assert ret['Attributes'] == {'b': 'dog'}

				    # Even if an update overwrites an attribute with the same value again,

				    # this is considered an update, and the old value (identical to the

				    # new one) is returned:

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert ret['Attributes'] == {'b': 'cat'}

				    # Deleting an attribute also counts as overwriting it, of course:

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',

				        UpdateExpression='REMOVE b')

				    assert ret['Attributes'] == {'b': 'cat'}

				    # With ReturnValues=ALL_NEW, the entire new value of the item (including

				    # old attributes we did not modify) is returned:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',

				        UpdateExpression='SET b = :val',

				        ExpressionAttributeValues={':val': 'cat'})

				    assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}

				    # With ReturnValues=UPDATED_NEW, only the new values of the updated

				    # attributes are returned. Note that "updated attributes" means

				    # the newly set attributes - it doesn't require that these attributes

				    # have any previous values

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',

				        UpdateExpression='SET b = :val, c = :val2',

				        ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})

				    assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}

				    # Deleting an attribute also counts as overwriting it, but the deleted

				    # attribute is not returned in the response - so it's empty in this case.

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',

				        UpdateExpression='REMOVE b')

				    assert not 'Attributes' in ret

				    # In the above examples, UPDATED_NEW is not useful because it just

				    # returns the new values we already know from the request... UPDATED_NEW

				    # becomes more useful in read-modify-write operations:

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 1})

				    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',

				        UpdateExpression='SET a = a + :val',

				        ExpressionAttributeValues={':val': 1})

				    assert ret['Attributes'] == {'a': 2}

				    # An unsupported setting "DOG" also results in an error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',

				            UpdateExpression='SET a = a + :val',

				            ExpressionAttributeValues={':val': 1})

				    # The ReturnValues value is case sensitive, so while "NONE" is supported

				    # (and tested above), "none" isn't:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, ReturnValues='none',

				            UpdateExpression='SET a = a + :val',

				            ExpressionAttributeValues={':val': 1})
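
The five ReturnValues modes exercised above can be summarized with a small pure-Python model. This is a sketch of the semantics the assertions expect, not Scylla's or DynamoDB's implementation; `apply_update` and its parameters are hypothetical, and it only models plain SET and REMOVE of top-level attributes.

```python
def apply_update(old_item, sets=None, removes=()):
    """Return (new_item, attributes_by_mode) for a simplified UpdateItem."""
    sets = sets or {}
    new_item = {k: v for k, v in old_item.items() if k not in removes}
    new_item.update(sets)
    # Attributes named in the update count as "updated", whether or not
    # they existed before or changed value.
    touched = set(sets) | set(removes)
    returned = {
        'NONE': None,
        'ALL_OLD': dict(old_item),
        'UPDATED_OLD': {k: old_item[k] for k in touched if k in old_item},
        'ALL_NEW': dict(new_item),
        'UPDATED_NEW': {k: new_item[k] for k in touched if k in new_item},
    }
    return new_item, returned


old = {'p': 'key', 'a': 'hi', 'b': 'dog'}
new, ret = apply_update(old, sets={'b': 'cat', 'c': 'hello'})
print(ret['UPDATED_OLD'])
print(ret['UPDATED_NEW'])

# Removing an attribute leaves nothing to report under UPDATED_NEW.
_, ret_removed = apply_update(new, removes=('b',))
print(ret_removed['UPDATED_OLD'], ret_removed['UPDATED_NEW'])
```

In the model an empty result is an empty dict; the test above expects the corresponding response to carry no `Attributes` key at all.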

									
										252

alternator-test/test_scan.py
									
												
				@@ -0,0 +1,252 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the Scan operation

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string, full_scan, full_scan_and_count, multiset

				from boto3.dynamodb.conditions import Attr

				# Test that scanning works fine with/without pagination

				def test_scan_basic(filled_test_table):

				    test_table, items = filled_test_table

				    for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:

				        pos = None

				        got_items = []

				        while True:

				            if limit:

				                response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)

				                assert len(response['Items']) <= limit

				            else:

				                response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()

				            pos = response.get('LastEvaluatedKey', None)

				            got_items += response['Items']

				            if not pos:

				                break

				        assert len(items) == len(got_items)

				        assert multiset(items) == multiset(got_items)
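
These tests compare results through `multiset()` because Scan returns partitions in an unspecified order, while duplicates must still be counted. The real helper lives in `util` and is not shown here; the following is only a sketch of how such an order-insensitive comparison could work, with `freeze` and `order_insensitive` as hypothetical names.

```python
from collections import Counter


def freeze(value):
    """Recursively convert a value into a hashable equivalent."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(freeze(v) for v in value)
    return value


def order_insensitive(items):
    """Count items regardless of order; dicts are frozen so they can be keys."""
    return Counter(freeze(item) for item in items)


expected = [{'p': 'x', 'c': '1'}, {'p': 'x', 'c': '2'}]
shuffled = [{'c': '2', 'p': 'x'}, {'c': '1', 'p': 'x'}]
duplicated = shuffled + [{'p': 'x', 'c': '2'}]

print(order_insensitive(expected) == order_insensitive(shuffled))
print(order_insensitive(expected) == order_insensitive(duplicated))
```

A plain `set` comparison would hide an item that is returned twice, which is why a counting structure is the better fit for paging tests.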

				def test_scan_with_paginator(dynamodb, filled_test_table):

				    test_table, items = filled_test_table

				    paginator = dynamodb.meta.client.get_paginator('scan')

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name):

				        got_items += page['Items']

				    assert len(items) == len(got_items)

				    assert multiset(items) == multiset(got_items)

				    for page_size in [1, 17, 1234]:

				        got_items = []

				        for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):

				            got_items += page['Items']

				        assert len(items) == len(got_items)

				        assert multiset(items) == multiset(got_items)

				# Although partitions are scanned in seemingly-random order, inside a

				# partition items must be returned by Scan sorted in sort-key order.

				# This test verifies this, for string sort key. We'll need separate

				# tests for the other sort-key types (number and binary)

				def test_scan_sort_order_string(filled_test_table):

				    test_table, items = filled_test_table

				    got_items = full_scan(test_table)

				    assert len(items) == len(got_items)

				    # Extract just the sort key ("c") from the partition "long"

				    items_long = [x['c'] for x in items if x['p'] == 'long']

				    got_items_long = [x['c'] for x in got_items if x['p'] == 'long']

				    # Verify that got_items_long are already sorted (in string order)

				    assert sorted(got_items_long) == got_items_long

				    # Verify that got_items_long are a sorted version of the expected items_long

				    assert sorted(items_long) == got_items_long

				# Test Scan with the AttributesToGet parameter. Result should include the

				# selected attributes only - if one wants the key attributes as well, one

				# needs to select them explicitly. When no key attributes are selected,

				# some items may have *none* of the selected attributes. Those items are

				# returned too, as empty items - they are not outright missing.

				def test_scan_attributes_to_get(dynamodb, filled_test_table):

				    table, items = filled_test_table

				    for wanted in [ ['another'],       # only non-key attributes (one item doesn't have it!)

				                    ['c', 'another'],  # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # none of the items have this attribute!

				                   ]:

				        print(wanted)

				        got_items = full_scan(table, AttributesToGet=wanted)

				        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				        assert multiset(expected_items) == multiset(got_items)
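# The expected_items comprehension above is what turns an item with none
# of the wanted attributes into an empty item instead of dropping it. A
# minimal standalone sketch, with made-up sample items (not the contents
# of filled_test_table) and a hypothetical helper name project():

```python
def project(item, wanted):
    # Keep only the wanted attributes that the item actually has.
    return {k: item[k] for k in wanted if k in item}

# Made-up sample data for illustration only.
sample_items = [
    {'p': 'a', 'c': '1', 'another': 'y'},
    {'p': 'hello', 'c': 'world', 'str': 'text'},
]

# The second item lacks 'another', so it projects to an empty dict but
# still appears in the result.
projected = [project(x, ['another']) for x in sample_items]
```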

				def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):

				    table, items = filled_test_table

				    scan_filter = {

				        "attribute" : {

				            "AttributeValueList" : [ "xxxxx" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_scan(table, ScanFilter=scan_filter)

				    expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]

				    assert multiset(expected_items) == multiset(got_items)

				    scan_filter = {

				        "another" : {

				            "AttributeValueList" : [ "y" ],

				            "ComparisonOperator": "EQ"

				        },

				        "attribute" : {

				            "AttributeValueList" : [ "xxxxx" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_scan(table, ScanFilter=scan_filter)

				    expected_items = [item for item in items if "attribute" in item.keys() and "another" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]

				    assert multiset(expected_items) == multiset(got_items)

				# Test that FilterExpression works as expected

				@pytest.mark.xfail(reason="FilterExpression not supported yet")

				def test_scan_filter_expression(filled_test_table):

				    test_table, items = filled_test_table

				    got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))

				    print(got_items)

				    assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)

				    got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))

				    print(got_items)

				    assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)

				def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):

				    table, items = filled_test_table

				    scan_filter_p = {

				        "p" : {

				            "AttributeValueList" : [ "7" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    scan_filter_c = {

				        "c" : {

				            "AttributeValueList" : [ "9" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    scan_filter_p_and_attribute = {

				        "p" : {

				            "AttributeValueList" : [ "7" ],

				            "ComparisonOperator": "EQ"

				        },

				        "attribute" : {

				            "AttributeValueList" : [ "x"*7 ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    scan_filter_c_and_another = {

				        "c" : {

				            "AttributeValueList" : [ "9" ],

				            "ComparisonOperator": "EQ"

				        },

				        "another" : {

				            "AttributeValueList" : [ "y"*16 ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    # Filtering on the hash key

				    got_items = full_scan(table, ScanFilter=scan_filter_p)

				    expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]

				    assert multiset(expected_items) == multiset(got_items)

				    # Filtering on the sort key

				    got_items = full_scan(table, ScanFilter=scan_filter_c)

				    expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]

				    assert multiset(expected_items) == multiset(got_items)

				    # Filtering on the hash key and an attribute

				    got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)

				    expected_items = [item for item in items if "p" in item.keys() and "attribute" in item.keys() and item["p"] == "7" and item["attribute"] == "x"*7]

				    assert multiset(expected_items) == multiset(got_items)

				    # Filtering on the sort key and an attribute

				    got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)

				    expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]

				    assert multiset(expected_items) == multiset(got_items)

				# Test the "Select" parameter of Scan. The default Select mode,

				# ALL_ATTRIBUTES, returns items with all their attributes. Other modes

				# allow returning just specific attributes or just counting the results

				# without returning items at all.

				@pytest.mark.xfail(reason="Select not supported yet")

				def test_scan_select(filled_test_table):

				    test_table, items = filled_test_table

				    # By default, a scan returns all the items, with all their attributes:

				    got_items = full_scan(test_table)

				    assert multiset(items) == multiset(got_items)

				    # Select=ALL_ATTRIBUTES does exactly the same as the default - return

				    # all attributes:

				    got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')

				    assert multiset(items) == multiset(got_items)

				    # Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it

				    # is just for indexes, when IndexName is specified)

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')

				    # Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet

				    # or ProjectionExpression appears, but then really does nothing beyond

				    # what AttributesToGet and ProjectionExpression already do:

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')

				    wanted = ['c', 'another']

				    got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)

				    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				    assert multiset(expected_items) == multiset(got_items)

				    got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))

				    assert multiset(expected_items) == multiset(got_items)

				    # Select=COUNT just returns a count - not any items

				    (got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')

				    assert got_count == len(items)

				    assert got_items == []

				    # Check that we also get a count in regular scans - not just with

				    # Select=COUNT, but without Select=COUNT we get both items and count:

				    (got_count, got_items) = full_scan_and_count(test_table)

				    assert got_count == len(items)

				    assert multiset(items) == multiset(got_items)

				    # Select with some unknown string generates a validation exception:

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_scan(test_table, Select='UNKNOWN')

				# Test parallel scan, i.e., the Segment and TotalSegments options.

				# In the following test we check that these parameters allow splitting

				# a scan into multiple parts, and that these parts are in fact disjoint,

				# and their union is the entire contents of the table. We do not actually

				# try to run these queries in *parallel* in this test.

				@pytest.mark.xfail(reason="parallel scan not supported yet")

				def test_scan_parallel(filled_test_table):

				    test_table, items = filled_test_table

				    for nsegments in [1, 2, 17]:

				        print('Testing TotalSegments={}'.format(nsegments))

				        got_items = []

				        for segment in range(nsegments):

				            got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))

				        # The following comparison verifies that each of the expected items

				        # in items was returned in one - and just one - of the segments.

				        assert multiset(items) == multiset(got_items)
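# The property checked by test_scan_parallel (segments are disjoint and
# their union is the whole table) can be illustrated without a server.
# In this sketch segment_of() is a hypothetical assignment rule based on
# a CRC32 of the key; it is an assumption for illustration and says
# nothing about how DynamoDB or Alternator actually split a table.

```python
import zlib
from collections import Counter

def segment_of(key, total_segments):
    # Hypothetical rule: every key maps to exactly one segment.
    return zlib.crc32(key.encode()) % total_segments

def scan_segment(keys, total_segments, segment):
    # Return only the keys that belong to the requested segment.
    return [k for k in keys if segment_of(k, total_segments) == segment]

def scan_all_segments(keys, total_segments):
    # Concatenate the segments one after another (not in parallel).
    got = []
    for segment in range(total_segments):
        got.extend(scan_segment(keys, total_segments, segment))
    return got

sample_keys = ['key{}'.format(i) for i in range(100)]
```

# Comparing as multisets (Counter here) catches both a missing key and a
# key returned by more than one segment.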

									
alternator-test/test_table.py
												
				@@ -0,0 +1,276 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for basic table operations: CreateTable, DeleteTable, ListTables.

				import pytest

				from botocore.exceptions import ClientError

				from util import list_tables, test_table_name, create_test_table, random_string

				# Utility function for creating a table with a given name and some valid

				# schema. This function initiates the table's creation, but doesn't

				# wait for the table to actually become ready.

				def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):

				    return dynamodb.create_table(

				        TableName=name,

				        BillingMode=BillingMode,

				        KeySchema=[

				            {

				                'AttributeName': 'p',

				                'KeyType': 'HASH'

				            },

				            {

				                'AttributeName': 'c',

				                'KeyType': 'RANGE'

				            }

				        ],

				        AttributeDefinitions=[

				            {

				                'AttributeName': 'p',

				                'AttributeType': 'S'

				            },

				            {

				                'AttributeName': 'c',

				                'AttributeType': 'S'

				            },

				        ],

				        **kwargs

				    )

				# Utility function for creating a table with a given name, and then deleting

				# it immediately, waiting for these operations to complete. Since the wait

				# uses DescribeTable, this function requires all of CreateTable, DescribeTable

				# and DeleteTable to work correctly.

				# Note that in DynamoDB, table deletion takes a very long time, so tests

				# successfully using this function are very slow.

				def create_and_delete_table(dynamodb, name, **kwargs):

				    table = create_table(dynamodb, name, **kwargs)

				    table.meta.client.get_waiter('table_exists').wait(TableName=name)

				    table.delete()

				    table.meta.client.get_waiter('table_not_exists').wait(TableName=name)

				##############################################################################

				# Test creating a table, and then deleting it, waiting for each operation

				# to have completed before proceeding. Since the wait uses DescribeTable,

				# this test requires all of CreateTable, DescribeTable and DeleteTable to

				# function properly in their basic use cases.

				# Unfortunately, this test is extremely slow with DynamoDB because DynamoDB

				# takes a very long time to actually delete a table.

				def test_create_and_delete_table(dynamodb):

				    create_and_delete_table(dynamodb, 'alternator_test')

				# DynamoDB documentation specifies that table names must be 3-255 characters,

				# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should

				# be rejected, and no table be created.

				def test_create_table_unsupported_names(dynamodb):

				    from botocore.exceptions import ParamValidationError, ClientError

				    # Interestingly, the boto library tests for names shorter than the

				    # minimum length (3 characters) immediately, and failure results in

				    # ParamValidationError. But the other invalid names are passed to

				    # DynamoDB, which returns an HTTP response code, which results in a

				    # ClientError exception.

				    with pytest.raises(ParamValidationError):

				        create_table(dynamodb, 'n')

				    with pytest.raises(ParamValidationError):

				        create_table(dynamodb, 'nn')

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, 'n' * 256)

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, 'nyh@test')
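# The naming rule quoted above (3-255 characters, matching
# [a-zA-Z0-9._-]+) can be written as a small predicate. This is only a
# sketch of the documented rule; is_valid_table_name() is a hypothetical
# helper, not a boto3 or Scylla function.

```python
import re

# The character class quoted in the comment above.
_TABLE_NAME_RE = re.compile(r'[a-zA-Z0-9._-]+')

def is_valid_table_name(name):
    # Both the length bounds and the character set must hold.
    return 3 <= len(name) <= 255 and _TABLE_NAME_RE.fullmatch(name) is not None
```

# The names used in the test above behave as expected under this
# predicate: 'n', 'nn', 'n' * 256 and 'nyh@test' are rejected, while
# 'alternator_test' and '.alternator_test' are accepted.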

				# On the other hand, names following the above rules should be accepted. Even

				# names which the Scylla rules forbid, such as a name starting with a period.

				def test_create_and_delete_table_non_scylla_name(dynamodb):

				    create_and_delete_table(dynamodb, '.alternator_test')

				# Names with 255 characters are allowed in Dynamo, but they are not currently

				# supported in Scylla because we create a directory whose name is the table's

				# name followed by 33 bytes (underscore and UUID). So currently, we only

				# correctly support names with length up to 222.

				def test_create_and_delete_table_very_long_name(dynamodb):

				    # In the future, this should work:

				    #create_and_delete_table(dynamodb, 'n' * 255)

				    # But for now, only 222 works:

				    create_and_delete_table(dynamodb, 'n' * 222)

				    # We cannot test the following on DynamoDB because it will succeed

				    # (DynamoDB allows up to 255 bytes)

				    #with pytest.raises(ClientError, match='ValidationException'):

				    #   create_table(dynamodb, 'n' * 223)
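# The 222 figure follows from simple arithmetic on the directory name. A
# sketch, assuming the limit on a single directory name is 255 bytes
# (that limit is an assumption here; the comment above only states the
# 33-byte suffix and the resulting 222):

```python
import uuid

# Assumption: a single directory name may be at most 255 bytes long.
MAX_DIRECTORY_NAME = 255

# The suffix described above: an underscore followed by a UUID written
# as 32 hex digits, 33 bytes in total.
suffix = '_' + uuid.uuid4().hex

# Longest table name that still fits once the suffix is appended.
max_table_name_length = MAX_DIRECTORY_NAME - len(suffix)
```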

				# Test that creating a table with an invalid schema returns a

				# ValidationException error.

				def test_create_table_invalid_schema(dynamodb):

				    # The name of the table "created" by this test shouldn't matter, the

				    # creation should not succeed anyway.

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'p', 'KeyType': 'HASH' },

				                { 'AttributeName': 'c', 'KeyType': 'HASH' }

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'p', 'AttributeType': 'S' },

				                { 'AttributeName': 'c', 'AttributeType': 'S' },

				            ],

				        )

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'p', 'KeyType': 'RANGE' },

				                { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'p', 'AttributeType': 'S' },

				                { 'AttributeName': 'c', 'AttributeType': 'S' },

				            ],

				        )

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'c', 'KeyType': 'RANGE' }

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'c', 'AttributeType': 'S' },

				            ],

				        )

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'c', 'KeyType': 'HASH' },

				                { 'AttributeName': 'p', 'KeyType': 'RANGE' },

				                { 'AttributeName': 'z', 'KeyType': 'RANGE' }

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'c', 'AttributeType': 'S' },

				                { 'AttributeName': 'p', 'AttributeType': 'S' },

				                { 'AttributeName': 'z', 'AttributeType': 'S' }

				            ],

				        )

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'c', 'KeyType': 'HASH' },

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'z', 'AttributeType': 'S' }

				            ],

				        )

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(

				            TableName='name_doesnt_matter',

				            BillingMode='PAY_PER_REQUEST',

				            KeySchema=[

				                { 'AttributeName': 'k', 'KeyType': 'HASH' },

				            ],

				            AttributeDefinitions=[

				                { 'AttributeName': 'k', 'AttributeType': 'Q' }

				            ],

				        )

				# Test that trying to create a table that already exists fails in the

				# appropriate way (ResourceInUseException)

				def test_create_table_already_exists(dynamodb, test_table):

				    with pytest.raises(ClientError, match='ResourceInUseException'):

				        create_table(dynamodb, test_table.name)

				# Test that BillingMode error path works as expected - only the values

				# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires

				# ProvisionedThroughput to be set, the latter forbids it.

				# If BillingMode is outright missing, it defaults (as original

				# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.

				def test_create_table_billing_mode_errors(dynamodb, test_table):

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, test_table_name(), BillingMode='unknown')

				    # billing mode is case-sensitive

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')

				    # PAY_PER_REQUEST cannot come with a ProvisionedThroughput:

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, test_table_name(),

				            BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})

				    # On the other hand, PROVISIONED requires ProvisionedThroughput:

				    # By the way, ProvisionedThroughput not only needs to appear, it must

				    # have both ReadCapacityUnits and WriteCapacityUnits - but we can't test

				    # this with boto3, because boto3 has its own verification that if

				    # ProvisionedThroughput is given, it must have the correct form.

				    with pytest.raises(ClientError, match='ValidationException'):

				        create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')

				    # If BillingMode is completely missing, it defaults to PROVISIONED, so

				    # ProvisionedThroughput is required

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.create_table(TableName=test_table_name(),

				            KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],

				            AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
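# The BillingMode rules exercised above can be summarized as a small
# pure function. This is a sketch of the rules as the test describes
# them, not Alternator's implementation; check_billing_mode() and
# is_rejected() are hypothetical names, and ValueError stands in for the
# ValidationException that the server returns.

```python
def check_billing_mode(billing_mode=None, provisioned_throughput=None):
    # A missing BillingMode defaults to PROVISIONED.
    if billing_mode is None:
        billing_mode = 'PROVISIONED'
    # Only the two exact, case-sensitive values are accepted.
    if billing_mode not in ('PROVISIONED', 'PAY_PER_REQUEST'):
        raise ValueError('unknown BillingMode')
    # PAY_PER_REQUEST forbids ProvisionedThroughput.
    if billing_mode == 'PAY_PER_REQUEST' and provisioned_throughput is not None:
        raise ValueError('ProvisionedThroughput not allowed')
    # PROVISIONED requires ProvisionedThroughput.
    if billing_mode == 'PROVISIONED' and provisioned_throughput is None:
        raise ValueError('ProvisionedThroughput required')
    return billing_mode

def is_rejected(**kwargs):
    # Convenience wrapper: True if the combination is refused.
    try:
        check_billing_mode(**kwargs)
    except ValueError:
        return True
    return False
```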

				# Our first implementation had a special column name called "attrs" where

				# we stored a map for all non-key columns. If the user tried to name one

				# of the key columns with this same name, the result was a disaster - Scylla

				# goes into a bad state after trying to write data with two updates to same-

				# named columns.

				special_column_name1 = 'attrs'

				special_column_name2 = ':attrs'

				@pytest.fixture(scope="session")

				def test_table_special_column_name(dynamodb):

				    table = create_test_table(dynamodb,

				        KeySchema=[

				            { 'AttributeName': special_column_name1, 'KeyType': 'HASH' },

				            { 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }

				        ],

				        AttributeDefinitions=[

				            { 'AttributeName': special_column_name1, 'AttributeType': 'S' },

				            { 'AttributeName': special_column_name2, 'AttributeType': 'S' },

				        ],

				    )

				    yield table

				    table.delete()

				@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")

				def test_create_table_special_column_name(test_table_special_column_name):

				    s = random_string()

				    c = random_string()

				    h = random_string()

				    expected = {special_column_name1: s, special_column_name2: c, 'hello': h}

				    test_table_special_column_name.put_item(Item=expected)

				    got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']

				    assert got == expected

				# Test that all tables we create are listed, and pagination works properly.

				# Note that the DynamoDB setup we run this against may have hundreds of

				# other tables, for all we know. We just need to check that the tables we

				# created are indeed listed.

				def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):

				    my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}

				    for limit in [1, 2, 3, 4, 50, 100]:

				        print("testing limit={}".format(limit))

				        list_tables_set = set(list_tables(dynamodb, limit))

				        assert my_tables_set.issubset(list_tables_set)

				# Test that pagination limit is validated

				def test_list_tables_wrong_limit(dynamodb):

				    # lower limit (min. 1) is imposed by boto3 library checks

				    with pytest.raises(ClientError, match='ValidationException'):

				        dynamodb.meta.client.list_tables(Limit=101)

									
alternator-test/test_update_expression.py
												
				@@ -0,0 +1,854 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the UpdateItem operations with an UpdateExpression parameter

				import random

				import string

				import pytest

				from botocore.exceptions import ClientError

				from decimal import Decimal

				from util import random_string

				# The simplest test of using UpdateExpression to set a top-level attribute,

				# instead of the older AttributeUpdates parameter.

				# Checks only one "SET" action in an UpdateExpression.

				def test_update_expression_set(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1',

				        ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}

				# An empty UpdateExpression is NOT allowed, and generates a "The expression

				# can not be empty" error. This contrasts with an empty AttributeUpdates which

				# is allowed, and results in the creation of an empty item if it didn't exist

				# yet (see test_empty_update()).

				def test_update_expression_empty(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='')

				# A basic test with multiple SET actions in one expression

				def test_update_expression_set_multi(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET x = :val1, y = :val1',

				        ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}

				# SET can be used to copy an existing attribute to a new one

				def test_update_expression_set_copy(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}

				    # Copying a non-existing attribute generates an error

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')

				    # It turns out that attributes to be copied are read before the SET

				    # starts to write, so "SET x = :val1, y = x" does not work...

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})

				    # SET z=z does nothing if z exists, or fails if it doesn't

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')

				    # We can also use name references in either LHS or RHS of SET, e.g.,

				    # SET #one = #two. References used in the RHS must also be counted

				    # when we complain about unused names in ExpressionAttributeNames.

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',

				         ExpressionAttributeNames={'#one': 'c', '#two': 'a'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',

				             ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})

				# Test for read-before-write action where the value to be read is nested inside a - operator

				def test_update_expression_set_nested_copy(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',

				         ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',

				         ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',

				         ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',

				         ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}

				# Test for getting a key value with read-before-write

				def test_update_expression_set_key(test_table_sn):

				    p = random_string()

				    test_table_sn.update_item(Key={'p': p, 'c': 7})

				    test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',

				         ExpressionAttributeNames={'#n': 'n', '#p': 'p'})

				    test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',

				         ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})

				    assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}

				# Simple test for the "REMOVE" action

				def test_update_expression_remove(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}

				# Demonstrate that although all DynamoDB examples give UpdateExpression

				# action names in uppercase - e.g., "SET", it can actually be any case.

				def test_update_expression_action_case(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}

				# Demonstrate that whitespace is ignored in UpdateExpression parsing.

				def test_update_expression_action_whitespace(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='  set   b=:val1  ', ExpressionAttributeValues={':val1': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}

				# In UpdateExpression, the attribute name can appear directly in the expression

				# (without a "#placeholder" notation) only if it is a single "token" as

				# determined by DynamoDB's lexical analyzer rules: such a token is composed of

				# alphanumeric characters or underscores, and must start with a letter. Other

				# names cause the parser to see multiple tokens, and produce syntax errors.

				def test_update_expression_name_token(test_table_s):

				    p = random_string()

				    # Alphanumeric names starting with an alphabetical character work

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3

				    # But alphanumeric names cannot start with underscore or digits.

				    # DynamoDB's lexical analyzer doesn't recognize them, and produces

				    # a ValidationException looking like:

				    #   Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})

				    # Various other non-alphanumeric characters split a token and are NOT allowed

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})

				    # In addition to the literal names, DynamoDB also allows references to any

				    # name, using the "#reference" syntax. It turns out the reference name is

				    # also a token following the rules as above, with one interesting point:

				    # since "#" already started the token, the next character may be any

				    # alphanumeric character or underscore, and need not be alphabetic.

				    # Note that the reference target - the actual attribute name - can include

				    # absolutely any characters, and we use silly_name below as an example

				    silly_name = '3can include any character!.#='

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})

				    # Just a "#" is not enough as a token. Interestingly, DynamoDB will

				    # find the bad name in ExpressionAttributeNames before it actually tries

				    # to parse UpdateExpression, but we can verify the parse fails too by

				    # using a valid but irrelevant name in ExpressionAttributeNames:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})

				    # There are also value references, ":reference", for the right-hand

				    # side of an assignment. Their naming rules are similar to those of "#reference".

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})

				    # Just a ":" is not enough as a token.

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})

				    # Trying to use a :reference on the left-hand side of an assignment will

				    # not work. In DynamoDB, it's a different type of token (and generates a

				    # syntax error).

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
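
				# Illustrative sketch only, not part of the original tests: the token rules
				# exercised above, written as regular expressions. The patterns and the
				# helper names below are assumptions inferred from this test's cases, not
				# DynamoDB's or Alternator's actual grammar.
				import re
				# A literal attribute name: a letter, then letters, digits or underscores.
				_SKETCH_NAME_TOKEN = re.compile(r'[A-Za-z][A-Za-z0-9_]*')
				# A "#name" or ":value" reference: the prefix, then one or more of the same.
				_SKETCH_REF_TOKEN = re.compile(r'[#:][A-Za-z0-9_]+')
				def sketch_is_name_token(s):
				    return _SKETCH_NAME_TOKEN.fullmatch(s) is not None
				def sketch_is_reference_token(s):
				    return _SKETCH_REF_TOKEN.fullmatch(s) is not None
				# For example, sketch_is_name_token('A123_') holds but
				# sketch_is_name_token('_123') does not, while
				# sketch_is_reference_token('#123') and sketch_is_reference_token(':_')
				# both hold and sketch_is_reference_token('#') does not.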

				# Multiple actions are allowed in one expression, but actions are divided

				# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear

				# once.

				def test_update_expression_multi(test_table_s):

				    p = random_string()

				    # We can have two SET actions in one SET clause:

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}

				    # But not two SET clauses - we get error "The "SET" section can only be used once in an update expression"

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})

				    # We can have a REMOVE and a SET clause (note no comma between clauses):

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}

				    # The same clause (e.g., SET) cannot be used twice, even if interleaved with something else

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})

				# Trying to modify the same attribute twice in the same update is forbidden.

				# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two

				# document paths overlap with each other; must remove or rewrite one of

				# these paths; path one: [a], path two: [a]". 

				# It is actually good for Scylla that such updates are forbidden, because had

				# we allowed "SET a=:v REMOVE a" the result would be surprising - because data

				# wins over a delete with the same timestamp, so "a" would be set despite the

				# REMOVE action appearing later in the expression.

				def test_update_expression_multi_overlap(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}

				    # Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" are allowed:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})

				    # It's also not allowed to set "a" twice in the same clause

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})

				    # Obviously, the paths are compared after the name references are evaluated

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})

				# The problem isn't just with identical paths - we can't modify two paths that

				# "overlap" in the sense that one is the ancestor of the other.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_multi_overlap_nested(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*overlap'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',

				            ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})

				    test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})

				    with pytest.raises(ClientError, match='ValidationException.*overlap'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',

				            ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
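
				# Illustrative sketch only, not part of the original tests: one way to
				# express the "overlap" rule checked above. Paths are modeled as lists of
				# components (so "a.b.c" is ['a', 'b', 'c']); this representation and the
				# helper name are assumptions, not Alternator's implementation.
				def sketch_paths_overlap(path1, path2):
				    # Two paths overlap if they are identical or one is an ancestor
				    # (a prefix) of the other.
				    n = min(len(path1), len(path2))
				    return path1[:n] == path2[:n]
				# For example, ['a'] overlaps ['a', 'b'], and ['a', 'b'] overlaps
				# ['a', 'b', 'c'], but ['a', 'b'] and ['a', 'c'] do not overlap.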

				# In the previous test we saw that *modifying* the same attribute twice in the

				# same update is forbidden, but it is allowed to *read* an attribute in the same

				# update that also modifies it, and we check this here.

				def test_update_expression_multi_with_copy(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}

				    # "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()

				    # the value of 'a' is read before the actual REMOVE operation happens.

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}

				# Test case where a :val1 is referenced, without being defined

				def test_update_expression_set_missing_value(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1',

				            ExpressionAttributeValues={':val2': 4})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1')

				# It is forbidden for ExpressionAttributeValues to contain values not used

				# by the expression. DynamoDB produces an error like: "Value provided in

				# ExpressionAttributeValues unused in expressions: keys: {:val1}"

				def test_update_expression_spurious_value(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',

				            ExpressionAttributeValues={':val1': 3, ':val2': 4})

				# Test case where a #name is referenced, without being defined

				def test_update_expression_set_missing_name(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET #name = :val1',

				            ExpressionAttributeValues={':val1': 4},

				            ExpressionAttributeNames={'#wrongname': 'hello'})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET #name = :val1',

				            ExpressionAttributeValues={':val1': 4})

				# It is forbidden for ExpressionAttributeNames to contain names not used

				# by the expression. DynamoDB produces an error like: "Value provided in

				# ExpressionAttributeNames unused in expressions: keys: {#b}"

				def test_update_expression_spurious_name(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',

				            ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},

				            ExpressionAttributeValues={':val1': 3})

				# Test that the key attributes (hash key or sort key) cannot be modified

				# by an update

				def test_update_expression_cannot_modify_key(test_table):

				    p = random_string()

				    c = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})

				    with pytest.raises(ClientError, match='ValidationException.*key'):

				        test_table.update_item(Key={'p': p, 'c': c},

				            UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})

				    # As a sanity check, verify we *can* modify a non-key column

				    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}

				    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')

				    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}

				# Test that trying to start an expression with some nonsense like HELLO

				# instead of SET, REMOVE, ADD or DELETE, fails.

				def test_update_expression_nonexistent_clause(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='HELLO b = :val1',

				            ExpressionAttributeValues={':val1': 4})

				# Test support for "SET a = :val1 + :val2" and "SET a = :val1 - :val2".

				# Only exactly these combinations work - e.g., it's a syntax error to

				# try to add three. Trying to add a string fails.

				def test_update_expression_plus_basic(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1 + :val2',

				        ExpressionAttributeValues={':val1': 4, ':val2': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1 - :val2',

				        ExpressionAttributeValues={':val1': 5, ':val2': 2})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}

				    # Only the addition of exactly two values is supported!

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1 + :val2 + :val3',

				            ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})

				    # Only numeric values can be added - other things like strings or lists

				    # cannot be added, and we get an error like "Incorrect operand type for

				    # operator or function; operator or function: +, operand type: S".

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1 + :val2',

				            ExpressionAttributeValues={':val1': 'dog', ':val2': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1 + :val2',

				            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})

				# While most of the Alternator code just saves high-precision numbers

				# unchanged, the "+" and "-" operations need to calculate with them, and

				# we should check the calculation isn't done with some lower-precision

				# representation, e.g., double

				def test_update_expression_plus_precision(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1 + :val2',

				        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val2 - :val1',

				        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
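
				# Illustrative sketch only, not part of the original tests: why doing the
				# arithmetic in a double would fail the test above. A double has a 53-bit
				# significand, so adding 1 to 10**22 is lost to rounding, whereas Python's
				# Decimal (default context, 28 significant digits) keeps it. The helper
				# name is an assumption for illustration.
				from decimal import Decimal
				def sketch_plus_precision():
				    big = 10**22
				    as_double = float(big) + 1.0
				    as_decimal = Decimal(big) + Decimal(1)
				    # First flag: the double sum is unchanged; second: the Decimal sum is exact.
				    return as_double == float(big), as_decimal == Decimal(big + 1)
				# sketch_plus_precision() returns (True, True).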

				# Test support for "SET a = b + :val2" et al., i.e., a version of the

				# above test_update_expression_plus_basic with read before write.

				def test_update_expression_plus_rmw(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 2})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = a + :val1',

				        ExpressionAttributeValues={':val1': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = :val1 + a',

				        ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1 + a',

				        ExpressionAttributeValues={':val1': 1})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = b + a')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19

				# Test the list_append() function in SET, for the most basic use case of

				# concatenating two value references. Because this is the first test of

				# functions in SET, we also test some generic features of how functions

				# are parsed.

				def test_update_expression_list_append_basic(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(:val1, :val2)',

				        ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}

				    # Unlike the operation name "SET", function names are case-sensitive!

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',

				            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})

				    # As usual, spaces are ignored by the parser

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(  :val1 ,   :val2  )',

				        ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}

				    # The list_append function only allows two parameters. The parser can

				    # correctly parse fewer or more, but then an error is generated: "Invalid

				    # UpdateExpression: Incorrect number of operands for operator or function;

				    # operator or function: list_append, number of operands: 1".

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = list_append(:val1)',

				            ExpressionAttributeValues={':val1': ['a']})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = list_append(:val1, :val2, :val3)',

				            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})

				    # If list_append is used on a value which isn't a list, we get an

				    # error: "Invalid UpdateExpression: Incorrect operand type for operator

				    # or function; operator or function: list_append, operand type: S"

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = list_append(:val1, :val2)',

				            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})

				# Additional list_append() tests, also using attribute paths as parameters

				# (i.e., read-modify-write).

				def test_update_expression_list_append(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = :val1',

				        ExpressionAttributeValues={':val1': ['hi', 2]})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2]

				    # Often, list_append is used to append items to a list attribute

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(a, :val1)',

				        ExpressionAttributeValues={':val1': [4, 'hello']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']

				    # But it can also be used to just concatenate in other ways:

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(:val1, a)',

				        ExpressionAttributeValues={':val1': ['dog']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = list_append(a, :val1)',

				        ExpressionAttributeValues={':val1': ['cat']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET c = list_append(a, b)')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']

				    # As usual, #references are allowed instead of inline names:

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET #name1 = list_append(#name2,:val1)',

				        ExpressionAttributeValues={':val1': [8]},

				        ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]

				# Test the "if_not_exists" function in SET

				# The test also checks additional features of function-call parsing.

				def test_update_expression_if_not_exists(test_table_s):

				    p = random_string()

				    # Since attribute a doesn't exist, set it:

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = if_not_exists(a, :val1)',

				        ExpressionAttributeValues={':val1': 2})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2

				    # Now the attribute does exist, so set does nothing:

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = if_not_exists(a, :val1)',

				        ExpressionAttributeValues={':val1': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2

				    # if_not_exists can also be used to check one attribute and set another,

				    # but note that if_not_exists(a, :val) means a's value if it exists,

				    # otherwise :val!

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = if_not_exists(c, :val1)',

				        ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = if_not_exists(c, :val1)',

				        ExpressionAttributeValues={':val1': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = if_not_exists(a, :val1)',

				        ExpressionAttributeValues={':val1': 6})

				    # note how because 'a' does exist, its value is copied, overwriting b's

				    # value:

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2

				    # The parser expects function parameters to be value references, paths,

				    # or nested calls to functions. Anything else causes a syntax error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = if_not_exists(non@sense, :val1)',

				            ExpressionAttributeValues={':val1': 6})

				    # if_not_exists() requires that the first parameter be a path. However,

				    # the parser doesn't know this, and also accepts a value reference or a

				    # function call as a function parameter. If we try one of these other

				    # things, the parser succeeds, but we get a later error, looking like:

				    # "Invalid UpdateExpression: Operator or function requires a document

				    # path; operator or function: if_not_exists"

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',

				            ExpressionAttributeValues={':val1': 6, ':val2': 3})

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = if_not_exists(:val2, :val1)',

				            ExpressionAttributeValues={':val1': 6, ':val2': 3})

				    # Surprisingly, if the wrong argument is a :val value reference, the

				    # parser first tries to look it up in ExpressionAttributeValues (and

				    # fails if it's missing), before realizing any value reference would be

				    # wrong... So the following fails like the above does - but with a

				    # different error message (which we do not check here): "Invalid

				    # UpdateExpression: An expression attribute value used in expression

				    # is not defined; attribute value: :val2"

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = if_not_exists(:val2, :val1)',

				            ExpressionAttributeValues={':val1': 6})

				# When the expression parser parses a function call f(value, value), each

				# value may itself be a function call - ad infinitum. So expressions like

				# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper

				# nesting.

				@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")

				def test_update_expression_function_nesting(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',

				            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',

				            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']

				    # I don't understand why the following expression isn't accepted, but it

				    # isn't! It produces a "Invalid UpdateExpression: The function is not

				    # allowed to be used this way in an expression; function: list_append".

				    # I don't know how to explain it. In any case, the *parsing* works -

				    # this is not a syntax error - the failure is in some verification later.

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',

				                ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})

				    # Ditto, the following passes the parser but fails some later check with

				    # the same error message as above.

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',

				                ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})

				# Verify how in SET expressions, "+" (or "-") nests with functions.

				# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax

				# error on the "+"). This means that the parser has two separate rules:

				# 1.  set_action: SET path = value + value

				# 2.  value: VALREF | NAME | NAME (value, ...)

				def test_update_expression_function_plus_nesting(test_table_s):

				    p = random_string()

				    # As explained above, this - with "+" outside the expression, works:

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',

				            ExpressionAttributeValues={':val1': 2, ':val2': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5

				    # ...but this - with the "+" inside an expression parameter, is a syntax

				    # error:

				    with pytest.raises(ClientError, match='ValidationException'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',

				                ExpressionAttributeValues={':val1': 5, ':val2': 4})

				# This test tries to use an undefined function "f". This, obviously, fails,

				# but were we to actually print the error we would see "Invalid

				# UpdateExpression: Invalid function name; function: f". Not a syntax error.

				# This means that the parser accepts any alphanumeric name as a function

				# name, and only later use of this function fails because it's not one of

				# the supported functions.

				def test_update_expression_unknown_function(test_table_s):

				    p = random_string()

				    with pytest.raises(ClientError, match='ValidationException.*f'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = f(b,c,d)')

				    with pytest.raises(ClientError, match='ValidationException.*f123_hi'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = f123_hi(b,c,d)')

				    # Just like unreferenced column names parsed by the DynamoDB parser,

				    # function names must also start with an alphabetic character. Trying

				    # to use _f as a function name will result in an actual syntax error,

				    # on the "_" token.

				    with pytest.raises(ClientError, match='ValidationException.*yntax error'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET a = _f(b,c,d)')

				# Test "ADD" operation for numbers

				def test_update_expression_add_numbers(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='ADD a :val1',

				        ExpressionAttributeValues={':val1': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7

				    # If the value to be added isn't a number, we get an error like "Invalid

				    # UpdateExpression: Incorrect operand type for operator or function;

				    # operator: ADD, operand type: STRING".

				    with pytest.raises(ClientError, match='ValidationException.*type'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='ADD a :val1',

				            ExpressionAttributeValues={':val1': 'hello'})

				    # Similarly, if the attribute we're adding to isn't a number, we get an

				    # error like "An operand in the update expression has an incorrect data

				    # type"

				    with pytest.raises(ClientError, match='ValidationException.*type'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='ADD b :val1',

				            ExpressionAttributeValues={':val1': 1})

				# Test "ADD" operation for sets

				def test_update_expression_add_sets(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='ADD a :val1',

				        ExpressionAttributeValues={':val1': set(['pig'])})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])

				    # TODO: right now this test won't detect duplicated values in the returned result,

				    # because boto3 parses a set out of the returned JSON anyway. This check should leverage

				    # lower level API (if exists) to ensure that the JSON contains no duplicates

				    # in the set representation. It has been verified manually.

				    test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='ADD a :val1',

				        ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])

				    # The value to be added needs to be a set of the same type - it can't

				    # be a single element or anything else. If the value has the wrong type,

				    # we get an error like "Invalid UpdateExpression: Incorrect operand type

				    # for operator or function; operator: ADD, operand type: STRING".

				    with pytest.raises(ClientError, match='ValidationException.*type'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='ADD a :val1',

				            ExpressionAttributeValues={':val1': 'hello'})

				# Test "DELETE" operation for sets

				def test_update_expression_delete_sets(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='DELETE a :val1',

				        ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])

				    # Deleting an element not present in the set is not an error - it just

				    # does nothing

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='DELETE a :val1',

				        ExpressionAttributeValues={':val1': set(['pig'])})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])

				    # The value to be deleted must be a set of the same type - it can't

				    # be a single element or anything else. If the value has the wrong type,

				    # we get an error like "Invalid UpdateExpression: Incorrect operand type

				    # for operator or function; operator: DELETE, operand type: STRING".

				    with pytest.raises(ClientError, match='ValidationException.*type'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='DELETE a :val1',

				            ExpressionAttributeValues={':val1': 'hello'})

				######## Tests for paths and nested attribute updates:

				# A dot inside a name in ExpressionAttributeNames is a literal dot, and

				# results in a top-level attribute with an actual dot in its name - not

				# a nested attribute path.

				def test_update_expression_dot_in_name(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',

				        ExpressionAttributeValues={':val1': 3},

				        ExpressionAttributeNames={'#a': 'a.b'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}

				# A basic test for direct update of a nested attribute: One of the top-level

				# attributes is itself a document, and we update only one of that document's

				# nested attributes.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_nested_attribute_dot(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',

				        ExpressionAttributeValues={':val1': 7})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}

				    # Of course we can also add new nested attributes, not just modify

				    # existing ones:

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',

				        ExpressionAttributeValues={':val1': 3})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}

				# Similar test, for a list: one of the top-level attributes is a list, we

				# can update one of its items.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_nested_attribute_index(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',

				        ExpressionAttributeValues={':val1': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}

				# Test that just like happens in top-level attributes, also in nested

				# attributes, setting them replaces the old value - potentially an entire

				# nested document, by the whole value (which may have a different type)

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_nested_different_type(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',

				        ExpressionAttributeValues={':val1': 7})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}

				# Yet another test of a nested attribute update. This one uses a deeper

				# level of nesting (dots and indexes) and adds #name references to the mix.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_nested_deep(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',

				        ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==  {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}

				    # A deep path can also appear on the right-hand-side of an assignment

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',

				        ExpressionAttributeNames={'#name': 'x'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] ==  9

				# A REMOVE operation can be used to remove nested attributes, and also

				# individual list items.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_update_expression_nested_remove(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==  {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}

				# The DynamoDB documentation specifies: "When you use SET to update a list

				# element, the contents of that element are replaced with the new data that

				# you specify. If the element does not already exist, SET will append the

				# new element at the end of the list."

				# So if we take a three-element list a, and set a[7], the new element

				# will be put at the end of the list, not position 7 specifically.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_nested_attribute_update_array_out_of_bounds(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',

				        ExpressionAttributeValues={':val1': 'hello'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}

				    # The DynamoDB documentation also says: "If you add multiple elements

				    # in a single SET operation, the elements are sorted in order by element

				    # number."

				    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',

				        ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
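# The list semantics quoted from the DynamoDB documentation above can be
# modeled in plain Python. This is only a sketch of the documented behavior,
# not how DynamoDB or Alternator implement it; apply_list_sets is a
# hypothetical helper name and does not exist in the test suite.

```python
def apply_list_sets(lst, assignments):
    """Model 'SET a[i] = v' for several indexes given as {index: value}.

    In-range indexes replace the existing element. Out-of-range indexes are
    appended at the end of the list, ordered by index.
    """
    result = list(lst)
    appended = []
    for index, value in assignments.items():
        if index < len(lst):
            result[index] = value
        else:
            appended.append((index, value))
    # Out-of-range elements land at the end, sorted by their element number.
    for _, value in sorted(appended, key=lambda pair: pair[0]):
        result.append(value)
    return result


after_first = apply_list_sets(['one', 'two', 'three'], {7: 'hello'})
after_second = apply_list_sets(after_first, {84: 'a1', 37: 'a2'})
```

# In this model, after_first ends with 'hello' in position 3 (not 7), and
# after_second ends with 'a2' before 'a1' because 37 sorts before 84.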

				# Test what happens if we try to write to a.b, which would only make sense if

				# a were a nested document, but a doesn't exist, or exists and is NOT a nested

				# document but rather a scalar or list or something.

				# DynamoDB actually detects this case and prints an error:

				#   ClientError: An error occurred (ValidationException) when calling the

				#   UpdateItem operation: The document path provided in the update expression

				#   is invalid for update

				# Because Scylla doesn't read before write, it cannot detect this as an error,

				# so we'll probably want to allow for that possibility as well.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_nested_attribute_update_bad_path_dot(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})

				    with pytest.raises(ClientError, match='ValidationException.*path'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',

				            ExpressionAttributeValues={':val1': 7})

				    with pytest.raises(ClientError, match='ValidationException.*path'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',

				            ExpressionAttributeValues={':val1': 7})

				    with pytest.raises(ClientError, match='ValidationException.*path'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',

				            ExpressionAttributeValues={':val1': 7})

				# Similarly for other types of bad paths - using [0] on something which

				# isn't an array.

				@pytest.mark.xfail(reason="nested updates not yet implemented")

				def test_nested_attribute_update_bad_path_array(test_table_s):

				    p = random_string()

				    test_table_s.put_item(Item={'p': p, 'a': 'hello'})

				    with pytest.raises(ClientError, match='ValidationException.*path'):

				        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',

				            ExpressionAttributeValues={':val1': 7})

									
										141

alternator-test/util.py
									
										Normal file
									
												
				@@ -0,0 +1,141 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Various utility functions which are useful for multiple tests

				import string

				import random

				import collections

				import time

				def random_string(length=10, chars=string.ascii_uppercase + string.digits):

				    return ''.join(random.choice(chars) for x in range(length))

				def random_bytes(length=10):

				    return bytearray(random.getrandbits(8) for _ in range(length))

				# Utility functions for scan and query into an array of items:

				# TODO: make full_scan and full_query pass ConsistentRead=True by default, as

				# it's not useful for tests without it!

				def full_scan(table, **kwargs):

				    response = table.scan(**kwargs)

				    items = response['Items']

				    while 'LastEvaluatedKey' in response:

				        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)

				        items.extend(response['Items'])

				    return items
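# The pagination loop in full_scan() can be exercised without a server. The
# sketch below uses FakePagedTable, a hypothetical stand-in for a boto3 Table
# that is not part of this file. As a simplification, its LastEvaluatedKey is
# a list offset, whereas a real server returns a key dict. full_scan() is
# repeated so that the snippet runs on its own.

```python
class FakePagedTable:
    """Hypothetical stand-in for a boto3 Table; serves items in fixed-size pages."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=0, **kwargs):
        # The "key" is simply the offset of the next unread item.
        end = ExclusiveStartKey + self.page_size
        response = {'Items': self.items[ExclusiveStartKey:end]}
        if end < len(self.items):
            response['LastEvaluatedKey'] = end
        return response


def full_scan(table, **kwargs):
    response = table.scan(**kwargs)
    items = response['Items']
    # Keep asking for the next page until the server stops returning a key.
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items


table = FakePagedTable([{'p': i} for i in range(7)], page_size=3)
all_items = full_scan(table)
```

# Seven items served three per page take three scan() calls, and all_items
# holds them in the order the pages were returned.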

				# full_scan_and_count returns both items and count as returned by the server.

				# Note that count isn't simply len(items) - the server returns them

				# independently. e.g., with Select='COUNT' the items are not returned, but

				# count is.

				def full_scan_and_count(table, **kwargs):

				    response = table.scan(**kwargs)

				    items = []

				    count = 0

				    if 'Items' in response:

				        items.extend(response['Items'])

				    if 'Count' in response:

				        count = count + response['Count']

				    while 'LastEvaluatedKey' in response:

				        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)

				        if 'Items' in response:

				            items.extend(response['Items'])

				        if 'Count' in response:

				            count = count + response['Count']

				    return (count, items)

				# Utility function for fetching the entire results of a query into an array of items

				def full_query(table, **kwargs):

				    response = table.query(**kwargs)

				    items = response['Items']

				    while 'LastEvaluatedKey' in response:

				        response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)

				        items.extend(response['Items'])

				    return items

				# To compare two lists of items (each is a dict) without regard for order,

				# "==" is not good enough because it will fail if the order is different.

				# The following function, multiset(), converts the list into a multiset

				# (set with duplicates) where order doesn't matter, so the multisets can

				# be compared.

				def freeze(item):

				    if isinstance(item, dict):

				        return frozenset((key, freeze(value)) for key, value in item.items())

				    elif isinstance(item, list):

				        return tuple(freeze(value) for value in item)

				    return item

				def multiset(items):

				    return collections.Counter([freeze(item) for item in items])
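# A short usage sketch for multiset(). The sample items are made up for
# illustration; freeze() and multiset() are repeated so that the snippet runs
# on its own.

```python
import collections


def freeze(item):
    # Turn dicts and lists into hashable equivalents, recursively.
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item


def multiset(items):
    return collections.Counter([freeze(item) for item in items])


first = [{'p': 'x', 'a': [1, 2]}, {'p': 'y'}]
second = [{'p': 'y'}, {'p': 'x', 'a': [1, 2]}]

lists_equal = first == second                          # order matters for lists
multisets_equal = multiset(first) == multiset(second)  # order is ignored
duplicates_equal = multiset(first) == multiset(first + [{'p': 'y'}])
```

# Unlike a plain set, the Counter keeps track of duplicates, so a result
# that wrongly contains the same item twice does not compare equal to the
# expected list.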

				test_table_prefix = 'alternator_test_'

				def test_table_name():

				    current_ms = int(round(time.time() * 1000))

				    # In the off chance that test_table_name() is called twice in the same millisecond...

				    if test_table_name.last_ms >= current_ms:

				        current_ms = test_table_name.last_ms + 1

				    test_table_name.last_ms = current_ms

				    return test_table_prefix + str(current_ms)

				test_table_name.last_ms = 0
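# The function-attribute trick used by test_table_name() can be checked in
# isolation. The sketch below uses unique_ms_name, a hypothetical generalized
# copy that takes the prefix as a parameter, and calls it in a tight loop
# where many calls fall within the same millisecond.

```python
import time


def unique_ms_name(prefix):
    current_ms = int(round(time.time() * 1000))
    # If the clock has not advanced since the last call, bump the counter
    # by one so that the returned names stay unique and increasing.
    if unique_ms_name.last_ms >= current_ms:
        current_ms = unique_ms_name.last_ms + 1
    unique_ms_name.last_ms = current_ms
    return prefix + str(current_ms)


# State is kept on the function object itself, as in test_table_name().
unique_ms_name.last_ms = 0

prefix = 'alternator_test_'
names = [unique_ms_name(prefix) for _ in range(1000)]
suffixes = [int(name[len(prefix):]) for name in names]
```

# Every name in the list is distinct, and the numeric suffixes are strictly
# increasing even though the loop runs much faster than the clock ticks.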

				def create_test_table(dynamodb, **kwargs):

				    name = test_table_name()

				    print("fixture creating new table {}".format(name))

				    table = dynamodb.create_table(TableName=name,

				        BillingMode='PAY_PER_REQUEST', **kwargs)

				    waiter = table.meta.client.get_waiter('table_exists')

				    # recheck every second instead of the default, lower, frequency. This can

				    # save a few seconds on AWS with its very slow table creation, but can

				    # save more on tests on Scylla with its faster table creation turnaround.

				    waiter.config.delay = 1

				    waiter.config.max_attempts = 200

				    waiter.wait(TableName=name)

				    return table

				# DynamoDB's ListTables request returns up to a single page of table names

				# (e.g., up to 100) and it is up to the caller to call it again and again

				# to get the next page. This is a utility function which calls it repeatedly

				# as much as necessary to get the entire list.

				# We deliberately return a list and not a set, because we want the caller

				# to be able to recognize bugs in ListTables which cause the same table

				# to be returned twice.

				def list_tables(dynamodb, limit=100):

				    ret = []

				    pos = None

				    while True:

				        if pos:

				            page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos)

				        else:

				            page = dynamodb.meta.client.list_tables(Limit=limit)

				        results = page.get('TableNames', None)

				        assert(results)

				        ret = ret + results

				        newpos = page.get('LastEvaluatedTableName', None)

				        if not newpos:

				            break

				        # It doesn't make sense for Dynamo to tell us we need more pages, but

				        # not send anything in *this* page!

				        assert len(results) > 0

				        assert newpos != pos

				        # Note that we only checked that we got back tables, not that we got

				        # any new tables not already in ret. So a buggy implementation might

				        # still cause an endless loop getting the same tables again and again.

				        pos = newpos

				    return ret

									
										147

alternator/auth.cc
									
										Normal file
									
												
				@@ -0,0 +1,147 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "alternator/error.hh"

				#include "log.hh"

				#include <string>

				#include <string_view>

				#include <gnutls/crypto.h>

				#include <seastar/util/defer.hh>

				#include "hashers.hh"

				#include "bytes.hh"

				#include "alternator/auth.hh"

				#include <fmt/format.h>

				#include "auth/common.hh"

				#include "auth/password_authenticator.hh"

				#include "auth/roles-metadata.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				namespace alternator {

				static logging::logger alogger("alternator-auth");

				static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {

				    hmac_sha256_digest digest;

				    int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());

				    if (ret) {

				        throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));

				    }

				    return digest;

				}

				static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {

				    auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);

				    auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);

				    auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);

				    auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");

				    return signing;

				}

				static std::string apply_sha256(std::string_view msg) {

				    sha256_hasher hasher;

				    hasher.update(msg.data(), msg.size());

				    return to_hex(hasher.finalize());

				}

				static std::string format_time_point(db_clock::time_point tp) {

				    time_t time_point_repr = db_clock::to_time_t(tp);

				    std::string time_point_str;

				    time_point_str.resize(17);

				    ::tm time_buf;

				    // strftime prints the terminating null character as well

				    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));

				    time_point_str.resize(16);

				    return time_point_str;

				}

				void check_expiry(std::string_view signature_date) {

				    //FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it

				    std::string expiration_str = format_time_point(db_clock::now() - 15min);

				    std::string validity_str = format_time_point(db_clock::now() + 15min);

				    if (signature_date < expiration_str) {

				        throw api_error("InvalidSignatureException",

				                fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",

				                signature_date, expiration_str));

				    }

				    if (signature_date > validity_str) {

				        throw api_error("InvalidSignatureException",

				                fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",

				                signature_date, validity_str));

				    }

				}

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {

				    auto amz_date_it = signed_headers_map.find("x-amz-date");

				    if (amz_date_it == signed_headers_map.end()) {

				        throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");

				    }

				    std::string_view amz_date = amz_date_it->second;

				    check_expiry(amz_date);

				    std::string_view datestamp = amz_date.substr(0, 8);

				    if (datestamp != orig_datestamp) {

				        throw api_error("InvalidSignatureException",

				                format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",

				                        orig_datestamp, datestamp));

				    }

				    std::string_view canonical_uri = "/";

				    std::stringstream canonical_headers;

				    for (const auto& header : signed_headers_map) {

				        canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';

				    }

				    std::string payload_hash = apply_sha256(body_content);

				    std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);

				    std::string_view algorithm = "AWS4-HMAC-SHA256";

				    std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);

				    std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope,  apply_sha256(canonical_request));

				    hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);

				    hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);

				    return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));

				}

				future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {

				    static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",

				            auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				    auto timeout = auth::internal_distributed_timeout_config();

				    return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {

				        auto res = f.get0();

				        auto salted_hash = std::optional<sstring>();

				        if (res->empty()) {

				            throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));

				        }

				        salted_hash = res->one().get_opt<sstring>("salted_hash");

				        if (!salted_hash) {

				            throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));

				        }

				        return make_ready_future<std::string>(*salted_hash);

				    });

				}

				}
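
The string assembly that get_signature() performs before hashing can be sketched in isolation. This is a minimal illustration, not the code from this commit: the `sketch` namespace and the helper names below are hypothetical, and the SHA-256/HMAC steps are left out (the hashed canonical request is passed in as an already-computed hex string). It only shows how the canonical headers, the credential scope, and the string to sign are laid out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, for illustration only

// Emits "name:value\n" for every signed header. std::map iterates in
// sorted key order, so the headers come out sorted by name.
inline std::string canonical_headers(
        const std::map<std::string_view, std::string_view>& headers) {
    std::string out;
    for (const auto& [name, value] : headers) {
        out.append(name).append(":").append(value).append("\n");
    }
    return out;
}

// Builds "<datestamp>/<region>/<service>/aws4_request".
inline std::string credential_scope(std::string_view datestamp,
        std::string_view region, std::string_view service) {
    std::string out(datestamp);
    out.append("/").append(region).append("/").append(service).append("/aws4_request");
    return out;
}

// Joins algorithm, timestamp, scope and the hex-encoded hash of the
// canonical request with newlines. The hash is assumed to be computed
// elsewhere; no crypto is done here.
inline std::string string_to_sign(std::string_view amz_date,
        std::string_view scope, std::string_view hashed_request_hex) {
    std::string out("AWS4-HMAC-SHA256\n");
    out.append(amz_date).append("\n").append(scope).append("\n").append(hashed_request_hex);
    return out;
}

} // namespace sketch
```

Because the header map is ordered, callers can insert headers in any order and still get the same canonical form, which is what makes the signature reproducible on both sides.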

									
										46

alternator/auth.hh
									
										Normal file
									
												
				@@ -0,0 +1,46 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <string_view>

				#include <array>

				#include "gc_clock.hh"

				#include "utils/loading_cache.hh"

				namespace cql3 {

				class query_processor;

				}

				namespace alternator {

				using hmac_sha256_digest = std::array<char, 32>;

				using key_cache = utils::loading_cache<std::string, std::string>;

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);

				future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);

				}

									
										111

alternator/base64.cc
									
										Normal file
									
												
				@@ -0,0 +1,111 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				// The DynamoDB API dictates that "binary" (a.k.a. "bytes" or "blob") values

				// be encoded in the JSON API as base64-encoded strings. This is code to

				// convert byte arrays to base64-encoded strings, and back.

				#include "base64.hh"

				#include <ctype.h>

				// Arrays for quickly converting to and from an integer between 0 and 63,

				// and the character used in base64 encoding to represent it.

				static class base64_chars {

				public:

				    static constexpr const char* to =

				            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

				    int8_t from[256]; // one slot per possible unsigned char value

				    base64_chars() {

				        static_assert(strlen(to) == 64);

				        for (int i = 0; i < 256; i++) {

				            from[i] = 255; // signal invalid character

				        }

				        for (int i = 0; i < 64; i++) {

				            from[(unsigned) to[i]] = i;

				        }

				    }

				} base64_chars;

				std::string base64_encode(bytes_view in) {

				    std::string ret;

				    ret.reserve(((4 * in.size() / 3) + 3) & ~3);

				    int i = 0;

				    unsigned char chunk3[3]; // chunk of input

				    for (auto byte : in) {

				        chunk3[i++] = byte;

				        if (i == 3) {

				            ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];

				            ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];

				            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];

				            ret += base64_chars.to[ chunk3[2] & 0x3f ];

				            i = 0;

				        }

				    }

				    if (i) {

				        // i can be 1 or 2.

				        for(int j = i; j < 3; j++)

				            chunk3[j] = '\0';

				        ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];

				        ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];

				        if (i == 2) {

				            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];

				        } else {

				            ret += '=';

				        }

				        ret += '=';

				    }

				    return ret;

				}

				bytes base64_decode(std::string_view in) {

				    int i = 0;

				    int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;

				    std::string ret;

				    ret.reserve(in.size() * 3 / 4);

				    for (unsigned char c : in) {

				        uint8_t dc = base64_chars.from[c];

				        if (dc == 255) {

				            // Any unexpected character, including the "=" character usually

				            // used for padding, signals the end of the decode.

				            break;

				        }

				        chunk4[i++] = dc;

				        if (i == 4) {

				            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);

				            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);

				            ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];

				            i = 0;

				        }

				    }

				    if (i) {

				        // i can be 2 or 3, meaning 1 or 2 more output characters

				        if (i>=2)

				            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);

				        if (i==3)

				            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);

				    }

				    // FIXME: This copy is sad. The problem is we need back "bytes"

				    // but "bytes" doesn't have an efficient append, while std::string does.

				    // To fix this we need to use bytes' "uninitialized" feature.

				    return bytes(ret.begin(), ret.end());

				}
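
For comparison, the same encode/decode scheme can be sketched as a self-contained round trip over std::string. This is an illustrative variant, not the code from this commit: the `sketch` namespace and function names are hypothetical, it works on std::string rather than Scylla's bytes type, and it uses a 256-entry reverse table so that every possible unsigned char is a valid index. As in the decoder above, any character outside the alphabet (including "=") ends decoding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, for illustration only

constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Reverse lookup table: -1 marks characters outside the alphabet.
inline std::array<std::int8_t, 256> make_reverse_table() {
    std::array<std::int8_t, 256> from;
    from.fill(-1);
    for (int i = 0; i < 64; i++) {
        from[static_cast<unsigned char>(alphabet[i])] = static_cast<std::int8_t>(i);
    }
    return from;
}

inline std::string encode(std::string_view in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 3 <= in.size(); i += 3) {
        std::uint32_t n = (std::uint8_t(in[i]) << 16) | (std::uint8_t(in[i + 1]) << 8) | std::uint8_t(in[i + 2]);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    std::size_t rem = in.size() - i;
    if (rem) {
        std::uint32_t n = std::uint8_t(in[i]) << 16;
        if (rem == 2) {
            n |= std::uint8_t(in[i + 1]) << 8;
        }
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += rem == 2 ? alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

inline std::string decode(std::string_view in) {
    static const auto from = make_reverse_table();
    std::string out;
    std::uint32_t acc = 0;
    int bits = 0;
    for (unsigned char c : in) {
        std::int8_t v = from[c];
        if (v < 0) {
            break; // padding or any other unexpected character ends the decode
        }
        acc = (acc << 6) | std::uint32_t(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xff);
        }
    }
    return out;
}

} // namespace sketch
```

The bit-accumulator form of the decoder avoids a separate tail case: leftover bits that do not complete a byte are simply dropped.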

									
										34

alternator/base64.hh
									
										Normal file
									
												
				@@ -0,0 +1,34 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string_view>

				#include "bytes.hh"

				#include "rjson.hh"

				std::string base64_encode(bytes_view);

				bytes base64_decode(std::string_view);

				inline bytes base64_decode(const rjson::value& v) {

				  return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));

				}

									
										564

alternator/conditions.cc
									
										Normal file
									
												
				@@ -0,0 +1,564 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include <list>

				#include <map>

				#include <string_view>

				#include "alternator/conditions.hh"

				#include "alternator/error.hh"

				#include "cql3/constants.hh"

				#include <unordered_map>

				#include "rjson.hh"

				#include "serialization.hh"

				#include "base64.hh"

				#include <stdexcept>

				namespace alternator {

				static logging::logger clogger("alternator-conditions");

				comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {

				    static std::unordered_map<std::string, comparison_operator_type> ops = {

				            {"EQ", comparison_operator_type::EQ},

				            {"NE", comparison_operator_type::NE},

				            {"LE", comparison_operator_type::LE},

				            {"LT", comparison_operator_type::LT},

				            {"GE", comparison_operator_type::GE},

				            {"GT", comparison_operator_type::GT},

				            {"IN", comparison_operator_type::IN},

				            {"NULL", comparison_operator_type::IS_NULL},

				            {"NOT_NULL", comparison_operator_type::NOT_NULL},

				            {"BETWEEN", comparison_operator_type::BETWEEN},

				            {"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},

				            {"CONTAINS", comparison_operator_type::CONTAINS},

				            {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},

				    };

				    if (!comparison_operator.IsString()) {

				        throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				    }

				    std::string op = comparison_operator.GetString();

				    auto it = ops.find(op);

				    if (it == ops.end()) {

				        throw api_error("ValidationException", format("Unsupported comparison operator {}", op));

				    }

				    return it->second;

				}

				static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {

				    bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));

				    auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));

				    bytes raw_value = serialize_item(value);

				    auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));

				}

				static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {

				    bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));

				    auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));

				}

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {

				    clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));

				    auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);

				    for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {

				        std::string_view column_name(it->name.GetString(), it->name.GetStringLength());

				        const rjson::value& condition = it->value;

				        const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");

				        const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");

				        comparison_operator_type op = get_comparison_operator(comp_definition);

				        if (op != comparison_operator_type::EQ) {

				            throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");

				        }

				        if (attr_list.Size() != 1) {

				            throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));

				        }

				        if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {

				            // Primary key restriction

				            filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);

				        } else {

				            // Regular column restriction

				            filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);

				        }

				    }

				    return filtering_restrictions;

				}

				namespace {

				struct size_check {

				    // True iff size passes this check.

				    virtual bool operator()(rapidjson::SizeType size) const = 0;

				    // Check description, such that format("expected array {}", check.what()) is human-readable.

				    virtual sstring what() const = 0;

				};

				class exact_size : public size_check {

				    rapidjson::SizeType _expected;

				  public:

				    explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}

				    bool operator()(rapidjson::SizeType size) const override { return size == _expected; }

				    sstring what() const override { return format("of size {}", _expected); }

				};

				struct empty : public size_check {

				    bool operator()(rapidjson::SizeType size) const override { return size < 1; }

				    sstring what() const override { return "to be empty"; }

				};

				struct nonempty : public size_check {

				    bool operator()(rapidjson::SizeType size) const override { return size > 0; }

				    sstring what() const override { return "to be non-empty"; }

				};

				} // anonymous namespace

				// Check that array has the expected number of elements

				static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {

				    if (!array || !array->IsArray()) {

				        throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");

				    }

				    if (!expected(array->Size())) {

				        throw api_error("ValidationException",

				                        format("{} operator requires AttributeValueList {}, instead found list size {}",

				                               op, expected.what(), array->Size()));

				    }

				}

				struct rjson_engaged_ptr_comp {

				    bool operator()(const rjson::value* p1, const rjson::value* p2) const {

				        return rjson::single_value_comp()(*p1, *p2);

				    }

				};

				// It's not enough to compare underlying JSON objects when comparing sets,

				// as internally they're stored in an array, and the order of elements is

				// not important in set equality. See issue #5021

				static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {

				    if (set1.Size() != set2.Size()) {

				        return false;

				    }

				    std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;

				    for (auto it = set1.Begin(); it != set1.End(); ++it) {

				        set1_raw.insert(&*it);

				    }

				    for (const auto& a : set2.GetArray()) {

				        if (set1_raw.count(&a) == 0) {

				            return false;

				        }

				    }

				    return true;

				}

				// Check if two JSON-encoded values match with the EQ relation

				static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {

				        auto it1 = v1->MemberBegin();

				        auto it2 = v2.MemberBegin();

				        if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {

				            return check_EQ_for_sets(it1->value, it2->value);

				        }

				    }

				    return *v1 == v2;

				}

				// Check if two JSON-encoded values match with the NE relation

				static bool check_NE(const rjson::value* v1, const rjson::value& v2) {

				    return !v1 || *v1 != v2; // null is unequal to anything.

				}

				// Check if two JSON-encoded values match with the BEGINS_WITH relation

				static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {

				    // BEGINS_WITH requires that its single operand (v2) be a string or

				    // binary - otherwise it's a validation error. However, problems with

				    // the stored attribute (v1) will just return false (no match).

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));

				    }

				    auto it2 = v2.MemberBegin();

				    if (it2->name != "S" && it2->name != "B") {

				        throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));

				    }

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				        return false;

				    }

				    auto it1 = v1->MemberBegin();

				    if (it1->name != it2->name) {

				        return false;

				    }

				    if (it2->name == "S") {

				        std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());

				        std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());

				        return val1.substr(0, val2.size()) == val2;

				    } else /* it2->name == "B" */ {

				        // TODO (optimization): Check the begins_with condition directly on

				        // the base64-encoded string, without making a decoded copy.

				        bytes val1 = base64_decode(it1->value);

				        bytes val2 = base64_decode(it2->value);

				        return val1.substr(0, val2.size()) == val2;

				    }

				}

				static std::string_view to_string_view(const rjson::value& v) {

				    return std::string_view(v.GetString(), v.GetStringLength());

				}

				static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {

				    return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");

				}

				// Check if two JSON-encoded values match with the CONTAINS relation

				static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv2.name != "S" && kv2.name != "N" &&  kv2.name != "B") {

				        throw api_error("ValidationException",

				                        format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                               "got {} instead", kv2.name));

				    }

				    if (kv1.name == "S" && kv2.name == "S") {

				        return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;

				    } else if (kv1.name == "B" && kv2.name == "B") {

				        return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;

				    } else if (is_set_of(kv1.name, kv2.name)) {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (*i == kv2.value) {

				                return true;

				            }

				        }

				    } else if (kv1.name == "L") {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (!i->IsObject() || i->MemberCount() != 1) {

				                clogger.error("check_CONTAINS received a list whose element is malformed");

				                return false;

				            }

				            const auto& el = *i->MemberBegin();

				            if (el.name == kv2.name && el.value == kv2.value) {

				                return true;

				            }

				        }

				    }

				    return false;

				}

				// Check if two JSON-encoded values match with the NOT_CONTAINS relation

				static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    return !check_CONTAINS(v1, v2);

				}

				// Check if a JSON-encoded value equals any element of an array, which must have at least one element.

				static bool check_IN(const rjson::value* val, const rjson::value& array) {

				    if (!array[0].IsObject() || array[0].MemberCount() != 1) {

				        throw api_error("ValidationException",

				                        format("IN operator encountered malformed AttributeValue: {}", array[0]));

				    }

				    const auto& type = array[0].MemberBegin()->name;

				    if (type != "S" && type != "N" && type != "B") {

				        throw api_error("ValidationException",

				                        "IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");

				    }

				    if (!val) {

				        return false;

				    }

				    bool have_match = false;

				    for (const auto& elem : array.GetArray()) {

				        if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {

				            throw api_error("ValidationException",

				                            "IN operator requires all AttributeValueList elements to have the same type ");

				        }

				        if (!have_match && *val == elem) {

				            // Can't return yet, must check types of all array elements. <sigh>

				            have_match = true;

				        }

				    }

				    return have_match;

				}

				static bool check_NULL(const rjson::value* val) {

				    return val == nullptr;

				}

				static bool check_NOT_NULL(const rjson::value* val) {

				    return val != nullptr;

				}

				// Check if two JSON-encoded values match with cmp.

				template <typename Comparator>

				bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic));

				    }

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic));

				    }

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    if (kv1.name != kv2.name) {

				        return false;

				    }

				    if (kv1.name == "N") {

				        return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));

				    }

				    if (kv1.name == "S") {

				        return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),

				                   std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));

				    }

				    if (kv1.name == "B") {

				        return cmp(base64_decode(kv1.value), base64_decode(kv2.value));

				    }

				    clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");

				    return false;

				}

				struct cmp_lt {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }

				    static constexpr const char* diagnostic = "LT operator";

				};

				struct cmp_le {

				    // bytes only has <, so we cannot use <=.

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }

				    static constexpr const char* diagnostic = "LE operator";

				};

				struct cmp_ge {

				    // bytes only has <, so we cannot use >=.

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }

				    static constexpr const char* diagnostic = "GE operator";

				};

				struct cmp_gt {

				    // bytes only has <, so we cannot use >.

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }

				    static constexpr const char* diagnostic = "GT operator";

				};

				// True if v is between lb and ub, inclusive.  Throws if lb > ub.

				template <typename T>

				bool check_BETWEEN(const T& v, const T& lb, const T& ub) {

				    if (ub < lb) {

				        throw api_error("ValidationException",

				                        format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));

				    }

				    return cmp_ge()(v, lb) && cmp_le()(v, ub);

				}

				static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {

				    if (!v) {

				        return false;

				    }

				    if (!v->IsObject() || v->MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));

				    }

				    if (!lb.IsObject() || lb.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));

				    }

				    if (!ub.IsObject() || ub.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));

				    }

				    const auto& kv_v = *v->MemberBegin();

				    const auto& kv_lb = *lb.MemberBegin();

				    const auto& kv_ub = *ub.MemberBegin();

				    if (kv_lb.name != kv_ub.name) {

				        throw api_error(

				                "ValidationException",

				                format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",

				                       kv_lb.name, kv_ub.name));

				    }

				    if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.

				        return false;

				    }

				    if (kv_v.name == "N") {

				        const char* diag = "BETWEEN operator";

				        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));

				    }

				    if (kv_v.name == "S") {

				        return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),

				                             std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),

				                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));

				    }

				    if (kv_v.name == "B") {

				        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));

				    }

				    throw api_error("ValidationException",

				        format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",

				               kv_lb.name));

				}

				// Verify one Expect condition on one attribute (whose content is "got")

				// for the verify_expected() below.

				// This function returns true or false depending on whether the condition

				// succeeded - it does not throw ConditionalCheckFailedException.

				// However, it may throw ValidationException on input validation errors.

				static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {

				    const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");

				    const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");

				    const rjson::value* value = rjson::find(condition, "Value");

				    const rjson::value* exists = rjson::find(condition, "Exists");

				    // There are three types of conditions that Expected supports:

				    // A value, not-exists, and a comparison of some kind. Each allows

				    // and requires a different combination of parameters in the request.

				    if (value) {

				        if (exists && (!exists->IsBool() || exists->GetBool() != true)) {

				            throw api_error("ValidationException", "Cannot combine Value with Exists!=true");

				        }

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");

				        }

				        return check_EQ(got, *value);

				    } else if (exists) {

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");

				        }

				        if (!exists->IsBool() || exists->GetBool() != false) {

				            throw api_error("ValidationException", "Exists!=false requires Value");

				        }

				        // Remember Exists=false, so we're checking that the attribute does *not* exist:

				        return !got;

				    } else {

				        if (!comparison_operator) {

				            throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");

				        }

				        comparison_operator_type op = get_comparison_operator(*comparison_operator);

				        switch (op) {

				        case comparison_operator_type::EQ:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_EQ(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::NE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_NE(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::LT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_lt{});

				        case comparison_operator_type::LE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_le{});

				        case comparison_operator_type::GT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_gt{});

				        case comparison_operator_type::GE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_ge{});

				        case comparison_operator_type::BEGINS_WITH:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_BEGINS_WITH(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::IN:

				            verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);

				            return check_IN(got, *attribute_value_list);

				        case comparison_operator_type::IS_NULL:

				            verify_operand_count(attribute_value_list, empty(), *comparison_operator);

				            return check_NULL(got);

				        case comparison_operator_type::NOT_NULL:

				            verify_operand_count(attribute_value_list, empty(), *comparison_operator);

				            return check_NOT_NULL(got);

				        case comparison_operator_type::BETWEEN:

				            verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);

				            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);

				        case comparison_operator_type::CONTAINS:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_CONTAINS(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::NOT_CONTAINS:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);

				        }

				        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));

				    }

				}

				// Verify that the existing values of the item (previous_item) match the

				// conditions given by the Expected and ConditionalOperator parameters

				// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).

				// This function will throw a ConditionalCheckFailedException API error

				// if the values do not match the condition, or ValidationException if there

				// are errors in the format of the condition itself.

				void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {

				    const rjson::value* expected = rjson::find(req, "Expected");

				    if (!expected) {

				        return;

				    }

				    if (!expected->IsObject()) {

				        throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");

				    }

				    // ConditionalOperator can be "AND" for requiring all conditions, or

				    // "OR" for requiring one condition, and defaults to "AND" if missing.

				    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");

				    bool require_all = true;

				    if (conditional_operator) {

				        if (!conditional_operator->IsString()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");

				        }

				        std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());

				        if (s == "AND") {

				            // require_all is already true

				        } else if (s == "OR") {

				            require_all = false;

				        } else {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");

				        }

				        if (expected->GetObject().ObjectEmpty()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");

				        }

				    }

				    for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {

				        const rjson::value* got = nullptr;

				        if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {

				            got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));

				        }

				        bool success = verify_expected_one(it->value, got);

				        if (success && !require_all) {

				            // When !require_all, one success is enough!

				            return;

				        } else if (!success && require_all) {

				            // When require_all, one failure is enough!

				            throw api_error("ConditionalCheckFailedException", "Failed condition.");

				        }

				    }

				    // If we got here and require_all, none of the checks failed, so succeed.

				    // If we got here and !require_all, all of the checks failed, so fail.

				    if (!require_all) {

				        throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");

				    }

				}

				}
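The AND/OR handling in verify_expected() can be sketched in isolation. This is a simplified illustration, not the code from this patch: `condition_failed`, `combine_expected` and `passes` are hypothetical names, and the per-attribute results are passed in as a precomputed list, whereas the real function calls verify_expected_one() lazily inside the loop.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for api_error("ConditionalCheckFailedException", ...).
struct condition_failed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// results[i] stands for the outcome of checking the i-th entry of "Expected".
// require_all is true for ConditionalOperator=AND (the default), false for OR.
inline void combine_expected(const std::vector<bool>& results, bool require_all) {
    for (bool success : results) {
        if (success && !require_all) {
            return; // OR: one success is enough
        }
        if (!success && require_all) {
            throw condition_failed("Failed condition."); // AND: one failure is enough
        }
    }
    // AND: nothing failed, so succeed. OR: nothing succeeded, so fail.
    if (!require_all) {
        throw condition_failed("None of ORed Expect conditions were successful.");
    }
}

// Convenience wrapper for experiments: true if the combined check passes.
inline bool passes(const std::vector<bool>& results, bool require_all) {
    try {
        combine_expected(results, require_all);
        return true;
    } catch (const condition_failed&) {
        return false;
    }
}
```

With this sketch, `passes({true, false}, true)` is false and `passes({false, true}, false)` is true. An empty list passes under AND. The real function never reaches the empty OR case, because an explicit ConditionalOperator with an empty Expected object is rejected earlier with a ValidationException.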

									
										49

alternator/conditions.hh
									
										Normal file
									
												
				@@ -0,0 +1,49 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				/*

				 * This file contains definitions and functions related to placing conditions

				 * on Alternator queries (equivalent of CQL's restrictions).

				 *

				 * With conditions, it's possible to add criteria to selection requests (Scan, Query)

				 * and use them for narrowing down the result set, by means of filtering or indexing.

				 *

				 * Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html

				 */

				#pragma once

				#include "cql3/restrictions/statement_restrictions.hh"

				#include "serialization.hh"

				namespace alternator {

				enum class comparison_operator_type {

				    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH

				};

				comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);

				void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);

				}
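The body of get_comparison_operator() is not part of this header. A minimal sketch of such a lookup, assuming a plain std::string input instead of rjson::value and a hypothetical function name `parse_comparison_operator`, might look as follows. The operator names are those of the DynamoDB ComparisonOperator parameter, where "NULL" corresponds to the IS_NULL enumerator. Whether the real implementation is case-sensitive is not shown in this diff, so the exact-match behavior here is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

enum class comparison_operator_type {
    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};

// Hypothetical helper: maps an operator name to the enum, or throws if the
// name is unknown. The real code would throw an api_error instead.
inline comparison_operator_type parse_comparison_operator(const std::string& name) {
    static const std::unordered_map<std::string, comparison_operator_type> ops = {
        {"EQ", comparison_operator_type::EQ},
        {"NE", comparison_operator_type::NE},
        {"LE", comparison_operator_type::LE},
        {"LT", comparison_operator_type::LT},
        {"GE", comparison_operator_type::GE},
        {"GT", comparison_operator_type::GT},
        {"IN", comparison_operator_type::IN},
        {"BETWEEN", comparison_operator_type::BETWEEN},
        {"CONTAINS", comparison_operator_type::CONTAINS},
        {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
        {"NULL", comparison_operator_type::IS_NULL},
        {"NOT_NULL", comparison_operator_type::NOT_NULL},
        {"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
    };
    auto it = ops.find(name);
    if (it == ops.end()) {
        throw std::invalid_argument("Unsupported comparison operator: " + name);
    }
    return it->second;
}
```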

									
										50

alternator/error.hh
									
										Normal file
									
												
				@@ -0,0 +1,50 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				namespace alternator {

				// DynamoDB's error messages are described in detail in

				// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

				// An error message has a "type", e.g., "ResourceNotFoundException", a coarser

				// HTTP code (almost always, 400), and a human readable message. Eventually these

				// will be wrapped into a JSON object returned to the client.

				class api_error : public std::exception {

				public:

				    using status_type = httpd::reply::status_type;

				    status_type _http_code;

				    std::string _type;

				    std::string _msg;

				    api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)

				        : _http_code(std::move(http_code))

				        , _type(std::move(type))

				        , _msg(std::move(msg))

				    { }

				    api_error() = default;

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				};

				}
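The comment above says the error will eventually be wrapped into a JSON object. A rough sketch of that wrapping is shown here under stated assumptions: `simple_api_error`, `json_escape` and `to_json_body` are hypothetical names, the HTTP code is a plain int rather than httpd::reply::status_type, and the `__type` and `message` field names are assumed from the general shape of DynamoDB error bodies. The code that actually builds the reply is not in this part of the diff and may differ.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Simplified stand-in for alternator::api_error (no Seastar dependency).
struct simple_api_error : std::exception {
    int http_code;
    std::string type;
    std::string msg;
    simple_api_error(std::string t, std::string m, int code = 400)
        : http_code(code), type(std::move(t)), msg(std::move(m)) {}
    const char* what() const noexcept override { return msg.c_str(); }
};

// Minimal escaping: handles only quotes, backslashes and newlines.
// A real JSON writer should be used for anything beyond a sketch.
inline std::string json_escape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        default:   out += c;
        }
    }
    return out;
}

// Assumed body layout: {"__type":"<type>","message":"<msg>"}
inline std::string to_json_body(const simple_api_error& e) {
    return "{\"__type\":\"" + json_escape(e.type) +
           "\",\"message\":\"" + json_escape(e.msg) + "\"}";
}
```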

2275

alternator/executor.cc Normal file


File diff suppressed because it is too large

									
										71

alternator/executor.hh
									
										Normal file
									
												
				@@ -0,0 +1,71 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				#include <seastar/json/json_elements.hh>

				#include "service/storage_proxy.hh"

				#include "service/migration_manager.hh"

				#include "service/client_state.hh"

				#include "stats.hh"

				namespace alternator {

				class executor {

				    service::storage_proxy& _proxy;

				    service::migration_manager& _mm;

				public:

				    using client_state = service::client_state;

				    stats _stats;

				    static constexpr auto ATTRS_COLUMN_NAME = ":attrs";

				    static constexpr auto KEYSPACE_NAME = "alternator";

				    executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}

				    future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> list_tables(client_state& client_state, std::string content);

				    future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);

				    future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);

				    future<> start();

				    future<> stop() { return make_ready_future<>(); }

				    future<> maybe_create_keyspace();

				    static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);

				};

				}

									
										98

alternator/expressions.cc
									
										Normal file
									
												
				@@ -0,0 +1,98 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "expressions.hh"

				#include "alternator/expressionsLexer.hpp"

				#include "alternator/expressionsParser.hpp"

				#include <seastarx.hh>

				#include <seastar/core/print.hh>

				#include <seastar/util/log.hh>

				#include <functional>

				namespace alternator {

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				Result do_with_parser(std::string input, Func&& f) {

				    expressionsLexer::InputStreamType input_stream{

				        reinterpret_cast<const ANTLR_UINT8*>(input.data()),

				        ANTLR_ENC_UTF8,

				        static_cast<ANTLR_UINT32>(input.size()),

				        nullptr };

				    expressionsLexer lexer(&input_stream);

				    expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());

				    expressionsParser parser(&tstream);

				    auto result = f(parser);

				    return result;

				}

				parsed::update_expression

				parse_update_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::update_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));

				    }

				}

				std::vector<parsed::path>

				parse_projection_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::projection_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));

				    }

				}

				template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };

				template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

				namespace parsed {

				void update_expression::add(update_expression::action a) {

				    std::visit(overloaded {

				        [&] (action::set&)    { seen_set = true; },

				        [&] (action::remove&) { seen_remove = true; },

				        [&] (action::add&)    { seen_add = true; },

				        [&] (action::del&)    { seen_del = true; }

				    }, a._action);

				    _actions.push_back(std::move(a));

				}

				void update_expression::append(update_expression other) {

				    if ((seen_set && other.seen_set) ||

				        (seen_remove && other.seen_remove) ||

				        (seen_add && other.seen_add) ||

				        (seen_del && other.seen_del)) {

				        throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");

				    }

				    std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));

				    seen_set |= other.seen_set;

				    seen_remove |= other.seen_remove;

				    seen_add |= other.seen_add;

				    seen_del |= other.seen_del;

				}

				} // namespace parsed

				} // namespace alternator
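The `overloaded` helper and the "each clause only once" rule enforced by add() and append() can be tried outside of Scylla. The sketch below is a reduced illustration with only two action kinds. `set_action`, `remove_action` and `clause_list` are hypothetical names, and std::invalid_argument stands in for expressions_syntax_error. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Same idiom as in the file above: build one visitor out of several lambdas.
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

struct set_action {};
struct remove_action {};
using action = std::variant<set_action, remove_action>;

class clause_list {
    std::vector<action> _actions;
    bool _seen_set = false;
    bool _seen_remove = false;
public:
    // Record which kind of action was added, then store it.
    void add(action a) {
        std::visit(overloaded {
            [&] (set_action&)    { _seen_set = true; },
            [&] (remove_action&) { _seen_remove = true; }
        }, a);
        _actions.push_back(std::move(a));
    }
    // Merge another clause, rejecting a repeated clause kind.
    void append(clause_list other) {
        if ((_seen_set && other._seen_set) || (_seen_remove && other._seen_remove)) {
            throw std::invalid_argument("Each of SET, REMOVE may only appear once");
        }
        for (auto& a : other._actions) {
            _actions.push_back(std::move(a));
        }
        _seen_set |= other._seen_set;
        _seen_remove |= other._seen_remove;
    }
    std::size_t size() const { return _actions.size(); }
};
```

Appending a REMOVE clause to a list that already holds a SET clause succeeds, while appending a second SET clause throws.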

214

alternator/expressions.g Normal file


@@ -0,0 +1,214 @@
 /*
  * Copyright 2019 ScyllaDB
  *
  * This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
  * top-level directory for licensing information.
  */
 /*
  * This file is part of Scylla.
  *
  * Scylla is free software: you can redistribute it and/or modify
  * it under the terms of the GNU Affero General Public License as published by
  * the Free Software Foundation, either version 3 of the License, or
  * (at your option) any later version.
  *
  * Scylla is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU Affero General Public License
  * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
  */
 /*
  * The DynamoDB protocol is based on JSON, and most DynamoDB requests
  * describe the operation and its parameters via JSON objects such as maps
  * and lists. Nevertheless, in some types of requests an "expression" is
  * passed as a single string, and we need to parse this string. These
  * cases include:
  *  1. Attribute paths, such as "a[3].b.c", are used in projection
  *     expressions as well as inside other expressions described below.
  *  2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
  *     used in conditional updates, filters, and other places.
  *  3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
  *
  * All these expression syntaxes are very simple: Most of them could be
  * parsed as regular expressions, and the parenthesized condition expression
  * could be done with a simple hand-written lexical analyzer and recursive-
  * descent parser. Nevertheless, we decided to specify these parsers in the
  * ANTLR3 language already used in the Scylla project, hopefully making these
  * parsers easier to reason about, and easier to change if needed - and
  * reducing the amount of boiler-plate code.
  */
 grammar expressions;
 options {
     language = Cpp;
 }
 @parser::namespace{alternator}
 @lexer::namespace{alternator}
 /* TODO: explain what these traits things are. I haven't seen them explained
  * in any document... Compilation fails without these because a definition
  * of "expressionsLexerTraits" and "expressionsParserTraits" is needed.
  */
 @lexer::traits {
     class expressionsLexer;
     class expressionsParser;
     typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
 }
 @parser::traits {
     typedef expressionsLexerTraits expressionsParserTraits;
 }
 @lexer::header {
 	#include "alternator/expressions.hh"
 	// ANTLR generates a bunch of unused variables and functions. Yuck...
     #pragma GCC diagnostic ignored "-Wunused-variable"
     #pragma GCC diagnostic ignored "-Wunused-function"
 }
 @parser::header {
 	#include "expressionsLexer.hpp"
 }
 /* By default, ANTLR3 composes elaborate syntax-error messages, saying which
  * token was unexpected, where, and so on, but then dutifully writes these
  * error messages to the standard error, and returns from the parser as if
  * everything was fine, with a half-constructed output object! If we define
  * the "displayRecognitionError" method, it will be called upon to build this
  * error message, and we can instead throw an exception to stop the parsing
  * immediately. This is good enough for now, for our simple needs, but if
  * we ever want to show more information about the syntax error, Cql3.g
  * contains an elaborate implementation (it would be nice if we could reuse
  * it, not duplicate it).
  * Unfortunately, we have to repeat the same definition twice - once for the
  * parser, and once for the lexer.
  */
 @parser::context {
     void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
         throw expressions_syntax_error("syntax error");
     }
 }
 @lexer::context {
     void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
         throw expressions_syntax_error("syntax error");
     }
 }
 /*
  * Lexical analysis phase, i.e., splitting the input into tokens.
  * Lexical analyzer rules have names starting in capital letters.
  * "fragment" rules do not generate tokens, and are just aliases used to
  * make other rules more readable.
  * Characters *not* listed here, e.g., '=', '(', etc., will be handled
  * as individual tokens in their own right.
  * Whitespace spans are skipped, so do not generate tokens.
  */
 WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
 /* shortcuts for case-insensitive keywords */
 fragment A:('a'|'A');
 fragment B:('b'|'B');
 fragment C:('c'|'C');
 fragment D:('d'|'D');
 fragment E:('e'|'E');
 fragment F:('f'|'F');
 fragment G:('g'|'G');
 fragment H:('h'|'H');
 fragment I:('i'|'I');
 fragment J:('j'|'J');
 fragment K:('k'|'K');
 fragment L:('l'|'L');
 fragment M:('m'|'M');
 fragment N:('n'|'N');
 fragment O:('o'|'O');
 fragment P:('p'|'P');
 fragment Q:('q'|'Q');
 fragment R:('r'|'R');
 fragment S:('s'|'S');
 fragment T:('t'|'T');
 fragment U:('u'|'U');
 fragment V:('v'|'V');
 fragment W:('w'|'W');
 fragment X:('x'|'X');
 fragment Y:('y'|'Y');
 fragment Z:('z'|'Z');
 /* These keywords must appear before the generic NAME token below,
  * because NAME matches too, and the first to match wins.
  */
 SET: S E T;
 REMOVE: R E M O V E;
 ADD: A D D;
 DELETE: D E L E T E;
 fragment ALPHA: 'A'..'Z' | 'a'..'z';
 fragment DIGIT: '0'..'9';
 fragment ALNUM: ALPHA | DIGIT | '_';
 INTEGER: DIGIT+;
 NAME: ALPHA ALNUM*;
 NAMEREF: '#' ALNUM+;
 VALREF: ':' ALNUM+;
 /*
  * Parsing phase - parsing the string of tokens generated by the lexical
  * analyzer defined above.
  */
 path_component: NAME | NAMEREF;
 path returns [parsed::path p]:
     root=path_component           { $p.set_root($root.text); }
     (   '.' name=path_component   { $p.add_dot($name.text); }
       | '[' INTEGER ']'           { $p.add_index(std::stoi($INTEGER.text)); }
     )*;
 update_expression_set_value returns [parsed::value v]:
       VALREF                             { $v.set_valref($VALREF.text); }
     | path                               { $v.set_path($path.p); }
     | NAME                               { $v.set_func_name($NAME.text); }
      '(' x=update_expression_set_value   { $v.add_func_parameter($x.v); }
      (',' x=update_expression_set_value  { $v.add_func_parameter($x.v); })*
      ')'
     ;
 update_expression_set_rhs returns [parsed::set_rhs rhs]:
     v=update_expression_set_value  { $rhs.set_value(std::move($v.v)); }
     (   '+' v=update_expression_set_value  { $rhs.set_plus(std::move($v.v)); }
       | '-' v=update_expression_set_value  { $rhs.set_minus(std::move($v.v)); }
     )?
     ;
 update_expression_set_action returns [parsed::update_expression::action a]:
     path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
 update_expression_remove_action returns [parsed::update_expression::action a]:
     path { $a.assign_remove($path.p); };
 update_expression_add_action returns [parsed::update_expression::action a]:
     path VALREF { $a.assign_add($path.p, $VALREF.text); };
 update_expression_delete_action returns [parsed::update_expression::action a]:
     path VALREF { $a.assign_del($path.p, $VALREF.text); };
 update_expression_clause returns [parsed::update_expression e]:
       SET s=update_expression_set_action { $e.add(s); }
       (',' s=update_expression_set_action { $e.add(s); })*
     | REMOVE r=update_expression_remove_action { $e.add(r); }
       (',' r=update_expression_remove_action { $e.add(r); })*
     | ADD a=update_expression_add_action { $e.add(a); }
       (',' a=update_expression_add_action { $e.add(a); })*
     | DELETE d=update_expression_delete_action { $e.add(d); }
       (',' d=update_expression_delete_action { $e.add(d); })*
     ;
 // Note the "EOF" token at the end of the update expression. We want the
 // parser to match the entire string given to it - not just its beginning!
 update_expression returns [parsed::update_expression e]:
     (update_expression_clause { e.append($update_expression_clause.e); })* EOF;
 projection_expression returns [std::vector<parsed::path> v]:
     p=path      { $v.push_back(std::move($p.p)); }
     (',' p=path { $v.push_back(std::move($p.p)); } )* EOF;
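The header comment of this grammar notes that these syntaxes could also be handled by a hand-written parser. As an illustration of what the `path` rule accepts, here is a minimal hand-written sketch, not part of the patch. `simple_path` and `parse_simple_path` are hypothetical names. Unlike the ANTLR lexer, the sketch does not skip whitespace and does not treat SET, REMOVE, ADD and DELETE as keywords.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Reduced stand-in for parsed::path: a root plus dot/index operators.
struct simple_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> operators;
};

namespace detail {
inline bool is_alpha(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }
inline bool is_alnum(char c) { return is_alpha(c) || is_digit(c) || c == '_'; }

// path_component: NAME | NAMEREF, i.e. ALPHA ALNUM* or '#' ALNUM+
inline std::string component(const std::string& s, std::size_t& pos) {
    std::size_t start = pos;
    if (pos < s.size() && s[pos] == '#') {
        ++pos;
        if (pos >= s.size() || !is_alnum(s[pos])) {
            throw std::invalid_argument("syntax error");
        }
    } else if (!(pos < s.size() && is_alpha(s[pos]))) {
        throw std::invalid_argument("syntax error");
    }
    while (pos < s.size() && is_alnum(s[pos])) {
        ++pos;
    }
    return s.substr(start, pos - start);
}
} // namespace detail

// path: path_component ( '.' path_component | '[' INTEGER ']' )*
inline simple_path parse_simple_path(const std::string& s) {
    std::size_t pos = 0;
    simple_path p;
    p.root = detail::component(s, pos);
    while (pos < s.size()) {
        if (s[pos] == '.') {
            ++pos;
            p.operators.emplace_back(detail::component(s, pos));
        } else if (s[pos] == '[') {
            ++pos;
            std::size_t start = pos;
            while (pos < s.size() && detail::is_digit(s[pos])) {
                ++pos;
            }
            if (pos == start || pos >= s.size() || s[pos] != ']') {
                throw std::invalid_argument("syntax error");
            }
            p.operators.emplace_back(static_cast<unsigned>(std::stoul(s.substr(start, pos - start))));
            ++pos; // consume ']'
        } else {
            throw std::invalid_argument("syntax error");
        }
    }
    return p;
}
```

For the input "a[3].b.c" the sketch yields root "a" followed by the operators 3, "b" and "c", mirroring the set_root(), add_index() and add_dot() calls made by the grammar actions.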

									
										41

alternator/expressions.hh
									
										Normal file
									
												
				@@ -0,0 +1,41 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <stdexcept>

				#include <vector>

				#include "expressions_types.hh"

				namespace alternator {

				class expressions_syntax_error : public std::runtime_error {

				public:

				    using runtime_error::runtime_error;

				};

				parsed::update_expression parse_update_expression(std::string query);

				std::vector<parsed::path> parse_projection_expression(std::string query);

				} /* namespace alternator */

									
										166

alternator/expressions_types.hh
									
										Normal file
									
												
				@@ -0,0 +1,166 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <vector>

				#include <string>

				#include <variant>

				/*

				 * Parsed representation of expressions and their components.

				 *

				 * Types in the alternator::parsed namespace are used for holding the parse

				 * tree - objects generated by the Antlr rules after parsing an expression.

				 * Because of the way Antlr works, all these objects are default-constructed

				 * first, and then assigned when the rule is completed, so all these types

				 * have only default constructors - but setter functions to set them later.

				 */

				namespace alternator {

				namespace parsed {

				// "path" is an attribute's path in a document, e.g., a.b[3].c.

				class path {

				    // All paths have a "root", a top-level attribute, and any number of

				    // "dereference operators" - each either an index (e.g., "[2]") or a

				    // dot (e.g., ".xyz").

				    std::string _root;

				    std::vector<std::variant<std::string, unsigned>> _operators;

				public:

				    void set_root(std::string root) {

				        _root = std::move(root);

				    }

				    void add_index(unsigned i) {

				        _operators.emplace_back(i);

				    }

				    void add_dot(std::string name) {

				        _operators.emplace_back(std::move(name));

				    }

				    const std::string& root() const {

				        return _root;

				    }

				    bool has_operators() const {

				        return !_operators.empty();

				    }

				};

				// "value" is a value used in the right hand side of an assignment

				// expression, "SET a = ...". It can be a reference to a value included in

				// the request (":val"), a path to an attribute from the existing item

				// (e.g., "a.b[3].c"), or a function of other such values.

				// Note that the real right-hand-side of an assignment is actually a bit

				// more general - it allows either a value, or a value+value or value-value -

				// see class set_rhs below.

				struct value {

				    struct function_call {

				        std::string _function_name;

				        std::vector<value> _parameters;

				    };

				    std::variant<std::string, path, function_call> _value;

				    void set_valref(std::string s) {

				        _value = std::move(s);

				    }

				    void set_path(path p) {

				        _value = std::move(p);

				    }

				    void set_func_name(std::string s) {

				        _value = function_call {std::move(s), {}};

				    }

				    void add_func_parameter(value v) {

				        std::get<function_call>(_value)._parameters.emplace_back(std::move(v));

				    }

				};

				// The right-hand-side of a SET in an update expression can be either a

				// single value (see above), or value+value, or value-value.

				class set_rhs {

				public:

				    char _op;  // '+', '-', or 'v'

				    value _v1;

				    value _v2;

				    void set_value(value&& v1) {

				        _op = 'v';

				        _v1 = std::move(v1);

				    }

				    void set_plus(value&& v2) {

				        _op = '+';

				        _v2 = std::move(v2);

				    }

				    void set_minus(value&& v2) {

				        _op = '-';

				        _v2 = std::move(v2);

				    }

				};

				class update_expression {

				public:

				    struct action {

				        path _path;

				        struct set {

				            set_rhs _rhs;

				        };

				        struct remove {

				        };

				        struct add {

				            std::string _valref;

				        };

				        struct del {

				            std::string _valref;

				        };

				        std::variant<set, remove, add, del> _action;

				        void assign_set(path p, set_rhs rhs) {

				            _path = std::move(p);

				            _action = set { std::move(rhs) };

				        }

				        void assign_remove(path p) {

				            _path = std::move(p);

				            _action = remove { };

				        }

				        void assign_add(path p, std::string v) {

				            _path = std::move(p);

				            _action = add { std::move(v) };

				        }

				        void assign_del(path p, std::string v) {

				            _path = std::move(p);

				            _action = del { std::move(v) };

				        }

				    };

				private:

				    std::vector<action> _actions;

				    bool seen_set = false;

				    bool seen_remove = false;

				    bool seen_add = false;

				    bool seen_del = false;

				public:

				    void add(action a);

				    void append(update_expression other);

				    bool empty() const {

				        return _actions.empty();

				    }

				    const std::vector<action>& actions() const {

				        return _actions;

				    }

				};

				} // namespace parsed

				} // namespace alternator
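As a rough illustration of how the `value` variant above might be consumed, the following standalone sketch walks a value with `std::visit`, recursing into function-call parameters. The names `path_sketch`, `value_sketch` and `describe` are hypothetical stand-ins introduced only for this example; the real `path` type is defined outside this excerpt and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for the real `path` type, which is not shown here.
struct path_sketch {
    std::string root;
};

// Simplified mirror of parsed::value: a value reference, a path, or a function call.
struct value_sketch {
    struct function_call {
        std::string name;
        std::vector<value_sketch> parameters;
    };
    std::variant<std::string, path_sketch, function_call> v;

    void set_valref(std::string s) { v = std::move(s); }
    void set_func_name(std::string s) { v = function_call{std::move(s), {}}; }
    void add_func_parameter(value_sketch p) {
        // Throws std::bad_variant_access unless set_func_name() was called first.
        std::get<function_call>(v).parameters.emplace_back(std::move(p));
    }
};

// Produces a human-readable description of a value, recursing into parameters.
std::string describe(const value_sketch& val) {
    struct visitor {
        std::string operator()(const std::string& valref) const {
            return "valref " + valref;
        }
        std::string operator()(const path_sketch& p) const {
            return "path " + p.root;
        }
        std::string operator()(const value_sketch::function_call& f) const {
            std::string out = "call " + f.name + "(";
            for (std::size_t i = 0; i < f.parameters.size(); ++i) {
                if (i != 0) {
                    out += ", ";
                }
                out += describe(f.parameters[i]);
            }
            return out + ")";
        }
    };
    return std::visit(visitor{}, val.v);
}
```

Because the variant is closed, adding a fourth alternative would make the visitor fail to compile until it handles the new case, which is one reason to prefer `std::visit` over a chain of `std::holds_alternative` checks.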

									
										172

alternator/rjson.cc
									
												
				@@ -0,0 +1,172 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "rjson.hh"

				#include "error.hh"

				#include <seastar/core/print.hh>

				namespace rjson {

				static allocator the_allocator;

				std::string print(const rjson::value& value) {

				    string_buffer buffer;

				    writer writer(buffer);

				    value.Accept(writer);

				    return std::string(buffer.GetString());

				}

				rjson::value copy(const rjson::value& value) {

				    return rjson::value(value, the_allocator);

				}

				rjson::value parse(const std::string& str) {

				    return parse_raw(str.c_str(), str.size());

				}

				rjson::value parse_raw(const char* c_str, size_t size) {

				    rjson::document d;

				    d.Parse(c_str, size);

				    if (d.HasParseError()) {

				        throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));

				    }

				    rjson::value& v = d;

				    return std::move(v);

				}

				rjson::value& get(rjson::value& value, rjson::string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				rjson::value from_string(const std::string& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const sstring& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const char* str, size_t size) {

				    return rjson::value(str, size, the_allocator);

				}

				const rjson::value* find(const rjson::value& value, string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				rjson::value* find(rjson::value& value, string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {

				    base.AddMember(name, std::move(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {

				    base.AddMember(name, rjson::value(member), the_allocator);

				}

				void push_back(rjson::value& base_array, rjson::value&& item) {

				    base_array.PushBack(std::move(item), the_allocator);

				}

				bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {

				   auto r1_type = r1.GetType();

				   auto r2_type = r2.GetType();

				   // null is the smallest type and compares with every other type, nothing is lesser than null

				   if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {

				       return r1_type < r2_type;

				   }

				   // only null, true, and false are comparable with each other, other types are not compatible

				   if (r1_type != r2_type) {

				       if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {

				           throw rjson::error(format("Types are not comparable: {} {}", r1, r2));

				       }

				   }

				   switch (r1_type) {

				   case rjson::type::kNullType:

				       // fall-through

				   case rjson::type::kFalseType:

				       // fall-through

				   case rjson::type::kTrueType:

				       return r1_type < r2_type;

				   case rjson::type::kObjectType:

				       throw rjson::error("Object type comparison is not supported");

				   case rjson::type::kArrayType:

				       throw rjson::error("Array type comparison is not supported");

				   case rjson::type::kStringType: {

				       const size_t r1_len = r1.GetStringLength();

				       const size_t r2_len = r2.GetStringLength();

				       size_t len = std::min(r1_len, r2_len);

				       int result = std::strncmp(r1.GetString(), r2.GetString(), len);

				       return result < 0 || (result == 0 && r1_len < r2_len);

				   }

				   case rjson::type::kNumberType: {

				       if (r1.IsInt() && r2.IsInt()) {

				           return r1.GetInt() < r2.GetInt();

				       } else if (r1.IsUint() && r2.IsUint()) {

				           return r1.GetUint() < r2.GetUint();

				       } else if (r1.IsInt64() && r2.IsInt64()) {

				           return r1.GetInt64() < r2.GetInt64();

				       } else if (r1.IsUint64() && r2.IsUint64()) {

				           return r1.GetUint64() < r2.GetUint64();

				       } else {

				           // it's safe to call GetDouble() on any number type

				           return r1.GetDouble() < r2.GetDouble();

				       }

				   }

				   default:

				       return false;

				   }

				}

				} // end namespace rjson

				std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {

				    return os << rjson::print(v);

				}
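The string branch of `single_value_comp` orders two strings by comparing their common prefix and then falling back to length. A minimal standalone sketch of that rule, without rapidjson, could look like the following; `string_less` is a hypothetical name used only here. Note that `std::strncmp` stops at the first NUL character, so under this rule two strings that differ only after an embedded NUL are ordered by length alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for the kStringType case of single_value_comp:
// compare the common prefix, then treat the shorter string as smaller.
inline bool string_less(std::string_view a, std::string_view b) {
    const std::size_t len = std::min(a.size(), b.size());
    const int result = std::strncmp(a.data(), b.data(), len);
    return result < 0 || (result == 0 && a.size() < b.size());
}

// Small usage example: a proper prefix sorts before the longer string,
// and equal strings are not "less than" each other.
inline void string_less_usage() {
    assert(string_less("ab", "abc"));
    assert(!string_less("abc", "abc"));
}
```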

									
										163

alternator/rjson.hh
									
												
				@@ -0,0 +1,163 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				/*

				 * rjson is a wrapper over the rapidjson library, providing fast JSON parsing and generation.

				 *

				 * rapidjson has strict copy elision policies, which, among other things, involve

				 * using provided char arrays without copying them and allows copying objects only explicitly.

				 * As such, one should be careful when passing strings with limited liveness

				 * (e.g. data underneath local std::strings) to rjson functions, because created JSON objects

				 * may end up relying on dangling char pointers. All rjson functions that create JSON values from

				 * strings have APIs both for string_ref_type (more optimal, used when the string is known to live

				 * at least as long as the object, e.g. a static char array) and for std::strings. The more optimal

				 * variants should be used *only* if the liveness of the string is guaranteed, otherwise it will

				 * result in undefined behaviour.

				 * Also, bear in mind that methods exposed by rjson::value are generic, but some of them

				 * work fine only for specific types. In case the type does not match, an rjson::error will be thrown.

				 * Examples of such mismatched usage are calling MemberCount() on a JSON value that is not an object

				 * or calling Size() on a non-array value.

				 */

				#include <string>

				#include <stdexcept>

				namespace rjson {

				class error : public std::exception {

				    std::string _msg;

				public:

				    error() = default;

				    error(const std::string& msg) : _msg(msg) {}

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				};

				}

				// rapidjson configuration macros

				#define RAPIDJSON_HAS_STDSTRING 1

				// The default rapidjson policy is to use assert() - which is dangerous for two reasons:

				// 1. assert() can be turned off with -DNDEBUG

				// 2. assert() crashes a program

				// Fortunately, the default policy can be overridden, and so rapidjson errors will

				// throw an rjson::error exception instead.

				#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)

				#include <rapidjson/document.h>

				#include <rapidjson/writer.h>

				#include <rapidjson/stringbuffer.h>

				#include <rapidjson/error/en.h>

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				namespace rjson {

				using allocator = rapidjson::CrtAllocator;

				using encoding = rapidjson::UTF8<>;

				using document = rapidjson::GenericDocument<encoding, allocator>;

				using value = rapidjson::GenericValue<encoding, allocator>;

				using string_ref_type = value::StringRefType;

				using string_buffer = rapidjson::GenericStringBuffer<encoding>;

				using writer = rapidjson::Writer<string_buffer, encoding>;

				using type = rapidjson::Type;

				// Returns an object representing JSON's null

				inline rjson::value null_value() {

				    return rjson::value(rapidjson::kNullType);

				}

				// Returns an empty JSON object - {}

				inline rjson::value empty_object() {

				    return rjson::value(rapidjson::kObjectType);

				}

				// Returns an empty JSON array - []

				inline rjson::value empty_array() {

				    return rjson::value(rapidjson::kArrayType);

				}

				// Returns an empty JSON string - ""

				inline rjson::value empty_string() {

				    return rjson::value(rapidjson::kStringType);

				}

				// Convert the JSON value to a string with JSON syntax, the opposite of parse().

				// The representation is dense - without any redundant indentation.

				std::string print(const rjson::value& value);

				// Copies given JSON value - involves allocation

				rjson::value copy(const rjson::value& value);

				// Parses a JSON value from given string or raw character array.

				// The string/char array liveness does not need to be persisted,

				// as both parse() and parse_raw() will allocate member names and values.

				// Throws rjson::error if parsing failed.

				rjson::value parse(const std::string& str);

				rjson::value parse_raw(const char* c_str, size_t size);

				// Creates a JSON value (of JSON string type) out of internal string representations.

				// The string value is copied, so str's liveness does not need to be persisted.

				rjson::value from_string(const std::string& str);

				rjson::value from_string(const sstring& str);

				rjson::value from_string(const char* str, size_t size);

				// Returns a pointer to JSON member if it exists, nullptr otherwise

				rjson::value* find(rjson::value& value, rjson::string_ref_type name);

				const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);

				// Returns a reference to JSON member if it exists, throws otherwise

				rjson::value& get(rjson::value& value, rjson::string_ref_type name);

				const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);

				// Sets a member in given JSON object by moving the member - allocates the name.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);

				// Sets a string member in given JSON object by assigning its reference - allocates the name.

				// NOTICE: member string liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);

				// Sets a member in given JSON object by moving the member.

				// NOTICE: name liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);

				// Sets a string member in given JSON object by assigning its reference.

				// NOTICE: name liveness must be ensured to be at least as long as base's.

				// NOTICE: member liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);

				// Adds a value to a JSON list by moving the item to its end.

				// Throws if base_array is not a JSON array.

				void push_back(rjson::value& base_array, rjson::value&& item);

				struct single_value_comp {

				    bool operator()(const rjson::value& r1, const rjson::value& r2) const;

				};

				} // end namespace rjson

				namespace std {

				std::ostream& operator<<(std::ostream& os, const rjson::value& v);

				}
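The header above replaces rapidjson's default assertion with a macro that throws `rjson::error`. The same pattern can be sketched without rapidjson as follows. `sketch_error`, `SKETCH_ASSERT`, `element_at` and `try_element_at` are hypothetical names for this illustration and are not part of rjson.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exception type, shaped like rjson::error.
class sketch_error : public std::exception {
    std::string _msg;
public:
    explicit sketch_error(std::string msg) : _msg(std::move(msg)) {}
    const char* what() const noexcept override { return _msg.c_str(); }
};

// Throwing replacement for assert(): it stays active under -DNDEBUG and
// reports the failed condition instead of aborting the process.
#define SKETCH_ASSERT(x) \
    do { if (!(x)) throw sketch_error(std::string("JSON error: condition not met: ") + #x); } while (0)

// Hypothetical accessor guarded by the macro.
inline int element_at(const std::vector<int>& v, std::size_t i) {
    SKETCH_ASSERT(i < v.size());
    return v[i];
}

// Converts the outcome to text so a caller can see either the value or the error.
inline std::string try_element_at(const std::vector<int>& v, std::size_t i) {
    try {
        return std::to_string(element_at(v, i));
    } catch (const sketch_error& e) {
        return e.what();
    }
}
```

Throwing lets a request handler turn a malformed request into an error response rather than crashing the server, at the cost of requiring every caller to be exception-safe.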

									
										261

alternator/serialization.cc
									
												
				@@ -0,0 +1,261 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "base64.hh"

				#include "log.hh"

				#include "serialization.hh"

				#include "error.hh"

				#include "rapidjson/writer.h"

				#include "concrete_types.hh"

				#include "cql3/type_json.hh"

				static logging::logger slogger("alternator-serialization");

				namespace alternator {

				type_info type_info_from_string(std::string type) {

				    static thread_local const std::unordered_map<std::string, type_info> type_infos = {

				        {"S", {alternator_type::S, utf8_type}},

				        {"B", {alternator_type::B, bytes_type}},

				        {"BOOL", {alternator_type::BOOL, boolean_type}},

				        {"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented

				    };

				    auto it = type_infos.find(type);

				    if (it == type_infos.end()) {

				        return {alternator_type::NOT_SUPPORTED_YET, utf8_type};

				    }

				    return it->second;

				}

				type_representation represent_type(alternator_type atype) {

				    static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {

				        {alternator_type::S, {"S", utf8_type}},

				        {alternator_type::B, {"B", bytes_type}},

				        {alternator_type::BOOL, {"BOOL", boolean_type}},

				        {alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented

				    };

				    auto it = type_representations.find(atype);

				    if (it == type_representations.end()) {

				        throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));

				    }

				    return it->second;

				}

				struct from_json_visitor {

				    const rjson::value& v;

				    bytes_ostream& bo;

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };

				    void operator()(const string_type_impl& t) {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        bo.write(base64_decode(v));

				    }

				    void operator()(const boolean_type_impl& t) const {

				        bo.write(boolean_type->decompose(v.GetBool()));

				    }

				    void operator()(const decimal_type_impl& t) const {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));

				    }

				};

				bytes serialize_item(const rjson::value& item) {

				    if (item.IsNull() || item.MemberCount() != 1) {

				        throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));

				    }

				    auto it = item.MemberBegin();

				    type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings

				    if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal serialization of type {}", it->name.GetString());

				        return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));

				    }

				    bytes_ostream bo;

				    bo.write(bytes{int8_t(type_info.atype)});

				    visit(*type_info.dtype, from_json_visitor{it->value, bo});

				    return bytes(bo.linearize());

				}

				struct to_json_visitor {

				    rjson::value& deserialized;

				    const std::string& type_ident;

				    bytes_view bv;

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };

				    void operator()(const decimal_type_impl& t) const {

				        auto s = to_json_string(*decimal_type, bytes(bv));

				        //FIXME(sarna): unnecessary copy

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));

				    }

				    void operator()(const string_type_impl& t) {

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        std::string b64 = base64_encode(bv);

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));

				    }

				};

				rjson::value deserialize_item(bytes_view bv) {

				    rjson::value deserialized(rapidjson::kObjectType);

				    if (bv.empty()) {

				        throw api_error("ValidationException", "Serialized value empty");

				    }

				    alternator_type atype = alternator_type(bv[0]);

				    bv.remove_prefix(1);

				    if (atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));

				        return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());

				    }

				    type_representation type_representation = represent_type(atype);

				    visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});

				    return deserialized;

				}

				std::string type_to_string(data_type type) {

				    static thread_local std::unordered_map<data_type, std::string> types = {

				        {utf8_type, "S"},

				        {bytes_type, "B"},

				        {boolean_type, "BOOL"},

				        {decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type

				    };

				    auto it = types.find(type);

				    if (it == types.end()) {

				        throw std::runtime_error(format("Unknown type {}", type->name()));

				    }

				    return it->second;

				}

				bytes get_key_column_value(const rjson::value& item, const column_definition& column) {

				    std::string column_name = column.name_as_text();

				    std::string expected_type = type_to_string(column.type);

				    const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {

				        throw api_error("ValidationException",

				                format("Missing or invalid value object for key column {}: {}", column_name, item));

				    }

				    return get_key_from_typed_value(key_typed_value, column, expected_type);

				}

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {

				    auto it = key_typed_value.MemberBegin();

				    if (it->name.GetString() != expected_type) {

				        throw api_error("ValidationException",

				                format("Type mismatch: expected type {} for key column {}, got type {}",

				                        expected_type, column.name_as_text(), it->name.GetString()));

				    }

				    if (column.type == bytes_type) {

				        return base64_decode(it->value);

				    } else {

				        return column.type->from_string(it->value.GetString());

				    }

				}

				rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {

				    if (column.type == bytes_type) {

				        std::string b64 = base64_encode(cell);

				        return rjson::from_string(b64);

				    } else if (column.type == utf8_type) {

				        return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));

				    } else if (column.type == decimal_type) {

				        // FIXME: use specialized Alternator number type, not the more

				        // general "decimal_type". A dedicated type can be more efficient

				        // in storage space and in parsing speed.

				        auto s = to_json_string(*decimal_type, bytes(cell));

				        return rjson::from_string(s);

				    } else {

				        // We shouldn't get here, we shouldn't see such key columns.

				        throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));

				    }

				}

				partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {

				    std::vector<bytes> raw_pk;

				    // FIXME: this is a loop, but we really allow only one partition key column.

				    for (const column_definition& cdef : schema->partition_key_columns()) {

				        bytes raw_value = get_key_column_value(item, cdef);

				        raw_pk.push_back(std::move(raw_value));

				    }

				    return partition_key::from_exploded(raw_pk);

				}

				clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {

				    if (schema->clustering_key_size() == 0) {

				        return clustering_key::make_empty();

				    }

				    std::vector<bytes> raw_ck;

				    // FIXME: this is a loop, but we really allow only one clustering key column.

				    for (const column_definition& cdef : schema->clustering_key_columns()) {

				        bytes raw_value = get_key_column_value(item,  cdef);

				        raw_ck.push_back(std::move(raw_value));

				    }

				    return clustering_key::from_exploded(raw_ck);

				}

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        throw api_error("ValidationException", format("{}: invalid number object", diagnostic));

				    }

				    auto it = v.MemberBegin();

				    if (it->name != "N") {

				        throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));

				    }

				    if (it->value.IsNumber()) {

				        // FIXME(sarna): should use big_decimal constructor with numeric values directly:

				        return big_decimal(rjson::print(it->value));

				    }

				    if (!it->value.IsString()) {

				        throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));

				    }

				    return big_decimal(it->value.GetString());

				}

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return {"", nullptr};

				    }

				    auto it = v.MemberBegin();

				    const std::string it_key = it->name.GetString();

				    if (it_key != "SS" && it_key != "BS" && it_key != "NS") {

				        return {"", nullptr};

				    }

				    return std::make_pair(it_key, &(it->value));

				}

				}
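`serialize_item()` and `deserialize_item()` above share one layout: a single leading byte holding the `alternator_type`, followed by the type-specific payload. The sketch below shows only that framing, using `std::string` as a byte buffer. `sketch_type`, `tag_payload` and `untag_payload` are hypothetical names, and `std::invalid_argument` stands in for the `api_error` thrown by the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical mirror of alternator_type.
enum class sketch_type : std::int8_t { S, B, BOOL, N, NOT_SUPPORTED_YET };

// Prepends the one-byte type tag to the payload.
inline std::string tag_payload(sketch_type t, std::string_view payload) {
    std::string out;
    out.reserve(payload.size() + 1);
    out.push_back(static_cast<char>(t));
    out.append(payload);
    return out;
}

// Splits a tagged buffer back into its type and payload.
// The returned view points into `bytes`, so the buffer must outlive it.
inline std::pair<sketch_type, std::string_view> untag_payload(std::string_view bytes) {
    if (bytes.empty()) {
        throw std::invalid_argument("Serialized value empty");
    }
    const auto t = static_cast<sketch_type>(bytes[0]);
    bytes.remove_prefix(1);
    return {t, bytes};
}
```

Keeping the tag in the first byte means the reader can choose a decoder before touching the payload, which is how the real code falls back to plain JSON text for `NOT_SUPPORTED_YET`.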

									
										72

alternator/serialization.hh
									
												
				@@ -0,0 +1,72 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <string>

				#include <string_view>

				#include "types.hh"

				#include "schema.hh"

				#include "keys.hh"

				#include "rjson.hh"

				#include "utils/big_decimal.hh"

				namespace alternator {

				enum class alternator_type : int8_t {

				    S, B, BOOL, N, NOT_SUPPORTED_YET

				};

				struct type_info {

				    alternator_type atype;

				    data_type dtype;

				};

				struct type_representation {

				    std::string ident;

				    data_type dtype;

				};

				type_info type_info_from_string(std::string type);

				type_representation represent_type(alternator_type atype);

				bytes serialize_item(const rjson::value& item);

				rjson::value deserialize_item(bytes_view bv);

				std::string type_to_string(data_type type);

				bytes get_key_column_value(const rjson::value& item, const column_definition& column);

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);

				rjson::value json_key_column_value(bytes_view cell, const column_definition& column);

				partition_key pk_from_json(const rjson::value& item, schema_ptr schema);

				clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);

				// If v encodes a number (i.e., it is a {"N": ...} object), returns an object representing it.  Otherwise,

				// raises ValidationException with diagnostic.

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);

				// Checks if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or "NS", "BS")

				// and returns the set's type and a pointer to that set. If the object does not encode a set,

				// the returned value is {"", nullptr}.

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);

				}
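The `unwrap_set()` contract declared above (return the set's type and a pointer to it, or `{"", nullptr}`) can be sketched without rapidjson by modelling a JSON object as a map. `sketch_set`, `sketch_object` and `unwrap_set_sketch` are hypothetical names for this example. The real function operates on `rjson::value`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a set is a list of strings, an object maps names to sets.
using sketch_set = std::vector<std::string>;
using sketch_object = std::map<std::string, sketch_set>;

// Returns {type, pointer to elements} when the object has exactly one member
// named "SS", "BS" or "NS"; otherwise returns {"", nullptr}.
inline std::pair<std::string, const sketch_set*> unwrap_set_sketch(const sketch_object& v) {
    if (v.size() != 1) {
        return {"", nullptr};
    }
    const auto& [key, elements] = *v.begin();
    if (key != "SS" && key != "BS" && key != "NS") {
        return {"", nullptr};
    }
    return {key, &elements};
}
```

The returned pointer refers to storage owned by the argument, so callers should check it against `nullptr` and must not keep it after the object is destroyed.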

									
										314

alternator/server.cc
									
												
				@@ -0,0 +1,314 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "alternator/server.hh"

				#include "log.hh"

				#include <seastar/http/function_handlers.hh>

				#include <seastar/json/json_elements.hh>

				#include <seastarx.hh>

				#include "error.hh"

				#include "rjson.hh"

				#include "auth.hh"

				#include <cctype>

				#include "cql3/query_processor.hh"

				static logging::logger slogger("alternator-server");

				using namespace httpd;

				namespace alternator {

				static constexpr auto TARGET = "X-Amz-Target";

				inline std::vector<std::string_view> split(std::string_view text, char separator) {

				    std::vector<std::string_view> tokens;

				    if (text == "") {

				        return tokens;

				    }

				    while (true) {

				        auto pos = text.find_first_of(separator);

				        if (pos != std::string_view::npos) {

				            tokens.emplace_back(text.data(), pos);

				            text.remove_prefix(pos + 1);

				        } else {

				            tokens.emplace_back(text);

				            break;

				        }

				    }

				    return tokens;

				}
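The behaviour of `split()` on edge cases is easiest to see with a standalone copy. In the sketch below the function is renamed `split_sketch` so it does not clash with the original. The input `"DynamoDB_20120810.PutItem"` is only an assumed example of a dot-separated value, and this excerpt does not show where the server actually calls `split()`.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Standalone copy of split() with the same logic, renamed for this example.
inline std::vector<std::string_view> split_sketch(std::string_view text, char separator) {
    std::vector<std::string_view> tokens;
    if (text == "") {
        return tokens;
    }
    while (true) {
        auto pos = text.find_first_of(separator);
        if (pos != std::string_view::npos) {
            tokens.emplace_back(text.data(), pos);
            text.remove_prefix(pos + 1);
        } else {
            tokens.emplace_back(text);
            break;
        }
    }
    return tokens;
}

// Usage example: empty input yields no tokens, while a trailing separator
// yields a final empty token.
inline void split_usage() {
    assert(split_sketch("", '.').empty());
    assert(split_sketch("a.", '.').size() == 2);
}
```

The tokens are views into the caller's buffer rather than copies, so the original string must outlive the returned vector.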

				// DynamoDB HTTP error responses are structured as follows

				// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

				// Our handlers throw an exception to report an error. If the exception

				// is of type alternator::api_error, it is unwrapped and properly reported to

				// the user directly. Other exceptions are unexpected, and reported as

				// Internal Server Error.

				class api_handler : public handler_base {

				public:

				    api_handler(const future_json_function& _handle) : _f_handle(

				         [_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {

				         return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {

				             if (resf.failed()) {

				                 // Exceptions of type api_error are wrapped as JSON and

				                 // returned to the client as expected. Other types of

				                 // exceptions are unexpected, and returned to the user

				                 // as an internal server error:

				                 api_error ret;

				                 try {

				                     resf.get();

				                 } catch (api_error &ae) {

				                     ret = ae;

				                 } catch (rjson::error & re) {

				                     ret = api_error("ValidationException", re.what());

				                 } catch (...) {

				                     ret = api_error(

				                             "Internal Server Error",

				                             format("Internal server error: {}", std::current_exception()),

				                             reply::status_type::internal_server_error);

				                 }

				                 // FIXME: what is this version number?

				                 rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +

				                         "\"message\":\"" + ret._msg + "\"}";

				                 rep->_status = ret._http_code;

				                 slogger.trace("api_handler error case: {}", rep->_content);

				                 return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				             }

				             slogger.trace("api_handler success case");

				             auto res = resf.get0();

				             if (res._body_writer) {

				                 rep->write_body("json", std::move(res._body_writer));

				             } else {

				                 rep->_content += res._res;

				             }

				             return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				         });

				    }), _type("json") { }

				    api_handler(const api_handler&) = default;

				    future<std::unique_ptr<reply>> handle(const sstring& path,

				            std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        return _f_handle(std::move(req), std::move(rep)).then(

				                [this](std::unique_ptr<reply> rep) {

				                    rep->done(_type);

				                    return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				                });

				    }

				protected:

				    future_handler_function _f_handle;

				    sstring _type;

				};

				class health_handler : public handler_base {

				    virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        rep->set_status(reply::status_type::ok);

				        rep->write_body("txt", format("healthy: {}", req->get_header("Host")));

				        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				    }

				};

				future<> server::verify_signature(const request& req) {

				    if (!_enforce_authorization) {

				        slogger.debug("Skipping authorization");

				        return make_ready_future<>();

				    }

				    auto host_it = req._headers.find("Host");

				    if (host_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");

				    }

				    auto authorization_it = req._headers.find("Authorization");

				    if (authorization_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");

				    }

				    std::string host = host_it->second;

				    std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');

				    std::string credential;

				    std::string user_signature;

				    std::string signed_headers_str;

				    std::vector<std::string_view> signed_headers;

				    for (std::string_view entry : credentials_raw) {

				        std::vector<std::string_view> entry_split = split(entry, '=');

				        if (entry_split.size() != 2) {

				            if (entry != "AWS4-HMAC-SHA256") {

				                throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));

				            }

				            continue;

				        }

				        std::string_view auth_value = entry_split[1];

				        // Commas appear as an additional (quite redundant) delimiter

				        if (!auth_value.empty() && auth_value.back() == ',') {

				            auth_value.remove_suffix(1);

				        }

				        if (entry_split[0] == "Credential") {

				            credential = std::string(auth_value);

				        } else if (entry_split[0] == "Signature") {

				            user_signature = std::string(auth_value);

				        } else if (entry_split[0] == "SignedHeaders") {

				            signed_headers_str = std::string(auth_value);

				            signed_headers = split(auth_value, ';');

				            std::sort(signed_headers.begin(), signed_headers.end());

				        }

				    }

				    std::vector<std::string_view> credential_split = split(credential, '/');

				    if (credential_split.size() != 5) {

				        throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));

				    }

				    std::string user(credential_split[0]);

				    std::string datestamp(credential_split[1]);

				    std::string region(credential_split[2]);

				    std::string service(credential_split[3]);

				    std::map<std::string_view, std::string_view> signed_headers_map;

				    for (const auto& header : signed_headers) {

				        signed_headers_map.emplace(header, std::string_view());

				    }

				    for (auto& header : req._headers) {

				        std::string header_str;

				        header_str.resize(header.first.size());

				        std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);

				        auto it = signed_headers_map.find(header_str);

				        if (it != signed_headers_map.end()) {

				            it->second = std::string_view(header.second);

				        }

				    }

				    auto cache_getter = [] (std::string username) {

				        return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));

				    };

				    return _key_cache.get_ptr(user, cache_getter).then([this, &req,

				                                                    user = std::move(user),

				                                                    host = std::move(host),

				                                                    datestamp = std::move(datestamp),

				                                                    signed_headers_str = std::move(signed_headers_str),

				                                                    signed_headers_map = std::move(signed_headers_map),

				                                                    region = std::move(region),

				                                                    service = std::move(service),

				                                                    user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {

				        std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,

				                datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");

				        if (signature != std::string_view(user_signature)) {

				            _key_cache.remove(user);

				            throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");

				        }

				    });

				}

				future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {

				    _executor.local()._stats.total_operations++;

				    sstring target = req->get_header(TARGET);

				    std::vector<std::string_view> split_target = split(target, '.');

				    //NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)

				    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());

				    slogger.trace("Request: {} {}", op, req->content);

				    return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {

				        auto callback_it = _callbacks.find(op);

				        if (callback_it == _callbacks.end()) {

				            _executor.local()._stats.unsupported_operations++;

				            throw api_error("UnknownOperationException",

				                    format("Unsupported operation {}", op));

				        }

				        //FIXME: Client state can provide more context, e.g. client's endpoint address

				        // We use unique_ptr because client_state cannot be moved or copied

				        return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {

				            client_state->set_raw_keyspace(executor::KEYSPACE_NAME);

				            tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);

				            tracing::trace(trace_state, op);

				            return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});

				        });

				    });

				}

				void server::set_routes(routes& r) {

				    api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {

				        return handle_api_request(std::move(req));

				    });

				    r.add(operation_type::POST, url("/"), req_handler);

				    r.add(operation_type::GET, url("/"), new health_handler);

				}

				//FIXME: A way to immediately invalidate the cache should be considered,

				// e.g. when the system table which stores the keys is changed.

				// For now, this propagation may take up to 1 minute.

				server::server(seastar::sharded<executor>& e)

				        : _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)

				      , _callbacks{

				        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {

				            return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }

				        },

				        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},

				        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},

				        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},

				        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},

				        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},

				        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},

				        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},

				        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},

				        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},

				        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},

				        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},

				        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},

				    } {

				}

				future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {

				    _enforce_authorization = enforce_authorization;

				    if (!port && !https_port) {

				        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"

				                " must be specified in order to init an alternator HTTP server instance"));

				    }

				    return seastar::async([this, addr, port, https_port, creds] {

				        try {

				            _executor.invoke_on_all([] (executor& e) {

				                return e.start();

				            }).get();

				            if (port) {

				                _control.start().get();

				                _control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();

				                _control.listen(socket_address{addr, *port}).get();

				                slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);

				            }

				            if (https_port) {

				                _https_control.start().get();

				                _https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();

				                _https_control.server().invoke_on_all([creds] (http_server& serv) {

				                    return serv.set_tls_credentials(creds->build_server_credentials());

				                }).get();

				                _https_control.listen(socket_address{addr, *https_port}).get();

				                slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);

				            }

				        } catch (...) {

				            slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",

				                    addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());

				            std::throw_with_nested(std::runtime_error(

				                    format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",

				                            addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));

				        }

				    });

				}

				}
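
A note on the `Authorization` parsing in `verify_signature` above: the following is a minimal standalone sketch of the same splitting logic, using only the standard library. The `parsed_authorization` struct and the `parse_authorization` function are hypothetical names introduced for illustration and are not part of the patch; the sketch returns `false` where the patch throws `api_error`, and it does not compute or check any signature.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Split `text` on `separator`; an empty input yields no tokens.
std::vector<std::string_view> split(std::string_view text, char separator) {
    std::vector<std::string_view> tokens;
    if (text.empty()) {
        return tokens;
    }
    while (true) {
        auto pos = text.find(separator);
        if (pos == std::string_view::npos) {
            tokens.emplace_back(text);
            return tokens;
        }
        tokens.emplace_back(text.substr(0, pos));
        text.remove_prefix(pos + 1);
    }
}

// Hypothetical holder for the parsed fields (not a type from the patch).
struct parsed_authorization {
    std::string credential;
    std::string signed_headers;
    std::string signature;
};

// Parses "AWS4-HMAC-SHA256 Credential=..., SignedHeaders=..., Signature=...".
// Returns false when an entry without exactly one '=' is not the algorithm name.
bool parse_authorization(std::string_view header, parsed_authorization& out) {
    for (std::string_view entry : split(header, ' ')) {
        std::vector<std::string_view> kv = split(entry, '=');
        if (kv.size() != 2) {
            if (entry != "AWS4-HMAC-SHA256") {
                return false;
            }
            continue;
        }
        std::string_view value = kv[1];
        // Entries are separated by ", ", so strip the trailing comma if present.
        if (!value.empty() && value.back() == ',') {
            value.remove_suffix(1);
        }
        if (kv[0] == "Credential") {
            out.credential = std::string(value);
        } else if (kv[0] == "SignedHeaders") {
            out.signed_headers = std::string(value);
        } else if (kv[0] == "Signature") {
            out.signature = std::string(value);
        }
    }
    return true;
}
```

After parsing, splitting `credential` on `'/'` should give the five components that `verify_signature` expects (user, datestamp, region, service, and a terminator).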

									
alternator/server.hh (Normal file, 54 changed lines)

				@@ -0,0 +1,54 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "alternator/executor.hh"

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include <seastar/net/tls.hh>

				#include <optional>

				#include <alternator/auth.hh>

				namespace alternator {

				class server {

				    using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;

				    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;

				    seastar::httpd::http_server_control _control;

				    seastar::httpd::http_server_control _https_control;

				    seastar::sharded<executor>& _executor;

				    key_cache _key_cache;

				    bool _enforce_authorization;

				    alternator_callbacks_map _callbacks;

				public:

				    server(seastar::sharded<executor>& executor);

				    seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);

				private:

				    void set_routes(seastar::httpd::routes& r);

				    future<> verify_signature(const seastar::httpd::request& r);

				    future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);

				};

				}
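
The `alternator_callbacks_map` declared above drives dispatch in `handle_api_request`: the operation name is taken from the part of `X-Amz-Target` after the last dot and looked up in the map. A rough standalone sketch of that lookup, assuming plain strings in place of Seastar futures and requests (`operation_from_target`, `callbacks`, and `dispatch` are illustrative names, not functions from the patch):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// The operation name is whatever follows the last '.' in the target, e.g.
// "DynamoDB_20120810.CreateTable" -> "CreateTable". A target without a dot
// is returned unchanged, so an empty target gives an empty operation name.
std::string_view operation_from_target(std::string_view target) {
    auto pos = target.rfind('.');
    return pos == std::string_view::npos ? target : target.substr(pos + 1);
}

// Simplified callback type: takes the request body, returns a reply string.
using callback = std::function<std::string(std::string_view)>;

// Keys are string literals, so the string_view keys stay valid for the
// lifetime of the program.
const std::unordered_map<std::string_view, callback>& callbacks() {
    static const std::unordered_map<std::string_view, callback> map{
        {"CreateTable", [](std::string_view) { return std::string("created"); }},
        {"ListTables", [](std::string_view) { return std::string("listed"); }},
    };
    return map;
}

// Looks up the handler; unknown operations are reported by name instead of
// by exception to keep the sketch small.
std::string dispatch(std::string_view target, std::string_view body) {
    auto it = callbacks().find(operation_from_target(target));
    if (it == callbacks().end()) {
        return "UnknownOperationException";
    }
    return it->second(body);
}
```

Keying the map by `std::string_view` avoids allocating a `std::string` per lookup, at the cost of requiring that the key storage outlives the map.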

									
alternator/stats.cc (Normal file, 98 changed lines)

				@@ -0,0 +1,98 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "stats.hh"

				#include <seastar/core/metrics.hh>

				namespace alternator {

				const char* ALTERNATOR_METRICS = "alternator";

				stats::stats() : api_operations{} {

				    // Register the per-operation counters and latency histograms, labelled by "op".

				    seastar::metrics::label op("op");

				    _metrics.add_group("alternator", {

				#define OPERATION(name, CamelCaseName) \

				                seastar::metrics::make_total_operations("operation", api_operations.name, \

				                        seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),

				#define OPERATION_LATENCY(name, CamelCaseName) \

				                seastar::metrics::make_histogram("op_latency", \

				                        seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),

				            OPERATION(batch_write_item, "BatchWriteItem")

				            OPERATION(create_backup, "CreateBackup")

				            OPERATION(create_global_table, "CreateGlobalTable")

				            OPERATION(create_table, "CreateTable")

				            OPERATION(delete_backup, "DeleteBackup")

				            OPERATION(delete_item, "DeleteItem")

				            OPERATION(delete_table, "DeleteTable")

				            OPERATION(describe_backup, "DescribeBackup")

				            OPERATION(describe_continuous_backups, "DescribeContinuousBackups")

				            OPERATION(describe_endpoints, "DescribeEndpoints")

				            OPERATION(describe_global_table, "DescribeGlobalTable")

				            OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")

				            OPERATION(describe_limits, "DescribeLimits")

				            OPERATION(describe_table, "DescribeTable")

				            OPERATION(describe_time_to_live, "DescribeTimeToLive")

				            OPERATION(get_item, "GetItem")

				            OPERATION(list_backups, "ListBackups")

				            OPERATION(list_global_tables, "ListGlobalTables")

				            OPERATION(list_tables, "ListTables")

				            OPERATION(list_tags_of_resource, "ListTagsOfResource")

				            OPERATION(put_item, "PutItem")

				            OPERATION(query, "Query")

				            OPERATION(restore_table_from_backup, "RestoreTableFromBackup")

				            OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")

				            OPERATION(scan, "Scan")

				            OPERATION(tag_resource, "TagResource")

				            OPERATION(transact_get_items, "TransactGetItems")

				            OPERATION(transact_write_items, "TransactWriteItems")

				            OPERATION(untag_resource, "UntagResource")

				            OPERATION(update_continuous_backups, "UpdateContinuousBackups")

				            OPERATION(update_global_table, "UpdateGlobalTable")

				            OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")

				            OPERATION(update_item, "UpdateItem")

				            OPERATION(update_table, "UpdateTable")

				            OPERATION(update_time_to_live, "UpdateTimeToLive")

				            OPERATION_LATENCY(put_item_latency, "PutItem")

				            OPERATION_LATENCY(get_item_latency, "GetItem")

				            OPERATION_LATENCY(delete_item_latency, "DeleteItem")

				            OPERATION_LATENCY(update_item_latency, "UpdateItem")

				    });

				    _metrics.add_group("alternator", {

				            seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,

				                    seastar::metrics::description("number of unsupported operations via Alternator API")),

				            seastar::metrics::make_total_operations("total_operations", total_operations,

				                    seastar::metrics::description("number of total operations via Alternator API")),

				            seastar::metrics::make_total_operations("reads_before_write", reads_before_write,

				                    seastar::metrics::description("number of performed read-before-write operations")),

				            seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,

				                    seastar::metrics::description("number of rows read during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,

				                    seastar::metrics::description("number of rows read and matched during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },

				                    seastar::metrics::description("number of rows read and dropped during filtering operations")),

				    });

				}

				}
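
The `OPERATION` macro in `stats.cc` above expands one line per operation into a metric definition that carries an `op` label. The pattern can be sketched without Seastar as follows; `op_counters`, `labelled_counter`, and `register_counters` are hypothetical stand-ins for illustration, and the real code builds `seastar::metrics` objects instead of this plain struct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A small subset of the per-operation counters.
struct op_counters {
    uint64_t get_item = 0;
    uint64_t put_item = 0;
    uint64_t scan = 0;
};

// Hypothetical stand-in for a registered metric: a label plus a pointer to
// the live counter, so later increments are visible through the entry.
struct labelled_counter {
    std::string op_label;
    const uint64_t* value;
};

std::vector<labelled_counter> register_counters(const op_counters& c) {
    return {
// Each OPERATION line expands to one initializer-list element.
#define OPERATION(name, CamelCaseName) labelled_counter{CamelCaseName, &c.name},
        OPERATION(get_item, "GetItem")
        OPERATION(put_item, "PutItem")
        OPERATION(scan, "Scan")
#undef OPERATION
    };
}
```

Because each entry stores a pointer rather than a copy, the `op_counters` object must outlive the returned vector.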

									
alternator/stats.hh (Normal file, 95 changed lines)

				@@ -0,0 +1,95 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <cstdint>

				#include <seastar/core/metrics_registration.hh>

				#include "seastarx.hh"

				#include "utils/estimated_histogram.hh"

				#include "cql3/stats.hh"

				namespace alternator {

				// Object holding per-shard statistics related to Alternator.

				// While this object is alive, these metrics are also registered to be

				// visible by the metrics REST API, with the "alternator" prefix.

				class stats {

				public:

				    stats();

				    // Count of DynamoDB API operations by type

				    struct {

				        uint64_t batch_get_item = 0;

				        uint64_t batch_write_item = 0;

				        uint64_t create_backup = 0;

				        uint64_t create_global_table = 0;

				        uint64_t create_table = 0;

				        uint64_t delete_backup = 0;

				        uint64_t delete_item = 0;

				        uint64_t delete_table = 0;

				        uint64_t describe_backup = 0;

				        uint64_t describe_continuous_backups = 0;

				        uint64_t describe_endpoints = 0;

				        uint64_t describe_global_table = 0;

				        uint64_t describe_global_table_settings = 0;

				        uint64_t describe_limits = 0;

				        uint64_t describe_table = 0;

				        uint64_t describe_time_to_live = 0;

				        uint64_t get_item = 0;

				        uint64_t list_backups = 0;

				        uint64_t list_global_tables = 0;

				        uint64_t list_tables = 0;

				        uint64_t list_tags_of_resource = 0;

				        uint64_t put_item = 0;

				        uint64_t query = 0;

				        uint64_t restore_table_from_backup = 0;

				        uint64_t restore_table_to_point_in_time = 0;

				        uint64_t scan = 0;

				        uint64_t tag_resource = 0;

				        uint64_t transact_get_items = 0;

				        uint64_t transact_write_items = 0;

				        uint64_t untag_resource = 0;

				        uint64_t update_continuous_backups = 0;

				        uint64_t update_global_table = 0;

				        uint64_t update_global_table_settings = 0;

				        uint64_t update_item = 0;

				        uint64_t update_table = 0;

				        uint64_t update_time_to_live = 0;

				        utils::estimated_histogram put_item_latency;

				        utils::estimated_histogram get_item_latency;

				        utils::estimated_histogram delete_item_latency;

				        utils::estimated_histogram update_item_latency;

				    } api_operations;

				    // Miscellaneous event counters

				    uint64_t total_operations = 0;

				    uint64_t unsupported_operations = 0;

				    uint64_t reads_before_write = 0;

				    // CQL-derived stats

				    cql3::cql_stats cql_stats;

				private:

				    // The metric_groups object keeps this stats object's metrics registered

				    // for as long as the stats object is alive.

				    seastar::metrics::metric_groups _metrics;

				};

				}
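
The comments in `stats.hh` above say the metrics stay registered only while the `stats` object is alive. That is an RAII arrangement: a member object registers on construction and unregisters on destruction. A minimal sketch of the idea, assuming a hypothetical global `registry()` and `registration` guard in place of `seastar::metrics::metric_groups` (neither name comes from the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical global registry mapping metric names to live counters.
std::map<std::string, const uint64_t*>& registry() {
    static std::map<std::string, const uint64_t*> r;
    return r;
}

// Registers a counter on construction and removes it on destruction.
class registration {
    std::string _name;
public:
    registration(std::string name, const uint64_t& value) : _name(std::move(name)) {
        registry()[_name] = &value;
    }
    ~registration() {
        registry().erase(_name);
    }
    registration(const registration&) = delete;
    registration& operator=(const registration&) = delete;
};

struct demo_stats {
    uint64_t total_operations = 0;
    // Declared after the counter, so it is destroyed first and the registry
    // never holds a pointer to a counter that no longer exists.
    registration _reg{"alternator_total_operations", total_operations};
};
```

Declaring the registration member last mirrors the layout in `stats`, where `_metrics` follows the counters it exposes.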

									
api/api-doc/cache_service.json (30 changed lines)

				@@ -13,7 +13,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_row_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -35,7 +35,7 @@

				                     "description":"row cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_key_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -70,7 +70,7 @@

				                     "description":"key cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -83,7 +83,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache save period in seconds",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_counter_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -105,7 +105,7 @@

				                     "description":"counter cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -118,7 +118,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_row_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -140,7 +140,7 @@

				                     "description":"row cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -153,7 +153,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_key_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -175,7 +175,7 @@

				                     "description":"key cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				-                     "type":"int",

				+                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -188,7 +188,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache keys to save",

				-               "type":"int",

				+               "type": "long",

				               "nickname":"get_counter_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -210,7 +210,7 @@

				                     "description":"counter cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -448,7 +448,7 @@

				        {

				          "method": "GET",

				          "summary": "Get key entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_key_entries",

				          "produces": [

				            "application/json"

				@@ -568,7 +568,7 @@

				        {

				          "method": "GET",

				          "summary": "Get row entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_row_entries",

				          "produces": [

				            "application/json"

				@@ -688,7 +688,7 @@

				        {

				          "method": "GET",

				          "summary": "Get counter entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_counter_entries",

				          "produces": [

				            "application/json"

									
										154

api/api-doc/column_family.json
				@@ -121,7 +121,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -172,7 +172,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -223,7 +223,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -231,7 +231,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -544,7 +544,7 @@

				               "summary":"sstable count for each level. empty unless leveled compaction is used",

				               "type":"array",

				               "items":{

				                  "type":"int"

				                  "type": "long"

				               },

				               "nickname":"get_sstable_count_per_level",

				               "produces":[

				@@ -611,6 +611,54 @@

				            }

				         ]

				      },

				      {

				         "path":"/column_family/toppartitions/{name}",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Toppartitions query",

				               "type":"toppartitions_query_results",

				               "nickname":"toppartitions",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"name",

				                     "description":"The column family name in keyspace:name format",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  },

				                  {

				                     "name":"duration",

				                     "description":"Duration (in milliseconds) of monitoring operation",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				                    "name":"list_size",

				                    "description":"number of the top partitions to list",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type": "long",

				                    "paramType":"query"

				                 },

				                 {

				                    "name":"capacity",

				                    "description":"capacity of stream summary: determines amount of resources used in query processing",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type": "long",

				                    "paramType":"query"

				                 }

				              ]

				            }

				         ]

				      },

				      {

				         "path":"/column_family/metrics/memtable_columns_count/",

				         "operations":[

				@@ -873,7 +921,7 @@

				            {

				               "method":"GET",

				               "summary":"Get memtable switch count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -897,7 +945,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all memtable switch count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -1034,7 +1082,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1187,7 +1235,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1203,7 +1251,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1227,7 +1275,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1243,7 +1291,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1396,7 +1444,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1412,7 +1460,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1436,7 +1484,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1452,7 +1500,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1476,7 +1524,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1492,7 +1540,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1516,7 +1564,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1532,7 +1580,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1556,7 +1604,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1572,7 +1620,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1596,7 +1644,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -2052,7 +2100,7 @@

				            {

				               "method":"GET",

				               "summary":"Get speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2076,7 +2124,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2156,7 +2204,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2180,7 +2228,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2196,7 +2244,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2220,7 +2268,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2236,7 +2284,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2260,7 +2308,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2276,7 +2324,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas prepare",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_prepare",

				               "produces":[

				                  "application/json"

				@@ -2300,7 +2348,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas propose",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_propose",

				               "produces":[

				                  "application/json"

				@@ -2324,7 +2372,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas commit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_commit",

				               "produces":[

				                  "application/json"

				@@ -2816,6 +2864,44 @@

				               "description":"The column family type"

				            }

				         }

				      },

				      "toppartitions_record":{

				         "id":"toppartitions_record",

				         "description":"nodetool toppartitions query record",

				         "properties":{

				            "partition":{

				               "type":"string",

				               "description":"Partition key"

				            },

				            "count":{

				               "type":"long",

				               "description":"Number of read/write operations"

				            },

				            "error":{

				               "type":"long",

				               "description":"Indication of inaccuracy in counting PKs"

				            }

				         }

				      },

				      "toppartitions_query_results":{

				         "id":"toppartitions_query_results",

				         "description":"nodetool toppartitions query results",

				         "properties":{

				            "read":{

				               "type":"array",

				               "items":{

				                  "type":"toppartitions_record"

				               },

				               "description":"Read results"

				            },

				            "write":{

				               "type":"array",

				               "items":{

				                  "type":"toppartitions_record"

				               },

				               "description":"Write results"

				            }

				         }

				      }

				   }

				}

									
										41

api/api-doc/compaction_manager.json
				@@ -118,7 +118,7 @@

				        {

				          "method": "GET",

				          "summary": "Get pending tasks",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_pending_tasks",

				          "produces": [

				            "application/json"

				@@ -127,6 +127,24 @@

				        }

				      ]

				    },

				    {

				      "path": "/compaction_manager/metrics/pending_tasks_by_table",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get pending tasks by table name",

				          "type": "array",

				          "items": {

				              "type": "pending_compaction"

				           },

				          "nickname": "get_pending_tasks_by_table",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/compaction_manager/metrics/completed_tasks",

				      "operations": [

				@@ -163,7 +181,7 @@

				        {

				          "method": "GET",

				          "summary": "Get bytes compacted",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_bytes_compacted",

				          "produces": [

				            "application/json"

				@@ -179,7 +197,7 @@

				         "description":"A row merged information",

				         "properties":{

				            "key":{

				               "type":"int",

				               "type": "long",

				               "description":"The number of sstable"

				            },

				            "value":{

				@@ -244,6 +262,23 @@

				            }

				         }

				      },

				      "pending_compaction": {

				        "id": "pending_compaction",

				        "properties": {

				            "cf": {

				               "type": "string",

				               "description": "The column family name"

				            },

				            "ks": {

				               "type":"string",

				               "description": "The keyspace name"

				            },

				            "task": {

				               "type":"long",

				               "description": "The number of pending tasks"

				            }

				        }

				      },

				      "history": {

				      "id":"history",

				      "description":"Compaction history information",

									
										12

api/api-doc/failure_detector.json
				@@ -110,7 +110,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count down endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_down_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -126,7 +126,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count up endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_up_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -180,11 +180,11 @@

				                    "description": "The endpoint address"

				                },

				                "generation": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat generation"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat version"

				                },

				                "update_time": {

				@@ -209,7 +209,7 @@

				           "description": "Holds a version value for an application state",

				               "properties": {

				                "application_state": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state enum index"

				                },

				                "value": {

				@@ -217,7 +217,7 @@

				                    "description": "The version value"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state version"

				                }

				            }

									
										4

api/api-doc/gossiper.json
				@@ -75,7 +75,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -99,7 +99,7 @@

				            {

				               "method":"GET",

				               "summary":"Get heart beat version for a node",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_heart_beat_version",

				               "produces":[

				                  "application/json"

									
										4

api/api-doc/hinted_handoff.json
				@@ -99,7 +99,7 @@

				        {

				          "method": "GET",

				          "summary": "Get create hint count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_create_hint_count",

				          "produces": [

				            "application/json"

				@@ -123,7 +123,7 @@

				        {

				          "method": "GET",

				          "summary": "Get not stored hints count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_not_stored_hints_count",

				          "produces": [

				            "application/json"

									
										2

api/api-doc/messaging_service.json
				@@ -191,7 +191,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the version number",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_version",

				               "produces":[

				                  "application/json"

									
										94

api/api-doc/storage_proxy.json
				@@ -105,7 +105,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the max hint window",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hint_window",

				               "produces":[

				                  "application/json"

				@@ -128,7 +128,7 @@

				                     "description":"max hint window in ms",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -141,7 +141,7 @@

				            {

				               "method":"GET",

				               "summary":"Get max hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -164,7 +164,7 @@

				                     "description":"max hints in progress",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -177,7 +177,7 @@

				            {

				               "method":"GET",

				               "summary":"get hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -602,7 +602,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -632,7 +632,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_condition_not_met",

				          "produces": [

				            "application/json"

				@@ -647,7 +647,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_read_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -671,28 +671,13 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/condition_not_met",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas read metrics",

				          "type": "int",

				          "nickname": "get_cas_read_metrics_condition_not_met",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/read/timeouts",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -707,7 +692,7 @@

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -791,6 +776,36 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get CAS read rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_cas_read_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/view_write/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get view write rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_view_write_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/range/moving_average_histogram",

				      "operations": [

				@@ -812,7 +827,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -827,7 +842,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -872,7 +887,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -887,7 +902,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -956,6 +971,21 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_write/moving_average_histogram",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get CAS write rate and latency histogram",

				          "$ref": "#/utils/rate_moving_average_and_histogram",

				          "nickname": "get_cas_write_metrics_latency_histogram",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				         "path":"/storage_proxy/metrics/read/estimated_histogram/",

				         "operations":[

				@@ -978,7 +1008,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1010,7 +1040,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1042,7 +1072,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"

									
										179

api/api-doc/storage_service.json
				@@ -458,7 +458,7 @@

				            {

				               "method":"GET",

				               "summary":"Return the generation value for this node.",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -646,7 +646,7 @@

				            {

				               "method":"POST",

				               "summary":"Trigger a cleanup of keys on a single keyspace",

				               "type":"int",

				               "type": "long",

				               "nickname":"force_keyspace_cleanup",

				               "produces":[

				                  "application/json"

				@@ -678,7 +678,7 @@

				            {

				               "method":"GET",

				               "summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",

				               "type":"int",

				               "type": "long",

				               "nickname":"scrub",

				               "produces":[

				                  "application/json"

				@@ -726,7 +726,7 @@

				            {

				               "method":"GET",

				               "summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",

				               "type":"int",

				               "type": "long",

				               "nickname":"upgrade_sstables",

				               "produces":[

				                  "application/json"

				@@ -800,7 +800,7 @@

				               "summary":"Return an array with the ids of the currently active repairs",

				               "type":"array",

				               "items":{

				                  "type":"int"

				                  "type": "long"

				               },

				               "nickname":"get_active_repair_async",

				               "produces":[

				@@ -816,7 +816,7 @@

				            {

				               "method":"POST",

				               "summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",

				               "type":"int",

				               "type": "long",

				               "nickname":"repair_async",

				               "produces":[

				                  "application/json"

				@@ -947,7 +947,7 @@

				                     "description":"The repair ID to check for status",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1277,18 +1277,18 @@

				                  },

				                  {

				                     "name":"dynamic_update_interval",

				                     "description":"integer, in ms (default 100)",

				                     "description":"interval in ms (default 100)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"dynamic_reset_interval",

				                     "description":"integer, in ms (default 600,000)",

				                     "description":"interval in ms (default 600,000)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -1493,7 +1493,7 @@

				                     "description":"Stream throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1501,7 +1501,7 @@

				            {

				               "method":"GET",

				               "summary":"Get stream throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_stream_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1517,7 +1517,7 @@

				            {

				               "method":"GET",

				               "summary":"get compaction throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_compaction_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1539,7 +1539,7 @@

				                     "description":"compaction throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1943,7 +1943,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for warning of queries with many tombstones",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_warn_threshold",

				               "produces":[

				                  "application/json"

				@@ -1965,7 +1965,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1978,7 +1978,7 @@

				            {

				               "method":"GET",

				               "summary":"",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2000,7 +2000,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2013,7 +2013,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for rejecting queries due to a large batch size",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_batch_size_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2035,7 +2035,7 @@

				                     "description":"batch size debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2059,7 +2059,7 @@

				                     "description":"throttle in kb",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2072,7 +2072,7 @@

				            {

				               "method":"GET",

				               "summary":"Get load",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_metrics_load",

				               "produces":[

				                  "application/json"

				@@ -2088,7 +2088,7 @@

				            {

				               "method":"GET",

				               "summary":"Get exceptions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_exceptions",

				               "produces":[

				                  "application/json"

				@@ -2104,7 +2104,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -2120,7 +2120,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints",

				               "produces":[

				                  "application/json"

				@@ -2164,7 +2164,42 @@

				               ]

				            }

				         ]

				      }

				      },

				      {

				         "path":"/storage_service/sstable_info",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"SSTable information",

				               "type":"array",

				               "items":{

				                  "type":"table_sstables"

				               },

				               "nickname":"sstable_info",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"keyspace",

				                     "description":"The keyspace",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"cf",

				                     "description":"column family name",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      }      

				   ],

				   "models":{

				      "mapper":{

				@@ -2228,11 +2263,11 @@

				               "description":"The column family"

				            },

				            "total":{

				               "type":"int",

				               "type":"long",

				               "description":"The total snapshot size"

				            },

				            "live":{

				               "type":"int",

				               "type":"long",

				               "description":"The live snapshot size"

				            }

				         }

				@@ -2324,6 +2359,92 @@

				               "description":"The endpoint details"

				            }

				         }

				      },

				      "named_maps":{

				        "id":"named_maps",

				        "properties":{

				            "group":{

				                "type":"string"

				            },

				            "attributes":{

				                "type":"array",

				                "items":{

				                    "type":"mapper"

				                }

				            }

				        }

				      },

				      "sstable":{

				        "id":"sstable",

				        "properties":{

				            "size":{

				               "type":"long",

				               "description":"Total size in bytes of sstable"

				            },

				            "data_size":{

				                "type":"long",

				                "description":"The size in bytes on disk of data"

				            },

				            "index_size":{

				               "type":"long",

				               "description":"The size in bytes on disk of index"

				            },

				            "filter_size":{

				               "type":"long",

				               "description":"The size in bytes on disk of filter"

				            },

				            "timestamp":{

				                "type":"datetime",

				                "description":"File creation time"

				            },

				            "generation":{

				                "type":"long",

				                "description":"SSTable generation"

				            },

				            "level":{

				               "type":"long",

				               "description":"SSTable level"

				            },

				            "version":{

				               "type":"string",

				               "enum":[

				                  "ka", "la", "mc"

				               ],

				               "description":"SSTable version"

				            },

				            "properties":{

				                "type":"array",

				                "description":"SSTable attributes",

				                "items":{

				                    "type":"mapper"

				                }

				            },

				            "extended_properties":{

				                "type":"array",

				                "description":"SSTable extended attributes",

				                "items":{

				                    "type":"named_maps"

				                }

				            }

				        }

				      },

				      "table_sstables":{

				        "id":"table_sstables",

				        "description":"Per-table SSTable info and attributes",

				        "properties":{

				            "keyspace":{

				                "type":"string"

				            },

				            "table":{

				                "type":"string"

				            },

				            "sstables":{

				                "type":"array",

				                "items":{

				                    "$ref":"sstable"

				                }

				            }

				        }

				      }

				   }

				}

									
										16

api/api-doc/stream_manager.json
									
												
				@@ -32,7 +32,7 @@

				            {

				               "method":"GET",

				               "summary":"Get number of active outbound streams",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_active_streams_outbound",

				               "produces":[

				                  "application/json"

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -72,7 +72,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -88,7 +88,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -112,7 +112,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -154,7 +154,7 @@

				               "description":"The peer"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "connecting":{

				@@ -211,7 +211,7 @@

				               "description":"The ID"

				            },

				            "files":{

				               "type":"int",

				               "type": "long",

				               "description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."

				            },

				            "total_size":{

				@@ -242,7 +242,7 @@

				               "description":"The peer address"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "file_name":{

									
										15

api/api-doc/system.json
									
												
				@@ -52,6 +52,21 @@

				            }

				         ]

				      },

				      {

				         "path":"/system/uptime_ms",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get system uptime, in milliseconds",

				               "type":"long",

				               "nickname":"get_system_uptime",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/logger/{name}",

				         "operations":[

									
										10

api/api.cc
									
												
				@@ -20,9 +20,9 @@

				 */

				#include "api.hh"

				#include "http/file_handler.hh"

				#include "http/transformers.hh"

				#include "http/api_docs.hh"

				#include <seastar/http/file_handler.hh>

				#include <seastar/http/transformers.hh>

				#include <seastar/http/api_docs.hh>

				#include "storage_service.hh"

				#include "commitlog.hh"

				#include "gossiper.hh"

				@@ -36,11 +36,13 @@

				#include "endpoint_snitch.hh"

				#include "compaction_manager.hh"

				#include "hinted_handoff.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "stream_manager.hh"

				#include "system.hh"

				#include "api/config.hh"

				logging::logger apilog("api");

				namespace api {

				static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {

									
										44

api/api.hh
									
												
				@@ -21,13 +21,15 @@

				#pragma once

				#include "json/json_elements.hh"

				#include <seastar/json/json_elements.hh>

				#include <type_traits>

				#include <boost/lexical_cast.hpp>

				#include <boost/algorithm/string/split.hpp>

				#include <boost/algorithm/string/classification.hpp>

				#include <boost/units/detail/utility.hpp>

				#include "api/api-doc/utils.json.hh"

				#include "utils/histogram.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "api_init.hh"

				#include "seastarx.hh"

				@@ -216,4 +218,42 @@ std::vector<T> concat(std::vector<T> a, std::vector<T>&& b) {

				    return a;

				}

				template <class T, class Base = T>

				class req_param {

				public:

				    sstring name;

				    sstring param;

				    T value;

				    req_param(const request& req, sstring name, T default_val) : name(name) {

				        param = req.get_query_param(name);

				        if (param.empty()) {

				            value = default_val;

				            return;

				        }

				        try {

				            // boost::lexical_cast does not use boolalpha. Converting a

				            // true/false throws exceptions. We don't want that.

				            if constexpr (std::is_same_v<Base, bool>) {

				                // Cannot use boolalpha because we (probably) want to

				                // accept 1 and 0 as well as true and false. And True. And fAlse.

				                std::transform(param.begin(), param.end(), param.begin(), ::tolower);

				                if (param == "true" || param == "1") {

				                    value = T(true);

				                } else if (param == "false" || param == "0") {

				                    value = T(false);

				                } else {

				                    throw boost::bad_lexical_cast{};

				                }

				            } else {

				                value = T{boost::lexical_cast<Base>(param)};

				            }

				        } catch (boost::bad_lexical_cast&) {

				            throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				        }

				    }

				    operator T() const { return value; }

				};

				}
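
Note on the `req_param` bool branch above: it lowercases the raw query string and then accepts `true`/`1` and `false`/`0`, rejecting everything else. The following is a minimal standalone sketch of that rule only, assuming `std::string` in place of `sstring` and `std::optional` in place of the thrown `bad_param_exception`. The helper name `parse_bool_param` is hypothetical and does not appear in the patch.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not part of the patch): mirrors the bool branch of
// req_param. Returns std::nullopt where the patch would throw.
inline std::optional<bool> parse_bool_param(std::string param) {
    // Lowercase in place so "True" and "fAlse" are accepted as well.
    // The cast to unsigned char avoids undefined behaviour in std::tolower
    // for negative char values.
    std::transform(param.begin(), param.end(), param.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (param == "true" || param == "1") {
        return true;
    }
    if (param == "false" || param == "0") {
        return false;
    }
    return std::nullopt;  // anything else is a type error
}
```

Unlike the patch, the sketch lowercases through a lambda taking `unsigned char` rather than passing `::tolower` directly; the accepted inputs are otherwise the same.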

									
										12

api/api_init.hh
									
												
				@@ -19,9 +19,11 @@

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "database.hh"

				#include "database_fwd.hh"

				#include "service/storage_proxy.hh"

				#include "http/httpd.hh"

				#include <seastar/http/httpd.hh>

				namespace service { class load_meter; }

				namespace api {

				@@ -31,9 +33,11 @@ struct http_context {

				    httpd::http_server_control http_server;

				    distributed<database>& db;

				    distributed<service::storage_proxy>& sp;

				    service::load_meter& lmeter;

				    http_context(distributed<database>& _db,

				            distributed<service::storage_proxy>& _sp)

				            : db(_db), sp(_sp) {

				            distributed<service::storage_proxy>& _sp,

				            service::load_meter& _lm)

				            : db(_db), sp(_sp), lmeter(_lm) {

				    }

				};

									
										4

api/collectd.cc
									
												
				@@ -21,8 +21,8 @@

				#include "collectd.hh"

				#include "api/api-doc/collectd.json.hh"

				#include "core/scollectd.hh"

				#include "core/scollectd_api.hh"

				#include <seastar/core/scollectd.hh>

				#include <seastar/core/scollectd_api.hh>

				#include "endian.h"

				#include <boost/range/irange.hpp>

				#include <regex>

									
										220

api/column_family.cc
									
												
				@@ -22,10 +22,14 @@

				#include "column_family.hh"

				#include "api/api-doc/column_family.json.hh"

				#include <vector>

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "sstables/sstables.hh"

				#include "utils/estimated_histogram.hh"

				#include <algorithm>

				#include "db/system_keyspace_view_types.hh"

				#include "db/data_listeners.hh"

				extern logging::logger apilog;

				namespace api {

				using namespace httpd;

				@@ -34,7 +38,7 @@ using namespace std;

				using namespace json;

				namespace cf = httpd::column_family_json;

				const utils::UUID& get_uuid(const sstring& name, const database& db) {

				std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {

				    auto pos = name.find("%3A");

				    size_t end;

				    if (pos == sstring::npos) {

				@@ -46,14 +50,22 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {

				    } else {

				        end = pos + 3;

				    }

				    return std::make_tuple(name.substr(0, pos), name.substr(end));

				}

				const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {

				    try {

				        return db.find_uuid(name.substr(0, pos), name.substr(end));

				        return db.find_uuid(ks, cf);

				    } catch (std::out_of_range& e) {

				        throw bad_param_exception("Column family '" + name.substr(0, pos) + ":"

				                + name.substr(end) + "' not found");

				        throw bad_param_exception(format("Column family '{}:{}' not found", ks, cf));

				    }

				}

				const utils::UUID& get_uuid(const sstring& name, const database& db) {

				    auto [ks, cf] = parse_fully_qualified_cf_name(name);

				    return get_uuid(ks, cf, db);

				}

				future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {

				    auto uuid = get_uuid(name, ctx.db.local());
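
Note on `parse_fully_qualified_cf_name` in the hunks above: the name is split at the URL-encoded colon `%3A`, with `end = pos + 3` skipping the three characters of the separator. The sketch below illustrates that split in isolation, assuming `std::string` in place of `sstring`. The diff elides the branch taken when `%3A` is not found, so the fallback to a literal `:` and the `std::nullopt` result are assumptions made for illustration. The name `split_cf_name` is hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for parse_fully_qualified_cf_name (not the patch's code).
// Returns {keyspace, column family}, or std::nullopt if no separator is found.
inline std::optional<std::pair<std::string, std::string>>
split_cf_name(const std::string& name) {
    std::size_t pos = name.find("%3A");
    std::size_t end;
    if (pos != std::string::npos) {
        end = pos + 3;  // skip the three characters of "%3A"
    } else {
        // Assumption: this branch is elided in the diff; a literal ':' is tried here.
        pos = name.find(':');
        if (pos == std::string::npos) {
            return std::nullopt;
        }
        end = pos + 1;  // skip the single ':' character
    }
    return std::make_pair(name.substr(0, pos), name.substr(end));
}
```

For example, `split_cf_name("ks1%3Acf1")` yields the pair `ks1` and `cf1`, which is the shape the new three-argument `get_uuid(ks, cf, db)` overload consumes.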

				@@ -63,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,

				        int64_t column_family::stats::*f) {

				        int64_t column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {

				        return cf.get_stats().*f;

				    }, std::plus<int64_t>());

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx,

				        int64_t column_family::stats::*f) {

				        int64_t column_family_stats::*f) {

				    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {

				        return cf.get_stats().*f;

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type>  get_cf_stats_count(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {

				        return (cf.get_stats().*f).hist.count;

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([uuid, f](database& db) {

				        // Histograms information is sample of the actual load

				@@ -100,14 +112,14 @@ static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const

				static future<json::json_return_type>  get_cf_stats_count(http_context& ctx,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {

				        return (cf.get_stats().*f).hist.count;

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).hist;},

				@@ -118,7 +130,7 @@ static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const

				    });

				}

				static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    std::function<utils::ihistogram(const database&)> fun = [f] (const database& db)  {

				        utils::ihistogram res;

				        for (auto i : db.get_column_families()) {

				@@ -134,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:

				}

				static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).rate();},

				@@ -145,7 +157,7 @@ static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& c

				    });

				}

				static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {

				static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {

				    std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db)  {

				        utils::rate_moving_average_and_histogram res;

				        for (auto i : db.get_column_families()) {

				@@ -166,27 +178,27 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct

				    }, std::plus<int64_t>());

				}

				static int64_t min_row_size(column_family& cf) {

				static int64_t min_partition_size(column_family& cf) {

				    int64_t res = INT64_MAX;

				    for (auto i: *cf.get_sstables() ) {

				        res = std::min(res, i->get_stats_metadata().estimated_row_size.min());

				        res = std::min(res, i->get_stats_metadata().estimated_partition_size.min());

				    }

				    return (res == INT64_MAX) ? 0 : res;

				}

				static int64_t max_row_size(column_family& cf) {

				static int64_t max_partition_size(column_family& cf) {

				    int64_t res = 0;

				    for (auto i: *cf.get_sstables() ) {

				        res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);

				        res = std::max(i->get_stats_metadata().estimated_partition_size.max(), res);

				    }

				    return res;

				}

				static integral_ratio_holder mean_row_size(column_family& cf) {

				static integral_ratio_holder mean_partition_size(column_family& cf) {

				    integral_ratio_holder res;

				    for (auto i: *cf.get_sstables() ) {

				        auto c = i->get_stats_metadata().estimated_row_size.count();

				        res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;

				        auto c = i->get_stats_metadata().estimated_partition_size.count();

				        res.sub += i->get_stats_metadata().estimated_partition_size.mean() * c;

				        res.total += c;

				    }

				    return res;

				@@ -242,12 +254,11 @@ class sum_ratio {

				    uint64_t _n = 0;

				    T _total = 0;

				public:

				    future<> operator()(T value) {

				    void operator()(T value) {

				        if (value > 0) {

				            _total += value;

				            _n++;

				        }

				        return make_ready_future<>();

				    }

				    // Returns average value of all registered ratios.

				    T get() && {

				@@ -396,29 +407,31 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);

				        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);

				    });

				    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);

				        return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto i: *cf.get_sstables() ) {

				                res.merge(i->get_stats_metadata().estimated_row_size);

				                res.merge(i->get_stats_metadata().estimated_partition_size);

				            }

				            return res;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {

				            uint64_t res = 0;

				            for (auto i: *cf.get_sstables() ) {

				                res += i->get_stats_metadata().estimated_row_size.count();

				                res += i->get_stats_metadata().estimated_partition_size.count();

				            }

				            return res;

				        },

				@@ -443,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);

				        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);

				    });

				    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::pending_flushes);

				        return get_cf_stats(ctx, &column_family_stats::pending_flushes);

				    });

				    cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);

				        return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);

				    });

				    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, &column_family::stats::reads);

				        return get_cf_stats_count(ctx, &column_family_stats::reads);

				    });

				    cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);

				        return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);

				    });

				    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_count(ctx, &column_family::stats::writes);

				        return get_cf_stats_count(ctx, &column_family_stats::writes);

				    });

				    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);

				    });

				    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);

				    });

				    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);

				        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);

				    });

				    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);

				        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, &column_family::stats::writes);

				        return get_cf_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, &column_family::stats::writes);

				        return get_cf_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);

				        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);

				    });

				    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -519,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);

				        return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);

				    });

				    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_sstable_count);

				        return get_cf_stats(ctx, &column_family_stats::live_sstable_count);

				    });

				    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -546,30 +559,36 @@ void set_column_family(http_context& ctx, routes& r) {

				        return sum_sstable(ctx, true);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_row_size, min_int64);

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, INT64_MAX, min_row_size, min_int64);

				        return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);

				        return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());

				        return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -776,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_prepare;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_propose;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_commit;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -805,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);

				    });

				    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);

				        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);

				    });

				    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {

				@@ -827,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {

				        return true;

				    });

				    cf::get_built_indexes.set(r, [](const_req) {

				        // FIXME

				        // Currently there are no index support

				        return std::vector<sstring>();

				    cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);

				        return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {

				            std::set<sstring> vp;

				            for (auto b : vb) {

				                if (b.view.first == ks) {

				                    vp.insert(b.view.second);

				                }

				            }

				            std::vector<sstring> res;

				            auto uuid = get_uuid(ks, cf_name, ctx.db.local());

				            column_family& cf = ctx.db.local().find_column_family(uuid);

				            res.reserve(cf.get_index_manager().list_indexes().size());

				            for (auto&& i : cf.get_index_manager().list_indexes()) {

				                if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {

				                    res.emplace_back(i.metadata().name());

				                }

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {

				        // FIXME

				        // Currently there are no information on the compression

				@@ -920,5 +954,45 @@ void set_column_family(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(container_to_vec(res));

				        });

				    });

				    cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {

				        auto name_param = req->param["name"];

				        auto [ks, cf] = parse_fully_qualified_cf_name(name_param);

				        api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};

				        api::req_param<unsigned> capacity(*req, "capacity", 256);

				        api::req_param<unsigned> list_size(*req, "list_size", 10);

				        apilog.info("toppartitions query: name={} duration={} list_size={} capacity={}",

				            name_param, duration.param, list_size.param, capacity.param);

				        return seastar::do_with(db::toppartitions_query(ctx.db, ks, cf, duration.value, list_size, capacity), [&ctx](auto& q) {

				            return q.scatter().then([&q] {

				                return sleep(q.duration()).then([&q] {

				                    return q.gather(q.capacity()).then([&q] (auto topk_results) {

				                        apilog.debug("toppartitions query: processing results");

				                        cf::toppartitions_query_results results;

				                        for (auto& d: topk_results.read.top(q.list_size())) {

				                            cf::toppartitions_record r;

				                            r.partition = sstring(d.item);

				                            r.count = d.count;

				                            r.error = d.error;

				                            results.read.push(r);

				                        }

				                        for (auto& d: topk_results.write.top(q.list_size())) {

				                            cf::toppartitions_record r;

				                            r.partition = sstring(d.item);

				                            r.count = d.count;

				                            r.error = d.error;

				                            results.write.push(r);

				                        }

				                        return make_ready_future<json::json_return_type>(results);

				                    });

				                });

				            });

				        });

				    });

				}

				}

									
										51

api/column_family.hh
									
				@@ -24,6 +24,7 @@

				#include "api.hh"

				#include "api/api-doc/column_family.json.hh"

				#include "database.hh"

				#include <seastar/core/future-util.hh>

				#include <any>

				namespace api {

				@@ -38,14 +39,14 @@ template<class Mapper, class I, class Reducer>

				future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,

				        Mapper mapper, Reducer reducer) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    using mapper_type = std::function<std::any (database&)>;

				    using reducer_type = std::function<std::any (std::any, std::any)>;

				    using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;

				    using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;

				    return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {

				        return I(mapper(db.find_column_family(uuid)));

				    }), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {

				        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));

				    })).then([] (std::any r) {

				        return std::any_cast<I>(std::move(r));

				        return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));

				    }), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {

				        return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));

				    })).then([] (std::unique_ptr<std::any> r) {

				        return std::any_cast<I>(std::move(*r));

				    });

				}

				@@ -69,30 +70,32 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n

				struct map_reduce_column_families_locally {

				    std::any init;

				    std::function<std::any (column_family&)> mapper;

				    std::function<std::any (std::any, std::any)> reducer;

				    std::any operator()(database& db) const {

				        auto res = init;

				        for (auto i : db.get_column_families()) {

				            res = reducer(res, mapper(*i.second.get()));

				        }

				        return res;

				    std::function<std::unique_ptr<std::any>(column_family&)> mapper;

				    std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;

				    future<std::unique_ptr<std::any>> operator()(database& db) const {

				        auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));

				        return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {

				            *res = std::move(reducer(std::move(*res), mapper(*i.second.get())));

				        }).then([res] {

				            return std::move(*res);

				        });

				    }

				};

				template<class Mapper, class I, class Reducer>

				future<I> map_reduce_cf_raw(http_context& ctx, I init,

				        Mapper mapper, Reducer reducer) {

				    using mapper_type = std::function<std::any (column_family&)>;

				    using reducer_type = std::function<std::any (std::any, std::any)>;

				    using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;

				    using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;

				    auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {

				        return I(mapper(cf));

				        return std::make_unique<std::any>(I(mapper(cf)));

				    });

				    auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {

				        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));

				    auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {

				        return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));

				    });

				    return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {

				        return std::any_cast<I>(std::move(res));

				    return ctx.db.map_reduce0(map_reduce_column_families_locally{init,

				            std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {

				        return std::any_cast<I>(std::move(*res));

				    });

				}

				@@ -106,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,

				        int64_t column_family::stats::*f);

				        int64_t column_family_stats::*f);

				future<json::json_return_type>  get_cf_stats(http_context& ctx,

				        int64_t column_family::stats::*f);

				        int64_t column_family_stats::*f);

				}
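
The hunks above switch the type-erased mapper and reducer from passing `std::any` by value to passing `std::unique_ptr<std::any>`. The following is a minimal sketch of that wrapping pattern using only the standard library, not the code from the patch. The name `map_reduce_erased`, the `erased` aliases and the use of `int` inputs in place of `column_family&` are assumptions made for illustration, and the Seastar `map_reduce0` call is replaced by a plain loop.

```cpp
#include <any>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the type-erased signatures used in the diff.
using erased = std::unique_ptr<std::any>;
using erased_mapper = std::function<erased(int)>;
using erased_reducer = std::function<erased(erased, erased)>;

// Wraps a typed mapper and reducer into type-erased callables, folds them
// over the inputs, and unwraps the final value back to I.
template <class I, class Mapper, class Reducer>
I map_reduce_erased(const std::vector<int>& inputs, I init, Mapper mapper, Reducer reducer) {
    erased_mapper wrapped_mapper = [mapper = std::move(mapper)](int x) mutable {
        // Convert to I first so that std::any_cast<I> below always matches.
        return std::make_unique<std::any>(I(mapper(x)));
    };
    erased_reducer wrapped_reducer = [reducer = std::move(reducer)](erased a, erased b) mutable {
        return std::make_unique<std::any>(
            I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
    };
    erased acc = std::make_unique<std::any>(std::move(init));
    for (int x : inputs) {
        // Only pointers are moved between calls; the std::any payload stays in place.
        acc = wrapped_reducer(std::move(acc), wrapped_mapper(x));
    }
    return std::any_cast<I>(std::move(*acc));
}
```

With this sketch, `map_reduce_erased(std::vector<int>{1, 2, 3}, 0L, square, add)` folds the squares of the inputs into a single `long`, where `square` and `add` are caller-supplied lambdas.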

									
										15

api/commitlog.cc
									
				@@ -22,15 +22,16 @@

				#include "commitlog.hh"

				#include <db/commitlog/commitlog.hh>

				#include "api/api-doc/commitlog.json.hh"

				#include "database.hh"

				#include <vector>

				namespace api {

				template<typename Func>

				static auto acquire_cl_metric(http_context& ctx, Func&& func) {

				    typedef std::result_of_t<Func(db::commitlog *)> ret_type;

				template<typename T>

				static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {

				    typedef T ret_type;

				    return ctx.db.map_reduce0([func = std::forward<Func>(func)](database& db) {

				    return ctx.db.map_reduce0([func = std::move(func)](database& db) {

				        if (db.commitlog() == nullptr) {

				            return make_ready_future<ret_type>();

				        }

				@@ -63,15 +64,15 @@ void set_commitlog(http_context& ctx, routes& r) {

				    });

				    httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				    });

				}
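
The call sites above now spell out `acquire_cl_metric<uint64_t>` because the helper takes a `std::function<T (db::commitlog*)>`, and `T` cannot be deduced from a `std::bind` expression. A small standard-library sketch of the same shape follows. `commitlog_like` and `read_metric` are hypothetical names, and the sketch returns a plain value where the real helper returns a future.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for db::commitlog with a single counter.
struct commitlog_like {
    uint64_t pending = 0;
    uint64_t get_pending_tasks() const { return pending; }
};

// T must be given explicitly: a std::bind result only converts to
// std::function<T(...)> once T is already known.
template <typename T>
T read_metric(const commitlog_like* cl, std::function<T(const commitlog_like*)> func) {
    if (cl == nullptr) {
        return T();  // mirrors the default-constructed result for a missing commitlog
    }
    return func(cl);
}
```

A call then looks like `read_metric<uint64_t>(&cl, std::bind(&commitlog_like::get_pending_tasks, std::placeholders::_1))`.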

									
										95

api/compaction_manager.cc
									
				@@ -24,6 +24,7 @@

				#include "api/api-doc/compaction_manager.json.hh"

				#include "db/system_keyspace.hh"

				#include "column_family.hh"

				#include <utility>

				namespace api {

				@@ -38,6 +39,16 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,

				        const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {

				    for (auto&& i : b) {

				        if (i.second) {

				            a[i.first] += i.second;

				        }

				    }

				    return std::move(a);

				}

				void set_compaction_manager(http_context& ctx, routes& r) {

				    cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -47,8 +58,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				            for (const auto& c : cm.get_compactions()) {

				                cm::summary s;

				                s.ks = c->ks;

				                s.cf = c->cf;

				                s.ks = c->ks_name;

				                s.cf = c->cf_name;

				                s.unit = "keys";

				                s.task_type = sstables::compaction_name(c->type);

				                s.completed = c->total_keys_written;

				@@ -61,6 +72,32 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        });

				    });

				    cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.db.map_reduce0([&ctx](database& db) {

				            return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {

				                return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {

				                    table& cf = *i.second.get();

				                    tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);

				                    return make_ready_future<>();

				                }).then([&tasks] {

				                    return std::move(tasks);

				                });

				            });

				        }, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(

				                [](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {

				            std::vector<cm::pending_compaction> res;

				            res.reserve(task_map.size());

				            for (auto i : task_map) {

				                cm::pending_compaction task;

				                task.ks = i.first.first;

				                task.cf = i.first.second;

				                task.task = i.second;

				                res.emplace_back(std::move(task));

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				@@ -103,29 +140,37 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				    });

				    cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {

				        return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {

				            std::vector<cm::history> res;

				            res.reserve(history.size());

				            for (auto& entry : history) {

				                cm::history h;

				                h.id = entry.id.to_sstring();

				                h.ks = std::move(entry.ks);

				                h.cf = std::move(entry.cf);

				                h.compacted_at = entry.compacted_at;

				                h.bytes_in = entry.bytes_in;

				                h.bytes_out =  entry.bytes_out;

				                for (auto it : entry.rows_merged) {

				                    httpd::compaction_manager_json::row_merged e;

				                    e.key = it.first;

				                    e.value = it.second;

				                    h.rows_merged.push(std::move(e));

				                }

				                res.push_back(std::move(h));

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				        std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&s, &first] {

				                    return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {

				                        cm::history h;

				                        h.id = entry.id.to_sstring();

				                        h.ks = std::move(entry.ks);

				                        h.cf = std::move(entry.cf);

				                        h.compacted_at = entry.compacted_at;

				                        h.bytes_in = entry.bytes_in;

				                        h.bytes_out =  entry.bytes_out;

				                        for (auto it : entry.rows_merged) {

				                            httpd::compaction_manager_json::row_merged e;

				                            e.key = it.first;

				                            e.value = it.second;

				                            h.rows_merged.push(std::move(e));

				                        }

				                        auto fut = first ? make_ready_future<>() : s.write(", ");

				                        first = false;

				                        return fut.then([&s, h = std::move(h)] {

				                            return formatter::write(s, h);

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                    });

				                });

				            });

				        };

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });

				    cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {
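
The rewritten `get_compaction_history` handler above streams a JSON array entry by entry and uses a `first` flag to decide when to write the `", "` separator. The sketch below shows only that separator logic against a `std::ostream`. `write_json_array` and `to_json` are hypothetical names, and the asynchronous Seastar `output_stream` is deliberately left out.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes items as a JSON array one element at a time, so no full
// result buffer is built before output starts.
template <class T, class ToJson>
void write_json_array(std::ostream& os, const std::vector<T>& items, ToJson to_json) {
    bool first = true;
    os << "[";
    for (const auto& item : items) {
        if (!first) {
            os << ", ";  // separator goes before every element except the first
        }
        first = false;
        os << to_json(item);
    }
    os << "]";
}
```

Writing the separator before each element rather than after it avoids having to know which element is the last one, which matters when entries arrive from a callback.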

									
										27

api/config.cc
									
				@@ -22,6 +22,7 @@

				#include "api/config.hh"

				#include "api/api-doc/config.json.hh"

				#include "db/config.hh"

				#include "database.hh"

				#include <sstream>

				#include <boost/algorithm/string/replace.hpp>

				@@ -43,14 +44,14 @@ json::json_return_type get_json_return_type(const db::seed_provider_type& val) {

				    return json::json_return_type(val.class_name);

				}

				std::string format_type(const std::string& type) {

				std::string_view format_type(std::string_view type) {

				    if (type == "int") {

				        return "integer";

				    }

				    return type;

				}

				future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {

				future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {

				    std::stringstream ss;

				    if (first) {

				        first=false;

				@@ -87,23 +88,29 @@ future<> get_config_swagger_entry(const std::string& name, const std::string& de

				}

				namespace cs = httpd::config_json;

				#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}

				#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});

				void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {

				    rb->register_function(r, [] (output_stream<char>& os) {

				        return do_with(true, [&os] (bool& first) {

				    rb->register_function(r, [&ctx] (output_stream<char>& os) {

				        return do_with(true, [&os, &ctx] (bool& first) {

				            auto f = make_ready_future();

				            _make_config_values(_get_config_description)

				            for (auto&& cfg_ref : ctx.db.local().get_config().values()) {

				                auto&& cfg = cfg_ref.get();

				                f = f.then([&os, &first, &cfg] {

				                    return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);

				                });

				            }

				            return f;

				        });

				    });

				    cs::find_config_id.set(r, [&ctx] (const_req r) {

				        auto id = r.param["id"];

				        _make_config_values(_get_config_value)

				        for (auto&& cfg_ref : ctx.db.local().get_config().values()) {

				            auto&& cfg = cfg_ref.get();

				            if (id == cfg.name()) {

				                return cfg.value_as_json();

				            }

				        }

				        throw bad_param_exception(sstring("No such config entry: ") + id);

				    });

				}

									
										2

api/lsa.cc
									
												
				@@ -23,7 +23,7 @@

				#include "api/lsa.hh"

				#include "api/api.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "utils/logalloc.hh"

				#include "log.hh"

									
										6

api/messaging_service.cc
									
												
				@@ -21,7 +21,7 @@

				#include "messaging_service.hh"

				#include "message/messaging_service.hh"

				#include "rpc/rpc_types.hh"

				#include <seastar/rpc/rpc_types.hh>

				#include "api/api-doc/messaging_service.json.hh"

				#include <iostream>

				#include <sstream>

				@@ -76,7 +76,7 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)

				        auto get_shard_map = [f](messaging_service& ms) {

				            std::unordered_map<gms::inet_address, unsigned long> map;

				            ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {

				                map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);

				                map[gms::inet_address(info.addr.addr())] = f(stats);

				            });

				            return map;

				        };

				@@ -139,7 +139,7 @@ void set_messaging_service(http_context& ctx, routes& r) {

				                messaging_verb v = i; // for type safety we use messaging_verb values

				                auto idx = static_cast<uint32_t>(v);

				                if (idx >= map->size()) {

				                    throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));

				                    throw std::runtime_error(format("verb index out of bounds: {:d}, map size: {:d}", idx, map->size()));

				                }

				                if ((*map)[idx] > 0) {

				                    c.count = (*map)[idx];

									
										93

api/storage_proxy.cc
									
												
				@@ -26,6 +26,7 @@

				#include "service/storage_service.hh"

				#include "db/config.hh"

				#include "utils/histogram.hh"

				#include "database.hh"

				namespace api {

				@@ -46,6 +47,10 @@ static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>&

				    });

				}

				httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {

				    return timer_to_json(utils::rate_moving_average_and_histogram());

				}

				static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				    return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {

				        return make_ready_future<json::json_return_type>(val.count);

				@@ -76,12 +81,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				        // FIXME

				        // hinted handoff is not supported currently,

				        // so we should return false

				        return make_ready_future<json::json_return_type>(false);

				    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req)  {

				        auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();

				        return make_ready_future<json::json_return_type>(enabled);

				    });

				    sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				@@ -245,68 +247,40 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				        });

				    });

				    sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);

				    });

				    sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);

				    });

				    sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);

				    });

				    sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // cas is not supported yet, so just return 0

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);

				    });

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);

				    });

				    sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);

				    });

				    sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);

				    });

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);

				    });

				    sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);

				    });

				    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				@@ -376,6 +350,21 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::write);

				    });

				    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);

				    });

				    sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);

				    });

				    sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        // FIXME

				        // No View metrics are available, so just return empty moving average

				        return make_ready_future<json::json_return_type>(get_empty_moving_average());

				    });

				    sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::read);

									
										325

api/storage_service.cc
									
												
				@@ -22,19 +22,27 @@

				#include "storage_service.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "db/config.hh"

				#include <optional>

				#include <time.h>

				#include <boost/range/adaptor/map.hpp>

				#include <boost/range/adaptor/filtered.hpp>

				#include <service/storage_service.hh>

				#include <db/commitlog/commitlog.hh>

				#include <gms/gossiper.hh>

				#include <db/system_keyspace.hh>

				#include "http/exception.hh"

				#include "service/storage_service.hh"

				#include "service/load_meter.hh"

				#include "db/commitlog/commitlog.hh"

				#include "gms/gossiper.hh"

				#include "db/system_keyspace.hh"

				#include "seastar/http/exception.hh"

				#include "repair/repair.hh"

				#include "locator/snitch_base.hh"

				#include "column_family.hh"

				#include "log.hh"

				#include "release.hh"

				#include "sstables/compaction_manager.hh"

				#include "sstables/sstables.hh"

				#include "database.hh"

				#include "db/extensions.hh"

				sstables::sstable::version_types get_highest_supported_format();

				namespace api {

				@@ -48,45 +56,55 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {

				    throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");

				}

				static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {

				    std::vector<ss::token_range> res;

				    for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {

				        ss::token_range r;

				        r.start_token = d._start_token;

				        r.end_token = d._end_token;

				        r.endpoints = d._endpoints;

				        r.rpc_endpoints = d._rpc_endpoints;

				        for (auto det : d._endpoint_details) {

				            ss::endpoint_detail ed;

				            ed.host = det._host;

				            ed.datacenter = det._datacenter;

				            if (det._rack != "") {

				                ed.rack = det._rack;

				            }

				            r.endpoint_details.push(ed);

				static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {

				    ss::token_range r;

				    r.start_token = d._start_token;

				    r.end_token = d._end_token;

				    r.endpoints = d._endpoints;

				    r.rpc_endpoints = d._rpc_endpoints;

				    for (auto det : d._endpoint_details) {

				        ss::endpoint_detail ed;

				        ed.host = det._host;

				        ed.datacenter = det._datacenter;

				        if (det._rack != "") {

				            ed.rack = det._rack;

				        }

				        res.push_back(r);

				        r.endpoint_details.push(ed);

				    }

				    return res;

				    return r;

				}

				void set_storage_service(http_context& ctx, routes& r) {

				    using ks_cf_func = std::function<future<json::json_return_type>(std::unique_ptr<request>, sstring, std::vector<sstring>)>;

				    auto wrap_ks_cf = [&ctx](ks_cf_func f) {

				        return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {

				            auto keyspace = validate_keyspace(ctx, req->param);

				            auto column_families = split_cf(req->get_query_param("cf"));

				            if (column_families.empty()) {

				                column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				            }

				            return f(std::move(req), std::move(keyspace), std::move(column_families));

				        };

				    };

				    ss::local_hostid.set(r, [](std::unique_ptr<request> req) {

				        return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {

				            return make_ready_future<json::json_return_type>(id.to_sstring());

				        });

				    });

				    ss::get_tokens.set(r, [] (const_req req) {

				        auto tokens = service::get_local_storage_service().get_token_metadata().sorted_tokens();

				        return container_to_vec(tokens);

				    ss::get_tokens.set(r, [] (std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().sorted_tokens(), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				        }));

				    });

				    ss::get_node_tokens.set(r, [] (const_req req) {

				        gms::inet_address addr(req.param["endpoint"]);

				        auto tokens = service::get_local_storage_service().get_token_metadata().get_tokens(addr);

				        return container_to_vec(tokens);

				    ss::get_node_tokens.set(r, [] (std::unique_ptr<request> req) {

				        gms::inet_address addr(req->param["endpoint"]);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().get_tokens(addr), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				       }));

				    });

				    ss::get_commitlog.set(r, [&ctx](const_req req) {

				@@ -107,11 +125,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::get_moving_nodes.set(r, [](const_req req) {

				        auto points = service::get_local_storage_service().get_token_metadata().get_moving_endpoints();

				        std::unordered_set<sstring> addr;

				        for (auto i: points) {

				            addr.insert(boost::lexical_cast<std::string>(i.second));

				        }

				        return container_to_vec(addr);

				    });

				@@ -159,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				    ss::describe_any_ring.set(r, [&ctx](const_req req) {

				        return describe_ring("");

				    ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));

				    });

				    ss::describe_ring.set(r, [&ctx](const_req req) {

				        auto keyspace = validate_keyspace(ctx, req.param);

				        return describe_ring(keyspace);

				    ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));

				    });

				    ss::get_host_id_map.set(r, [](const_req req) {

				@@ -175,11 +189,11 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);

				        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);

				    });

				    ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {

				    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {

				            std::vector<ss::map_string_double> res;

				            for (auto i : load_map) {

				                ss::map_string_double val;

				@@ -237,6 +251,9 @@ void set_storage_service(http_context& ctx, routes& r) {

				        if (column_family.empty()) {

				            resp = service::get_local_storage_service().take_snapshot(tag, keynames);

				        } else {

				            if (keynames.empty()) {

				                throw httpd::bad_param_exception("The keyspace of column families must be specified");

				            }

				            if (keynames.size() > 1) {

				                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");

				            }

				@@ -287,38 +304,65 @@ void set_storage_service(http_context& ctx, routes& r) {

				        if (column_families.empty()) {

				            column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				        }

				        return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {

				            std::vector<column_family*> column_families_vec;

				            auto& cm = db.get_compaction_manager();

				            for (auto cf : column_families) {

				                column_families_vec.push_back(&db.find_column_family(keyspace, cf));

				        return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,

				                column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {

				            if (!is_cleanup_allowed) {

				                return make_exception_future<json::json_return_type>(

				                        std::runtime_error("Can not perform cleanup operation when topology changes"));

				            }

				            return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {

				                return cm.perform_cleanup(cf);

				            return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {

				                std::vector<column_family*> column_families_vec;

				                auto& cm = db.get_compaction_manager();

				                for (auto cf : column_families) {

				                    column_families_vec.push_back(&db.find_column_family(keyspace, cf));

				                }

				                return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {

				                    return cm.perform_cleanup(cf);

				                });

				            }).then([]{

				                return make_ready_future<json::json_return_type>(0);

				            });

				        });

				    });

				    ss::scrub.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        // TODO: respect this

				        auto skip_corrupted = req->get_query_param("skip_corrupted");

				        auto f = make_ready_future<>();

				        if (!req_param<bool>(*req, "disable_snapshot", false)) {

				            auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());

				            f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {

				                return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);

				            });

				        }

				        return f.then([&ctx, keyspace, column_families] {

				            return ctx.db.invoke_on_all([=] (database& db) {

				                return do_for_each(column_families, [=, &db](sstring cfname) {

				                    auto& cm = db.get_compaction_manager();

				                    auto& cf = db.find_column_family(keyspace, cfname);

				                    return cm.perform_sstable_scrub(&cf);

				                });

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    });

				    }));

				    ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        auto disable_snapshot = req->get_query_param("disable_snapshot");

				        auto skip_corrupted = req->get_query_param("skip_corrupted");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::upgrade_sstables.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);

				    ss::upgrade_sstables.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        auto exclude_current_version = req->get_query_param("exclude_current_version");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				        return ctx.db.invoke_on_all([=] (database& db) {

				            return do_for_each(column_families, [=, &db](sstring cfname) {

				                auto& cm = db.get_compaction_manager();

				                auto& cf = db.find_column_family(keyspace, cfname);

				                return cm.perform_sstable_upgrade(&cf, exclude_current_version);

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    }));

				    ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				@@ -456,7 +500,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {

				            return ss.get_drain_progress();

				        }).then([] (auto&& progress) {

				            auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);

				            auto progress_str = format("Drained {}/{} ColumnFamilies", progress.remaining_cfs, progress.total_cfs);

				            return make_ready_future<json::json_return_type>(std::move(progress_str));

				        });

				    });

				@@ -561,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::join_ring.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().join_ring().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::is_joined.set(r, [] (std::unique_ptr<request> req) {

				@@ -667,7 +709,11 @@ void set_storage_service(http_context& ctx, routes& r) {

				        auto coordinator = std::hash<sstring>()(cf) % smp::count;

				        return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {

				            return s.load_new_sstables(ks, cf);

				        }).then([] {

				        }).then_wrapped([] (auto&& f) {

				            if (f.failed()) {

				                auto msg = fmt::format("Failed to load new sstables: {}", f.get_exception());

				                return make_exception_future<json::json_return_type>(httpd::server_error_exception(msg));

				            }

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				@@ -701,7 +747,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				            } catch (std::out_of_range& e) {

				                throw httpd::bad_param_exception(e.what());

				            } catch (std::invalid_argument&){

				                throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));

				                throw httpd::bad_param_exception(format("Bad format in a probability value: \"{}\"", probability.c_str()));

				            }

				        });

				    });

				@@ -737,7 +783,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				                return make_ready_future<json::json_return_type>(json_void());

				            });

				        } catch (...) {

				            throw httpd::bad_param_exception(sprint("Bad format value: "));

				            throw httpd::bad_param_exception(format("Bad format value: "));

				        }

				    });

				@@ -819,7 +865,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);

				        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);

				    });

				    ss::get_exceptions.set(r, [](const_req req) {

				@@ -861,6 +907,133 @@ void set_storage_service(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));

				        });

				    });

				    ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {

				        auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;

				        auto cf = api::req_param<sstring>(*req, "cf", {}).value;

				        // The size of this vector is bound by ks::cf. I.e. it is as most Nks + Ncf long

				        // which is not small, but not huge either. 

				        using table_sstables_list = std::vector<ss::table_sstables>;

				        return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {

				            return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {

				                for (auto&& t : res) {

				                    auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {

				                        return t.keyspace() == t2.keyspace() && t.table() == t2.table();

				                    });

				                    if (i == dst.end()) {

				                        dst.emplace_back(std::move(t));

				                        continue;

				                    }

				                    auto& ssd = i->sstables; 

				                    for (auto&& sd : t.sstables._elements) {

				                        auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {

				                            return s.generation() == sd.generation();

				                        });

				                        if (j == ssd._elements.end()) {

				                            i->sstables.push(std::move(sd));

				                        }

				                    }

				                }

				            }, [ks, cf](const database& db) {

				                // see above

				                table_sstables_list res;

				                auto& ext = db.get_config().extensions();

				                for (auto& t : db.get_column_families() | boost::adaptors::map_values) {

				                    auto& schema = t->schema();

				                    if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {

				                        // at most Nsstables long

				                        ss::table_sstables tst;

				                        tst.keyspace = schema->ks_name();

				                        tst.table = schema->cf_name();

				                        for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {

				                            auto ts = db_clock::to_time_t(sstable->data_file_write_time());

				                            ::tm t;

				                            ::gmtime_r(&ts, &t);

				                            ss::sstable info;

				                            info.timestamp = t;

				                            info.generation = sstable->generation();

				                            info.level = sstable->get_sstable_level();

				                            info.size = sstable->bytes_on_disk();

				                            info.data_size = sstable->ondisk_data_size();

				                            info.index_size = sstable->index_size();

				                            info.filter_size = sstable->filter_size();

				                            info.version = sstable->get_version();

				                            if (sstable->has_component(sstables::component_type::CompressionInfo)) {

				                                auto& c = sstable->get_compression();

				                                auto cp = sstables::get_sstable_compressor(c);

				                                ss::named_maps nm;

				                                nm.group = "compression_parameters";

				                                for (auto& p : cp->options()) {

				                                    ss::mapper e;

				                                    e.key = p.first;

				                                    e.value = p.second;

				                                    nm.attributes.push(std::move(e));

				                                }

				                                if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {

				                                    ss::mapper e;

				                                    e.key = compression_parameters::SSTABLE_COMPRESSION;

				                                    e.value = cp->name();

				                                    nm.attributes.push(std::move(e));

				                                }

				                                info.extended_properties.push(std::move(nm));

				                            }

				                            sstables::file_io_extension::attr_value_map map;

				                            for (auto* ep : ext.sstable_file_io_extensions()) {

				                                map.merge(ep->get_attributes(*sstable));

				                            }

				                            for (auto& p : map) {

				                                struct {

				                                    const sstring& key; 

				                                    ss::sstable& info;

				                                    void operator()(const std::map<sstring, sstring>& map) const {

				                                        ss::named_maps nm;

				                                        nm.group = key;

				                                        for (auto& p : map) {

				                                            ss::mapper e;

				                                            e.key = p.first;

				                                            e.value = p.second;

				                                            nm.attributes.push(std::move(e));

				                                        }

				                                        info.extended_properties.push(std::move(nm));

				                                    }

				                                    void operator()(const sstring& value) const {

				                                        ss::mapper e;

				                                        e.key = key;

				                                        e.value = value;

				                                        info.properties.push(std::move(e));                                        

				                                    }

				                                } v{p.first, info};

				                                std::visit(v, p.second);

				                            }

				                            tst.sstables.push(std::move(info));

				                        }

				                        res.emplace_back(std::move(tst));

				                    }

				                }

				                std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {

				                    return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());

				                });

				                return res;

				            }).then([&dst] {

				                return make_ready_future<json::json_return_type>(stream_object(dst));

				            });

				        });

				    });

				}

				}

									
										6

api/system.cc
									
												
				@@ -22,7 +22,7 @@

				#include "api/api-doc/system.json.hh"

				#include "api/api.hh"

				#include "http/exception.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				namespace api {

				@@ -30,6 +30,10 @@ namespace api {

				namespace hs = httpd::system_json;

				void set_system(http_context& ctx, routes& r) {

				    hs::get_system_uptime.set(r, [](const_req req) {

				        return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();

				    });

				    hs::get_all_logger_names.set(r, [](const_req req) {

				        return logging::logger_registry().get_all_logger_names();

				    });

									
										168

atomic_cell.cc
									
												
				@@ -21,6 +21,7 @@

				#include "atomic_cell.hh"

				#include "atomic_cell_or_collection.hh"

				#include "counters.hh"

				#include "types.hh"

				/// LSA migrator for cells with irrelevant type

				@@ -47,6 +48,23 @@ atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_typ

				    );

				}

				atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value, atomic_cell::collection_member cm) {

				    auto& imr_data = type.imr_state();

				    return atomic_cell(

				        imr_data.type_info(),

				        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())

				    );

				}

				atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value, collection_member cm)

				{

				    auto& imr_data = type.imr_state();

				    return atomic_cell(

				        imr_data.type_info(),

				        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())

				    );

				}

				atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,

				                             gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {

				    auto& imr_data = type.imr_state();

				@@ -56,6 +74,25 @@ atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_typ

				    );

				}

				atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,

				                             gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {

				    auto& imr_data = type.imr_state();

				    return atomic_cell(

				        imr_data.type_info(),

				        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())

				    );

				}

				atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,

				                                   gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm)

				{

				    auto& imr_data = type.imr_state();

				    return atomic_cell(

				        imr_data.type_info(),

				        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())

				    );

				}

				atomic_cell atomic_cell::make_live_counter_update(api::timestamp_type timestamp, int64_t value) {

				    auto& imr_data = no_type_imr_descriptor();

				    return atomic_cell(

				@@ -111,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,

				{

				}

				static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)

				{

				    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);

				    auto ti = data::type_info::make_collection();

				    data::cell::context ctx(f, ti);

				    auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);

				    auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());

				    return collection_mutation_view { dv };

				}

				collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {

				    return get_collection_mutation_view(_data.get());

				}

				collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)

				    : _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))

				{

				}

				collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)

				    : _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))

				{

				}

				collection_mutation::operator collection_mutation_view() const

				{

				    return get_collection_mutation_view(_data.get());

				}

				bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const

				{

				    auto ptr_a = _data.get();

				@@ -155,20 +163,20 @@ bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_c

				        if (a.timestamp() != b.timestamp()) {

				            return false;

				        }

				        if (a.is_live() != b.is_live()) {

				            return false;

				        }

				        if (a.is_live()) {

				            if (!b.is_live()) {

				            if (a.is_counter_update() != b.is_counter_update()) {

				                return false;

				            }

				            if (a.is_counter_update()) {

				                if (!b.is_counter_update()) {

				                    return false;

				                }

				                return a.counter_update_value() == b.counter_update_value();

				            }

				            if (a.is_live_and_has_ttl() != b.is_live_and_has_ttl()) {

				                return false;

				            }

				            if (a.is_live_and_has_ttl()) {

				                if (!b.is_live_and_has_ttl()) {

				                    return false;

				                }

				                if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {

				                    return false;

				                }

				@@ -187,19 +195,93 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)

				        return 0;

				    }

				    auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());

				    return data::cell::structure::serialized_object_size(_data.get(), ctx);

				    auto view = data::cell::structure::make_view(_data.get(), ctx);

				    auto flags = view.get<data::cell::tags::flags>();

				    size_t external_value_size = 0;

				    if (flags.get<data::cell::tags::external_data>()) {

				        if (flags.get<data::cell::tags::collection>()) {

				            external_value_size = as_collection_mutation().data.size_bytes();

				        } else {

				            auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);

				            external_value_size = cell_view.value_size();

				        }

				        // Add overhead of chunk headers. The last one is a special case.

				        external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;

				        external_value_size += data::cell::external_last_chunk_overhead;

				    }

				    return data::cell::structure::serialized_object_size(_data.get(), ctx)

				        + imr_object_type::size_overhead + external_value_size;

				}

				std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {

				    if (!c._data.get()) {

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view& acv) {

				    if (acv.is_live()) {

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            acv.is_counter_update()

				                    ? "counter_update_value=" + to_sstring(acv.counter_update_value())

				                    : to_hex(acv.value().linearize()),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell& ac) {

				    return os << atomic_cell_view(ac);

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {

				    auto& type = acvp._type;

				    auto& acv = acvp._cell;

				    if (acv.is_live()) {

				        std::ostringstream cell_value_string_builder;

				        if (type.is_counter()) {

				            if (acv.is_counter_update()) {

				                cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();

				            } else {

				                cell_value_string_builder << "shards: ";

				                counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {

				                    cell_value_string_builder << ::join(", ", ccv.shards());

				                });

				            }

				        } else {

				            cell_value_string_builder << type.to_string(acv.value().linearize());

				        }

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            cell_value_string_builder.str(),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell::printer& acp) {

				    return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));

				}

				std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {

				    if (!p._cell._data.get()) {

				        return os << "{ null atomic_cell_or_collection }";

				    }

				    using dc = data::cell;

				    os << "{ ";

				    if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {

				        os << "collection";

				    if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {

				        os << "collection ";

				        auto cmv = p._cell.as_collection_mutation();

				        os << collection_mutation_view::printer(*p._cdef.type, cmv);

				    } else {

				        os << "atomic cell";

				        os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));

				    }

				    return os << " @" << static_cast<const void*>(c._data.get()) << " }";

				    return os << " }";

				}

									
										49

atomic_cell.hh
									
												
				@@ -26,13 +26,16 @@

				#include "tombstone.hh"

				#include "gc_clock.hh"

				#include "utils/managed_bytes.hh"

				#include "net/byteorder.hh"

				#include <seastar/net//byteorder.hh>

				#include <cstdint>

				#include <iosfwd>

				#include <seastar/util/gcc6-concepts.hh>

				#include "data/cell.hh"

				#include "data/schema_info.hh"

				#include "imr/utils.hh"

				#include "utils/fragmented_temporary_buffer.hh"

				#include "serializer.hh"

				class abstract_type;

				class collection_type_impl;

				@@ -150,6 +153,14 @@ public:

				    }

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);

				    class printer {

				        const abstract_type& _type;

				        const atomic_cell_view& _cell;

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {

				@@ -186,6 +197,10 @@ public:

				    static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time);

				    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,

				                                 collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,

				                                 collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,

				                                 collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,

				                                 collection_member cm = collection_member::no) {

				        return make_live(type, timestamp, bytes_view(value), cm);

				@@ -193,6 +208,10 @@ public:

				    static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value);

				    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, bytes_view value,

				        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,

				        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,

				        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);

				    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,

				                                 gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm = collection_member::no)

				    {

				@@ -208,30 +227,12 @@ public:

				    static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);

				    friend class atomic_cell_or_collection;

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);

				};

				class collection_mutation_view;

				// Represents a mutation of a collection.  Actual format is determined by collection type,

				// and is:

				//   set:  list of atomic_cell

				//   map:  list of pair<atomic_cell, bytes> (for key/value)

				//   list: tbd, probably ugly

				class collection_mutation {

				public:

				    using imr_object_type =  imr::utils::object<data::cell::structure>;

				    imr_object_type _data;

				    collection_mutation() {}

				    collection_mutation(const collection_type_impl&, collection_mutation_view v);

				    collection_mutation(const collection_type_impl&, bytes_view bv);

				    operator collection_mutation_view() const;

				};

				class collection_mutation_view {

				public:

				    atomic_cell_value_view data;

				    class printer : atomic_cell_view::printer {

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class column_definition;

									
										15

atomic_cell_hash.hh
									
												
				@@ -24,6 +24,7 @@

				// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh

				#include "types.hh"

				#include "types/collection.hh"

				#include "atomic_cell.hh"

				#include "atomic_cell_or_collection.hh"

				#include "hashing.hh"

				@@ -33,14 +34,12 @@ template<>

				struct appending_hash<collection_mutation_view> {

				    template<typename Hasher>

				    void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {

				      cell.data.with_linearized([&] (bytes_view cell_bv) {

				        auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);

				        auto m_view = ctype->deserialize_mutation_form(cell_bv);

				        ::feed_hash(h, m_view.tomb);

				        for (auto&& key_and_value : m_view.cells) {

				            ::feed_hash(h, key_and_value.first);

				            ::feed_hash(h, key_and_value.second, cdef);

				        }

				        cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {

				            ::feed_hash(h, m_view.tomb);

				            for (auto&& key_and_value : m_view.cells) {

				                ::feed_hash(h, key_and_value.first);

				                ::feed_hash(h, key_and_value.second, cdef);

				            }

				      });

				    }

				};

									
										15

atomic_cell_or_collection.hh
									
												
				@@ -22,6 +22,7 @@

				#pragma once

				#include "atomic_cell.hh"

				#include "collection_mutation.hh"

				#include "schema.hh"

				#include "hashing.hh"

				@@ -67,7 +68,19 @@ public:

				    bytes_view serialize() const;

				    bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;

				    size_t external_memory_usage(const abstract_type&) const;

				    friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);

				    class printer {

				        const column_definition& _cdef;

				        const atomic_cell_or_collection& _cell;

				    public:

				        printer(const column_definition& cdef, const atomic_cell_or_collection& cell)

				            : _cdef(cdef), _cell(cell) { }

				        printer(const printer&) = delete;

				        printer(printer&&) = delete;

				        friend std::ostream& operator<<(std::ostream&, const printer&);

				    };

				    friend std::ostream& operator<<(std::ostream&, const printer&);

				};

				namespace std {

									
										8

auth/allow_all_authenticator.hh
									
												
				@@ -72,19 +72,19 @@ public:

				        return make_ready_future<authenticated_user>(anonymous_user());

				    }

				    virtual future<> create(stdx::string_view, const authentication_options& options) const override {

				    virtual future<> create(std::string_view, const authentication_options& options) const override {

				        return make_ready_future();

				    }

				    virtual future<> alter(stdx::string_view, const authentication_options& options) const override {

				    virtual future<> alter(std::string_view, const authentication_options& options) const override {

				        return make_ready_future();

				    }

				    virtual future<> drop(stdx::string_view) const override {

				    virtual future<> drop(std::string_view) const override {

				        return make_ready_future();

				    }

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override {

				        return make_ready_future<custom_options>();

				    }

									
										7

auth/allow_all_authorizer.hh
									
												
				@@ -23,7 +23,6 @@

				#include "auth/authorizer.hh"

				#include "exceptions/exceptions.hh"

				#include "stdx.hh"

				namespace cql3 {

				class query_processor;

				@@ -58,12 +57,12 @@ public:

				        return make_ready_future<permission_set>(permissions::ALL);

				    }

				    virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {

				    virtual future<> grant(std::string_view, permission_set, const resource&) const override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {

				    virtual future<> revoke(std::string_view, permission_set, const resource&) const override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

				@@ -74,7 +73,7 @@ public:

				                        "LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke_all(stdx::string_view) const override {

				    virtual future<> revoke_all(std::string_view) const override {

				        return make_exception_future(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

									
										2

auth/authenticated_user.cc
									
												
				@@ -45,7 +45,7 @@

				namespace auth {

				authenticated_user::authenticated_user(stdx::string_view name)

				authenticated_user::authenticated_user(std::string_view name)

				        : name(sstring(name)) {

				}

									
										5

auth/authenticated_user.hh
									
												
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <iosfwd>

				#include <optional>

				@@ -49,7 +49,6 @@

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace auth {

				@@ -67,7 +66,7 @@ public:

				    /// An anonymous user.

				    ///

				    authenticated_user() = default;

				    explicit authenticated_user(stdx::string_view name);

				    explicit authenticated_user(std::string_view name);

				};

				///

									
										2

auth/authentication_options.hh
									
												
				@@ -57,7 +57,7 @@ inline bool any_authentication_options(const authentication_options& aos) noexce

				class unsupported_authentication_option : public std::invalid_argument {

				public:

				    explicit unsupported_authentication_option(authentication_option k)

				            : std::invalid_argument(sprint("The %s option is not supported.", k)) {

				            : std::invalid_argument(format("The {} option is not supported.", k)) {

				    }

				};

									
										1

auth/authenticator.cc
									
												
				@@ -45,7 +45,6 @@

				#include "auth/common.hh"

				#include "auth/password_authenticator.hh"

				#include "cql3/query_processor.hh"

				#include "db/config.hh"

				#include "utils/class_registrator.hh"

				const sstring auth::authenticator::USERNAME_KEY("username");

									
										26

auth/authenticator.hh
									
												
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <memory>

				#include <set>

				#include <stdexcept>

				@@ -55,10 +55,10 @@

				#include "auth/authentication_options.hh"

				#include "auth/resource.hh"

				#include "auth/sasl_challenge.hh"

				#include "bytes.hh"

				#include "enum_set.hh"

				#include "exceptions/exceptions.hh"

				#include "stdx.hh"

				namespace db {

				    class config;

				@@ -122,7 +122,7 @@ public:

				    ///

				    /// The options provided must be a subset of `supported_options()`.

				    ///

				    virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;

				    virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;

				    ///

				    /// Alter the authentication record of an existing user.

				@@ -131,39 +131,25 @@ public:

				    ///

				    /// Callers must ensure that the specification of `alterable_options()` is adhered to.

				    ///

				    virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;

				    ///

				    /// Delete the authentication record for a user. This will disallow the user from logging in.

				    ///

				    virtual future<> drop(stdx::string_view role_name) const = 0;

				    virtual future<> drop(std::string_view role_name) const = 0;

				    ///

				    /// Query for custom options (those corresponding to \ref authentication_options::options).

				    ///

				    /// If no options are set the result is an empty container.

				    ///

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;

				    ///

				    /// System resources used internally as part of the implementation. These are made inaccessible to users.

				    ///

				    virtual const resource_set& protected_resources() const = 0;

				    ///

				    /// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).

				    ///

				    class sasl_challenge {

				    public:

				        virtual ~sasl_challenge() = default;

				        virtual bytes evaluate_response(bytes_view client_response) = 0;

				        virtual bool is_complete() const = 0;

				        virtual future<authenticated_user> get_authenticated_user() const = 0;

				    };

				    virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;

				};

									
										9

auth/authorizer.hh
									
												
				@@ -41,7 +41,7 @@

				#pragma once

				#include <experimental/string_view>

				#include <string_view>

				#include <functional>

				#include <optional>

				#include <stdexcept>

				@@ -54,7 +54,6 @@

				#include "auth/permission.hh"

				#include "auth/resource.hh"

				#include "seastarx.hh"

				#include "stdx.hh"

				namespace auth {

				@@ -117,14 +116,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if granting permissions is not supported.

				    ///

				    virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;

				    virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;

				    ///

				    /// Revoke a set of permissions from a role for a particular \ref resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;

				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;

				    ///

				    /// Query for all directly granted permissions.

				@@ -138,7 +137,7 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke_all(stdx::string_view role_name) const = 0;

				    virtual future<> revoke_all(std::string_view role_name) const = 0;

				    ///

				    /// Revoke all permissions granted to any role for a particular resource.

									
										44

auth/common.cc
									
												
				@@ -28,6 +28,7 @@

				#include "database.hh"

				#include "schema_builder.hh"

				#include "service/migration_manager.hh"

				#include "timeout_config.hh"

				namespace auth {

				@@ -47,9 +48,9 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				    struct empty_state { };

				    return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {

				        return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {

				            return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {

				            return func().then_wrapped([] (auto&& f) -> std::optional<empty_state> {

				                if (f.failed()) {

				                    auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());

				                    auth_log.debug("Auth task failed with error, rescheduling: {}", f.get_exception());

				                    return { };

				                }

				                return { empty_state() };

				@@ -59,16 +60,14 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				}

				future<> create_metadata_table_if_missing(

				        stdx::string_view table_name,

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        stdx::string_view cql,

				        std::string_view cql,

				        ::service::migration_manager& mm) {

				    auto& db = qp.db().local();

				    if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {

				        return make_ready_future<>();

				    }

				    static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {

				        return futurize_apply(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });

				    };

				    auto& db = qp.db();

				    auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(

				            cql3::query_processor::parse_statement(cql));

				@@ -77,21 +76,36 @@ future<> create_metadata_table_if_missing(

				    auto statement = static_pointer_cast<cql3::statements::create_table_statement>(

				            parsed_statement->prepare(db, qp.get_cql_stats())->statement);

				    const auto schema = statement->get_cf_meta_data(qp.db().local());

				    const auto schema = statement->get_cf_meta_data(qp.db());

				    const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());

				    schema_builder b(schema);

				    b.set_uuid(uuid);

				    schema_ptr table = b.build();

				    return ignore_existing([&mm, table = std::move(table)] () {

				        return mm.announce_new_column_family(table, false);

				    });

				    return mm.announce_new_column_family(b.build(), false);

				}

				future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {

				future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {

				    static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };

				    return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {

				        return do_until([&mm] { return mm.have_schema_agreement(); }, pause);

				    return do_until([&db, &as] {

				        as.check();

				        return db.get_version() != database::empty_version;

				    }, pause).then([&mm, &as] {

				        return do_until([&mm, &as] {

				            as.check();

				            return mm.have_schema_agreement();

				        }, pause);

				    });

				}

				const timeout_config& internal_distributed_timeout_config() noexcept {

				    static const auto t = 5s;

				    static const timeout_config tc{t, t, t, t, t, t, t};

				    return tc;

				}

				}

									
										14

auth/common.hh
									
												View File
												
				@@ -22,7 +22,7 @@

				#pragma once

				#include <chrono>

				#include <experimental/string_view>

				#include <string_view>

				#include <seastar/core/future.hh>

				#include <seastar/core/abort_source.hh>

				@@ -38,6 +38,7 @@

				using namespace std::chrono_literals;

				class database;

				class timeout_config;

				namespace service {

				class migration_manager;

				@@ -75,11 +76,16 @@ inline future<> delay_until_system_ready(seastar::abort_source& as) {

				future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);

				future<> create_metadata_table_if_missing(

				        stdx::string_view table_name,

				        std::string_view table_name,

				        cql3::query_processor&,

				        stdx::string_view cql,

				        std::string_view cql,

				        ::service::migration_manager&);

				future<> wait_for_schema_agreement(::service::migration_manager&, const database&);

				future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);

				///

				/// Time-outs for internal, non-local CQL queries.

				///

				const timeout_config& internal_distributed_timeout_config() noexcept;

				}

									
										45

auth/default_authorizer.cc
									
												View File
												
				@@ -61,6 +61,7 @@ extern "C" {

				#include "cql3/untyped_result_set.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "database.hh"

				namespace auth {

				@@ -94,11 +95,11 @@ default_authorizer::~default_authorizer() {

				static const sstring legacy_table_name{"permissions"};

				bool default_authorizer::legacy_metadata_exists() const {

				    return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);

				    return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);

				}

				future<bool> default_authorizer::any_granted() const {

				    static const sstring query = sprint("SELECT * FROM %s.%s LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);

				    static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);

				    return _qp.process(

				            query,

				@@ -112,7 +113,7 @@ future<bool> default_authorizer::any_granted() const {

				future<> default_authorizer::migrate_legacy_metadata() const {

				    alogger.info("Starting migration of legacy permissions metadata.");

				    static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				            query,

				@@ -160,7 +161,7 @@ future<> default_authorizer::start() {

				                _migration_manager).then([this] {

				            _finished = do_after_system_ready(_as, [this] {

				                return async([this] {

				                    wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();

				                    wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();

				                    if (legacy_metadata_exists()) {

				                        if (!any_granted().get0()) {

				@@ -178,7 +179,7 @@ future<> default_authorizer::start() {

				future<> default_authorizer::stop() {

				    _as.request_abort();

				    return _finished.handle_exception_type([](const sleep_aborted&) {});

				    return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});

				}

				future<permission_set>

				@@ -187,8 +188,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc

				        return make_ready_future<permission_set>(permissions::NONE);

				    }

				    static const sstring query = sprint(

				            "SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?",

				    static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",

				            PERMISSIONS_NAME,

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				@@ -210,13 +210,12 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc

				future<>

				default_authorizer::modify(

				        stdx::string_view role_name,

				        std::string_view role_name,

				        permission_set set,

				        const resource& resource,

				        stdx::string_view op) const {

				        std::string_view op) const {

				    return do_with(

				            sprint(

				                    "UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",

				            format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",

				                    meta::AUTH_KS,

				                    PERMISSIONS_CF,

				                    PERMISSIONS_NAME,

				@@ -228,23 +227,22 @@ default_authorizer::modify(

				        return _qp.process(

				                query,

				                db::consistency_level::ONE,

				                infinite_timeout_config,

				                internal_distributed_timeout_config(),

				                {permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();

				    });

				}

				future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {

				future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {

				    return modify(role_name, std::move(set), resource, "+");

				}

				future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {

				future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {

				    return modify(role_name, std::move(set), resource, "-");

				}

				future<std::vector<permission_details>> default_authorizer::list_all() const {

				    static const sstring query = sprint(

				            "SELECT %s, %s, %s FROM %s.%s",

				    static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",

				            ROLE_NAME,

				            RESOURCE_NAME,

				            PERMISSIONS_NAME,

				@@ -254,7 +252,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {

				    return _qp.process(

				            query,

				            db::consistency_level::ONE,

				            infinite_timeout_config,

				            internal_distributed_timeout_config(),

				            {},

				            true).then([](::shared_ptr<cql3::untyped_result_set> results) {

				        std::vector<permission_details> all_details;

				@@ -272,9 +270,8 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {

				    });

				}

				future<> default_authorizer::revoke_all(stdx::string_view role_name) const {

				    static const sstring query = sprint(

				            "DELETE FROM %s.%s WHERE %s = ?",

				future<> default_authorizer::revoke_all(std::string_view role_name) const {

				    static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				            ROLE_NAME);

				@@ -282,7 +279,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {

				    return _qp.process(

				            query,

				            db::consistency_level::ONE,

				            infinite_timeout_config,

				            internal_distributed_timeout_config(),

				            {sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {

				        try {

				            std::rethrow_exception(ep);

				@@ -293,8 +290,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {

				}

				future<> default_authorizer::revoke_all(const resource& resource) const {

				    static const sstring query = sprint(

				            "SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",

				    static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",

				            ROLE_NAME,

				            meta::AUTH_KS,

				            PERMISSIONS_CF,

				@@ -311,8 +307,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {

				                    res->begin(),

				                    res->end(),

				                    [this, res, resource](const cql3::untyped_result_set::row& r) {

				                static const sstring query = sprint(

				                        "DELETE FROM %s.%s WHERE %s = ? AND %s = ?",

				                static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",

				                        meta::AUTH_KS,

				                        PERMISSIONS_CF,

				                        ROLE_NAME,

									
										8

auth/default_authorizer.hh
									
												View File
												
				@@ -77,13 +77,13 @@ public:

				    virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;

				    virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;

				    virtual future<> grant(std::string_view, permission_set, const resource&) const override;

				    virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;

				    virtual future<> revoke( std::string_view, permission_set, const resource&) const override;

				    virtual future<std::vector<permission_details>> list_all() const override;

				    virtual future<> revoke_all(stdx::string_view) const override;

				    virtual future<> revoke_all(std::string_view) const override;

				    virtual future<> revoke_all(const resource&) const override;

				@@ -96,7 +96,7 @@ private:

				    future<> migrate_legacy_metadata() const;

				    future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;

				    future<> modify(std::string_view, permission_set, const resource&, std::string_view) const;

				};

				} /* namespace auth */

									
										224

auth/password_authenticator.cc
									
												View File
												
				@@ -41,25 +41,24 @@

				#include "auth/password_authenticator.hh"

				extern "C" {

				#include <crypt.h>

				#include <unistd.h>

				}

				#include <algorithm>

				#include <chrono>

				#include <random>

				#include <string_view>

				#include <optional>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <seastar/core/reactor.hh>

				#include "auth/authenticated_user.hh"

				#include "auth/common.hh"

				#include "auth/passwords.hh"

				#include "auth/roles-metadata.hh"

				#include "cql3/untyped_result_set.hh"

				#include "log.hh"

				#include "service/migration_manager.hh"

				#include "utils/class_registrator.hh"

				#include "database.hh"

				namespace auth {

				@@ -82,6 +81,8 @@ static const class_registrator<

				        cql3::query_processor&,

				        ::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");

				static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());

				password_authenticator::~password_authenticator() {

				}

				@@ -91,82 +92,11 @@ password_authenticator::password_authenticator(cql3::query_processor& qp, ::serv

				    , _stopped(make_ready_future<>()) {

				}

				// TODO: blowfish

				// Origin uses Java bcrypt library, i.e. blowfish salt

				// generation and hashing, which is arguably a "better"

				// password hash than sha/md5 versions usually available in

				// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algo

				// which should be the same order of safe, so the only

				// real issue should be salted hash compatibility with

				// origin if importing system tables from there.

				//

				// Since bcrypt/blowfish is _not_ (afaict) available

				// as a dev package/lib on most linux distros, we'd have to

				// copy and compile for example OWL  crypto

				// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)

				// to be fully bit-compatible.

				//

				// Until we decide this is needed, let's just use crypt_r,

				// and some old-fashioned random salt generation.

				static constexpr size_t rand_bytes = 16;

				static thread_local crypt_data tlcrypt = { 0, };

				static sstring hashpw(const sstring& pass, const sstring& salt) {

				    auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);

				    if (res == nullptr) {

				        throw std::system_error(errno, std::system_category());

				    }

				    return res;

				}

				static bool checkpw(const sstring& pass, const sstring& salted_hash) {

				    auto tmp = hashpw(pass, salted_hash);

				    return tmp == salted_hash;

				}

				static sstring gensalt() {

				    static sstring prefix;

				    std::random_device rd;

				    std::default_random_engine e1(rd());

				    std::uniform_int_distribution<char> dist;

				    sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

				    sstring input(rand_bytes, 0);

				    for (char&c : input) {

				        c = valid_salt[dist(e1) % valid_salt.size()];

				    }

				    sstring salt;

				    if (!prefix.empty()) {

				        return prefix + input;

				    }

				    // Try in order:

				    // blowfish 2011 fix, blowfish, sha512, sha256, md5

				    for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {

				        salt = pfx + input;

				        if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {

				            prefix = pfx;

				            return salt;

				        }

				    }

				    throw std::runtime_error("Could not initialize hashing algorithm");

				}

				static sstring hashpw(const sstring& pass) {

				    return hashpw(pass, gensalt());

				}

				static bool has_salted_hash(const cql3::untyped_result_set_row& row) {

				    return !row.get_or<sstring>(SALTED_HASH, "").empty();

				}

				static const sstring update_row_query = sprint(

				        "UPDATE %s SET %s = ? WHERE %s = ?",

				static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				        meta::roles_table::qualified_name(),

				        SALTED_HASH,

				        meta::roles_table::role_col_name);

				@@ -174,17 +104,17 @@ static const sstring update_row_query = sprint(

				static const sstring legacy_table_name{"credentials"};

				bool password_authenticator::legacy_metadata_exists() const {

				    return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);

				    return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);

				}

				future<> password_authenticator::migrate_legacy_metadata() const {

				    plogger.info("Starting migration of legacy authentication metadata.");

				    static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				            query,

				            db::consistency_level::QUORUM,

				            infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {

				            auto username = row.get_as<sstring>("username");

				            auto salted_hash = row.get_as<sstring>(SALTED_HASH);

				@@ -192,7 +122,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				            return _qp.process(

				                    update_row_query,

				                    consistency_for_user(username),

				                    infinite_timeout_config,

				                    internal_distributed_timeout_config(),

				                    {std::move(salted_hash), username}).discard_result();

				        }).finally([results] {});

				    }).then([] {

				@@ -209,8 +139,8 @@ future<> password_authenticator::create_default_if_missing() const {

				            return _qp.process(

				                    update_row_query,

				                    db::consistency_level::QUORUM,

				                    infinite_timeout_config,

				                    {hashpw(DEFAULT_USER_PASSWORD), DEFAULT_USER_NAME}).then([](auto&&) {

				                    internal_distributed_timeout_config(),

				                    {passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {

				                plogger.info("Created default superuser authentication record.");

				            });

				        }

				@@ -221,8 +151,6 @@ future<> password_authenticator::create_default_if_missing() const {

				future<> password_authenticator::start() {

				     return once_among_shards([this] {

				         gensalt(); // do this once to determine usable hashing

				         auto f = create_metadata_table_if_missing(

				                 meta::roles_table::name,

				                 _qp,

				@@ -231,7 +159,7 @@ future<> password_authenticator::start() {

				         _stopped = do_after_system_ready(_as, [this] {

				             return async([this] {

				                 wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();

				                 wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();

				                 if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {

				                     if (legacy_metadata_exists()) {

				@@ -256,10 +184,10 @@ future<> password_authenticator::start() {

				future<> password_authenticator::stop() {

				    _as.request_abort();

				    return _stopped.handle_exception_type([] (const sleep_aborted&) { });

				    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});

				}

				db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {

				db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {

				    if (role_name == DEFAULT_USER_NAME) {

				        return db::consistency_level::QUORUM;

				    }

				@@ -285,10 +213,10 @@ authentication_option_set password_authenticator::alterable_options() const {

				future<authenticated_user> password_authenticator::authenticate(

				                const credentials_map& credentials) const {

				    if (!credentials.count(USERNAME_KEY)) {

				        throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));

				    }

				    if (!credentials.count(PASSWORD_KEY)) {

				        throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));

				    }

				    auto& username = credentials.at(USERNAME_KEY);

				@@ -300,8 +228,7 @@ future<authenticated_user> password_authenticator::authenticate(

				    // Rely on query processing caching statements instead, and lets assume

				    // that a map lookup string->statement is not gonna kill us much.

				    return futurize_apply([this, username, password] {

				        static const sstring query = sprint(

				                "SELECT %s FROM %s WHERE %s = ?",

				        static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",

				                SALTED_HASH,

				                meta::roles_table::qualified_name(),

				                meta::roles_table::role_col_name);

				@@ -309,13 +236,17 @@ future<authenticated_user> password_authenticator::authenticate(

				        return _qp.process(

				                query,

				                consistency_for_user(username),

				                infinite_timeout_config,

				                internal_distributed_timeout_config(),

				                {username},

				                true);

				    }).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {

				        try {

				            auto res = f.get0();

				            if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {

				            auto salted_hash = std::optional<sstring>();

				            if (!res->empty()) {

				                salted_hash = res->one().get_opt<sstring>(SALTED_HASH);

				            }

				            if (!salted_hash || !passwords::check(password, *salted_hash)) {

				                throw exceptions::authentication_exception("Username and/or password are incorrect");

				            }

				            return make_ready_future<authenticated_user>(username);

				@@ -323,13 +254,15 @@ future<authenticated_user> password_authenticator::authenticate(

				            std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));

				        } catch (exceptions::request_execution_exception& e) {

				            std::throw_with_nested(exceptions::authentication_exception(e.what()));

				        } catch (exceptions::authentication_exception& e) {

				            std::throw_with_nested(e);

				        } catch (...) {

				            std::throw_with_nested(exceptions::authentication_exception("authentication failed"));

				        }

				    });

				}

				future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {

				future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {

				    if (!options.password) {

				        return make_ready_future<>();

				    }

				@@ -337,17 +270,16 @@ future<> password_authenticator::create(stdx::string_view role_name, const authe

				    return _qp.process(

				            update_row_query,

				            consistency_for_user(role_name),

				            infinite_timeout_config,

				            {hashpw(*options.password), sstring(role_name)}).discard_result();

				            internal_distributed_timeout_config(),

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();

				}

				future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {

				future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {

				    if (!options.password) {

				        return make_ready_future<>();

				    }

				    static const sstring query = sprint(

				            "UPDATE %s SET %s = ? WHERE %s = ?",

				    static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            SALTED_HASH,

				            meta::roles_table::role_col_name);

				@@ -355,21 +287,23 @@ future<> password_authenticator::alter(stdx::string_view role_name, const authen

				    return _qp.process(

				            query,

				            consistency_for_user(role_name),

				            infinite_timeout_config,

				            {hashpw(*options.password), sstring(role_name)}).discard_result();

				            internal_distributed_timeout_config(),

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();

				}

				future<> password_authenticator::drop(stdx::string_view name) const {

				    static const sstring query = sprint(

				            "DELETE %s FROM %s WHERE %s = ?",

				future<> password_authenticator::drop(std::string_view name) const {

				    static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",

				            SALTED_HASH,

				            meta::roles_table::qualified_name(),

				            meta::roles_table::role_col_name);

				    return _qp.process(query, consistency_for_user(name), infinite_timeout_config, {sstring(name)}).discard_result();

				    return _qp.process(

				            query, consistency_for_user(name),

				            internal_distributed_timeout_config(),

				            {sstring(name)}).discard_result();

				}

				future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {

				future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {

				    return make_ready_future<custom_options>();

				}

				@@ -378,75 +312,13 @@ const resource_set& password_authenticator::protected_resources() const {

				    return resources;

				}

				::shared_ptr<authenticator::sasl_challenge> password_authenticator::new_sasl_challenge() const {

				    class plain_text_password_challenge : public sasl_challenge {

				        const password_authenticator& _self;

				    public:

				        plain_text_password_challenge(const password_authenticator& self) : _self(self) {

				        }

				        /**

				         * SASL PLAIN mechanism specifies that credentials are encoded in a

				         * sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).

				         * The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}

				         * authzId is optional, and in fact we don't care about it here as we'll

				         * set the authzId to match the authnId (that is, there is no concept of

				         * a user being authorized to act on behalf of another).

				         *

				         * @param bytes encoded credentials string sent by the client

				         * @return map containing the username/password pairs in the form an IAuthenticator

				         * would expect

				         * @throws javax.security.sasl.SaslException

				         */

				        bytes evaluate_response(bytes_view client_response) override {

				            plogger.debug("Decoding credentials from client token");

				            sstring username, password;

				            auto b = client_response.crbegin();

				            auto e = client_response.crend();

				            auto i = b;

				            while (i != e) {

				                if (*i == 0) {

				                    sstring tmp(i.base(), b.base());

				                    if (password.empty()) {

				                        password = std::move(tmp);

				                    } else if (username.empty()) {

				                        username = std::move(tmp);

				                    }

				                    b = ++i;

				                    continue;

				                }

				                ++i;

				            }

				            if (username.empty()) {

				                throw exceptions::authentication_exception("Authentication ID must not be null");

				            }

				            if (password.empty()) {

				                throw exceptions::authentication_exception("Password must not be null");

				            }

				            _credentials[USERNAME_KEY] = std::move(username);

				            _credentials[PASSWORD_KEY] = std::move(password);

				            _complete = true;

				            return {};

				        }

				        bool is_complete() const override {

				            return _complete;

				        }

				        future<authenticated_user> get_authenticated_user() const override {

				            return _self.authenticate(_credentials);

				        }

				    private:

				        credentials_map _credentials;

				        bool _complete = false;

				    };

				    return ::make_shared<plain_text_password_challenge>(*this);

				::shared_ptr<sasl_challenge> password_authenticator::new_sasl_challenge() const {

				    return ::make_shared<plain_sasl_challenge>([this](std::string_view username, std::string_view password) {

				        credentials_map credentials{};

				        credentials[USERNAME_KEY] = sstring(username);

				        credentials[PASSWORD_KEY] = sstring(password);

				        return this->authenticate(credentials);

				    });

				}

				}
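The removed `plain_text_password_challenge` decoded SASL PLAIN credentials inline; the replacement hands that job to `plain_sasl_challenge`, whose implementation is not part of this file's diff. As a rough illustration of the wire format only, here is a sketch that splits a message of the form `[authzid] NUL authcid NUL passwd`. The names `plain_credentials` and `parse_sasl_plain` are hypothetical, and the sketch is stricter than the removed code: it returns an empty optional rather than throwing, and it treats a trailing NUL as an empty password.

```cpp
#include <optional>
#include <string>
#include <string_view>

struct plain_credentials {
    std::string username;  // authentication identity (authcid)
    std::string password;
};

// Decode a SASL PLAIN message of the form
//   [authzid] NUL authcid NUL passwd
// The authorization identity is ignored. Returns std::nullopt when a
// delimiter is missing or when the username or password is empty.
inline std::optional<plain_credentials> parse_sasl_plain(std::string_view msg) {
    // The password is everything after the last NUL.
    const auto last = msg.rfind('\0');
    if (last == std::string_view::npos || last == 0) {
        return std::nullopt;
    }
    // The username sits between the previous NUL and the last one.
    const auto first = msg.rfind('\0', last - 1);
    if (first == std::string_view::npos) {
        return std::nullopt;
    }
    std::string_view user = msg.substr(first + 1, last - first - 1);
    std::string_view pass = msg.substr(last + 1);
    if (user.empty() || pass.empty()) {
        return std::nullopt;
    }
    return plain_credentials{std::string(user), std::string(pass)};
}
```

Scanning from the end, as the removed code also did with reverse iterators, means an optional authorization identity at the front never needs to be parsed.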

									
										10

auth/password_authenticator.hh
									
												
				@@ -61,7 +61,7 @@ class password_authenticator : public authenticator {

				    seastar::abort_source _as;

				public:

				    static db::consistency_level consistency_for_user(stdx::string_view role_name);

				    static db::consistency_level consistency_for_user(std::string_view role_name);

				    password_authenticator(cql3::query_processor&, ::service::migration_manager&);

				@@ -81,13 +81,13 @@ public:

				    virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;

				    virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;

				    virtual future<> create(std::string_view role_name, const authentication_options& options) const override;

				    virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) const override;

				    virtual future<> drop(stdx::string_view role_name) const override;

				    virtual future<> drop(std::string_view role_name) const override;

				    virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override;

				    virtual const resource_set& protected_resources() const override;

									
										84

auth/passwords.cc
									
										Normal file
									
												
				@@ -0,0 +1,84 @@

				/*

				 * Copyright (C) 2018 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "auth/passwords.hh"

				#include <cerrno>

				#include <optional>

				extern "C" {

				#include <crypt.h>

				#include <unistd.h>

				}

				namespace auth::passwords {

				static thread_local crypt_data tlcrypt = { 0, };

				namespace detail {

				scheme identify_best_supported_scheme() {

				    const auto all_schemes = { scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512, scheme::sha_256, scheme::md5 };

				    // "Random", for testing schemes.

				    const sstring random_part_of_salt = "aaaabbbbccccdddd";

				    for (scheme c : all_schemes) {

				        const sstring salt = sstring(prefix_for_scheme(c)) + random_part_of_salt;

				        const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);

				        if (e && (e[0] != '*')) {

				            return c;

				        }

				    }

				    throw no_supported_schemes();

				}

				sstring hash_with_salt(const sstring& pass, const sstring& salt) {

				    auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);

				    if (!res || (res[0] == '*')) {

				        throw std::system_error(errno, std::system_category());

				    }

				    return res;

				}

				const char* prefix_for_scheme(scheme c) noexcept {

				    switch (c) {

				    case scheme::bcrypt_y: return "$2y$";

				    case scheme::bcrypt_a: return "$2a$";

				    case scheme::sha_512: return "$6$";

				    case scheme::sha_256: return "$5$";

				    case scheme::md5: return "$1$";

				    default: return nullptr;

				    }

				}

				} // namespace detail

				no_supported_schemes::no_supported_schemes()

				        : std::runtime_error("No allowed hashing schemes are supported on this system") {

				}

				bool check(const sstring& pass, const sstring& salted_hash) {

				    return detail::hash_with_salt(pass, salted_hash) == salted_hash;

				}

				} // namespace auth::passwords
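
A note on why `check` can pass the stored hash back in as the salt: crypt(3) reads the scheme prefix and the salt from the leading part of its second argument, so re-hashing the cleartext with the stored value reproduces that value only when the password matches. The sketch below illustrates just the prefix side of this, without calling `crypt_r`. It assumes plain `std::string_view` in place of `sstring`, and `scheme_of_hash` is a hypothetical helper that is not part of the patch.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <string_view>

enum class scheme { bcrypt_y, bcrypt_a, sha_512, sha_256, md5 };

// Same prefixes as prefix_for_scheme() in the diff above.
std::string_view prefix_for_scheme(scheme c) noexcept {
    switch (c) {
    case scheme::bcrypt_y: return "$2y$";
    case scheme::bcrypt_a: return "$2a$";
    case scheme::sha_512: return "$6$";
    case scheme::sha_256: return "$5$";
    case scheme::md5: return "$1$";
    }
    return {};
}

// Hypothetical helper (not in the patch): report which scheme produced a
// stored hash by matching its leading prefix, in the same preference order
// that identify_best_supported_scheme() uses.
std::optional<scheme> scheme_of_hash(std::string_view salted_hash) {
    for (scheme c : {scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512,
                     scheme::sha_256, scheme::md5}) {
        const std::string_view prefix = prefix_for_scheme(c);
        if (salted_hash.substr(0, prefix.size()) == prefix) {
            return c;
        }
    }
    return std::nullopt; // no known prefix, e.g. not a crypt(3)-style hash
}
```

A value such as `"$6$aaaabbbbccccdddd$..."` would be reported as `scheme::sha_512`, while a string without a known prefix yields an empty optional.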

									
125 auth/passwords.hh Normal file
												
				@@ -0,0 +1,125 @@

				/*

				 * Copyright (C) 2018 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <random>

				#include <stdexcept>

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				namespace auth::passwords {

				class no_supported_schemes : public std::runtime_error {

				public:

				    no_supported_schemes();

				};

				///

				/// Apache Cassandra uses a library to provide the bcrypt scheme. Many Linux implementations do not support bcrypt, so

				/// we support alternatives. The cost is loss of direct compatibility with Apache Cassandra system tables.

				///

				enum class scheme {

				    bcrypt_y,

				    bcrypt_a,

				    sha_512,

				    sha_256,

				    md5

				};

				namespace detail {

				template <typename RandomNumberEngine>

				sstring generate_random_salt_bytes(RandomNumberEngine& g) {

				    static const sstring valid_bytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

				    static constexpr std::size_t num_bytes = 16;

				    std::uniform_int_distribution<std::size_t> dist(0, valid_bytes.size() - 1);

				    sstring result(num_bytes, 0);

				    for (char& c : result) {

				        c = valid_bytes[dist(g)];

				    }

				    return result;

				}

				///

				/// Test each allowed hashing scheme and report the best supported one on the current system.

				///

				/// \throws \ref no_supported_schemes when none of the known schemes is supported.

				///

				scheme identify_best_supported_scheme();

				const char* prefix_for_scheme(scheme) noexcept;

				///

				/// Generate an implementation-specific salt string for hashing passwords.

				///

				/// The `RandomNumberEngine` is used to generate the string, whose length is implementation-specific.

				///

				/// \throws \ref no_supported_schemes when no known hashing schemes are supported on the system.

				///

				template <typename RandomNumberEngine>

				sstring generate_salt(RandomNumberEngine& g) {

				    static const scheme scheme = identify_best_supported_scheme();

				    static const sstring prefix = sstring(prefix_for_scheme(scheme));

				    return prefix + generate_random_salt_bytes(g);

				}

				///

				/// Hash a password combined with an implementation-specific salt string.

				///

				/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.

				///

				sstring hash_with_salt(const sstring& pass, const sstring& salt);

				} // namespace detail

				///

				/// Run a one-way hashing function on cleartext to produce encrypted text.

				///

				/// Prior to applying the hashing function, a random salt is combined with the cleartext. The random salt bytes are

				/// generated by the random number engine `g`.

				///

				/// The result contains both the hashed password and the salt that was used, in an implementation-specific format.

				///

				/// \throws \ref std::system_error when the underlying implementation fails to hash the cleartext.

				///

				template <typename RandomNumberEngine>

				sstring hash(const sstring& pass, RandomNumberEngine& g) {

				    return detail::hash_with_salt(pass, detail::generate_salt(g));

				}

				///

				/// Check that cleartext matches previously hashed cleartext with salt.

				///

				/// \ref salted_hash is the result of invoking \ref hash, which is the implementation-specific combination of the hashed

				/// password and the salt that was generated for it.

				///

				/// \returns `true` if the cleartext matches the salted hash.

				///

				/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.

				///

				bool check(const sstring& pass, const sstring& salted_hash);

				} // namespace auth::passwords
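
The salt construction in this header can be tried outside Scylla with a small stand-alone sketch. It assumes `std::string` in place of seastar's `sstring`, and `random_salt_bytes` and `make_salt` are illustrative names, not the functions from the patch. The alphabet and the 16-byte length are taken from the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// The 64 characters used for salt bytes, as in the diff above.
static const std::string valid_salt_chars =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

// Stand-in for detail::generate_random_salt_bytes, written against
// std::string so that the sketch builds without Seastar.
template <typename RandomNumberEngine>
std::string random_salt_bytes(RandomNumberEngine& g, std::size_t num_bytes = 16) {
    std::uniform_int_distribution<std::size_t> dist(0, valid_salt_chars.size() - 1);
    std::string result(num_bytes, '\0');
    for (char& c : result) {
        c = valid_salt_chars[dist(g)]; // pick one allowed character per byte
    }
    return result;
}

// Hypothetical helper: prepend a scheme prefix such as "$6$" (SHA-512),
// which is what generate_salt() does with the best supported scheme.
template <typename RandomNumberEngine>
std::string make_salt(const std::string& prefix, RandomNumberEngine& g) {
    return prefix + random_salt_bytes(g);
}
```

With a seeded `std::mt19937`, `make_salt("$6$", g)` returns a 19-character string: the 3-character prefix followed by 16 characters drawn from the alphabet.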

									
10 auth/permissions_cache.cc
												
				@@ -24,19 +24,9 @@

				 #include "auth/authorizer.hh"

				 #include "auth/common.hh"

				 #include "auth/service.hh"

				-#include "db/config.hh"

				 namespace auth {

				-permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {

				-    permissions_cache_config c;

				-    c.max_entries = dc.permissions_cache_max_entries();

				-    c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());

				-    c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());

				-    return c;

				-}

				 permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)

				         : _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {

				               log.debug("Refreshing permissions for {}", k.first);

									
5 auth/permissions_cache.hh
												
				@@ -22,7 +22,7 @@

				 #pragma once

				 #include <chrono>

				-#include <experimental/string_view>

				+#include <string_view>

				 #include <functional>

				 #include <iostream>

				 #include <optional>

				@@ -37,7 +37,6 @@

				 #include "auth/resource.hh"

				 #include "auth/role_or_anonymous.hh"

				 #include "log.hh"

				-#include "stdx.hh"

				 #include "utils/hash.hh"

				 #include "utils/loading_cache.hh"

				@@ -59,8 +58,6 @@ namespace auth {

				 class service;

				 struct permissions_cache_config final {

				-    static permissions_cache_config from_db_config(const db::config&);

				     std::size_t max_entries;

				     std::chrono::milliseconds validity_period;

				     std::chrono::milliseconds update_period;

									
35 auth/resource.cc
												
				@@ -61,7 +61,7 @@ std::ostream& operator<<(std::ostream& os, resource_kind kind) {

				     return os;

				 }

				-static const std::unordered_map<resource_kind, stdx::string_view> roots{

				+static const std::unordered_map<resource_kind, std::string_view> roots{

				         {resource_kind::data, "data"},

				         {resource_kind::role, "roles"}};

				@@ -101,24 +101,25 @@ static permission_set applicable_permissions(const role_resource_view& rv) {

				             permission::DESCRIBE>();

				 }

				-resource::resource(resource_kind kind) : _kind(kind), _parts{sstring(roots.at(kind))}  {

				+resource::resource(resource_kind kind) : _kind(kind) {

				+    _parts.emplace_back(roots.at(kind));

				 }

				-resource::resource(resource_kind kind, std::vector<sstring> parts) : resource(kind) {

				-    _parts.reserve(parts.size() + 1);

				+resource::resource(resource_kind kind, utils::small_vector<sstring, 3> parts) : resource(kind) {

				     _parts.insert(_parts.end(), std::make_move_iterator(parts.begin()), std::make_move_iterator(parts.end()));

				 }

				-resource::resource(data_resource_t, stdx::string_view keyspace)

				-        : resource(resource_kind::data, std::vector<sstring>{sstring(keyspace)}) {

				+resource::resource(data_resource_t, std::string_view keyspace) : resource(resource_kind::data) {

				+    _parts.emplace_back(keyspace);

				 }

				-resource::resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table)

				-        : resource(resource_kind::data, std::vector<sstring>{sstring(keyspace), sstring(table)}) {

				+resource::resource(data_resource_t, std::string_view keyspace, std::string_view table) : resource(resource_kind::data) {

				+    _parts.emplace_back(keyspace);

				+    _parts.emplace_back(table);

				 }

				-resource::resource(role_resource_t, stdx::string_view role)

				-        : resource(resource_kind::role, std::vector<sstring>{sstring(role)}) {

				+resource::resource(role_resource_t, std::string_view role) : resource(resource_kind::role) {

				+    _parts.emplace_back(role);

				 }

				 sstring resource::name() const {

				@@ -173,7 +174,7 @@ data_resource_view::data_resource_view(const resource& r) : _resource(r) {

				     }

				 }

				-std::optional<stdx::string_view> data_resource_view::keyspace() const {

				+std::optional<std::string_view> data_resource_view::keyspace() const {

				     if (_resource._parts.size() == 1) {

				         return {};

				     }

				@@ -181,7 +182,7 @@ std::optional<stdx::string_view> data_resource_view::keyspace() const {

				     return _resource._parts[1];

				 }

				-std::optional<stdx::string_view> data_resource_view::table() const {

				+std::optional<std::string_view> data_resource_view::table() const {

				     if (_resource._parts.size() <= 2) {

				         return {};

				     }

				@@ -210,7 +211,7 @@ role_resource_view::role_resource_view(const resource& r) : _resource(r) {

				     }

				 }

				-std::optional<stdx::string_view> role_resource_view::role() const {

				+std::optional<std::string_view> role_resource_view::role() const {

				     if (_resource._parts.size() == 1) {

				         return {};

				     }

				@@ -230,9 +231,9 @@ std::ostream& operator<<(std::ostream& os, const role_resource_view& v) {

				     return os;

				 }

				-resource parse_resource(stdx::string_view name) {

				-    static const std::unordered_map<stdx::string_view, resource_kind> reverse_roots = [] {

				-        std::unordered_map<stdx::string_view, resource_kind> result;

				+resource parse_resource(std::string_view name) {

				+    static const std::unordered_map<std::string_view, resource_kind> reverse_roots = [] {

				+        std::unordered_map<std::string_view, resource_kind> result;

				         for (const auto& pair : roots) {

				             result.emplace(pair.second, pair.first);

				@@ -241,7 +242,7 @@ resource parse_resource(std::string_view name) {

				         return result;

				     }();

				-    std::vector<sstring> parts;

				+    utils::small_vector<sstring, 3> parts;

				     boost::split(parts, name, [](char ch) { return ch == '/'; });

				     if (parts.empty()) {
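
The hunks above store a resource as a root part ("data" or "roles") followed by optional keyspace, table, or role parts, and `parse_resource` splits a name on '/'. The following stand-alone sketch shows that layout under some assumptions: it uses `std::vector<std::string>` rather than `utils::small_vector<sstring, 3>`, a hand-written splitter rather than `boost::split`, and the names `split_resource_name`, `keyspace_of`, and `table_of` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Split a resource name such as "data/ks/tbl" on '/'.
std::vector<std::string> split_resource_name(std::string_view name) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t slash = name.find('/', start);
        if (slash == std::string_view::npos) {
            parts.emplace_back(name.substr(start)); // last (or only) part
            break;
        }
        parts.emplace_back(name.substr(start, slash - start));
        start = slash + 1;
    }
    return parts;
}

// Same idea as data_resource_view::keyspace(): parts[0] is the root, so a
// keyspace exists only when there is a second part. The returned view
// refers to `parts`, which must outlive it.
std::optional<std::string_view> keyspace_of(const std::vector<std::string>& parts) {
    if (parts.size() <= 1) {
        return {};
    }
    return parts[1];
}

// Same idea as data_resource_view::table(): a table needs a third part.
std::optional<std::string_view> table_of(const std::vector<std::string>& parts) {
    if (parts.size() <= 2) {
        return {};
    }
    return parts[2];
}
```

For "data/ks/tbl" this gives three parts with keyspace "ks" and table "tbl". For the bare root "roles", both accessors return an empty optional.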

Compare commits

4919 Commits branch-2.3 ... next-3.3

3 .dockerignore Normal file
4 .github/PULL_REQUEST_TEMPLATE.md vendored
5 .gitignore vendored
11 .gitmodules vendored
3 CMakeLists.txt
2 CONTRIBUTING.md
97 HACKING.md
31 MAINTAINERS
29 README-DPDK.md
43 README.md
4 SCYLLA-VERSION-GEN
78 alternator-test/README.md Normal file
179 alternator-test/conftest.py Normal file
74 alternator-test/test_authorization.py Normal file
253 alternator-test/test_batch.py Normal file
1106 alternator-test/test_condition_expression.py Normal file
49 alternator-test/test_describe_endpoints.py Normal file
169 alternator-test/test_describe_table.py Normal file
1079 alternator-test/test_expected.py Normal file
874 alternator-test/test_gsi.py Normal file
35 alternator-test/test_health.py Normal file
402 alternator-test/test_item.py Normal file
365 alternator-test/test_lsi.py Normal file
60 alternator-test/test_nested.py Normal file
201 alternator-test/test_projection_expression.py Normal file
516 alternator-test/test_query.py Normal file
226 alternator-test/test_returnvalues.py Normal file
252 alternator-test/test_scan.py Normal file
276 alternator-test/test_table.py Normal file
854 alternator-test/test_update_expression.py Normal file
141 alternator-test/util.py Normal file
147 alternator/auth.cc Normal file
46 alternator/auth.hh Normal file
111 alternator/base64.cc Normal file
34 alternator/base64.hh Normal file
564 alternator/conditions.cc Normal file
49 alternator/conditions.hh Normal file
50 alternator/error.hh Normal file
2275 alternator/executor.cc Normal file
71 alternator/executor.hh Normal file
98 alternator/expressions.cc Normal file
214 alternator/expressions.g Normal file
41 alternator/expressions.hh Normal file
166 alternator/expressions_types.hh Normal file
172 alternator/rjson.cc Normal file
163 alternator/rjson.hh Normal file
261 alternator/serialization.cc Normal file
72 alternator/serialization.hh Normal file
314 alternator/server.cc Normal file
54 alternator/server.hh Normal file
98 alternator/stats.cc Normal file
95 alternator/stats.hh Normal file
30 api/api-doc/cache_service.json
154 api/api-doc/column_family.json
41 api/api-doc/compaction_manager.json
12 api/api-doc/failure_detector.json
4 api/api-doc/gossiper.json
4 api/api-doc/hinted_handoff.json
2 api/api-doc/messaging_service.json
94 api/api-doc/storage_proxy.json
179 api/api-doc/storage_service.json
16 api/api-doc/stream_manager.json
15 api/api-doc/system.json
10 api/api.cc
44 api/api.hh
12 api/api_init.hh
4 api/collectd.cc
220 api/column_family.cc
51 api/column_family.hh
15 api/commitlog.cc
95 api/compaction_manager.cc
27 api/config.cc
2 api/lsa.cc
6 api/messaging_service.cc
93 api/storage_proxy.cc
325 api/storage_service.cc
6 api/system.cc
168 atomic_cell.cc

49 atomic_cell.hh